Foundations in Signal Processing, Communications and Networking Series Editors: W. Utschick, H. Boche, R. Mathar
Vol. 1. Dietl, G. K. E.
Linear Estimation and Detection in Krylov Subspaces, 2007
ISBN 978-3-540-68478-7

Vol. 2. Dietrich, F. A.
Robust Signal Processing for Wireless Communications, 2008
ISBN 978-3-540-74246-3

Vol. 3. Stanczak, S., Wiczanowski, M. and Boche, H.
Fundamentals of Resource Allocation in Wireless Networks, 2009
ISBN 978-3-540-79385-4
Sławomir Stańczak · Marcin Wiczanowski · Holger Boche
Fundamentals of Resource Allocation in Wireless Networks
Theory and Algorithms
With 30 Figures
Second Expanded Edition
PD Dr.-Ing. habil. Sławomir Stańczak
Technische Universität Berlin
Heinrich Hertz Chair for Mobile Communications
and Fraunhofer German-Sino Lab for Mobile Communications
Einsteinufer 37, 10587 Berlin
Germany
[email protected]
Dr.-Ing. Marcin Wiczanowski
Technische Universität Berlin
Heinrich Hertz Chair for Mobile Communications
and Fraunhofer German-Sino Lab for Mobile Communications
Einsteinufer 37, 10587 Berlin
Germany
[email protected]
Prof. Dr.-Ing. Dr.rer.nat. Holger Boche
Technische Universität Berlin
Heinrich Hertz Chair for Mobile Communications
and Fraunhofer German-Sino Lab for Mobile Communications
and Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute
Einsteinufer 37, 10587 Berlin
Germany
[email protected]
ISSN 1863-8538    e-ISSN 1863-8546
ISBN 978-3-540-79385-4    e-ISBN 978-3-540-79386-1
DOI 10.1007/978-3-540-79386-1
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2009930679
© Springer-Verlag Berlin Heidelberg 2009
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Cover design: eStudio Calamar S.L.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
To our families
Der König hat viele Gnade für meine geringen Dienste, und das Publikum viel Nachsicht für die unbedeutenden Versuche meiner Feder; ich wünschte, dass ich einigermaßen etwas zu der Verbesserung des Geschmackes in meinem Lande, zur Ausbreitung der Wissenschaften beitragen könnte. Denn sie sind's allein, die uns mit anderen Nationen verbinden, sie sind's, die aus den entferntesten Geistern Freunde machen, und die angenehmste Vereinigung unter denen selbst erhalten, die leider durch Staatsverhältnisse öfters getrennt werden.

Clavigo, Johann Wolfgang von Goethe
Nader wiele łaski raczy mi okazywać król za moje skromne służby, publiczność zaś zbyt jest wyrozumiała dla niepozornych płodów mego pióra; byłbym wszelako szczęśliwy, mogąc się nieco przyczynić do urobienia literackiego smaku w moim kraju, do rozprzestrzenienia nauk. Jedynie to bowiem może nas zbliżyć do innych nacji, tylko dzięki temu zdobywamy w najdalszych stronach przyjaciół pośród przodujących umysłów, przyczyniając się do utrwalenia najcenniejszych więzów, które niestety jakże często zrywają interesa państwowe.

Clavigo, Johann Wolfgang von Goethe (Tłumaczenie: Wanda Markowska)
Preface
The purpose of this book is to provide tools for a better understanding of the fundamental tradeoffs and interdependencies in wireless networks, with the goal of designing resource allocation strategies that exploit these interdependencies to achieve significant performance gains. Two facts prompted us to write it: First, future wireless applications will require a fundamental understanding of the design principles and control mechanisms in wireless networks. Second, the complexity of the network problems simply precludes the use of engineering common sense alone to identify good solutions, so mathematics becomes the key avenue for coping with the central technical problems in the design of wireless networks. Two fields of mathematics play a central role in this book: Perron–Frobenius theory for nonnegative matrices and optimization theory. This book is a revised and expanded version of the research monograph "Resource Allocation in Wireless Networks", published as Lecture Notes in Computer Science (LNCS 4000) in 2006. Although the general structure has remained largely unchanged, the book contains numerous additional results and more detailed discussions. For instance, there is a more extensive treatment of general nonnegative matrices and of interference functions described by an axiomatic model. Additional material on max-min fairness, proportional fairness, utility-based power control with QoS (quality of service) support and stochastic power control has been added. The power control problem with interference suppression at the receiver side has been included as well. Finally, the material has been extended to provide additional QoS-based power control approaches and powerful primal-dual network-centric power control algorithms. The main body of the book consists of three largely independent parts: a mathematical framework for network analysis, principles of resource allocation in wireless networks, and resource allocation algorithms.
The book ends with appendices containing supplementary results and supporting definitions. Below, we briefly summarize the content of each part.

Mathematical Framework: Chaps. 1 and 2 deal with selected problems in the theory of nonnegative matrices and provide the theoretical basis for the resource allocation problems addressed in the subsequent parts of the book. It should be emphasized that our intent is not to provide a thorough treatment of this wide subject. Instead, we focus on problems that naturally arise in the design of resource allocation strategies for wireless networks. When developing such strategies, different characterizations of the Perron root of nonnegative matrices turn out to be vital to a better understanding of the fundamental tradeoffs between diverse optimization objectives. The Perron root can be viewed as a map from a convex parameter set into the set of positive reals. Chap. 1 is concerned with properties of this map and, in particular, with the question under which conditions it is a convex function of the parameter vector. In Chap. 2, we pose similar questions with regard to a positive solution to a system of linear equations with nonnegative coefficients. Applications involving such systems are numerous, ranging from the physical and engineering sciences to other mathematical areas such as graph theory and optimization. Such systems also occur in power control theory.

Principles of resource allocation in wireless networks: The second part of the book (Chaps. 3–5) deals with the problem of resource allocation in wireless networks. It addresses the problem of joint power control and link scheduling, which has been extensively investigated in the literature and is known to be notoriously difficult to solve, even in a centralized manner. Although we provide interesting insights into this problem, our main focus is on the power control problem under fixed and adaptive (interference-combating) receivers.
In particular, a class of utility functions is identified for which the so-called utility-based power control problem can be converted into an equivalent convex optimization problem. The convexity property is a key ingredient in the development of powerful and efficient utility-based power control algorithms. In addition to the "pure" utility-based approach to the power control problem, we also consider other power control strategies for wireless networks. These include QoS-based power control, where given QoS requirements must be satisfied with minimum total transmit power; max-min SIR power control, where, roughly speaking, the objective is to optimize the performance of the "worst" link; and utility-based power control with QoS support, which combines the utility-based and QoS-based approaches.

Algorithms: Chap. 6 presents distributed power control algorithms for a class of utility maximization problems in wireless networks with and without QoS support. We consider iterative optimization methods such as gradient projection algorithms, as well as primal-dual algorithms that operate on the primal and dual variables of associated Lagrangian functions. Distributed implementation of the presented power control algorithms relies on the use of a so-called adjoint network to efficiently distribute some locally measurable
quantities to other (logical) transmitters. This mitigates the problem of globally coordinating the transmitters when carrying out power control iterations in distributed wireless networks.

The main purpose of the appendices is to make the book more accessible to readers who are not familiar with some basic concepts and results from linear algebra and convex analysis. They further introduce the notation and terminology used throughout the book. The treatment is mostly superficial, and formal proofs are presented only for the most important results. The exception is App. A.4, which presents selected results from the Perron–Frobenius theory of nonnegative, but not necessarily irreducible, matrices and is thus of fundamental importance to the remainder of the book. In addition to key theorems such as the Perron–Frobenius theorem for irreducible matrices, we also provide proofs for some non-standard results that deal with the issue of reducibility. The presentation is limited to results used elsewhere in the book.

This book is intended for post-graduate students, engineers and researchers working in the general area of design and analysis of wireless networks, with a special interest in the problems of resource allocation, QoS control, medium access control and interference management. It can be used as a specialized textbook as well as a reference book. Courses based on parts of the material have been given by the authors at the Technische Universität Berlin. The prerequisites for reading this book are quite minimal. The book should be sufficiently self-contained, in the sense that it can be read without any supplementary material by anyone who has taken basic courses in calculus, linear algebra and probability.

The authors would like to emphasize that this book does not offer a comprehensive state-of-the-art overview of the theory and practice of resource allocation in wireless networks.
In addition to the authors' own work, the book contains a list of references that either were used to develop the presented theory or are known to the authors to address related research topics of sufficient relevance. The list is, however, by no means complete and is undoubtedly subjective. Due to the rapid spread of wireless networking, the scarcity of wireless resources and the growing expectations of users regarding service quality and connectivity, a great number of important design principles have been developed over the past two decades. Given time and space limitations, an exhaustive and fully up-to-date overview of the state of the art was not possible. We hope that this book is a valuable contribution to this development and provides a sufficiently general theoretical framework for extensions, generalizations and improvements of existing resource allocation approaches and algorithms.

Acknowledgments: We are deeply grateful to the following organizations for funding our research: the Bundesministerium für Bildung und Forschung (BMBF), the Deutsche Forschungsgemeinschaft (DFG) and the European Union. The authors also greatly appreciate the technical and financial support
of the Fraunhofer Institut für Nachrichtentechnik (Heinrich-Hertz-Institut, HHI), the Fraunhofer German-Sino Lab for Mobile Communications (MCI) and the Technische Universität Berlin. Without their support, this book would not have come to fruition. We gratefully acknowledge our industrial partners, in particular Alcatel-Lucent Deutschland, Siemens Deutschland and Nokia Siemens Networks, for the fruitful cooperation in research projects that inspired several results of this book. We are also indebted to our colleagues for fruitful discussions, valuable suggestions and support of every kind. We explicitly acknowledge here the help of Igor Bjelakovic, Angela Feistel, Mario Goldenbaum, Michal Kaliszan, Andreas Kortke, Ullrich Mönich, Tobias Oechtering and Katharina Schweers. Some of the results presented in this book were obtained in collaboration with Prof. Nick Bambos, Angela Feistel and Michal Kaliszan. Finally, we are deeply grateful to our families for their patience, support and understanding. This book is dedicated to you.
Berlin, June 2008 and March 2009
Sławomir Stańczak
Marcin Wiczanowski
Holger Boche
Contents
List of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XXI
Part I Mathematical Framework

1 On the Perron Root of Irreducible Matrices ... 3
  1.1 Some Basic Definitions ... 3
  1.2 Some Bounds on the Perron Root and their Applications ... 4
    1.2.1 Concavity of the Perron Root on Some Subsets of Irreducible Matrices ... 11
    1.2.2 Kullback–Leibler Divergence Characterization ... 14
    1.2.3 A Rate Function Representation for Large Deviations of Finite Dimensional Markov Chains ... 15
    1.2.4 Some Extended Perron Root Characterizations ... 20
    1.2.5 Collatz–Wielandt-Type Characterization of the Perron Root ... 22
  1.3 Convexity of the Perron Root ... 25
    1.3.1 Some Definitions ... 26
    1.3.2 Sufficient Conditions ... 28
    1.3.3 Convexity of the Feasibility Set ... 30
    1.3.4 Necessary Conditions ... 32
  1.4 Special Classes of Matrices ... 34
    1.4.1 Symmetric Matrices ... 35
    1.4.2 Symmetric Positive Semidefinite Matrices ... 36
  1.5 The Perron Root under the Linear Mapping ... 37
    1.5.1 Some Bounds ... 39
    1.5.2 Disproof of the Conjecture ... 42
  1.6 The Perron Root under Exponential Mapping ... 45
    1.6.1 A Necessary and Sufficient Condition on Strict Convexity of the Feasibility Set ... 45
    1.6.2 Graph-theoretic Interpretation ... 48
  1.7 Generalizations to Arbitrary Nonnegative Matrices ... 51
    1.7.1 Log-Convexity of the Spectral Radius ... 52
    1.7.2 Characterization of the Spectral Radius ... 52
    1.7.3 Existence of Positive Eigenvectors ... 56
    1.7.4 Collatz–Wielandt-Type Characterization of the Spectral Radius ... 57
  1.8 Bibliographical Notes ... 59

2 On the Positive Solution to a Linear System with Nonnegative Coefficients ... 61
  2.1 Basic Concepts and Definitions ... 61
  2.2 Feasibility Sets ... 63
  2.3 Convexity Results ... 66
    2.3.1 Log-Convexity of the Positive Solution ... 67
    2.3.2 Convexity of the Feasibility Set ... 69
    2.3.3 Strict Log-Convexity ... 70
    2.3.4 Strict Convexity of the Feasibility Sets ... 75
  2.4 The Linear Case ... 76
Part II Principles of Resource Allocation in Wireless Networks

3 Introduction ... 81

4 Network Model ... 85
  4.1 Basic Definitions ... 85
  4.2 Medium Access Control ... 87
  4.3 Wireless Communication Channel ... 90
    4.3.1 Signal-to-Interference Ratio ... 94
    4.3.2 Different Receiver Structures ... 98
    4.3.3 Power Constraints ... 104
    4.3.4 Data Rate Model ... 107
    4.3.5 Examples ... 111

5 Resource Allocation Problem in Communications Networks ... 119
  5.1 End-to-End Rate Control in Wired Networks ... 119
    5.1.1 Fairness Criteria ... 120
    5.1.2 Algorithms ... 124
  5.2 Problem Formulation for Wireless Networks ... 125
    5.2.1 Joint Power Control and Link Scheduling ... 126
    5.2.2 Feasible Rate Region ... 129
    5.2.3 End-to-End Window-Based Rate Control ... 132
    5.2.4 MAC Layer Fair Rate Control ... 134
    5.2.5 Utility-Based Power Control ... 136
    5.2.6 Efficiency-Fairness Trade-Off ... 141
    5.2.7 Kuhn–Tucker Conditions ... 146
  5.3 Interpretation in the QoS Domain ... 150
  5.4 Remarks on Joint Power Control and Link Scheduling ... 160
    5.4.1 Optimal Joint Power Control and Link Scheduling ... 160
    5.4.2 High SIR Regime ... 163
    5.4.3 Low SIR Regime ... 163
    5.4.4 Wireless Links with Self-Interference ... 167
  5.5 QoS-based Power Control ... 168
    5.5.1 Some Definitions ... 169
    5.5.2 Axiomatic Interference Functions ... 174
    5.5.3 QoS-Based Power Control Algorithms ... 180
  5.6 Max-Min SIR Balancing Power Control ... 191
    5.6.1 Some Preliminary Observations ... 192
    5.6.2 Characterization under Sum Power Constraints ... 195
    5.6.3 General Power Constraints ... 199
    5.6.4 Some Consequences and Applications ... 204
  5.7 Utility-based Power Control with QoS Support ... 210
    5.7.1 Hard QoS Support ... 212
    5.7.2 Soft QoS Support ... 213
  5.8 Utility-Based Joint Power and Receiver Control ... 222
    5.8.1 Problem Statement ... 222
    5.8.2 Perfect Synchronization ... 224
    5.8.3 Decentralized Alternating Computation ... 226
    5.8.4 Max-Min SIR Balancing ... 227
  5.9 Additional Results for a Noiseless Case ... 228
    5.9.1 The Efficiency–Fairness Trade-off ... 229
    5.9.2 Existence and Uniqueness of Log-SIR Fair Power Allocation ... 241
  5.10 Proofs ... 246

Part III Algorithms

6 Power Control Algorithms ... 261
  6.1 Introduction ... 261
  6.2 Some Basic Definitions ... 262
  6.3 Convex Statement of the Problem ... 264
  6.4 Strong Convexity Conditions ... 266
  6.5 Gradient Projection Algorithm ... 270
    6.5.1 Global Convergence ... 270
    6.5.2 Rate of Convergence ... 273
    6.5.3 Diagonal Scaling ... 275
    6.5.4 Projection on a Closed Convex Set ... 275
  6.6 Distributed Implementation ... 276
    6.6.1 Local and Global Parts of the Gradient Vector ... 276
    6.6.2 Adjoint Network ... 278
    6.6.3 Distributed Handshake Protocol ... 282
    6.6.4 Some Comparative Remarks ... 283
    6.6.5 Noisy Measurements ... 285
  6.7 Incorporation of QoS Requirements ... 288
    6.7.1 Hard QoS Support ... 289
    6.7.2 Soft QoS Support ... 300
  6.8 Primal-Dual Algorithms ... 302
    6.8.1 Improving Efficiency by Primal-Dual Methods ... 304
    6.8.2 Generalized Lagrangian ... 311
    6.8.3 Primal-Dual Algorithms ... 319
    6.8.4 Decentralized Implementation ... 322
    6.8.5 Min-max Optimization Framework ... 326
    6.8.6 Simulation Results ... 342

Part IV Appendices

A Some Concepts and Results from Matrix Analysis ... 347
  A.1 Vectors and Vector Norms ... 347
  A.2 Matrices and Matrix Norms ... 349
  A.3 Square Matrices and Eigenvalues ... 351
    A.3.1 Matrix Spectrum, Spectral Radius and Neumann Series ... 353
    A.3.2 Orthogonal, Symmetric and Positive Semidefinite Matrices ... 355
  A.4 Perron–Frobenius Theory ... 357
    A.4.1 Perron–Frobenius Theorem for Irreducible Matrices ... 358
    A.4.2 Perron–Frobenius Theorem for Primitive Matrices ... 362
    A.4.3 Some Extensions to Reducible Matrices ... 363
    A.4.4 The Existence of a Positive Solution p to (αI − X)p = b ... 371
B Some Concepts and Results from Convex Analysis ... 377
  B.1 Sets and Functions ... 377
  B.2 Convex Sets and Functions ... 383
    B.2.1 Strong Convexity ... 384
    B.2.2 Majorization and Schur-Convexity ... 386
  B.3 Log-Convex Functions ... 386
    B.3.1 Inverse Functions of Monotonic Log-Convex Functions ... 388
  B.4 Basics of Optimization Theory ... 389
    B.4.1 Characterization of Numerical Convergence ... 390
    B.4.2 Convergence of Gradient Projection Algorithms ... 392
    B.4.3 Basics of Lagrangian Optimization Theory ... 395
    B.4.4 Saddle Points, Saddle Functions, Min-Max Functions ... 398
References ... 401

Index ... 411
List of Figures
1.1 The feasibility set F for some X ∈ X_{K,Γ}^p(Ω) with γ(x) = x, x > 0, K = 2 and Ω = Q^2 ... 38

1.2 G̃(V) and G(VV^T) for V given by (1.16) with K = 3. We see that neither G̃(V) is connected in the sense of Definition 1.66 nor G(V) is strongly connected in the traditional sense ... 50

1.3 G̃(V) and G(VV^T) for V given by (1.85) with K = 3. As in Fig. 1.2, neither G̃(V) is connected nor G(V) is strongly connected. In G̃(V), it is only possible to "go" from row node 3 to column node 2 and vice versa. In G(VV^T), node 3 is isolated ... 51

2.1 Illustration of Example 2.4: the feasibility set F(P_t; P_1, P_2) with X(ω) ≡ 0, γ(x) = e^x − 1, x > 0, and u(ω) = (e^{ω_1} − 1, e^{ω_2} − 1). The constraints P_1, P_2 and P_t are chosen to satisfy 0 < P_1, P_2 < P_t and P_t < P_1 + P_2 ... 65

2.2 The l_1-norm ‖p(ω(μ))‖_1 as a function of μ ∈ [0, 1] for some fixed ω̂, ω̌ ∈ Q_K chosen such that ‖p(ω̂)‖_1 and ‖p(ω̌)‖_1 are independent of the choice of γ ... 75

2.3 F(P_1, P_2) is equal to the intersection of F_1(P_1) and F_2(P_2). Thus, F^c(P_1, P_2) is equal to the union of F^c_1(P_1) and F^c_2(P_2), each of which is a convex set if γ(x) = x, x > 0. However, the union of these sets is not convex in general ... 77
4.1 There are N_t = 5 nodes represented by N_t = {1, 2, 3, 4, 5} and 10 wireless links: (1, 2), (2, 1), (2, 3), (3, 2), (2, 5), (5, 2), (3, 4), (4, 3), (4, 5), (5, 4). The wireless links are not numbered in the figure. Two flows entering the network at source nodes 1 and 3 (S = {1, 3}) and destined for node 5 establish 6 (logical) links K = {1, 2, 3, 4, 5, 6}. For instance, the (logical) links (or MAC layer flows) originating at node 2 are 2 and 3, so that we have K(2) = {2, 3}. These links share wireless link (2, 5). There are N = 4 origin nodes represented by N = {1, 2, 3, 4}. The flow rates are ν_1 and ν_2. Packets of flow 2 take two different routes to their destination, which is node 4 ... 88

4.2 The data rate per symbol against the SIR under a piecewise constant rate function Φ̃ and a logarithmic rate function given by (4.23) for some suitably chosen constants κ_1 > 0 and κ_2 > 0. The figure also shows a linear approximation and a piecewise linear approximation (dashed lines) ... 109

5.1 Three flows compete for access to two links [1, 2]. Whereas flows 1 and 2 are one-link flows going through links 1 and 2, respectively, flow 3 uses both links. The links have fixed capacities C_1 and C_2, respectively. Clearly, the maximum total throughput is C_1 + C_2 and, in the maximum, the longer flow must be shut off (ν_3 = 0) so that the one-link flows can be allocated rates of ν_1 = C_1 and ν_2 = C_2. In contrast, if C_1 ≤ C_2, the max-min fair rate allocation is ν_1 = C_1/2, ν_2 = C_2 − C_1/2 and ν_3 = C_1/2. Thus, the total throughput is C_2 + C_1/2, which is strictly smaller than C_1 + C_2. Note that if C_1 = C_2, then all source rates are equal under the max-min fair solution ... 121

5.2 Assuming Φ^{-1}(x) = e^x − 1, x ∈ R, the figure compares the modified utilities U(x) = Ψ(Φ^{-1}(x)), x > 0, with the traditional ones U(x) = Ψ(x), x > 0, for Ψ(x) = log(x), Ψ(x) = −1/x, x > 0, and Ψ(x) = log(x)/(1 + x), x > 0 ... 140

5.3 Throughput performance as a function of α ≥ 1 in a wireless network with the string topology depicted on the left-hand side. There are five nodes, four end-to-end flows, four wireless links and ten logical links (single-hop flows). The gain matrix V ≥ 0 with trace(V) = 0 was chosen randomly in such a way that I + V > 0 (a nonnegative primitive matrix). The transmit powers are given by (5.31) with Ψ(x) = Ψ_α(x), x > 0, w = 1 and z = 0 (noiseless channel). The right picture depicts end-to-end rates and the end-to-end total throughput. The rate function is Φ(x) = log(1 + x), x ≥ 0 ... 144
5.4 Any point in the feasible QoS region F_γ(P) can be expressed in terms of the rate vector as (U(ν_1), ..., U(ν_K)) for some unique ν = (ν_1, ..., ν_K) ∈ R, where R is the feasible rate region defined by (5.11); it may also be written as (Ψ(SIR_1), ..., Ψ(SIR_K)) for uniquely determined SIR levels SIR_k, k ∈ K. The function γ is the inverse function of Ψ, so that (γ(ω_1), ..., γ(ω_K)) for some ω ∈ F_γ(P) is a point in the feasible SIR region. The function Φ and its inverse Φ^{-1} relate the feasible SIR region and the feasible rate region. In this example, each function is strictly increasing and the sets are downward comprehensive ... 153

5.5 The feasible QoS regions F_γ(P) for three different choices of γ, P and V ≥ 0. Left: ω* is the (unique) maximum point of F_γ(P), as it maximizes x → w^T x over F_γ(P) for any choice of w > 0. Other boundary points are not maximal although they maximize the inner product for a nonnegative weight vector. Middle: every boundary point is maximal and maximizes the inner product for some positive weight vector. Right: F_γ(P) is not convex but has a maximal point ω̂. However, there exists no w ≥ 0 for which w^T ω̂ ≥ w^T x for all x ∈ F_γ(P) ... 157

5.6 The sets R_α with α = α_n, n = 1, ..., 4 and α_1 = 1, for a randomly chosen gain matrix V ≥ 0 with trace(V) = 0 and I + V > 0. R_∞ is a limiting set that is approached as α → ∞. See also Remark 5.36 ... 159

5.7 The feasible rate region for two mutually orthogonal links subject to a sum power constraint. The region is a strictly convex set, so that link scheduling between arbitrary points on the boundary of the feasible rate region is suboptimal ... 161

5.8 The feasible SIR region for two users under a total power constraint P_t and individual power constraints on each link, P_1 < P_t and P_2 < P_t. If there were no individual power constraints, a MAC policy involving a time sharing protocol between the points E and F, corresponding to power vectors (0, P_t) and (P_t, 0), respectively, would be optimal. In contrast, when individual power constraints are imposed in addition, a time sharing protocol between A and D (which correspond to power vectors (0, P_2) and (P_1, 0), respectively) is suboptimal. In this case, it is better to schedule either between A and B, or between B and C, or between C and D, depending on the target signal-to-interference ratios ... 166

5.9 The feasible power region for two links subject to individual power constraints P = {p ∈ R^2_+ : p_1 ≤ P_1, p_2 ≤ P_2}. The vector p(ω) defined by (5.54) is the minimum point (element) of P(ω) ... 172
5.10 The feasible SIR region (γ(x) = x, x > 0) under individual power constraints Pi and two different gain matrices V ≥ 0. The following notation is used: γ̄k = SIRk(p̄(ω)) and γ̿k = SIRk(p̿(ω)), where p̄(ω) and p̿(ω) are defined by (5.100) and (5.102), respectively. Left: V is chosen so that SIR2(p) = p2/z1, in which case p̄(ω) is not unique. Right: V is irreducible, in which case p̄(ω) is unique and equal to (5.102). . . . 194

5.11 An example of the feasible QoS region Fγ(P) defined by (5.105) with 2 users subject to individual power constraints. V is irreducible and Fγ(P) = ∩n∈N {ω : ρ(Γ(ω)Ṽ(n)) ≤ 1}, where Γ(ω) = diag(γ(ω1), γ(ω2)). The point ω̄ corresponds to the unique max-min SIR-balanced power allocation. The weight vector w is normal to a hyperplane which supports the feasible QoS region at ω̄ ∈ ∂Fγ(P). Note that N0 = {1}, and thus 2 ∉ N0 since the second constraint is not active at p̄. . . . 208

5.12 The SIR performance of five links as a function of α for some irreducible matrix V and Ψ(x) = log(x), x > 0. . . . 221

5.13 An illustration of power control policy (5.156). The depicted set is the intersection of a feasible QoS region Fγ(P), γ(x) = e^x, x ∈ R, with R2+. We have two links such that the first link is a best-effort link (B \ A = B = {1}) and the second link is a “pure” QoS link (A \ B = A = {2}). The dashed line represents the QoS requirement of the second link: all points above or on the line satisfy the QoS requirement; the corresponding SIR levels are larger than or equal to the SIR target of link 2. . . . 221

5.14 Exemplary networks with four entirely coupled subnetworks. The arrows model the interference between the links of different subnetworks (inter-subnetwork interference), with the arrow heads directed to the receivers where the interference is perceived. The interference within the subnetworks (intra-subnetwork interference) is not depicted and can be assumed arbitrary. . . . 240

5.15 Exemplary networks with four entirely coupled subnetworks. The arrows model the inter-subnetwork interference, with the arrow heads directed to the receivers where the interference is perceived. The intra-subnetwork interference is not depicted as it can be assumed arbitrary. . . . 241
5.16 The feasible log-SIR region Fγ (the half-plane). The region is convex but not strictly convex (see Definition 1.44). Hence, it is not possible to find a point on ∂Fγ where a hyperplane with the normal vector w(1) supports Fγ. In contrast, the hyperplane with the normal vector w(2) supports Fγ at every point on ∂Fγ. . . . 243

5.17 The feasible log-SIR region Fγ under self-interference. In this case, the region is strictly convex, and hence a convex combination of two arbitrary points on ∂Fγ := {ω : ρ(Γ(ω)V) = 1} is an interior point of Fγ (except for the points on the boundary). Furthermore, the maximum in (5.180) exists and ω∗ is a unique point on ∂Fγ where the hyperplane with normal vector w = (w1, . . . , wK) > 0 supports Fγ. . . . 244

6.1 In the primal network, the received signal samples at E1 and E2 are y1 = h1,1X1 + h1,2X2 and y2 = h2,2X2 + h2,1X1, respectively, where X1, X2 are zero-mean independent information-bearing symbols with E[|X1|2] = p1 and E[|X2|2] = p2. In the adjoint network, the roles change such that E1 and E2 transmit X1/|h1,1| and X2/|h2,2|, respectively. As a result, the received signal samples are ỹ1 = (h1,1/|h1,1|)X1 + (h2,1/|h2,2|)X2 and ỹ2 = (h2,2/|h2,2|)X2 + (h1,2/|h1,1|)X1, respectively. . . . 281

6.2 Exemplary convergence of the objective (6.62) obtained by Algorithm 6.3 with averaging of noisy estimates according to [3]. The variance of the estimates in steps 3, 7, 9 of Algorithm 6.3 is 0.3 · σk², k ∈ K. . . . 342

6.3 Exemplary convergence of the objective (6.62) obtained by Algorithm 6.3 with averaging of noisy estimates according to [3]. The variance of the estimates in steps 3, 7, 9 of Algorithm 6.3 is 0.15 · σk², k ∈ K. . . . 343

6.4 Comparison of the convergence of the algorithm (6.107) (solid lines) with that of the conventional gradient projection algorithm (dashed lines) with optimally chosen constant step size, applied to the equivalent problem form (6.61) with A = ∅. . . . 344

6.5 Convergence of Algorithm 6.4 with no averaging of iterates. The variance of the estimates in steps 3 and 7 of Algorithm 6.4 is 0.1 · σk² for the interference power and received power estimates, and 0.05 · σk² for the transmit power estimates, k ∈ K. . . . 344
List of Symbols

a ≫ b: “a significantly larger than b”
a ≪ b: “a significantly smaller than b”
a, b, c, α, β, μ, . . . : Scalars over R or C
A, B, X, Y, . . . : Matrices; Sect. A.2
A ≥ B: Partial ordering; Sect. A.2
A ≥ a, A > a: Partial ordering; Sect. A.2
A ⪰ 0: Positive semidefiniteness; Definition A.21
A ≻ 0: Positive definiteness; Definition A.21
A−1: Matrix inverse; Sect. A.3
AT: Transpose matrix; Definition A.5
AK(X): Eq. (1.8)
A × B: Cartesian product
A: Sect. 5.2.1
A ◦ B: Hadamard product; Sect. A.2
arg min f, arg inf f: Sect. B.1, Remark B.12
arg max f, arg sup f: Sect. B.1, Remark B.12
BK: Sect. 1.7.2
B̄K: Sect. 1.7.2
B: Sects. 4.3 and 5.2.1
Br(p): Definition B.1
C: Sect. A.1
cl(A): Closure
R: Eq. (5.11)
R̃: Eq. (5.16)
det(A): Matrix determinant; Sect. A.3
δl: The Kronecker delta
diag(u): Diagonal matrix; Sect. A.2
diag(X): Diagonal matrix; Sect. A.2
diag(X, Y): Eq. (A.7)
dom(f): Domain of a function; Sect. B.1
ei: Sect. A.1
EK(X): Sect. 1.7.2
E+K(X): Sect. 1.7.2
η(p): Eq. (6.30)
F: Eq. (1.61) and Eq. (2.5)
∂F: Eq. (1.63)
Fc: Eq. (1.68)
F(Pt): Eq. (2.9)
F(P1, . . . , PK): Eq. (2.11)
Fk(α): Eq. (2.12)
F(Pt; P1, . . . , PK): Eq. (2.13)
∂F(Pt): Definition 2.16
∂F(P1, . . . , PK): Definition 2.16
Fc(Pt): Sect. 2.4
Fγ: Eq. (5.55)
Fγ(P): Eq. (5.53)
∂Fγ(P): Eq. (5.59)
f′(x), x ∈ R: The first derivative; Sect. B.1
f″(x), x ∈ R: The second derivative; Sect. B.1
F(p): Eq. (6.2)
Fe(s): Eq. (6.12)
γ: Eqs. (1.56), (5.50) and Remark 5.39
γk: Sect. 5.5.1
Γ(ω): Eqs. (1.56) and (5.13)
gk(p): Eq. (6.21)
hk(s): Eq. (6.14)
I, In: Identity matrix; Sect. A.2
I: (Vector-valued) interference function; Sect. 5.5.2
Ik: Eq. (4.8), Sects. 4.3.2 and 5.5.2, Eq. (6.5)
inf f: Infimum of a function f; Theorem B.11
K = {1, . . . , K}: Sects. 1.1 and 4.1
K(n): Eq. (4.1)
Kk = K \ {k}: Sects. 1.1 and 4.1
ker(X): Matrix kernel; Sect. A.2
λp(ω): Sect. 1.3.1
LCK(Ω): Definition 1.37
lc(Ω): Sect. 2.3
L = {1, . . . , L}: Sect. 4.1
limx→a f(x): Definition B.9
lim supn→∞ fn(x): Eq. (B.4)
MK = RK×K++: Definition A.24
MK(Ω): Definition 1.35
N: Natural numbers
N0: Nonnegative integers
Nt = {1, . . . , Nt}: Sect. 4.1
N = {1, . . . , N}: Sect. 4.1
N0(p): Eq. (5.118)
N0: Eq. (5.119)
NK = RK×K+: Definition A.24
N+K: Eq. (1.88)
NK(Ω): Definition 1.35
NK,Γ(Ω): Eq. (2.8)
‖u‖: Vector norms; Sect. A.1
‖X‖: Matrix norms; Sect. A.2
∇k f(x): Definition B.13
∇f(x): Eq. (B.7)
∇x f(x, y): Remark B.18
∇² f(x): Definition B.16
∇²x f(x, y): Remark B.18
∇²x,y f(x, y): Remark B.18
νs: Sect. 4.1
νk(p): Sect. 4.3.4
1: Sect. A.1
Ω ⊂ RK: Eq. (1.53)
O(f, x0): Landau symbol; Eq. (B.1)
O(f): Landau symbol; Eq. (B.2)
o(f, x0): Landau symbol; Eq. (B.1)
o(f): Landau symbol; Eq. (B.2)
∂f: Partial derivative of f; Definition B.13
ΠK: Sect. 1.1
Π+K: Sect. 1.1
ω: Sect. 1.3.1
p(X): Eq. (1.2)
p(ω): Eqs. (2.4) and (5.54)
Φ: Eq. (4.22)
Ψ: (C.5-2)–(C.5-4) and Remark 5.8
Ψα: Eq. (5.28)
Ψ̃α: Eq. (5.29)
Ψe: (C.5-4)
ψ: (C.6-1)–(C.6-3), Eq. (5.48) and Remark 6.1
ψe: (C.6-3)
ΠS: Eq. (B.27)
P ⊂ RK+: Eq. (4.18)
Pn: Eq. (4.18)
P+ ⊂ RK++: Eq. (5.26) and Eq. (6.10)
P(ω) ⊂ RK+: Definition 5.40
P◦(ω) ⊂ RK+: Definition 5.40
P◦+(ω) ⊂ RK++: Eq. (5.145)
q(X): Eq. (1.2)
Q ⊂ R: Sect. 1.3.1
R: Real numbers
R− ⊂ R: Sect. A.1
R+ ⊂ R: Sect. A.1
R++ ⊂ R+: Sect. A.1
RK: Sect. A.1
RK×K: Sect. A.2
RK++(Ω): Sect. 2.1
R(X): Range/column space of a matrix; Sect. A.2
rank(X): Matrix rank; Sect. A.2
Re{x}: The real part of a complex number x ∈ C
σ(A): Matrix spectrum; Definition A.10
sgn: The signum function; Eq. (4.12)
ρ(X): Spectral radius; Definition A.10
SK: Sect. 1.2
SK(X): Eq. (1.3)
SIRk(p): Eq. (4.4)
S ⊂ RK: Eq. (6.11)
S: Sect. 4.1
sup f: Supremum of a function f; Theorem B.11
trace(X): Matrix trace; Sect. A.2
θ(p): Eq. (6.30)
u ≤ v: Partial ordering; Sect. A.1
(u, v): Sect. A.1
u + c = u + c1: Sect. A.1
p, q, s, u, v, z, . . . : Vectors; Sect. A.1
u ≥ v: Partial ordering; Sect. A.1
u ≥ a, u > a: Partial ordering; Sect. A.1
V: Eq. (4.6)
w ∈ RK+: Weight vector; Sect. 5.2.5
WK(X): Eq. (1.14)
XK ⊂ NK: Definition A.27
XK(Ω): Definition 1.35
XK,Γ(Ω): Eq. (1.57)
XsK,Γ(Ω): Sect. 1.4.1
XpK,Γ(Ω): Sect. 1.4.2
X0K,Γ(Ω): Sect. 1.5
0: Zero vector or matrix; Sects. A.1 and A.2
z ∈ RK+: Noise vector in Part II; Eq. (4.7)
1 On the Perron Root of Irreducible Matrices
This chapter deals with the Perron root of nonnegative irreducible matrices. Applications abound with nonnegative and positive matrices so that it is natural to investigate their properties. In doing so, one of the central problems is to what extent the nonnegativity (positivity) is inherited by the eigenvalues and eigenvectors. The principal tools for the analysis of spectral properties of irreducible matrices are provided by Perron–Frobenius theory. A comprehensive reference on nonnegative matrices is [4]. Some basic results are summarized in App. A.4. For more information about the Perron–Frobenius theory, the reader is also referred to [5, 6, 7]. We have divided the chapter into two major parts. The purpose of the first part is to characterize the Perron root of irreducible matrices and present some interesting bounds on it. There exists a vast literature addressing the problem of estimating the Perron root of nonnegative irreducible matrices. Tight bounds on the Perron root have attracted a great deal of attention over several decades. A brief (and by no means extensive) summary of some related results can be found at the end of this chapter. In the second part, we consider the Perron root of matrix-valued functions of some parameter vector. In this case, each matrix entry is a continuous nonnegative function defined on some convex parameter set, with the constraint that the matrix is irreducible for every fixed parameter vector. As a result, the Perron root can be viewed as a positive real-valued function defined on a convex set. Now the objective is to provide conditions under which the Perron root is a convex (or concave) function of the parameter vector. Note that the convexity property is a key ingredient in the development of access control and resource allocation strategies for wireless networks.
1.1 Some Basic Definitions

We use XK ⊂ RK×K, K ≥ 2, to denote the set of all K × K nonnegative irreducible matrices. Let ρ(X) be the Perron root of X ∈ XK. By the Perron–Frobenius theorem (Theorem A.32 in App. A.4.1), ρ(X) is a simple eigenvalue of X and is equal to its spectral radius, so that

ρ(X) = max_{λ∈σ(X)} |λ| and ρ(X) ∈ σ(X)   (1.1)

where σ(X) denotes the spectrum of X (Definition A.10). Due to (1.1), if X ∈ XK, ρ(X) is used to denote both the Perron root of X and its spectral radius. Moreover, if

Xp = ρ(X)p,  XTq = ρ(X)q   (1.2)

holds for some q, p ∈ RK+, then both q := q(X) and p := p(X) are positive vectors, and there are no other nonnegative eigenvectors of X except for positive multiples of q and p, regardless of the eigenvalue. Unless something else is stated, assume that qTp = 1 or, equivalently, ‖w‖1 = 1 where w = q ◦ p. For readability purposes, let ΠK denote the standard simplex in RK+, i.e. we have

ΠK := {x ∈ RK+ : ‖x‖1 = 1}.

Furthermore, we define Π+K := ΠK ∩ RK++, which contains all positive vectors whose elements sum up to 1. Hence, w = q ◦ p ∈ Π+K for any X ∈ XK.
Definition 1.1 (Perron Eigenvectors). Let q, p ∈ RK ++ be positive left and right eigenvectors of X ∈ XK . If additionally q, p ∈ Π+ K , then the unique eigenvectors q and p are called the left and right Perron eigenvectors of X ∈ XK , respectively (see also Definition A.33). Throughout this chapter, we use q and p to designate positive left and right eigenvectors of X ∈ XK , respectively. In cases where ambiguity may occur, we write q(X) and p(X) to denote these eigenvectors of X. For clarity and simplicity, we also use the following notation K := {1, . . . , K} and Kk := K \ {k}. Caution: In the second part of the book, this notation is not used. In particular, p will not denote any positive right eigenvector.
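The objects just introduced are easy to experiment with numerically. The following sketch (ours, not part of the book; all function names are hypothetical) approximates ρ(X) and the left and right Perron eigenvectors by plain power iteration, and normalizes so that qTp = 1 and w = q ◦ p ∈ Π+K.

```python
# Sketch (not from the book): approximate the Perron root rho(X) and the
# Perron eigenvectors q, p of a nonnegative irreducible matrix by power
# iteration.  Assumes X is aperiodic (e.g. strictly positive) so the
# iteration converges.

def mat_vec(X, s):
    return [sum(x * v for x, v in zip(row, s)) for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def perron(X, iters=2000):
    K = len(X)
    p = [1.0] * K
    q = [1.0] * K
    XT = transpose(X)
    for _ in range(iters):
        p = mat_vec(X, p)
        n = sum(p)
        p = [v / n for v in p]          # right eigenvector, normalized to sum 1
        q = mat_vec(XT, q)
        n = sum(q)
        q = [v / n for v in q]          # left eigenvector
    rho = sum(mat_vec(X, p)) / sum(p)   # X p ~ rho p, so this ratio recovers rho
    c = sum(qi * pi for qi, pi in zip(q, p))
    q = [qi / c for qi in q]            # rescale so that q^T p = 1
    w = [qi * pi for qi, pi in zip(q, p)]   # w = q o p, an element of Pi_K^+
    return rho, q, p, w

X = [[0.2, 1.0, 0.5],
     [0.3, 0.1, 0.8],
     [0.6, 0.4, 0.2]]
rho, q, p, w = perron(X)
print(rho > 0, round(sum(w), 9))
```

By construction, w sums to 1 and is positive, i.e. w ∈ Π+K, in agreement with the text above.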
1.2 Some Bounds on the Perron Root and their Applications This section presents several bounds on the Perron root of irreducible matrices. Some of these results provide a starting point for the development of the theory presented in the subsequent sections of this book, while others establish interesting connections, thereby helping to better understand the complex interrelations in practical systems. Let SK ⊂ NK be the set of all stochastic matrices of size K ×K (Definition A.26). Therefore, each row of A ∈ SK , say row k denoted by a(k) ∈ RK +,
satisfies a(k) ∈ ΠK. For every fixed X ∈ XK, we define an associated subset SK(X) of SK as

SK(X) = {A ∈ SK : (A)k,l = ak,l = 0 if and only if xk,l = 0}.   (1.3)

Note that since X is irreducible, every member of SK(X) is an irreducible stochastic matrix. Hence, we have SK(X) ⊂ XK for any X ∈ XK. Although not explicitly stated in this form, the following Perron root characterization can be deduced from [8, Equation 2.6 with 2.8 and 2.9] (see also the bibliographical notes at the end of this chapter). In this theorem and throughout the book, we use the following convention.

(C.1-1) x log(x/0) ≡ ∞ and x log(0/x) ≡ −∞ for all x > 0, and 0 log(0/x) ≡ 0 for all x ≥ 0. The latter identity includes the case 0 log(0/0) ≡ 0.

Theorem 1.2. Let X ∈ XK be arbitrary. Then, we have

∑k,l∈K uk ak,l log(xk,l/ak,l) ≤ log ρ(X)   (1.4)

for all A ∈ SK(X), where u = (u1, . . . , uK) ∈ Π+K is the left Perron eigenvector of A. Equality holds in (1.4) if and only if

(A)k,l = ak,l = xk,l pl / (ρ(X) pk),  1 ≤ k, l ≤ K   (1.5)
where p ∈ RK++ is a positive right eigenvector of X.

Proof. Since ρ(X/ρ(X)) = ρ(X)/ρ(X) = 1, we can assume that ρ(X) = 1. Let A ∈ SK(X) be fixed and define

f(e) = ∑k,l∈K uk ak,l log(xk,l el / (ak,l ek))   (1.6)

for an arbitrary e ∈ RK++. Note that f(1) is equal to the left-hand side of (1.4). Moreover, f(e) is independent of the choice of e since

∑k,l∈K uk ak,l log(el/ek) = ∑k,l∈K uk ak,l log(el) − ∑k,l∈K uk ak,l log(ek)
= ∑l∈K log(el) ∑k∈K uk ak,l − ∑k∈K uk log(ek) ∑l∈K ak,l
= ∑l∈K ul log(el) − ∑k∈K uk log(ek) = 0

where we used the fact that A is stochastic and ATu = u. Thus, without loss of generality, we can substitute any positive vector into (1.6). In particular, we can substitute p (a positive right eigenvector of X) into (1.6) and confine our attention to matrices of the form

(X̃)k,l = x̃k,l = xk,l pl / (ρ(X) pk) = xk,l pl / pk,  1 ≤ k, l ≤ K.

Now as X̃ is stochastic and log(x) ≤ x − 1 for all x > 0 with equality if and only if x = 1, we obtain

∑l∈K ak,l log(x̃k,l/ak,l) ≤ ∑l∈K ak,l (x̃k,l/ak,l − 1) = ∑l∈K x̃k,l − ∑l∈K ak,l = 1 − 1 = 0

for each 1 ≤ k ≤ K, with equality if and only if A = X̃. So, summing over k with weights uk,

∑k∈K uk ∑l∈K ak,l log(xk,l/ak,l) ≤ 0 = log ρ(X)

with equality attained if and only if A is given by (1.5).

The following corollary is immediate.

Corollary 1.3. Let X ∈ XK be arbitrary and fixed. Then,

log ρ(X) = sup_{A∈SK(X)} ∑k,l∈K uk ak,l log(xk,l/ak,l)   (1.7)

where u = (u1, . . . , uK) ∈ Π+K is the left Perron eigenvector of A.

The Perron root characterization in (1.7) turns out to be of great value in proving the central result of the second part of this chapter, namely a sufficient condition for convexity of the Perron root. Moreover, this characterization provides some interesting insights into the properties of the Perron root, so it is worth dwelling on it for a moment. An interesting problem is the exact relationship between irreducible matrices that have the same maximizers in (1.7). To be precise, let X ∈ XK be arbitrary and suppose that the supremum in (1.7) is attained at A(X) ∈ SK(X). We define

AK(X) := { Y ∈ XK : A(X) = arg sup_{A∈SK(Y)} ∑k,l∈K uk ak,l log(yk,l/ak,l) }.   (1.8)

Hence, by Theorem 1.2, AK(X) contains all irreducible matrices for which the supremum in (1.7) is attained at A(X). The following observation characterizes this set.

Observation 1.4. Given an arbitrary X ∈ XK, we have Y ∈ AK(X) if and only if there exists a diagonal matrix D with positive diagonal entries such that

Y = (ρ(Y)/ρ(X)) D X D−1.   (1.9)
Proof. Let A(Y) ∈ SK(Y) be any matrix such that

A(Y) = arg sup_{A∈SK(Y)} ∑k,l∈K uk ak,l log(yk,l/ak,l).

It follows from (1.9) that p(Y) = Dp(X). Furthermore, considering Theorem 1.2 and (1.9), we obtain

(A(Y))k,l = yk,l (Dp)l / (ρ(Y)(Dp)k) = (ρ(Y)/ρ(X)) (dk/dl) xk,l · dl pl / (ρ(Y) dk pk) = xk,l pl / (ρ(X) pk) = (A(X))k,l

where p = p(X). Therefore, AK(Y) = AK(X), and the observation follows.

Interestingly, Theorem 1.2 gives rise to a well-known bound on the Perron root of the Hadamard product of two irreducible matrices¹ [9, Observation 5.7.4]. Note that if X ◦ Y ∈ XK (entry-wise multiplication), then X ∈ XK and Y ∈ XK.

Corollary 1.5. Let X ◦ Y ∈ XK be arbitrary. Then,

ρ(X ◦ Y) ≤ ρ(X) · ρ(Y).   (1.10)
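The bound (1.10) can be probed numerically before working through the proof. The following sketch (ours, not from the book) draws two random strictly positive, hence irreducible, matrices and compares the spectral radii, which are approximated by power iteration.

```python
# Numerical check of (1.10), rho(X o Y) <= rho(X) * rho(Y).  A sketch, not
# from the book; spectral radii are approximated by power iteration, which
# converges for strictly positive matrices.
import random

def spectral_radius(X, iters=2000):
    s = [1.0] * len(X)
    for _ in range(iters):
        s = [sum(x * v for x, v in zip(row, s)) for row in X]
        n = sum(s)
        s = [v / n for v in s]
    Xs = [sum(x * v for x, v in zip(row, s)) for row in X]
    return sum(Xs) / sum(s)

random.seed(1)
K = 4
X = [[random.uniform(0.1, 1.0) for _ in range(K)] for _ in range(K)]
Y = [[random.uniform(0.1, 1.0) for _ in range(K)] for _ in range(K)]
H = [[X[k][l] * Y[k][l] for l in range(K)] for k in range(K)]  # Hadamard product X o Y

print(spectral_radius(H) <= spectral_radius(X) * spectral_radius(Y))
```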
Proof. For every 1 ≤ k, l ≤ K, let X̃ and Ỹ be given by

x̃k,l = xk,l if xk,l yk,l > 0 and x̃k,l = 0 if xk,l yk,l = 0,
ỹk,l = yk,l if xk,l yk,l > 0 and ỹk,l = 0 if xk,l yk,l = 0,

and note that SK(X ◦ Y) = SK(X̃) = SK(Ỹ). Hence, by Corollary 1.3,

log ρ(X ◦ Y) = sup_{A∈SK(X◦Y)} ∑k,l∈K uk ak,l log(xk,l yk,l / ak,l)
(a) ≤ sup_{A∈SK(X◦Y)} ∑k,l∈K uk ak,l log(xk,l yk,l / ak,l²)
(b) = sup_{A∈SK(X◦Y)} [ ∑k,l∈K uk ak,l log(xk,l/ak,l) + ∑k,l∈K uk ak,l log(yk,l/ak,l) ]
(c) ≤ sup_{A∈SK(X◦Y)} ∑k,l∈K uk ak,l log(xk,l/ak,l) + sup_{A∈SK(X◦Y)} ∑k,l∈K uk ak,l log(yk,l/ak,l)
= sup_{A∈SK(X̃)} ∑k,l∈K uk ak,l log(xk,l/ak,l) + sup_{A∈SK(Ỹ)} ∑k,l∈K uk ak,l log(yk,l/ak,l)
(d) ≤ log ρ(X) + log ρ(Y)

¹ In fact, as shown in [9], the bound holds for any nonnegative matrices X, Y. Here, we assume the irreducibility property because we are going to apply Theorem 1.2.
where (a) is due to the fact ak,l² ≤ ak,l ≤ 1 for any A ∈ SK(X ◦ Y), (b) follows from log(xy) = log(x) + log(y) for all x, y > 0, (c) holds since sup(f + g) ≤ sup f + sup g for any functions f, g, and (d) follows from Corollary 1.3 and the fact that SK(X̃) ⊆ SK(X) and SK(Ỹ) ⊆ SK(Y).

Remark 1.6. It is interesting to point out that a necessary condition for equality to hold in (1.10) is that there exists a diagonal matrix D with positive diagonal elements such that X = (ρ(X)/ρ(Y)) D Y D−1. This is because equality in (c) can hold only if Y ∈ AK(X). By Observation 1.4, however, this is true if and only if there exists a diagonal matrix D with positive diagonal elements such that X = (ρ(X)/ρ(Y)) D Y D−1.

The next result provides an upper bound on the logarithm of ρ(X), thereby giving rise to another type of Perron root characterization. In Sect. 1.2.4, using different techniques, we generalize this result by considering F(ρ(X)) where F : R++ → R pertains to some class of continuous functions, of which the logarithmic function is a special case.

Theorem 1.7. Let X ∈ XK, and let w := w(X) = (w1, . . . , wK) = p ◦ q ∈ Π+K, where q and p are positive left and right eigenvectors of X, respectively. Then, for all s ∈ RK++,

log ρ(X) ≤ ∑k∈K wk log((Xs)k/sk),   (1.11)

with equality if s = p.

Proof. Let s ∈ RK++ be arbitrary and let ŝk = sk/pk, 1 ≤ k ≤ K, which is well-defined since pk > 0 for each 1 ≤ k ≤ K. Since X is irreducible, Xs > 0 for any s > 0 (see App. A.4). Hence, the right-hand side of (1.11) is well-defined. We have

∑k∈K wk log(ŝk) = ∑k∈K pk qk log(ŝk) = ∑k∈K pk ∑l∈K (xl,k ql/ρ(X)) log(ŝk) = ∑l∈K pl ql ∑k∈K (xl,k pk/(ρ(X)pl)) log(ŝk).

Since ∑k∈K xl,k pk/(ρ(X)pl) = 1 for each 1 ≤ l ≤ K, we can apply Jensen’s inequality (see for instance, [10, p. 62] and [11, p. 76]) to obtain

∑k∈K (xl,k pk/(ρ(X)pl)) log(ŝk) ≤ log( ∑k∈K xl,k pk ŝk/(ρ(X)pl) ) = log((Xs)l/pl) − log ρ(X)   (1.12)

for every 1 ≤ l ≤ K, with equality if s = p. Now combining this with the previous equality and the fact that ‖w‖1 = 1 yields

log ρ(X) ≤ ∑l∈K wl log((Xs)l/pl) − ∑l∈K wl log(ŝl) = ∑l∈K wl log((Xs)l/sl)

for all s > 0, with equality if s = p.

Remark 1.8. In general, s = p is not necessary for the equality in (1.11) to hold. For instance, if all rows of X ∈ XK have only one positive entry, then there is actually no sum in (1.12) for each 1 ≤ l ≤ K. Therefore, in such cases, the right-hand side of (1.12) is identically equal to the left-hand side, regardless of the choice of s. An example of a K × K irreducible matrix with only one positive entry in each row is the circulant matrix with the first row given by (0, 0, . . . , 0, α) for some constant α > 0 (see (1.16)).

As an immediate consequence of Theorem 1.7, one obtains:

Corollary 1.9. Let X ∈ XK, p ∈ RK++ and w ∈ Π+K be as in Theorem 1.7. Then,

log ρ(X) = inf_{s∈RK++} ∑k∈K wk log((Xs)k/sk).   (1.13)
The infimum is attained if s = p.

Recall that Theorem 1.2 has been used to prove an upper bound on ρ(X ◦ Y) (Corollary 1.5). Interestingly, if we replace the entry-wise product (or the Hadamard product) by ordinary matrix multiplication, ρ(XY) can be arbitrarily large on XK. So even if X ∈ XK and Y ∈ XK are fixed and known, not much can be said about ρ(XY). However, if instead of Theorem 1.2 we consider Theorem 1.7, it is possible to derive a lower bound for ρ(XY) on the following set of irreducible matrices generated by arbitrary X ∈ XK:

WK(X) := {Y ∈ XK : q(Y) ◦ p(Y) = q(X) ◦ p(X) ∈ Π+K} ⊂ XK.   (1.14)

In words, given any X ∈ XK, WK(X) is a set of those irreducible matrices Y such that q(Y) ◦ p(Y) = q(X) ◦ p(X) = w(X). Note that for any X ∈ XK, there holds X, XT ∈ WK(X). So, if X is not symmetric, the cardinality of WK(X) is at least 2.

Corollary 1.10. Suppose that X ∈ XK is given, and let Y ∈ WK(X) be arbitrary but chosen such that XY ∈ XK. Then,

ρ(X)ρ(Y) ≤ ρ(XY),   (1.15)

with equality if p(Y) = p(X).

Proof. Let w = q(X) ◦ p(X) = q(Y) ◦ p(Y). By assumption, X, Y, XY ∈ XK, from which we have ρ(X), ρ(Y), ρ(XY) > 0. Therefore, (1.15) is true if and only if

log ρ(X) + log ρ(Y) ≤ log ρ(XY).

Now by Theorem 1.7,

∑k∈K wk log((XYs)k/sk) = ∑k∈K wk log( ((XYs)k/(Ys)k) · ((Ys)k/sk) )
= ∑k∈K wk log((XYs)k/(Ys)k) + ∑k∈K wk log((Ys)k/sk)
≥ log ρ(X) + log ρ(Y)

for all s ∈ RK++. So, by the Collatz–Wielandt formula (Theorem A.35),

log ρ(X) + log ρ(Y) ≤ min_{s∈RK++} ∑k∈K wk log((XYs)k/sk) ≤ min_{s∈RK++} max_{1≤k≤K} log((XYs)k/sk)
= log( min_{s∈RK++} max_{1≤k≤K} (XYs)k/sk ) = log ρ(XY).

This proves the bound. Considering Theorem 1.7, we see that there is equality in (1.15) if p(X) = p(Y), and the corollary follows.

As a consequence of the corollary and the remark before, one has ρ(X)ρ(XT) ≤ ρ(XXT) with equality if X is symmetric. Note that we have made the assumption XY ∈ XK since the set of irreducible matrices is not closed under matrix multiplication. For instance, the circulant matrix

T := ⎡ 0 0 · · · 0 1 ⎤
     ⎢ 1 0 · · · 0 0 ⎥
     ⎢ 0 1 · · · 0 0 ⎥
     ⎢ ⋮ ⋮  ⋱  ⋮ ⋮ ⎥
     ⎣ 0 0 · · · 1 0 ⎦ ∈ RK×K+   (1.16)

and TK−1 are both irreducible. However, their product T TK−1 = I is not irreducible.

The last result in this section is a simple application of Theorem 1.7. Again, the result gives rise to a Perron root characterization that is a special case of the Perron root characterizations presented in Sect. 1.2.4.

Theorem 1.11. Let X ∈ XK, p ∈ RK++, and w ∈ Π+K be as in Theorem 1.7. Then,

ρ(X) ≤ ∑k∈K wk (Xs)k/sk   (1.17)

for all s ∈ RK++, with equality if and only if s = p.
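Before turning to the proof, the bound (1.17) is easy to check numerically. The sketch below (ours, not from the book) evaluates the right-hand side for a random positive s and at s = p, where it should coincide with ρ(X).

```python
# Numerical check of (1.17) (a sketch, not from the book): for random
# positive s, rho(X) <= sum_k w_k (Xs)_k / s_k, with equality at s = p.
import random

def mat_vec(X, s):
    return [sum(x * v for x, v in zip(row, s)) for row in X]

def perron(X, iters=2000):
    # power iteration for p, q and rho; assumes X strictly positive
    K = len(X)
    p, q = [1.0] * K, [1.0] * K
    XT = [list(col) for col in zip(*X)]
    for _ in range(iters):
        p = mat_vec(X, p)
        p = [v / sum(p) for v in p]
        q = mat_vec(XT, q)
        q = [v / sum(q) for v in q]
    rho = sum(mat_vec(X, p)) / sum(p)
    c = sum(a * b for a, b in zip(q, p))
    w = [a * b / c for a, b in zip(q, p)]   # w = q o p with q^T p = 1
    return rho, p, w

def bound(X, w, s):
    Xs = mat_vec(X, s)
    return sum(wk * xk / sk for wk, xk, sk in zip(w, Xs, s))

random.seed(7)
X = [[random.uniform(0.1, 1.0) for _ in range(4)] for _ in range(4)]
rho, p, w = perron(X)
s = [random.uniform(0.1, 1.0) for _ in range(4)]
print(rho <= bound(X, w, s) + 1e-9, abs(bound(X, w, p) - rho) < 1e-6)
```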
Proof. By Theorem 1.7 and ‖w‖1 = 1, we have

0 ≤ ∑k∈K wk log((Xs)k/sk) − log ρ(X) = ∑k∈K wk log((Xs)k/(ρ(X)sk)).

Now since log(x) ≤ x − 1 for all x > 0 with equality if and only if x = 1, this implies that

0 ≤ ∑k∈K wk ((Xs)k/(ρ(X)sk) − 1) = ∑k∈K wk (Xs)k/(ρ(X)sk) − ∑k∈K wk = (1/ρ(X)) ∑k∈K wk (Xs)k/sk − 1   (1.18)

from which the bound in (1.17) follows. Moreover, equality holds if and only if s = p.

The following corollary is immediate.

Corollary 1.12. Let X ∈ XK, p ∈ RK++, and w ∈ Π+K be as in Theorem 1.7. Then,

ρ(X) = inf_{s∈RK++} ∑k∈K wk (Xs)k/sk.   (1.19)

The infimum is attained if and only if s = p.

We point out that s = p is necessary and sufficient for the equality in (1.17) to hold for all irreducible matrices, including the circulant matrix in (1.16). This is because log(x) ≤ x − 1 for x > 0 with equality if and only if x = 1; therefore, there is equality in (1.18) if and only if s = p. This stands in some contrast to Theorem 1.7.

1.2.1 Concavity of the Perron Root on Some Subsets of Irreducible Matrices

In this section, we apply the above results to obtain bounds on the Perron root of

X(μ) := (1 − μ)X̂ + μX̌,  μ ∈ [0, 1]

where X̂ ∈ XK and X̌ ∈ XK are given. Note that X(μ) ∈ XK for all 0 ≤ μ ≤ 1. In particular, the results show that the Perron root is concave on some subsets of XK. Given an arbitrary X ∈ XK, we are particularly interested in WK(X) ⊂ XK defined by (1.14). The first theorem is an application of Theorem 1.7 and asserts that the Perron root is log-concave on WK(X) (for the definition of log-concavity, see App. B.3).
Theorem 1.13. Suppose that X ∈ XK is given and X̂, X̌ ∈ WK(X) are arbitrary. Then,

ρ(X(μ)) ≥ ρ(X̂)^(1−μ) ρ(X̌)^μ   (1.20)

for all μ ∈ (0, 1).

Proof. Assume that μ ∈ (0, 1) is fixed, and let p̃ be a positive right eigenvector of X(μ). Furthermore, let w = q(X̂) ◦ p(X̂) = q(X̌) ◦ p(X̌). Since ‖w‖1 = 1 and ρ(X(μ)) = (X(μ)p̃)k/p̃k for each 1 ≤ k ≤ K, one has

log ρ(X(μ)) = ∑k∈K wk log ρ(X(μ)) = ∑k∈K wk log((X(μ)p̃)k/p̃k)
= ∑k∈K wk log( ((1 − μ)(X̂p̃)k + μ(X̌p̃)k)/p̃k ),  μ ∈ (0, 1).

Hence, by concavity of the logarithmic function,

log ρ(X(μ)) ≥ (1 − μ) ∑k∈K wk log((X̂p̃)k/p̃k) + μ ∑k∈K wk log((X̌p̃)k/p̃k)
≥ (1 − μ) log ρ(X̂) + μ log ρ(X̌),

where the last step follows from Theorem 1.7.

Interestingly, we can obtain a significantly stronger assertion if, instead of Theorem 1.7, we consider Theorem 1.11.
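Since X, XT ∈ WK(X) for any X ∈ XK, the pair X̂ = X, X̌ = XT provides a convenient test case for (1.20); because ρ(XT) = ρ(X), the right-hand side then reduces to ρ(X). The following sketch (ours, not from the book) checks this numerically along the segment.

```python
# Numerical illustration of (1.20) (a sketch, not from the book) with
# Xhat = X and Xcheck = X^T, both in W_K(X):
#   rho((1-mu) X + mu X^T) >= rho(X)^(1-mu) * rho(X^T)^mu = rho(X).
import random

def spectral_radius(X, iters=2000):
    s = [1.0] * len(X)
    for _ in range(iters):
        s = [sum(x * v for x, v in zip(row, s)) for row in X]
        s = [v / sum(s) for v in s]
    Xs = [sum(x * v for x, v in zip(row, s)) for row in X]
    return sum(Xs) / sum(s)

random.seed(3)
K = 4
X = [[random.uniform(0.1, 1.0) for _ in range(K)] for _ in range(K)]
XT = [list(col) for col in zip(*X)]
rho = spectral_radius(X)  # equals rho(X^T)

ok = all(
    spectral_radius([[(1 - mu) * X[k][l] + mu * XT[k][l] for l in range(K)]
                     for k in range(K)]) >= rho - 1e-9
    for mu in (0.25, 0.5, 0.75)
)
print(ok)
```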
μ ∈ (0, 1) .
(1.21)
ˆ = αˇ ˇ Moreover, strict inequality holds if there is no α > 0 such that p(X) p(X). ˇ ˆ ◦ p(X) ˆ = q(X) ˇ ◦ p(X). Proof. Let μ ∈ (0, 1) be arbitrary, and let w = q(X) ˜ is a positive right eigenvector of X(μ). Proceeding essentially Suppose that p as above yields ρ(X(μ)) =
wk ρ(X(μ)) =
k∈K
= (1 − μ)
k∈K
k∈K
wk
wk
(X(μ)˜ p)k p˜k
ˆ p)k ˇ p)k (X˜ (X˜ +μ wk . p˜k p˜k k∈K
Now considering Theorem 1.11 proves the bound. To prove strict concavity, note that by Theorem 1.11, ˆ = ρ(X)
k∈K
wk
ˆ p)k (X˜ p˜k
and
ˇ = ρ(X)
k∈K
wk
ˇ p)k (X˜ p˜k
1.2 Some Bounds on the Perron Root and their Applications
13
ˆ and p ˇ for some α1 , α2 > 0 or, equiva˜ = α1 p(X) ˜ = α2 p(X) if and only if p ˆ = αp(X). ˇ Hence, lently, if and only if there exists α > 0 such that p(X) (1 − μ)
k∈K
wk
ˆ p)k ˇ p)k (X˜ (X˜ ˆ + μρ(X), ˇ +μ wk > (1 − μ)ρ(X) p˜k p˜k
μ ∈ (0, 1) .
k∈K
ˆ = αp(X). ˇ if there is no α > 0 such that p(X) For any X ∈ XK , the theorem shows that the Perron root is concave on WK (X). It is interesting to notice that W2 (X) = X2 for any X ∈ X2 such that trace(X). Consequently, Theorem 1.14 implies that if K = 2, the Perron root is concave on the set of traceless irreducible matrices. ˆ ˇ ∈ AK (X) Furthermore, an examination of Observation 1.4 reveals that X ˆ for some given X if and only if ˇ ˇ = ρ(X) DXD ˆ −1 . X ˆ ρ(X) ˆ and p(X) ˇ = Dp(X), ˆ we see that X ˇ ∈ AK (X) ˆ ˇ = D−1 q(X) Now since q(X) implies ˆ ◦ Dp(X) ˆ = w(X) ˆ . ˇ = (D−1 q(X)) w(X) Hence, we can conclude that AK (X) ⊆ WK (X)
(1.22)
for any X ∈ XK. This gives rise to the following corollary.

Corollary 1.15. Suppose that X ∈ XK is given, and let X̂, X̌ ∈ AK(X). Then,

ρ(X(μ)) ≥ ρ(X̂)^(1−μ) ρ(X̌)^μ  and  ρ(X(μ)) ≥ (1 − μ)ρ(X̂) + μρ(X̌)

for all μ ∈ (0, 1).

Proof. Combine (1.22) with Theorems 1.13 and 1.14.

Another consequence of Observation 1.4, the relationship (1.22) and Theorem 1.14 is the following.

Corollary 1.16. Let X ∈ XK, and let D ≠ I be an arbitrary diagonal matrix with positive diagonal entries. Then,

ρ((1 − μ)X + μDXD−1) ≥ (1 − μ)ρ(X) + μρ(DXD−1) = ρ(X)   (1.23)

for all μ ∈ (0, 1).

Finally, as XT ∈ WK(X) for any X ∈ XK, it follows from Theorem 1.14 that ρ((1 − μ)X + μXT) is a concave function of μ ∈ [0, 1]. If, in addition, X ≠ XT, we have p(X) ≠ αp(XT) for any constant α. This leads to the following corollary.

Corollary 1.17. Let X ∈ XK with X ≠ XT be given. Then, ρ((1 − μ)X + μXT) is a strictly concave function of μ ∈ [0, 1].
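Corollary 1.17 can be illustrated numerically: for a randomly drawn nonsymmetric X, the function f(μ) = ρ((1 − μ)X + μXT) should satisfy a strict midpoint concavity inequality. The sketch below is ours, not from the book.

```python
# Numerical sketch (not from the book) of Corollary 1.17: for a nonsymmetric
# X, f(mu) = rho((1-mu) X + mu X^T) is strictly concave on [0, 1]; we check
# a strict midpoint inequality.
import random

def spectral_radius(X, iters=3000):
    s = [1.0] * len(X)
    for _ in range(iters):
        s = [sum(x * v for x, v in zip(row, s)) for row in X]
        s = [v / sum(s) for v in s]
    Xs = [sum(x * v for x, v in zip(row, s)) for row in X]
    return sum(Xs) / sum(s)

random.seed(5)
K = 3
X = [[random.uniform(0.1, 1.0) for _ in range(K)] for _ in range(K)]  # generically X != X^T
XT = [list(col) for col in zip(*X)]

def f(mu):
    M = [[(1 - mu) * X[k][l] + mu * XT[k][l] for l in range(K)] for k in range(K)]
    return spectral_radius(M)

print(f(0.5) > 0.5 * (f(0.25) + f(0.75)))  # strict midpoint concavity
```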
14
1 On the Perron Root of Irreducible Matrices
1.2.2 Kullback–Leibler Divergence Characterization The Kullback–Leibler divergence (KLD) between two probability mass functions is one of the fundamental concepts in information theory and also in other fields like statistics and physics [12, 13, 14]. In fact, the mutual information of two random variables is equal to the KLD between their joint distributions and product distributions [13]. In this section, we characterize the Perron root of irreducible matrices in terms of the KLD generalized to any positive discrete measure. Let us start with a formal definition of the KLD. Definition 1.18. Suppose that x is a realization of a discrete random variable with the set of possible values in X. Then, the KLD between two probability mass functions f (x), x ∈ X, and g(x), x ∈ X, is defined as [13] D(f g) :=
x∈X
f (x) log
f (x) g(x)
(1.24)
where we considered (C.1-1). Note that the KLD is not a distance since it is not symmetric D(f g) = D(gf ) and does not satisfy the triangle inequality. Nevertheless, D(f g) is always nonnegative with D(f g) = 0 if and only if f = g. This immediately follows from the fact that log(x) ≤ x − 1 for every x > 0 with equality if and only if x = 1. Hence,
f (x) g(x) =− f (x) log g(x) f (x) x∈X x∈X f (x) − g(x) = 1 − 1 = 0 ≥
D(f g) =
f (x) log
x∈X
(1.25)
x∈X
with equality if and only if f = g. The definition of the KLD can be generalized in a natural manner to any nonnegative discrete measure as follows.

Definition 1.19. Assume (C.1-1). Let X ∈ X_K, and let A ∈ S_K, where S_K is the set of stochastic matrices. Then,

D(a^(k)‖x^(k)) = ∑_{l∈K} a_{k,l} log( a_{k,l} / x_{k,l} ),   1 ≤ k ≤ K    (1.26)

is the (generalized) KLD between the kth row of A and the kth row of X, denoted by a^(k) and x^(k), respectively.

Note that due to (C.1-1), the matrix A in the above definition can be a stochastic matrix that is not necessarily positive. Comparing (1.26) with (1.4) reveals that the left-hand side of (1.4) multiplied by −1 is equal to ∑_{k∈K} u_k D(a^(k)‖x^(k)). Combining this with Corollary 1.3 gives rise to a relationship between the KLD and the Perron root of nonnegative irreducible matrices.
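This relationship (made precise in Observation 1.20 below, whose optimizing stochastic matrix has entries a_{k,l} = x_{k,l} p_l / (ρ(X) p_k)) can be checked numerically. The 3×3 matrix and the use of plain power iteration below are illustrative choices, not from the text.

```python
# Numeric check: with A built from X and its right Perron vector p, the
# u-weighted row KLDs sum to log(1/rho(X)), where u is the left Perron
# eigenvector of A. X is an arbitrary positive (hence irreducible) example.
import math

def perron(M, iters=3000):
    """Perron root and a positive right eigenvector of a positive matrix M."""
    n = len(M)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam, v

X = [[1.0, 2.0, 0.5],
     [0.3, 1.0, 1.0],
     [0.7, 0.2, 1.0]]
rho, p = perron(X)

# A is stochastic by construction: each row sums to (Xp)_k / (rho * p_k) = 1.
A = [[X[k][l] * p[l] / (rho * p[k]) for l in range(3)] for k in range(3)]
assert all(abs(sum(row) - 1.0) < 1e-9 for row in A)

# Left Perron eigenvector u of A (eigenvalue 1), normalized to the simplex.
lam_A, u = perron([[A[l][k] for l in range(3)] for k in range(3)])
assert abs(lam_A - 1.0) < 1e-9
u = [x / sum(u) for x in u]

# Classical KLD between two probability rows is nonnegative, cf. (1.25).
Xn = [[x / sum(row) for x in row] for row in X]  # row-normalized X
assert sum(A[0][l] * math.log(A[0][l] / Xn[0][l]) for l in range(3)) >= -1e-12

# Weighted generalized KLDs reproduce log(1/rho(X)).
kld = [sum(A[k][l] * math.log(A[k][l] / X[k][l]) for l in range(3)) for k in range(3)]
assert abs(sum(u[k] * kld[k] for k in range(3)) - math.log(1.0 / rho)) < 1e-6
```

Note that the individual generalized KLDs D(a^(k)‖x^(k)) need not be nonnegative, since the rows of X are not probability vectors; only the classical KLD between probability mass functions is.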
1.2 Some Bounds on the Perron Root and their Applications
15
Observation 1.20. Let X ∈ X_K be arbitrary, and let S_K(X) be a set of irreducible stochastic matrices defined by (1.3). Then,

log( 1/ρ(X) ) = inf_{A∈S_K(X)} ∑_{k∈K} u_k D(a^(k)‖x^(k))    (1.27)

where u ∈ Π_K^+ is the left Perron eigenvector of A. The infimum in (1.27) is attained if and only if

(A)_{k,l} = a_{k,l} = x_{k,l} p_l / ( ρ(X) p_k ),   1 ≤ k, l ≤ K    (1.28)

where p is a positive right eigenvector of X.

On the other hand, when we consider Theorem 1.7, one obtains the following characterization.

Observation 1.21. Given X ∈ X_K, let w = q ∘ p ∈ Π_K^+, where q and p are positive left and right eigenvectors of X, respectively. Then,

log ρ(X) = inf_{s∈R_{++}^K} ( D(w‖s) − D(w‖Xs) ) .    (1.29)
The infimum is attained if s = p.

1.2.3 A Rate Function Representation for Large Deviations of Finite Dimensional Markov Chains

In this section, we point out another interesting connection of a Perron root characterization to the rate function for large deviations of finite-state Markov chains. The basics of large deviations theory presented here can be found in [15]. Throughout, let S = {1, . . . , K} be the sample space, and suppose that X_1, . . . , X_n, . . . is a finite-state Markov chain with X_j taking values in S and with the initial state x_0 ∈ S. The corresponding probability transition matrix P = (p_{i,j}) ∈ S_K is assumed to be irreducible, where S_K ⊂ [0, 1]^{K×K} is the set of all stochastic matrices. For such Markov chains, we are interested in the limiting behavior of the following empirical mean

S_n = (1/n) ∑_{j=1}^n 1(X_j) = ( (1/n) ∑_{j=1}^n 1_1(X_j), . . . , (1/n) ∑_{j=1}^n 1_K(X_j) ),
1_i(X_j) = 1 if X_j = i ∈ S, and 1_i(X_j) = 0 if X_j ≠ i ∈ S.    (1.30)

Note that (1/n) ∑_{j=1}^n 1_i(X_j) indicates the frequency with which the state i ∈ S is visited by the Markov chain. Hence, for every n ∈ N, the random vector
S_n is an element of the standard simplex Π_K. Moreover, {S_j}_{j=1}^n is a Markov chain taking values in Π_K. Now suppose that P_{x_0} is the corresponding Markov probability measure on Π_K (given the probability transition matrix P and the initial state x_0 ∈ S). In other words, for any A ⊆ Π_K and n ∈ N, P_{x_0}(S_n ∈ A) with P_{x_0}(S_n ∈ Π_K) = 1 is the probability that S_n is a member of A. Then, the large deviation principle (LDP) characterizes the limiting behavior of these probabilities as n tends to infinity.

Definition 1.22. We say that the empirical mean S_n satisfies the LDP with a rate function R : R^K → [0, ∞] if, for every A ⊆ R^K ⊃ Π_K and any initial state x_0 ∈ S, there holds

− inf_{x∈int(A)} R(x) ≤ liminf_{n→∞} (1/n) log P_{x_0}(S_n ∈ A)
                      ≤ limsup_{n→∞} (1/n) log P_{x_0}(S_n ∈ A) ≤ − inf_{x∈cl(A)} R(x)

where int(A) and cl(A) denote the interior and closure of A, respectively.

Note that the rate function R : R^K → [0, ∞] is a lower semicontinuous mapping, so that the sublevel sets {x ∈ R^K : R(x) ≤ α} are closed for all α ∈ [0, ∞) [11, p. 78]. A rate function is said to be good if the sublevel sets are compact. Hence, “goodness” means that the rate function attains its infimum over closed sets. In the case of independent identically distributed (i.i.d.) random vectors (in R^K), Cramér's theorem characterizes the rate function in terms of the Fenchel–Legendre transform [15, pp. 26–36]:

R*(x) := sup_{u∈R^K} ( ⟨u, x⟩ − log M(u) )    (1.31)
where M : R^K → (−∞, ∞] is the corresponding moment generating function. An important extension of this result to the non-i.i.d. case is known as the Gärtner–Ellis theorem [15, pp. 43–44]. Roughly speaking, this theorem states that the LDP holds for a sequence of random vectors {X_n}, X_n ∈ R^K, with the good rate function R* given by (1.31), provided that the logarithmic moment generating function exists,² is lower semicontinuous and satisfies some smoothness conditions. In particular, the Gärtner–Ellis theorem can be applied to the Markov chain (1.30), in which case the Perron root of the irreducible matrix P(u) = (p_{i,j}(u)) given by p_{i,j}(u) = p_{i,j} e^{⟨u, 1(j)⟩}, i, j ∈ S, plays the role of the moment generating function M(u) [15, pp. 73–74].

² In this case, log M is defined as the limit log M(u) = lim_{n→∞} log M_n(nu)/n, where M_n is the moment generating function of X_n.
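As an illustrative aside (the chain and all numbers below are arbitrary choices, not from the text), one can simulate a small irreducible chain and observe the empirical mean S_n of (1.30) lying in the simplex and concentrating near the stationary distribution, which is exactly where the rate function discussed next vanishes.

```python
# Simulate a 3-state Markov chain and form the empirical mean S_n of (1.30).
import random

P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]   # an arbitrary irreducible stochastic matrix

# Stationary distribution via fixed-point iteration of pi = pi P.
pi = [1.0 / 3] * 3
for _ in range(500):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

random.seed(1)
state, counts, n = 0, [0, 0, 0], 50000
for _ in range(n):
    state = random.choices([0, 1, 2], weights=P[state])[0]
    counts[state] += 1
S_n = [c / n for c in counts]             # the empirical mean S_n

assert abs(sum(S_n) - 1.0) < 1e-9         # S_n lies in the simplex Pi_K
assert max(abs(S_n[i] - pi[i]) for i in range(3)) < 0.05   # concentration
```

The loose 0.05 tolerance reflects the O(1/√n) fluctuations of the empirical frequencies; the LDP below quantifies how fast the probability of larger deviations decays.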
Theorem 1.23. The empirical mean S_n satisfies the LDP with the good rate function R : R^K → [0, ∞] given by

R(x) = sup_{u∈R^K} ( ⟨u, x⟩ − log ρ(P(u)) ) .    (1.32)
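The role of log ρ(P(u)) as a limiting logarithmic moment generating function can be probed numerically; like any log-MGF it is convex in u, and it vanishes at u = 0 since P(0) = P is stochastic. The 2-state chain and the test points below are arbitrary illustrative choices.

```python
# Build P(u) with entries p_{ij} e^{u_j} and check two properties of
# log rho(P(u)): it is 0 at u = 0, and midpoint-convex along a segment.
import math

P = [[0.7, 0.3],
     [0.4, 0.6]]   # an arbitrary irreducible stochastic matrix

def rho(M, iters=3000):
    v = [1.0] * len(M)
    lam = 1.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(len(M))) for i in range(len(M))]
        lam = max(w)
        v = [x / lam for x in w]
    return lam

def P_u(u):
    return [[P[i][j] * math.exp(u[j]) for j in range(2)] for i in range(2)]

assert abs(rho(P_u([0.0, 0.0])) - 1.0) < 1e-9   # rho(P) = 1

u1, u2 = [1.0, -0.5], [-0.3, 0.8]
mid = [(a + b) / 2 for a, b in zip(u1, u2)]
lhs = math.log(rho(P_u(mid)))
rhs = 0.5 * math.log(rho(P_u(u1))) + 0.5 * math.log(rho(P_u(u2)))
assert lhs <= rhs + 1e-9                         # convexity of log rho(P(u))
```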
It is worth pointing out that the proof relies on the differentiability of log ρ(P(u)) and Theorem A.36. Hence, the theorem holds for any Markov chain that possesses these two properties. The following theorem of Sanov establishes a connection to the previous results in this book [15, p. 76].

Theorem 1.24. There holds

R(z) = J(z) := sup_{s>0} ∑_{k∈K} z_k log( s_k / (s^T P)_k )   if z ∈ Π_K    (1.33)

and R(z) = +∞ if z ∉ Π_K.

Comparing this with Theorem 1.7 (or Corollary 1.9) reveals that

−J(w) = log ρ(P^T) = log ρ(P) = 0    (1.34)

where z = w = q ∘ p ∈ Π_K and the last equality follows since the spectral radius of stochastic matrices is equal to 1 (see the text after Definition A.26). On the other hand, for any z ∈ Π_K and X = P^T, one has
−J(z) = inf_{s>0} ∑_{k∈K} z_k log( (s^T P)_k / s_k ) ≤ inf_{s>0} max_{1≤k≤K} log( (s^T P)_k / s_k )
     (a)= log( inf_{s>0} max_{1≤k≤K} (Xs)_k / s_k ) = log ρ(X) = log ρ(P) = 0    (1.35)

where equality (a) is due to the Collatz–Wielandt formula (Theorem A.35). Note that (1.35) can also be directly deduced from Theorem 1.24 as the rate function is a nonnegative function. Equation (1.34) in turn shows that the rate function is zero if z = w = q ∘ p. Now since 1 = (1, . . . , 1) is a positive right eigenvector of P associated with ρ(P), we can conclude that J(z) = 0 if z = q, that is, if z is the left Perron eigenvector of P, which is the unique stationary distribution of the Markov chain. Now the following theorem shows that the converse holds as well.

Theorem 1.25. Suppose that J(z) = log ρ(P) = 0 holds for some z ∈ Π_K. Then, z = w = q.

Remark 1.26. Actually, the theorem is an instance of Theorem 1.29 in the next section with F(x) = −log(x), x > 0. Below we provide an alternative proof for this special case that does not resort to the condition (1.40).
Proof. Let X = P^T. Observe that if there exists z ≠ w so that

log ρ(X) = inf_{s>0} ∑_{k∈K} z_k log( (Xs)_k / s_k ) = −J(z) > −∞

then, by the proof of Theorem 1.11 and (1.35),

ρ(X) = inf_{s>0} ∑_{k∈K} z_k (Xs)_k / s_k = inf_{s>0, ‖s‖₁=1} ∑_{k∈K} z_k (Xs)_k / s_k .

Consequently, we can confine our attention to this linear case. Furthermore, due to the ray property, we can assume that ‖s‖₁ = 1. Let u(μ) = (1−μ)û + μǔ with û = log(ŝ) and ǔ = log(š) for some arbitrary ŝ > 0 and š > 0 with ŝ ≠ š and ‖ŝ‖₁ = ‖š‖₁ = 1. From this, we have û ≠ c + ǔ for any c ∈ R. We write ∑_k z_k (Xs)_k/s_k as a function of u(μ) to obtain

H_X(u(μ)) := ∑_{k∈K} z_k (X e^{u(μ)})_k / e^{u_k(μ)} ,   0 ≤ μ ≤ 1    (1.36)
and prove that H_X(u(μ)) is a strictly convex function of μ ∈ (0, 1). We have

H_X(u(μ)) = ∑_{k∈K} z_k (X e^{(1−μ)û+μǔ})_k / e^{(1−μ)û_k + μǔ_k}
  (a)≤ ∑_{k∈K} z_k ( (Xe^û)_k / e^{û_k} )^{1−μ} ( (Xe^ǔ)_k / e^{ǔ_k} )^μ
  (b)≤ ( ∑_{k∈K} z_k (Xe^û)_k / e^{û_k} )^{1−μ} ( ∑_{k∈K} z_k (Xe^ǔ)_k / e^{ǔ_k} )^μ
  (c)≤ (1−μ) ∑_{k∈K} z_k (Xe^û)_k / e^{û_k} + μ ∑_{k∈K} z_k (Xe^ǔ)_k / e^{ǔ_k}

for every 0 < μ < 1, where both (a) and (b) follow from the Hölder inequality (2.18) and (c) is a consequence of the well-known fact that the arithmetic mean bounds the geometric mean from above (B.18). So, H_X(u(μ)) is convex.

Now suppose that there is equality in (b). We write the irreducible matrix X ≥ 0 as X = X̄ + X̃, where X̄ = (x̄_{k,l}) ≥ 0 is irreducible with exactly one positive entry in each row and X̃ is some nonnegative matrix. Thus,

H_X(u(μ)) = H_X̄(u(μ)) + H_X̃(u(μ))    (1.37)

where H_X̃(u(μ)) is convex by the above derivation. So, if H_X̄(u(μ)) is strictly convex, so also is H_X(u(μ)). In particular, by the equality condition in (A.5) and (1.37), if equality in (b) holds, then there exists a constant a ∈ R such that
x̄_{k,π(k)} e^{û_{π(k)}} / e^{û_k} = e^a x̄_{k,π(k)} e^{ǔ_{π(k)}} / e^{ǔ_k} ,   x̄_{k,π(k)} > 0 ,   for all 1 ≤ k ≤ K

where, due to irreducibility of X̄, π : {1, . . . , K} → {1, . . . , K} is a permutation with π(k) ≠ k (no identity). So, we have û_k − ǔ_k = a + û_{π(k)} − ǔ_{π(k)} for each 1 ≤ k ≤ K. Now since π is a permutation on {1, . . . , K}, this can be satisfied if and only if û = c + ǔ for some c ∈ R, which contradicts û ≠ c + ǔ. This proves the strict inequality in (b), and with it, strict convexity of H_X(u(μ)).
This in turn implies that E(z) = min_{s>0} ∑_k z_k (Xs)_k/s_k is a strictly concave function of z ∈ Π_K. This is because otherwise there would exist μ ∈ (0, 1) and ẑ, ž ∈ Π_K, ẑ ≠ ž, such that, using z(μ) = (1−μ)ẑ + μž ∈ Π_K,

E(z(μ)) = inf_{u∈R^K} ∑_{k∈K} z_k(μ) (Xe^u)_k / e^{u_k}
        = (1−μ) inf_{u∈R^K} ∑_{k∈K} ẑ_k (Xe^u)_k / e^{u_k} + μ inf_{u∈R^K} ∑_{k∈K} ž_k (Xe^u)_k / e^{u_k}
        = (1−μ)E(ẑ) + μE(ž) > −∞

where all the infima are equal and, by the strict convexity result, ẑ = ž, which contradicts ẑ ≠ ž. So, E(z) defined above is strictly concave and, by (1.34) and (1.35), z = w > 0. This proves the theorem.

If X = P^T is irreducible but not stochastic, then the theorem remains valid with J(z) = 0 substituted by −J(z) = log ρ(P). Combining the results and observations in this section with Theorem 1.7 (or Corollary 1.9) provides a saddle point characterization of the Perron root ρ(X) expressed in terms of the function G : R_{++}^K × Π_K → R defined to be (see also the following section)

G(s, z) = ∑_{k∈K} z_k log( (Xs)_k / s_k ) .    (1.38)
Indeed, it follows that if X = P^T is irreducible, then

log ρ(X) = inf_{s>0} sup_{z∈Π_K} G(s, z) = sup_{z∈Π_K} inf_{s>0} G(s, z) = sup_{z∈Π_K} ( −J(z) ) .

The pair (p, w) is a saddle point of G (see App. B.4.4 for the definition of a saddle point), where w = q ∘ p > 0 is unique in Π_K^+ and the positive right eigenvector p > 0 is unique up to positive multiples. In the next section, this saddle point property is investigated in a more general setting with the logarithm in (1.38) substituted by a function belonging to some class of continuously differentiable functions. For extensions of the saddle point characterizations to reducible matrices, the reader is referred to Sect. 1.7.4. Finally, we point out that, in Sect. 5.9.2, we use the results of Sect. 1.6 to gain insight into the problem of the existence and uniqueness of a positive maximizer in (1.33) with V = P^T being any irreducible matrix (not necessarily a stochastic one). Using the terminology of power control theory (the second part of the book), such a maximizer is called a log-SIR fair power allocation.
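For the stochastic case, the identities (1.34)–(1.35) can be checked directly: with X = P^T, G(p, z) = 0 for every z in the simplex (since (Xp)_k/p_k = 1 for all k), while G(s, w) ≥ 0 for every s > 0, where w is the stationary distribution of P. The 3×3 chain below is an arbitrary illustrative example.

```python
# Saddle-value check for G(s, z) = sum_k z_k log((Xs)_k / s_k) with X = P^T.
import math, random

def perron(M, iters=3000):
    n = len(M)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam, v

P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
X = [[P[j][i] for j in range(3)] for i in range(3)]   # X = P^T

rho, p = perron(X)            # p is the stationary distribution, up to scale
assert abs(rho - 1.0) < 1e-9  # spectral radius of a stochastic matrix is 1
w = [x / sum(p) for x in p]   # w = q o p with q = 1, normalized to sum 1

def G(s, z):
    return sum(z[k] * math.log(sum(X[k][l] * s[l] for l in range(3)) / s[k])
               for k in range(3))

random.seed(0)
for _ in range(200):
    s = [random.uniform(0.1, 10.0) for _ in range(3)]
    z = [random.uniform(0.1, 1.0) for _ in range(3)]
    z = [x / sum(z) for x in z]
    assert abs(G(p, z)) < 1e-6   # G(p, .) = log rho(X) = 0 for every z
    assert G(s, w) >= -1e-9      # G(., w) >= log rho(X) = 0, i.e. -J(w) = 0
```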
1.2.4 Some Extended Perron Root Characterizations

As already mentioned, this section generalizes some of the previous results to a certain class of functions that depends on a nonnegative irreducible matrix X. Although the following definition of this function class may appear a bit artificial, the reader will be convinced of its importance in the second part of the book. In particular, it will become clear that for functions from outside of this class, the network problems may be intractable or not efficiently solvable under real-world conditions.

Definition 1.27 (Function Class G(X)). Let X ∈ X_K be given. We say that a continuous function φ : R_{++} → R pertains to a function class G(X) (written as φ ∈ G(X)) if

(i) φ is continuously differentiable and strictly increasing,
(ii) for any fixed z ∈ Π_K^+, the function H : R_{++}^K → R given by

H(s) := ∑_{k∈K} z_k φ( (Xs)_k / s_k )    (1.39)

attains its infimum on R_{++}^K. Moreover, all local minima are global and

∇H(s*) = 0    (1.40)

is necessary and sufficient for s* ∈ R_{++}^K to be a global minimizer.

Remark 1.28. Note that members of G(X) are not necessarily bijections from R_{++} onto R. In fact, they are injective functions (one-to-one maps) from R_{++} into R, so that their ranges are in general subsets of R. The definition could also be modified to include the case of a function defined on some arbitrary open subset of R_{++}. The second requirement ensures that the interval {H(s) : s ∈ R_{++}^K} is bounded below (see Sect. B.1) for any fixed vector z > 0 and that the greatest lower bound, which is the infimum of H over all positive vectors, is attained for some s* ∈ R_{++}^K if and only if (1.40) holds. However, this is not equivalent to saying that H : R_{++}^K → R is a convex function. Finally, note that (1.40) is a necessary optimality condition as R_{++}^K is an open subset of R^K.

Two prominent examples of functions belonging to G(X) for some X ∈ X_K have already been considered in the foregoing sections. These are φ(x) = log(x), x > 0, and φ(x) = x, x > 0. It is obvious that in these two special cases, the first requirement in the above definition is satisfied. When φ is the linear function, it can be seen that the function H has a global minimum on R_{++}^K for any choice of X ∈ X_K and z ∈ Π_K^+. In contrast, this does not need to be true if φ is the logarithmic function. For instance, if X ∈ X_K is equal to the circulant matrix given by (1.16), then {H(s) : s ∈ R_{++}^K} may fail
to be bounded below.³ So, the logarithmic function pertains to G(X) if X is confined to some subset of X_K for which H has a global minimum on R_{++}^K for any z ∈ Π_K^+. Note that this set is not empty since it contains all positive matrices. If the infimum is attained, the requirement that every local minimum is a global one is satisfied in these two cases as well. This is because, with these choices of φ, the problem of minimizing H(s) over R_{++}^K can be transformed into an equivalent convex problem using the substitution x = log(s), s ∈ R_{++}^K. More precisely, using the results of Chap. 6, the function H_e(x) = H(e^x), x ∈ R^K, can be shown to be convex when either φ(x) = x, x > 0, or φ(x) = log(x), x > 0. Therefore, every stationary point x* of H_e satisfying ∇H_e(x*) = 0 is a global minimizer of H_e(x) over R^K. At the same time, we have ∇H_e(x*) = 0 if and only if ∇H(s*) = 0 with x* = log(s*), from which we conclude that every stationary point s* satisfying (1.40) is a global minimizer of H(s) over R_{++}^K. For formal proofs, the reader is referred to Chap. 6.

For the purpose of this section, it is sufficient to assume that φ ∈ G(X). The first result gives rise to a characterization of φ(ρ(X)) in terms of the minima of H(s) over R_{++}^K. Recall that by assumption, q^T p = 1, implying that p ∘ q ∈ Π_K^+.

Theorem 1.29. Let X ∈ X_K be given, and let φ ∈ G(X) be arbitrary. Suppose that w = p ∘ q ∈ Π_K^+, which is a unique vector. Then,

∑_{k∈K} w_k φ( (Xs)_k / s_k ) ≥ φ(ρ(X))    (1.41)
for all s ∈ R_{++}^K. Equality holds if s = p > 0 (unique up to positive multiples). Moreover, p = arg min_{s∈R_{++}^K} H(s) with H(s) defined by (1.39) if and only if z = w > 0.
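For the linear choice φ(x) = x, one of the two prominent members of G(X), the bound (1.41) reads ∑_k w_k (Xs)_k/s_k ≥ ρ(X) with equality at s = p. A numeric sketch (the 3×3 matrix is an arbitrary positive example, not from the text):

```python
# Check the lower bound (1.41) with phi(x) = x: the w-weighted sum of the
# ratios (Xs)_k / s_k never drops below rho(X), and equals it at s = p.
import random

def perron(M, iters=3000):
    n = len(M)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam, v

X = [[1.0, 0.4, 0.6],
     [0.5, 1.2, 0.3],
     [0.8, 0.2, 0.9]]
rho, p = perron(X)
_, q = perron([[X[j][i] for j in range(3)] for i in range(3)])  # left eigenvector
s_qp = sum(a * b for a, b in zip(q, p))
w = [a * b / s_qp for a, b in zip(q, p)]   # w = p o q, normalized so sum(w) = 1

def H(s):   # the objective of (1.41) with phi(x) = x
    return sum(w[k] * sum(X[k][l] * s[l] for l in range(3)) / s[k] for k in range(3))

random.seed(0)
for _ in range(200):
    s = [random.uniform(0.1, 10.0) for _ in range(3)]
    assert H(s) >= rho - 1e-9    # the lower bound (1.41)
assert abs(H(p) - rho) < 1e-6    # equality at s = p
```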
Remark 1.30. In fact, we have H(s) = ∑_{k∈K} z_k φ( (Xs)_k / s_k ) ≥ φ(ρ(X)) for all s ∈ R_{++}^K if and only if z = w, which is more than what the theorem asserts. This immediately follows from Lemma 1.33 stated later in this section.

Proof. Consider the following minimization problem:

min_{s∈R_{++}^K} H(s) = min_{s∈R_{++}^K} ∑_{k∈K} z_k φ( (Xs)_k / s_k )

for some given z ∈ Π_K^+. By assumption, the minimum exists and (1.40) is a necessary and sufficient condition for characterizing all global minimizers. Hence, s* > 0 minimizes H over R_{++}^K if and only if
³ Note that if w = q(X) ∘ p(X), then the interval is bounded below for any X ∈ X_K since then, by Theorem 1.7, H(s) ≥ log ρ(X) for all s ∈ R_{++}^K, with ρ(X) > 0 whenever X ∈ X_K.
∑_{j=1}^K z_j φ'( (Xs*)_j / s*_j ) x_{j,k} / s*_j = z_k φ'( (Xs*)_k / s*_k ) (Xs*)_k / (s*_k)² ,   1 ≤ k ≤ K .    (1.42)

Using

u(z, s) := ( z_1/s_1 , z_2/s_2 , . . . , z_K/s_K )    (1.43)

we can write (1.42) in matrix form to obtain

(F(s*)X)^T u(z, s*) = F(s*) (Γ(s*))^{-1} u(z, s*)    (1.44)

with Γ(s) := diag( s_1/(Xs)_1 , . . . , s_K/(Xs)_K ) and

F(s) := diag( φ'( (Xs)_1/s_1 ) , . . . , φ'( (Xs)_K/s_K ) ) .    (1.45)

Now since (Xp)_k/p_k = ρ(X), 1 ≤ k ≤ K, it follows that F(p) = φ'(ρ(X)) I with φ'(ρ(X)) > 0. Thus, if s* = p, the necessary and sufficient optimality condition (1.44) becomes

X^T u(z, p) = ρ(X) u(z, p) .

So, p is a global minimizer if and only if u(z, p) = q. An examination of (1.43) reveals that this is true if and only if z = w = p ∘ q ∈ Π_K^+, which is uniquely defined in Π_K^+ since p > 0 and q > 0 are unique eigenvectors of X ∈ X_K up to a positive scaling factor. This proves the second part of the theorem. However, the lower bound (1.41) immediately follows as p is a global minimizer if z = w, and therefore, due to ‖w‖₁ = 1, we have

min_{s∈R_{++}^K} ∑_{k∈K} w_k φ( (Xs)_k / s_k ) = ∑_{k∈K} w_k φ( (Xp)_k / p_k ) = ∑_{k∈K} w_k φ(ρ(X)) = φ(ρ(X))
where p > 0 is unique up to positive multiples.

It should be emphasized that due to positivity of q and p for any X ∈ X_K, the weight vector w = p ∘ q is automatically positive. Hence, K addends appear in (1.41).

1.2.5 Collatz–Wielandt-Type Characterization of the Perron Root

Based upon Theorem 1.29, in this section, we prove a saddle point characterization of the Perron root. Because of its similarity to the Collatz–Wielandt formula [5], the characterization is referred to as a Collatz–Wielandt-type characterization of the Perron root. A definition of a saddle point of a continuous function and some related definitions can be found in App. B.4.4. First consider the following simple lemma.
Lemma 1.31. For any φ ∈ G(X) and X ∈ X_K, there holds

min_{s∈R_{++}^K} max_{z∈Π_K^+} ∑_{k∈K} z_k φ( (Xs)_k / s_k ) = φ(ρ(X)) .    (1.46)
Moreover, the minimum is attained if and only if s = p.

Proof. For any z ∈ Π_K^+ and s ∈ R_{++}^K, we have

∑_{k∈K} z_k φ( (Xs)_k / s_k ) ≤ max_{1≤k≤K} φ( (Xs)_k / s_k ) .

As z is positive, equality holds if and only if φ((Xs)_k/s_k) = c, k = 1, . . . , K, for some c ∈ R. Thus, since φ is strictly monotonic and X is irreducible, we see that equality holds if and only if s = p > 0, in which case φ((Xp)_k/p_k) = φ(ρ(X)) for each 1 ≤ k ≤ K. Moreover, for any z ∈ Π_K^+, one has

min_{s∈R_{++}^K} ∑_{k∈K} z_k φ( (Xs)_k / s_k ) ≤ min_{s∈R_{++}^K} max_{1≤k≤K} φ( (Xs)_k / s_k )
    = φ( min_{s∈R_{++}^K} max_{1≤k≤K} (Xs)_k / s_k ) = φ( ρ(X) )    (1.47)

where, due to the assumption, the first minimum exists and the last equality follows from the Collatz–Wielandt formula for irreducible matrices (Theorem A.35). Equality holds if and only if s = p, where p > 0 is unique (up to positive multiples) for any X ∈ X_K. Now considering Theorem 1.29 proves the lemma.

For the max-min part of the saddle point characterization, we need the following lemma.

Lemma 1.32. Let X ∈ X_K and φ ∈ G(X) be given. Then, the continuous function E : Π_K^+ → R defined by

E(z) := min_{s∈R_{++}^K} H(s) = min_{s∈R_{++}^K} ∑_{k∈K} z_k φ( (Xs)_k / s_k )    (1.48)

is strictly concave.

Proof. Since H(αs) = H(s) for any α > 0, we can assume that ‖s‖₁ = 1. Concavity of E(z) is clear from the properties of the minimum operator, so we only need to show strict concavity. To this end, assume that E : Π_K^+ → R is not strictly concave. Then, there must exist ẑ ∈ Π_K^+ and ž ∈ Π_K^+ with ẑ ≠ ž such that
E(z(μ)) = (1−μ)E(ẑ) + μE(ž)
        = (1−μ) min_{s∈R_{++}^K} ∑_{k∈K} ẑ_k φ( (Xs)_k / s_k ) + μ min_{s∈R_{++}^K} ∑_{k∈K} ž_k φ( (Xs)_k / s_k )

for some μ ∈ (0, 1), where z(μ) = (1−μ)ẑ + μž ∈ Π_K^+. Denote the two sums by Ĥ(s) and Ȟ(s), respectively. Clearly, the equality holds if and only if one of the following holds:

(i) there exist ŝ ∈ R_{++}^K and š ∈ R_{++}^K with ŝ ≠ c š for all c > 0 such that Ĥ(ŝ) = E(ẑ) and Ȟ(š) = E(ž);
(ii) there exists s* ∈ R_{++}^K such that Ĥ(s*) = E(ẑ) and Ȟ(s*) = E(ž).

First, we consider (i). Let μ ∈ (0, 1) be arbitrary, and define s̃(μ) ∈ R_{++}^K through H(s̃(μ)) = E(z(μ)). In words, s̃(μ) minimizes the function H with the weight vector being equal to z(μ). Then, we have

(1−μ)Ĥ(ŝ) + μȞ(š) = E(z(μ)) = H(s̃(μ)) = (1−μ)Ĥ(s̃(μ)) + μȞ(s̃(μ)) .

This however contradicts ŝ ≠ c š for all c > 0, and hence disproves (i).

Now let us turn our attention to (ii). By assumption, s* minimizes H over R_{++}^K if and only if (1.40) is satisfied. Thus, proceeding essentially as in the proof of Theorem 1.29 shows that H attains its minimum at s* ∈ R_{++}^K for both z̃ = ẑ and z̃ = ž if and only if

(F(s*)X)^T u(z̃, s*) = F(s*) (Γ(s*))^{-1} u(z̃, s*)

where F(s) is defined by (1.45), Γ(s) = diag( s_1/(Xs)_1 , . . . , s_K/(Xs)_K ) is positive definite, and u(z̃, s) = (z̃_1/s_1, . . . , z̃_K/s_K). Due to strict monotonicity of φ, the diagonal elements of F(s) are positive for all s ∈ R_{++}^K. Therefore, F(s*) is invertible and

Γ(s*) F(s*)^{-1} X^T F(s*) u(z̃, s*) = u(z̃, s*) .

Since X is irreducible, so also is A(s*) := Γ(s*) F(s*)^{-1} X^T F(s*). This in turn implies that A(s*) has unique (up to positive multiples) positive left and right eigenvectors. Hence, since ẑ, ž ∈ Π_K^+, we must have u(ẑ, s*) = u(ž, s*) or, equivalently, ẑ = ž. But this contradicts ẑ ≠ ž, and therefore completes the proof.

Now we are in a position to prove the max-min part of the saddle point characterization.

Lemma 1.33. Let X ∈ X_K be arbitrary. Then, for any φ ∈ G(X),

max_{z∈Π_K^+} min_{s∈R_{++}^K} ∑_{k∈K} z_k φ( (Xs)_k / s_k ) = φ(ρ(X)) .    (1.49)
Moreover,

w = arg max_{z∈Π_K^+} min_{s∈R_{++}^K} ∑_{k∈K} z_k φ( (Xs)_k / s_k )    (1.50)

if and only if w = q ∘ p, which is a unique vector in Π_K^+.

Proof. Proceeding as in the proof of Lemma 1.31 yields

U(z) := min_{s∈R_{++}^K} ∑_{k∈K} z_k φ( (Xs)_k / s_k ) ≤ min_{s∈R_{++}^K} max_{1≤k≤K} φ( (Xs)_k / s_k ) = φ(ρ(X))
for any z ∈ Π_K^+. On the other hand, by Theorem 1.29,

min_{s∈R_{++}^K} ∑_{k∈K} w_k φ( (Xs)_k / s_k ) = ∑_{k∈K} w_k φ( (Xp)_k / p_k ) = φ(ρ(X))

and therefore w ∈ Π_K^+ is a maximizer of U. However, by Lemma 1.32, the function U is strictly concave, and hence w = p ∘ q is the unique maximizer in Π_K^+.

Now let us combine these results to obtain a saddle point characterization of the Perron root ρ(X).

Theorem 1.34. Let X ∈ X_K and φ ∈ G(X) be given. Define G : R_{++}^K × Π_K^+ → R as

G(s, z) := ∑_{k∈K} z_k φ( (Xs)_k / s_k ) .    (1.51)

Then,

(a) the pair (p, w) ∈ R_{++}^K × Π_K^+ is a saddle point of G, and

φ(ρ(X)) = min_{s∈R_{++}^K} max_{z∈Π_K^+} G(s, z) = max_{z∈Π_K^+} min_{s∈R_{++}^K} G(s, z) ,    (1.52)

(b) p ∈ R_{++}^K is unique up to positive multiples,
(c) w = q ∘ p is a unique vector in Π_K^+.
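The saddle-point property can be probed numerically for φ = log, the second of the two prominent members of G(X). Since (Xp)_k/p_k = ρ(X) for all k, the function G(p, ·) is constant at log ρ(X) over the simplex, while G(·, w) is bounded below by log ρ(X). The 3×3 matrix below is an arbitrary positive example, not from the text.

```python
# Check both halves of the saddle-point inequality (1.52) with phi = log.
import math, random

def perron(M, iters=3000):
    n = len(M)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam, v

X = [[0.9, 0.3, 0.7],
     [0.2, 1.1, 0.4],
     [0.6, 0.5, 1.0]]
rho, p = perron(X)
_, q = perron([[X[j][i] for j in range(3)] for i in range(3)])
s_qp = sum(a * b for a, b in zip(q, p))
w = [a * b / s_qp for a, b in zip(q, p)]   # the saddle weight w = q o p

def G(s, z):
    return sum(z[k] * math.log(sum(X[k][l] * s[l] for l in range(3)) / s[k])
               for k in range(3))

random.seed(0)
for _ in range(200):
    s = [random.uniform(0.1, 10.0) for _ in range(3)]
    z = [random.uniform(0.1, 1.0) for _ in range(3)]
    z = [x / sum(z) for x in z]
    assert abs(G(p, z) - math.log(rho)) < 1e-6  # G(p, .) is constant at log rho
    assert G(s, w) >= math.log(rho) - 1e-9      # G(., w) is minimized at s = p
```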
1.3 Convexity of the Perron Root

So far we have exclusively dealt with the Perron root of a fixed irreducible matrix. Beginning with this section, we shift our attention to a class of matrix-valued functions that maps a given convex parameter set Ω ⊆ R^K into a subset of X_K. This gives rise to the definition of a continuous function that maps Ω into the set of positive reals such that the output values of this function are equal to the Perron roots of the corresponding irreducible matrices. This map is of main interest in this section. To make our statements precise, we need to introduce some new definitions.
1.3.1 Some Definitions

Let Q_k ⊆ R, k = 1, . . . , K, be arbitrary nonempty open intervals on the real line, and let the parameter set Ω be the Cartesian product of these intervals:

Ω := Q_1 × · · · × Q_K ⊆ R^K .    (1.53)

So, Ω is an open convex set. Suppose that {x_{k,l}(ω) : Ω → R_+, 1 ≤ k, l ≤ K, ω ∈ Ω} is a collection of continuous functions. We write these functions in matrix form to obtain X(ω) = (x_{k,l}(ω))_{1≤k,l≤K}, which is nothing but a matrix-valued function from Ω into N_K, i.e., we have X : Ω → N_K.

Definition 1.35. We say that X : Ω → N_K (X : Ω → X_K) is nonnegative (irreducible) on Ω if X(ω) ∈ N_K (X(ω) ∈ X_K) for every fixed ω ∈ Ω. The set of all nonnegative (irreducible) matrix-valued functions on Ω is denoted by N_K(Ω) (X_K(Ω) ⊂ N_K(Ω)). M_K(Ω) ⊂ X_K(Ω) is used to denote the set of all positive matrix-valued functions on Ω.

Unless otherwise stated, it is assumed that X ∈ X_K(Ω). Obviously, for every fixed ω ∈ Ω, ρ(X(ω)) is the Perron root of X(ω) ∈ X_K. Hence, since the spectral radius of any matrix varies continuously with the matrix entries (Theorem A.8), ρ(X) : Ω → R_{++} is a continuous function. To avoid cumbersome notation, alongside ρ(X(ω)), we define

λ_p(ω) := ρ(X(ω)),   ω ∈ Ω .

Moreover, for every fixed ω ∈ Ω, λ_p(ω) is referred to as the Perron root of X(ω) (or simply the Perron root). Throughout this chapter, ω̂ ∈ Ω and ω̌ ∈ Ω are two arbitrary fixed parameter vectors, and

ω(μ) := (1−μ)ω̂ + μω̌ ,   μ ∈ [0, 1]

is their convex combination. Given any μ ∈ [0, 1], the Perron roots of X(ω(μ)), X(ω̂) and X(ω̌) are designated by λ_p(ω(μ)), λ_p(ω̂) and λ_p(ω̌), respectively.

This section is devoted to the problem of convexity of the Perron root λ_p(ω). More precisely, we are going to find out under which conditions on the matrix entries the Perron root is a convex function on the parameter set Ω. Interestingly, even if each entry of X(ω) is convex on Ω, simple examples show that this property is in general not inherited by the Perron root. Indeed, it is shown that for the Perron root to be convex for any choice of X ∈ X_K(Ω), it is necessary and sufficient that each entry of X(ω) is log-convex on Ω. For the precise definition of log-convexity and some related results, the reader is referred to App. B.3.
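Anticipating Theorem 1.39 below, entries of the form x_{k,l}(ω) = e^{ω_k} v_{k,l} are log-convex in ω, and the Perron root then satisfies the log-convexity inequality λ_p(ω(μ)) ≤ λ_p(ω̂)^{1−μ} λ_p(ω̌)^μ along any segment. A numeric sketch (V and the test points are arbitrary choices, not from the text):

```python
# Log-convexity of the Perron root of X(omega) = diag(e^{omega_k}) V along a
# segment, checked for several values of mu.
import math

V = [[0.2, 1.0, 0.4],
     [0.6, 0.1, 0.9],
     [0.3, 0.7, 0.2]]

def rho(M, iters=3000):
    v = [1.0] * 3
    lam = 1.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam

def lam_p(omega):
    return rho([[math.exp(omega[k]) * V[k][l] for l in range(3)] for k in range(3)])

om1, om2 = [0.5, -1.0, 0.2], [-0.4, 0.8, 1.1]
for mu in (0.25, 0.5, 0.75):
    om_mu = [(1 - mu) * a + mu * b for a, b in zip(om1, om2)]
    assert lam_p(om_mu) <= lam_p(om1) ** (1 - mu) * lam_p(om2) ** mu + 1e-9
```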
Remark 1.36. Note that there is a subtle discrepancy between the standard definition of log-convexity and our definition. Indeed, as the logarithmic function is defined for positive reals, any log-convex function is by definition positive. In contrast, x_{k,l}(ω) is nonnegative, and therefore may take the zero value on Ω. To avoid this problem, we consider the extended-value logarithm by taking log(0) = −∞. Using this convention, the zero function x_{k,l}(ω) ≡ 0 is log-convex on Ω. Furthermore, if x_{k,l}(ω) = 0 for some ω ∈ Ω and x_{k,l} is log-convex, then x_{k,l}(ω) ≡ 0 for all ω ∈ Ω. The only reason for the extension is that it enables us to refer to the identically zero function as a log-convex function.

Throughout the book, we use the following definition, which is a straightforward extension of the above notion of log-convexity to matrix-valued functions.

Definition 1.37. We say that X ∈ N_K(Ω) is log-convex (on Ω) if and only if for each 1 ≤ k, l ≤ K and all ω̂, ω̌ ∈ Ω, we have

x_{k,l}(ω(μ)) ≤ x_{k,l}(ω̂)^{1−μ} x_{k,l}(ω̌)^μ ,   μ ∈ (0, 1) .    (1.54)

The set of all log-convex matrix-valued functions is denoted by LC_K(Ω). In particular, note that x_{k,l}(ω) ≡ 0 for all ω ∈ Ω satisfies (1.54), and therefore is log-convex on Ω. If there is strict inequality in (1.54) for all ω̂, ω̌ ∈ Ω with ω̂ ≠ ω̌, then the function x_{k,l} is said to be strictly log-convex.

If X(ω) is confined to be irreducible on Ω, then LC_K(Ω) should be considered to be a subset of X_K(Ω). Otherwise, we have LC_K(Ω) ⊂ N_K(Ω). It is important to notice that log-convexity of X(ω) is an additional property on top of nonnegativity or irreducibility. For instance, given any collection of nonnegative and nonzero vectors {v^(k,l) ∈ R_+^K, v^(k,l) ≠ 0, 1 ≤ k, l ≤ K}, the map X : Ω → N_K given by

(X(ω))_{k,l} = x_{k,l}(ω) = ⟨v^(k,l), ω⟩

is nonnegative on Ω = R_+^K (X ∈ N_K(R_+^K)) and irreducible on R_{++}^K (X ∈ X_K(R_{++}^K)). Yet X ∉ LC_K(R_+^K) and X ∉ LC_K(R_{++}^K). In contrast, if

(X(ω))_{k,l} = x_{k,l}(ω) = ⟨v^(k,l), e^ω⟩
where V ∈ XK and
X(ω) = Γ(ω)V
(1.55)
Γ(ω) := diag γ1 (ω1 ), . . . , γK (ωK )
(1.56)
with γ_k : Q_k → R_{++} being a twice continuously differentiable and bijective (and hence also strictly monotonic) function. In fact, except for Sects. 1.3.2 and 1.3.3, we will restrict our attention to this special form.

In case of wireless applications, the matrix V in (1.55) may change over time t ∈ R according to some stochastic process. Therefore, rather than with X(ω), one has to deal with X : Ω × R → X_K and λ_p : Ω × R → R_{++}. In order to capture the effect of these variations on the network performance, it is often sufficient to assume that V is piecewise constant on R, that is, given any k ∈ Z and T > 0, we have V(t) = V(k) for all t ∈ [kT, (k + 1)T), where V(k) ∈ X_K is a randomly chosen matrix. The probability distribution of the random matrix on the set of irreducible matrices is usually not known. However, in many cases, it is reasonable to assume that V = V(k) can take on any value in X_K. This gives rise to the following definition:

X_{K,Γ}(Ω) := {Γ(ω)V, ω ∈ Ω : V ∈ X_K} ⊂ X_K(Ω) .    (1.57)

Note that X ∈ LC_K(Ω) with X(ω) = Γ(ω)V and any fixed V ∈ N_K if and only if Γ ∈ LC_K(Ω). Consequently, X_{K,Γ}(Ω) ⊂ LC_K(Ω) if and only if Γ ∈ LC_K(Ω).

Remark 1.38. The notation Γ ∈ LC_K(Ω) means that γ_k : Q_k → R_{++} is log-convex for each 1 ≤ k ≤ K. The off-diagonal entries of Γ(ω) are zero, and hence, by definition, log-convex. The notation X ∈ X_{K,Γ}(Ω) should mean that X(ω) = Γ(ω)V for some fixed γ_k : Q_k → R_{++}, 1 ≤ k ≤ K, and V ∈ X_K.

1.3.2 Sufficient Conditions

In this section, we provide a sufficient condition for the Perron root λ_p(ω) to be log-convex on Ω. Subsequently, we consider the issue of strict log-convexity. The following result shows that if X ∈ LC_K(Ω), then λ_p(ω) is log-convex on Ω, and therefore also convex. In particular, this implies that if Γ ∈ LC_K(Ω), then the Perron root of X(ω) = Γ(ω)V is log-convex on Ω for all V ∈ X_K. The converse problem is considered in the next section.

Theorem 1.39. If X ∈ LC_K(Ω) ⊂ X_K(Ω), then

λ_p(ω(μ)) ≤ λ_p(ω̂)^{1−μ} λ_p(ω̌)^μ    (1.58)
for all μ ∈ (0, 1) and all ω̂, ω̌ ∈ Ω.

Proof. Let μ ∈ (0, 1) be arbitrary and fixed. As X ∈ X_K(Ω), it follows that X(ω(μ)) ∈ X_K, regardless of the choice of ω̂ ∈ Ω and ω̌ ∈ Ω. Thus, applying Theorem 1.2 to λ_p(ω(μ)) yields

log λ_p(ω(μ)) = sup_{A∈S_K} ∑_{k,l∈K} u_k a_{k,l} log( x_{k,l}(ω(μ)) / a_{k,l} )
where S_K := S_K(X(ω(μ))) is defined by (1.3). Since the logarithmic function is strictly increasing and X(ω) is log-convex on Ω, taking (1.54) into account on the right-hand side of the equality above gives

log λ_p(ω(μ)) ≤ sup_{A∈S_K} ∑_{k,l∈K} u_k a_{k,l} log( x_{k,l}(ω̂)^{1−μ} x_{k,l}(ω̌)^μ / (a_{k,l}^{1−μ} a_{k,l}^μ) )
    = sup_{A∈S_K} ( (1−μ) ∑_{k,l∈K} u_k a_{k,l} log( x_{k,l}(ω̂) / a_{k,l} ) + μ ∑_{k,l∈K} u_k a_{k,l} log( x_{k,l}(ω̌) / a_{k,l} ) ) .

Now since sup(f + g) ≤ sup f + sup g for any functions f and g, one obtains

log λ_p(ω(μ)) ≤ (1−μ) sup_{A∈S_K} ∑_{k,l∈K} u_k a_{k,l} log( x_{k,l}(ω̂) / a_{k,l} ) + μ sup_{A∈S_K} ∑_{k,l∈K} u_k a_{k,l} log( x_{k,l}(ω̌) / a_{k,l} )
  (a)= (1−μ) sup_{A∈S_K(X(ω̂))} ∑_{k,l∈K} u_k a_{k,l} log( x_{k,l}(ω̂) / a_{k,l} ) + μ sup_{A∈S_K(X(ω̌))} ∑_{k,l∈K} u_k a_{k,l} log( x_{k,l}(ω̌) / a_{k,l} )
    = (1−μ) log λ_p(ω̂) + μ log λ_p(ω̌)

where (a) follows from the fact that the suprema are attained on S_K(X(ω̂)) ⊆ S_K and S_K(X(ω̌)) ⊆ S_K, respectively. The last equation is an application of Corollary 1.3.

The following result asserts that the Perron root λ_p(ω) is strictly log-convex if at least one entry of X(ω) is strictly log-convex on Ω.

Theorem 1.40. Let X ∈ LC_K(Ω) ⊂ X_K(Ω), and suppose that at least one entry of X(ω) is a strictly log-convex function. Then,
ˆ ω ˇ ∈Ω ω,
(1.59)
for all μ ∈ (0, 1). Proof. Let everything be as in the above proof, and, without loss of generality, assume that ˆ 1−μ xk0 ,l0 (ω) ˇ μ, xk0 ,l0 (ω(μ)) < xk0 ,l0 (ω)
μ ∈ (0, 1)
(1.60)
for some 1 ≤ k0 , l0 ≤ K. Then, due to strict monotonicity of log(x), x > 0,
30
1 On the Perron Root of Irreducible Matrices
log λ_p(ω(μ)) = sup_{A∈S_K} ∑_{k,l∈K} u_k a_{k,l} log( x_{k,l}(ω(μ)) / a_{k,l} )
    ≤ sup_{A∈S_K} ( (1−μ) ∑_{(k,l)≠(k₀,l₀)} u_k a_{k,l} log( x_{k,l}(ω̂) / a_{k,l} ) + μ ∑_{(k,l)≠(k₀,l₀)} u_k a_{k,l} log( x_{k,l}(ω̌) / a_{k,l} ) + u_{k₀} a_{k₀,l₀} log( x_{k₀,l₀}(ω(μ)) / a_{k₀,l₀} ) )
  (a)< sup_{A∈S_K} ( (1−μ) ∑_{k,l∈K} u_k a_{k,l} log( x_{k,l}(ω̂) / a_{k,l} ) + μ ∑_{k,l∈K} u_k a_{k,l} log( x_{k,l}(ω̌) / a_{k,l} ) )
for all μ ∈ (0, 1), where (a) follows from (1.60). Now proceeding essentially as above completes the proof.

Note that the condition of Theorem 1.40 is never satisfied when X ∈ X_K(Ω) is of the form X(ω) = Γ(ω)V since then the value of x_{k,l}(ω) is independent of ω_j, j ≠ k. Therefore, x_{k,l}(ω) cannot be strictly log-convex on Ω. In this case, instead of demanding that at least one entry of X(ω) is strictly log-convex on Ω, we could require that for every ω̂, ω̌ ∈ Ω, there is an entry x_{k,l}(ω(μ)) of X(ω(μ)) that is a strictly log-convex function of μ ∈ (0, 1). Obviously, this requirement is satisfied if γ_k is strictly log-convex on Q_k for each 1 ≤ k ≤ K (see also Theorem 2.13).

1.3.3 Convexity of the Feasibility Set

Definition 1.41 (Feasibility Set). For any X ∈ X_K(Ω), there is an associated set F ⊂ Ω, called the feasibility set, given by

F := { ω ∈ Ω : λ_p(ω) ≤ 1 } .    (1.61)

If there is no ω ∈ Ω such that λ_p(ω) ≤ 1, then F is an empty set. In all that follows, it is assumed that F ≠ ∅, which excludes this trivial case. The importance of the feasibility set for wireless networks will become obvious later in the second part of the book. The reader is also referred to Chap. 2, where we will introduce the notion of a feasibility set under some additional constraints.

Our main concern is the question whether or not the feasibility set is a convex set. As the Perron root is a continuous map from Ω into the set of reals, a sufficient condition for convexity of F immediately follows from Theorem 1.39 if one considers the fact that the geometric mean is bounded above by the arithmetic mean (see App. B.3). Indeed, for all μ ∈ (0, 1), we have
  λp(ω̂)^{1−μ} λp(ω̌)^{μ} ≤ (1−μ)λp(ω̂) + μλp(ω̌) ≤ max{λp(ω̂), λp(ω̌)},   ω̂, ω̌ ∈ Ω.   (1.62)
Thus, by Theorem 1.39, if X(ω) is log-convex on Ω and max{λp(ω̂), λp(ω̌)} ≤ 1, then λp(ω(μ)) ≤ 1 for all μ ∈ (0, 1). In other words, if X(ω) is log-convex on Ω, then

  ω(μ) ∈ F,   ω̂, ω̌ ∈ F

for all μ ∈ (0, 1). This is summarized in a corollary.

Corollary 1.42. If X ∈ LCK(Ω) ⊂ XK(Ω), then F ⊂ Ω is a convex set.

It is worth pointing out that the converse does not hold in general. To see this, consider the following simple example.

Example 1.43. Let Ω = R²_{++}, and let

  X(ω) = [ a ω1  b ω1
           b ω2  a ω2 ],   a ≥ 0, b > 0.

Clearly, X(ω) is irreducible for every ω ∈ R²_{++} but not log-convex on R²_{++}. The Perron root can be easily calculated to yield

  λp(ω) = (1/2) ( a(ω1 + ω2) + ( a²(ω1 − ω2)² + 4 b² ω1 ω2 )^{1/2} ).

Thus, the set of all ω ∈ R²_{++} satisfying λp(ω) = 1 is given by

  ω2 = f(ω1) := (1 − a ω1) / ( (b² − a²)ω1 + a ),   ω1 ∈ (0, 1/a).

Finally, the second derivative of f(x), x ∈ (0, 1/a), yields

  f''(x) = 2 b² (b − a)(a + b) / ( (b² − a²)x + a )³.

It is easy to see that the denominator is positive for all x ∈ (0, 1/a). On the other hand, the sign of the numerator is equal to the sign of b − a. Hence, the feasibility set is not convex when b > a (f(x) convex on (0, 1/a)) and becomes convex when a > b (f(x) concave on (0, 1/a)). If a = b, then f(x) = 1/a − x, x ∈ (0, 1/a), and the boundary is an open line segment.

Sometimes it is desired to know whether F is a strictly convex set in the following sense (we refer for instance to the discussion in Sect. 5.4).

Definition 1.44 (Strictly Convex Feasibility Set). F is said to be strictly convex (or s-convex) if ω(μ) = (1 − μ)ω̂ + μω̌ is interior to F (relative to Ω) for all μ ∈ (0, 1) and ω̂, ω̌ ∈ ∂F, where

  ∂F = {ω ∈ Ω : λp(ω) = 1}.   (1.63)
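The claims in Example 1.43 are easy to verify numerically. The following sketch (an illustration that is not part of the original text; it assumes NumPy and sample values a = 0.5, b = 2) compares the closed-form Perron root with a direct eigenvalue computation and tests the curvature of the boundary curve f by a midpoint check:

```python
import numpy as np

# Perron root of X(omega) = [[a*w1, b*w1], [b*w2, a*w2]] via eigenvalues.
def perron_root(a, b, w1, w2):
    X = np.array([[a * w1, b * w1], [b * w2, a * w2]])
    return np.max(np.abs(np.linalg.eigvals(X)))

# Closed-form expression from Example 1.43.
def closed_form(a, b, w1, w2):
    return 0.5 * (a * (w1 + w2) + np.sqrt(a**2 * (w1 - w2)**2 + 4 * b**2 * w1 * w2))

a, b = 0.5, 2.0                       # sample choice with b > a
w1, w2 = 0.3, 0.7
assert np.isclose(perron_root(a, b, w1, w2), closed_form(a, b, w1, w2))

# Boundary curve omega2 = f(omega1) solving lambda_p = 1.
f = lambda x: (1 - a * x) / ((b**2 - a**2) * x + a)
x1, x2 = 0.2, 1.2                     # both in (0, 1/a) = (0, 2)
mid = f(0.5 * (x1 + x2))
chord = 0.5 * (f(x1) + f(x2))
assert mid < chord                    # b > a: f is convex, F is not convex
```

With a and b exchanged (a > b), the same midpoint check flips, in line with the concavity of f in that case.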
Remark 1.45. For convenience, in what follows, "the boundary of F" always refers to ∂F, even if F has boundary points other than those in ∂F. According to this convention, F is strictly convex if no boundary point of F can be written as a convex combination of two other points of F. We point out that this definition is not established in convex analysis and is only introduced for the purposes of this book.

Although convexity is sufficient for most applications, strict convexity provides additional information about the feasibility region. In particular, if F is strictly convex, then, by definition, there exists ω̃ ∈ F such that λp(ω(μ)) < λp(ω̃) for any ω̂, ω̌ ∈ F. The following corollary is immediate.

Corollary 1.46. Let X ∈ LCK(Ω) ⊂ XK(Ω) with at least one entry being a strictly log-convex function on Ω. Then, F is strictly convex.

As in the case of Theorem 1.40, the condition of the corollary is never met when X ∈ XK,Γ(Ω). However, the set is strictly convex if γk is strictly log-convex on Qk for each 1 ≤ k ≤ K.

1.3.4 Necessary Conditions

Having proved that X ∈ LCK(Ω) is sufficient for λp(ω) to be both log-convex and convex on Ω, we now turn our attention to a converse problem. More precisely, we ask whether X ∈ LCK(Ω) is necessary for λp(ω) to be convex on Ω, regardless of the choice of X ∈ XK(Ω). In doing so, however, we restrict X to be a member of XK,Γ(Ω) defined by (1.57). This is equivalent to saying that X(ω) = Γ(ω)V for some V ∈ XK (see the remark below (1.57)). As a consequence, the problem reduces to finding V ∈ XK such that convexity of λp(ω) implies Γ ∈ LCK(Ω). It is somewhat surprising that if λp(ω) is required to be convex for all X ∈ XK,Γ(Ω) and all K > 1, then Γ(ω) must be log-convex on Ω.

Theorem 1.47. Let γk : Qk → R++ be twice continuously differentiable, 1 ≤ k ≤ K. Suppose that λp(ω) is convex for all X ∈ XK,Γ(Ω) ⊂ XK(Ω) and all K > 1. Then, Γ ∈ LCK(Ω).

Proof.
Let Γ(ω) with γk : Qk → R++ be arbitrary, and let ωk ∈ Qk for k = 2, . . . , K be fixed. We choose V ∈ XK to be

  V = [ 0          0          · · ·  0           1
        1/γ2(ω2)   0          · · ·  0           0
        0          1/γ3(ω3)   · · ·  0           0
        ⋮          ⋮                 ⋮           ⋮
        0          0          · · ·  1/γK(ωK)    0 ]

so that X ∈ XK,Γ(Ω) takes the form
  X(ω) = [ 0  0  · · ·  0  γ(ω1)
           1  0  · · ·  0  0
           0  1  · · ·  0  0
           ⋮  ⋮         ⋮  ⋮
           0  0  · · ·  1  0 ]   (1.64)
where γ ≡ γ1 : Q1 → R++ is twice continuously differentiable. We see that the Perron root of (1.64) is equal to

  f(ω1) = γ(ω1)^{1/K}.   (1.65)
So its second derivative yields

  f''(ω1) = ( (1/K − 1) γ'(ω1)² + γ(ω1) γ''(ω1) ) / ( K γ(ω1)^{2 − 1/K} )

which is nonnegative for all ω1 ∈ Q1 if and only if 0 ≤ (1/K − 1)γ'(ω1)² + γ(ω1)γ''(ω1) for all ω1 ∈ Q1. This, in turn, is true for all K > 1 if and only if γ'(x)² ≤ γ(x)γ''(x) for all x ∈ Q1 or, equivalently, if and only if γ is a log-convex function. Thus, Γ ∈ LCK(Ω) must hold. □
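The construction in the proof can be illustrated numerically. The sketch below (NumPy; the sample choice γ(x) = x³, which is not log-convex, is ours) verifies that the Perron root of the matrix (1.64) equals γ(ω1)^{1/K}, and a midpoint check exposes the resulting failure of convexity:

```python
import numpy as np

K = 5
gamma = lambda x: x ** 3            # sample gamma: positive but NOT log-convex
w1 = 2.0

# Build the matrix (1.64): gamma(omega_1) in the top-right corner,
# ones on the subdiagonal.
X = np.zeros((K, K))
X[0, K - 1] = gamma(w1)
for k in range(1, K):
    X[k, k - 1] = 1.0

rho = np.max(np.abs(np.linalg.eigvals(X)))
assert np.isclose(rho, gamma(w1) ** (1 / K))   # Perron root = gamma(w1)^(1/K)

# Since gamma is not log-convex, lambda_p(omega_1) = gamma(omega_1)^(1/K)
# violates convexity: the midpoint value exceeds the chord.
lam = lambda x: gamma(x) ** (1 / K)
assert lam(2.0) > 0.5 * (lam(1.0) + lam(3.0))
```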
Remark 1.48. Since log γ(x)^{1/K} = (1/K) log γ(x), x ∈ Q, we see that the Perron root (1.65) of (1.64) is log-convex if and only if γ(x)γ''(x) − (γ'(x))² ≥ 0, x ∈ Q. Therefore, given any fixed K > 1, the Perron root λp(ω) is log-convex for all X ∈ XK,Γ(Ω) ⊂ XK(Ω) if and only if Γ ∈ LCK(Ω).

The above theorem asserts that Γ ∈ LCK(Ω) must hold if λp(ω) is required to be convex for all X ∈ XK,Γ(Ω) and all K > 1. Thus, the theorem does not say anything when X(ω) = Γ(ω)V is either fixed or confined to belong to some subset of XK,Γ(Ω). In fact, it is shown in Sect. 1.4 that a less stringent property of Γ(ω) is sufficient for the Perron root to be convex on Ω if V ∈ XK is limited to satisfy either V = Vᵀ (symmetry) or xᵀVx ≥ 0 for all x ∈ R^K (positive semidefiniteness). It is also important to notice that Theorem 1.47 holds even if each function in X(ω) is positive for all ω ∈ Ω, i.e., if X ∈ XK,Γ(Ω) ⊂ MK(Ω). To see this, define a positive matrix X^(ε)(ω), ω ∈ Ω, as follows:

  X^(ε)(ω) = X(ω) + ε 11ᵀ,   ε > 0

where X(ω) is given by (1.64). Furthermore, let λp(ε, ω) = ρ(X^(ε)(ω)) and suppose that γ : R → R++ in (1.64) is chosen such that the Perron root is convex for any positive matrix. Thus, in the special case of λp(ε, ω) with ε > 0, we have
  λp(ε, ω(μ)) ≤ (1 − μ)λp(ε, ω̂) + μλp(ε, ω̌),   ω̂, ω̌ ∈ Ω
for all μ ∈ (0, 1) and ε > 0. So, by continuity of λp(ε, ω) with respect to ε > 0 (Theorem A.8), one obtains

  λp(ω(μ)) = lim_{ε→0} λp(ε, ω(μ))
           ≤ (1 − μ) lim_{ε→0} λp(ε, ω̂) + μ lim_{ε→0} λp(ε, ω̌) = (1 − μ)λp(ω̂) + μλp(ω̌)

for all μ ∈ (0, 1), where λp(ω) is given by (1.65). Consequently, since λp(ε, ω) is convex for all ε > 0 (by assumption), so also is λp(ω). However, by the proof of Theorem 1.47, λp(ω) given by (1.65) is convex for all K > 1 if and only if γ is log-convex. We summarize this in an observation.

Observation 1.49. If λp(ω) is convex on Ω for all X ∈ XK,Γ(Ω) ⊂ MK(Ω) and all K > 1, then Γ ∈ LCK(Ω).

A remarkable fact about these results is that although λp(ω) is only required to be convex, we arrive at log-convexity of Γ(ω), which is significantly stronger than convexity. Combining Theorem 1.39 and Theorem 1.47 shows that the following statements are equivalent (if γ is twice continuously differentiable):

(i) λp(ω) is convex for all K > 1 and all X ∈ XK,Γ(Ω).
(ii) λp(ω) is log-convex for all K > 1 and all X ∈ XK,Γ(Ω).
1.4 Special Classes of Matrices

We continue the analysis with X(ω) of the form X(ω) = Γ(ω)V for some V ∈ XK. For simplicity, it is assumed that

  γ(x) := γ1(x) = · · · = γK(x),   x ∈ Q

where γ : Q → R++ is a twice continuously differentiable and bijective function. Hence, throughout this section, Ω = Q^K. It is emphasized, however, that this assumption does not impact the generality of the analysis. As before, we use F and λp(ω) to denote the feasibility set and the Perron root of Γ(ω)V for some ω ∈ Ω, respectively.

Obviously, if γ : Q → R++ is log-convex, then X ∈ LCK(Ω). Consequently, by Theorem 1.39, if γ(x) is log-convex, the Perron root λp(ω) is a log-convex function of the parameter vector ω. By Theorem 1.47, it can be inferred that log-convexity of γ is necessary when the Perron root is required to be convex on Ω = Q^K for all V ∈ XK and all K > 1. In this section, we put some restrictions on V ∈ XK. In particular, it is shown that the log-convexity requirement can be relaxed to a less stringent one when the matrix V ∈ XK is confined to be either symmetric or positive semidefinite.
1.4.1 Symmetric Matrices

In this section, we assume that V is symmetric (see App. A.3.2 for a definition). The following theorem provides a necessary and sufficient condition for the Perron root to be convex on Ω = Q^K for all K > 1 and all X ∈ XsK,Γ(Ω), where

  XsK,Γ(Ω) := {Γ(ω)V, ω ∈ Ω : V ∈ XK, V = Vᵀ}.

Theorem 1.50. Let fγ : Q² → R++ be given by fγ(x, y) = (γ(x)γ(y))^{1/2}. Then, the Perron root λp(ω) is convex on Ω = Q^K for all X ∈ XsK,Γ(Ω) and all K > 1 if and only if fγ is convex on Q².

Proof. The necessity is easily verified by considering K = 2 and X(ω) = Γ(ω)V with

  V = [ 0  1
        1  0 ] ∈ XK.

In this case, we have λp(ω) = (γ(ω1)γ(ω2))^{1/2} = fγ(ω1, ω2), from which the necessary condition immediately follows. To prove the converse, let X ∈ XsK,Γ(Ω) be arbitrary and note that, due to the symmetry,

  λp(ω(μ)) = ρ(W(μ)),   ω̂, ω̌ ∈ Q^K, μ ∈ [0, 1]

where W(μ) = (w_{k,l}(μ)) := Γ^{1/2}(ω(μ)) V Γ^{1/2}(ω(μ)). The entries of W(μ) are

  w_{k,l}(μ) = γ(ωk(μ))^{1/2} v_{k,l} γ(ωl(μ))^{1/2} = fγ(ωk(μ), ωl(μ)) v_{k,l}.

Hence, by the convexity of fγ,

  w_{k,l}(μ) ≤ (1 − μ) fγ(ω̂k, ω̂l) v_{k,l} + μ fγ(ω̌k, ω̌l) v_{k,l}

for all μ ∈ (0, 1). Now, due to the monotonicity of the Perron root (Theorem A.25), one obtains

  λp(ω(μ)) ≤ ρ( (1 − μ)Γ^{1/2}(ω̂)VΓ^{1/2}(ω̂) + μΓ^{1/2}(ω̌)VΓ^{1/2}(ω̌) )
           ≤ (1 − μ)ρ(Γ^{1/2}(ω̂)VΓ^{1/2}(ω̂)) + μρ(Γ^{1/2}(ω̌)VΓ^{1/2}(ω̌))
           = (1 − μ)λp(ω̂) + μλp(ω̌)

where the second inequality follows from the fact that the spectral radius is convex on the set of symmetric matrices (Theorem A.20). □
1.4.2 Symmetric Positive Semidefinite Matrices

Now let us assume that V is confined to be a symmetric positive semidefinite matrix (Definition A.21). Hence, X ∈ XpK,Γ(Ω) where Ω = Q^K and

  XpK,Γ(Ω) := {Γ(ω)V, ω ∈ Ω : V ∈ XK, V = Vᵀ, xᵀVx ≥ 0 for all x ∈ R^K}.

Note that V ∈ XK is positive semidefinite if, roughly speaking, its diagonal entries are large enough when compared with the off-diagonal entries. It turns out that λp(ω) is convex on Ω if X ∈ XpK,Γ(Ω) and γ : Q → R++ is a convex function.

Theorem 1.51. Suppose that X ∈ XpK,Γ(Ω) such that γ : Q → R++ is any convex function.⁴ Then, λp(ω) is convex on Ω = Q^K.

Proof. Let X(ω) = Γ(ω)V ∈ XpK,Γ(Ω) be arbitrary. Thus, as V ∈ XK is symmetric positive semidefinite, we can write V = AAᵀ with A = UΛ^{1/2}, where U is orthogonal (Definition A.17) and Λ is a real diagonal matrix of the eigenvalues of V. Furthermore,

  λp(ω) = λmax(Γ(ω)^{1/2} V Γ(ω)^{1/2}) = λmax(Γ(ω)^{1/2} AAᵀ Γ(ω)^{1/2})
        = λmax(Aᵀ Γ(ω)^{1/2} Γ(ω)^{1/2} A)   (1.66)

where the largest eigenvalue λmax(Γ(ω)^{1/2} V Γ(ω)^{1/2}) of Γ(ω)^{1/2} V Γ(ω)^{1/2} is equal to the induced squared matrix 2-norm of Γ(ω)^{1/2} A (see the definition of induced matrix norms in App. A.2). Therefore,

  λp(ω) = max_{‖v‖₂=1} vᵀ Aᵀ Γ(ω)^{1/2} Γ(ω)^{1/2} A v = max_{‖v‖₂=1} ‖Γ(ω)^{1/2} A v‖₂².

The kth element of Γ(ω)^{1/2}Av is equal to (Γ(ω)^{1/2}Av)k = γ(ωk)^{1/2} Σ_{l} a_{k,l} vl. So

  ‖Γ(ω)^{1/2} A v‖₂² = Σ_{k∈K} γ(ωk) ( Σ_{l∈K} a_{k,l} vl )².

Now considering both the convexity of γ and the fact that all sum terms in the above equation are nonnegative yields

  λp(ω(μ)) = max_{‖v‖₂=1} ‖Γ(ω(μ))^{1/2} A v‖₂²
           ≤ max_{‖v‖₂=1} ( (1 − μ)‖Γ(ω̂)^{1/2} A v‖₂² + μ‖Γ(ω̌)^{1/2} A v‖₂² )
           ≤ (1 − μ) max_{‖v‖₂=1} ‖Γ(ω̂)^{1/2} A v‖₂² + μ max_{‖v‖₂=1} ‖Γ(ω̌)^{1/2} A v‖₂²
           = (1 − μ)λp(ω̂) + μλp(ω̌),   ω̂, ω̌ ∈ Q^K

⁴ Note that any convex function defined on an open set is continuous.
for all μ ∈ (0, 1), where the identities

  λp(ω̂) = max_{‖v‖₂=1} ‖Γ(ω̂)^{1/2} A v‖₂²,   λp(ω̌) = max_{‖v‖₂=1} ‖Γ(ω̌)^{1/2} A v‖₂²

were used in the last step. □

An immediate consequence of the theorem is the following.

Corollary 1.52. If X ∈ XpK,Γ(Ω) and γ : Q → R++ is any convex function, then F is a convex set.

Interestingly, the feasibility set can be written as the intersection of certain (in general) nonconvex sets. To see this, note that, for any X ∈ XpK,Γ(Ω) with Ω = Q^K, one has

  F = {ω ∈ Q^K : λmax(Γ^{1/2}(ω)VΓ^{1/2}(ω)) ≤ 1}.

Thus, since λmax(Γ^{1/2}(ω)VΓ^{1/2}(ω)) ≤ 1 if and only if λmin(Γ⁻¹(ω) − V) ≥ 0 or, equivalently, if and only if

  0 ≤ zᵀ(Γ⁻¹(ω) − V)z   (1.67)

for all z ∈ R^K, we can write F = ⋂_{z∈R^K} M(z), where

  M(z) := {ω ∈ Q^K : zᵀ(Γ⁻¹(ω) − V)z ≥ 0}.

Hence, given any symmetric positive semidefinite matrix V ∈ XK, the feasibility set (associated with Γ(ω)V) is equal to the intersection of the sets M(z) with respect to all z ∈ R^K. It is interesting to see that although M(z) is not convex in general, the intersection of these sets is a convex set, provided that γ is a convex function. This is illustrated in Fig. 1.1 for γ(x) = x, x > 0, in which case the complement of M(z) in Q^K, denoted by Mᶜ(z), is a convex set. This immediately follows from (1.67), whose right-hand side can be written as

  zᵀΓ⁻¹(ω)z − zᵀVz = Σ_{k∈K} |zk|²/γ(ωk) − zᵀVz.

Clearly, this function is convex if γ(x) = x, x > 0. Thus Mᶜ(z), as the sublevel set of this function with respect to the zero value, must be a convex set for any fixed z ∈ R^K. The linear case γ(x) = x, x > 0, is of great practical interest, and hence is separately considered in the next section.
1.5 The Perron Root under the Linear Mapping

In this section, we further proceed with matrix-valued functions X of the form X(ω) = Γ(ω)V where Γ(ω) = diag(γ(ω1), . . . , γ(ωK)). However, in contrast
Fig. 1.1: The feasibility set F for some X ∈ XpK,Γ(Ω) with γ(x) = x, x > 0, K = 2 and Ω = Q². (The figure shows, in the (ω1, ω2)-plane, the curve zᵀ(Γ⁻¹(ω) − V)z = 0 bounding the convex complement Mᶜ(z).)
to the previous analysis, it is assumed that trace(V) = 0. Formally, this is written as

  X ∈ X0K,Γ(Ω) := {Γ(ω)V, ω ∈ Ω : V ∈ XK, trace(V) = 0}.

So, in particular, note that V cannot be positive semidefinite. Under this assumption, we consider the important special case of the linear function γ(x) = x, x > 0. Therefore, in this section, Ω = Q^K = R^K_{++} and

  Γ(ω) = diag(ω1, . . . , ωK).

Obviously, the linear function is not log-convex, so that Theorem 1.39 does not apply in this case. In fact, if K = 2 and

  V = [ 0  1
        1  0 ],

the Perron root of X(ω) = diag(ω)V can be easily found to be λp(ω) = (ω1 ω2)^{1/2}. Consequently, instead of being convex, λp(ω) turns out to be concave on R²_{++} for all X ∈ X02,Γ(Ω). This observation might lead one to think that λp(ω) is concave in general, that is, for all X ∈ X0K,Γ(Ω) and all K > 1. Further results that support this conjecture have been proved in Sect. 1.2.1, where it is shown that the Perron root is concave on some subsets of irreducible matrices. Observe that if λp(ω) were concave on R^K_{++}, then not F but its complement in R^K_{++},

  Fᶜ = Q^K \ F = R^K_{++} \ F,   (1.68)

would be a convex set. If true, this result would have an interesting consequence for optimal link scheduling policies in wireless networks (see Sect. 5.4.3). In Sect. 1.5.2, however, we will disprove the conjecture by showing that if γ(x) = x, x > 0, there exist K > 3 and X ∈ XK,Γ(Ω) such that λp(ω) is not concave. First, though, we will prove two conditions on the feasibility of
the parameter vector ω ∈ R^K_{++}. These results provide insight into the mutual dependence between distinct entries of ω ∈ F.

1.5.1 Some Bounds

We exploit the bound in (1.10) to prove a subset and a superset of F. The following theorem provides a sufficient condition for ω to be a member of F.

Theorem 1.53. If

  f(ω) = Σ_{k∈K} ωk ρ(V) / (1 + ωk ρ(V)) ≤ 1   (1.69)

then ω ∈ F.

Proof. Let Y(ω) = (y_{k,l}) be given by y_{k,l} = ωk(1 − δ_{k−l}), where δi denotes the Kronecker delta. Note that Y(ω) is irreducible for all ω ∈ R^K_{++}. Let p ∈ ΠK be the right Perron eigenvector of Y(ω). From ‖p‖₁ = 1 and

  ρ(Y(ω)) pk = (Y(ω)p)k = ωk Σ_{l≠k} pl = ωk ( Σ_{l∈K} pl − pk ) = ωk (1 − pk)

we have pk = ωk / (ρ(Y(ω)) + ωk), 1 ≤ k ≤ K. Hence,

  Σ_{k∈K} pk = Σ_{k∈K} ωk / (ρ(Y(ω)) + ωk) = 1.   (1.70)

So, by Corollary 1.5 and Γ(ω)V = diag(ω)V = Y(ω) ∘ V,

  λp(ω) = ρ(diag(ω)V) = ρ(Y(ω) ∘ V) ≤ ρ(Y(ω)) ρ(V).

Combining this with (1.70) yields

  1 = Σ_{k∈K} ωk / (ρ(Y(ω)) + ωk) ≤ Σ_{k∈K} ωk / ( λp(ω)/ρ(V) + ωk ) = Σ_{k∈K} ωk ρ(V) / (λp(ω) + ωk ρ(V)).

Thus, if (1.69) holds, then

  Σ_{k∈K} ωk ρ(V) / (1 + ωk ρ(V)) ≤ 1 ≤ Σ_{k∈K} ωk ρ(V) / (λp(ω) + ωk ρ(V))

or, equivalently, λp(ω) ≤ 1. □

The function f(ω) in (1.69) defines a set Fin ⊆ F given by

  Fin := {ω ∈ R^K_{++} : f(ω) ≤ 1}.

It may be easily verified that for any V ∈ XK, f(ω) is strictly concave. Consequently, Fᶜin = R^K_{++} \ Fin is a convex set. Now we use (1.10) to prove a necessary condition for the feasibility of ω ∈ Q^K.
Theorem 1.54. If ω ∈ F, then

  1 ≤ g(ω) = Σ_{k∈K} 1 / (1 + ρ(V)ωk).   (1.71)

Proof. Let Y(ω) be as in the preceding proof, and let 1/ω be defined as 1/ω = (1/ω1, . . . , 1/ωK) > 0. Since Γ⁻¹(ω)V = (diag(ω))⁻¹V = V ∘ Y(1/ω),

  ρ(V) = ρ( diag(ω) (V ∘ Y(1/ω)) ).

Thus, by Corollary 1.5 and ρ(diag(ω)V) = λp(ω),

  log ρ(V) ≤ log ρ(Y(1/ω)) + log λp(ω).

Since ω ∈ F or, equivalently, λp(ω) ≤ 1, this implies that

  ρ(V) ≤ ρ(Y(1/ω)).   (1.72)

Now note that (Y(1/ω))_{k,l} = (1/ωk)(1 − δ_{k−l}). Hence, proceeding essentially as in the foregoing proof yields

  p̂k = 1 / (1 + ωk ρ(Y(1/ω)))

where p̂ ∈ ΠK is the right Perron eigenvector of Y(1/ω). Thus,

  1 = Σ_{k∈K} p̂k = Σ_{k∈K} 1 / (1 + ωk ρ(Y(1/ω)))
and the theorem follows from (1.72). □

The function g(ω) in (1.71) defines a superset Fout of F given by

  Fout := {ω ∈ R^K_{++} : 1 ≤ g(ω)}.

For any fixed V ∈ XK, the function g(ω) in (1.71) is strictly convex, so that Fᶜout = R^K_{++} \ Fout is a convex set. Summarizing, we can state that

  Fin ⊆ F ⊆ Fout   and   Fᶜout ⊆ Fᶜ ⊆ Fᶜin

where both Fᶜin and Fᶜout are convex sets. Thus, Fᶜ is embedded between two convex sets. Furthermore, if K = 2, the implicit function g(ω) = 1 is given by ρ(V) = 1/(ω1 ω2)^{1/2}. Consequently, in the two-dimensional case, Fᶜ = Fᶜout is a convex set.

Finally, we use Theorem 1.2 as a starting point to prove a necessary condition on ω ∈ F. To this end, let Â ∈ SK(V) be a stochastic matrix so that

  log ρ(V) = Σ_{k,l∈K} ûk â_{k,l} log( v_{k,l} / â_{k,l} )   (1.73)

where û = (û1, . . . , ûK) ∈ ΠK is the left Perron eigenvector of Â. First we consider the following lemma.
Lemma 1.55. Suppose that V is irreducible, and let Â and û be as above. Then, we have ûk = yk · xk, 1 ≤ k ≤ K, where y and x with yᵀx = 1 are positive left and right eigenvectors of V, respectively.

Proof. By Theorem 1.2, Â is given by

  â_{k,l} = v_{k,l} xl / (ρ(V) xk).

By definition, we have Âᵀû = û and x > 0. Combining this with the above equation yields

  ûk = Σ_{l∈K} â_{l,k} ûl = (1/ρ(V)) Σ_{l∈K} v_{l,k} (xk/xl) ûl,   1 ≤ k ≤ K

or, equivalently,

  ρ(V) · (ûk/xk) = Σ_{l∈K} v_{l,k} (ûl/xl),   1 ≤ k ≤ K.

Hence, the left eigenvector is of the form yk = ûk/xk. Since Σ_{k} ûk = 1, we have yᵀx = 1. □
Now we are in a position to prove the announced necessary condition. To keep the result as general as possible, we allow γ : Q → R++ to be any continuous function.

Theorem 1.56. Let V ∈ XK. If ω ∈ F, then we must have

  Π_{k=1}^{K} γ(ωk)^{xk yk} ≤ 1/ρ(V)   (1.74)

where, as in Lemma 1.55, y and x are positive left and right eigenvectors of V, respectively.

Proof. Let Â be defined by (1.73). Substituting û and Â into (1.4) yields

  log λp(ω) = log ρ(Γ(ω)V) ≥ Σ_{k,l∈K} ûk â_{k,l} log( v_{k,l} / â_{k,l} ) + Σ_{k,l∈K} ûk â_{k,l} log γ(ωk).

By (1.73), the first term on the right-hand side is equal to log ρ(V), so that

  log λp(ω) ≥ log ρ(V) + Σ_{k,l∈K} ûk â_{k,l} log γ(ωk) = log ρ(V) + Σ_{k∈K} ûk log γ(ωk)
           = log ρ(V) + log Π_{k=1}^{K} γ(ωk)^{ûk}

where we used the fact that Â is stochastic. Hence, by Lemma 1.55,

  Π_{k=1}^{K} γ(ωk)^{xk yk} ≤ λp(ω)/ρ(V).

But, if ω ∈ F, then λp(ω) ≤ 1, and the theorem follows. □

Obviously, if γ(x) = x, x > 0, the bound reduces to Π_{k=1}^{K} ωk^{xk yk} ≤ 1/ρ(V).
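The intermediate bound in the proof of Theorem 1.56, Π_k γ(ωk)^{xk yk} ≤ λp(ω)/ρ(V), holds for every ω > 0 and can be checked numerically (NumPy sketch with γ(x) = x; the random traceless V is our sample choice):

```python
import numpy as np

rng = np.random.default_rng(3)

K = 4
V = rng.random((K, K))
np.fill_diagonal(V, 0.0)

# Right and left Perron eigenvectors of V, normalized so that y^T x = 1.
eigvals, eigvecs = np.linalg.eig(V)
i = np.argmax(eigvals.real)
x = np.abs(eigvecs[:, i].real)               # right Perron eigenvector
eigvalsT, eigvecsT = np.linalg.eig(V.T)
j = np.argmax(eigvalsT.real)
y = np.abs(eigvecsT[:, j].real)              # left Perron eigenvector
y = y / (y @ x)                              # normalization y^T x = 1
rho_V = eigvals.real[i]

for _ in range(200):
    w = rng.uniform(0.1, 2.0, size=K)
    lam = np.max(np.abs(np.linalg.eigvals(np.diag(w) @ V)))
    # Weighted geometric mean bounded by lambda_p(omega)/rho(V).
    assert np.prod(w ** (x * y)) <= lam / rho_V + 1e-10
```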
1.5.2 Disproof of the Conjecture

Now we disprove the conjecture stated at the beginning of this section. More precisely, it is shown that there exist X ∈ X0K,Γ(Ω) with Ω = R^K_{++}, K > 1, and γ(x) = x, x > 0, and ω̂, ω̌ ∈ Fᶜ with ω̂ ≠ ω̌ such that ω(μ) = (1 − μ)ω̂ + μω̌ ∉ Fᶜ for some μ ∈ (0, 1).

First suppose that the conjecture is true, that is, Fᶜ is a convex set. This is equivalent to saying that

  λp(ω(μ)) ≥ 1   (1.75)

for all μ ∈ (0, 1) and all ω̂, ω̌ ∈ R^K_{++} with

  λp(ω̂) = λp(ω̌) = 1.   (1.76)

In words, if both ω̂ and ω̌ lie on the boundary ∂F of F (Definition 1.44 and the remark below it), then the entire straight line connecting them must either be outside of the feasibility set or lie entirely on ∂F. First we provide a necessary and sufficient condition for (1.75) with (1.76) to be satisfied. Note that in the following lemma, X is not necessarily a member of X0K,Γ(Ω).

Lemma 1.57. Let γ(x) = x, x > 0, and let X ∈ XK,Γ(Ω) be arbitrary. Then, we have (1.75) with (1.76) if and only if λp(ω) is concave on Ω = R^K_{++}, i.e., if and only if

  λp(ω(μ)) ≥ (1 − μ)λp(ω̂) + μλp(ω̌)

for all μ ∈ (0, 1) and ω̂, ω̌ ∈ R^K_{++}.

Proof. If the spectral radius is concave, then (1.75) with (1.76) immediately follows. So, we only need to prove the converse. To this end, let ω̂, ω̌ ∈ R^K_{++} be arbitrary, and let ŝ, š ∈ R^K_{++} be defined as

  diag(ŝ)V = (1/λp(ω̂)) diag(ω̂)V,   diag(š)V = (1/λp(ω̌)) diag(ω̌)V.

Consequently, both λp(ŝ) = ρ(diag(ŝ)V) and λp(š) = ρ(diag(š)V) satisfy (1.76). Let a, b > 0 be chosen such that 0 < 1 − μ = a/(a + b) < 1 and 0 < μ = b/(a + b) < 1 for μ ∈ (0, 1). Now substituting s(μ) = (1 − μ)ŝ + μš with this choice of μ into the left-hand side of (1.75) yields

  λp(aŝ + bš) ≥ a + b   (1.77)

where we used the fact that diag(aŝ + bš)V = a diag(ŝ)V + b diag(š)V and λp(c x) = c λp(x) for any c > 0 and x ∈ R^K_{++}. Notice that (1.77) holds for all a, b > 0. Now define ã, b̃ > 0 via a = ã λp(ω̂) and b = b̃ λp(ω̌). Combining this with (1.77) and λp(ω) = ρ(diag(ω)V) gives

  λp( ã λp(ω̂) ŝ + b̃ λp(ω̌) š ) = ρ( ã λp(ω̂) diag(ŝ)V + b̃ λp(ω̌) diag(š)V )
    = ρ( ã diag(ω̂)V + b̃ diag(ω̌)V ) = λp( ã ω̂ + b̃ ω̌ ) ≥ a + b = ã λp(ω̂) + b̃ λp(ω̌)

for all ã, b̃ > 0. In particular, this must hold for ã = 1 − α and b̃ = α with α ∈ (0, 1), in which case concavity of the spectral radius follows. □

It is worth pointing out that if

  λp(ω(μ)) ≥ min{λp(ω̂), λp(ω̌)}

for all μ ∈ (0, 1) and all ω̂, ω̌ ∈ R^K_{++}, then we have (1.75) with (1.76). Thus, by Lemma 1.57 and the fact that every concave function is quasiconcave (for the definition of quasiconcave functions, the reader is referred to [16]), we obtain the following corollary.

Corollary 1.58. Let γ(x) = x, x > 0. Then, the following statements are equivalent.

(a) Fᶜ = R^K_{++} \ F is a convex set for all X ∈ XK,Γ(Ω) and all K > 1.
(b) The Perron root is concave on R^K_{++}.
(c) The Perron root is quasiconcave on R^K_{++}.

Now we prove that if γ(x) = x, x > 0, then the Perron root is not concave in general, thereby disproving the conjecture.

Lemma 1.59. Let γ(x) = x, x > 0. Then, there exist K > 1, X ∈ X0K,Γ(Ω) and ω̂, ω̌ ∈ Ω = R^K_{++} such that

  λp(ω(μ)) < (1 − μ)λp(ω̂) + μλp(ω̌)   (1.78)

for some μ ∈ (0, 1).
Proof. For an arbitrary c > 0, let

  Vc = [ 0  c
         c  0 ] ≥ 0.

Furthermore, define

  V = [ Vc  0
        Ṽ   Vc ] ∈ R^{4×4}_+

where Ṽ ∈ R^{2×2}_+ is an arbitrary nonnegative matrix and 0 ∈ R^{2×2}_+ denotes the zero matrix. Note that trace(V) = 0. Furthermore, let ω̂ = (2, 2, 1, 1) and ω̌ = (1, 1, 2, 2), from which we obtain

  λp(ω̂) = ρ(diag(ω̂)V) = max{2ρ(Vc), ρ(Vc)} = 2c
  λp(ω̌) = ρ(diag(ω̌)V) = max{ρ(Vc), 2ρ(Vc)} = 2c.

Thus, λp(ω(μ)) = ρ(diag(ω(μ))V) yields

  λp(ω(μ)) = max{ (2(1 − μ) + μ)ρ(Vc), ((1 − μ) + 2μ)ρ(Vc) } = c · max{2 − μ, 1 + μ}
           < 2c = (1 − μ)λp(ω̂) + μλp(ω̌),   μ ∈ (0, 1).   (1.79)

Note that since V defined above is reducible, so also is diag(ω)V. Thus, it remains to show that there exists X ∈ X04,Γ(R⁴_{++}) for which (1.78) holds. To this end, suppose that Δ ∈ R^{4×4}_+ is given by (Δ)_{k,l} = 1 − δ_{k−l}, where δi is the Kronecker delta. Let X^(ε)(ω) = diag(ω)V^(ε) where V^(ε) = V + εΔ, ε > 0. Obviously, X^(ε)(ω) with trace(X^(ε)(ω)) = 0 is irreducible for all ω > 0 and ε > 0. Let λp(ε, ω) be the Perron root of X^(ε)(ω) and note that

  lim_{ε→0} λp(ε, ω(μ)) = λp(ω(μ))

for any fixed μ ∈ (0, 1). Thus, since the Perron root is continuous in ε > 0, it follows from (1.79) that there must exist ε > 0 (and hence an irreducible matrix V^(ε)) and μ ∈ (0, 1) such that

  λp(ε, ω(μ)) < (1 − μ)λp(ε, ω̂) + μλp(ε, ω̌).

This completes the proof. □

Note that the construction of a counterexample in the proof of Lemma 1.59 requires two traceless irreducible matrices of order at least 2, so the proof does not work for K < 4. Also note that since ρ(X) = lim_{n→∞} ‖Xⁿ‖^{1/n} for any matrix norm (Theorem A.15) and (X + Y)ⁿ ≥ Xⁿ + Yⁿ with n ≥ 1 for any X, Y ∈ NK, we actually have

  ρ(X + Y) ≥ max{ρ(X), ρ(Y)}.

In the special case when X = (1 − μ)A and Y = μB for some A, B ∈ XK, one obtains

  ρ((1 − μ)A + μB) ≥ max{(1 − μ)ρ(A), μρ(B)}.

The counterexample in the proof is constructed in such a way that this lower bound is attained for all μ ∈ [0, 1]. We complete this section by summarizing Lemma 1.57 and Lemma 1.59 in a theorem.

Theorem 1.60. Suppose that γ(x) = x, x > 0. Then, Fᶜ is not a convex set in general, i.e., there exist K > 1, X ∈ X0K,Γ(Ω) and ω̂, ω̌ ∈ Fᶜ such that ω(μ) = (1 − μ)ω̂ + μω̌ ∉ Fᶜ for some μ ∈ (0, 1).
1.6 The Perron Root under Exponential Mapping

As in the previous section, the matrix-valued function X is of the form X(ω) = Γ(ω)V, where Γ(ω) = diag(γ(ω1), . . . , γ(ωK)) and V is an arbitrary nonnegative irreducible matrix whose trace is not necessarily equal to zero. Thus, throughout this section,

(C.1-2) X(ω) = Γ(ω)V with V ∈ XK and trace(V) ≥ 0.

Furthermore, we assume that γ is the exponential function: γ(x) = eˣ, x ∈ R. Under these assumptions, we are going to prove a condition on V (in addition to irreducibility) that is necessary and sufficient for the feasibility set F to be strictly convex (s-convex) in the sense of Definition 1.44. The notion of strict convexity is related to the existence and uniqueness of a so-called log-SIR fair power allocation (see Sect. 5.9.2). Note that Corollary 1.46 does not apply to this specific case, as the exponential function is log-convex but not strictly log-convex. In fact, if V is of the form given by (1.16), then it may be easily verified that the feasibility set F is not s-convex if γ(x) = eˣ, x ∈ R.

1.6.1 A Necessary and Sufficient Condition on Strict Convexity of the Feasibility Set

Lemma 1.61. The feasibility set F is strictly convex if and only if, for any p̂, p̌ > 0 with p̂ ≠ c p̌ for all c > 0, there exists at least one index k ∈ K = {1, . . . , K} such that
  Σ_{l∈K} v_{k,l} p̂l^{1/2} p̌l^{1/2} < ( Σ_{l∈K} v_{k,l} p̂l )^{1/2} ( Σ_{l∈K} v_{k,l} p̌l )^{1/2}.   (1.80)
Proof. Let p̂, p̌ > 0 be arbitrary with p̂ ≠ c p̌ for all c > 0, and, for each k ∈ K, define ω̂k and ω̌k by

  e^{ω̂k} = γ̂k = p̂k / Σ_{l∈K} v_{k,l} p̂l,   e^{ω̌k} = γ̌k = p̌k / Σ_{l∈K} v_{k,l} p̌l.

By construction, we have ω̂ ∈ ∂F and ω̌ ∈ ∂F or, equivalently,

  λp(ω̂) = λp(ω̌) = 1.   (1.81)

First suppose that F is strictly convex. Then, by definition, we have λp(ω(μ)) < 1, ω(μ) = (1 − μ)ω̂ + μω̌, for all μ ∈ (0, 1). Our goal is to show that (1.80) holds for some k ∈ K. The proof is by contradiction. Hence, assume that, for all k ∈ K,

  Σ_{l∈K} v_{k,l} p̂l^{1/2} p̌l^{1/2} = ( Σ_{l∈K} v_{k,l} p̂l )^{1/2} ( Σ_{l∈K} v_{k,l} p̌l )^{1/2}.
Due to (1.81) and the fact that e^{ωk(μ)} = (e^{ω̂k})^{1−μ} (e^{ω̌k})^{μ} for all μ ∈ [0, 1], the above equality implies that, for each k ∈ K,

  e^{ωk(1/2)} ( Σ_{l∈K} v_{k,l} p̂l^{1/2} p̌l^{1/2} ) / ( p̂k^{1/2} p̌k^{1/2} )
    = ( e^{ω̂k} Σ_{l∈K} v_{k,l} p̂l / p̂k )^{1/2} ( e^{ω̌k} Σ_{l∈K} v_{k,l} p̌l / p̌k )^{1/2} = 1.

In words, p̃ with p̃k = p̂k^{1/2} p̌k^{1/2}, k ∈ K, is an eigenvector of Γ(ω(1/2))V. Thus, since p̃ > 0 and V ≥ 0 is irreducible, it follows from the Perron–Frobenius theorem (Theorem A.32 in App. A.4.1) that λp(ω(1/2)) = 1, which contradicts strict convexity of F. As a consequence, (1.80) must hold for at least one index k ∈ K.

To prove the converse, let ω̂, ω̌ with ω̂ ≠ ω̌ be such that λp(ω̂) = λp(ω̌) = 1. Moreover, suppose that p̂ and p̌ are positive right eigenvectors of Γ(ω̂)V and Γ(ω̌)V, respectively. Note that due to irreducibility of V, the vectors p̂ > 0 and p̌ > 0 are unique up to positive scaling, and thus, as the exponential function is strictly increasing, ω̂ ≠ ω̌ implies p̂ ≠ c p̌ for all c > 0. Using these definitions, it follows from (1.80) that, for all k ∈ K,

  e^{ωk(1/2)} ( Σ_{l∈K} v_{k,l} p̂l^{1/2} p̌l^{1/2} ) / ( p̂k^{1/2} p̌k^{1/2} ) ≤ 1   (1.82)

with at least one strict inequality for some k ∈ K. So, if p̃ with p̃k = p̂k^{1/2} p̌k^{1/2}, k ∈ K, is an eigenvector of Γ(ω(1/2))V, then λp(ω(1/2)) < 1 due to the strict inequality in (1.82) for some k ∈ K. If p̃ > 0 is not an eigenvector of Γ(ω(1/2))V, then the Collatz–Wielandt formula (Theorem A.35) and (1.82) imply that

  λp(ω(1/2)) = min_{p>0} max_{1≤k≤K} e^{ωk(1/2)} (Vp)k / pk < max_{1≤k≤K} e^{ωk(1/2)} (Vp̃)k / p̃k ≤ 1

where the strict inequality follows as the right Perron eigenvector and its positive multiples are the unique minimizers. By Corollary 1.42, however, the spectral radius is a convex function of μ ∈ (0, 1), as the exponential function is log-convex. Thus, λp(ω(1/2)) < 1 implies that λp(ω(μ)) < 1 for all μ ∈ (0, 1). Since ω̂, ω̌ ∈ ∂F have been assumed to be arbitrary, we can conclude that F is a strictly convex set. □

It is important to emphasize that a crucial ingredient in the proof of the lemma is the fact that the exponential function is not strictly log-convex. Indeed, by the proof of the lemma, Condition (1.80) would cease to be necessary for the feasible log-SIR region to be strictly convex if the exponential function were strictly log-convex.

An immediate consequence of the lemma is that if F is not strictly convex, then there must be two vectors p̂ > 0 and p̌ > 0 with p̂ ≠ c p̌ for all c > 0 such that, for all k ∈ K,
  Σ_{l∈K} v_{k,l} p̂l^{1/2} p̌l^{1/2} = ( Σ_{l∈K} v_{k,l} p̂l )^{1/2} ( Σ_{l∈K} v_{k,l} p̌l )^{1/2}.
In other words, if F is not strictly convex, there must exist ω̂ ∈ ∂F and ω̌ ∈ ∂F with ω̂ ≠ ω̌ such that, if p̂ and p̌ are positive eigenvectors of Γ(ω̂)V and Γ(ω̌)V, respectively, then p̃ with p̃k = (p̂k p̌k)^{1/2}, k ∈ K, is a positive eigenvector of Γ(ω(1/2))V and λp(ω(1/2)) = 1.

Before stating the main result in this section, we still need to prove another lemma.

Lemma 1.62. Given k ∈ K, let L(k) := {l ∈ K : v_{k,l} > 0}, and suppose that there exist positive vectors p̂, p̌ such that
1
1
vk,l (ˆ pl ) 2 (ˇ pl ) 2 =
l∈K
vk,l pˆl
12
vk,l pˇl
12
(1.83)
l∈K
for some k ∈ K. Then, p̂l = c^{(k)} p̌l for all l ∈ L(k). Moreover, if (1.83) holds for some k1, k2 ∈ K with (VVᵀ)_{k1,k2} > 0, then there are c^{(k1)} > 0 and c^{(k2)} > 0 such that p̂l = c^{(k1)} p̌l and p̂l = c^{(k2)} p̌l for all l ∈ L(k1) and l ∈ L(k2), respectively. In such a case, c^{(k1)} = c^{(k2)}.

Proof. Let k ∈ K be arbitrary. By the Cauchy–Schwarz inequality, (1.83) holds if and only if there exists c^{(k)} > 0 such that p̂l = c^{(k)} p̌l for all l ∈ L(k). This proves the first part of the lemma. Now let k1, k2 ∈ K with (VVᵀ)_{k1,k2} > 0 be arbitrary indices for which (1.83) is true. Reasoning as above shows that there exist constants c^{(k1)} > 0 and c^{(k2)} > 0 so that both p̂l = c^{(k1)} p̌l, l ∈ L(k1), and p̂l = c^{(k2)} p̌l, l ∈ L(k2). Now note that since (VVᵀ)_{k1,k2} = Σ_{l} v_{k1,l} v_{k2,l}, we have (VVᵀ)_{k1,k2} > 0 if and only if there exists l ∈ K with v_{k1,l} > 0 and v_{k2,l} > 0. This in turn is equivalent to saying that L(k1) ∩ L(k2) ≠ ∅. As a consequence, if (VVᵀ)_{k1,k2} > 0, then (1.83) implies that p̂l = c^{(k1)} p̌l = c^{(k2)} p̌l for all l ∈ L(k1) ∩ L(k2). However, since p̂, p̌ > 0 and L(k1) ∩ L(k2) is not an empty set, we must have c^{(k1)} = c^{(k2)}. □

Now we are in a position to prove our main result. For some definitions related to nonnegative matrices used in the proof, the reader is referred to App. A.4.1 (see also the subsequent section).

Theorem 1.63. Let V be irreducible. The feasibility set F is strictly convex if and only if VVᵀ is irreducible.

Proof. By Lemma 1.61, F is strictly convex if and only if there are no vectors p̂, p̌ > 0 with p̂ ≠ c p̌ for all c > 0 such that (1.83) holds for all k ∈ K. First we show that if VVᵀ is irreducible and (1.83) holds for all k ∈ K, then p̂ is a multiple of p̌. So, suppose that VVᵀ is irreducible. As a consequence, the directed graph associated with VVᵀ is strongly connected. This is equivalent to saying that, for any k, l ∈ K, k ≠ l, there is a sequence of natural numbers {l_n}_{n=0}^{N} for some N ≥ 1 with l0 = l and lN = k such that (VVᵀ)_{lr,l_{r−1}} > 0
1 On the Perron Root of Irreducible Matrices
for each r = 1, . . . , N. Now by Lemma 1.62, we have c^(l) = c^(l_1) = · · · = c^(l_{N−1}) = c^(k) > 0, where the constants are defined in Lemma 1.62. Thus, since k and l are arbitrary and VVᵀ is irreducible, there must be a constant c = c^(l) = c^(k) > 0 such that p̂_l = c p̌_l for all l ∈ K. From this and Lemma 1.61, we can conclude that if VVᵀ is irreducible, then F is strictly convex.

To prove the converse, assume that VVᵀ is reducible. We show that there exist vectors p̂, p̌ > 0 with p̂ ≠ c p̌ for all c > 0 that satisfy (1.83) for all k ∈ K. First note that since VVᵀ is symmetric and reducible as well as V ≥ 0 is irreducible, there exists a permutation matrix P so that (see Sect. A.4.3)

$$\mathbf{P}^T \mathbf{V}\mathbf{V}^T \mathbf{P} = \begin{pmatrix} \mathbf{X}^{(1)} & 0 & \cdots & 0 \\ 0 & \mathbf{X}^{(2)} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \mathbf{X}^{(M)} \end{pmatrix} \qquad (1.84)$$

where all X^(1), . . . , X^(M) ≥ 0 are symmetric irreducible matrices and M > 1 (due to reducibility of VVᵀ). Thus, without loss of generality, we can assume that VVᵀ is given by the normal form on the right-hand side of (1.84). Moreover, assume that X^(m) is of dimension K_m × K_m and define p̂ = (p̂^(1), . . . , p̂^(M)) with p̂^(m) ∈ R^{K_m}_+, 1 ≤ m ≤ M, as

$$e^{\hat{\omega}_k} = \frac{\hat{p}_k}{\sum_{l \in K} v_{k,l}\, \hat{p}_l}, \qquad k \in K,$$

for some given ω̂_k ∈ R. By construction, p̂ is an eigenvector of Γ(ω̂)V and λ_p(ω̂) = 1. Now let p̌ > 0 be given by

$$\check{\mathbf{p}} = \begin{pmatrix} c^{(1)} \hat{\mathbf{p}}^{(1)} \\ \vdots \\ c^{(M)} \hat{\mathbf{p}}^{(M)} \end{pmatrix}$$

for some positive constants c^(1), . . . , c^(M). Clearly, whenever the constants are not all equal, we have p̂ ≠ c p̌ for all c > 0. On the other hand, by construction and (1.84), p̂ and p̌ satisfy (1.83). Therefore, by Lemma 1.61, F is not strictly convex.

1.6.2 Graph-theoretic Interpretation

This section presents a graph-theoretic interpretation of the foregoing results. More material related to graph theory, including many definitions used in this section, can be found for instance in [17]. By Observation A.30 in the appendix, irreducibility of V is equivalent to strong connectivity of the graph G(V), which is defined to be the directed graph on the node set K = {1, . . . , K} in which there is a directed edge (or link) leading from node l ∈ K to node k ∈ K if and only if v_{k,l} > 0. For the clarity of
1.6 The Perron Root under Exponential Mapping
the presentation, it is assumed in this section that V ∈ {0, 1}^{K×K}, meaning that the entries of V are either 1 or 0.⁵ In this case, the matrix V is called the adjacency matrix of the graph G(V). This graph is said to be strongly connected if for each pair of nodes (k, l) there is an uninterrupted sequence of directed edges leading from l to k. Note that the direction matters in the definition of strong connectivity for directed graphs.

However, irreducibility of V is not sufficient for VVᵀ to be irreducible. A simple example of such a matrix is the matrix T given by (1.16). In this case, we have TTᵀ = I, which, obviously, is not irreducible, although T is irreducible because G(T) is strongly connected: node 1 in the graph is connected with the second one, which in turn is connected with the third one, and so on until the last node, node K, is reached, which is connected with the first one. In fact, one can find a relatively large number of irreducible matrices that fail to satisfy the conditions of Theorem 1.63. For instance, VVᵀ is not irreducible even if V is chosen to be

$$\mathbf{V} = \begin{pmatrix} 0 & \cdots & 0 & 0 & 1 \\ 1 & \cdots & 0 & 0 & 1 \\ \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 & 1 \\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix} \in \mathbb{R}^{K \times K}_{+}. \qquad (1.85)$$
Indeed, despite the existence of directed edges from node K to all other nodes, VVᵀ is not irreducible. This raises the question: what properties of V are necessary and sufficient for VVᵀ to be irreducible? In order to answer this question, we will associate another graph G̃(V) with the matrix V.

Definition 1.64. For any given V ∈ R^{K×K}_+, let G̃(V) be the (undirected) bipartite graph of 2K nodes (vertices) divided into two disjoint sets K_r and K_c (each of cardinality K), such that (i) there is no edge between nodes within each of these groups, and (ii) there is an undirected edge between k ∈ K_r and l ∈ K_c if and only if v_{k,l} > 0.

Due to (ii), it is reasonable to refer to nodes in K_r and K_c as row nodes and column nodes, respectively. It is worth pointing out that if the vertices are labeled so that K_r = (1, . . . , K) and K_c = (K + 1, . . . , 2K), then the adjacency matrix of G̃(V) is the 2K × 2K partitioned matrix

$$\begin{pmatrix} \mathbf{0} & \mathbf{V} \\ \mathbf{V}^T & \mathbf{0} \end{pmatrix}$$

[18, p. 15]. The notion of connectivity of G̃(V) is similar to the standard definition for undirected bipartite graphs.

Remark 1.65. In graph theory, an undirected graph is described as connected if each pair of distinct nodes is linked by a sequence of undirected edges⁵

⁵ This does not impact the generality of the analysis since each unit entry of V can be substituted by any positive number with no impact on any result in this section.
in which each edge has a node in common with the preceding one. Note that in an undirected graph, the direction does not matter, so the notions of weak and strong connectivity do not apply to undirected graphs.

Definition 1.66 (Connectivity of G̃(V)). For any V, two nodes k1 and k2 in G̃(V) are said to be connected if and only if there exists a sequence of edges {(l_0, l_1), (l_1, l_2), . . . , (l_{N−2}, l_{N−1}), (l_{N−1}, l_N)} with l_0 = k1 and l_N = k2 such that each edge (l_i, l_{i+1}) satisfies (l_i, l_{i+1}) ∈ K_r × K_c or (l_i, l_{i+1}) ∈ K_c × K_r.

Stated informally, the definition says that G̃(V) is connected if and only if every two distinct nodes are linked by a sequence of undirected edges, each of which connects a row node with a column node. The figures below illustrate the definitions.
Fig. 1.2: G̃(V) and G(VVᵀ) for V = T given by (1.16) with K = 3. We see that neither is G̃(V) connected in the sense of Definition 1.66, nor is G(VVᵀ) strongly connected in the traditional sense.
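The situation in Fig. 1.2 can be checked numerically. The following sketch is an illustration, not part of the original text: it assumes standard NumPy, reads T of (1.16) as the cyclic shift matrix, and uses the classical criterion that a nonnegative K × K matrix M is irreducible if and only if (I + M)^{K−1} is entrywise positive; connectivity of G̃(V) is tested through the 2K × 2K adjacency matrix of Definition 1.64. The helper names are ours.

```python
import numpy as np

def is_irreducible(M):
    # Nonnegative M is irreducible iff (I + M)^(K-1) is entrywise positive.
    K = M.shape[0]
    return bool(np.all(np.linalg.matrix_power(np.eye(K) + (M > 0), K - 1) > 0))

def bipartite_connected(V):
    # Connectivity of G~(V) via its 2K x 2K adjacency matrix [[0, V], [V^T, 0]].
    K = V.shape[0]
    B = (V > 0).astype(float)
    A = np.block([[np.zeros((K, K)), B], [B.T, np.zeros((K, K))]])
    return bool(np.all(np.linalg.matrix_power(np.eye(2 * K) + A, 2 * K - 1) > 0))

K = 3
T = np.roll(np.eye(K), 1, axis=0)       # cyclic shift, our reading of T in (1.16)

print(is_irreducible(T))                # True:  G(T) is strongly connected
print(np.allclose(T @ T.T, np.eye(K)))  # True:  TT^T = I
print(is_irreducible(T @ T.T))          # False: the identity is reducible
print(bipartite_connected(T))           # False: G~(T) splits into K components
```

The bipartite graph of a permutation matrix consists of K disjoint edges, which matches the three isolated pairs shown in Fig. 1.2.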
Now we are in a position to relate irreducibility of VVᵀ to connectivity of G̃(V).

Theorem 1.67. For any irreducible matrix V ≥ 0, VVᵀ is irreducible if and only if G̃(V) is connected.

Proof. Since V is irreducible, every column node is connected with at least one row node. As a consequence, it is sufficient to show that every row node in G̃(V) is connected with all other row nodes. First suppose that VVᵀ is irreducible. Then, for any pair k, l ∈ K, there is a sequence of natural numbers {l_n}_{n=0}^N with l_0 = l and l_N = k such that (VVᵀ)_{l_r,l_{r−1}} > 0 for each r = 1, . . . , N. Hence, L(l_r) ∩ L(l_{r+1}) ≠ ∅ for all r = 0, . . . , N − 1, where the set L(k) is defined in Lemma 1.62. In the context of the graph G̃(V), this is equivalent to saying that row node k is connected with row node l. Because this holds for any k and l, we can conclude that G̃(V) is connected.
Fig. 1.3: G̃(V) and G(VVᵀ) for V given by (1.85) with K = 3. As in Fig. 1.2, neither is G̃(V) connected nor is G(VVᵀ) strongly connected. In G̃(V), it is only possible to "go" from row node 3 to column node 2 and vice versa. In G(VVᵀ), node 3 is isolated.
To prove the converse, note that if G̃(V) is connected, then for each row node, say k1 ∈ K_r, there is another row node, say k2 ∈ K_r, connected with k1 over some column node l ∈ K_c. Therefore, we have L(k1) ∩ L(k2) ≠ ∅. Now since G̃(V) is connected, for any pair of row nodes k, l ∈ K_r, k ≠ l, there must exist an alternating sequence of the form row node–column node–row node–· · · such that k and l are connected. This implies that for any k, l ∈ K_r, k ≠ l, there is a sequence {l_n}_{n=0}^N with l_0 = l and l_N = k such that (VVᵀ)_{l_r,l_{r−1}} > 0 or, equivalently, L(l_r) ∩ L(l_{r−1}) ≠ ∅ for each r = 1, . . . , N. From this and Observation A.30, we obtain irreducibility of VVᵀ.

An example of an irreducible matrix V such that VVᵀ is irreducible as well is the matrix in which all entries are equal to 1. An immediate consequence of Theorems 1.63 and 1.67 is the following corollary.

Corollary 1.68. Fγ is strictly convex if and only if G̃(V) is connected.
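Theorem 1.67 can be exercised numerically on the matrix V of (1.85) and on the all-ones matrix mentioned above. This sketch is our own illustration (standard NumPy; the helper functions are assumptions, not part of the text), using again the criterion that a nonnegative matrix M is irreducible iff (I + M)^{K−1} > 0 entrywise.

```python
import numpy as np

def is_irreducible(M):
    K = M.shape[0]
    return bool(np.all(np.linalg.matrix_power(np.eye(K) + (M > 0), K - 1) > 0))

def bipartite_connected(V):
    # Adjacency matrix of G~(V) as in Definition 1.64: [[0, V], [V^T, 0]].
    K = V.shape[0]
    B = (V > 0).astype(float)
    A = np.block([[np.zeros((K, K)), B], [B.T, np.zeros((K, K))]])
    return bool(np.all(np.linalg.matrix_power(np.eye(2 * K) + A, 2 * K - 1) > 0))

K = 4
V = np.diag(np.ones(K - 1), -1)   # subdiagonal ones, as in (1.85)
V[:K - 1, K - 1] = 1.0            # directed edges from node K to all other nodes

assert is_irreducible(V)          # V itself is irreducible
# Theorem 1.67: VV^T is irreducible  <=>  G~(V) is connected.
for W in (V, np.ones((K, K))):
    assert is_irreducible(W @ W.T) == bipartite_connected(W)
print(is_irreducible(V @ V.T))    # False, as claimed after (1.85)
```

For the all-ones matrix both sides of the equivalence are true; for V of (1.85) both are false, exactly as in Fig. 1.3.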
1.7 Generalizations to Arbitrary Nonnegative Matrices

We finish this chapter by making some remarks on reducible matrices. The weak form of the Perron–Frobenius theorem (Theorem A.39 in App. A.4) ensures that the spectral radius of any nonnegative matrix is an eigenvalue of the matrix and that associated eigenvectors are nonnegative. So, in contrast to irreducible matrices, there is no assertion regarding uniqueness and positivity. For this reason, it is not clear which of the presented results carry over to nonnegative matrices. On the other hand, the proof of Theorem A.39 is based on the observation that any nonnegative matrix can be written as a limit of a sequence of positive (and hence irreducible) matrices. Therefore, due to continuity of the spectral radius as a function of the matrix entries, it is justified to conjecture that some of the results remain valid (maybe in a milder form) in the case of reducible matrices.
First we show that the main results of Sects. 1.3–1.5 hold for general nonnegative matrices. In fact, this extension is straightforward. In contrast, the problem of extending the results of Sects. 1.2.4 and 1.2.5 to nonnegative matrices is somewhat tricky. This problem is considered in Sects. 1.7.2–1.7.4.

1.7.1 Log-Convexity of the Spectral Radius

Suppose that X ∈ N_K is reducible. Then, by Definition A.27 and the fact that interchanging columns and rows of any square matrix does not affect its spectral radius [6], we can assume that X ∈ N_K is of the normal form (A.29) in App. A.4.3 with n = K. Without loss of generality, let us assume that all diagonal blocks in the normal form (A.29) are irreducible. Note that if X is irreducible, then s = 1 and X = X^(1). If X^(1), . . . , X^(s) are the irreducible diagonal blocks of X in the normal form (A.29), then the spectral radius is given by (A.31). By Theorem A.39, ρ(X) is an eigenvalue of X but not necessarily a simple one (Definition A.7), which immediately follows from (A.31). Obviously, ρ(X^(n)) depends only on the entries of X^(n), which in turn implies that, for any matrix-valued function X : Ω → N_K, ρ(X^(n)(ω)) depends on the parameter vector ω only through the entries of the nth diagonal block. As a consequence, we can apply Theorem 1.39 to deduce that the Perron root of each diagonal block is log-convex on Ω if X ∈ LC_K(Ω) ⊂ N_K(Ω). Now since log-convexity is closed under the pointwise maximum [16] (see also Sect. 2.3.1), we can make the following two observations.

Observation 1.69. Let X ∈ LC_K(Ω) ⊂ N_K(Ω) be arbitrary. Then, ρ(X(ω)) is log-convex on Ω.

Observation 1.70. Let X ∈ LC_K(Ω) ⊂ N_K(Ω) be arbitrary. Then, F = {ω ∈ Ω : ρ(X(ω)) ≤ 1} is a convex set.

We also see from Observations 1.69 and 1.70 that the irreducibility property is not necessary for the results of Sect. 1.4 to hold. The converse results of Sect. 1.3.4 extend to arbitrary nonnegative matrices as well.
The same holds for the results of Sect. 1.5.2. 1.7.2 Characterization of the Spectral Radius This section aims at extending the results of Sects. 1.2.4 and 1.2.5 to reducible matrices. We will prove that the characterizations of the Perron Root, and in particular the Collatz–Wielandt-type saddle point characterization, remain valid for some subclass of nonnegative matrices that is larger than XK . However, it will also be apparent that such characterizations are not applicable to general nonnegative matrices.
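Observation 1.69 can be illustrated numerically before proceeding. The sketch below is our own (it assumes NumPy and the exponential parametrization X(ω) = diag(e^{ω_1}, …, e^{ω_K})V of Sect. 1.6, under which each matrix entry is log-convex in ω): it verifies the midpoint log-convexity of ρ(X(ω)) along random segments for a reducible block-diagonal V.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4
# A reducible block-diagonal V with two positive (hence irreducible) 2x2 blocks.
V = np.zeros((K, K))
V[:2, :2] = rng.uniform(0.5, 1.5, (2, 2))
V[2:, 2:] = rng.uniform(0.5, 1.5, (2, 2))

def log_rho(omega):
    # log of the spectral radius of X(omega) = diag(exp(omega)) V
    X = np.diag(np.exp(omega)) @ V
    return np.log(np.max(np.abs(np.linalg.eigvals(X))))

# Midpoint log-convexity: log rho at the midpoint never exceeds the average.
for _ in range(100):
    w1, w2 = rng.normal(size=K), rng.normal(size=K)
    assert log_rho(0.5 * (w1 + w2)) <= 0.5 * (log_rho(w1) + log_rho(w2)) + 1e-9
print("midpoint log-convexity verified on 100 random segments")
```

Since the spectral radius of the block-diagonal V is the maximum of the block Perron roots, this also exercises the closedness of log-convexity under pointwise maximum invoked above.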
It was mentioned above that in the case of nonnegative reducible matrices, ρ(X) ∈ σ(X) does not need to be a simple eigenvalue. Moreover, associated left and right eigenvectors are not necessarily unique up to positive multiples. This prompts us to introduce the eigenmanifolds R_K(X), L_K(X) and E_K(X) of X, defined as

$$\begin{aligned} R_K(X) &:= \{p \in \mathbb{R}^K_+ : Xp = \rho(X)p\} \\ L_K(X) &:= \{q \in \mathbb{R}^K_+ : X^T q = \rho(X)q\} \qquad\qquad (1.86) \\ E_K(X) &:= \{q \circ p \in \mathbb{R}^K_+ : (q, p) \in L_K(X) \times R_K(X),\ q^T p = 1\}. \end{aligned}$$

Furthermore, we define R⁺_K(X) := R_K(X) ∩ R^K_{++}, L⁺_K(X) := L_K(X) ∩ R^K_{++} and E⁺_K(X) := E_K(X) ∩ R^K_{++}. In words, these sets include only positive eigenvectors and therefore, by the Perron–Frobenius theory, they might be empty if X is reducible. In contrast, due to Theorem A.39, R_K(X), L_K(X) and E_K(X) are not empty, regardless of X ∈ N_K. Note that the notation z ∈ E_K(X) implies that z is a Hadamard product of a nonnegative left and right eigenvector of X, both associated with the same eigenvalue, namely ρ(X). Notice that by the definitions of E_K(X) and E⁺_K(X), we have z ∈ Π_K for any z ∈ E_K(X) and z ∈ Π⁺_K for any z ∈ E⁺_K(X).

In what follows, given X ∈ N_K, the function φ : R_{++} → R is assumed to belong to the function class G(X) specified in Definition 1.27. For the function H : R^K_{++} → R given by (see Definition 1.27)⁶

$$H(s) := \sum_{k \in K} z_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right), \qquad z = (z_1, \ldots, z_K) \in \Pi^+_K, \qquad (1.87)$$

to be well defined, the matrix X is assumed to be confined to the set N⁺_K given by

$$N^+_K := \{X \in N_K : \exists_{s \in \mathbb{R}^K_{++}}\ Xs > 0\}. \qquad (1.88)$$

Since X is nonnegative, we see that if Xs > 0 holds for some arbitrary s ∈ R^K_{++}, then we must have Xs > 0 for all s ∈ R^K_{++}. Obviously, for any K > 1, we have X_K ⊂ N⁺_K ⊂ N_K. The set N⁺_K is a proper subset of N_K for any K > 1 since any nonnegative matrix with some zero row is not a member of N⁺_K. On the other hand, for all K > 1, N⁺_K is a proper superset of X_K. This is because for all K > 1, there exists a nonnegative (row) stochastic matrix having one column with all entries equal to zero. Clearly, such matrices are not irreducible, but they belong to N⁺_K. Finally, we point out that since X ∈ N⁺_K is not necessarily irreducible, the second condition of Definition 1.27 must be modified to read as follows:

(C.1-3) For any z ∈ Π⁺_K, every local infimum of the function H : R^K_{++} → R defined by (1.87), with X ∈ N⁺_K, is global and finite. If the infimum is attained for some s* ∈ R^K_{++}, then (1.40) provides a necessary and sufficient condition for characterizing s*.

⁶ It is important to bear in mind that H is parameterized by the weight vector z.
So the modification takes into account the possibility that H has an infimum at a point in R^K_+ and has no minimum on R^K_{++}.

First we are going to extend Theorem 1.29. As there is no guarantee that X has a positive eigenvector, it is clear that the theorem cannot hold in this form for general nonnegative matrices. The problem is that the equality in (1.41) cannot hold with s = p if p has at least one zero element. On the other hand, however, we see from the proof of Theorem A.39 that X can be written as a limit of positive matrices, each of which has positive left and right eigenvectors. So one may expect that the bound in (1.41) holds and can be approached arbitrarily closely. Below this intuitive approach is made precise.

Theorem 1.71. Let X ∈ N⁺_K and φ ∈ G(X) be arbitrary, and suppose that (C.1-3) is true. Then, the following holds.
(i) We have

$$\inf_{s \in \mathbb{R}^K_{++}} \sum_{k \in K} w_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right) = \phi(\rho(X)) \qquad (1.89)$$

if and only if w ∈ E_K(X).
(ii) For any p ∈ R_K(X), p = arg inf_{s ∈ R^K_{++}} H(s) if and only if z = q ∘ p ∈ E_K(X).

Note that an immediate consequence of (ii) is that if X has a positive right eigenvector p, the infimum in (1.89) is attained for s = p > 0.

Proof. Let X ∈ N⁺_K be arbitrary, let w ∈ E_K(X), and let X^(ε) = X + ε 1 1ᵀ for all ε ≥ 0. Hence, X^(ε) ∈ X_K for all ε > 0, and

$$\frac{(X^{(\epsilon)} s)_k}{s_k} = \frac{(Xs)_k}{s_k} + \epsilon\, \frac{\|s\|_1}{s_k}, \qquad s \in \mathbb{R}^K_{++}, \quad 1 \le k \le K. \qquad (1.90)$$

Now let {w(ε_n)}_{n∈N} be any sequence in Π⁺_K with lim_{n→∞} ε_n = 0 such that

$$\lim_{n \to \infty} \|w(\epsilon_n) - w\|_1 = 0. \qquad (1.91)$$

By strict monotonicity and continuity of φ as well as by continuity of the spectral radius as a function of the matrix elements (Theorem A.8), it follows from Theorem 1.29 that

$$\begin{aligned} \phi(\rho(X)) &= \lim_{n \to \infty} \phi(\rho(X^{(\epsilon_n)})) = \lim_{n \to \infty} \min_{s \in \mathbb{R}^K_{++}} \sum_{k \in K} w_k(\epsilon_n)\, \phi\!\left(\frac{(X^{(\epsilon_n)} s)_k}{s_k}\right) \\ &\ge \lim_{n \to \infty} \inf_{s \in \mathbb{R}^K_{++}} \sum_{k \in K} w_k(\epsilon_n)\, \phi\!\left(\frac{(Xs)_k}{s_k}\right) \qquad (1.92) \\ &= \inf_{s \in \mathbb{R}^K_{++}} \sum_{k \in K} w_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right). \end{aligned}$$
On the other hand, however,

$$\begin{aligned} \phi(\rho(X)) &= \lim_{n \to \infty} \min_{s \in \mathbb{R}^K_{++}} \sum_{k \in K} w_k(\epsilon_n)\, \phi\!\left(\frac{(X^{(\epsilon_n)} s)_k}{s_k}\right) \\ &= \lim_{n \to \infty} \inf_{s \in \mathbb{R}^K_{++}} \biggl[ \sum_{k \in K} \bigl(w_k(\epsilon_n) - w_k\bigr)\, \phi\!\left(\frac{(X^{(\epsilon_n)} s)_k}{s_k}\right) \\ &\qquad\qquad + \sum_{k \in K} w_k \left( \phi\!\left(\frac{(X^{(\epsilon_n)} s)_k}{s_k}\right) - \phi\!\left(\frac{(Xs)_k}{s_k}\right) \right) + \sum_{k \in K} w_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right) \biggr] \\ &\le \lim_{n \to \infty} \inf_{s \in \mathbb{R}^K_{++}} \biggl[ \underbrace{\|w(\epsilon_n) - w\|_1 \max_{1 \le k \le K} \phi\!\left(\frac{(X^{(\epsilon_n)} s)_k}{s_k}\right)}_{a(n)} \\ &\qquad\qquad + \underbrace{\|w\|_1 \max_{1 \le k \le K} \left( \phi\!\left(\frac{(X^{(\epsilon_n)} s)_k}{s_k}\right) - \phi\!\left(\frac{(Xs)_k}{s_k}\right) \right)}_{b(n)} + \sum_{k \in K} w_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right) \biggr] \\ &= \inf_{s \in \mathbb{R}^K_{++}} \sum_{k \in K} w_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right) \end{aligned}$$
where the last step follows since, by (1.90) and (1.91), the nonnegative sequences {a(n)} and {b(n)} tend to zero as n → ∞. So combining the above inequality with (1.92) yields (1.89), and therefore proves the "if" part of (i).

To prove the converse of (i), which is to show that (1.89) can hold only if w ∈ E_K(X), assume by contradiction that (1.89) is satisfied for some z ∉ E_K(X). Then, letting p ∈ R_K(X), it follows from (ii), which is proved subsequently (see also the remark after the proof), that

$$H(p) = \sum_{k \in K} z_k\, \phi\!\left(\frac{(Xp)_k}{p_k}\right) > \inf_{s \in \mathbb{R}^K_{++}} \sum_{k \in K} z_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right) = \phi(\rho(X)) \qquad (1.93)$$

where the last equality is due to the assumption. However, since p ∈ R_K(X) and ‖z‖₁ = 1, we have Σ_{k∈K} z_k φ((Xp)_k/p_k) = Σ_{k∈K} z_k φ(ρ(X)) = φ(ρ(X)), which contradicts (1.93) and completes the proof of the converse of (i).

In order to show (ii), let H_{ε_n}(s) be given by (1.87) with X = X^(ε_n) where ε_n = 1/n, n ∈ N. Since X^(ε_n) is irreducible for any n ∈ N, there exists a unique p^(ε_n) > 0 with ‖p^(ε_n)‖₁ = 1 such that ρ(X^(ε_n)) p^(ε_n) = X^(ε_n) p^(ε_n). Moreover, by Theorem 1.29, we have

$$p^{(\epsilon_n)} = \arg\min_{s \in \mathbb{R}^K_{++}} H_{\epsilon_n}(s) \qquad (1.94)$$

if and only if z = w ∈ E⁺_K(X^(ε_n)). Since the sequence {p^(ε_n)}_{n∈N} is bounded due to ‖p^(ε_n)‖₁ = 1, proceeding essentially as in the proof of Theorem A.39 shows that {p^(ε_n)}_{n∈N} has a subsequence that converges to some p̃ ∈ R_K(X)
with p̃ ≠ 0, where the nonzero property is due to p^(ε_n) > 0 and ‖p^(ε_n)‖₁ = 1. Thus, by (1.94), we have

$$\tilde{p} = \arg\inf_{s \in \mathbb{R}^K_{++}} H(s) \ge 0$$

if and only if z = w ∈ E_K(X). This completes the proof.

Remark 1.72. Given some nonnegative (but not necessarily irreducible) matrix X, the Collatz–Wielandt formula (Theorem A.35) implies that, for any p ∈ R_K(X) and z ∈ Π_K, one has

$$H(p) = \sum_{k \in K} z_k\, \phi\!\left(\frac{(Xp)_k}{p_k}\right) = \phi(\rho(X)) = \phi\!\left(\inf_{s \in \mathbb{R}^K_{++}} \max_{k \in K} \frac{(Xs)_k}{s_k}\right) = \inf_{s \in \mathbb{R}^K_{++}} \max_{k \in K} \phi\!\left(\frac{(Xs)_k}{s_k}\right) \ge \inf_{s \in \mathbb{R}^K_{++}} \sum_{k \in K} z_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right) = \inf_{s \in \mathbb{R}^K_{++}} H(s).$$

Equality can hold only if φ((Xs*)_1/s*_1) = · · · = φ((Xs*)_K/s*_K), where s* = arg inf_{s ∈ R^K_{++}} H(s). By strict monotonicity of φ, this is equivalent to (Xs*)_1/s*_1 = · · · = (Xs*)_K/s*_K. Thus, if the above inequality holds with equality, then s* ∈ R_K(X). So, by (ii) of Theorem 1.71, we can conclude that

$$\sum_{k \in K} z_k\, \phi\!\left(\frac{(Xp)_k}{p_k}\right) > \inf_{s \in \mathbb{R}^K_{++}} \sum_{k \in K} z_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right)$$

for any z ∉ E_K(X).

1.7.3 Existence of Positive Eigenvectors

As stated in Theorem 1.71, the infimum in (1.89) is attained for a positive right eigenvector of X ∈ N⁺_K associated with ρ(X), provided that such a vector exists.

Definition 1.73. B_K ⊂ N⁺_K is defined such that X ∈ B_K if and only if X has a positive right eigenvector associated with ρ(X).

An elegant characterization of B_K is provided, for instance, by [6]. In App. A.4.3, we have summarized some of these results. Theorem A.43 characterizes the set of nonnegative matrices with positive right eigenvectors. The conclusion of the theorem is that X ∈ B_K if and only if, when X is written in the normal form defined by (A.29) in App. A.4.3, each isolated diagonal block is maximal and there are no other maximal diagonal blocks. It is important to emphasize that X ∈ B_K does not need to have a positive left eigenvector associated with ρ(X). Therefore, although the infimum in (1.89) is attained when X ∈ B_K, the weight vector w may have zero components. From a practical point of view, it is interesting to know whether w = q ∘ p is positive or not (see also the discussion in Sect. 5.9.1). This is equivalent to asking whether E⁺_K(X) is empty or not.
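For an irreducible (here even positive) matrix, where a positive Perron pair exists, the characterization (1.89) can be checked numerically with φ = log. The sketch below is our own illustration (standard NumPy): it builds w = q ∘ p with qᵀp = 1 and verifies that H of (1.87) equals log ρ(X) at s = p and is never smaller at random positive points.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 5
X = rng.uniform(0.1, 1.0, (K, K))             # positive, hence irreducible

lam, Vr = np.linalg.eig(X)
rho = np.max(lam.real)                        # Perron root of a positive matrix
p = np.abs(Vr[:, np.argmax(lam.real)].real)   # right Perron vector
mu, Vl = np.linalg.eig(X.T)
q = np.abs(Vl[:, np.argmax(mu.real)].real)    # left Perron vector
q = q / (q @ p)                               # normalization q^T p = 1
w = q * p                                     # w = q o p, an element of E_K(X)

def H(s):
    # H of (1.87) with phi = log and z = w
    return float(np.sum(w * np.log((X @ s) / s)))

# (1.89): the infimum over s > 0 equals log(rho(X)) and is attained at s = p.
assert np.isclose(H(p), np.log(rho))
for _ in range(200):
    s = rng.uniform(0.1, 10.0, K)
    assert H(s) >= H(p) - 1e-9
print("H(p) = log(rho(X)), and H(s) >= H(p) on all sampled s")
```

Note that Σ_k w_k = qᵀp = 1, so at s = p every ratio (Xp)_k/p_k equals ρ(X) and H(p) collapses to log ρ(X).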
Definition 1.74. Let B̄_K be the subset of B_K such that X ∈ B̄_K if and only if E⁺_K(X) ≠ ∅.

From Theorem A.45 in App. A.4.3 it follows that a necessary and sufficient condition for E⁺_K(X) to be nonempty is that the normal form of X is a block-irreducible matrix (Definition A.44 in App. A.4.3) and each diagonal block is maximal in the sense of Definition A.42. We use B̄_K to denote the set of such block-irreducible matrices. Therefore, we have E⁺_K(X) ≠ ∅ if and only if X ∈ B̄_K. Combining these observations, we can strengthen Theorem 1.71 as follows [19].

Theorem 1.75. Suppose that the conditions of Theorem 1.71 are fulfilled. Then, the following holds.
(i) For any X ∈ B_K ⊂ N⁺_K, we have

$$\phi(\rho(X)) \ge \min_{s \in \mathbb{R}^K_{++}} H(s) = \min_{s \in \mathbb{R}^K_{++}} \sum_{k \in K} z_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right) \qquad (1.95)$$

with equality if and only if z ∈ E_K(X).
(ii) If X ∈ B̄_K, then the inequality in (1.95) holds with equality for some z > 0.

In words, (ii) means that if X is block-irreducible and each diagonal block is maximal, then E⁺_K(X) ≠ ∅. In Sect. 5.9.1, we will interpret these results in the context of power control in wireless networks.

1.7.4 Collatz–Wielandt-Type Characterization of the Spectral Radius

Lemma A.46 in App. A.4.3 and the subsequent simple example make clear that the "sup-min" part of the conventional Collatz–Wielandt formula (Theorem A.35) cannot be extended to general nonnegative matrices. Indeed, for any X ∈ N_K, Theorem A.47 and (A.33) imply that

$$\sup_{s \in \mathbb{R}^K_{++}} \min_{k \in K} \frac{(Xs)_k}{s_k} \le \inf_{s \in \mathbb{R}^K_{++}} \max_{k \in K} \frac{(Xs)_k}{s_k} = \rho(X) \qquad (1.96)$$

where strict inequality can be shown to hold for some nonnegative reducible matrices (see the simple example in App. A.4.3). From the equality in (1.96), however, we can conclude that the "min-max" part of the Collatz–Wielandt formula extends to general nonnegative matrices, provided that "min-max" is replaced by "inf-max". This characterization can be utilized to extend Lemma 1.31 to the set N⁺_K.

Lemma 1.76. For any φ ∈ G(X), define G : R^K_{++} × Π_K → R as
$$G(s, z) := \sum_{k \in K} z_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right). \qquad (1.97)$$

Then, the following holds.
(i) If X ∈ N⁺_K, then

$$\phi(\rho(X)) = \inf_{s \in \mathbb{R}^K_{++}} \sup_{z \in \Pi^+_K} G(s, z) = \inf_{s \in \mathbb{R}^K_{++}} \max_{z \in \Pi_K} G(s, z). \qquad (1.98)$$

(ii) (p, w) = arg inf_{s ∈ R^K_{++}} max_{z ∈ Π_K} G(s, z) if and only if (p, w) ∈ R_K(X) × E_K(X).
(iii) If X ∈ B_K, then

$$\phi(\rho(X)) = \min_{s \in \mathbb{R}^K_{++}} \max_{z \in \Pi_K} G(s, z) \qquad (1.99)$$

with the minimum attained at some s = p ∈ R⁺_K(X).
(iv) If X ∈ B̄_K ⊂ B_K, then the maximum in (1.99) is attained at z = w ∈ E⁺_K(X).

Proof. With Sect. 1.7.2, the proof proceeds essentially as the proof of Lemma 1.31.

Notice that the maxima in (1.98) and (1.99) are taken over Π_K, so that z ∈ Π_K may contain zero entries. Alternatively, the weight vector could be restricted to belong to Π⁺_K, in which case the maxima must be replaced by suprema, unless X ∈ B̄_K.

Now let us turn our attention to the problem of extending Lemma 1.33 to a larger class of nonnegative matrices belonging to the set N⁺_K. With Theorem 1.71 in hand, we can proceed essentially as in the proof of Lemma 1.33 to show that, for any X ∈ N⁺_K and φ ∈ G(X),

$$\phi(\rho(X)) = \sup_{z \in \Pi^+_K} \min_{s \in \mathbb{R}^K_+} \sum_{k \in K} z_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right) = \min_{s \in \mathbb{R}^K_+} \sum_{k \in K} w_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right)$$

where both w ∈ E_K(X) and the corresponding minimizer, which is a nonnegative right eigenvector of X, contain zero entries in general. By the previous section, however, we know that there exists a positive right eigenvector associated with ρ(X) if and only if X ∈ B_K. So, if X ∈ B_K, then we have

$$\sup_{z \in \Pi^+_K} \min_{s \in \mathbb{R}^K_{++}} \sum_{k \in K} z_k\, \phi\!\left(\frac{(Xs)_k}{s_k}\right) = \phi(\rho(X)) \qquad (1.100)$$

where the minimum is attained at some p ∈ R⁺_K(X). Again, if additionally X ∈ B̄_K ⊂ B_K and p ∈ R⁺_K(X) is a minimizer in (1.100), then the supremum in (1.100) is attained at z = w = q ∘ p ∈ E⁺_K(X). Finally, we combine the above results with Lemma 1.76 to obtain the following theorem.
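The possible gap between the "sup-min" and "inf-max" sides of (1.96) for reducible matrices can be seen numerically. The following sketch is our own illustration (standard NumPy, with a hand-picked reducible matrix): a 1 × 1 block with spectral radius 0.5 and a 2 × 2 block with spectral radius 2 force the sup-min to stay at 0.5 while the inf-max equals ρ(X) = 2.

```python
import numpy as np

rng = np.random.default_rng(2)
# Reducible block-diagonal X: rho of the blocks is 0.5 and 2.0, so rho(X) = 2.
X = np.zeros((3, 3))
X[0, 0] = 0.5
X[1, 2] = X[2, 1] = 2.0
rho = 2.0

best_sup_min = -np.inf
for _ in range(2000):
    s = rng.uniform(0.01, 10.0, 3)
    r = (X @ s) / s
    assert r.max() >= rho - 1e-9          # "inf-max" side: max ratio never below rho(X)
    best_sup_min = max(best_sup_min, r.min())

assert best_sup_min <= 0.5 + 1e-9         # "sup-min" side is capped by the small block
print(best_sup_min)
```

Indeed, the first coordinate always yields (Xs)_1/s_1 = 0.5, so min_k (Xs)_k/s_k ≤ 0.5 < ρ(X) for every s > 0, which is exactly the strict inequality discussed after (1.96).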
Theorem 1.77. Let X ∈ N⁺_K and φ ∈ G(X) be given. Then, the following holds.
(i) We have

$$\phi(\rho(X)) = \inf_{s \in \mathbb{R}^K_{++}} \sup_{z \in \Pi^+_K} G(s, z) = \sup_{z \in \Pi^+_K} \inf_{s \in \mathbb{R}^K_{++}} G(s, z).$$

(ii) We have

$$(p, w) = \arg \inf_{s \in \mathbb{R}^K_{++}} \sup_{z \in \Pi^+_K} G(s, z)$$

and

$$(w, p) = \arg \sup_{z \in \Pi^+_K} \inf_{s \in \mathbb{R}^K_{++}} G(s, z)$$

if and only if (p, w) ∈ R_K(X) × E_K(X).
(iii) If X ∈ B̄_K, any pair (p, w) ∈ R⁺_K(X) × E⁺_K(X) is a saddle point of G such that

$$\phi(\rho(X)) = G(p, w) = \min_{s \in \mathbb{R}^K_{++}} \max_{z \in \Pi^+_K} G(s, z) = \max_{z \in \Pi^+_K} \min_{s \in \mathbb{R}^K_{++}} G(s, z). \qquad (1.101)$$

Summarizing, we can say that the saddle point characterization of Theorem 1.34 holds (with some minor modifications) for any X ∈ B̄_K. Note that any pair (p, w) ∈ R_K(X) × E_K(X) is prevented from being a saddle point only by the fact that it does not pertain to the (open) domain of the function G: under an extension of the domain of G to its closure (and some additional technical restrictions), Theorem 1.77 would characterize a saddle point also in the general case X ∈ N⁺_K.
1.8 Bibliographical Notes All the results presented in this chapter were obtained by the authors in the course of working on problems in wireless networks [20, 21, 22, 23, 24, 25, 19]. To the best of our knowledge, some of these results were novel at the time of publishing, but some others turned out to have been known in the mathematics community for a while. From the mathematical point of view, however, they are still of some interest due to the different approach as well as the different line of arguments. Moreover, we feel that some proofs are more elementary, simpler, and shorter. Below the reader will find a short list of references where we found alternative proofs of the results presented in this chapter or some closely related results. The Perron root characterization in Theorem 1.2 is an adapted form of the variational principle for pressure expressed in terms of nonnegative matrices. As aforementioned, this characterization can be deduced from [8, Equation 2.6 with 2.8 and 2.9]. In this form (but without proof), the theorem can be found in [29, Theorem 3.1]. Roughly speaking, both papers deal with the first and second partial derivatives of the Perron root with respect to the entries
of nonnegative matrices. The proof of Theorem 1.2 is elementary and seems to be novel. The assertion of Theorem 1.7 appears in [30, Equation 3.3] for positive matrices, but the result of [30] extends to arbitrary irreducible matrices. In contrast to [30], however, the proof presented here is elementary and therefore may be of interest in its own right. Theorem 4.1 in [30] is closely related to Theorem 1.11. However, the first one seems to apply only to matrices of the form X = AYB where A and B are diagonal positive definite and Y is positive semidefinite. There is an extension of this result [30, Theorem 4.2] showing that the inequalities proved in [30, Theorem 4.1] hold for any nonnegative irreducible matrix X such that X−1 is an M -matrix (Definition A.54). In this book, Theorem 1.11 immediately follows from Theorem 1.7 by considering the fact that log(x) ≤ x − 1 for all x > 0 with equality if and only if x = 1. Therefore, both results apply to an arbitrary nonnegative irreducible matrix. We point out that [30] provides a bunch of interesting results about the spectral radius of DX where D is diagonal positive definite and X is nonnegative. Also, it should be emphasized that [30, Equation 3.3] is a key ingredient in the proof of the saddle point characterization (1.52) with φ(x) = log(x), x > 0. The problem of convexity of the Perron root (Sects. 1.3) has also attracted some attention in the literature. In particular, it seems that Theorem 1.39 was first proved by [31] for nonnegative matrices whose entries are continuous functions of a scalar parameter on some interval. However, the extension to a parameter vector defined on some convex set is straightforward. Obviously, [31] used different techniques since Theorem 1.2 was not known at this time. In the engineering community, the result was rediscovered by [32]. 
Kingman’s theorem was used by [33] to prove inequalities of the form φ(eA+B ) ≤ φ(eA eB ) where A and B are complex matrices and φ is a real-valued continuous function of the eigenvalues of its matrix argument. In a special case, φ is a spectral radius, A is nonnegative, and B is diagonal real. There are also some interesting results on log-convexity of spectral functions. In [34], it was shown that f (D) = log(eD A) where D is diagonal and A nonnegative is convex on the set of diagonal matrices. Using different tools, the convexity of g(D) = max{Re λ : λ ∈ σ(A + D)} is shown in [35, 34, 36, 37]. In [38] (see also [37]), the convexity property of f and g was related to the convexity of certain sets of M -matrices, which in turn are related to the feasibility set defined in this book.
2 On the Positive Solution to a Linear System with Nonnegative Coefficients
This chapter deals with a positive solution p to the following system of linear equations with nonnegative coefficients: p = u + Xp .
(2.1)
Here and hereafter, u ∈ R^K_{++} is a given positive vector, X ∈ R^{K×K}_+ is a given nonnegative matrix (not necessarily irreducible), and p ∈ R^K_{++} is the sought vector, provided that it exists.
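As a concrete numerical illustration of (2.1) — our own sketch, assuming standard NumPy — the unique positive solution can be computed as p = (I − X)⁻¹u whenever the spectral radius of X is below one:

```python
import numpy as np

rng = np.random.default_rng(3)
K = 4
X = rng.uniform(0.0, 0.2, (K, K))     # small entries keep rho(X) < 1
u = rng.uniform(0.5, 1.0, K)          # a positive vector

rho = np.max(np.abs(np.linalg.eigvals(X)))
assert rho < 1                        # existence condition discussed below

p = np.linalg.solve(np.eye(K) - X, u) # p = (I - X)^{-1} u
assert np.all(p > 0)                  # the unique solution is strictly positive
assert np.allclose(p, u + X @ p)      # p indeed solves p = u + Xp
print(p)
```

The entries of X were drawn small enough that every row sum is below one, which already guarantees ρ(X) < 1.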
2.1 Basic Concepts and Definitions

Before starting with the analysis, we need to address the fundamental problem of the existence of a positive solution p to (2.1). This problem is addressed in App. A.4.4. In particular, by Theorem A.51, we know that a necessary and sufficient condition for a solution p ≥ 0, p ≠ 0, to exist is that ρ(X) < 1, where ρ(X) is the spectral radius of X. Moreover, as u is positive, there is a unique solution p, which is strictly positive and given by

p = (I − X)⁻¹ u.

Theorem A.39 asserts that λ_p := ρ(X) is an eigenvalue of X, that is to say λ_p ∈ σ(X), where σ(X) denotes the spectrum of X (Definition A.10).

Remark 2.1. Note that except for the nonnegativity, there are no additional constraints on X. In particular, X does not need to be irreducible. However, it is worth pointing out that if X is irreducible and its Perron root λ_p = ρ(X) > 0 satisfies λ_p < 1, then u ≠ 0 does not need to be positive for (2.1) to have a unique positive solution p. This is one part of the assertion of Theorem A.52.

Analogous to the previous chapter, we allow the entries of X to depend continuously on some parameter vector ω ∈ Ω, where the parameter set Ω is
defined by (1.53) and is an open convex subset of RK . The only difference is that here the matrix is not required to be irreducible for all parameter vectors. In fact, X(ω) can even be identically the zero matrix, in which case, however, the problems addressed in this chapter are trivial. To be precise, let X(ω) := (xk,l (ω))1≤k,l≤K be a matrix-valued function whose entries xk,l : Ω → R+ are continuous functions defined on Ω. Considering Definition 1.35, this is formally written as X ∈ NK (Ω), in which case X is said to be nonnegative on Ω. To conform with the applications in wireless networks, we let each entry of the vector u in (2.1) be a continuous positive function of the parameter vector ω as well. We indicate this by writing u ∈ RK ++ (Ω). Now it follows from (2.1) and Theorem A.51 that, for any fixed ω ∈ Ω, there exists a unique positive vector p(ω) satisfying1 p(ω) = X(ω)p(ω) + u(ω)
(2.2)
λp (ω) := ρ(X(ω)) < 1 .
(2.3)
if and only if Moreover, for any ω ∈ Ω with λp (ω) < 1, −1 p(ω) = I − X(ω) u(ω) .
(2.4)
Let F be the set of those parameter vectors ω ∈ Ω for which a positive solution p(ω) to (2.2) exists. Formally, we have F := {ω ∈ Ω : λp (ω) < 1} .
(2.5)
Note that each entry of the vector p(ω) is a continuous map from F into the set of positive reals R_{++}. This is because if ω ∈ F, then the Neumann series Σ_{l=0}^{∞} (X(ω))^l converges (Theorem A.16) and (I − X(ω))^{−1} = Σ_{l=0}^{∞} (X(ω))^l. Therefore, since a composition of continuous maps is continuous, it follows from

p_k(ω) = e_k^T (I − X(ω))^{−1} u(ω),  ω ∈ F,  1 ≤ k ≤ K    (2.6)

that p_k : F → R_{++} is continuous. In particular, this implies that the l1-norm of p(ω) given by

‖p(ω)‖₁ = Σ_{k∈K} p_k(ω) = 1^T (I − X(ω))^{−1} u(ω),  ω ∈ F    (2.7)

is a continuous function on F as well.

¹ However, as x_{k,l} : Ω → R_+ are not one-to-one maps, there may exist ω̂, ω̌ ∈ Ω, ω̂ ≠ ω̌, such that p(ω̂) = p(ω̌).
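The existence condition and the Neumann-series representation behind (2.6) can be checked numerically. The following sketch (not from the book; the 2×2 matrix X and the vector u are arbitrary illustrative choices) iterates the fixed-point map p ↦ Xp + u, whose iterates are the partial sums of the Neumann series and converge to (I − X)^{−1}u whenever ρ(X) < 1:

```python
def mat_vec(X, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in X]

def neumann_solution(X, u, iterations=200):
    """Approximate p = (I - X)^{-1} u via p <- X p + u; the iterates are
    the partial sums of the Neumann series, convergent for rho(X) < 1."""
    p = list(u)
    for _ in range(iterations):
        p = [a + b for a, b in zip(mat_vec(X, p), u)]
    return p

# Arbitrary nonnegative example with rho(X) = sqrt(0.2 * 0.3) < 1.
X = [[0.0, 0.2], [0.3, 0.0]]
u = [1.0, 1.0]
p = neumann_solution(X, u)

# p must solve the fixed-point equation p = Xp + u of (2.2).
residual = max(abs(pk - (xk + uk)) for pk, xk, uk in zip(p, mat_vec(X, p), u))
print(p, residual)
```

As the theory predicts, the limit is strictly positive and satisfies the fixed-point equation up to floating-point precision.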
In this chapter, we analyze both p_k(ω) and ‖p(ω)‖₁ as functions of the parameter vector ω ∈ F. In doing so, most of our interest is devoted to matrix-valued functions X(ω) of the form X(ω) = Γ(ω)V with V ∈ N_K and Γ(ω) = diag(γ₁(ω₁), . . . , γ_K(ω_K)). Here and hereafter, γ_k : Q_k → R_{++} is a continuous strictly monotonic (bijective) function and Q_k ⊆ R is some open interval (see also Sect. 1.3.1). Formally, this is denoted by X ∈ N_{K,Γ}(Ω) where

N_{K,Γ}(Ω) := {Γ(ω)V, ω ∈ Ω : V ∈ N_K} ⊂ N_K(Ω)    (2.8)

is the set of all nonnegative matrix-valued functions X(ω) of the form X(ω) = Γ(ω)V for some given γ_k : Q_k → R_{++}, k = 1, . . . , K. In this special case, it will also be assumed that u(ω) = (γ₁(ω₁), . . . , γ_K(ω_K)). Exceptions are only Sects. 2.3.1 and 2.3.2, where X(ω) and u(ω) are not confined to this special form.
2.2 Feasibility Sets

The set F defined by (2.5) contains all parameter vectors such that a positive solution to our system of linear equations exists. For this reason, if there are no additional constraints on p, F is referred to as the feasibility set. Notice that the definition is analogous to Definition 1.41, except that now the spectral radius must be strictly smaller than 1. Therefore, the parameter vectors satisfying λ_p(ω) = 1 are not members of F.² In wireless networks, however, some additional constraints on p are imposed, which gives rise to the definition of some subset of F as the feasibility set. Constraints on the l1-norm of p(ω) are common to applications in wireless communications networks. More precisely, we say that p(ω) is constrained in the l1-norm if ‖p(ω)‖₁ ≤ P_t, ω ∈ Ω, must hold for some given constant P_t > 0, referred to as a sum (or total) constraint. Consequently, in this case, the parameter vector ω ∈ Ω is feasible if and only if ω ∈ F(P_t) where

F(α) := {ω ∈ F : ‖p(ω)‖₁ ≤ α} ⊆ F,  α > 0 .    (2.9)

Notice that due to continuity of ‖p(ω)‖₁, F(α) is monotonic in α > 0 with respect to set inclusion in the following sense: For any 0 < α ≤ β, there holds F(α) ⊆ F(β). Therefore, since F(α) ⊆ F for all α > 0, we have

² In the previous chapter, F is the set of all the parameter vectors for which the homogeneous system of linear equations (I − X(ω))p(ω) = 0, with X(ω) being irreducible for all ω ∈ Ω, has a positive solution p(ω).
F = ∪_{α>0} F(α)    (2.10)
where the union is taken with respect to all α > 0. Another common situation encountered in wireless networks is that of constraining each element of p(ω) individually. Therefore, if there are positive constants P₁, . . . , P_K such that p_k(ω) ≤ P_k must hold for each 1 ≤ k ≤ K, we say that p(ω) is subject to individual constraints. Clearly, in this case, the set of all feasible parameter vectors is given by

F(P₁, . . . , P_K) := ∩_{k=1}^{K} F_k(P_k)    (2.11)

where

F_k(α) := {ω ∈ F : p_k(ω) ≤ α} .    (2.12)
These two types of constraints are often combined by imposing both individual and sum constraints on p(ω). Therefore, in this case, the feasibility set becomes

F(P_t; P₁, . . . , P_K) := F(P_t) ∩ F(P₁, . . . , P_K) .    (2.13)
Note that F(P_t; P₁, . . . , P_K) = F(P_t) if P_t ≤ P_k for each 1 ≤ k ≤ K, and F(P_t; P₁, . . . , P_K) = F(P₁, . . . , P_K) if Σ_k P_k ≤ P_t. Thus, both F(P_t) and F(P₁, . . . , P_K) can be viewed as special cases of F(P_t; P₁, . . . , P_K).

Remark 2.2. In what follows, we exclude the trivial case where the feasibility set is an empty set.

The next observation immediately follows from the connectedness of Ω, continuity of λ_p(ω), ω ∈ Ω, and p(ω), ω ∈ F, as well as [39, Theorem 4.22].

Observation 2.3. F(P_t; P₁, . . . , P_K) is a connected set (see Definition B.1).

It is important to emphasize that the geometry of the feasibility sets depends on the choice of X(ω) and u(ω), ω ∈ Ω. In particular, the feasibility set is not convex in general. To illustrate the definitions, let us consider an elementary example.

Example 2.4. Let X(ω) = 0 for all ω ∈ Ω and u(ω) = (γ(ω₁), . . . , γ(ω_K)) where γ : Q → R_{++} is any continuous bijective function. We see that (2.4) reduces to p(ω) = (γ(ω₁), . . . , γ(ω_K)), and hence one obtains

F = Ω = Q^K
F(P_t) = {ω ∈ F : Σ_k γ(ω_k) ≤ P_t}
F(P₁, . . . , P_K) = {ω ∈ F : γ(ω_k) ≤ P_k, 1 ≤ k ≤ K} .

Clearly, F and F(P₁, . . . , P_K) are both convex sets, regardless of the choice of γ(x). In contrast, F(P_t) is not convex in general. A sufficient condition for F(P_t) (and also F(P_t; P₁, . . . , P_K)) to be a convex set is that γ(x) is convex.
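The sufficient condition of Example 2.4 can be probed numerically. The sketch below (assumed parameters, K = 2) samples points of F(P_t) for the convex choice γ(x) = e^x − 1 and checks that midpoints of feasible points remain feasible, as convexity of the set requires:

```python
import math
import random

# Hypothetical check of Example 2.4 for K = 2: with X = 0 and the convex
# function gamma(x) = exp(x) - 1, the sum-constrained set F(Pt) is convex,
# so midpoints of feasible points must stay feasible.

def sum_power(w, ):
    """||p(w)||_1 = sum_k gamma(w_k) for X = 0 and gamma(x) = e^x - 1."""
    return sum(math.exp(x) - 1.0 for x in w)

Pt = 2.0
random.seed(0)
feasible = []
while len(feasible) < 200:
    w = [random.uniform(0.01, 1.5) for _ in range(2)]
    if sum_power(w) <= Pt:
        feasible.append(w)

midpoint_ok = all(
    sum_power([(a + b) / 2.0 for a, b in zip(u, v)]) <= Pt + 1e-12
    for u in feasible for v in feasible
)
print(midpoint_ok)
```

Since γ is convex, Σ_k γ((ω̂_k + ω̌_k)/2) ≤ (Σ_k γ(ω̂_k) + Σ_k γ(ω̌_k))/2 ≤ P_t, so the check succeeds for every sampled pair.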
An important example of a convex function is γ(x) = e^x − 1, x > 0. Assuming X(ω) = 0 for all ω ∈ Ω = R²_{++} and u(ω) = (e^{ω₁} − 1, e^{ω₂} − 1), Fig. 2.1 depicts the feasibility set F(P_t; P₁, P₂) ⊂ R²_{++} defined by (2.13) for some P₁, P₂ and P_t.

Fig. 2.1: Illustration of Example 2.4: The feasibility set F(P_t; P₁, P₂) with X(ω) ≡ 0, γ(x) = e^x − 1, x > 0, and u(ω) = (e^{ω₁} − 1, e^{ω₂} − 1). The constraints P₁, P₂ and P_t are chosen to satisfy 0 < P₁, P₂ < P_t and P_t < P₁ + P₂. (The figure shows, in the (ω₁, ω₂)-plane, the region bounded by the lines ω₁ = γ^{−1}(P₁) and ω₂ = γ^{−1}(P₂) and the curve γ(ω₁) + γ(ω₂) = P_t.)
Unfortunately, as the example below shows, convexity of γ(x) is not sufficient for F(P_t) to be a convex set if X(ω) = Γ(ω)V ≠ 0.

Example 2.5. Suppose that

X(ω) = ε ( 0 γ(ω₂) ; γ(ω₁) 0 )

for some ε ≥ 0. Furthermore, assume that u(ω) = (γ(ω₁), γ(ω₂)) and γ(x) = e^x − 1, x > 0. Thus, Ω = R²_{++} and

F = {ω ∈ Ω : λ_p(ω) = ε ((e^{ω₁} − 1)(e^{ω₂} − 1))^{1/2} < 1} .

Now we claim that F is not a convex set if ε > 0. To see this, we write λ_p(ω) = 1 with ε > 0 as a function of ω₁ > 0 to obtain

ω₂ = f(ω₁) = log( 1 + 1/(ε²(e^{ω₁} − 1)) ),  ε > 0 .

The function f(x), x > 0, is twice differentiable and its second derivative is strictly positive for all x > 0. Consequently, instead of the feasibility set F, its complement in R²_{++} (F^c = R²_{++} \ F) is convex.

Now let us consider F(P_t) with ε ≥ 0. Applying (2.7) to our special case yields

‖p(ω)‖₁ = (e^{ω₁} + e^{ω₂} − 2 + 2ε(e^{ω₁} − 1)(e^{ω₂} − 1)) / (1 − ε²(e^{ω₁} − 1)(e^{ω₂} − 1)),  ω ∈ F .

Hence, writing ‖p(ω)‖₁ = P_t as a function of ω₁ ∈ [0, log(1 + P_t)], one obtains

ω₂ = g(ω₁) = log[ ((1 − ε)(2 + P_t + εP_t) + e^{ω₁}(ε(2 + εP_t) − 1)) / (1 + ε(e^{ω₁} − 1)(2 + εP_t)) ]

where the argument under the logarithm is positive. Now if ε = 0, g(x) is concave on x ∈ [0, log(1 + P_t)] since then

g''(x) = −e^x (2 + P_t) / (2 + P_t − e^x)²

is strictly negative on [0, log(1 + P_t)]. This implies that F(P_t) is a convex set, which is in total agreement with the preceding example. On the other hand, if ε = 1, the second derivative of g(x), x ∈ [0, log(1 + P_t)], is

g''(x) = e^x (1 + P_t)(2 + P_t) / (1 + P_t − e^x(2 + P_t))²,  x ∈ [0, log(1 + P_t)],

which is positive. Thus, if ε = 1, F(P_t) is not convex but its complement F^c(P_t) = R²_{++} \ F(P_t) is a convex set. An examination of the second derivative of g(x), x ∈ [0, log(1 + P_t)], shows that

g''(x) < 0 if ε < h(P_t),  g''(x) = 0 if ε = h(P_t),  g''(x) > 0 if ε > h(P_t),  where h(x) = (√(1 + x) − 1)/x .

Since h(x) → 0 as x → ∞, for any fixed ε > 0 we have g''(x) > 0 provided that P_t is sufficiently large, which complies with the above discussion that f(x) is convex for any ε > 0. On the other hand, if x → 0, then h(x) → 1/2. So, at small values of P_t, convexity of F(P_t) changes to convexity of F^c(P_t) around the value ε ≈ 1/2.

The example above demonstrates that the feasibility set may be a nonconvex set even if each entry of X(ω) is convex on Ω. As a consequence, a stronger property than convexity is necessary to guarantee convexity of F. In the following section, we show that if X(ω) is log-convex on Ω (see Definition 1.37), then p_k(ω) is a log-convex function of ω ∈ F for each 1 ≤ k ≤ K.
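The curvature transition of Example 2.5 at ε = h(P_t) can be verified numerically. The sketch below (assumed test values P_t = 1, sample point x₀ = 0.3; ε written out for the book's parameter) evaluates the second derivative of the boundary function g by central differences on either side of the threshold:

```python
import math

# Hypothetical check of Example 2.5: the curvature of the boundary curve
# omega_2 = g(omega_1) of F(Pt) changes sign at eps = h(Pt).

def g(x, eps, Pt):
    """Boundary curve ||p(w)||_1 = Pt solved for omega_2."""
    num = (1.0 - eps) * (2.0 + Pt + eps * Pt) + math.exp(x) * (eps * (2.0 + eps * Pt) - 1.0)
    den = 1.0 + eps * (math.exp(x) - 1.0) * (2.0 + eps * Pt)
    return math.log(num / den)

def second_diff(f, x, h=1e-4):
    """Central second-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

Pt = 1.0
threshold = (math.sqrt(1.0 + Pt) - 1.0) / Pt    # h(Pt), roughly 0.414
x0 = 0.3                                        # interior point of [0, log(1+Pt)]
below = second_diff(lambda x: g(x, 0.2, Pt), x0)   # eps < h(Pt): concave
above = second_diff(lambda x: g(x, 0.8, Pt), x0)   # eps > h(Pt): convex
print(threshold, below, above)
```

For ε below the threshold the second difference is negative (g concave, F(P_t) convex), while above it the sign flips, matching the discussion in the text.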
2.3 Convexity Results

In this section, we show that if X ∈ N_K(Ω) and u ∈ R^K_{++}(Ω) are both log-convex on Ω, then p_k : F → R_{++} given by (2.6) is log-convex for each 1 ≤ k ≤ K. This in turn implies that the feasibility set F(P_t; P₁, . . . , P_K) is a convex set, regardless of the choice of P₁, . . . , P_K > 0 and P_t > 0. Following that, we consider the problem of strict convexity.

Recall that according to Definition 1.37, the notation X ∈ LC_K(Ω) means that X ∈ N_K(Ω) is log-convex on Ω. Furthermore, note that by this definition, the identically zero function is a log-convex function (see also the remark in Sect. 1.3). In an analogous manner, we say that u ∈ R^K_{++}(Ω) is log-convex on Ω if each entry of the vector u(ω) is a continuous log-convex function defined on Ω. Let us indicate this by writing u ∈ lc(Ω) ⊂ R^K_{++}(Ω).
2.3.1 Log-Convexity of the Positive Solution

Let ω(μ) with μ ∈ [0, 1] be a convex combination of two arbitrary vectors ω̂, ω̌ ∈ Ω:

ω(μ) = (1 − μ)ω̂ + μω̌,  μ ∈ [0, 1] .

Unless otherwise stated, assume that ω̂, ω̌ ∈ F ⊆ Ω, which implies that both p_k(ω̂) > 0 and p_k(ω̌) > 0 exist.

Theorem 2.6. Let X ∈ LC_K(Ω) ⊂ N_K(Ω) and u ∈ lc(Ω) ⊂ R^K_{++}(Ω) be arbitrary. Then, p_k(ω) is log-convex on F for each 1 ≤ k ≤ K, i.e., we have

p_k(ω(μ)) ≤ p_k(ω̂)^{1−μ} p_k(ω̌)^{μ},  1 ≤ k ≤ K    (2.14)

for all μ ∈ (0, 1) and ω̂, ω̌ ∈ F.

Proof. Let ω̂, ω̌ ∈ F be arbitrary. Then, by Theorem 1.39 as well as by Sect. 1.7, we know that λ_p(ω(μ)) < 1 for all μ ∈ (0, 1). Thus, for every μ ∈ (0, 1), there exists a unique positive p_k(ω(μ)) given by (see (2.6))

p_k(ω(μ)) = e_k^T (I − X(ω(μ)))^{−1} u(ω(μ)),  1 ≤ k ≤ K .

Now let μ ∈ (0, 1) be arbitrary but fixed. Since ω(μ) ∈ F, we can expand (I − X(ω(μ)))^{−1} into a Neumann series (see Theorem A.16) to obtain

(I − X(ω(μ)))^{−1} = Σ_{l=0}^{∞} (X(ω(μ)))^l .

From this it follows that

p_k(ω(μ)) = e_k^T Σ_{l=0}^{∞} (X(ω(μ)))^l u(ω(μ)) = Σ_{l=0}^{∞} e_k^T (X(ω(μ)))^l u(ω(μ)) = Σ_{l=0}^{∞} g_l(ω(μ)) .

By assumption, all the entries of X(ω) and u(ω) are log-convex on F. Hence, (2.14) immediately follows from the above equation when one considers the following properties of log-convex functions:

(i) If two positive functions f and g are log-convex, then f + g and f · g are log-convex.
(ii) For any convergent sequence f_n of log-convex functions, the limit f = lim_{n→∞} f_n is log-convex provided that the limit is strictly positive.

Due to (i), g_l : F → R_{++} is log-convex for each l ≥ 0 and Σ_{l=0}^{M} g_l(ω) is log-convex for any M > 0. Furthermore, since Σ_{l=0}^{M} g_l(ω) is monotonically increasing in M and g_l is positive, it must converge to a positive limit as M → +∞. Hence, by (ii), p_k(ω) is log-convex on F and (2.14) must hold.
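Inequality (2.14) can be tested by Monte Carlo sampling. The sketch below (assumed setup, not from the book: K = 2, X(ω) = Γ(ω)V with γ_k(x) = e^x, which is log-convex, a fixed nonnegative V and u(ω) = Γ(ω)z with z = (1, 1)) solves the 2×2 system in closed form and checks the log-convexity bound for random parameter pairs:

```python
import math
import random

def p_vec(w, v12=0.3, v21=0.2, z=(1.0, 1.0)):
    """Closed-form positive solution of p = Gamma(w)Vp + Gamma(w)z for
    K = 2 with gamma_k(x) = exp(x); valid while rho(X(w)) < 1."""
    g1, g2 = math.exp(w[0]), math.exp(w[1])
    det = 1.0 - g1 * g2 * v12 * v21
    assert det > 0.0                      # spectral-radius condition (2.3)
    p1 = g1 * (v12 * g2 * z[1] + z[0]) / det
    p2 = g2 * (v21 * g1 * z[0] + z[1]) / det
    return p1, p2

random.seed(1)
ok = True
for _ in range(1000):
    wa = [random.uniform(-1.0, 1.0) for _ in range(2)]
    wb = [random.uniform(-1.0, 1.0) for _ in range(2)]
    mu = random.uniform(0.0, 1.0)
    wm = [(1.0 - mu) * a + mu * b for a, b in zip(wa, wb)]
    # Check (2.14): p_k(w(mu)) <= p_k(wa)^{1-mu} * p_k(wb)^{mu}.
    for pm, pa, pb in zip(p_vec(wm), p_vec(wa), p_vec(wb)):
        ok = ok and pm <= pa ** (1.0 - mu) * pb ** mu + 1e-12
print(ok)
```

Every sampled triple satisfies the geometric-mean bound, as Theorem 2.6 guarantees for log-convex data.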
Remark 2.7. Recall that the spectral radius of X ∈ N_K(Ω) can be expressed as follows (Theorem A.15):

λ_p(ω) = lim_{m→+∞} ‖X(ω)^m‖^{1/m} .

Thus, considering the two properties (i) and (ii) of log-convex functions in the proof of Theorem 2.6 and the fact that if f is log-convex, so also is f^α for every positive α, shows that if the entries of X(ω) are log-convex functions on Ω, then λ_p(ω) is log-convex on Ω. This leads to an alternative proof of log-convexity of the spectral radius (see for instance [32]).

A trivial but important consequence of the theorem is the following.

Corollary 2.8. If X ∈ LC_K(Ω) ⊂ N_K(Ω) and u ∈ lc(Ω) ⊂ R^K_{++}(Ω), then ‖p(ω)‖₁ = Σ_{k∈K} p_k(ω) is log-convex on F, that is to say,

‖p(ω(μ))‖₁ ≤ ‖p(ω̂)‖₁^{1−μ} ‖p(ω̌)‖₁^{μ}    (2.15)

for all μ ∈ (0, 1) and ω̂, ω̌ ∈ F.

Proof. As log-convex functions are closed under addition, it is clear that the log-convexity property carries over to the l1-norm of p(ω). More generally, we can say that if X ∈ LC_K(Ω) and u ∈ lc(Ω), then

F(p₁(ω(μ)), . . . , p_K(ω(μ))) ≤ F(p₁(ω̂), . . . , p_K(ω̂))^{1−μ} F(p₁(ω̌), . . . , p_K(ω̌))^{μ}

for all μ ∈ (0, 1) and ω̂, ω̌ ∈ F, where F : R^K_{++} → R_{++} is any function that preserves log-convexity. Standard examples of such functions are

1. weighted sum: F(x₁, . . . , x_K) = Σ_{k∈K} w_k x_k,
2. weighted pointwise multiplication: F(x₁, . . . , x_K) = Π_{k=1}^{K} w_k x_k, and
3. pointwise maximum and supremum: F(x₁, . . . , x_K) = max_{1≤k≤K} x_k.

The weighted sum operation and the pointwise multiplication operation preserve log-convexity as log-convex functions are closed under both addition and multiplication. The claim about the pointwise maximum operation follows since

max{p_k(ω(μ)) : 1 ≤ k ≤ K} ≤ max{p_k(ω̂)^{1−μ} p_k(ω̌)^{μ} : 1 ≤ k ≤ K}
≤ max{p_k(ω̂)^{1−μ} : 1 ≤ k ≤ K} · max{p_k(ω̌)^{μ} : 1 ≤ k ≤ K}
= max{p_k(ω̂) : 1 ≤ k ≤ K}^{1−μ} max{p_k(ω̌) : 1 ≤ k ≤ K}^{μ}

for all μ ∈ (0, 1) and ω̂, ω̌ ∈ F.
2.3.2 Convexity of the Feasibility Set

Since the geometric mean is bounded above by the arithmetic mean (B.18), we have

p_k(ω̂)^{1−μ} p_k(ω̌)^{μ} ≤ (1 − μ)p_k(ω̂) + μp_k(ω̌) ≤ max{p_k(ω̂), p_k(ω̌)}

for all ω̂, ω̌ ∈ F and μ ∈ (0, 1). Thus, if p_k(ω) is log-convex on F, then the above inequality implies that F_k(P_k) defined by (2.12) is a convex set. By Theorem 2.6, we know that if X(ω) and u(ω) are both log-convex on Ω, then p_k : F → R_{++} is log-convex for each 1 ≤ k ≤ K. Consequently, since the intersection of convex sets is convex, it follows from (2.11) that F(P₁, . . . , P_K) is a convex set if X ∈ LC_K(Ω) and u ∈ lc(Ω). By Corollary 2.8 and (2.13), we see that this is also true for F(P_t) and F(P_t; P₁, . . . , P_K). We summarize these observations in a corollary.

Corollary 2.9. Suppose that X ∈ LC_K(Ω) ⊂ N_K(Ω) and u ∈ lc(Ω) ⊂ R^K_{++}(Ω). Then, F(P₁, . . . , P_K), F(P_t) and F(P_t; P₁, . . . , P_K) are convex sets, regardless of the choice of P_t, P₁, . . . , P_K > 0.

To illustrate the results, let us consider a simple example.

Example 2.10. Let X(ω) and u(ω) be defined as in Example 2.5 except that now γ(x) = e^x, x ∈ R. Clearly, the exponential function is log-convex on R. Thus, by Theorem 1.39 (note that the matrix X(ω) is irreducible for all ω ∈ R²), the Perron root is log-convex and, by Corollary 1.42, F is a convex set. In contrast to the previous example, all pairs satisfying λ_p(ω) = ε (e^{ω₁} e^{ω₂})^{1/2} = 1 lie on a line given by ω₂ = −ω₁ − 2 log ε, which, of course, is both convex and concave. The nonnegative solution (2.4) yields

p(ω) = ( (e^{ω₁} + ε e^{ω₁+ω₂}) / (1 − ε² e^{ω₁+ω₂}), (e^{ω₂} + ε e^{ω₁+ω₂}) / (1 − ε² e^{ω₁+ω₂}) ),  ε² e^{ω₁+ω₂} < 1 .

By Theorem 2.6, both entries are log-convex on R². All pairs (ω₁, ω₂) satisfying p₁(ω) = P₁ and p₂(ω) = P₂ are

ω₂ = f(ω₁) = log[ (P₁ − e^{ω₁}) / (ε(1 + εP₁)) ] − ω₁,  ω₁ < log P₁,

and

ω₂ = g(ω₁) = log[ P₂ / (1 + ε e^{ω₁} + ε² P₂ e^{ω₁}) ],  ω₂ < log P₂,

respectively. It may be verified that f(x) is concave on (−∞, log P₁) and g(x) is concave on R, implying that F₁(P₁), F₂(P₂) and F(P₁, P₂) are all convex sets. Similarly, ‖p(ω)‖₁ = P_t can be rewritten to give

ω₂ = h(ω₁) = log[ (P_t − e^{ω₁}) / (1 + 2ε e^{ω₁} + ε² P_t e^{ω₁}) ],  ω₁ < log P_t .

Again, h(x) can be seen to be concave on (−∞, log P_t), from which convexity of F(P_t) follows.

In the preceding example, instead of γ(x) = γ₁(x) = γ₂(x) = e^x, x ∈ R, we could consider any log-convex functions γ₁ : Q₁ → R_{++} and γ₂ : Q₂ → R_{++}. In such a case, the unique positive solution p(ω) exists if and only if ω ∈ F = {ω ∈ Ω : λ_p(ω) = ε (γ₁(ω₁)γ₂(ω₂))^{1/2} < 1} and is given by
p(ω) = ( (γ₁(ω₁) + ε γ₁(ω₁)γ₂(ω₂)) / (1 − ε² γ₁(ω₁)γ₂(ω₂)), (γ₂(ω₂) + ε γ₁(ω₁)γ₂(ω₂)) / (1 − ε² γ₁(ω₁)γ₂(ω₂)) ),  ω ∈ F .    (2.16)
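The closed form (2.16) can be checked against the defining fixed-point equation. The sketch below uses the assumed concrete choices ε = 0.5, γ₁(x) = γ₂(x) = e^x and z = (1, 1), and verifies that the closed-form p(ω) satisfies p = X(ω)p + u(ω):

```python
import math

# Hypothetical verification of (2.16) for K = 2, eps = 0.5, and the
# log-convex choices gamma_1(x) = gamma_2(x) = exp(x), z = (1, 1).

eps = 0.5
g1 = lambda x: math.exp(x)
g2 = lambda x: math.exp(x)

def p_closed(w):
    """Closed form (2.16)."""
    a, b = g1(w[0]), g2(w[1])
    det = 1.0 - eps * eps * a * b
    assert det > 0.0                     # w must lie in F
    return ((a + eps * a * b) / det, (b + eps * a * b) / det)

w = (-0.4, -0.9)
p1, p2 = p_closed(w)
# Fixed-point residuals of p1 = gamma1(w1)(eps*p2 + 1), and symmetrically.
r1 = abs(p1 - g1(w[0]) * (eps * p2 + 1.0))
r2 = abs(p2 - g2(w[1]) * (eps * p1 + 1.0))
print(r1, r2)
```

Both residuals vanish up to floating-point precision, confirming that (2.16) is the solution of the 2×2 system.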
It may be verified that if γ₁ and γ₂ are both log-convex, then each entry of p(ω) is log-convex on F. This in turn implies that the feasibility set F(P_t; P₁, . . . , P_K) is a convex set, regardless of the choice of P_t > 0 and P₁, . . . , P_K > 0.

Finally, it is worth pointing out that the results presented in this chapter straightforwardly extend to the case when p(ω) is either subject to ‖p(ω)‖₁ ≤ P_t(ω) or p_k(ω) ≤ P_k(ω), 1 ≤ k ≤ K, for all ω ∈ Ω, where P_t : Ω → R_{++} and P_k : Ω → R_{++} are given concave functions. So if p_k(ω) is convex for each 1 ≤ k ≤ K, then {ω ∈ Ω : ‖p(ω)‖₁ ≤ P_t(ω)} and {ω ∈ Ω : p_k(ω) ≤ P_k(ω)}, 1 ≤ k ≤ K, are convex sets.

2.3.3 Strict Log-Convexity

When X(ω) and u(ω) are log-convex on Ω, Theorem 2.6 asserts that p_k(ω) is a log-convex function of ω ∈ F. In this section, we strengthen this result by proving conditions for strict log-convexity. In the second part of the book, we will exploit these results to prove some interesting properties of the addressed power control problem. For the analysis in this section, it is assumed that X ∈ N_K(Ω) and u ∈ R^K_{++}(Ω) are restricted to be of the following form:

u(ω) = Γ(ω)z,  X(ω) = Γ(ω)V  with  trace(V) = 0 .    (2.17)
Here and hereafter, z = (z₁, . . . , z_K) is any fixed positive vector, V ∈ N_K, and γ_k : Q_k → R_{++}, k = 1, . . . , K, are continuous and strictly monotonic (bijective) functions. Formally, we have X ∈ N⁰_{K,Γ}(Ω), which is the subset of N_{K,Γ}(Ω) defined by (2.8) such that trace(V) = 0.

Lemma 2.11. Let X ∈ N⁰_{K,Γ}(Ω) and u(ω) = Γ(ω)z, ω ∈ Ω, be arbitrary. Then, p : F → R^K_{++} defined by (2.4) is a bijection.

Proof. By Theorem A.51, z > 0 and (2.5), p(ω) > 0 exists and is unique if and only if ω ∈ F. Thus, p(ω) is a function from F into R^K_{++}. It is a bijection as γ_k : Q_k → R_{++} is bijective, in which case there is a function φ : R^K_{++} → F given by

φ(p) = ( γ₁^{−1}(p₁/(Vp + z)₁), . . . , γ_K^{−1}(p_K/(Vp + z)_K) )

such that p(φ(p)) = p, p > 0, and φ(p(ω)) = ω, ω ∈ F. So, the lemma follows from Theorem B.7.
It is important to emphasize that the positivity of the vector z is crucial for the results to hold. In contrast, the assumption trace(V) = 0 is merely motivated by practical applications and could easily be dropped. Note that due to this assumption, (Vs)_k for any s ∈ R^K is independent of s_k for each 1 ≤ k ≤ K.

In what follows, we extensively exploit the following special form of Hölder's inequality (Theorem A.4): For any μ ∈ (0, 1) and u, v ∈ R^K_{+}, there holds

⟨u, v⟩ ≤ ‖u‖_p ‖v‖_q,  p = 1/(1 − μ),  q = 1/μ,    (2.18)

with equality if and only if there exists a constant c > 0 such that

v_k = c u_k^{p−1} = c u_k^{μ/(1−μ)},  1 ≤ k ≤ K .

Finally, recall that γ_k : Q_k → R_{++}, 1 ≤ k ≤ K, is said to be strictly log-convex if γ_k(x(μ)) < γ_k(x̂)^{1−μ} γ_k(x̌)^{μ} for all μ ∈ (0, 1) and x̂, x̌ ∈ Q_k with x̂ ≠ x̌ and x(μ) = (1 − μ)x̂ + μx̌. Similarly, we say that p_k : F → R_{++} given by (2.6) is strictly log-convex for some 1 ≤ k ≤ K if p_k(ω(μ)) < p_k(ω̂)^{1−μ} p_k(ω̌)^{μ} for all μ ∈ (0, 1) and ω̂, ω̌ ∈ F with ω̂ ≠ ω̌. The following result is a straightforward extension of Theorem 2.6 to the case of strictly log-convex functions γ₁, . . . , γ_K.

Theorem 2.12. Let V ≥ 0 be arbitrary, and let γ_k : Q_k → R_{++} be strictly log-convex for each 1 ≤ k ≤ K. Then, for all ω̂, ω̌ ∈ F with ω̂ ≠ ω̌, there exists an index 1 ≤ k₀ ≤ K such that p_{k₀}(ω(μ)) < p_{k₀}(ω̂)^{1−μ} p_{k₀}(ω̌)^{μ} for all μ ∈ (0, 1).

Proof. Let ω̂, ω̌ ∈ F be arbitrary, and let k₀ be an index such that ω̂_{k₀} ≠ ω̌_{k₀}. By Theorem 2.6, we know that ω(μ) = (1 − μ)ω̂ + μω̌ ∈ F for all μ ∈ (0, 1). Therefore, for any μ ∈ (0, 1), it follows from (2.2) that

p_{k₀}(ω(μ)) = γ_{k₀}(ω_{k₀}(μ)) (Vp(ω(μ)) + z)_{k₀} .

So, by strict log-convexity of γ_{k₀} and positivity of the vector z, we have

p_{k₀}(ω(μ)) < γ_{k₀}(ω̂_{k₀})^{1−μ} γ_{k₀}(ω̌_{k₀})^{μ} (Vp(ω(μ)) + z)_{k₀} .

Considering Theorem 2.6 and Hölder's inequality (2.18) yields

p_{k₀}(ω(μ)) < γ_{k₀}(ω̂_{k₀})^{1−μ} γ_{k₀}(ω̌_{k₀})^{μ} ( Σ_{l∈K} (v_{k₀,l} p_l(ω̂))^{1−μ} (v_{k₀,l} p_l(ω̌))^{μ} + z_{k₀} )
≤ γ_{k₀}(ω̂_{k₀})^{1−μ} γ_{k₀}(ω̌_{k₀})^{μ} ( (Vp(ω̂))_{k₀}^{1−μ} (Vp(ω̌))_{k₀}^{μ} + z_{k₀}^{1−μ} z_{k₀}^{μ} ) = ⟨û, ǔ⟩

where³

³ For any vector u ∈ R^K and any constant c ∈ R, (u)_k^c = [(u)_k]^c = u_k^c, 1 ≤ k ≤ K.
For any vector u ∈ RK and any constant c ∈ R, (u)ck = [(u)k ]c = uck , 1 ≤ k ≤ K.
72
2 On the Positive Solution to a Linear System with Nonnegative Coefficients
* ˆ= u
ˆ ˆ 1−μ (Γ(ω)Vp( ω)) k0 ˆ 1−μ (Γ(ω)z) k0
+
ˇ= u
ˇ ˇ μk0 (Γ(ω)Vp( ω)) ˇ μk0 (Γ(ω)z)
.
By repeated application of (2.18), we obtain pk0 (ω(μ)) < ˆ u
u 1 1 ˇ 1−μ μ
1−μ μ ˆ ˆ + Γ(ω)z ˆ ˇ ˇ + Γ(ω)z ˇ = Γ(ω)Vp( ω) Γ(ω)Vp( ω) k0
k0
ˆ 1−μ pk0 (ω) ˇ μ. = pk0 (ω) This completes the proof. Remarkably, there are no additional restrictions on V ≥ 0. As shown below, we obtain a similar property if we drop the requirement on strict logconvexity of γk , 1 ≤ k ≤ K, and instead put some constraints on V. Theorem 2.13. Let γk : Qk → R++ be log-convex for each 1 ≤ k ≤ K. is chosen such that for each 1 ≤ l ≤ K, there exists Suppose that V ∈ RK×K + ˆ ω ˇ ∈ F with ω ˆ = ω, ˇ there exists k = l with vk,l > 0. Then, for any fixed ω, ˆ 1−μ pk0 (ω) ˇ μ for all μ ∈ (0, 1). k0 , 1 ≤ k0 ≤ K, so that pk0 (ω(μ)) < pk0 (ω) ˆ ω ˇ ∈ F with ω ˆ = ω ˇ be arbitrary. Since p(ω) is a bijection Proof. Let ω, ˆ = p(ω). ˇ Choose l0 , 1 ≤ l0 ≤ K, such that (Lemma 2.11), we have p(ω) ˆ = pl0 (ω) ˇ pl0 (ω)
(2.19)
and let k0 = l0 be any index with vk0 ,l0 > 0. Note that by assumption, there exists such an index. Using ⎛
1−μ ⎞ μ ⎞ ⎛
ˆ ˇ ωk0 )vk0 ,l pl (ω) ωk0 )vk0 ,l pl (ω) l∈Kl0 γk0 (ˆ l∈Kl0 γk0 (ˇ ⎟ ⎜ ⎠ ˆ=⎝ ˇ=⎝ u (γk0 (ˆ ωk0 )zk0 )1−μ ωk0 )zk0 )μ ⎠u (γk0 (ˇ 1−μ μ ˇ γk0 (ˇ ωk0 )vk0 ,l0 pl0 (ω) ˆ γk0 (ˆ ωk0 )vk0 ,l0 pl0 (ω) and considering log-convexity of γk , 1 ≤ k ≤ K, one obtains (a) ˇ pk0 (ω(μ)) = Γ(ω(μ))Vp(ω(μ)) + Γ(ω(μ))z ≤ ˆ u, u k0
(b)
1 ˇ ˆ 1−μ pk0 (ω) ˇ μ ≤ ˆ u 1−μ u μ1 = pk0 (ω)
for any μ ∈ (0, 1), where (a) follows from Theorem 2.6 and (b) from (2.18). Therefore, since v_{k₀,l₀} > 0 and z_{k₀} > 0, we can have equality in (b) only if p_{l₀}(ω̂) = p_{l₀}(ω̌), which contradicts (2.19), and hence completes the proof.

It is important to emphasize that Theorems 2.12 and 2.13 do not imply the existence of an index k such that p_k(ω) is strictly log-convex on F. In fact, the theorems only assert that for any fixed ω̂, ω̌ ∈ F, ω̂ ≠ ω̌, there is an index k such that p_k(ω(μ)) < p_k(ω̂)^{1−μ} p_k(ω̌)^{μ} for all μ ∈ (0, 1). However, this is sufficient to deduce the following corollary.
Corollary 2.14. Suppose that at least one of the following holds.

(i) For each 1 ≤ k ≤ K, γ_k : Q_k → R_{++} is strictly log-convex.
(ii) Each column of the matrix V has at least one positive entry.

Then ‖p(ω)‖₁ is strictly log-convex on F.

Proof. Let ω̂, ω̌ ∈ F with ω̂ ≠ ω̌ be arbitrary. For any fixed μ ∈ (0, 1), we have

‖p(ω(μ))‖₁ = Σ_{k∈K} p_k(ω(μ)) <(a) Σ_{k∈K} p_k(ω̂)^{1−μ} p_k(ω̌)^{μ} ≤(b) ( Σ_{k∈K} p_k(ω̂) )^{1−μ} ( Σ_{k∈K} p_k(ω̌) )^{μ} = ‖p(ω̂)‖₁^{1−μ} ‖p(ω̌)‖₁^{μ}

where (a) is either due to Theorem 2.12 or due to Theorem 2.13 depending on whether (i) or (ii) holds, and (b) follows from (2.18).

Obviously, condition (ii) of the corollary, which is equivalent to the condition of Theorem 2.13, is weaker than irreducibility. For instance, the following two reducible matrices satisfy the condition of Theorem 2.13:

V = ( 0 1 0 0 ; 1 0 0 0 ; 0 0 0 1 ; 0 0 1 0 )  and  V = ( 0 1 1 ; 1 0 0 ; 0 0 0 ) .

With these particular choices of V and with γ₁(x) = γ₂(x) = γ₃(x) = e^x, x ∈ R, we have (respectively)

p₁(ω) = e^{ω₁} p₂(ω) + e^{ω₁} z₁        p₁(ω) = e^{ω₁} p₂(ω) + e^{ω₁} p₃(ω) + e^{ω₁} z₁
p₂(ω) = e^{ω₂} p₁(ω) + e^{ω₂} z₂        p₂(ω) = e^{ω₂} p₁(ω) + e^{ω₂} z₂
p₃(ω) = e^{ω₃} p₄(ω) + e^{ω₃} z₃        p₃(ω) = e^{ω₃} z₃ .
p₄(ω) = e^{ω₄} p₃(ω) + e^{ω₄} z₄

In the first case, we see that p₁(ω) and p₂(ω) are strictly log-convex with respect to (ω₁, ω₂) but they are independent of (ω₃, ω₄). For p₃(ω) and p₄(ω), the situation is reversed, so that ‖p(ω)‖₁ remains strictly log-convex on F. In the second example, we can write p₁(ω) as

p₁(ω) = e^{ω₁} (e^{ω₂} z₂ + e^{ω₃} z₃ + z₁) / (1 − e^{ω₁+ω₂})

which is strictly log-convex on F. In contrast, the transpose matrix

V = ( 0 1 0 ; 1 0 0 ; 1 0 0 )

does not satisfy the conditions of Theorem 2.13 since v_{k,3} = 0 for each 1 ≤ k ≤ K. In this case, the nonnegative solution p(ω), ω ∈ R³, is given by
p₁(ω) = e^{ω₁} p₂(ω) + e^{ω₁} z₁
p₂(ω) = e^{ω₂} p₁(ω) + e^{ω₂} z₂
p₃(ω) = e^{ω₃} p₁(ω) + e^{ω₃} z₃ .

We see that whereas p₁(ω) and p₂(ω) are independent of ω₃, p₃(ω) is a log-convex function of ω₃, though not strictly log-convex. Therefore, there is no index k such that p_k(ω) is strictly log-convex along the third coordinate of ω (with ω₁ and ω₂ being fixed).

Finally, we show that if V is irreducible, p_k(ω) is strictly log-convex on F for each 1 ≤ k ≤ K, regardless of whether γ_k is strictly log-convex or only log-convex.

Theorem 2.15. Let γ_k : Q_k → R_{++}, 1 ≤ k ≤ K, be log-convex, and let V ∈ X_K. Then, p_k(ω) is strictly log-convex on F for each 1 ≤ k ≤ K.

Proof. Let ω̂, ω̌ ∈ F with ω̂ ≠ ω̌ be arbitrary. Suppose that the theorem is false. Then, there exist k₀ and μ₀ ∈ (0, 1) such that

p_{k₀}(ω(μ₀)) = p_{k₀}(ω̂)^{1−μ₀} p_{k₀}(ω̌)^{μ₀} .

So, by log-convexity of γ_k, Theorem 2.6 and Hölder's inequality,

p_{k₀}(ω(μ₀)) = γ_{k₀}(ω_{k₀}(μ₀)) ( Σ_{l∈K} v_{k₀,l} p_l(ω(μ₀)) + z_{k₀} )
≤(a) γ_{k₀}(ω_{k₀}(μ₀)) ( Σ_{l∈K} v_{k₀,l} p_l(ω̂)^{1−μ₀} p_l(ω̌)^{μ₀} + z_{k₀} )
≤ γ_{k₀}(ω̂_{k₀})^{1−μ₀} γ_{k₀}(ω̌_{k₀})^{μ₀} ( Σ_{l∈K} (v_{k₀,l} p_l(ω̂))^{1−μ₀} (v_{k₀,l} p_l(ω̌))^{μ₀} + z_{k₀} )
≤ γ_{k₀}(ω̂_{k₀})^{1−μ₀} γ_{k₀}(ω̌_{k₀})^{μ₀} ( ( Σ_{l∈K} v_{k₀,l} p_l(ω̂) )^{1−μ₀} ( Σ_{l∈K} v_{k₀,l} p_l(ω̌) )^{μ₀} + z_{k₀}^{1−μ₀} z_{k₀}^{μ₀} )
≤(b) ( γ_{k₀}(ω̂_{k₀}) ( Σ_{l∈K} v_{k₀,l} p_l(ω̂) + z_{k₀} ) )^{1−μ₀} ( γ_{k₀}(ω̌_{k₀}) ( Σ_{l∈K} v_{k₀,l} p_l(ω̌) + z_{k₀} ) )^{μ₀}
= p_{k₀}(ω̂)^{1−μ₀} p_{k₀}(ω̌)^{μ₀} .

So, in each step, we have equality. Now let N₁ ⊂ {1, . . . , K} be the set of those indices l for which v_{k₀,l} > 0. As V is irreducible, we have N₁ ≠ ∅. Hence, since z is positive, it follows from (2.18) that there can be equality in (b) only if p_l(ω̂) = p_l(ω̌) for all l ∈ N₁. Now suppose that N₂ ⊂ {1, . . . , K} with N₂ ≠ N₁ is the set of all indices l such that there exists k₁ ∈ N₁, k₁ ≠ k₀, with v_{k₁,l} > 0. Again, due to irreducibility of V, it holds N₂ ≠ ∅. Moreover, since there is equality in (a) if and only if p_{k₁}(ω(μ₀)) = p_{k₁}(ω̂)^{1−μ₀} p_{k₁}(ω̌)^{μ₀} for each k₁ ∈ N₁, we can reason along the same lines as above to show that p_l(ω̂) = p_l(ω̌) for all l ∈ N₂. Now, since V is irreducible, we can proceed in this way until there are no indices left, to obtain

p_k(ω̂) = p_k(ω̌) for all 1 ≤ k ≤ K .

Clearly, since z is positive and p(ω) is a bijection, this implies that ω̂ = ω̌, which contradicts ω̂ ≠ ω̌ and therefore completes the proof.

Figure 2.2 depicts ‖p(ω(μ))‖₁ as a function of μ ∈ [0, 1] for three different log-convex functions γ(x) = γ₁(x) = . . . = γ_K(x) and a randomly chosen irreducible matrix V. Since γ(x) = e^x/(1 − e^x) is strictly log-convex on Q = (−∞, 0) and γ(x) = 1/x is strictly log-convex on (0, +∞), it follows from Theorem 2.12 that ‖p(ω)‖₁ is strictly log-convex on Q^K. In contrast, γ(x) = e^x is not strictly log-convex on R. Nevertheless, since V is irreducible, Theorem 2.15 asserts that the l1-norm is strictly log-convex.
Fig. 2.2: The l1-norm ‖p(ω(μ))‖₁ as a function of μ ∈ [0, 1] for some fixed ω̂, ω̌ ∈ Q^K chosen such that ‖p(ω̂)‖₁ and ‖p(ω̌)‖₁ are independent of the choice of γ. (The plot compares ‖p(ω(μ))‖₁ against the geometric-mean bound ‖p(ω̂)‖₁^{1−μ} ‖p(ω̌)‖₁^{μ} for the three choices γ(x) = e^x, γ(x) = e^x/(1 − e^x) and γ(x) = 1/x.)
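The strict gap visible in Fig. 2.2 can be reproduced numerically. The sketch below uses an assumed 2×2 setup (not the book's randomly chosen V): an irreducible V with positive off-diagonal entries and the merely log-convex choice γ_k(x) = e^x, for which Theorem 2.15 predicts that ‖p(ω(μ))‖₁ stays strictly below the geometric-mean bound whenever ω̂ ≠ ω̌:

```python
import math
import random

# Hypothetical setup: irreducible 2x2 pattern (both off-diagonal
# entries positive), gamma_k(x) = exp(x), z = (1, 1).
V12, V21 = 0.3, 0.2
Z = (1.0, 1.0)

def p_vec(w):
    """Closed-form solution of p = Gamma(w)Vp + Gamma(w)z for K = 2."""
    a, b = math.exp(w[0]), math.exp(w[1])
    det = 1.0 - a * b * V12 * V21
    return (a * (V12 * b * Z[1] + Z[0]) / det,
            b * (V21 * a * Z[0] + Z[1]) / det)

random.seed(7)
strict = True
for _ in range(300):
    wa = [random.uniform(-1.0, 0.5) for _ in range(2)]
    wb = [random.uniform(-1.0, 0.5) for _ in range(2)]
    if max(abs(x - y) for x, y in zip(wa, wb)) < 0.05:
        continue                      # keep the sampled pairs well separated
    for mu in (0.25, 0.5, 0.75):
        wm = [(1.0 - mu) * x + mu * y for x, y in zip(wa, wb)]
        na, nb, nm = sum(p_vec(wa)), sum(p_vec(wb)), sum(p_vec(wm))
        strict = strict and nm < na ** (1.0 - mu) * nb ** mu
print(strict)
```

For every well-separated pair the inequality is strict, in line with Theorem 2.15 and Corollary 2.14.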
2.3.4 Strict Convexity of the Feasibility Sets

The results in the preceding section may be used to deduce strict convexity of the feasibility set in the following sense (see also Definition 1.44).

Definition 2.16. F(P_t) (respectively, F(P₁, . . . , P_K)) is said to be strictly convex (or s-convex) if ω(μ) = (1 − μ)ω̂ + μω̌ is interior to F(P_t) (respectively, F(P₁, . . . , P_K)) for all μ ∈ (0, 1) and ω̂, ω̌ ∈ ∂F(P_t) (respectively, ω̂, ω̌ ∈ ∂F(P₁, . . . , P_K)), ω̂ ≠ ω̌, where
∂F(P_t) = {ω ∈ F : ‖p(ω)‖₁ = P_t}
∂F(P₁, . . . , P_K) = {ω ∈ F : ∃_{1≤k≤K} p_k(ω) = P_k} .    (2.20)

Under the setup of Corollary 2.14, F(P_t) is a strictly convex set for all P_t > 0 since then ‖p(ω)‖₁ is strictly log-convex. These conditions, however, are not necessary for F(P_t) to be a strictly convex set (see Example 2.4). As far as F(P₁, . . . , P_K) is concerned, the set is strictly convex when p_k(ω) is strictly log-convex for each 1 ≤ k ≤ K. Therefore, we have the following corollary.

Corollary 2.17. Under the setup of Theorem 2.15, F(P₁, . . . , P_K) is a strictly convex set for any P₁, . . . , P_K > 0.

Of course, if F(P₁, . . . , P_K) is strictly convex, so also is F(P_t; P₁, . . . , P_K).
2.4 The Linear Case

In this section, we further focus on the special case (2.17), except that now

γ(x) = γ₁(x) = · · · = γ_K(x) = x,  x > 0 .

Hence, we have Ω = Q^K = R^K_{++}. The linear case has already been considered in Sect. 1.5, where it is shown that F^c is not a convex set in general. More precisely, Theorem 1.60 asserts that there exist V ∈ X_K and K > 1 such that neither F nor its complement F^c = R^K_{++} \ F is a convex set. In this section, we will use this result to show that F^c(P_t) = R^K_{++} \ F(P_t) is in general not convex either. However, note that this does not exclude the possibility of convexity of F^c(P_t) for some special choices of P_t, K and V. For instance, consider K = 2, z = (1, 1) and

V = ( 0 ε ; ε 0 )

for any fixed ε > 0. Then, we see that the set of pairs (ω₁, ω₂) ∈ ∂F(P_t) (see Definition 2.16) must satisfy ω₂ = f(ω₁) = (P_t − ω₁)/(1 + 2εω₁ + ε²ω₁P_t). Now it may be verified that

f'(x) = −(1 + εP_t)² / (1 + ε(2 + εP_t)x)²,  x > 0 .

Thus, as the numerator is independent of x and the denominator is monotonically increasing in x > 0, f'(x) is increasing, so we must have f''(x) ≥ 0 for every x > 0. From this, it follows that f(x) is not concave but convex on R_{++}. As a consequence of this, F^c(P_t) = R²_{++} \ F(P_t) is a convex set if K = 2 and γ₁(x) = γ₂(x) = x, x > 0. As in Sect. 1.5, this simple example might suggest that F^c(P_t) is a convex set in general, which in turn would allow us to draw some interesting conclusions with respect to optimal link scheduling in wireless networks. Unfortunately, simple reasoning shows that such a general statement is not possible.

Theorem 2.18. There exist at least one P_t > 0 and an irreducible matrix V ≥ 0 for some K > 1 such that F^c(P_t) is not convex.
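The convexity of the boundary curve f in the K = 2 linear case can be confirmed by a finite-difference check. The following sketch (assumed values ε = 0.7, P_t = 3) evaluates the second difference of f on a grid of points x > 0:

```python
# Hypothetical check of the K = 2 linear case (gamma(x) = x): the boundary
# curve omega_2 = f(omega_1) of F(Pt) is convex, so F^c(Pt) is convex.
eps, Pt = 0.7, 3.0

def f(x):
    """Boundary curve of F(Pt) for V = (0 eps; eps 0), z = (1, 1)."""
    return (Pt - x) / (1.0 + 2.0 * eps * x + eps * eps * x * Pt)

xs = [0.1 * i for i in range(2, 25)]
# Second differences of a convex function are nonnegative.
convex = all(f(x - 0.05) + f(x + 0.05) - 2.0 * f(x) >= -1e-12 for x in xs)
print(convex)
```

All second differences are nonnegative, in agreement with f''(x) ≥ 0 derived above.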
Proof. The proof is by contradiction. So, assume that F^c(P_t) is convex for all P_t > 0, K > 1 and all V ∈ X_K. Therefore, as the intersection of convex sets is convex, it follows from (see (2.10))

F^c = ∩_{P_t>0} F^c(P_t)

that F^c is a convex set for all K > 1 and all V ∈ X_K. However, this contradicts Theorem 1.60, and therefore proves the assertion.

Notice that the theorem only deals with the feasibility set when p(ω) is constrained in the l1-norm. When each element of p(ω) is constrained individually, the complement of the feasibility set defined by (2.11) is not convex even if K = 2. Indeed, proceeding essentially as before shows that the boundary curves p₁(ω) = P₁ and p₂(ω) = P₂ are both convex if they are written explicitly as functions of ω₁. However, even though F^c₁(P₁) and F^c₂(P₂) are both convex sets, the set

F^c(P₁, P₂) = (F₁(P₁) ∩ F₂(P₂))^c = F^c₁(P₁) ∪ F^c₂(P₂)

does not need to be convex, as the union of convex sets is not convex in general. This is illustrated in Fig. 2.3. Obviously, the same reasoning applies to hybrid
Fig. 2.3: F(P₁, P₂) is equal to the intersection of F₁(P₁) and F₂(P₂). Thus, F^c(P₁, P₂) is equal to the union of F^c₁(P₁) and F^c₂(P₂), each of which is a convex set if γ(x) = x, x > 0. However, the union of these sets is not convex in general. (The figure shows, in the (ω₁, ω₂)-plane, the boundary curves p₁(ω) = P₁, p₂(ω) = P₂ and ‖p(ω)‖₁ = P_t enclosing the region F(P₁, P₂).)
constraints, in which case neither the feasibility set F(Pt ; P1 , . . . , PK ) given by (2.13) nor its complement is a convex set in general. This immediately follows from (2.13) and the above discussion.
3 Introduction
Wireless networking has been a vibrant research area over the last two decades. During this time, we have observed the evolution of a number of different wireless communications standards that support a wide range of services. They include delay-sensitive applications such as voice and real-time video that usually have strict requirements with respect to quality of service (QoS) parameters such as data rate, delay and/or bit error rate. In such cases, a network designer must ensure that the QoS requirements are satisfied permanently. Data applications, however, may have fundamentally different QoS requirements and traffic characteristics than video or voice applications. In fact, most data applications are delay-insensitive, and therefore may tolerate larger transmission delays. The principal contributor to many of the problems and limitations that beset wireless networks is the radio propagation channel or, simply, the wireless channel. Transmission signals can be severely distorted by the wireless channel, whose parameters such as path delay, path amplitude, and carrier phase shifts may vary with time and frequency. The strict limitation of communication resources such as power and bandwidth is another major design constraint. As a consequence, the wireless channel is error-prone and highly unreliable, being subject to several impairment factors of a transient nature, such as those caused by co-channel interference or multipath propagation. In fact, a unique characteristic of wireless networks, absent in wired networks, is that the channel behavior is a function of the interference level and the location of the subscriber unit. Excessive interference can significantly deteriorate the network performance and waste scarce communication resources. For this reason, strategies for resource allocation and interference management are usually necessary in wireless networks to provide acceptable QoS levels to the users.
The resource allocation problem is significantly aggravated when subscriber units self-configure to form a network without the aid of any established infrastructure. These so-called ad hoc wireless networks have a huge potential for many exciting applications, but also pose new technical challenges.
S. Stanczak et al., Fundamentals of Resource Allocation in Wireless Networks, Foundations in Signal Processing, Communications and Networking 3, 2nd edn., © Springer-Verlag Berlin Heidelberg 2009, DOI 10.1007/978-3-540-79386-1_3
There are different mechanisms for resource allocation and interference management in wireless networks. The most important ones include congestion control, routing, link scheduling and power control [40]. Each of these components of the overall network design can be targeted separately, thereby ignoring important interdependencies between them. Exploiting these interdependencies through a joint optimization of these components may lead to significant performance gains, but may also be too computationally demanding to be of any use in practice. In this book, we mainly focus on the power control problem in static wireless networks and briefly discuss the possibility of combining power control with a hop-by-hop congestion control. Roughly speaking, the power control problem addresses the issue of coordinating the transmit powers of links such that some aggregate utility function of the link rates attains its maximum.
We are convinced that power control will be of great importance for wireless ad hoc networks. Due to the lack of a central network controller in such networks, link scheduling strategies are notoriously difficult to implement. Thus, a reasonable approach is to avoid only strong interference from neighboring links, and then use an appropriate power control policy to manage the remaining interference in a network. As we focus on static wireless networks, transmit powers are to be periodically adjusted to changing channel and network conditions (dynamic power control). This in turn presumes relatively low to moderate network dynamics. In contrast, in highly dynamic wireless networks, one should consider resource allocation schemes for stochastic wireless networks [41, 42, 43, 44, 45, 46, 47, 48]. Potential applications of the power control schemes presented in this book are, for instance, envisaged in wireless mesh networks, where they would control the transmit powers of a fixed number of stationary base stations (mesh routers).
These base stations create a wireless backbone via multi-hop ad hoc networking and have a practically unlimited energy supply.
Early work on power control focused on the so-called max-min SIR balancing problem, where the objective is to maximize the minimum signal-to-interference ratio (SIR). A closely related approach that we call QoS-based power control aims at satisfying given desired SIR levels (SIR targets) with a minimum total transmit power. Both approaches have been extensively studied and are fairly well understood [49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66]. See also [67] for joint power control, scheduling and routing. Optimal power allocations in the sense of QoS-based power control can be found by means of iterative algorithms that allow distributed implementation, provided that the SIR targets are feasible (see for instance [54, 55, 63] and [68] for combined power control and cell-site selection). However, the notion of being able to guarantee quality of service to applications is simply unrealistic in many ad hoc wireless networks [40]. The channel and network dynamics of such networks, coupled with multi-hop routing, make it difficult to satisfy any requirements permanently. In addition, a number of (elastic) data applications such as file transfer or electronic mail do not have such permanent requirements. Such applications can temporarily accept low
QoS levels, so that link QoS can be provided according to some link prices. For these reasons, best-effort or utility-based power control, which aims at maximizing some aggregate utility function of link rates (or other quantities), appears to better fit the needs and characteristics of some wireless communications applications. Utility-based strategies implicitly exploit the relative delay tolerance of data applications as well as the network and channel dynamics to improve the network performance. At the same time, the use of monotonically increasing and strictly concave utility functions ensures the desired degree of (link-layer and end-to-end) fairness [1, 69, 70]. For these reasons, utility-based approaches to resource allocation and interference management in wireless networks have attracted a great deal of attention over the last decade. Theoretical work on utility-based power control or, more generally, cross-layer design includes [71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87]. Notice that the reference list is by no means complete, nor does it include all important publications on this topic. Further references can be found in the papers listed above and in [88]. The power control problem has also been analyzed within the framework of game theory; see for instance [89, 90, 91, 92, 93].
What makes the power control problem difficult to solve in general is the lack of convexity due to the mutual interference. Yet convexity is usually a crucial prerequisite for implementing power control algorithms in practical systems, as this property opens the door to a well-developed theory and efficient solutions. Moreover, if the problem is convex, global convergence of the algorithms can be guaranteed. Based on the theory presented in Chap. 2, we will identify a class of utility functions for which the power control problem can be transformed into a convex optimization problem.
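As a concrete illustration of such utilities, the family most often cited in the fairness literature referenced above (e.g., [1, 69, 70]) is the alpha-fair family. The sketch below is ours and is not necessarily the utility class developed later in the book; it merely shows a monotonically increasing, strictly concave utility of a link rate.

```python
import numpy as np

def alpha_fair_utility(rate, alpha):
    """Alpha-fair utility of a (positive) link rate: strictly concave and
    increasing for alpha > 0. alpha = 1 gives the logarithmic, proportionally
    fair utility; large alpha approaches max-min fairness."""
    rate = np.asarray(rate, dtype=float)
    if alpha == 1.0:
        return np.log(rate)
    return rate ** (1.0 - alpha) / (1.0 - alpha)

# Aggregate utility of two link rates under proportional fairness (alpha = 1):
rates = [1.0, 2.0]
print(alpha_fair_utility(rates, 1.0).sum())  # log(1) + log(2) = 0.693...
```

The strict concavity of each summand is what discourages starving any single link in an aggregate-utility maximization.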
The new utility functions differ from traditional ones but, under a standard rate model in wireless systems, they are still monotonically increasing and strictly concave functions of the link rates.
This part of the book is structured as follows. Chap. 4 introduces the network and system model, which includes a brief description of the medium access control (MAC) layer and detailed information about the physical layer. We consider two examples of wireless networks to illustrate the definitions. Chap. 5 formulates the problem of resource allocation in communications networks. Based on some existing approaches to rate control in wired networks, we formulate the utility maximization problem for elastic traffic in wireless networks, which then gives rise to a utility-based power control problem (Sect. 5.2). Sect. 5.5 deals with QoS-based power control, and Sect. 5.6 addresses the problem of characterizing max-min SIR-balanced power allocations. The problem of utility-based power control with QoS support is considered in Sect. 5.7. Some remarks on joint power and receiver control complete the chapter.
4 Network Model
4.1 Basic Definitions
A wireless communications network is a collection of nodes capable of communicating with each other over wireless communications links. Let Nt := {1, . . . , Nt} be the set of nodes (the subscript t in Nt stands for "total"), and let (n, m) with n ≠ m represent a wireless link from node n ∈ Nt to node m ∈ Nt. We say that there is a wireless link (n, m) with n ≠ m if both (i) node n is allowed to transmit data to node m, and (ii) a minimum signal-to-noise ratio (SNR), necessary for successful transmission at some minimum data rate, can be achieved on link (n, m) in the absence of interference and with the transmit power on this link subject to some power constraints. It is reasonable to assume that wireless links are bidirectional in the sense that (n, m) exists if and only if (m, n) exists. We label the links (in any particular way) by the integers 1, 2, . . . , L and use L = {1, . . . , L} to denote the set of all wireless links.1 In the literature, the pair (Nt, L) is often referred to as the network topology and can also be used to denote the associated network topology graph, which is an undirected graph in which a vertex corresponds to a node in the network and an edge between two vertices represents a wireless link between the corresponding nodes.
We use an on/off flow model by which messages are characterized by a sequence of bits flowing into the network at a given rate [2]. Successive message arrivals are separated by random durations (inter-arrival times) in which no flow enters the network. Messages are generated by some (message) sources and originate at so-called source nodes, where they are usually broken into shorter strings of bits called packets. We assume that there are S sources/flows (packet streams) represented by S = {1, 2, . . . , S}, each flow having a unique
1 If it is necessary to specify which nodes are connected by link l ∈ L, then we write l = l(n, m) when l is a wireless link from node n to node m.
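The bidirectionality convention and the undirected topology graph can be made concrete with the example network used later in Fig. 4.1. The following sketch (pure Python; the link list is taken from that figure) checks the "(n, m) exists iff (m, n) exists" property and collapses the directed link pairs into undirected edges:

```python
# The 10 directed wireless links of the example network in Fig. 4.1.
wireless_links = [(1, 2), (2, 1), (2, 3), (3, 2), (2, 5), (5, 2),
                  (3, 4), (4, 3), (4, 5), (5, 4)]

# Bidirectionality: (n, m) exists if and only if (m, n) exists.
assert all((m, n) in wireless_links for (n, m) in wireless_links)

# The undirected network topology graph: one edge per unordered node pair.
edges = {frozenset(link) for link in wireless_links}
print(sorted(tuple(sorted(e)) for e in edges))
# [(1, 2), (2, 3), (2, 5), (3, 4), (4, 5)]
```

The 5 resulting edges connect the Nt = 5 nodes of the example.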
origin and destination. The packets are passed from node to node to their destinations according to some routing protocol. On the way to a destination, the packets of a single flow can take different routes (Fig. 4.1), but every packet is transmitted once on exactly one route (multiple copies of the same packet cannot be transmitted along different routes). We further assume that no packets travel in a loop and that, for every flow, there is at least one path (a sequence of connected links) from the source node to the destination node. All nodes (including source and destination nodes) may act as relays, with packets being losslessly decoded and encoded at each relay (decode-and-forward relaying). No packets are dropped, so that there is no packet loss, due to some rate adaptation and coding mechanisms. The expected traffic (data rate), in nats per time unit, of flow s ∈ S is denoted by νs; it is referred to as source rate or flow rate.2 Note that in this book, (end-to-end) flow rate and source rate mean the same and are used interchangeably. There are no special demands on the arrival statistics, except that the traffic should not be very bursty. Indeed, some form of power control and link scheduling usually helps to improve the network performance in the case of continuous data streams or long packet bursts.
Remark 4.1. We point out that in this book, the physical links and the traditional notion of the network topology graph play minor roles. Also, as network aspects such as routing and congestion control are not the focus of this book, the notions of (end-to-end) flows and source nodes are used sporadically. In fact, some of the above basic definitions are needed only in Sects. 5.1 and 5.2.3, where we provide a brief overview of end-to-end rate control in wired and wireless networks, respectively.
The reader will realize that, in the context of resource allocation in wireless networks, other graphs are more important, such as the interference graph or graphs constructed from logical links and logical queues (see, for instance, Sect. 5.9.1). Furthermore, we are not so much interested in nodes and, in particular, source nodes, but rather in the nodes at which logical links originate. This is because the total transmit power of the links originating at the same node is limited by the maximum transmit power that can be delivered by the corresponding amplifier (see also Sect. 4.3.3). Thus, the following definitions are probably more significant for the subjects covered by this book.
The flows share wireless links by competing for access to wireless resources such as power, time and frequency. If routes are fixed, nodes along the flow paths maintain per-flow queuing, thereby establishing a number of logical links on the wireless links, with each logical link associated with a flow. Without loss of generality, assume that there are K logical links labeled by the natural numbers from 1 to K. Connections over logical links are referred to as MAC (medium access control) layer flows, being one-hop flows between neighboring nodes. With each logical link, there are associated one logical transmitter and one
2 For the definition of a "nat" as a statistical unit of information, see Remark 4.14.
logical receiver equipped with an encoder and a decoder for encoding and decoding the data, respectively. The data to be sent over a logical link is stored in a distinct logical queue (per-flow queuing). So, each logical link can be viewed as connecting its logical queue with the corresponding downstream logical queue. Let K = {1, . . . , K} be the set of these links labeled in such a way that the set of logical links originating at node n ∈ N ⊆ Nt is

$K(n) = \Bigl\{ \sum_{j=1}^{n-1} |K(j)| + 1, \; \ldots, \; \sum_{j=1}^{n} |K(j)| \Bigr\}. \qquad (4.1)$
Here and throughout the book, N is the set of those nodes at which at least one logical link originates, and |K(n)|, n ∈ N, denotes the cardinality of K(n) ⊆ K, with |K(0)| = 0, |K(n)| ≥ 1 for each n ∈ N and ∪n∈N K(n) = K (see Fig. 4.1). In other words, the logical links originating at node 1 are labeled from 1 to |K(1)|, those at node 2 from |K(1)| + 1 to |K(1)| + |K(2)|, and so forth. Let us refer to the nodes in N as origin nodes. Without loss of generality, we can assume that there are N, N ≤ Nt, origin nodes and N = {1, . . . , N}, so that Nt \ N = {N + 1, . . . , Nt}. Finally, in order to simplify the notation, we define Kk ⊂ K to be the set Kk := K \ {k}.
Remark 4.2. For convenience, if there is no risk of misunderstanding, we frequently omit the word "logical" in this book when referring to logical transmitters, logical receivers and logical links. So, unless otherwise stated, transmitters, receivers and links should be understood as logical elements in abstract structures.
It is important to point out that L (and hence also the network topology) may change over time due to the mobility of nodes or other time-varying factors. However, these variations are usually on a much larger time scale than frame intervals, and are therefore neglected in this book. Actually, for the theory and algorithms presented here, it is essential that the radio propagation channel remains constant for the duration of a frame interval, with transitions between different channel states occurring at the frame boundaries. At the beginning of every frame, the transmit powers are adjusted to the changed channel and network conditions.
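The labeling rule (4.1) assigns each origin node a contiguous block of link indices, given the cardinalities |K(1)|, . . . , |K(N)|. A minimal sketch (the function name is ours, and the split |K(3)| = 1, |K(4)| = 2 for the Fig. 4.1 example is an assumption for illustration):

```python
def logical_link_sets(cardinalities):
    """Index sets K(n) per Eq. (4.1): node n's logical links are labeled from
    sum_{j<n} |K(j)| + 1 through sum_{j<=n} |K(j)| (1-based, as in the book)."""
    sets, offset = {}, 0
    for n, size in enumerate(cardinalities, start=1):
        sets[n] = set(range(offset + 1, offset + size + 1))
        offset += size
    return sets

# Example consistent with Fig. 4.1 (K = 6 logical links, N = 4 origin nodes):
K_sets = logical_link_sets([1, 2, 1, 2])
print(K_sets[1], K_sets[2])  # {1} {2, 3}, matching K(2) = {2, 3} in Fig. 4.1
```

By construction the blocks are disjoint and their union is {1, . . . , K}, as required.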
4.2 Medium Access Control The purpose of data link control (DLC) is to provide reliable data transfer across the physical link. To this end, the DLC layer places some overhead control bits at the beginning of each packet and some more overhead bits at the end of each packet, resulting in a longer string of bits called a frame. These
Fig. 4.1: There are Nt = 5 nodes represented by Nt = {1, 2, 3, 4, 5} and 10 wireless links: (1, 2), (2, 1), (2, 3), (3, 2), (2, 5), (5, 2), (3, 4), (4, 3), (4, 5), (5, 4). The wireless links are not numbered in the figure. Two flows entering the network at source nodes 1 and 3 (S = {1, 3}) and destined for node 5 establish 6 (logical) links K = {1, 2, 3, 4, 5, 6}. For instance, the (logical) links (or MAC layer flows) originating at node 2 are 2 and 3, so that we have K(2) = {2, 3}. These links share wireless link (2, 5). There are N = 4 origin nodes represented by N = {1, 2, 3, 4}. The flow rates are ν1 and ν2. Packets of flow 2 take two different routes to their destination, node 5.
overhead bits determine whether an error has occurred during the transmission and, if errors occur, they request retransmissions. These bits also determine where one data frame ends and the next one starts (framing). Another important component of the DLC layer is medium access control (MAC). It is often considered as the lower layer (the MAC layer) of the DLC layer. The MAC protocols dictate how different links (MAC layer flows) share the available communication resources such as power and bandwidth. Methods for dividing the spectrum into different channels (so-called channelization) and assigning them to different links include time division multiple access (TDMA), frequency division multiple access (FDMA), code division multiple access (CDMA), and space division multiple access (SDMA), where the latter is usually used in combination with TDMA or CDMA.3 Obviously, hybrid combinations of all these methods are also possible.
FDMA is the oldest way for multiple radio transmitters to share the radio spectrum. Here, each transmitter is assigned a distinct frequency channel so that receivers can discriminate among them by tuning to the desired channel. However, FDMA is inflexible and inefficient in handling flows with different bit rates. It would be necessary to modify FDMA so as to allocate frequency bands of different bandwidths to different links to accommodate differences in bit
3 Note that here the term "multiple access" refers to any situation where different logical transmitters (including those located at the same node) access the wireless channel.
requirements. This requires simultaneous demodulation of multiple channels in different frequency bands, which is not a practicable solution.
In the case of TDMA, time is divided into non-overlapping time slots. Each link is assigned one or multiple time slots such that only one link is active at any time. TDMA is more flexible than FDMA in handling flows of various bit rates, but does not necessarily do this efficiently. The main difficulty with TDMA is the need for very accurate synchronization. The efficiency of both FDMA and TDMA can be significantly improved by means of spatial reuse and dynamic allocation of bandwidth in terms of frequency or time. However, this requires a lot of coordination between nodes, which is difficult to achieve in networks without a fixed infrastructure. The problem can sometimes be alleviated by introducing a temporary hierarchical infrastructure in which some nodes take over the role of local network controllers. However, such approaches still generate a lot of overhead traffic and therefore waste scarce wireless resources.
Unlike FDMA and TDMA, in code division multiple access (CDMA), the signal of every link occupies the entire frequency band at the same time. Each signal is modulated by a distinct signature sequence in such a manner that the receivers are able to separate out the different links. To ensure a sufficiently low interference level, the signature sequences must have good correlation properties. Often it is desired that some sequences be mutually orthogonal. However, establishing and maintaining orthogonality in wireless networks is quite a tricky task. This is particularly true for fully asynchronous multipath channels, in which case a complete elimination of multiple access and intersymbol interference requires an allocation of signature sequences with zero aperiodic correlation side-lobes.4 In fact, it was shown in [94] that there are no such sequences in finite-dimensional complex spaces.
Yet orthogonality can be established if all signals are at least coarsely synchronized at all receivers. One can also assign mutually orthogonal signature sequences to links originating at the same node. The problem is, however, that the number of mutually orthogonal sequences is strongly limited, making the reuse of sequences, and hence coordination between nodes, necessary. The maintenance of a coarse synchronization between different nodes may also be a problem. Consequently, the use of nonorthogonal sequences with relatively good correlation properties (semi-orthogonal sequences) appears to be a better strategy for some real-world applications. The advantage of this approach is that there is no need for precise synchronization between links originating at different nodes. Moreover, little coordination is necessary if the number of sequences is relatively large. If the set is large enough, nodes can even pick sequences randomly from a given set of sequences, with a low probability of choosing the same sequence. However, it should be noticed that there is a fundamental trade-off between the number of available signature sequences and their correlation properties.
4 Aperiodic cross-correlations and autocorrelations not at the origin are referred to as aperiodic correlation side-lobes.
As a result, when semi-orthogonal sequences are used, the number of simultaneously active links is interference-limited. This means that the more links are active at the same time, the higher the level of interference, which in turn leads to a performance degradation of all links. In the presence of interference, the network performance can be significantly improved by taking advantage of power control in combination with some link scheduling. These are the two central mechanisms for resource allocation and interference management. Roughly speaking, whereas a link scheduling policy chooses the groups of links that are to be activated at the same time, the power control part controls the interference level at the links by adjusting their transmit powers so as to achieve some sharing objectives. In order to achieve the best performance, power control and link scheduling should be optimized jointly. This problem, however, is in general notoriously difficult to solve even in a centralized manner, not to mention the implementation of such policies in ad hoc wireless networks. To the best of our knowledge, there is no efficient distributed mechanism for assigning a number of time slots to different wireless links. For these reasons, heuristic algorithms for link scheduling are quite common. A popular approach is to schedule neighboring links in different subframes, which is based on the common assumption that concurrent transmissions on neighboring links will (with high probability) cause strong interference.
Our main focus in this book is on the power control problem for groups of interfering links that share the entire frequency band. We can assume any fixed link scheduling policy (in the time domain), including a suitable collision avoidance mechanism (see the remark in Sect. 4.3.1). Of course, pure TDMA is not of interest here since then the power control problem is trivial.
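The aperiodic correlation side-lobes of footnote 4 can be inspected numerically: they are the off-center entries of the full linear correlation of two sequences. A sketch (NumPy; the two length-4 real sequences are illustrative):

```python
import numpy as np

def aperiodic_correlation(a, b):
    """Aperiodic (linear) correlations of a and b at all lags -(N-1)..N-1.
    For real-valued sequences this is np.correlate in 'full' mode."""
    return np.correlate(a, b, mode="full")

a = np.array([1.0, 1.0, 1.0, 1.0])
b = np.array([1.0, -1.0, 1.0, -1.0])

auto = aperiodic_correlation(a, a)    # lag 0 sits at the center index
cross = aperiodic_correlation(a, b)
print(auto)   # [1. 2. 3. 4. 3. 2. 1.]: side-lobes (off-center entries) are nonzero
print(cross)  # center (lag-0) value is 0, i.e. orthogonal, yet side-lobes remain
```

Even these two orthogonal sequences have nonzero aperiodic side-lobes, illustrating (not proving) the impossibility result of [94].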
However, it should be emphasized that the theory presented in this book does not apply exclusively to CDMA-based networks. Interference may occur in any network with spatial reuse of resources. This is for instance true when multiple antennas are used to spatially separate different signals (see the example in Sect. 4.3.5).
4.3 Wireless Communication Channel At the physical layer, frames are split up into shorter strings of bits and transmitted via the wireless channel. The function of the physical layer is to provide wireless links to the DLC layer while satisfying some quality of service requirements with respect to the bit loss and rate. To achieve this, there is a certain arrangement of several components on each side of a radio propagation channel (called a wireless channel) such as modulators, amplifiers, filters and mixers. The wireless channel distorts transmit signals in a way that can vary with time, frequency and other system conditions. These channel distortions are usually of a random nature, and thus cannot be exactly predicted. In general, time-varying channel conditions are commonly
referred to as fading, of which different types can be identified [95, 96]: multipath fading, frequency-selective fading, frequency-flat fading, time-selective fading, etc. Worse still, wireless links share the available wireless channel, making each link prone to interference from other links. All this implies that the capacity of wireless links exhibits an ephemeral and dynamic nature, depending on both the wireless channel condition and the transmit powers of all interfering links.
There is a huge amount of literature on physical layer methods for improving the overall network performance (see [97] and references therein). These methods include different multi-user and multiple antenna techniques whose purpose is to combat interference from other links as well as to mitigate the detrimental impact of the wireless channel. Such techniques may be used to make physical links robust against interference and channel variations, thereby simplifying the network design significantly.5 However, from the standpoint of practical design, there are some important disadvantages as well. First of all, most of these techniques may entail a significant increase in control traffic due to the increased demand for global information. So far, it is not clear whether the benefits outweigh the additional costs and complexity. Another important problem is an increased sensitivity of these methods to, for instance, imperfect channel state information. For these reasons, many of these techniques have not found wide usage in contemporary wireless networks, especially those that operate without a central network controller.
Throughout this book, each (logical) link is a point-to-point communication link equipped with a linear receiver, followed by a single-user decoder (see Sect. 4.3.2). The interference at the output of each receiver or, equivalently, at the input to each decoder, is treated as noise.
We adopt a block fading channel model in which the radio propagation channel holds its states for the duration of some frame interval, with transitions occurring on frame boundaries. This is a reasonable approximation for radio propagation channels where the channel coherence time is longer than the duration of each fading block. For the sake of clarity, the channel is assumed to be frequency-flat, which, roughly speaking, means that the channel affects the received amplitudes but does not distort the signal waveform. This is modeled by multiplying each transmit signal by a complex number, called a channel coefficient. Note that in contrast to that, frequency-selective fading affects the signals in both amplitude and shape. At some points in the book, it is also assumed that symbol epochs6 of all links are perfectly synchronized at all receivers in the sense that their beginnings and ends coincide. In other words, perfect synchronization means that symbol epochs of all links are perfectly aligned at each receiver.
5 For instance, if the signal bandwidth is large enough, the network design is significantly simplified when each link employs a zero-forcing receiver to create a wireless network with mutually orthogonal links (see the remarks in Sect. 4.3.2).
6 Slot or symbol epoch is defined below, after Remark 4.3.
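The block fading model just described can be sketched in a few lines: one complex channel coefficient per frame, redrawn only at frame boundaries. The Rayleigh draw and the BPSK alphabet below are assumptions made for illustration; the model itself only requires block-wise constancy of the channel.

```python
import numpy as np

rng = np.random.default_rng(0)

def received_frames(num_frames, symbols_per_frame, noise_std=0.1):
    """Frequency-flat block fading: within a frame, every transmit symbol is
    multiplied by the same channel coefficient h; h changes only at frame
    boundaries. Returns one received-signal vector per frame."""
    frames = []
    for _ in range(num_frames):
        # One coefficient per frame (Rayleigh fading, an assumed distribution).
        h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
        x = rng.choice(np.array([1.0 + 0j, -1.0 + 0j]), size=symbols_per_frame)
        noise = noise_std * (rng.standard_normal(symbols_per_frame)
                             + 1j * rng.standard_normal(symbols_per_frame)) / np.sqrt(2)
        frames.append(h * x + noise)  # flat fading: h scales, never reshapes, x
    return frames

y = received_frames(num_frames=3, symbols_per_frame=8)
print(len(y), y[0].shape)  # 3 (8,)
```

Note that multiplying by a single scalar h is exactly the "received amplitudes are affected, waveform is not distorted" statement above; a frequency-selective channel would instead convolve x with an impulse response.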
Remark 4.3 (An important remark concerning the above assumptions). It is emphasized that, unless otherwise noted, perfect synchronization and channel flatness are not presumed in this book. Both assumptions are maintained in the following derivations, as well as at some other points throughout the book, only for the sake of clarity. Also, they help to bring out the essential features of some solutions.
The common frame interval B of length TF = 1 is partitioned into M disjoint intervals T(1), . . . , T(M) (called slots, symbol intervals or symbol epochs) such that

$\forall_{m \in \{1,\ldots,M\}} \; T(m) = [(m-1)T,\, mT) \subset B, \qquad \bigcup_{m=1}^{M} T(m) = B\,.$
The slot length T is chosen such that M = TF/T is large enough to ensure statistical significance (see the next section). In general, given any slot m, each link can be either active or idle. If link k ∈ K is active, then it transmits an information-bearing symbol, or simply data symbol, Xk(m) ≠ 0 that is chosen from some given finite subset of C and carries (or represents) a certain amount of data bits.7 Otherwise, if the link is idle, then no symbol is transmitted. The following two assumptions are crucial for the theory in this book.
(C.4-1) There is at most a single data stream per link so that, in any slot, each link transmits at most one data symbol. Thus, in general, the results do not apply to communication systems using frequency or spatial multiplexing methods, where multiple data streams are transmitted over one link simultaneously (see also Remark 4.4 below).
(C.4-2) Xk(m), 1 ≤ m ≤ M, is a zero-mean independent random variable chosen from a given finite set (alphabet). For convenience, we assume that if link k is idle in slot m, then Xk(m) carries no data bits and Xk(m) = 0 with probability one. Note that if the link is active, then we have Xk(m) ≠ 0 with probability one.
Moreover, unless otherwise stated, it is, in addition to (C.4-2), assumed that
(C.4-3) for each k ∈ K, all possible symbol streams {Xk(1), . . . , Xk(M)} are equiprobable and all symbols Xk(1), . . . , Xk(M) represent the same number of data bits, which depends on the choice of modulation order. Thus, if link k is idle (in some slot), the only possible symbol stream under the equiprobability assumption is {0, . . . , 0}, which carries no data bits.
7 Throughout the book, we neglect the transmission of control symbols such as pilot or synchronization symbols.
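Conditions (C.4-2) and (C.4-3) translate directly into a generative sketch: an active link draws i.i.d. equiprobable symbols from a finite alphabet, while an idle link emits zeros with probability one. The BPSK alphabet is our assumption for illustration; any zero-mean finite alphabet fits the conditions.

```python
import numpy as np

rng = np.random.default_rng(1)

def symbol_stream(M, active, alphabet=(1.0 + 0j, -1.0 + 0j)):
    """Data symbols X_k(1..M) of one link per (C.4-2)/(C.4-3): zero-mean,
    i.i.d., equiprobable over a finite alphabet if the link is active;
    X_k(m) = 0 with probability one if the link is idle."""
    if not active:
        return np.zeros(M, dtype=complex)
    return rng.choice(np.asarray(alphabet, dtype=complex), size=M)

x_active = symbol_stream(6, active=True)
x_idle = symbol_stream(6, active=False)
print(np.all(x_active != 0), np.all(x_idle == 0))  # True True
```

This also makes condition (C.4-1) explicit: each call produces a single symbol stream per link, never parallel streams.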
By (C.4-1), the results are not applicable to general orthogonal frequency division multiplexing (OFDM) systems and multiple antenna systems that employ spatial multiplexing techniques or even some diversity gain techniques [98] (see also the example of a multiple-antenna wireless network in Sect. 4.3.5). However, it is emphasized that Condition (C.4-1) does not exclude communication systems where multiple distinctly weighted copies of a data symbol are transmitted simultaneously via the wireless channel and combine coherently at the receiver side (see also Sect. 4.3.5). The copies of a data symbol are called transmit symbols.
Remark 4.4. Condition (C.4-1) can be somewhat relaxed to include communication systems where the data rate or another quality of service parameter of interest is a monotonic function of a single performance measure that can be brought into the same form as the signal-to-interference ratio defined by (4.4). This is for instance true in the case of Alamouti space-time coded multiple antenna systems (see Remark 4.17). However, since in most practical situations the overall performance of a link with multiple parallel data streams is not of the form given by (4.4), (C.4-1) is assumed throughout the book.
Conditions (C.4-2) and (C.4-3) imply that the data symbols of each link are zero-mean independent identically distributed (i.i.d.) random variables chosen from some finite set.8 Whereas the assumption of independence in (C.4-2) holds throughout the book, (C.4-3) is not true when link scheduling is involved (see Sects. 5.2.1 and 5.4). In this case, for any k ∈ K, the distribution of Xk(m) may vary over m, and with it the number of data bits carried by the symbol, due to a change of modulation type or, simply, due to the fact that the links may be active only for a fraction of the frame interval. However, in this chapter, both (C.4-2) and (C.4-3) are assumed to hold.
Remark 4.5.
It is important to emphasize that (C.4-2) is usually not satisfied in practice, where some dependencies between different symbols are introduced by error-control coding. However, as discussed in [97, pp. 37–38], it is perfectly reasonable to study and compare different transmission strategies without error-control coding. In particular, the signal-to-interference ratio for independent data symbols (see the next section) is widely used in error-control coded systems as a basis for choosing an appropriate modulation order and code rate. For more details we refer the reader, for instance, to [99] and Sect. 4.3.4.
Each symbol modulates a fixed link-specific and square integrable waveform with support [0, T] in such a way that the transmission signal modulated by the mth symbol is entirely contained in the time slot T(m). This, together with perfect synchronization, the flat fading assumption and (C.4-2), implies that the entire information about Xk(m), k ∈ K, at the kth receiver (the receiver of link k) is contained in time slot T(m).
8 Note that if a link is idle, then its symbols are zero-mean independent random variables that are equal to zero with probability one.
4 Network Model
Remark 4.6. The reader should be careful to observe that if a signal is time-limited (as assumed above), it cannot be band-limited. In such a case, one takes B > 0 as the bandwidth of a time-limited signal if "most" of its energy is contained in (−B, B) (see also the discussion in [97, p. 7]).

Each receiver observes a superposition of all transmit signals corrupted by additive white Gaussian noise (also called background noise). Let yk(t) ∈ C, t ∈ T(m), be the received signal of link k in slot m, and let ck(t) ∈ C, t ∈ [0, T), be a given square integrable function associated with this link, which is called the kth receiver (receiver k) or receiver of link k. By far the most prominent example of ck(t) is the so-called matched-filter receiver [97, 96] (see also the following subsections). Given an arbitrary slot 1 ≤ m ≤ M, the received signal yk(t) in the interval T(m) is projected on ck(t − (m − 1)T) to give

X̂k(m) = ∫_{T(m)} ck(t − (m − 1)T) yk(t) dt < +∞,  1 ≤ m ≤ M    (4.2)

which is bounded due to the Cauchy–Schwarz inequality and because

∀m ∀k:  ∫_{T(m)} |yk(t)|² dt < +∞  and  ∫_0^T |ck(t)|² dt < +∞.
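Numerically, the projection (4.2) amounts to a correlation of sampled signals. The following sketch illustrates how a soft-decision variable is recovered from one slot; the rectangular waveform, sampling rate, symbol value and noise level are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Discrete-time sketch of the projection (4.2): the received signal of one
# slot is correlated with the receiver waveform c_k(t).  All numbers below
# (waveform shape, rate, symbol, noise level) are made-up for illustration.
T, fs = 1.0, 1000                       # slot length and sampling rate
t = np.arange(0, T, 1 / fs)
c = np.ones_like(t) / np.sqrt(T)        # unit-energy rectangular receiver waveform
X = 1.0 - 1.0j                          # transmitted data symbol X_k(m)
rng = np.random.default_rng(0)
noise = 0.1 * (rng.standard_normal(t.size) + 1j * rng.standard_normal(t.size))
y = X * c + noise                       # flat-fading, perfectly synchronized slot

# Riemann-sum approximation of the integral in (4.2); the waveform is
# conjugated, consistent with the inner-product notation used in Sect. 4.3.2
X_hat = np.sum(np.conj(c) * y) / fs
```

With a unit-energy waveform, X̂k(m) equals Xk(m) up to the projected noise, which is why the soft-decision variable can drive the decoder directly.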
It is pointed out that, due to the assumption of perfect synchronization of the receiver (see also Remark 4.3 and the discussion preceding this remark), the received signal in (4.2) is integrated exactly over the interval T(m). The quantities X̂k(1), . . . , X̂k(M) are referred to as soft-decision variables and are used to decode the data symbols Xk(1), . . . , Xk(M) transmitted on link k. In this book, "soft-decision variable" and "receiver output" have the same meaning and thus are used interchangeably. Intuitively, it is desired that X̂k(m) be as close to Xk(m) as possible (at least on average) with respect to a suitable measure. In all that follows, it is assumed that if {Xk(m)}m∈{1,...,M} is a realization of an independent stationary ergodic process, for every 1 ≤ k ≤ K, then yk(t) is a realization of a wide-sense stationary and ergodic stochastic process. Note that while wide-sense stationarity implies that the first and second moments of yk(t) do not vary over time [100, pp. 183–184], ergodicity ensures that the time averages of the first and second moments are equal to their ensemble averages for any t.

4.3.1 Signal-to-Interference Ratio

A widespread and useful figure of merit in wireless systems is the signal-to-interference ratio9 (SIR). The SIR is equal to the ratio of the expected received
9 Sometimes, to emphasize the existence of the additive white background noise, one also says the signal-to-interference-plus-noise ratio. In this book, we drop "plus-noise" for brevity.
4.3 Wireless Communication Channel
power of the desired data signal to the expected received power of all interfering components at a soft-decision variable. Throughout the book, we assume that there is a bijective, and thus one-to-one, relationship between the quality-of-service (QoS) parameter of interest (e.g., rate or delay) and the SIR at the receiver output. Thus, meeting some QoS requirements is equivalent to achieving certain SIR levels, which is a reasonable assumption in many cases of practical interest (see also Sects. 4.3.4 and 5.3). This is particularly true when the SIR is used in conjunction with error-control decoders [97, p. 120], since the SIR at a soft-decision variable reflects the reliability of decisions. As a result, there is a strong or even monotonic dependence between SIR and QoS parameter values. Moreover, if the linear receiver is followed by single-user decoders, one for each user, then the information-theoretic rate is a strictly increasing function of the SIR [101]. It should be emphasized that here the rate is the single-user rate, and the SIR is the signal-to-interference ratio of the desired user whose rate is being considered. To define the SIR, suppose that (C.4-2)–(C.4-3) are satisfied so that the data symbols Xk(1), . . . , Xk(M) are i.i.d. random variables chosen from some (finite) set. Moreover, let

E[Xk(m)] = 0  and  E[|Xk(m)|²] = pk,  1 ≤ m ≤ M, 1 ≤ k ≤ K.    (4.3)
Then, the SIR at the kth soft-decision variable or, equivalently, at the output of the kth receiver is independent of m (due to the wide-sense stationarity) and is defined to be

SIRk(p) := Vk pk / ( ∑_{l∈K} pl Vk,l + σk² ) = pk / ( ∑_{l∈K} pl (Vk,l/Vk) + σk²/Vk ),  1 ≤ k ≤ K.    (4.4)
The notation in (4.4) is defined as follows.

• pk ≥ 0 defined in (4.3) is called the transmit power of link k. We use

p = (p1, . . . , pK) ∈ R₊^K    (4.5)

to denote a vector of transmit powers, referred to as power vector or power allocation.
• Vk > 0 depends on the path attenuation on link k as well as on the allocated wireless spectrum, the state of the wireless channel and various other system parameters.
• Vk,l ≥ 0, l ≠ k, is influenced by the path attenuation between transmitter l and the output of receiver k. In other words, if the transmit power of link l is pl, then the expected interference power caused by this link at the output of receiver k ≠ l is pl Vk,l. As before, Vk,l depends on the spectrum allocation, receiver structure, the channel state and various other system parameters.
• Vk,k ≥ 0 captures the effect of self- and intersymbol interference, which may occur, for instance, due to the time-dispersive nature of the wireless channel.10 In many cases of practical interest, it is reasonable to assume that Vk,k = 0 for every 1 ≤ k ≤ K.
• σk² > 0 is the variance of the Gaussian (background) noise at the kth receiver.

It is important to point out that a key feature of the definition (4.4) is the implicit assumption that the interference power at a soft-decision variable is a linear combination of all transmit powers with some given nonnegative coefficients. Thus, the power of all interfering components at the receiver output is equal to a weighted sum of all transmit powers with nonnegative weights plus the noise power. The term Vk,k pk ≥ 0 reflects the interference power at the output of receiver k due to self-interference. The contribution to the
total interference power caused by multiple access interference is given by ∑_{l∈K\{k}} Vk,l pl ≥ 0. Unless otherwise stated, it is assumed throughout the book that

(C.4-4) no self-interference is present, that is, Vk,k = 0 for each k ∈ K or, equivalently, trace(V) = 0, where the matrix V is defined below by (4.6).

In fact, from the mathematical point of view, self-interference presents no additional challenge. In particular, the algorithms presented in Chap. 6 apply to systems with self-interference as well.

Remark 4.7. Note that we usually have Vk,l ≠ Vl,k. If Vk,l = 0, then link k is said to be orthogonal to link l, as link k perceives no interference from link l. If Vk,l = Vl,k = 0, then the links k and l ≠ k are said to be mutually orthogonal. In general, there may be constraints on the concurrent activation of links because, for instance, nodes are not able to receive and transmit simultaneously unless they are equipped with separate radio interfaces. This situation can be captured in the analysis by making both Vk,l and Vl,k sufficiently large for some k ≠ l, so that power control is discouraged from allocating resources to both these links, as doing so has a detrimental impact on the network performance. It is, however, not clear how to implement such power control schemes in practical systems where links or nodes cannot transmit and receive at the same time.

Remark 4.8. In practice, strong interference is often avoided using an appropriate protocol since, if strong interference (collision) occurs, packets are corrupted and may have to be retransmitted. The so-called collision avoidance protocols may be seen as a part of a link scheduling policy (see Sect. 5.2.1). They are however beyond the scope of this book. In fact, the theory presented here is independent of additional mechanisms at the DLC layer such as collision avoidance, meaning that our study includes arbitrary levels of interference powers.
Here, we also point out that strong interference can sometimes be handled by appropriate receiver design (see Sect. 4.3.2).

10 Obviously, in this case, the channel is not frequency-flat as was assumed in the discussion leading to (4.2). See also Remark 4.3.
It is convenient to write the ratios Vk,l/Vk in a matrix V such that

(V)k,l = vk,l = Vk,k/Vk if k = l,  and  vk,l = Vk,l/Vk if k ≠ l.    (4.6)

In a broader sense, the matrix V represents the effective state of the wireless channel. The quantities Vk > 0, Vk,l ≥ 0, l ∈ K, k ∈ K, are called power gains, so that V is referred to as the gain matrix. Let us refer to Vk as the signal power gain of link k and to Vk,l as the interference power gain between transmitter l and receiver k. The entries vk,l ≥ 0 of the gain matrix defined by (4.6) can be interpreted as effective power gains. The vector of effective noise variances

z = (σ1²/V1, . . . , σK²/VK) > 0    (4.7)

is called the (effective) noise vector. Here the word "effective" can be used to emphasize that z relates the noise variances to the corresponding signal power gains, in consideration of potential noise enhancement by the receivers (see Sect. 4.3.2). We see from (4.4) that the SIR depends on both V and z, which in turn depend on the receivers and other system parameters. Using this notation, the SIR at the output of receiver k can be written as

SIRk(p) = pk / Ik(p) = pk / (Vp + z)k,  Ik(p) = ∑_{l∈K} vk,l pl + zk    (4.8)
where Ik : R₊^K → R₊₊ is an interference function. This function is called an affine interference function, as it is affine in the power vector. Although affine interference functions are not the only ones which can be encountered in real-world wireless networks, they are definitely the most common ones. A more general framework of interference functions is presented in Sect. 5.5.2, and a slight generalization of affine interference functions is considered in Sect. 6.8. Unless otherwise stated, however, we assume an affine interference function in the definition of the SIR throughout the book. In this case, we have the following simple observation, which is stated for completeness and is an immediate consequence of the fact that the Hessian of (4.4) exists and is continuous on R₊^K (see also App. B.1).

Observation 4.9. For each k ∈ K, the function R₊^K → R₊ : p ↦ SIRk(p) is twice continuously (Gateaux) differentiable (see Definition B.16).

The primary message the reader should take away from this section is that the SIR at the output of receiver k depends in general not only on the transmit power of link k, but also on the transmit powers of all other links. Furthermore, due to the typically large number of slots in a frame, as well as the ergodicity of the received signal, it is reasonable to assume that the SIR at every receiver output is equal to the ratio of the respective time-average signal power to the time-average interference power, with the averages taken over all symbols transmitted in a given frame interval.
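For concreteness, the affine model (4.8) can be evaluated in a few lines. The gain matrix, noise vector and power vector below are made-up numbers chosen only to illustrate the computation:

```python
import numpy as np

# Illustrative 3-link example of (4.8); V, z and p are made-up values.
V = np.array([[0.0, 0.1, 0.2],     # effective power gains v_{k,l}; the zero
              [0.3, 0.0, 0.1],     # diagonal reflects (C.4-4): no self-interference
              [0.1, 0.2, 0.0]])
z = np.array([0.01, 0.02, 0.01])   # effective noise vector, cf. (4.7)
p = np.array([1.0, 0.5, 2.0])      # power vector

I = V @ p + z                      # affine interference function I_k(p) = (Vp + z)_k
sir = p / I                        # SIR_k(p) = p_k / I_k(p)
```

Note how every SIRk changes when any single pl is varied, which is exactly the coupling between links emphasized above.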
4.3.2 Different Receiver Structures

By (4.2), the value of SIRk(p) is not only influenced by the power vector p ≥ 0 but also depends on the choice of the receiver ck(t), t ∈ [0, T). More precisely, the kth receiver impacts the SIR at its output via the kth row of the gain matrix V ≥ 0 and the kth entry of z > 0. From (4.2), we see that the receivers considered in this book summarize the information contained in the received signal by a scalar decision statistic, being the correlation of the received signal with a deterministic signal. Such receivers belong to the class of linear receivers, since the inner product in (4.2) is linear in its second argument, which is the received signal [97]. This linearity property makes it possible to separate at the receiver output the respective contributions of the different signals and the background noise. Moreover, if a wireless communication channel is represented by an equivalent discrete-time model, which is a common approach in the literature (see, for instance, [97] and other textbooks on digital wireless technology), then the linear receivers are vectors in C^W, where W is the dimension of the underlying signal space. Thus, W specifies how many mutually orthogonal signals can be constructed in the considered time-frequency domain. In the remainder of this section, we are going to present three different types of linear receivers that are represented by vectors ck ∈ C^W, k ∈ K. Consequently, if the vector yk(m) ∈ C^W, 1 ≤ m ≤ M, is the discrete-time received signal of receiver k ∈ K, then the kth soft-decision variable at time m is given by

X̂k(m) = ck^H yk(m) = ⟨ck, yk(m)⟩.

The first two types of linear receivers are "compatible" with the above definition of the SIR, so that all results presented in this book are applicable to these two cases. The third receiver type is not captured by the above definition of the SIR since, under the third receiver, the power gains depend on the power vector.
Consequently, only a part of the results applies to this case, which is pointed out at the respective places. A standard reference for the design of multi-user receivers in the context of CDMA is [97]. Although the focus of [97] is on CDMA systems, it actually conveys the basic ideas behind the design of multi-user receivers for a larger class of wireless systems with interference. Note that since the kth receiver influences only the kth SIR and has no impact on the other SIRs, we can focus on a single arbitrarily chosen link. The transmitters are also represented by vectors in C^W. However, we are not primarily interested in the transmitters but in effective transmit vectors that depend on the wireless channel (see the following subsection for more explanation). Thus, since the channel properties from one transmitter to distinct receivers are not necessarily the same, each transmitter is in general identified by a set of different effective transmit vectors, each of which is associated with a distinct receiver. This is especially true in distributed wireless networks.
Matched-Filter Receiver

Let k ∈ K be arbitrary but fixed, and let bl := bl^(k) be the effective transmit vector of transmitter l ∈ K associated with receiver k. The effective transmit vectors or, more concisely, effective transmitters may have different physical meanings, depending on the realization of the physical layer. Abstractly speaking, the effective transmitters determine the "directions" of the transmit signals in some appropriately chosen signal space. In Sect. 4.3.5, we present two examples that will hopefully clarify the definitions. In particular, it is important to bear in mind that the set of effective transmit vectors {bl}l∈K depends in general on k. Now, given K effective transmit vectors b1, . . . , bK ∈ C^W from K transmitters to receiver k, the (single-user) matched-filter receiver ck ∈ C^W of link k is defined to be

ck = a · bk,  a ≠ 0,  k ∈ K.    (4.9)

In words, the matched-filter receiver is matched to the desired effective transmit vector. This definition holds regardless of whether there is perfect synchronization among the links or not. The constant a ≠ 0 is any fixed real number and, in the analysis, is usually chosen such that either |ck^H bk| = 1 or ‖ck‖₂ = 1 for each k ∈ K. The matched-filter receiver is probably the most common receiver in wireless communications systems. The original theoretical justification for using this receiver was the fact that it maximizes the signal-to-noise ratio in the presence of additive white background noise with finite variance [97, pp. 93–95]. Consequently, the matched-filter receiver is definitely a good choice in networks with mutually orthogonal links based, for instance, on TDMA or FDMA, as well as in networks where the interference from other links is relatively small. In nonorthogonal interference-limited wireless systems, the matched-filter receiver may exhibit very poor performance [97, 112–118].
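This interference-limited behavior is easy to reproduce numerically: under a matched filter, increasing all transmit powers jointly cannot push the SIR beyond a finite ceiling. The sketch below uses two made-up, strongly correlated effective transmit vectors; the SIR expression anticipates the form (4.13) given later for perfectly synchronized links:

```python
import numpy as np

# Two nonorthogonal links in C^2; vectors and noise level are made up.
b1 = np.array([1.0, 0.0])
b2 = np.array([0.8, 0.6])           # strongly correlated with b1
sigma2 = 0.1
c1 = b1                             # matched filter of link 1: c_1 = b_1 (a = 1)

def sir1(p):
    # SIR of link 1 when both links transmit at the same power p
    signal = p * abs(np.vdot(c1, b1)) ** 2
    interf = p * abs(np.vdot(c1, b2)) ** 2
    return signal / (interf + sigma2 * np.linalg.norm(c1) ** 2)

# As p grows, the SIR saturates at |<c1,b1>|^2 / |<c1,b2>|^2
sirs = [sir1(p) for p in (1.0, 10.0, 100.0, 1000.0)]
ceiling = abs(np.vdot(c1, b1)) ** 2 / abs(np.vdot(c1, b2)) ** 2
```

Here the sequence of SIR values increases with the common power but never exceeds the interference-limited ceiling, which is what "very poor performance" means for strongly correlated transmit vectors.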
On the other hand, this may be true for all linear receivers, which becomes apparent when one considers that the matched-filter receiver is optimal in the sense of maximizing the SIR, provided that the effective transmit vectors are chosen appropriately [102, 103]. However, the main reason for the popularity of the matched-filter receiver in wireless communications systems is that it is relatively easy to implement, especially in distributed wireless networks. Note that for this receiver to be implemented, only local information is necessary. Finally, we point out that the theory presented in this book applies to any fixed vector ck ∈ C^W, ck ≠ 0, for some W ≥ 1. If ck ≠ a · bk for all a ≠ 0, then ck is referred to as a mismatched-filter receiver. An interesting example of a mismatched-filter receiver is the decorrelating receiver, which is also called the zero-forcing receiver [97, Chap. 5]. If K ≤ W, a decorrelating receiver projects its desired signal on a subspace of C^W that is orthogonal to all other (interfering) effective transmit vectors. In other words, provided that K ≤ W, each decorrelating receiver cancels the interference completely so that V = 0. Note that K ≤ W implies that such a complete interference
cancellation is possible only if the number of links is not too large. Moreover, due to the noise enhancement, zero-forcing may deteriorate network performance significantly, as the noise enhancement may outweigh the gains resulting from canceling the interference [104]. As for implementation issues, a direct computation of decorrelating receivers is difficult in real-world networks, so that training (pilot) signals are usually transmitted periodically to "learn" the desired filters. Obviously, this signaling overhead requires additional wireless resources, implying that fewer resources are available for the actual data transmission. Furthermore, unless the effective transmit vectors associated with any receiver are mutually orthogonal, a complete interference cancellation leads to a noise enhancement.

Linear Successive Interference Cancellation Receiver

Now let us assume perfect synchronization among all links, as discussed at the beginning of Sect. 4.3. For brevity, we confine our attention here to the uplink channel, in which case all (logical) receivers are located at the same geographical location (see also Sect. 4.3.5). As a result, bl ∈ C^W, bl ≠ 0, is the effective transmit vector of transmitter l ∈ K associated with all receivers. Due to the assumption of perfect synchronization and the statistical independence of the data symbols, the communication channel is a memoryless one-shot channel, which allows us to restrict attention to one arbitrary symbol interval [97]. As a consequence, we can model the input sequence Xk(1), . . . , Xk(M) of link k ∈ K with a sufficiently large M by a random variable Xk, and the discrete-time received signals y(1), . . . , y(M) at the receiver inputs by y = ∑_{k∈K} bk Xk + n, where n ∈ C^W is an additive zero-mean random background noise vector. For such a channel, the (one-stage) linear successive interference cancellation (SIC) receiver is defined by the following system of linear equations

X̂π(k) = bπ(k)^H ( y − ∑_{j=1}^{k−1} bπ(j) X̂π(j) ),  k ∈ K    (4.10)

where X̂k denotes the estimate of Xk and where we used π : K → K to denote any permutation of the set K, of which there is a total number of K! (K factorial) possible permutations. The permutation function π determines in which order the interference components are estimated and subtracted from the received signal. From (4.10), we see that X̂π(k) = bπ(k)^H yπ(k), where yπ(k) is obtained recursively from

yπ(k) = yπ(k−1) − bπ(k−1) X̂π(k−1),  yπ(1) = y.

So, a straightforward examination shows that X̂π(k) = cπ(k)^H y where [105]

cπ(k) = ∏_{j=1}^{k−1} ( I − bπ(j) bπ(j)^H ) bπ(k),  k ∈ K    (4.11)
with the product set to the identity matrix if k = 1. This shows that the case of the linear SIC receiver is captured by the SIR definition (4.4), since (4.11) is a vector in C^W that is independent of the power vector p.

If X̂k is a good estimate of Xk, then the link π(k) perceives almost no interference from the links π(j) for each j < k. In other words, the link π(k) is almost orthogonal to all preceding links with respect to the link ordering determined by the permutation function π. Thus, the first link π(1) is interfered by all other links, and the interference at the output of the last link π(K) is negligibly small when sufficiently good symbol estimates are used. An important consequence of this is that the system performance is strongly influenced by the permutation function π. The choice of this function may depend on the queue states, channel conditions and effective transmit vectors.

Note that the linear SIC receiver operates on the soft-decision variables, some of which are remodulated and subtracted from the current received signal according to (4.10). This stands in contrast to the decision-driven nonlinear SIC receiver, which operates on hard-decision variables (demodulated data) [97, Chap. 7]. In the case of binary symbols dk = Xk/√pk ∈ {−1, +1}, k ∈ K, the estimate d̂π(k) ∈ {−1, +1} of dπ(k) under the (conventional) nonlinear SIC receiver is given by d̂π(k) = sgn(Re(X̂π(k))), where sgn : R → {−1, +1} is the signum function defined to be sgn(x) = 1, x ≥ 0, and sgn(x) = −1, x < 0, and where the soft-decision variable X̂π(k) is

X̂π(k) = bπ(k)^H ( y − ∑_{j=1}^{k−1} √pπ(j) bπ(j) d̂π(j) ),  k ∈ K.    (4.12)

As X̂π(k) depends on the hard-decision variables d̂k, k ∈ K, the SIRs at the soft-decision variables cannot be written in the form (4.4), where the interference power at these variables is a linear combination of all transmit powers. Thus, the results of this book are not applicable to this receiver, except for the special case when dk = d̂k (no decision error). Indeed, in this case, which corresponds to the asymptotic behavior under a capacity-achieving coding scheme, the soft-decision variables are

X̂π(k) = bπ(k)^H bπ(k) Xπ(k) + bπ(k)^H ( ∑_{j=k+1}^{K} bπ(j) Xπ(j) + n ),  k ∈ K.
So, if the probability of error P(dk ≠ d̂k) is negligible or sufficiently small for each k ∈ K, then it may be reasonable to assume that the SIR at X̂π(k) has the form (4.4), with the gain matrix V being a strictly upper triangular matrix. This means that each link, say link π(k), perceives no interference from the links π(j), 1 ≤ j < k, and the interference from the links π(j), k < j ≤ K, is treated as noise.
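The equivalence between the cancellation recursion (4.10) and the closed-form filters (4.11) can be checked numerically. In the sketch below, the effective transmitters, symbols and noise level are made-up values, and the identity permutation π(k) = k is used:

```python
import numpy as np

# Made-up uplink example: K = 3 effective transmitters in C^4 and the
# identity permutation; symbols and noise are illustrative values.
rng = np.random.default_rng(1)
K, W = 3, 4
B = rng.standard_normal((W, K)) + 1j * rng.standard_normal((W, K))
y = B @ np.array([1.0, -1.0, 1.0]) + 0.01 * rng.standard_normal(W)

# Recursion (4.10): estimate, remodulate and subtract, link by link
X_hat = np.zeros(K, dtype=complex)
y_res = y.copy()
for k in range(K):
    X_hat[k] = np.vdot(B[:, k], y_res)          # b_k^H y_{pi(k)}
    y_res = y_res - B[:, k] * X_hat[k]

# Closed form (4.11): c_k = (I - b_1 b_1^H) ... (I - b_{k-1} b_{k-1}^H) b_k
X_hat_cf = np.zeros(K, dtype=complex)
M = np.eye(W, dtype=complex)
for k in range(K):
    c_k = M @ B[:, k]
    X_hat_cf[k] = np.vdot(c_k, y)               # c_k^H y acting on the raw signal
    M = M @ (np.eye(W) - np.outer(B[:, k], B[:, k].conj()))
```

Reversing the order of the two factors in the update of M would in general break the equality, since the factors in the product (4.11) do not commute for nonorthogonal transmitters.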
Optimal Linear Receiver

In this section, we derive an optimal linear receiver in the sense that the SIR at the receiver output is maximized for some given power allocation. As the QoS performance of each link is assumed to improve as its SIR increases, this kind of optimality is highly desired. Moreover, it is known [106] (see also [97, pp. 300–301]) that the probability of incorrectly demodulating the data symbols under an optimal linear receiver is accurately approximated by Q(√SIRk), where Q(x), x ∈ R, is the widely known Q-function defined to be Q(x) = (1/√(2π)) ∫_x^∞ e^{−t²/2} dt. So, the probability of error is strictly decreasing in the corresponding SIR, which justifies the assumption of a one-to-one correspondence between the QoS perceived by a link and its SIR under an optimal linear receiver. In contrast to the preceding examples of linear receivers, which are independent of the choice of transmit powers,11 the receiver presented in this section is a nonlinear function of the power vector.12 As an immediate consequence, the corresponding interference power at the receiver output cannot be expressed as a linear combination of all transmit powers, implying that the SIR is not of the form (4.4). In order to illustrate this dependence and bring out the essential features of the receiver, it is best to assume that all links are perfectly synchronized. For a general asynchronous case and practical implementation aspects, we refer the reader to [97] (see also the remarks at the end of this subsection). In the case of perfect synchronization, the SIR at the output of an arbitrary linear receiver ck ∈ C^W is

SIRk(p, ck) = pk |⟨ck, bk⟩|² / ( ∑_{l∈Kk} pl |⟨ck, bl⟩|² + ‖ck‖₂² σk² ) = pk (ck^H bk bk^H ck) / (ck^H Zk ck)    (4.13)

where Zk ∈ C^{W×W} is defined to be

Zk := Zk(p) := ∑_{l∈Kk} pl bl bl^H + σk² I    (4.14)

and where bl^H denotes the conjugate transpose of bl (Definition A.23).

Remark 4.10. To be precise, we point out that, unlike (4.4), the SIR in (4.13) is defined as a function of both the power vector and the corresponding receiver. However, for any fixed ck ∈ C^W, we have SIRk(p) = SIRk(p, ck), k ∈ K.

11 Note that the linear SIC receiver (4.11) is independent of the power vector for any given permutation π. In general, however, the choice of the permutation may be influenced by some function of the power vector, in which case the linear SIC receiver is a nonlinear function of this vector. In this general case, the gain matrix depends on the permutation and thus also on the power vector.
12 The receiver is called optimal linear receiver since it is a vector in C^W.
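Before the formal derivation, the following sketch (made-up transmit vectors, powers and noise variance) evaluates (4.13) directly and compares the matched filter with the receiver ck = Zk^{−1} bk, which Lemma 4.11 below identifies as the SIR-maximizing choice:

```python
import numpy as np

# Made-up example: K = 3 effective transmitters in C^4, desired link k = 0.
rng = np.random.default_rng(2)
K, W, k = 3, 4, 0
B = rng.standard_normal((W, K)) + 1j * rng.standard_normal((W, K))
p = np.array([1.0, 2.0, 0.5])
sigma2 = 0.1

def sir(c):
    """SIR_k(p, c) evaluated according to (4.13)."""
    signal = p[k] * abs(np.vdot(c, B[:, k])) ** 2
    interf = sum(p[l] * abs(np.vdot(c, B[:, l])) ** 2 for l in range(K) if l != k)
    return signal / (interf + sigma2 * np.linalg.norm(c) ** 2)

# Z_k from (4.14) and the SIR-maximizing receiver c_k = Z_k^{-1} b_k
Z = sum(p[l] * np.outer(B[:, l], B[:, l].conj()) for l in range(K) if l != k) \
    + sigma2 * np.eye(W)
c_opt = np.linalg.solve(Z, B[:, k])
sir_opt = sir(c_opt)               # equals p_k b_k^H Z_k^{-1} b_k, cf. (4.16)
sir_mf = sir(B[:, k])              # the matched filter is suboptimal here
```

Since Zk is Hermitian positive definite, the bound pk bk^H Zk^{−1} bk in (4.16) is a real positive number, and the sketch confirms it is attained by c_opt.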
Now, given an arbitrary index k ∈ K, the noise variance σk² > 0, some power vector p ≥ 0 and a collection of K vectors b1, . . . , bK ∈ C^W, we aim at finding a vector ck ∈ C^W for which (4.13) attains its maximum on C^W \ {0}. In what follows, it is important to bear in mind that, because of the positivity of the noise variance, the matrix Zk is Hermitian positive definite for any p ≥ 0, so that its inverse exists (see also Definition A.21 and the remarks on complex-valued matrices). Moreover, it is evident from (4.13) that SIRk(p, a ck) = SIRk(p, ck) for any a ≠ 0 (the ray property), so that

sup_{ck∈C^W\{0}} SIRk(p, ck) = sup_{ck∈C^W, ‖ck‖₂=1} SIRk(p, ck) = max_{ck∈C^W\{0}} SIRk(p, ck)

for any fixed p ≥ 0, where the last equality follows from Theorem B.11, the ray property and the compactness of {x ∈ C^W : ‖x‖₂ = 1} ⊂ C^W. Hence, the SIR defined by (4.13) attains its maximum over C^W \ {0} and the vector

ck* := ck(p) = arg max_{x∈C^W\{0}} SIRk(p, x)    (4.15)

is an optimal receiver of link k (for a given power vector p). The following lemma is well known in the literature [102, 97].

Lemma 4.11. Let bl, l ∈ K, in (4.13) be arbitrary but fixed. Then, for any given p ≥ 0, we have

SIRk(p, ck) ≤ SIRk*(p) := pk bk^H Zk^{−1} bk,  k ∈ K.    (4.16)

Equality holds if

ck = a Zk^{−1} bk ∈ C^W    (4.17)

for any complex number a ≠ 0. Conversely, if (4.16) holds with equality and pk > 0, then one has (4.17) for some a ≠ 0.

Proof. Let k ∈ K and p ≥ 0 be arbitrary but fixed. We can assume that pk > 0 since otherwise (4.16) is trivially satisfied and any nonzero receiver is optimal. Due to the positive definiteness of Zk, it follows from Theorems A.19 and A.21 (as well as the remarks on complex-valued matrices after the theorems) that there exist a unitary matrix U ∈ C^{W×W} (see App. A.3.2) and a diagonal positive definite matrix Λ = diag(λ1, . . . , λW) such that Zk = U Λ U^H. Hence,

SIRk(p, ck) = pk (ck^H bk bk^H ck) / (ck^H U Λ U^H ck).

Now let Λ^{1/2} = diag(√λ1, . . . , √λW) and uk := Λ^{1/2} U^H ck ∈ C^W. As Λ^{1/2} is invertible (due to positive definiteness) and U^{−1} = U^H (due to unitarity), we have ck = U Λ^{−1/2} uk and can write the SIR as the Rayleigh quotient R(A, uk) [107, pp. 108–109]:

R(A, uk) := pk (uk^H Λ^{−1/2} U^H bk bk^H U Λ^{−1/2} uk) / (uk^H uk) = (uk^H A uk) / (uk^H uk).

So, we can find an optimal receiver by maximizing R(A, uk) with respect to uk ≠ 0, and then applying ck = U Λ^{−1/2} uk. The maximization of the Rayleigh quotient is an easy task since A is an outer product of the vector √pk Λ^{−1/2} U^H bk. Consequently, A is a rank-1 (Hermitian) positive semidefinite matrix and its eigenvector associated with the only positive eigenvalue is given by uk* = α1 Λ^{−1/2} U^H bk for any α1 ≠ 0 (see the remarks after Theorem A.22 and the subsequent remarks on complex-valued matrices; the rank of a matrix is defined in App. A.2). Thus, by the Cauchy–Schwarz inequality (A.6), we have R(A, uk) ≤ R(A, uk*) with equality if and only if uk = α2 Λ^{−1/2} U^H bk, α2 ≠ 0. Hence, we can conclude that

ck* = a U Λ^{−1/2} Λ^{−1/2} U^H bk = a Zk^{−1} bk,  k ∈ K

for any a ≠ 0. This proves (4.17). Now putting ck* into (4.13) yields SIRk*(p) = SIRk(p, ck*) ≥ SIRk(p, ck), k ∈ K, with equality if and only if (4.17) holds (pk > 0). This completes the proof.

An examination of (4.17) shows that the optimal receiver generally depends on the power vector, the effective transmit vectors and the noise variances. So, an explicit computation of the optimal receivers using (4.17) is not a feasible option for most distributed wireless networks. The problem is in fact much more intricate in practice due to the lack of perfect synchronization and (multipath) fading propagation. So, under real-world conditions, the matrix Zk will additionally depend on signal delays and channel impulse responses, which are in general time-varying parameters. For this reason, the use of adaptive algorithms that converge to the optimal receiver seems to be the only feasible option. Distributed algorithms for computing the optimal receiver defined by (4.17) are widely established; these algorithms are based either on blind or pilot-based estimation methods [97, pp. 306–325].

4.3.3 Power Constraints

In practice, there is a variety of system constraints. Most of them have only a marginal impact on the results presented in this book, and therefore are neglected for the sake of clarity. However, constraints on transmit powers must be incorporated into the system model, since otherwise the results permit only crude insight into the performance limits of practical networks. Strict limitations on transmit powers in wireless networks result from a number of factors, including regulations, hardware costs and battery lifetime. Most studies distinguish between two types of power constraints, namely the peak constraint and the average constraint. The first one is expressed in terms of the maximum crest factor (or peak-to-average power ratio (PAPR)) (see [108]
for an overview of PAPR reduction techniques in the context of multi-carrier transmission) and typically arises from the need to limit intermodulation effects caused by nonlinear signal distortion, which is inevitably introduced by low-cost transmitters (amplifiers). Thus, the peak power constraints usually pertain separately to each physical communication link. In contrast, the average power constraint may be imposed on the overall transmit power in a network to reduce interference to adjacent networks, as well as on the total transmit power of each node to prolong battery life (individual power constraints on each node). The average transmit power impacts relevant performance measures such as data rate and bit error rate. As a consequence, some average transmit power on each link is necessary (but not always sufficient) to guarantee the performance requirements of applications with regard to data rate and bit error rate. Unless otherwise stated, in this book we assume individual power constraints on each node.

To be precise, let P1, . . . , PN be positive real numbers, referred to as individual power constraints, each of which is associated with one origin node. Note that we only consider origin nodes represented by N = {1, . . . , N}. Now the transmit powers satisfy individual power constraints on each node, in which case they are said to be admissible, if p = (p1, . . . , pK) with pk given by (4.3) is a member of P ⊂ R₊^K, which is the set of all admissible power vectors (allocations) defined to be

P := P1 × · · · × PN,  Pn := { x ∈ R₊^{|K(n)|} : ∑_{k=1}^{|K(n)|} xk ≤ Pn },  n ∈ N    (4.18)
and where 1 ≤ |K(n)| for each n ∈ N. In other words, if p(n) = (p_k)_{k∈K(n)} is the vector of average transmit powers at which links originating at node n ∈ N_t transmit (note that the lth entry of p(n) is equal to p_l with l being the lth entry of K(n)), then p(n) must be a member of P_n. Conversely, if p(n) ∈ P_n, then p(n) is admissible at node n in the sense that link k ∈ K(n), where k is the lth entry of K(n), can transmit at power p_k = p_l(n), which is the lth entry of p(n). The power constraints (4.18) can also be written in matrix form as Cp ≤ p̂, where p̂ = (P_1, ..., P_N) ∈ R_{++}^N and the matrix C = (c_{n,k}) ∈ {0,1}^{N×K} is defined as follows: c_{n,k} = 1 if (and only if) k ∈ K(n) and c_{n,k} = 0 otherwise. Using this, one obtains

P = {p ∈ R_+^K : Cp ≤ p̂}   (4.19)
from which we immediately see that the set of admissible power vectors P is a convex compact polytope in R_+^K. (Recall that a set C ⊂ R^K is called a convex polytope if it is a bounded convex polyhedron, and a convex polyhedron if it is an intersection of finitely many half-spaces.) The compactness property is simply due
to the nonnegativity of the power vectors and the existence of at least one positive entry in each column of C. Throughout the book, P defined by (4.18) or (4.19) is called the admissible power region. Remark 4.12. Note that admissibility in the context of power vectors only requires that the power constraints are satisfied. If there are no additional constraints, then the sets of admissible and feasible power vectors coincide, so that the notions of admissibility and feasibility in this context are the same. Otherwise, if there are additional constraints expressed, for instance, in terms of some QoS requirements (see Sect. 5.5), then every feasible power vector is admissible but not vice versa. The reader is also referred to Definition 5.40 and Remark 5.41 in Sect. 5.5.1. The above model for power constraints includes two types of power constraints often encountered in practice. (a) Sum (or total) power constraint for N = {1} (one origin node): This is a network with K links originating at a common node
and being constrained on total transmit power, that is, we have Σ_{k∈K} p_k ≤ P_t for some given P_t > 0 that is called a sum power constraint. In this case, C and p̂ in (4.19) reduce to C = (1, ..., 1) ∈ R_+^{1×K} and p̂ = P_1 = P_t. This type of constraint is typical for data transmission from some central node (base station or cluster head) to K nodes. The best-known example is the downlink channel of a single-cell wireless cellular network. Total power constraints can also be imposed on the sum of transmit powers of mobile nodes to limit the radiation to other networks. We restrict ourselves to or refer to this special case at various points throughout the book. (b) Individual power constraints on each link (K = N origin nodes, N = {1, ..., K}): This scenario corresponds to a situation where each origin node is an origin for exactly one link. Thus, C = I ∈ R_+^{K×K} and p̂ = (P_1, ..., P_K) so that, by (4.19), we have p_k ≤ P_k, k ∈ K, for some given P_1, ..., P_K > 0 (individual power constraints on each link). A widely studied example is the uplink channel of a wireless cellular network. For simplicity, this type of power constraints is assumed in Sect. 6.8. Remark 4.13. It is worth pointing out that the assumption on P to be of the form (4.18) is not necessary at many points in this book. In general, we require that P ⊂ R_+^K is a compact, convex, and downward comprehensive set with a nonempty interior. The downward comprehensivity (in R_+^K) means that, for any p ∈ P, both p′ ∈ R_+^K and p′ ≤ p imply p′ ∈ P. In particular, this property implies that 0 ∈ P. The additional property that P is the Cartesian product P_1 × ··· × P_N (as in (4.18)), with the set P_n affecting only links originating at node n ∈ N, is usually necessary to make a distributed implementation of power control algorithms feasible. For further explanation, the reader is referred to Sect. 6.5.4.
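The two special cases above can be illustrated with a short sketch (pure Python; the link partition and power numbers are hypothetical, not taken from the text). It builds the constraint matrix C of (4.19) and tests admissibility Cp ≤ p̂ componentwise:

```python
def constraint_matrix(link_sets, K):
    """C in {0,1}^{N x K} of (4.19): C[n][k] = 1 iff link k originates at node n."""
    return [[1 if k in Kn else 0 for k in range(K)] for Kn in link_sets]

def is_admissible(p, C, p_hat):
    """Check p >= 0 and C p <= p_hat componentwise."""
    if any(pk < 0 for pk in p):
        return False
    return all(sum(c * pk for c, pk in zip(row, p)) <= Pn
               for row, Pn in zip(C, p_hat))

K = 3
# (a) sum power constraint: one origin node with K(1) = {0, 1, 2}
C_sum = constraint_matrix([{0, 1, 2}], K)        # [[1, 1, 1]]
# (b) individual per-link constraints: N = K origin nodes, one link each
C_ind = constraint_matrix([{0}, {1}, {2}], K)    # identity matrix

p = [0.4, 0.3, 0.2]
print(is_admissible(p, C_sum, [1.0]))            # admissible: 0.4 + 0.3 + 0.2 <= 1.0
print(is_admissible(p, C_ind, [0.5, 0.5, 0.1]))  # not admissible: p[2] = 0.2 > 0.1
```

The same `is_admissible` check works for any partition K(1), ..., K(N) of the link set, which is exactly what the matrix form (4.19) buys over the product form (4.18).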
Finally, we point out that for the analysis presented in this book, the limitations on average transmit powers are of interest only when the noise variance σ_k² in (4.4) is non-negligible when compared to the interference term Σ_{l∈K} p_l V_{k,l}. Otherwise, if σ_k² is negligible and Vp > 0 for every p > 0 (this is in particular true if V is irreducible; see Definition A.27), then it is justified to assume

SIR_k(p) ≈ SIR⁰_k(p) := p_k / (Vp)_k = p_k V_k / Σ_{l∈K} p_l V_{k,l} = p_k / Σ_{l∈K} p_l v_{k,l},   p > 0   (4.20)
where the gain matrix V is defined in Sect. 4.3.1. This is, for instance, a reasonable approximation for relatively large CDMA-based networks with nonorthogonal spreading sequences. Due to the ray property

SIR⁰_k(p) = SIR⁰_k(c · p) for all c > 0,   p > 0, 1 ≤ k ≤ K   (4.21)
we see that if the background noise is neglected, the transmit power on each link can always be scaled down to satisfy given power constraints without influencing the SIR values. Except for Sect. 5.9, the noise vector z is assumed to be positive throughout the book. The neglect of the background noise is justified in the design of resource allocation strategies for wireless networks with interference when the interference is dominant (compared with the noise component) and/or the noise variances at the receiver outputs are not known exactly [49, 51, 53, 52, 57, 64, 109] (see
also Sect. 5.9). Note that the background noise can be neglected whenever Σ_{l∈K} v_{k,l} p_l ≫ z_k for each k ∈ K, in which case we have an interference-limited scenario.

4.3.4 Data Rate Model

The data rate attainable on a wireless link is not fixed but depends in general on transmit powers, channel states and the link scheduling policy involved. The data rate model under a link scheduling protocol is considered in Sect. 5.2.1. In this section, we assume that no link scheduling is involved, which means that each link, say link k ∈ K, is either active (p_k > 0) or idle (p_k = 0) during the whole frame interval (see also Conditions (C.4-2)–(C.4-3)). Then, given any gain matrix V ≥ 0 defined by (4.6), the data rate on link k is a nonlinear function of the transmit power vector p and is given by

ν_k(p) = Φ(SIR_k(p))   (4.22)

where SIR_k(p) ≥ 0 is defined by (4.4) and Φ : R_+ → R_+ is referred to as the rate-SIR function (relationship) or, simply, the rate function (this rate function has nothing to do with the rate function defined in Sect. 1.2.3; in all that follows, we use the term "rate function" to refer to Φ defined by (4.22)). Note that
108
4 Network Model
(4.22) is the data rate within a frame, and hence it may vary from frame to frame due to the changes of V and p under the assumption of a block frequency-flat channel model. Furthermore, note that the data rate on any link depends on transmit powers of other links according to (4.4). In other words, the data rates are interdependent since they are functions of global variables. In practice, the feasible data rates are restricted to some finite set of operating points because practical systems rely on a finite number of coding parameters such as modulation order, spreading factor and code rate. Each operating point corresponds to exactly one feasible tuple of the coding parameters, which in turn are selected based on the SIR in such a way that an increase of the SIR does not decrease the data rate. As a consequence of this, the rate function in practical systems is a monotonically increasing (non-decreasing) discontinuous piecewise constant function of the SIR. Such rate functions lead to optimization problems of combinatorial nature that are notoriously difficult to solve, especially in wireless networks with interference. For this reason, in this book, a real-world piecewise constant rate function is approximated by a continuous and strictly increasing rate function. In fact, unless otherwise stated, we assume a logarithmic rate function (see also the remarks at the end of this section):

ν_k(p) = κ_1 log(1 + κ_2 SIR_k(p)),   k ∈ K   (4.23)

or, equivalently, Φ(x) = κ_1 log(1 + κ_2 x),   x ≥ 0
where κ_1, κ_2 > 0 are some appropriately chosen constants, and log(x), x > 0, is the natural logarithm. The constants κ_1 and κ_2 depend on some system parameters, the employed modulation and coding schemes as well as the desired bit error rate at which the system should operate. Without loss of generality, it is assumed for simplicity that κ_1 = 1 and κ_2 = 1. In this context, it is worth pointing out that if κ_1 = κ_2 = 1, then the mutual information achieved for each user under an independent (complex-valued) Gaussian channel input distribution is given by (4.23) [101]. Figure 4.2 illustrates the approximation of a typical piecewise constant rate function by a logarithmic rate function given by (4.23) with suitably chosen constants κ_1 and κ_2. The value x_i, 0 ≤ i ≤ 5, with x_0 = 0 and x_5 = SIR̄, is the minimum SIR which is necessary in a practical system to transmit at the data rate of Φ̃(x_i) bits per symbol. The value x_i is called a threshold level of the piecewise constant function Φ̃. Φ̃(SIR̄) is the maximum feasible data rate, which implies that increasing the SIR above SIR̄ will not improve the rate performance (but will in general reduce the bit error rate). Now, since the optimization framework for resource allocation presented in this book is based on a logarithmic rate function or, more generally, any continuously differentiable rate function Φ satisfying Conditions (C.4-5)–(C.4-7) stated later in this section, the following should be considered when designing practical systems:
Fig. 4.2: The data rate per symbol (data rate per channel use, vertical axis) against the SIR (horizontal axis, with threshold levels x_1, ..., x_5 = SIR̄) under a piecewise constant rate function Φ̃ and a logarithmic rate function given by (4.23) for some suitably chosen constants κ_1 > 0 and κ_2 > 0. The figure also shows a linear approximation and a piecewise linear approximation (dashed lines).
(a) If SIR*_k is the SIR achieved on link k ∈ K under an optimal resource allocation derived under the assumption (4.23), then the resulting data rate of the practical system is Φ̃(SIR*_k), which results from "flooring" Φ(SIR*_k) according to the threshold levels of the piecewise constant function Φ̃. Thus, since Φ̃(SIR*_k) ≤ Φ(SIR*_k) for each k ∈ K, the resource allocation may be suboptimal from the point of view of practical system design. (b) The value of SIR_k should not be significantly larger than SIR̄_k, which is the minimum SIR for which the maximum feasible data rate on link k is provided (see Figure 4.2). As far as (b) is concerned, the theory in this book can be easily extended to bound the SIR values from above. One simple way to achieve that is to impose an additional power constraint P̃_k > 0 on each link k ∈ K. Such additional power constraints limit each SIR since SIR_k(p) ≤ P̃_k/σ_k², k ∈ K, for any admissible power vector p. Alternatively, additional constraints on the SIR values can be explicitly incorporated into our problem formulation. Such a problem generalization implies a slight modification of the primal-dual algorithms presented in Sects. 6.7.1 and 6.8.3, where the SIR values are required to be larger than some given SIR thresholds. Augmenting the new SIR constraints to an associated Lagrangian function would, in fact, only increase the dimensionality of the dual variable. Finally, following similar ideas as in Sect. 5.7.2, one could make large SIR values less "attractive" by introducing a suitable penalty function. The problem mentioned in (a) is much more involved and, in fact, the performance loss due to the approximation of an underlying piecewise constant function by a continuous and strictly increasing function is in general not known. The performance loss is, in addition to some other system parameters, strongly influenced by the number and distribution of the threshold levels of
the piecewise constant function. In this book, we only point out that the performance of practical systems can generally be improved if, instead of "flooring" an optimal solution of the relaxed problem, a search is performed in the direct neighborhood of this optimal solution. More precisely, if SIR*_k ∈ [x_{i(k)}, x_{i(k)+1}), where x_{i(k)} < SIR̄_k is the i(k)th threshold level of a piecewise constant rate function employed by link k (see also Figure 4.2), then the data rate of link k is chosen to be either Φ̃(x_{i(k)}) or Φ̃(x_{i(k)+1}). However, the decision here depends on the choices of other links so that it may be that an optimal search (in some sense) must be carried out in a centralized fashion. Remark 4.14. Notice that the data rate in (4.23) is expressed (measured) in nats per channel use. First, recall that a nat is a statistical unit of information or entropy, based on the natural logarithm, which is the logarithm to the base e. In contrast, if the logarithm is to the base 2, then the amount of information is expressed in bits [13, p. 13]. This means that, in order to express the data rate in bits per channel use, the logarithm to the base 2 should be used in (4.23). Second, the phrase "the data rate is ν nats per channel use" means that, on average, there are ν nats per symbol entering the channel. Thus, if a symbol enters the channel every T seconds (T is the duration of a time slot in seconds), then the data rate is ν/T nats per second. Here, averaging is necessary if link scheduling is involved, in which case (C.4-3) does not hold. Also, note that if a link is idle in some slot, then, by Condition (C.4-2), the corresponding symbol carries no information (zero nats of information) and is equal to zero with probability one. Remark 4.15. For the analysis in this book, it is not necessary that the data rate on link k is exactly of the form (4.23). We will adhere to this widely-used model for concreteness.
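The "flooring" of item (a) and the two candidate levels examined by the neighborhood search can be sketched as follows; the threshold levels x_i and the choice Φ(x) = log(1 + x) (i.e., κ_1 = κ_2 = 1) are illustrative assumptions, not values from the text:

```python
import bisect
import math

# hypothetical threshold levels x_0 < x_1 < ... < x_5 of the piecewise constant rate
thresholds = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]

def phi(x):
    """Continuous relaxation (4.23) with kappa_1 = kappa_2 = 1."""
    return math.log(1.0 + x)

def phi_tilde(x):
    """Piecewise constant rate: the largest threshold x_i <= x determines the rate."""
    i = bisect.bisect_right(thresholds, x) - 1
    return phi(thresholds[i])

sir_star = 1.7                                 # hypothetical SIR of the relaxed optimum
assert phi_tilde(sir_star) <= phi(sir_star)    # "flooring" never gains rate

# neighborhood search: compare the two threshold levels enclosing SIR*
i = bisect.bisect_right(thresholds, sir_star) - 1
candidates = (phi_tilde(thresholds[i]), phi_tilde(thresholds[i + 1]))
```

Picking among `candidates` per link is exactly the combinatorial coupling mentioned above: the best choice for one link depends on the interference produced by the choices of the others.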
However, unless otherwise noted, the reader can assume that the rate function Φ : R_+ → R_+ is any surjective function satisfying the following two conditions:
(C.4-5) Φ is a continuous strictly increasing function.
(C.4-6) Φ(x) → 0 as x → 0 and Φ(x) → +∞ as x → +∞.
A brief justification of the strict monotonicity of Φ can be found at the beginning of Sect. 4.3.1. Obviously, the function Φ(x) = log(1 + x), x ≥ 0, satisfies both Conditions (C.4-5) and (C.4-6). Note that due to the surjectivity and (C.4-5), Φ is bijective (Definition B.6), and therefore there exists an inverse function Φ⁻¹ : R_+ → R_+ such that Φ(Φ⁻¹(x)) = x, x ≥ 0 (Theorem B.7). In addition to these conditions, for some results and statements to be true, the function Φ must be further restricted so as to guarantee that
(C.4-7) U(x) = Ψ(Φ⁻¹(x)), x > 0, is monotonically increasing and strictly concave, where Ψ : R_{++} → R satisfies Conditions (C.5-2)–(C.5-4) in the next chapter.
In particular, if (4.23) holds, then Φ⁻¹(x) = eˣ − 1, x ≥ 0, and U(x) is indeed monotonically increasing and strictly concave (see Sect. 5.2.5). But this
requirement is also satisfied by a linear rate function Φ(x) = a x, x ≥ 0, for some a > 0, and a piecewise linear approximation of a piecewise constant rate function Φ̃ (see Figure 4.2).

4.3.5 Examples

Now we briefly illustrate the definitions introduced above by considering three examples of wireless communications networks.

A Cellular Network with Linear Beamforming

First consider a single cell of a wireless cellular network with a multi-element antenna at the base station. Single-element antennas are considered at the mobiles. Such a network has a star topology with the base station acting as a central network controller. Due to the single-hop operation, no routing protocol is needed. Without loss of generality, we can assume that there is one flow per wireless link. This in turn implies that there are as many links as flows, and thus we have Nt − 1 source-destination pairs and no relays. If we assume that node 1 is the base station, then l = l(n, m) ∈ L is either a wireless link from the base station n = 1 to node m ∈ {2, ..., Nt} or from node n ∈ {2, ..., Nt} to the base station m = 1. The set of wireless links originating at the base station (node 1) establishes the so-called downlink channel, whereas the set of wireless links from the mobile stations to the base station constitutes the uplink channel. In the uplink channel, the number of origin nodes N is equal to the total number of nodes minus one (the base station) so that N = Nt − 1. In practice, downlinks and uplinks are used either in separate frame intervals (time division duplex (TDD) mode) or in different frequency bands (frequency division duplex (FDD) mode). As a consequence, the downlink and uplink channels can be considered separately with {l = l(1, m) : 1 < m ≤ Nt} and {l = l(n, 1) : 1 < n ≤ Nt} as link sets for the downlink and uplink channel, respectively, if the base station is assumed to be node 1.
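As a small sanity check of the two link sets just introduced (the number of nodes Nt is a hypothetical value):

```python
Nt = 4  # hypothetical: base station is node 1, mobiles are nodes 2, ..., Nt

# link l(n, m) goes from origin node n to destination node m
downlink = [(1, m) for m in range(2, Nt + 1)]   # {l(1, m) : 1 < m <= Nt}
uplink = [(n, 1) for n in range(2, Nt + 1)]     # {l(n, 1) : 1 < n <= Nt}

# in the uplink there are N = Nt - 1 origin nodes, one link per mobile
assert len(downlink) == len(uplink) == Nt - 1
print(downlink)   # [(1, 2), (1, 3), (1, 4)]
```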
Let us first focus on the downlink scenario from the base station to Nt − 1 mobile nodes arbitrarily distributed in a cell. The base station is the only source and origin node (N = 1) so that K = K(1). As mentioned above, the base station is equipped with a multi-element antenna and each mobile station has a single-element antenna (the usual assumption is that each mobile is equipped with an omnidirectional antenna, although such an antenna is not exactly realizable). Suppose that there are W ≥ 1 antenna elements at the base station. The data stream for each user, say user k, is spread over the antenna array by a given (fixed) vector u_k ∈ C^W with ‖u_k‖_2 = 1, a so-called transmit beamforming vector. To be more precise, for an arbitrary slot m, consider the data symbols X_1(m), ..., X_K(m) that are
to be transmitted to the mobile nodes, that is, the symbol X_k(m) is destined for node k. The base station forms the vector x^T U^H and transmits each element of this vector, say element j, over the jth antenna element, where U = (u_1, ..., u_K) ∈ C^{W×K} and x = (X_1(m), ..., X_K(m)). The resulting transmit signals at each antenna element are distorted on their way to the mobile nodes. We focus on a multiplicative distortion, meaning that the contribution of the jth antenna element to the received signal at node k is equal to h_{j,k} x^T U^H e_j, where h_{j,k} ∈ C, which is referred to as the jth channel coefficient, is the coefficient of the channel between the jth antenna element and the antenna of node k, and e_j ∈ {0,1}^W is the vector with 1 at the jth position and zeros elsewhere. The received signal at node k is a straightforward superposition of these contributions corrupted by a realization n_k of an independent background noise with the variance σ_k². As a result, the soft-decision variable X̂_k(m) is given by X̂_k(m) = x^T U^H h_k + n_k, k ∈ K, where the vector h_k = (h_{1,k}, ..., h_{W,k}) is referred to as the channel signature or simply the channel of user k. In practice, it depends on the characteristics of the radio channel, the array geometry and the relative position of a node to the base station. Remark 4.16. Note that in this scenario, there is no receiver design in the sense of Sect. 4.3.2 as each receiver is a scalar (W = 1) equal to one. Alternatively, using the terminology of Sect. 4.3.2, we can view the channel signature h_k as a receiver of user k and the transmit beamforming vectors u_1, ..., u_K as the effective transmit vectors associated with any receiver. Let k ∈ K be arbitrary and let the soft-decision variable X̂_k(m) be further rewritten as

X̂_k(m) = x^T U^H h_k + n_k = u_k^H h_k X_k(m) + Σ_{l∈K_k} u_l^H h_k X_l(m) + n_k

where u_k^H h_k X_k(m) is the desired signal and the remaining terms constitute interference plus noise.
Now if X_k(m), m = 1, ..., M, are drawn i.i.d. from some zero-mean discrete probability distribution with E[|X_k(m)|²] = p_k for each m, 1 ≤ m ≤ M (see (C.4-2)–(C.4-3) and (4.3)), then the SIR measured at the antenna output of the kth receiver (over a sufficiently long frame interval) yields

SIR_k(p) = p_k u_k^H R_k u_k / (Σ_{l∈K_k} p_l u_l^H R_k u_l + σ_k²),   1 ≤ k ≤ K   (4.24)
where the rank-1 matrix R_k = h_k h_k^H, 1 ≤ k ≤ K, is called the spatial covariance matrix. We point out that if the channels are rapidly time-varying within a single frame interval, the spatial covariance matrix R_k is defined to be R_k = E[h_k h_k^H], in which case R_k may have full rank. Also, observe that the SIR in (4.24) is only a function of p as u_1, ..., u_K are assumed to be fixed. Now let us turn our attention to the uplink channel from Nt − 1 mobile nodes to the base station. As before, it is assumed that there are K = Nt − 1
(logical) links labeled by 1, ..., K. However, unlike in the downlink example above, no link originates at the base station and each mobile node is a source and origin node for exactly one link (N = K = Nt − 1). So, using the notational convention introduced at the beginning of this chapter, the base station is now node Nt and the mobile nodes are labeled by 1, ..., N. Moreover, we have ∪_{n∈N} K(n) = K with |K(n)| = 1 for each n ∈ N = {1, ..., N}. We use the same notation for the beamforming vectors and channel signatures as in the case of the downlink channel. When compared with the downlink case, the roles in the uplink scenario are, in a sense, reversed, with the antenna array at the base station acting as a linear receiver (using the definitions of Sect. 4.3.2, u_k ∈ C^W is now the receiver of user k and the channel signatures h_1, ..., h_K ∈ C^W are the effective transmit vectors associated with any receiver). Indeed, given an arbitrary slot m, the soft-decision variable is X̂_k(m) = u_k^H y, where y ∈ C^W is a vector whose jth entry is the received signal at the jth antenna element in slot m and u_k ∈ C^W is now called the kth receive beamforming vector. As in the case of the downlink channel, each entry of y results from a superposition of different transmit signals corrupted by independent zero-mean background noise except that now each
transmit signal is distorted by a user-specific channel signature. Thus, y = Σ_{l∈K} h_l X_l(m) + n, which implies that

X̂_k(m) = u_k^H Σ_{l∈K} h_l X_l(m) + u_k^H n = u_k^H h_k X_k(m) + u_k^H Σ_{l∈K_k} h_l X_l(m) + n_k

(desired signal: u_k^H h_k X_k(m); interference plus noise: the remaining terms)
where n ∈ C^W consists of the background noise samples at the antenna elements and n_k = u_k^H n. Note that the interference term at the output of the kth receiver depends on the channel signatures of all other users but is independent of their beamforming vectors. In the downlink channel, the situation is reversed, with the interference term depending on the beamforming vectors of all other users and being independent of their channel signatures. Thus, with the same assumptions on data symbols as before, we obtain

SIR_k(p) = p_k u_k^H R_k u_k / (Σ_{l∈K_k} p_l u_k^H R_l u_k + σ_k²),   1 ≤ k ≤ K   (4.25)
where σ_k² is the noise variance. In fact, since all links perceive the same noise at the common receive antenna array, we actually have σ² = σ_1² = ··· = σ_K². To keep the model as general as possible though, the variances are allowed to be different. Note that in the downlink channel, the noise variances are in general different due to the existence of different receivers. The best performance in the downlink channel and in the uplink channel can be achieved by jointly optimizing transmit powers and beamforming vectors [110]. The theory presented in this book, however, targets networks with a classical approach of power control for channel-dependent beamforming vectors that are fixed independently by some channel-aware rule or criterion. Due to its simplicity, this approach is of great interest in practice. A very simple and quite popular strategy is to choose u_k = h_k, k ∈ K, in which case the beamforming vectors are said to be matched to the channel signatures. In the uplink channel, this receive strategy is referred to as the matched-filter receiver (see Sect. 4.3.2 and [98]). Exceptions to the assumption of fixed beamforming vectors are the results of Sects. 5.5 and 5.8, which fall into the framework of joint power and receiver control. If the beamforming vectors are given, then both (4.24) and (4.25) are special cases of (4.4) with V_k, k ∈ K, and V_{k,l} ≥ 0, k ∈ K, l ∈ K, given by

V_k = u_k^H R_k u_k   and   V_{k,l} = { u_l^H R_k u_l (downlink, k ≠ l);  u_k^H R_l u_k (uplink, k ≠ l);  0 (k = l) }   (4.26)

From this, we can obtain the gain matrix V defined by (4.6). It is readily seen from (4.26) that if V_k = u_k^H R_k u_k = 1 for each k ∈ K and V is the gain matrix for the uplink channel such that v_{k,l} = u_k^H R_l u_k, then V^T is the gain matrix for the downlink channel. This fact gives rise to the so-called duality theory for downlink and uplink multi-user beamforming [110, 111]. This theory provides a framework for joint power control and beamforming in wireless cellular networks.

A Distributed Network based on Code Division Multiple Access

Now, we illustrate the definitions of Sect. 4.3.1 by considering a distributed wireless network based on code division multiple access (CDMA). We assume that all K links are perfectly synchronized as described in Sect. 4.3. Let J ≥ 1 be a common length of signature sequences, and suppose that link k ∈ K(n), n ∈ N, is assigned a signature sequence s_k = (s_{k,1}, ..., s_{k,J}) with ‖s_k‖_2 = 1, which is a vector in C^J.
In every time slot, say slot m, the transmitter on link k multiplies the signature sequence s_k by a data symbol X_k(m) and transmits the resulting sequence elements s_{k,j} X_k(m), 1 ≤ j ≤ J, at a rate of J/T. Note that the transmission rate is increased by the factor J, which is referred to as the spreading factor [97]. Due to the perfect synchronization and (C.4-2), we can drop the time index m and focus on any single slot. Let us consider a discrete-time model where the receiver of link k ∈ K(n) originating at node n ∈ N observes a vector y_k of J samples given by

y_k = Σ_{l∈K} h_{k,l} s_l X_l + n = h_{k,k} s_k X_k + h_{k,k} Σ_{l∈K(n), l≠k} s_l X_l + Σ_{l∉K(n)} h_{k,l} s_l X_l + n

(desired signal: h_{k,k} s_k X_k; interference 1: the sum over l ∈ K(n), l ≠ k; interference 2: the sum over l ∉ K(n); noise: n).
Here, n is a zero-mean background noise vector with E[n n^H] = σ² I, and h_{k,l} ∈ C, with |h_{k,k}| > 0, is the coefficient of the channel between the transmitter of link l and the receiver of link k; interference 1 is caused by other (logical) links originating at node n and interference 2 is due to all other links. Note that if l ∈ K(n) in the above equation, then h_{k,l} = h_{k,k}. In words, if link l originates at the same node as link k ≠ l, then the interference from link l at the output of the kth receiver is h_{k,k} s_l X_l. In the discrete-time domain, CDMA receivers are vectors in C^J. Let c_k be the receiver of link k such that |⟨c_k, s_k⟩| = 1. Notice that this assumption does not impact generality as the norm of the receiver does not impact the SIR, which can be seen from (4.13). Now, the soft-decision variable is X̂_k = ⟨c_k, y_k⟩ and yields

X̂_k = Σ_{l∈K} h_{k,l} ⟨c_k, s_l⟩ X_l + n_k = h_{k,k} ⟨c_k, s_k⟩ X_k + h_{k,k} Σ_{l∈K(n), l≠k} ⟨c_k, s_l⟩ X_l + Σ_{l∉K(n)} h_{k,l} ⟨c_k, s_l⟩ X_l + n_k
where n_k = ⟨c_k, n⟩ and k ∈ K(n). Thus, E[n_k] = 0 and E[|n_k|²] = σ_k² = ‖c_k‖_2² σ². Considering (4.3) and (4.4), we see that the signal-to-interference ratio at the soft-decision variable X̂_k, k ∈ K(n), is

SIR_k(p) = |h_{k,k}|² p_k / (Σ_{l∈K_k} |h_{k,l}|² p_l |⟨c_k, s_l⟩|² + ‖c_k‖_2² σ²)
         = p_k / (Σ_{l∈K(n), l≠k} p_l |⟨c_k, s_l⟩|² + Σ_{l∉K(n)} (|h_{k,l}|²/|h_{k,k}|²) p_l |⟨c_k, s_l⟩|² + σ_k²/|h_{k,k}|²)
where we used the fact that h_{k,l} = h_{k,k} whenever l ∈ K(n) or, equivalently, whenever k and l originate at the same node n ∈ N. As an immediate consequence of the above SIR expression, the gain matrix V defined by (4.6) is given by

(V)_{k,l} = v_{k,l} = { (|h_{k,l}|²/|h_{k,k}|²) |⟨c_k, s_l⟩|² (l ≠ k);  0 (l = k) }

and the noise vector is z = (σ_1²/|h_{1,1}|², ..., σ_K²/|h_{K,K}|²) with σ_k² = ‖c_k‖_2² σ², k ∈ K. Finally, we point out that this is an example where the effective transmit vectors (in the terminology of Sect. 4.3.2) depend on the choice of the receiver. Indeed, the effective transmit vectors associated with the receiver k ∈ K(n) are h_{k,k} s_l for l ∈ K(n) and h_{k,l} s_l for l ∉ K(n), respectively.
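A toy numerical instance of these CDMA gain-matrix entries; the length-4 signatures, channel coefficients and the matched-filter choice c_k = s_k (so that ⟨c_k, s_k⟩ = 1 and ‖c_k‖_2 = 1) are all hypothetical:

```python
def inner(a, b):
    """Hermitian inner product <a, b> = a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

# two single-link origin nodes; unit-norm length-4 signatures (hypothetical)
s = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
h = [[1.0, 0.4], [0.3, 1.0]]   # h[k][l]: channel from transmitter l to receiver k
c = s                           # matched filters: c_k = s_k

def v(k, l):
    """Gain matrix entry v_{k,l} for links originating at different nodes."""
    if k == l:
        return 0.0
    return (abs(h[k][l]) ** 2 / abs(h[k][k]) ** 2) * abs(inner(c[k], s[l])) ** 2

# these two signatures are orthogonal, so the cross-gains vanish
assert inner(s[0], s[1]) == 0.0
assert v(0, 1) == 0.0 and v(1, 0) == 0.0

sigma2 = 0.1
z = [inner(c[k], c[k]).real * sigma2 / abs(h[k][k]) ** 2 for k in range(2)]  # z_k = ||c_k||^2 sigma^2 / |h_kk|^2
```

With nonorthogonal signatures the cross-correlations |⟨c_k, s_l⟩|² would be positive and the off-diagonal entries of V nonzero, which is the interference-limited regime discussed earlier.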
Distributed Multiple-Antenna Wireless Network

The last example of a wireless network that fits into the theoretical framework presented in this book is an extension of the cellular network example with
linear beamforming to decentralized networks in which each node is equipped with a multiple-element antenna. Thus, in contrast to the downlink and uplink scenarios discussed previously, there are now multiple-element antennas at both the transmitter and receiver sides. As before, assume perfect synchronization among all links as discussed at the beginning of Sect. 4.3 and recall that Conditions (C.4-1)–(C.4-3) are assumed to hold. We consider a homogeneous network in the sense that each node is equipped with W ≥ 1 antenna elements. In order to comply with (C.4-1), multiple antennas are not used to achieve spatial multiplexing gains, which means that, on each link, there is at most one stream of independent data symbols at a time. In other words, multiplexing independent data streams in space is not an option to increase the data rate. As a consequence, the use of multiple antenna elements can only provide power and diversity gains when compared to single-antenna transmission [98]. Recall that power gain (or array gain) can be achieved at the transmitter and/or receiver side and is defined as the increase of the signal-to-noise ratio due to coherent combining of signals at different antenna elements. In contrast, higher diversity gains ensure "smaller" fluctuations of the SNR in fading channels; diversity gain is thus a measure of robustness against fading effects. It is pointed out that diversity gain schemes based on space-time coding are outside the scope of this book as well since, in general, they do not fulfill Condition (C.4-1) (even if there is no spatial multiplexing gain in comparison with single-antenna transmission). Indeed, in the case of space-time block coding for instance, there are multiple (transmit) symbols entering the channel simultaneously, each of which is generally not a copy of one data symbol but rather a linear combination of several data symbols. Remark 4.17.
It is worth pointing out that the theory can be applied to a wireless network in which each link employs a 2 × 2 multiple antenna system with Alamouti space-time block coding [112, 98]. The reason is that in this special case of orthogonal space-time codes, both transmit symbols are received at the same SIR, which is of the form (4.4) [113]. This common SIR value determines the data rate of a link as described in Sect. 4.3.4. However, the reader should be careful not to conclude that this property is inherent to any orthogonal space-time code. In [113], it is shown that orthogonal space-time codes for more than two antennas do not have the "all SIR are equal" property. A sufficient condition for having this property can also be found in [113]. Now consider an arbitrary frame and suppose that H^{(k,l)} ∈ C^{W×W} describes the multiple antenna channel from transmitter l to receiver k. More precisely, under the assumption of a block frequency-flat channel model, the (i,j)th entry of H^{(k,l)} is the coefficient of the channel between the ith antenna element at node k and the jth antenna element at node l [98]. Then, for an arbitrary slot, the antenna outputs on link k are grouped in the vector y_k ∈ C^W given by
y_k = H^{(k,k)} u_k X_k + Σ_{l∈K_k} H^{(k,l)} u_l X_l + n_k

(desired signal: H^{(k,k)} u_k X_k; interference: the sum; noise: n_k)
where X_k is a zero-mean data symbol of link k (in any slot) and u_k with ‖u_k‖_2 = 1 is the kth transmit beamforming vector. The role of transmit beamforming vectors here and in the downlink channel described before is essentially the same: distribution of signal power over different antenna elements and transmitter-side control of the antenna pattern. Now suppose that c_k ∈ C^W denotes the kth receive beamforming vector, which combines the signals received via different antenna elements, and thus acts as a linear receiver of link k (see Sect. 4.3.2). Then, defining n_k = c_k^H n_k, the soft-decision variable X̂_k, k ∈ K, is equal to

X̂_k = c_k^H H^{(k,k)} u_k X_k + c_k^H Σ_{l∈K_k} H^{(k,l)} u_l X_l + n_k.
Again, using the terminology of Sect. 4.3.2, the vector b_l^{(k)} = H^{(k,l)} u_l ∈ C^W, l ∈ K, is the lth effective transmit vector associated with receiver k ∈ K. Alternatively, one can describe this vector as the effective transmit beamforming vector of transmitter l into the direction of receiver k. Now proceeding essentially as before shows that

SIR_k(p) = |c_k^H H^{(k,k)} u_k|² p_k / (Σ_{l∈K_k} |c_k^H H^{(k,l)} u_l|² p_l + ‖c_k‖_2² σ²) = p_k / (Σ_{l∈K_k} (V_{k,l}/V_k) p_l + z_k)
where zk = σ 2 ck 22 /Vk is the effective noise variance and (k,l) |cH ul |2 H (k,k) 2 k H uk | Vk,l = Vk = |ck H 0
k= l k = l.
Consequently, the entries of the gain matrix in such a wireless network are given by ⎧ (l) 2 (k,l) |cH u l |2 ⎨ |cH k dk | k H k = l H H(k,k) u |2 = H d(k) |2 |c k |c k vk,l = (V)k,l = k k ⎩0 k = l. Finally, the noise vector z is z = (σ 2 c1 22 /V1 , . . . , σ 2 cK 22 /VK ).
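As a numerical sanity check, the two equivalent SIR expressions above can be evaluated for randomly drawn channels and beamformers. The following sketch is our own illustration, not from the book; the dimensions, noise variance, and power values are arbitrary toy choices:

```python
import numpy as np

# Numerical sanity check of the two equivalent SIR expressions above.
# Channels, beamformers, powers and the noise variance are toy values.
rng = np.random.default_rng(0)
K, W = 3, 2                 # number of links, antennas per node (assumed)
sigma2 = 0.1                # noise variance (assumed)

H = rng.standard_normal((K, K, W, W)) + 1j * rng.standard_normal((K, K, W, W))
u = rng.standard_normal((K, W)) + 1j * rng.standard_normal((K, W))
u /= np.linalg.norm(u, axis=1, keepdims=True)        # ||u_k||_2 = 1
c = rng.standard_normal((K, W)) + 1j * rng.standard_normal((K, W))
p = np.array([1.0, 0.5, 2.0])                        # transmit powers

# V_k = |c_k^H H^{(k,k)} u_k|^2 and normalized cross gains v_{k,l}
Vk = np.array([abs(c[k].conj() @ H[k, k] @ u[k]) ** 2 for k in range(K)])
v = np.array([[0.0 if l == k else
               abs(c[k].conj() @ H[k, l] @ u[l]) ** 2 / Vk[k]
               for l in range(K)] for k in range(K)])
z = sigma2 * np.linalg.norm(c, axis=1) ** 2 / Vk     # effective noise z_k

sir = p / (v @ p + z)                                # normalized form of SIR_k(p)
for k in range(K):                                   # direct form for comparison
    den = sum(abs(c[k].conj() @ H[k, l] @ u[l]) ** 2 * p[l]
              for l in range(K) if l != k) + sigma2 * np.linalg.norm(c[k]) ** 2
    assert np.isclose(sir[k], p[k] * Vk[k] / den)
```

The assertion confirms that dividing numerator and denominator by V_k leaves the SIR unchanged, which is the step that yields the gain matrix V and noise vector z used throughout.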
5 Resource Allocation Problem in Communications Networks
This chapter formulates the resource allocation problem for wireless networks. Before that, however, we briefly discuss the fundamental trade-off between efficiency and fairness in wired networks. This trade-off eventually led researchers to consider the problem of maximizing the sum of monotonically increasing and strictly concave utility functions of source rates. We review some existing solutions to this problem and explain the insufficiency of these solutions in the case of wireless networks. Section 5.2 reformulates the problem to better capture the situation encountered in wireless networks. We will argue in favor of MAC layer fair policies that have already been used in wired networks as a basis to achieve end-to-end fairness. We precisely define the concept of joint power control and link scheduling as well as introduce the notion of the feasible rate region. It is shown that this set is not convex in general, which makes the optimization of wireless networks a fairly tricky task. The utility-based power control problem is formulated in Sect. 5.2.4. In particular, we introduce a class of monotonically increasing and strictly concave utility functions of link rates for which the power control problem can be converted into a convex optimization problem. The reader will recognize a strong connection to the results of the first part of the book because the inverse functions of the considered utility functions are log-convex functions. Finally, we will utilize some results of Chap. 2 to obtain valuable insights into the problem of joint power control and link scheduling.
5.1 End-to-End Rate Control in Wired Networks

A standard problem in network design deals with the question of how the available bandwidth should be shared between competing flows to meet some sharing objectives [69, 114, 1, 70, 76] (and references therein). One possible objective is to allocate rates to the set of flows so as to maximize the total throughput subject to link capacity constraints. The main drawback of this strategy is that it may be quite unfair in the sense that some flows (users)
may be denied access to the links [114]. For this reason, any rate control scheme must address the issue of fairness. One of the most common notions of fairness is max-min fairness. The idea behind the max-min fair approach is to treat all users as fairly as possible by making all rates as large as possible [114]. More precisely, among all rate allocation strategies saturating¹ a network, a max-min fair strategy makes the rates as equal as possible so that it is not possible to increase any rate without deteriorating other rates that are smaller or equal (see also the definition in the following subsection). The main drawback of this approach is that such "perfect fairness" is usually achieved at the expense of a considerable drop in efficiency expressed in terms of total throughput. Indeed, there seems to be a fundamental trade-off between throughput and fairness, with the throughput-optimal policy and the max-min fair policy being the two extremes of this trade-off [1]. A common approach to balancing fairness against efficiency is to maximize the aggregate (overall) utility of rate allocations represented through continuously differentiable, monotonically increasing, and strictly concave functions (the law of diminishing returns) [69, 114, 70]. In this section, we briefly discuss the utility maximization problem in wired networks and summarize some interesting results. In the next section, we build on these results to formulate the utility maximization problem in wireless networks.

5.1.1 Fairness Criteria

Consider a network with an established topology (N, L) and fixed routes for each flow. Let φ_l(s), with φ_l(s) = 0 if l ∉ L (no traffic is routed over nonexistent links), be a routing variable so that the product φ_l(s)ν_s is the expected data rate of flow s ∈ S going through link l ∈ L. For brevity, we assume single-path routing, in which case we have φ_l(s) = 1 if flow s goes through link l and φ_l(s) = 0 otherwise (see also the assumptions in Sect. 4.1).
Let ν = (ν_1, …, ν_S) be a vector of flow rates or, equivalently, source rates.² Then, the problem of end-to-end network utility maximization can be stated as follows:

    max_{ν ≥ 0} U(ν)   subject to   ∀_{l∈L}  Σ_{s∈S} φ_l(s) ν_s ≤ C_l        (5.1)
where C_l denotes a fixed capacity of wired link l ∈ L and U : R_+^S → R is a continuous, concave function, strictly increasing in each entry, representing the total utility of all flows. It is important to notice that link capacities

¹ A wired network is said to be saturated if each link capacity constraint is satisfied with equality.
² Recall from the previous chapter that we make no difference between source rate and (end-to-end) flow rate. Thus, for consistency with the literature, ν is also referred to as a vector of source rates or, simply, a source rate vector.
are fixed and that flows can share wired links over both time and frequency. Furthermore, notice that (5.1) deals with the expected traffic and thus does not preclude the existence of traffic queues at the nodes. Any vector of source rates ν ≥ 0 satisfying the link capacity constraints in (5.1) is called feasible. The standard formulation of network utility maximization for elastic traffic is to maximize the sum of individual sources' utilities subject to the link capacity constraints [69]. In this case, U(ν) is of the form

    U(ν) = Σ_{s∈S} U_s(ν_s),    U_s : R_+ → R        (5.2)
where Us is a continuously differentiable, strictly increasing, and concave function. Choosing Us (x) = x, x ≥ 0, for every s ∈ S turns (5.1) into the problem of maximizing the total end-to-end throughput. In general there are infinitely many throughput-optimal allocations for a given network topology. However, simple examples show that a necessary condition for attaining the maximum is that, roughly speaking, relatively long flows are allocated zero source rates. Therefore, throughput-optimal policies are said to be unfair. The problem can be illustrated by means of a network with three flows and two links as depicted in Fig. 5.1.
Fig. 5.1: Three flows compete for access to two links [1, 2]. Whereas flows 1 and 2 are one-link flows going through links 1 and 2, respectively, flow 3 uses both links. The links have fixed capacities C1 and C2 , respectively. Clearly, the maximum total throughput is C1 + C2 and, in the maximum, the longer flow must be shut off (ν3 = 0) so that the one-link flows can be allocated rates of ν1 = C1 and ν2 = C2 . In contrast, if C1 ≤ C2 , the max-min fair rate allocation is ν1 = C1 /2, ν2 = C2 − C1 /2 and ν3 = C1 /2. Thus, the total throughput is C2 + C1 /2 which is strictly smaller than C1 + C2 . Note that if C1 = C2 , then all source rates are equal under the max-min fair solution.
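The max-min fair allocation quoted in the caption can be reproduced with the progressive filling procedure described in the next subsection. The following sketch is our own illustration (the function name and data layout are not from the book):

```python
def max_min_fair(capacities, routes):
    """Progressive filling: uniformly raise all flow rates until some link
    saturates, freeze the flows crossing it, and repeat (cf. [2, p. 526]).
    capacities: {link: capacity}; routes: {flow: set of links it uses}."""
    rates = {s: 0.0 for s in routes}
    frozen = set()
    cap = dict(capacities)                     # residual capacity per link
    while len(frozen) < len(routes):
        active = [s for s in routes if s not in frozen]
        # largest uniform increment before a link carrying active flows saturates
        inc = min(cap[l] / sum(1 for s in active if l in routes[s])
                  for l in cap if any(l in routes[s] for s in active))
        for s in active:
            rates[s] += inc
        for l in cap:
            cap[l] -= inc * sum(1 for s in active if l in routes[s])
        for l in [l for l in cap if cap[l] <= 1e-12]:   # saturated links
            for s in active:
                if l in routes[s]:
                    frozen.add(s)
    return rates

# Fig. 5.1 with C1 = 1 <= C2 = 2: nu1 = C1/2, nu2 = C2 - C1/2, nu3 = C1/2
print(max_min_fair({1: 1.0, 2: 2.0}, {1: {1}, 2: {2}, 3: {1, 2}}))
```

On the first pass all three rates rise to C1/2 = 0.5, link 1 saturates and freezes flows 1 and 3, and the remaining flow 2 fills link 2 up to 1.5, matching the caption.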
A contrary approach is to make the source rates as equal as possible while satisfying each capacity constraint with equality, which leads us to the concept of max-min fairness [2]:
122
5 Resource Allocation Problem in Communications Networks
Definition 5.1 (Max-Min Fairness). A feasible source or flow rate vector ν is defined to be max-min fair if no rate ν_s can be increased without decreasing some ν_r, r ≠ s, which is smaller than or equal to ν_s.

It is well known that the max-min fair rate allocation is unique when the number of resources and the number of flows are both finite. Furthermore, it can be derived using the following simple procedure: starting from a zero rate allocation, increase uniformly the rate of each flow until the capacity constraint of some link is reached. Freeze the rate allocations of the flows going through this link and continue the procedure for the remaining flows until all flows are constrained (e.g., [2, p. 526]). The max-min fair approach is a user-centric approach in the sense that all users are treated fairly. In fact, we see from the above definition that, under the max-min fair policy, no flow is allocated a higher rate at the expense of other flows. This corresponds to an ideal "social" network, where all flows (users) are provided with data rates that are as close to each other as possible, regardless of how many resources each flow needs. As a result, a significant drop in the overall throughput should be expected, especially if there exist long flows going through many bottleneck links. Recall that a link
l ∈ L is a bottleneck link with respect to a rate vector ν for a flow s ∈ S if Σ_{s′∈S} φ_l(s′)ν_{s′} = C_l and ν_s ≥ ν_{s′} for all flows s′ going through link l [2]. For instance, considering the example in Fig. 5.1 with C_2 > C_1 shows that the bottleneck links of flows 1, 2 and 3 with respect to the max-min fair rate allocation are links 1, 2 and 1, respectively. It is a matter of controversy whether the max-min fair rate allocation is desirable. As illustrated above, under the max-min fair solution, some flows may consume significantly more resources than others. Generally, the problem is how to balance fairness against the utilization of resources. This led researchers to look for alternative ways of sharing network resources. The appropriateness of max-min fairness as a resource sharing objective for elastic traffic has been questioned in the landmark paper [69], where the notion of proportional fairness was introduced. A vector of rates ν* is proportionally fair if it is feasible (ν* is nonnegative and satisfies the capacity constraints) and if, for any other feasible vector ν, the aggregate of proportional changes is nonpositive:

    Σ_{s∈S} (ν_s − ν_s*) / ν_s* ≤ 0 .
Considering the Kuhn–Tucker conditions [16] for problem (5.1) with (5.2), it may be shown that ν* is a proportionally fair rate allocation if and only if ν* solves (5.1) with U(ν) = Σ_s log ν_s. Thus, since log(x), x > 0, is a strictly concave function, it may be inferred that the proportionally fair rates are unique. Reference [69] has also considered a weighted version of the proportional fairness criterion, in which case source rates ν_s are chosen so as to maximize U(ν) = Σ_s w_s log ν_s. The use of the weights has been advocated as a way for each user (associated with a flow) to choose the charge per unit time that the
user is willing to pay. The user's rate as a result of the optimization increases as the charge the user is willing to pay increases. Instead of the linear utility functions of throughput maximization, we have logarithmic functions in the case of proportional fairness. Since log(x) → −∞ as x → 0, it is easy to see that each source rate is strictly positive under the proportionally fair allocation. It is actually the strict concavity of the logarithm which forces fairness between sources (flows). Indeed, whereas the rate of increase of U_s(x) = x is the same for all x ≥ 0, the rate of increase of U_s(x) = log(x) is monotonically decreasing in x (the law of diminishing returns), and hence smaller source rates are favored in the latter case. On the other hand, if the rate of increase does not decrease too rapidly, then the total throughput is improved in comparison with the max-min fair allocation. For instance, consider the network in Fig. 5.1. It may be easily verified that the proportionally fair rate allocation satisfies the following set of equations:

    ν_1 + ν_3 = C_1
    ν_2 + ν_3 = C_2
    ν_3 = ν_1 ν_2 / (ν_1 + ν_2)

with ν_1, ν_2, ν_3 ≥ 0. Under the assumption of equal link capacities C_1 = C_2 = C, the solution to the above set of equations is given by ν* = (2C/3, 2C/3, C/3). The total throughput is 5C/3, which is smaller than 2C (the maximum throughput) but greater than 3C/2 (the max-min fair throughput). So, in this example, the introduction of a strictly concave utility function provides some balance between efficiency and fairness. The notion of proportional fairness has been generalized by [70]. This generalization includes an arbitrarily close approximation of max-min fairness. To be more precise, let w = (w_1, …, w_S) be a positive vector, and let α be a nonnegative constant. Then a vector of rates ν* is said to be (w, α)-proportionally fair if it is feasible and, for any other feasible vector ν,

    Σ_{s∈S} w_s (ν_s − ν_s*) / (ν_s*)^α ≤ 0 .        (5.3)
Obviously, if α = 1 and w = 1, then ν* is a proportionally fair rate vector. Further examination reveals that the left-hand side of (5.3) is equal to ∇U(ν*)^T (ν − ν*) with U(ν) = Σ_s U_s(ν_s) and U_s : R_++ → R given by

    U_s(x) = w_s x^{1−α} / (1−α)  for α ≠ 1,    U_s(x) = w_s log(x)  for α = 1 .        (5.4)

Consequently, since U : R_++^S → R is a twice continuously differentiable and strictly concave function, we have U(ν) = U(ν*) + ∇U(ν*)^T (ν − ν*) + ½ (ν − ν*)^T ∇²U(ν*) (ν − ν*) + o(||ν − ν*||₂²) ≤ U(ν*) + o(||ν − ν*||₂²) for every feasible rate vector ν. So ν* is a local maximum of U. However, as U(ν) =
Σ_s U_s(ν_s) with (5.4) has a unique global maximum, it follows that the (w, α)-proportionally fair rate vector maximizes U(ν) over the set of all feasible rate vectors. The converse holds as well, which can be deduced from the associated Kuhn–Tucker conditions [70] (see App. B.4.3 for the definition of the Kuhn–Tucker conditions). Summarizing, we can say that ν* is (w, α)-proportionally fair if and only if ν* solves (5.1) with U(ν) = Σ_s U_s(ν_s) and U_s(x) given by (5.4). Furthermore, it is shown in [70] that the (w, α)-proportionally fair rate vector approaches the max-min rate vector as α → ∞ (see also Sect. 5.2.6).
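For the symmetric example of Fig. 5.1 (C_1 = C_2 = C, unit weights), the (1, α)-fair allocation can be worked out in closed form: by symmetry ν_1 = ν_2 = C − ν_3, and maximizing 2(C − ν_3)^{1−α}/(1−α) + ν_3^{1−α}/(1−α) gives the first-order condition ν_3^{−α} = 2(C − ν_3)^{−α}, i.e., ν_3 = (C − ν_3) 2^{−1/α}. The following sketch is our own illustration of how the family sweeps between the throughput-optimal and max-min extremes:

```python
def alpha_fair_nu3(C, alpha):
    """nu3 of the (1, alpha)-fair allocation for the symmetric two-link
    example of Fig. 5.1 (C1 = C2 = C, so nu1 = nu2 = C - nu3).
    First-order condition: nu3 = (C - nu3) * 2**(-1/alpha)."""
    t = 2.0 ** (-1.0 / alpha)
    return C * t / (1.0 + t)

C = 1.0
print(alpha_fair_nu3(C, 1.0))    # 1/3: proportional fairness, nu* = (2C/3, 2C/3, C/3)
print(alpha_fair_nu3(C, 50.0))   # close to 1/2: approaching max-min fairness
print(alpha_fair_nu3(C, 0.05))   # close to 0: approaching throughput optimality
```

As α grows the long flow's rate rises toward the max-min value C/2, and as α shrinks it is squeezed toward zero, recovering the unfair throughput-optimal allocation.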
5.1.2 Algorithms

Given a network with fixed link capacities and a fixed number of sources (flows), the max-min fair rates for these sources can be easily computed by employing the filling procedure described in the previous section. Such a solution may be appropriate for small networks with an omniscient network controller that could easily compute the max-min fair rates and update them as the number of flows changes. Since this is impractical for moderately large networks, there are many publications on distributed max-min fair algorithms that dynamically adjust the source rates as the number of flows changes (see [2], pp. 528–530 and references therein). Most of those algorithms require some coordination and exchange of information between network nodes. An interesting exception is the approach suggested by [115, 116], where the authors show that max-min fairness can be achieved by performing per-flow fair queuing on all network links. More precisely, in this approach, each link offers a transmission slot to its flows by polling them in round-robin order. In addition, hop-by-hop (window-based) congestion control is performed to prevent excessive packet queues at the network nodes. As the window size increases, the source rates approach the max-min fair rates. Finally, we mention reference [117]. This paper provides an asynchronous distributed algorithm that converges to the exact max-min fair rate allocation. In the proposed scheme, each source progressively discovers its rate allocation by comparing it with the "advertised rate" of the links on its route. Note that the max-min fair utility function is not differentiable, so some standard optimization methods cannot be applied in this case. In contrast, proportionally fair objectives are continuously differentiable, monotonically increasing, and strictly concave, thereby admitting a convex optimization formulation with zero duality gap.
In [69], the authors proposed two algorithms (primal and dual) that arbitrarily closely approximate the (w, 1)-proportionally fair rates. The primal algorithm changes the rate of flow s ∈ S according to the following system of differential equations:

    d/dt ν_s(t) = κ ( w_s − ν_s(t) Σ_{l∈L} φ_l(s) μ_l(t) )

    μ_l(t) = p_l ( Σ_{s∈S} φ_l(s) ν_s(t) )
with φ_l(s) ∈ {0, 1} (single-path routing), where κ is a positive constant and p_l(x) = (x − C_l + ε)^+ / ε², ε > 0, is a nonnegative, continuous, and monotonically increasing function. In words, each source, say source s, gets feedback μ_l(t) (related to the residual capacity on link l) from the links and gradually changes its rate as follows: increase the rate linearly proportional to w_s and decrease it multiplicatively proportional to the total feedback. In the dual algorithm, instead of the rates, the Lagrange multipliers (shadow prices) μ_l(t) are adjusted gradually, with the rates given as functions of the shadow prices. Algorithms for computing (w, α)-proportionally fair rates have been developed in [70]. Here, each source adjusts its window size based on the total delay. This stands in contrast to [69], where source rates are calculated explicitly. Another interesting work is [114], where the static regime of a network with perfectly fluid flows is considered. Given a fixed end-to-end window control, the authors have shown that different fair rate allocation objectives can be met by implementing different queuing disciplines in the network nodes, provided that the network is not too congested. For instance, it turns out that if round-trip delays are negligible, then the static rates under the FIFO (first in, first out) queuing discipline are (w, 1)-proportionally fair rates with the weights being equal to the window sizes. In contrast, the maximum throughput allocation is achieved with the longest-queue-first policy if the round-trip delay is small. There is a similar conclusion for (w, 2)-proportionally fair rates if each node maintains per-flow queuing with the service rate for each queue proportional to the square root of the queue size. These results, as well as the work of [115, 116], show that network-wide (end-to-end) fairness can also be achieved if each node executes an appropriate contention resolution algorithm.
These results may serve as a motivation for MAC layer fair power control algorithms for wireless networks presented later in this book.
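The primal algorithm above is easy to simulate by Euler integration. The sketch below runs it on the network of Fig. 5.1 with unit capacities; the choices of κ, ε, step size, horizon, and initial rates are our own and are not taken from [69]:

```python
import numpy as np

# Euler simulation of the primal algorithm of [69] on the network of
# Fig. 5.1 with C1 = C2 = 1 (kappa, eps, dt and the horizon are our choices).
C = np.array([1.0, 1.0])                       # link capacities
phi = np.array([[1.0, 0.0, 1.0],               # phi[l, s] = 1 iff flow s uses link l
                [0.0, 1.0, 1.0]])
w = np.ones(3)                                 # unit weights -> proportional fairness
kappa, eps, dt = 1.0, 0.01, 5e-5

nu = np.full(3, 0.1)                           # initial rates
for _ in range(300_000):
    y = phi @ nu                               # aggregate rate on each link
    mu = np.maximum(y - C + eps, 0.0) / eps**2     # price mu_l(t) = p_l(y_l)
    nu += dt * kappa * (w - nu * (phi.T @ mu))     # d nu_s / dt
# nu should now be close to the proportional fair rates (2/3, 2/3, 1/3).
print(np.round(nu, 2))
```

The penalty function only approximates the hard capacity constraints, so the fixed point lies slightly outside the capacity region; shrinking ε tightens the approximation at the cost of a stiffer differential equation (requiring a smaller step size).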
5.2 Problem Formulation for Wireless Networks

In the previous section, we have briefly outlined the problem of rate control in wired networks. In what follows, we turn our attention to wireless networks. The first question which may arise is the following: is there something fundamental about the nature of wireless networks that prevents us from reusing the well-developed techniques for wired networks? In fact, one of the most important unique features was already mentioned in Sect. 4.3.4, namely that the data rate achievable on any link is a nonlinear function of global variables such as transmit powers and channel states of all links. Moreover, the gain matrix V ≥ 0 can be only partially influenced (if at all) since it depends on the relative locations of nodes and other objects (scatterers) in the vicinity of a network. Therefore, even if the nodes are stationary and the network topology is fixed, V is not known in advance as the channel states can vary due to the mobility of these objects. In a mobile network environment with nodes constantly changing their positions, the network topology is not known in advance either
126
5 Resource Allocation Problem in Communications Networks
and the process of route discovery and maintenance may consume a lot of wireless resources. Due to the variation of the wireless radio environment, the capacity of wireless links exhibits an ephemeral and dynamic nature. This stands in clear contrast to wired networks where the capacity of any link is fixed and independent of the transmission rate on other links. In wireless networks, nodes in general do not even know the exact capacities of their own links because, as mentioned above, the capacity of each link depends on some global network variables. Due to this mutual dependence, it is clear that the scope of the utility maximization problem (5.1) is limited for wireless networks. Furthermore, when designing protocols and algorithms for wireless networks, coordination between nodes should be reduced to a minimum in order to save wireless resources. This suggests the development of smart strategies for resource allocation and interference management that achieve network-wide fairness with minimum global coordination. In this book, we argue in favor of power control and link scheduling policies designed to ensure fairness at the MAC layer (MAC layer fairness; see Sect. 5.2.4 for more details). This provides a better utilization of scarce resources. In a sense, such an approach can be viewed as an extension of the work done by [115, 116, 114] to wireless networks. Remark 5.2. It must be emphasized that the problem of maximizing the aggregate utility of source (or link) rates is not appropriate for all scenarios. For instance, if the power supply is a bottleneck (like in sensor networks), then the throughput and fairness performance should be balanced against power consumption to prolong battery life [118, 93, 73]. Considering only the throughput performance would discharge the batteries after a relatively short time. 
Therefore, the rate control strategies presented in this book may not be applicable to wireless (sensor) networks where nodes are equipped with low-capacity batteries so that the energy supply is a major bottleneck. The following section introduces the notion of joint power control and link scheduling. This model is used in Sect. 5.4.1 to provide some interesting insights into the design of throughput-optimal MAC policies. However, as mentioned before, the main focus is on the power control problem for a given link scheduling policy.³ In the face of implementation constraints, this seems to be a reasonable approach in many cases of practical interest.

5.2.1 Joint Power Control and Link Scheduling

Now let us introduce the notion of joint power control and link scheduling (JPCLS). Our definition is tailored to better illustrate the throughput-optimal MAC policies discussed in Sect. 5.4.1. Roughly speaking, a JPCLS policy is a (distributed or centralized) mechanism of the MAC layer that divides every frame into a finite number of
³ See Sect. 5.2.4 for assumptions on link scheduling in the analysis of power control.
perfectly synchronized subframe intervals, assigns a group of (logical) links to each subframe, and allocates transmit powers to them. Thus, the subframes, each of which consists of a sufficiently large number of symbol epochs, define time intervals where different power vectors can be allocated to the links. In what follows, we formalize these ideas. To this end, let B be a bounded interval on the real line, and let Q := {1, 2, …, |Q|}, where |Q| is usually significantly smaller than the number of time slots M (symbol intervals) in each frame. Assume that A = {B_n : n ∈ Q} is a given system of subsets of B with

    ∪_{n∈Q} B_n = B    and    B_n ∩ B_m = ∅ for all n, m ∈ Q, n ≠ m .

In words, A partitions B into a finite number of disjoint sets B_n. We use μ : A → [0, 1] to represent any real (set) function such that

    ∀_{n∈Q} μ(B_n) ≥ 0 ,   μ(∅) = 0 ,   μ( ∪_{n∈Q} B_n ) = Σ_{n∈Q} μ(B_n) = μ(B) = 1 .        (5.5)
Furthermore, with each link k ∈ K, we associate a set function p_k : A → R_+. For any given B and A, any functions μ : A → [0, 1] (satisfying (5.5)) and p = (p_1, …, p_K) : A → R_+^K have the following interpretation: the expected data rate in nats per channel use (see Sect. 4.3.4) on link k ∈ K is equal to

    ν_k(p, μ) = Σ_{n∈Q} μ(B_n) Φ( SIR_k(p(B_n)) )

where

    SIR_k(p(B_n)) = p_k(B_n) V_k / ( Σ_{l∈K} p_l(B_n) V_{k,l} + σ_k² ) ,   n ∈ Q .        (5.6)
Note that in the special case when μ(B_1) = μ(B) = 1 and μ(B_n) = 0 for all n > 1, we have SIR_k(p(B_1)) = SIR_k(p), where SIR_k(p) is defined by (4.4).

Definition 5.3. Given B and A, link scheduling refers to the operation of choosing μ : A → [0, 1] satisfying (5.5), while power control determines p : A → R_+^K. A mechanism that jointly determines the pair

    (p, μ) : A → R_+^K × [0, 1]        (5.7)
is called joint power control and link scheduling (JPCLS). If μ(B_1) = μ(B) = 1, then we say that there is no link scheduling involved. Throughout the book, we adopt the following assumptions and interpretations of the above definitions.
(i) B is referred to as a frame, while B_n, n ∈ Q, is the nth subframe. The subframes are ordered in any particular way.
(ii) p(B_n) is a vector of transmit powers allocated to links in subframe B_n. If p_k(B_n) > 0, we say that link k is active in B_n; otherwise it is said to be idle.
128
5 Resource Allocation Problem in Communications Networks
(iii) μ(B_n) is the fraction of the frame occupied by subframe B_n. Also, μ(B_n) can be viewed as the relative frequency at which the power vector p(B_n) is utilized. If μ(B_n) = 0, then the power vector p(B_n) is not utilized.
(iv) SIR_k(p(B_n)) is the signal-to-interference ratio at each soft-decision variable in subframe B_n. It is assumed that every nonempty subframe is long enough (in terms of the number of transmitted symbols) to ensure that the SIR defined by (5.6) is close to the time-average SIR. If μ(B_n) = 0, then the average SIR in B_n is equal to zero. As a consequence, Φ(SIR_k(p(B_n))) is equal to the time-average rate on link k in subframe B_n ∈ A.
Choosing μ in Definition 5.3 is equivalent to determining the lengths of the subframes, and therefore this operation can be viewed as time slot management, where groups of symbol intervals are merged to form subframes. In practice, there is usually a fixed division of a frame into subframes whose lengths are multiples of T (the length of a single symbol interval). Link scheduling then refers to the operation of assigning links to the subframes. Power control in turn allocates transmit powers to links in each subframe. Link scheduling may be implemented in a centralized or decentralized manner. In the first case, there is a central scheduler that coordinates time slot management and link assignment across the network. A distributed implementation requires coordination between local link schedulers at every node.

Remark 5.4. According to Definition 5.3, link scheduling determines the lengths of all subframes and assigns all links to each of them. Power control then determines the transmit powers of all links in each subframe. Thus, the actual task of assigning links to subframes is carried out by power control, with a link being assigned to a subframe if and only if it is active in this subframe.
In practice, and also in the remainder of this book (unless otherwise stated), the task of a link scheduler is to assign each link to one or more given subframes; the link is then activated in each assigned subframe by power control.

Power Constraints under Link Scheduling

In Sect. 4.3.3, we specified constraints on the transmit powers that can be dissipated over the duration of the frame interval when no link scheduling is involved. The average transmit power on each link is assumed to be approximately equal to the expected transmit power in any symbol interval. Thus, if the expected transmit power in every symbol interval is kept constant, then the expected transmit power over a frame period decreases when the active time of a link decreases. So the question arises whether the links can compensate the power loss by increasing their (expected) transmit powers over the active time periods. Formally, considering Definition 5.3, the question is which of the following should hold:

    Σ_{n∈Q} p(B_n) μ(B_n) ∈ P        (5.8)
or

    ∀_{n∈Q}  p(B_n) ∈ P        (5.9)
where p(B_n) is a vector of the expected transmit powers used in subframe B_n. Hence, if the subframes are sufficiently long, condition (5.8) limits the average transmit powers that can be dissipated over a frame interval. In contrast, there are then no constraints on the entries of p(B_n), which may become arbitrarily large as μ(B_n) → 0. This implicitly requires amplifiers with an ideal linear transfer characteristic. Practical amplifiers, however, have nonlinear characteristics and will (hopefully) go into saturation beyond a certain limit. From a practical point of view, it is thus reasonable to assume that (5.9) holds. In this case, we say that transmit powers are subject to (MAC layer) peak power constraints. Note that under peak power constraints, the maximum average transmit power decreases as the time occupied by a link decreases. Thus, link scheduling is less attractive in the case of peak power constraints. With these constraints, it follows from the above definitions that the data rate on link k under a JPCLS policy (p, μ) is

    ν_k(p, μ) = Σ_{n∈Q} μ(B_n) Φ( SIR_k(p(B_n)) )        (5.10)
where p(B_n) ∈ P for each n ∈ Q. Note that whereas p in (4.22) is a vector of transmit powers, p in (5.10) is a vector of set functions defined on A, each of which is subject to the (MAC layer) peak power constraints.

5.2.2 Feasible Rate Region

The set of all achievable data rate vectors ν(p) = (ν_1(p), …, ν_K(p)) ∈ R_+^K is called the feasible rate region. It is the set of all data rates that are achievable on wireless links under a given coding strategy. Hence, this notion is distinct from the information-theoretic capacity region [13], which includes optimization over all possible coding schemes. When no link scheduling is involved (μ(B_1) = 1 and Q = {1}), the feasible rate region R ⊂ R_+^K is given by

    R := R(P) := { ω ∈ R_+^K : ∃_{p∈P} ω ≤ ν(p) } = ∪_{p∈P} { ω ∈ R_+^K : ω ≤ ν(p) }        (5.11)

where P is the admissible power region defined by (4.18). Obviously, R is connected since the union of connected sets is connected if there is a nonempty intersection of these sets. Moreover, since P is a compact and downward comprehensive set (Remark 4.13) and the rate function Φ : R_+ → R_+ is a continuous, strictly increasing function, the following observation can be easily verified using (5.11).

Observation 5.5. R is a compact and downward comprehensive set.
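The rate expression (5.10) is straightforward to evaluate numerically. The sketch below (our own illustration; the gain matrix, noise vector, powers, and subframe fractions are toy values) computes the expected link rates of a two-subframe JPCLS policy with Φ(x) = log(1 + x):

```python
import numpy as np

# Expected link rates under a two-subframe JPCLS policy, eq. (5.10).
# V, z, the power vectors and the subframe fractions are toy values.
V = np.array([[0.0, 0.5],
              [0.4, 0.0]])            # normalized gain matrix (v_{k,l})
z = np.array([0.1, 0.1])              # effective noise variances
Phi = np.log1p                        # rate function Phi(SIR) in nats

def sir(p):
    # SIR_k(p) = p_k / (sum_l v_{k,l} p_l + z_k), cf. (4.4) and (5.6)
    return p / (V @ p + z)

# Subframe B1 (70% of the frame): both links active with unit power;
# subframe B2 (30%): link 1 transmits alone, interference-free.
schedule = [(0.7, np.array([1.0, 1.0])),
            (0.3, np.array([1.0, 0.0]))]
assert abs(sum(mu for mu, _ in schedule) - 1.0) < 1e-12    # condition (5.5)

nu = sum(mu * Phi(sir(p)) for mu, p in schedule)           # nu_k(p, mu)
print(nu)
```

Note how link 1 trades a shorter interference-limited phase for an interference-free phase, which is exactly the kind of time-sharing that generates rate points in the convex hull of the region discussed below.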
Note that the downward comprehensivity is defined in the same manner as in the case of the admissible power region (see Remark 4.13 and Definition 5.29 later in Sect. 5.3, where we consider more general feasible QoS regions). When dealing with the utility maximization problem, one of the main difficulties stems from the fact that the feasible rate region is not a convex set in general. This stands in clear contrast to wired networks, where the feasible rate region is a convex polytope. To see that R is not convex in general, let ω ∈ R be arbitrary. By the definition of R, it follows that there exists p = (p_1, …, p_K) ∈ P such that ω_k ≤ Φ(SIR_k(p)) for each k ∈ K. Since Φ : R_+ → R_+ is a bijection with Φ(0) = 0 (Definition B.6), we can rewrite this set of inequalities using the inverse function Φ⁻¹(x) as

    Φ⁻¹(ω_k) ( Σ_{l∈K} v_{k,l} p_l + z_k ) ≤ p_k ,   k ∈ K .        (5.12)
In vector form, this becomes

    Γ(ω) z ≤ (I − Γ(ω)V) p        (5.13)
where Γ(ω) = diag(Φ⁻¹(ω_1), …, Φ⁻¹(ω_K)) and z > 0 is the noise vector defined by (4.7). Now Theorem A.51 implies that if ρ(Γ(ω)V) < 1, there exists a unique vector p(ω) = (p_1(ω), …, p_K(ω)) ≥ 0 given by⁴

    p(ω) := (I − Γ(ω)V)⁻¹ Γ(ω) z .        (5.14)
Conversely, if p(ω) ≥ 0 exists for some ω ≥ 0, then p(ω) is unique and ρ(Γ(ω)V) < 1. In other words (see also Lemma 2.11), there exists a bijective continuous map from R onto P such that for every p ∈ P, there is exactly one ω ∈ R with p = p(ω); that is, there is a one-to-one correspondence between R and P. Considering this and (5.13) shows that (see also the remark below) ω ∈ R if and only if p(ω) ∈ P, which in fact implies that ρ(Γ(ω)V) < 1. Hence, one has

    R = { ω ∈ R_+^K : p(ω) ∈ P } ⊂ R_+^K .        (5.15)
Comparing this with the results of Chap. 2 reveals that p(ω) is a special form of (2.4) with X(ω) = Γ(ω)V and b(ω) = Γ(ω)z (see also Sects. 2.3 and 2.3.3). Using the terminology of the first part of the book, the feasible rate region is nothing but the feasibility set when the parameter vector ω is chosen to be a vector of link rates.⁵ The nonnegative solution in (2.4) is a unique power vector for which the data rate allocation is equal to ω ≥ 0.
⁴ Note that the kth coordinate of p(ω) is zero if and only if ω_k = 0. Thus, for every ω > 0 with ρ(Γ(ω)V) < 1, there is a unique positive vector p(ω) > 0.
⁵ More precisely, R is the closure of the feasibility set with γ(x) = eˣ − 1, as the latter set does not contain the boundary vectors with zero entries.
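The map ω ↦ p(ω) in (5.14) can also be evaluated without a matrix inversion, by the fixed-point iteration p ← Γ(ω)(Vp + z), whose limit is p(ω) precisely when ρ(Γ(ω)V) < 1. The following minimal sketch illustrates this for a hypothetical two-link example; the values of V, z, ω and the total power budget are illustrative and not taken from the book:

```python
import math

# Hypothetical two-link example; V, z, omega and the power budget are illustrative.
V = [[0.0, 0.1],
     [0.2, 0.0]]      # gain matrix (zero diagonal)
z = [0.01, 0.02]      # noise power vector, z > 0
omega = [1.0, 0.5]    # target link rates

def phi_inv(x):
    # inverse rate function for Phi(x) = log(1 + x)
    return math.exp(x) - 1.0

def power_vector(omega, V, z, iters=2000):
    """Fixed-point iteration p <- Gamma(omega)(V p + z); its limit is
    p(omega) = (I - Gamma(omega) V)^{-1} Gamma(omega) z, cf. (5.14)."""
    g = [phi_inv(w) for w in omega]
    p = [0.0] * len(z)
    for _ in range(iters):
        p = [g[k] * (sum(V[k][l] * p[l] for l in range(len(p))) + z[k])
             for k in range(len(p))]
    return p

p = power_vector(omega, V, z)
# By (5.15), omega lies in R iff p(omega) lies in the admissible power region P;
# here P is modeled by a single (hypothetical) total power constraint sum(p) <= 1.
print(p, sum(p) <= 1.0)
```

At the fixed point, Φ(SIR_k(p(ω))) = ω_k holds for each link, which serves as a quick numerical consistency check of (5.12)–(5.14).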
5.2 Problem Formulation for Wireless Networks
131
Remark 5.6. The necessity and sufficiency of p(ω) ∈ P for ω ∈ R also follows from Theorem 5.25 by noting that R = cl(F_γ(P)) with γ(x) = eˣ − 1, x > 0. More precisely, by Lemma 5.24, if ρ(Γ(ω)V) < 1, then p(ω) given by (5.14) is the unique component-wise minimum power vector for which (5.13) is satisfied. So, since P is downward comprehensive (Remark 4.13) and ω ≤ ν(p) implies p(ω) ≤ p, we have ω ∈ R if and only if p(ω) ∈ P. Using the terminology of Sect. 5.5, we can equivalently say that p(ω) is the minimum point (Definition B.4) of the valid power region (Definition 5.40). It follows from Theorem 2.6 that each element of p(ω) would be a log-convex function of ω if Φ⁻¹(x) was log-convex. Unfortunately, for Φ(x) = log(1 + x), x ≥ 0, we have Φ⁻¹(x) = eˣ − 1, x ≥ 0, which is not log-convex on any interval I ⊆ R_+. This is because

θ″(x) = (d²θ/dx²)(x) = − eˣ/(eˣ − 1)² < 0 ,   x > 0 ,   where θ(x) = log(eˣ − 1) .
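This sign condition is easy to confirm numerically; the following small sketch approximates θ″ by central differences and compares it with the closed form above (purely illustrative):

```python
import math

def theta(x):
    # theta(x) = log(Phi^{-1}(x)) with Phi^{-1}(x) = e^x - 1
    return math.log(math.exp(x) - 1.0)

def theta_second(x, h=1e-4):
    # central finite-difference approximation of theta''(x)
    return (theta(x + h) - 2.0 * theta(x) + theta(x - h)) / (h * h)

# theta''(x) = -e^x / (e^x - 1)^2 is negative for every x > 0,
# so Phi^{-1}(x) = e^x - 1 is log-concave rather than log-convex:
for x in (0.5, 1.0, 2.0, 5.0):
    print(x, theta_second(x))
```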
Although the log-convexity property is not necessary for the feasible rate region to be a convex set, Example 2.5 shows that the feasible rate region is indeed not convex in general. The nonconvexity of R makes the utility maximization problem over a joint space of transmit powers and link schedulers significantly more challenging and, in general, difficult to solve. Indeed, if R is not a convex set, then a throughput-optimal MAC policy (involving joint power control and link scheduling (JPCLS) under some peak power constraints, as discussed in Sect. 5.2.1) is related to the problem of computing the points of the convex hull of R. Recall that the convex hull of a set of points is the intersection of all convex sets containing this set. Therefore,

R ⊆ R̃ := ConvexHull(R)   (5.16)
where R̃ is the set of data rates achievable by means of some feasible MAC policy. This immediately follows from (5.10), which says that the data rates under any JPCLS policy are equal to a convex combination of data rates corresponding to points in R. In Sect. 5.4.1, we define throughput-optimal MAC policies as those policies that achieve some points on the boundary of R̃. Consequently, the problem of finding throughput-optimal policies is in general a nonconvex problem of combinatorial nature that is difficult to solve [119]. Finally, we point out that if each entry of the gain matrix V follows an ergodic stochastic process (note that V depends on the state of the wireless channel), taking values on a finite state space V, with time average probabilities p_V, then the set of all average rate vectors (averaged over all gain matrices) is given by [120, 46]:

R̄ = Σ_{V∈V} p_V R̃(V)
where R̃(V) is used to denote the convex hull of the feasible rate region when the gain matrix V is given. However, notice that the feasible rate region R
(and not R̄) serves as a basis for the development of dynamic throughput-optimal policies.

5.2.3 End-to-End Window-Based Rate Control

Having introduced the notion of the feasible rate region, we briefly discuss an end-to-end window-based rate control problem for wireless networks. This discussion should primarily be considered as a motivation for the power control problem formulated in the next section. For a more detailed presentation of this approach, the reader is referred to [76, 78]. One of the major difficulties in achieving end-to-end fairness in wired networks is that, in the optimum, any source rate is a function not only of routing variables but also of other source rates. In wireless networks, an additional problem is that link capacities are not fixed but depend on the interference powers at the receivers, which in turn depend on transmit powers. We may therefore conjecture that the network performance can be improved by implementing a cross-layer protocol that couples end-to-end window-based congestion control with power control at the MAC layer. Such a cross-layer protocol could work as follows: Each source gets implicit feedback from the network, such as round-trip delay or throughput, and regulates the source rate by adjusting its window size, defined as the maximum number of packets to be transmitted but not yet acknowledged. Then, a power control policy utilizes some information from the transport layer to determine the pair (p, μ) defined by (5.7). Note that each link is associated with a flow and that there is per-flow queuing at every node. Let A = (a_{k,s}) ∈ R_+^{K×S} be a matrix such that a_{k,s} ν_s is the expected (average) data rate of flow s ∈ S going through link k ∈ K, where ν_s denotes the rate of flow s ∈ S. Thus, a_{k,s} = φ_l(s) (see Sect. 5.1) if link k shares wireless link l, and a_{k,s} = 0 otherwise. Note that each row of A has exactly one positive entry since flows cannot share links. As in Sect.
5.1, assume that the objective is to maximize the sum of the sources' utilities subject to link capacity constraints given a gain matrix V. A formal problem formulation is given by (5.1), except that now the vector of link rates Aν must lie in the convex hull of the feasible rate region. Therefore,

ν* = arg max_ν Σ_{s∈S} U_s(ν_s)   subject to   Aν ∈ R̃   (5.17)

where R̃ is defined by (5.16) and the maximum is assumed to exist. Now suppose for a moment that Φ(SIR_k(p)) is a concave function of p ≥ 0 for each k ∈ K and⁶ that Slater's condition holds for the problem (5.17) [16, Sect. 5.2.3]. By Sect. 4.3.4 and (5.11), the concavity property implies that R =
⁶ If Φ(x) = log(1 + x), x ≥ 0, the concavity requirement is actually not satisfied except for some special cases. For the rest of this section, the reader may also think of other functions that can satisfy this requirement.
R̃, from which it follows that all points of the set R̃ can be achieved by power control with all links being active concurrently. Under these assumptions, the problem in (5.17) is convex without applying the convex hull operation to the feasible rate region R, and the Kuhn–Tucker conditions (Definition B.50) provide necessary and sufficient conditions for optimality. Therefore, solving (5.17) is equivalent to satisfying the complementary slackness condition and finding a (feasible) stationary point of the Lagrangian function [16, 78] (see also App. B.4.3). In what follows, let us assume that the complementary slackness conditions are satisfied for any primal and dual optimal solutions. We see that the link capacity constraints in (5.17) can be written as ∃p∈P ∀s∈S ∀k∈K_s : ν_s ≤ Φ(SIR_k(p)), where K_s ⊆ K is the set of links through which flow s ∈ S passes. So, the Lagrangian function⁷ associated with the problem is

L(ν, p, λ) = Σ_{s∈S} U_s(ν_s) − λᵀAν + Σ_{k∈K} λ_k Φ(SIR_k(p))   (5.18)
where λ = (λ_1, …, λ_K) ≥ 0 are dual variables, p ∈ P and ν ≥ 0. We see that whereas the last addend on the right-hand side depends on p, the first two addends are independent of transmit powers. Thus, by linearity of the differentiation operator, the problem of finding a stationary point of the Lagrangian function can be decomposed into two problems coupled by the optimal Lagrange multiplier vector λ* (Definition B.50):

ν* = arg max_{ν∈R_+^S} Σ_{s∈S} U_s(ν_s) − λ*ᵀAν
p* = arg max_{p∈P} Σ_{k∈K} λ*_k Φ(SIR_k(p)) .   (5.19)
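For the special case of logarithmic (proportionally fair) utilities U_s(ν_s) = w_s log ν_s, the first subproblem in (5.19) even has a closed-form solution: stationarity gives ν_s = w_s/(Aᵀλ*)_s, i.e., each rate equals the flow weight divided by the aggregate "price" of the flow's path. The routing matrix, weights and multipliers in the following sketch are hypothetical:

```python
# First subproblem of (5.19) for U_s(nu_s) = w_s * log(nu_s):
# d/dnu_s [w_s log(nu_s) - (A^T lambda)_s nu_s] = 0  =>  nu_s = w_s / (A^T lambda)_s.
A = [[1.0, 0.0],   # link 1 carries flow 1
     [1.0, 0.0],   # link 2 carries flow 1 (each row has one positive entry)
     [0.0, 1.0]]   # link 3 carries flow 2 -- illustrative routing
w = [2.0, 1.0]     # flow weights
lam = [0.1, 0.3, 0.2]  # dual variables, e.g. queuing delays under TCP Vegas

def source_rates(A, w, lam):
    # (A^T lambda)_s = sum_k a_{k,s} lambda_k: aggregate path price of flow s
    prices = [sum(A[k][s] * lam[k] for k in range(len(lam)))
              for s in range(len(w))]
    return [w[s] / prices[s] for s in range(len(w))]

nu = source_rates(A, w, lam)
print(nu)  # path prices 0.4 and 0.2 -> rates approx. [5.0, 5.0]
```

The second subproblem has no such closed form in general; it is the weighted sum-rate power control problem discussed below.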
The first subproblem can be implicitly solved by employing an appropriate end-to-end window-based congestion control algorithm for different functions U_s(x), such as those given by (5.4) [70, 78, 121]. For example, TCP Vegas has been shown to solve the first subproblem for logarithmic utility functions, with the associated dual variable λ_k being the queuing delay along link k. Therefore, the Vegas source rates are (w, 1)-proportionally fair (see Sect. 5.1.1), where the weight w_s increases linearly with the round-trip propagation delay of source s ∈ S. The equilibrium backlogs at the links provide the optimal Lagrange multipliers. The second subproblem in (5.19) is to allocate transmit powers to interfering links so as to maximize a weighted sum of link rates. Under the assumption of concavity of Φ(SIR_k(p)), k ∈ K, the problem is mathematically
⁷ In fact, it is a partial Lagrangian function, as the weighted sum of constraint functions does not take into account the power constraints. See also Sect. 6.7.1.
tractable and can be solved using methods of convex optimization such as gradient projection methods (see Sect. 6.5 and App. B.4.2). Unfortunately, if Φ(x) = log(1 + x), x ≥ 0, the problem is not convex in general. In order to make it tractable, it is common practice [76] to assume that Φ(x) = log(x), x > 0, which is equivalent to assuming the high SIR regime (Sect. 5.4.2). The feasible rate region R is then a convex set since the inverse of log(x), x > 0, is log-convex on R (see the previous section as well as Sect. 5.3). Furthermore, as shown later in Chap. 6, log(SIR_k(e^s)) is a concave function of s = log(p). In [76], the stationary point of the Lagrangian function (5.18) is found iteratively by a simultaneous application of a congestion control mechanism and a gradient projection algorithm for the second subproblem, with the weights being equal to the dual variables associated with the problem (the queuing delays in the case of TCP Vegas).

5.2.4 MAC Layer Fair Rate Control

The previous section illustrates how traditional TCP protocols may be coupled with power control and link scheduling policies to enhance the network performance in terms of some aggregate utility function. Upon receiving information about queuing delays (the dual variables in the case of TCP Vegas), each source node updates its window size to adjust the source rate. At the same time, a distributed MAC (medium access control) protocol assigns groups of links to subframes and allocates transmit powers to them so as to maximize a weighted sum of link rates. This is, however, still an end-to-end control scheme, and therefore it has some important disadvantages common to such schemes. The rates are adjusted with a period proportional to the end-to-end round-trip delays, which are usually large in wireless networks. As a result, such schemes can be expected to exhibit slow convergence and extensive rate oscillations.
The latter may cause large queues (excessive memory use) or data loss on congested links when intermediate nodes have no means to limit the traffic generated by other nodes in their vicinity. In fact, because of slow convergence, it is fair to say that determining the correct rates would not be affordable if such a scheme were implemented in dynamic wireless environments. Finally, the commonly raised argument in favor of end-to-end schemes, namely that they keep the network simple and scalable by placing the complexity in the hosts, hardly applies to wireless networks, where the number of flows per node is of a much smaller order than in the Internet. Moreover, wireless networks have per-flow queuing for reasons of scheduling and power control. For these reasons (see also the discussion at the beginning of Sect. 5.2), we argue in favor of MAC layer (also called per-link) fairness with some kind of hop-by-hop (or node-by-node) congestion control as described, for instance, in [116, 2, 122, 123]. MAC layer fair mechanisms such as weighted fair queuing have already served as a basis for achieving end-to-end fairness in wired networks. Due to the unique characteristics of wireless networks, however, it is clear that MAC layer policies for wired networks cannot be simply reused
in the wireless environment. Actually, as pointed out by [124], MAC layer flows (one-hop flows between neighboring nodes) in wireless networks have location-dependent contention for resource allocation, and thus have some commonalities with network layer flows in wired networks in the sense that they experience different contention. The notion of MAC layer fairness is defined along similar lines to end-to-end fairness in Sect. 5.2.3, except that MAC layer flows are considered instead of network flows. For simplicity, the reader may assume a FIFO queuing discipline. A MAC layer policy with hop-by-hop congestion control may work as follows: At the beginning of every frame, a (distributed) MAC controller chooses link rates⁸ ν* ∈ R_+^K such that

ν* = arg max_ν Σ_{k∈K} U_k(ν_k)   subject to   ν ∈ R̃   (5.20)
where U_k is a continuously differentiable, monotonically increasing and strictly concave function, and R̃ is the convex hull of R defined by (5.11). The utility function is usually of the form U_k(x) = w_k U(x), where w_k is a nonnegative weight that couples the MAC layer with a congestion/flow control protocol for each link, and therefore usually depends on the current queue states. In addition, at source nodes, a simple window-based end-to-end congestion control may be used to prevent excessive queues at the network nodes. One possible strategy for choosing the weights, inspired by the so-called back-pressure policy [125, 126], is

w_k = max{u_s^{(k)} − u_d^{(k)}, 0}   (5.21)

where u_s^{(k)} ≥ 0 and u_d^{(k)} ≥ 0 are the buffer occupancies at the source and destination of link k, respectively. Thus, w_k is zero if the queue occupancy at the source node is smaller than the queue occupancy at the destination node. The choice of the weights is beyond the scope of this book. We simply assume that the weights are provided at the beginning of every frame according to some strategy. Furthermore, unless otherwise stated, it is assumed throughout the book that (C.5-1) all weights are positive so that w ∈ R_{++}^K. Although this does not impact the generality of the analysis in this book, it is important to emphasize that the process of determining the weights is an important issue and has a decisive impact on the overall network behavior. As mentioned in Sect. 5.2.2, optimal joint power control and link scheduling policies are difficult to determine even in a centralized manner. For this reason, practical MAC protocols are usually based on heuristic approaches that attempt to avoid strong interference by activating neighboring links in different subframes. This can be achieved by a suitable collision avoidance
⁸ Note that, beginning with this section, ν denotes a K-dimensional nonnegative vector of link rates.
mechanism to avoid strong interference. For a brief discussion of this matter, we refer to page 90 in Sect. 4.2 as well as to Remarks 4.7 and 4.8. Notice also Remark 5.7. In the remainder of this book, we neglect the problem of link scheduling and focus on the power control problem. The only exception is Sect. 5.4, where some interesting consequences of the results presented in Chap. 2 for throughput-optimal policies are discussed. For simplicity, it is assumed that no link scheduling is involved, which means that Q = {1} and μ(B_1) = μ(B) = 1. This implies that all links can be activated simultaneously without having a collision or, equivalently, without causing strong interference. In particular, each node can transmit and receive simultaneously over all its links. Remark 5.7. Although this is not a particularly realistic scenario, we again emphasize that the assumption does not impact the generality of the analysis presented in this book. In fact, the extension to an arbitrary link scheduling policy is straightforward. Due to the MAC layer peak power constraints (5.9), it is clear that the power control problem under some link scheduling policy decomposes into separate problems of the same type, one for each subframe. The data rate on each link then follows from (5.10) as a linear combination of the data rates achieved in each subframe, with the coefficients being equal to the lengths of the subframes. In real networks, the computation of an optimal power vector for each subframe makes sense only if the number of subframes is relatively small. Alternatively, groups of links can be scheduled in different frames, which may be a reasonable approach when larger delays are acceptable. Finally, it is interesting to point out that under the average power constraints in (5.8), the power control problem would not decompose into separate problems for each subframe, since then the power vectors for different subframes are subject to a common power constraint.
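The back-pressure weight rule (5.21) of Sect. 5.2.4 admits a one-line implementation; the per-link buffer occupancies used in this sketch are hypothetical:

```python
# Back-pressure weights, cf. (5.21): w_k = max{u_s^(k) - u_d^(k), 0}.
u_src = [12, 4, 7]   # packets queued at the source node of link k (illustrative)
u_dst = [5, 9, 7]    # packets queued at the destination node of link k

weights = [max(s - d, 0) for s, d in zip(u_src, u_dst)]
print(weights)  # -> [7, 0, 0]
```

Note that condition (C.5-1) assumes strictly positive weights, so zero entries as for links 2 and 3 here would have to be handled separately in practice, e.g., by excluding those links for the current frame.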
5.2.5 Utility-Based Power Control

Under the above assumptions, the vector of data rates ν is confined to be an element of the feasible rate region R defined by (5.11). Moreover, by (5.15) and the discussion in Sect. 5.2.2, we know that there exists a one-to-one correspondence between the feasible rate region and the admissible power region P, which contains all feasible power vectors (see Remark 4.12). As an immediate consequence of this, we can change the optimization domain in (5.20) so as to arrive at an equivalent power control problem subject to the power constraints:

p* = arg max_{p∈P} Σ_{k∈K} U_k(ν_k(p)) = arg max_{p∈P} Σ_{k∈K} U_k(Φ(SIR_k(p)))   (5.22)
where it is assumed that the maximum exists (see Lemma 5.12). Recall that, unless otherwise stated, Φ(x) = log(1 + x), x ≥ 0. Having found the power vector p∗ , the MAC layer fair rate on link k is given by Φ(SIRk (p∗ )). Note
that in comparison with the original problem, we have only changed the optimization domain from R to P. Although other domains could be considered by using different bijective mappings (see also the next section), the power domain appears to be the most natural choice in the case of wireless networks. If U_k(x) is of the form given by (5.4) for some α > 0, the power vector p* defined by (5.22) is referred to as a (w, α)-fair power allocation. Although the functions in (5.4) are strictly concave, it may easily be verified by computing the Hessian matrix for K = 2 that U_k(ν_k(p)) is in general not concave with respect to the power vector p. Thus, the power control problem in (5.22) is not convex. General nonconvex problems are too difficult for numerical solution: the computational effort required to solve such a problem by the best-known numerical methods may grow prohibitively fast with the dimension of the problem, and there are serious theoretical reasons to conjecture that this is an intrinsic feature of nonconvex problems rather than a drawback of the existing optimization techniques. The situation is further complicated when decentralized algorithms are desired. Therefore, we slightly modify the traditional utility criteria, thereby preserving monotonicity and strict concavity with respect to the data rate. To be precise, given a weight vector w ∈ R_{++}^K, we consider the following class of utility functions [127, 128, 82]

U_k(x) = w_k Ψ(Φ⁻¹(x)) = w_k U(x) ,   x > 0, w_k > 0, k ∈ K   (5.23)
and assume that the following conditions on Ψ(x) hold:⁹
(C.5-2) Ψ : R_{++} → Q is a twice continuously differentiable and strictly increasing (bijective) function, where Q is an open interval on the real line such that Ψ⁻¹ : Q → R_{++}.
(C.5-3) There holds

lim_{x→0} Ψ(x) = −∞   and   lim_{x→0} Ψ′(x) = lim_{x→0} (dΨ/dx)(x) = +∞ .   (5.24)
(C.5-4) Ψ_e(x) := Ψ(eˣ), x ∈ R, is concave. Since Ψ is twice continuously differentiable and eˣ is positive on R, this is equivalent to

Ψ_e″(x) = (d²Ψ_e/dx²)(x) ≤ 0 ,  x ∈ R   and   Ψ′(x) + xΨ″(x) ≤ 0   (5.25)

for all x > 0. Note that the subscript “e” is not an index, parameter or any variable but a fixed part of the function name. This is in contrast to the subscript “k” in (5.23), which is an index taking different values. Remark 5.8. It is worth pointing out that the theory presented in this book straightforwardly extends to the case where the utility function is of the form
⁹ Note that some of the conditions may be redundant because they may be implied by the other ones.
U_k(x) = w_k Ψ_k(Φ⁻¹(x)) ,   x > 0, w_k > 0, k ∈ K

where Ψ_k : R_{++} → Q_k ⊆ R, k ∈ K, is any function satisfying (C.5-2)–(C.5-4) with Q = Q_k (an open interval on the real line). In words, different utility functions can be assigned to different links, provided that each utility function fulfills the above conditions. The reader is also referred to Sects. 5.7.2 and 6.2. For later reference, we state the following simple observation, which is a straightforward consequence of (5.24). Observation 5.9. Suppose that the maximum in (5.22) exists. For any w > 0, p* given by (5.22) is a positive vector, and thus

max_{p∈P} Σ_{k∈K} U_k(Φ(SIR_k(p))) = max_{p∈P_+} Σ_{k∈K} U_k(Φ(SIR_k(p)))   (5.26)
where P_+ := P ∩ R_{++}^K, and each link is assigned a positive transmit power in the maximum. We point out that twice differentiability of Ψ could be replaced by continuous differentiability together with a Lipschitz continuity condition on the gradient of the aggregate utility function (see also Chap. 6). The second condition ensures that, in the maximum, each link is assigned a nonzero data rate, which follows from Observation 5.9 and the strict increasingness of the rate function Φ. However, the most important condition is the third one, since it enables us to convert the utility optimization problem into a convex problem (Chap. 6). It is pointed out that the class of functions satisfying (C.5-2)–(C.5-4) forms a proper superset of the functions considered in [89, 90]. These papers in fact considered a set of twice continuously differentiable and strictly increasing functions Ψ : R_{++} → Q satisfying

R_Ψ(x) := − Ψ″(x)x / Ψ′(x) ∈ [1, 2] .   (5.27)
Now it is easy to see that this condition follows from (5.25). The converse, however, does not hold, as the examples below will show. It is important to notice that, with (C.5-2)–(C.5-4) and Φ(x) = log(1 + x), x > 0, the utility function U_k(x) given by (5.23) is monotonically increasing and strictly concave, and hence satisfies the fundamental properties of the traditional utility functions defined in [70]. Observation 5.10. Let Φ(x) = log(1 + x), x > 0, and let Ψ satisfy (C.5-2)–(C.5-4). Then, U_k(x), x > 0, defined by (5.23) is a monotonically increasing and strictly concave function. Proof. It is clear that U_k is monotonically increasing, so we only need to show strict concavity. To this end, let x̂, x̌ ∈ R_{++} be arbitrary. By concavity of Ψ_e, we have (1 − μ)Ψ(e^x̂ − 1) + μΨ(e^x̌ − 1) ≤ Ψ((e^x̂ − 1)^{1−μ}(e^x̌ − 1)^μ) <
Ψ(e^{(1−μ)x̂+μx̌} − 1) for all μ ∈ (0, 1), where the last inequality follows since Ψ is strictly increasing and (e^x̂ − 1)^{1−μ}(e^x̌ − 1)^μ < e^{(1−μ)x̂+μx̌} − 1 for all x̂, x̌ > 0 with x̂ ≠ x̌ and μ ∈ (0, 1). Since this holds for any x̂, x̌ ∈ R_{++}, we deduce that U_k is strictly concave on R_{++}. As pointed out in Sect. 4.3.4, Φ(x) = log(1 + x), x ≥ 0, could be replaced by any monotonically increasing and strictly concave rate-SIR function Φ(x), provided that Ψ(Φ⁻¹(x)), x > 0, is strictly concave (see also the remark in Sect. 4.3.4). Remark 5.11. Sometimes it is desired to guarantee certain quality-of-service (QoS) requirements, for instance, in terms of some maximum delay. In such cases, it is convenient to take dom(Ψ) = [a, +∞) with a > 0 chosen such that the QoS requirements are guaranteed, and to define the value of Ψ outside of its original domain to be −∞. See also the example in the following section. It may easily be verified that (C.5-2)–(C.5-4) are satisfied by the traditional utility functions defined in (5.4) for all α ≥ 1. Thus, potential choices of the function Ψ are

Ψ(x) = Ψ_α(x) := { x^{1−α}/(1 − α) ,  α > 1
                 { log(x) ,           α = 1        x > 0 .   (5.28)

Another interesting family of functions is

Ψ(x) = Ψ̃_α(x) := { log(x) ,                                            α = 1
                  { log(x/(1 + x)) ,                                    α = 2
                  { log(x/(1 + x)) + Σ_{j=1}^{α−2} 1/(j(1 + x)^j) ,     α > 2      x > 0 .   (5.29)
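The family (5.28) can be sketched directly, together with a finite-difference check of the coefficient of relative risk aversion (5.27), which equals α for Ψ_α; the evaluation point x = 2 below is arbitrary:

```python
import math

def psi_alpha(x, alpha):
    # Traditional utilities (5.28): x^(1-alpha)/(1-alpha) for alpha > 1, log(x) for alpha = 1
    if alpha == 1:
        return math.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)

def relative_risk_aversion(psi, x, h=1e-5):
    # R_Psi(x) = -x Psi''(x) / Psi'(x), cf. (5.27), via central differences
    d1 = (psi(x + h) - psi(x - h)) / (2.0 * h)
    d2 = (psi(x + h) - 2.0 * psi(x) + psi(x - h)) / (h * h)
    return -x * d2 / d1

# For Psi_alpha the coefficient of relative risk aversion is exactly alpha:
for alpha in (1, 2, 3):
    print(alpha, relative_risk_aversion(lambda x: psi_alpha(x, alpha), 2.0))
```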
For the analysis later in this chapter, it is important to bear in mind that the parameter α ≥ 1 in (5.28) may be a real-valued number. In what follows, we assume that α is either equal to 1 or a member of [2, ∞). Fig. 5.2 depicts the utility function (5.23) for different choices of Ψ and compares them to the traditional utility functions corresponding to the proportional fairness and total potential delay (α = 2) utility criteria defined in Sect. 5.1.1. We see that the “new” class of utility functions is obtained by composing the traditional utilities with the function Φ⁻¹(x) = eˣ − 1, x > 0. The effect of this is a linearization of the logarithmic rate-SIR curve Φ(x). Indeed, substituting (4.22) into U_k(ν_k(p)) with U_k defined by (5.23) yields

U_k(ν_k(p)) = w_k Ψ(SIR_k(p))   (5.30)

so that the utility maximization problem in (5.22) becomes

p* = arg max_{p∈P} Σ_{k∈K} w_k Ψ(SIR_k(p)) = arg max_{p∈P} Σ_{k∈K} w_k Ψ( p_k / I_k(p) )   (5.31)
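The identity (5.30), i.e., that composing the modified utility with the rate function collapses to Ψ evaluated at the SIR, can be checked directly; the gain matrix, noise vector, powers and weights in the following sketch are hypothetical:

```python
import math

V = [[0.0, 0.15],
     [0.25, 0.0]]   # hypothetical gain matrix
z = [0.05, 0.04]    # noise vector
p = [0.6, 0.8]      # some transmit power vector
w = [1.0, 2.0]      # link weights

def sir(p, k):
    # SIR_k(p) = p_k / I_k(p) with affine interference I_k(p) = (V p + z)_k
    return p[k] / (sum(V[k][l] * p[l] for l in range(len(p))) + z[k])

def psi(x):
    return math.log(x)            # Psi(x) = log(x), i.e. alpha = 1 in (5.28)

def phi(x):
    return math.log(1.0 + x)      # rate function Phi

def phi_inv(x):
    return math.exp(x) - 1.0      # its inverse

# right-hand side of (5.31) ...
objective = sum(w[k] * psi(sir(p, k)) for k in range(2))
# ... equals the composite utility sum_k w_k Psi(Phi^{-1}(Phi(SIR_k(p)))), cf. (5.30)
composite = sum(w[k] * psi(phi_inv(phi(sir(p, k)))) for k in range(2))
print(objective, abs(objective - composite) < 1e-9)
```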
Fig. 5.2: Assuming Φ⁻¹(x) = eˣ − 1, x ∈ R, the figure compares the modified utilities U(x) = Ψ(Φ⁻¹(x)), x > 0, with the traditional ones U(x) = Ψ(x), x > 0, for Ψ(x) = log(x), Ψ(x) = −1/x, and Ψ(x) = log(x/(1 + x)), x > 0.
where, unless otherwise stated, I_k : R_+^K → R_{++} is an affine interference function defined by (4.8): I_k(p) = (Vp + z)_k. So, in analogy to the wired case (see (5.4)), if Ψ(x) has the form given by (5.28), p* in (5.31) can be regarded as being (w, α)-fair with respect to the SIR. In this book, we adopt this interpretation and refer to power allocations given by (5.31) as SIR fair or, more specifically, as (w, Ψ)-fair power allocations. Unless otherwise stated, Ψ_α and Ψ̃_α refer to the classes of functions defined by (5.28) and (5.29), respectively. (1, Ψ_α)-fair policies with α = 1 are close to throughput-optimal ones if Φ(SIR_k(p)) ≈ log(SIR_k(p)) for each k ∈ K (the high SIR regime), since then Ψ(SIR_k(p)) ≈ ν_k(p). On the other hand, if SIR_k(p) ≪ 1 for each k ∈ K, then ν_k(p) = log(1 + SIR_k(p)) ≈ SIR_k(p). Hence, at low SIR values, the modified utility functions are good approximations of the traditional ones, that is, we have Ψ(SIR_k(p)) ≈ Ψ(ν_k(p)). This can be seen in Fig. 5.2, where, for values of x > 0 close to zero, the modified and traditional utility functions almost coincide. For completeness, we show that the maximum in (5.31) exists. Lemma 5.12. There exists p* ∈ P such that

sup_{p∈P} Σ_{k∈K} w_k Ψ(SIR_k(p)) = Σ_{k∈K} w_k Ψ(SIR_k(p*)) .   (5.32)
Moreover, there exists n ∈ N such that Σ_{k∈K(n)} p*_k = P_n, which in words means that at least one power constraint is active at p*. Proof. A standard method for showing that a function f : R^K → R attains a maximum on a compact set is to argue that f is continuous on this set (see Theorem B.11). The set P is closed and bounded so that, by Theorem
B.3, P is compact. So the only problem is that the objective function is discontinuous on P due to the zero components in p ∈ P. However, this can be
easily fixed. To this end, let F(p) = Σ_k w_k Ψ( p_k/(Vp + z)_k ), and let p′ ∈ P_+ be any fixed power vector; note that p* ∈ P_+ by Observation 5.9. Define P̄ = {p ∈ P_+ : F(p′) ≤ F(p)} ⊂ P_+ ⊂ P. Clearly, P̄ is a nonempty compact set. Moreover, F(p) is continuous on P̄ and

sup_{p∈P} Σ_{k∈K} w_k Ψ(SIR_k(p)) = sup_{p∈P̄} Σ_{k∈K} w_k Ψ(SIR_k(p)) .

As P̄ is compact, the supremum is attained. To prove the last part, assume
that Σ_{k∈K(n)} p*_k < P_n for all n ∈ N. Define c = max_{n∈N} (1/P_n) Σ_{k∈K(n)} p*_k < 1 and p̃ = p*/c. Clearly, as z > 0, we have p̃ ∈ P_+ and F(p*) < F(p̃), which is a contradiction, and thus proves the lemma. Remark 5.13. It is worth pointing out that the strong convexity results in Sect. 6.4 imply that if V is an irreducible matrix (Definition A.27) and (C.5-2)–(C.5-4) hold, then p* is unique. The results also show that p* is unique for any choice of V if Ψ_e : R → R defined in (C.5-4) is strictly concave.

5.2.6 Efficiency-Fairness Trade-Off

The parameter α ≥ 1 in (5.28) and (5.29) can be used to achieve different trade-offs between fairness and efficiency, with the efficiency expressed in terms of the aggregate utility or in terms of the total throughput. Indeed, if α increases, the relative concavity of Ψ_α increases as well, in the sense that the slope becomes steeper at low values and flatter at high ones. In economics, the quantity R_Ψ(x) ≥ 0 defined by (5.27) is known as the coefficient of relative risk aversion [129] and is used to measure the relative concavity of the utility function Ψ(x).¹⁰ The larger the value of R_Ψ(x) ≥ 0, the larger is the relative concavity of Ψ(x) at x > 0. In particular, for the class of utility functions Ψ_α given by (5.28), we have R_{Ψ_α}(x) = α for all x > 0. This implies that if α increases, then any solution to (5.31) can be expected to lead to a more fair allocation in the sense that the SIRs and the data rates are made more equal in some sense¹¹ at the cost of the throughput performance. In fact, as α tends to infinity, we can observe the following, which is an application of [70, Lemma 3 and Corollary 2]. Observation 5.14. Let w > 0 be arbitrary, and let U(x) = Ψ(Φ⁻¹(x)), x > 0, (as in (5.23)) with Ψ(x) = Ψ_α(x), where Ψ_α is defined by (5.28). Suppose that ν*(α) is a solution to (5.20) with R̃ = ConvexHull(R) substituted by the feasible rate region R defined by (5.11).
Then, as α → ∞, ν*(α) converges to a max-min fair rate allocation defined below (Definition 5.16).
¹⁰ The absolute risk aversion R_Ψ(x)/x, x > 0, proposed by [130], is a widely accepted function to measure the risk aversion of the decision-maker at the point x.
¹¹ See also the discussion on fairness measures later in this section.
Proof (Outline). For α ≥ 2, we have U(x) = Ψ_α(Φ⁻¹(x)) = −(−h(x))^{α−1}/(α − 1), x > 0, where h(x) = −1/(eˣ − 1), x > 0, is a continuously differentiable, monotonically increasing, negative, and strictly concave function. So, since R in (5.20) (with R̃ replaced by R) is a compact and downward comprehensive set (Observation 5.5), the observation follows from [70, Lemma 3 and Corollary 2]. Remark 5.15. It is worth pointing out that ν*(α) defined in Observation 5.14 converges to a max-min fair rate allocation as α → ∞, regardless of the choice of w > 0. This is illustrated at the end of Sect. 5.3. Recall that the rate function Φ defined in Sect. 4.3.4 determines the relationship between the link rate and the SIR at the receiver output. By Observation 5.14, as α tends to infinity, we have an asymptotic agreement between utility maximization and max-min fairness, defined as follows. Definition 5.16 (Max-Min Fairness). A rate vector (allocation) ν̄ ∈ R is said to be max-min fair if no rate ν̄_k can be increased without decreasing some ν̄_l, l ≠ k, which is smaller than or equal to ν̄_k. A power vector p̄ ∈ P with ν̄_k = Φ(SIR_k(p̄)), k ∈ K, is then called a max-min fair power vector (allocation). Unless otherwise stated, max-min fairness refers to a situation where the link rates form a max-min fair rate vector.¹² The definition is a restatement of Definition 5.1 for wireless networks, in which the set of all feasible rates is given by R. By Observation 5.5, R is a compact, connected and downward comprehensive subset of R_+^K that includes the zero vector 0. Consequently, a max-min fair rate allocation can be derived by a procedure similar to that presented in Sect. 5.1.1 (see also the proof of Observation 5.17): Starting from a zero rate allocation, increase the rate of each link uniformly until the boundary of R is reached or, equivalently, until there is at least one active power constraint.
Freeze the rate allocations of the links being subject to active power constraints.¹³ Then, freeze the rate allocations of those links l ∈ K for which there is a sequence {l_n}_{n=0}^{M} of natural numbers with l_0 = l and l_M = k for some M = M(k, l) ∈ N such that v_{l_r, l_{r−1}} > 0 for each r = 1, …, M, where k is any link with an active power constraint. Continue the procedure in this manner for the remaining links until all links are frozen.
Notice that each time a power constraint is activated, the procedure freezes not only the transmit powers of all links whose power constraints have been
¹² Note that max-min fairness can be defined not only with respect to the data rate but also with respect to other performance measures.
¹³ Link k is said to be subject to an active power constraint if there is n ∈ N such that k ∈ K(n) and Σ_{l∈K(n)} p_l = P_n. There is an active power constraint if and only if there exists n ∈ N such that Σ_{l∈K(n)} p_l = P_n.
activated, but also the transmit powers of those links which directly or indirectly (via other links) cause interference to the links limited by the power constraints. The remaining links are not frozen even if they perceive interference from the power-constrained links.

Observation 5.17. Let ν̄ be a max-min fair rate allocation. Then, ν̄ ∈ R and p̄ ∈ P are unique positive vectors.

Proof. The proof is deferred to Sect. 5.10.

With Observations 5.14 and 5.17 in hand, the max-min fair power vector can be characterized as a limit of the power vectors

p(ν∗(α)) = (I − Γ(ν∗(α))V)⁻¹ Γ(ν∗(α))z ∈ P,   α ≥ 1

where Γ(ν∗(α)) = diag(Φ⁻¹(ν₁∗(α)), …, Φ⁻¹(ν_K∗(α))) and ν∗(α) ∈ R is defined in Observation 5.14. Since z > 0 and ν∗(α) > 0 (by Observation 5.9 and the fact that Φ(SIR_k(p)) > 0 for any p > 0), it follows from Theorem A.51 that ρ(Γ(ν∗(α))V) < 1 and p(ν∗(α)) > 0. Moreover, by Sect. 5.2.2, p(ν∗(α)) is equal to p∗ given by (5.31) with Ψ(x) = Ψα(x), x > 0, defined by (5.28). Now, Observation 5.14 implies that ν∗(α) converges to the max-min fair rate allocation ν̄ as α tends to infinity. Thus, by continuity of Φ⁻¹(x), x ≥ 0, we obtain p̂ := lim_{α→∞} p(ν∗(α)) = Γ(ν̄)V p̂ + Γ(ν̄)z ≥ 0. Considering Observation 5.17 and Theorem A.51 shows that ρ(Γ(ν̄)V) < 1 and p̂ = p̄ > 0, that is, p̂ > 0 is the unique max-min fair power allocation. Hence,

p̄ = p̂ = (I − Γ(ν̄)V)⁻¹ Γ(ν̄)z = lim_{α→∞} p(ν∗(α)) .       (5.33)
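The computation behind (5.33) can be sketched numerically. The following minimal example (gain matrix V, noise z and SIR targets are made up for illustration) uses the fixed-point iteration p ← Γ(Vp + z), which converges to p = (I − ΓV)⁻¹Γz whenever ρ(ΓV) < 1 (Theorem A.51, Neumann series):

```python
# Illustrative 3-link example: for fixed SIR targets Gamma = diag(gamma),
# the fixed-point iteration p <- Gamma (V p + z) converges to
# p = (I - Gamma V)^{-1} Gamma z whenever rho(Gamma V) < 1.
V = [[0.0, 0.1, 0.2],
     [0.1, 0.0, 0.1],
     [0.2, 0.1, 0.0]]
z = [0.01, 0.01, 0.01]
gamma = [1.5, 1.5, 1.5]

def power_fixed_point(V, z, gamma, iters=500):
    K = len(z)
    p = [0.0] * K
    for _ in range(iters):
        p = [gamma[k] * (sum(V[k][l] * p[l] for l in range(K)) + z[k])
             for k in range(K)]
    return p

p = power_fixed_point(V, z, gamma)
for k in range(3):  # at the fixed point, each link meets its SIR target
    sir = p[k] / (sum(V[k][l] * p[l] for l in range(3)) + z[k])
    print(round(sir, 6))  # -> 1.5
```

Here ρ(ΓV) < 1 holds by construction (the row sums of ΓV are below one), so the iteration is a contraction.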
If V is an irreducible matrix (Definition A.4.1), we can also write

p̄ = arg max_{p∈P} min_{k∈K} ν_k(p) = arg max_{p∈P} min_{k∈K} Φ(SIR_k(p))
  = arg max_{p∈P} min_{k∈K} SIR_k(p)       (5.34)
where the maximum can be shown to exist and the last equality is due to strict monotonicity of Φ (see also Sects. 5.5.1 and 5.6). The irreducibility of the gain matrix V ensures that all links are coupled by interference, which in turn admits the max-min representation above. Moreover, if V is irreducible, then the filling procedure reduces as follows: Increase uniformly the rate of each link until the boundary of R is achieved or, equivalently, until the first power constraint is activated. For more information and further analysis regarding the max-min performance in wireless networks, the reader is referred to Sects. 5.6 and 5.9.1, where we address a closely related max-min SIR balancing problem. In particular, the max-min fair power allocation is a max-min SIR power allocation (Definition 5.62) that maximizes mink∈K SIRk (p) over P. The converse is not true since a max-min SIR power allocation is not unique in
general as the simple example at the beginning of Sect. 5.6.1 shows. However, if V is irreducible, then a max-min SIR power allocation is unique and equal to the max-min fair power allocation defined by (5.33) and (5.34). Figure 5.3 illustrates the impact of the parameter α in (5.28) on end-to-end (flow) rates in a noiseless wireless network with string topology. As predicted
Fig. 5.3: Throughput performance as a function of α ≥ 1 in a wireless network with the string topology depicted on the left-hand side. There are five nodes, four end-to-end flows, four wireless links and ten logical links (single-hop flows). The gain matrix V ≥ 0 with trace(V) = 0 was chosen randomly in such a way that I + V > 0 (a nonnegative primitive matrix). The transmit powers are given by (5.31) with Ψ(x) = Ψα(x), x > 0, w = 1 and z = 0 (noiseless channel). The right picture depicts end-to-end rates and the end-to-end total throughput. The rate function is Φ(x) = log(1 + x), x ≥ 0.
by Observation 5.14, if α → ∞, then the link rates converge to the max-min fair rate allocation, which, in this case, is a rate vector with all entries being equal (see also Sect. 5.9.1). As a consequence, all the end-to-end rates converge to the same value as well because all links employ the same rate function.

Fairness Measures Based on Majorization

As mentioned in Sect. 5.1.1, max-min fairness represents only one possible notion of fairness corresponding to an ideal “social” network with link rates being as close to each other as possible. However, every value of α in (5.31) with (5.28) or (5.29) enforces some degree of fairness, and thus an important question is how to quantify and compare (if possible) the fairness performance of different power allocation strategies (see, for instance, [131] and references therein). Although there will probably be no definitive answer to this question, one possibility is to exploit majorization theory [132] (see also App. B.2.2) by defining the fairness measure (index) as follows:

ϑ(x) := T(x)/E(x),   x ∈ R^K_+, x ≠ 0       (5.35)
where T : R^K_+ → R_+ and E : R^K_+ → R_{++} are Schur-concave and Schur-convex functions, respectively (Definition B.30), with

(C.5-5) E(x) = 0 if and only if x = 0, T(1) = E(1), as well as T(cx) = cT(x) and E(cx) = cE(x) for any c > 0.

As a result, ϑ has the ray property ϑ(cx) = ϑ(x) for any c > 0, which is desired as the fairness measure should be independent of scaling. Now, we can prove the following observation.

Observation 5.18. Let (C.5-5) be true. Then, each of the following is true:
(i) ϑ(x) ∈ [0, 1] for any x ∈ R^K_+.
(ii) ϑ : R^K_+ → R_+ is Schur-concave.
(iii) ϑ is strictly Schur-concave if T is strictly Schur-concave or E is strictly Schur-convex (Definition B.30).

Proof. The claim ϑ(x) ≥ 0 for any x ∈ R^K_+ is obvious. To see the upper bound, note that due to the ray property, one has

sup_{x≠0} ϑ(x) = sup_{‖x‖₁=K} ϑ(x) ≤ sup_{‖x‖₁=K} T(x) / inf_{‖x‖₁=K} E(x) ≤ T(1)/E(1) = 1
where the last inequality follows from properties of Schur-concave and Schur-convex functions (see the remark after Definition B.30). This shows (i). (ii) follows from the following two facts: (a) the product of Schur-concave functions is Schur-concave [132, p. 62]; (b) the reciprocal of a Schur-convex function is Schur-concave since R_{++} → R_{++} : x ↦ 1/x is strictly decreasing and E is Schur-convex [132, p. 61]. The strictness property (iii) may be easily verified by considering the following: (a) h(x, y) = x · y, (x, y) ∈ R²_+, is strictly increasing in each component, and (b) 1/x, x > 0, is strictly decreasing, implying that the reciprocal of E is strictly Schur-concave whenever E is strictly Schur-convex.

The strictness property ensures that the value of ϑ(ν) changes when ν ≥ 0 changes, unless the vector is scaled or its entries are permuted. Considering (C.5-5) and the above observation shows that, for any strictly Schur-concave function ϑ, we have ϑ(ν) = 1 if and only if ν₁ = · · · = ν_K. In fact, this upper bound is attained if all rates are equal, regardless of the choice of E and T belonging to the set of all functions that satisfy (C.5-5). On the other hand, note that although the function ϑ is bounded below by 0 on R^K_+, this bound is not tight in general. The lowest value of ϑ, however, corresponds to the worst-case fairness performance, which is achieved when one link is allocated all resources and all other links are denied access to them. This is because ϑ(ν) ≥ T(e_k)/E(e_k), k ∈ K, for any ν ≥ 0, ν ≠ 0. Majorization is a partial order that reflects how “spread out” the components of the vectors are: The more equal the entries of a vector are, the smaller the vector is in the sense of majorization. In particular, we have 1 ≺ y and y ≺ K e_k for all vectors y ∈ R^K_+ with ‖y‖₁ = K. Hence, the more
“spread out” the entries of x are, the lower the value of ϑ(x) will be whenever T is strictly Schur-concave or E is strictly Schur-convex.

Due to Observation 5.18 and the above discussion, the function ϑ is a reasonable measure for comparing and quantifying the fairness performance of different resource allocation strategies. Given some ϑ : R^K_+ → R_+, the rate allocation ν̂ is said to be more fair than ν̌ if ϑ(ν̌) < ϑ(ν̂), where the exact values of ϑ(ν̂) and ϑ(ν̌) depend on the choice of the functions T and E. Reasonable choices of T are

• the harmonic mean T_H(x) = K (Σ_{k∈K} 1/x_k)⁻¹, x > 0,
• the geometric mean T_G(x) = (Π_{k∈K} x_k)^{1/K}, x ≥ 0,
• the arithmetic mean T_A(x) = (1/K) Σ_{k∈K} x_k, x ≥ 0, and
• the minimum function T_M(x) = min_{k∈K} x_k, x ≥ 0.

Note that T_H and T_G are strictly Schur-concave and T_M(x) ≤ T_H(x) ≤ T_G(x) ≤ T_A(x) for every x ∈ R^K_{++} [133]. As far as the function E is concerned, convenient choices are

• the scaled q-norm K^{−1/q} ‖x‖_q, q ≥ 1, x ≥ 0, and
• E_M(x) = max_{k∈K} x_k

where the first one is strictly Schur-convex for every q > 1. We complete this subsection by pointing out that the relationship between the choice of the utility function Ψ in the power control problem (5.31) and the value of ϑ(ν∗) with ν∗ = (Φ(SIR₁(p∗)), …, Φ(SIR_K(p∗))) seems to be an open problem. Of particular interest here is the impact of the coefficient of relative risk aversion R_Ψ defined by (5.27) on the fairness performance expressed in terms of the function ϑ. From Observation 5.14, we merely know that the rate vector converges to the max-min fair rate allocation as α = R_{Ψα} tends to infinity. Note that even if ν∗ is the max-min fair rate allocation, we may have ϑ(ν∗) < 1 as the max-min fair rates are not necessarily equal, unless V is irreducible (see also the discussion after (5.34)).

5.2.7 Kuhn–Tucker Conditions

The power control problem (5.31) is a nonlinear program with inequality constraints on transmit powers. The constraint set related to the power constraints is the admissible power region P ⊂ R^K, which can be written as

P = U ∩ R^K_+,   U := {x ∈ R^K : ∀_{n∈N} f_n(x) ≤ 0}       (5.36)

where f_n(x) := Σ_{k∈K(n)} x_k − P_n. As U is a convex polyhedron and Ψ, f_n, n ∈ N, are differentiable functions on the open set R^K_{++}, we can use Lagrangian optimization theory (App. B.4.3) to specify optimal power vectors by the Kuhn–Tucker conditions (Definition B.50). First we assume that only (C.5-2)–(C.5-3) hold. By the proof of Lemma 5.12, we see that the maximum exists.
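Returning briefly to the fairness index (5.35): with T the geometric mean T_G and E the maximum E_M from the lists above, ϑ can be sketched numerically (the rate vectors below are purely illustrative):

```python
# Sketch of the fairness index (5.35) with T = T_G (geometric mean,
# Schur-concave) and E = E_M (maximum, Schur-convex); theta lies in
# [0, 1], is scale-invariant, and equals 1 iff all entries coincide.
import math

def theta(x):
    T = math.prod(x) ** (1.0 / len(x))   # geometric mean
    E = max(x)                           # maximum
    return T / E

equal  = [2.0, 2.0, 2.0, 2.0]
spread = [6.0, 1.0, 0.5, 0.5]
print(round(theta(equal), 4))                            # -> 1.0 (perfectly fair)
print(theta(spread) < theta(equal))                      # more spread -> less fair
print(abs(theta(equal) - theta([5.0] * 4)) < 1e-12)      # ray property
```

The "more spread out ⇒ smaller ϑ" behavior is exactly the Schur-concavity established in Observation 5.18.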
Due to the polyhedral structure of U, it follows from Lemma B.52 that the Kuhn–Tucker constraint qualification is satisfied at any p ∈ U. By Observation 5.9, the nonnegativity constraints are inactive, that is, they are negligible by the complementary slackness condition. A consequence is that it is sufficient to consider the following Lagrangian function L : R^K_{++} × R^N → R associated with the power control problem:¹⁴

L(p, λ) = Σ_{k∈K} w_k Ψ(SIR_k(p)) − Σ_{n∈N} λ_n f_n(p)       (5.37)

where λ_n ∈ R, n ∈ N. So, as (C.4-4) holds, the Kuhn–Tucker conditions and Theorem B.53 yield that if p∗ is given by (5.31), then there exists (λ₁∗, …, λ_N∗) such that

g_k(p∗) − Σ_{l∈K} v_{l,k} SIR_l(p∗) g_l(p∗) − λ_n∗ = 0,   k ∈ K(n), n ∈ N       (5.38)
λ_n∗ f_n(p∗) = 0,  λ_n∗ ≥ 0,   n ∈ N       (5.39)
f_n(p∗) ≤ 0,   n ∈ N .       (5.40)
Here, g_k(p) = w_k Ψ′(SIR_k(p))/I_k(p), k ∈ K, with the affine interference function I_k(p) = Σ_{l∈K} v_{k,l} p_l + z_k defined by (4.8), and λ_n∗, n ∈ N, are Lagrange multipliers which, due to the fulfillment of the Kuhn–Tucker constraint qualification, exist and satisfy (λ₁∗, …, λ_N∗) ≠ 0. Using g(p) = (g₁(p), …, g_K(p)) and D(p) = diag(SIR₁(p), …, SIR_K(p)), we can write (5.38) as

(I − Vᵀ D(p∗)) g(p∗) = λ∗ := (λ₁∗ 1_{|K(1)|}, …, λ_N∗ 1_{|K(N)|}) ∈ R^K_+       (5.41)

where 1_n = (1, …, 1) (n times) and the power vector p∗ satisfies the power constraints. These constraint conditions can be written in a compact matrix form by noting that (I − D(p)V)p = D(p)z must hold for any p ∈ P. This leads us to the following theorem.

Theorem 5.19. If p∗ is a solution to (5.31), then¹⁵

g(p∗) = (I − Vᵀ D(p∗))⁻¹ λ∗
p∗ = (I − D(p∗)V)⁻¹ D(p∗)z ∈ P       (5.42)
ρ(D(p∗)V) < 1

for some λ∗ ∈ R^K_+, λ∗ ≠ 0, satisfying (5.39).

¹⁴ Note that, in contrast to App. B.4.3, we deal here with a maximization problem.
¹⁵ Note that since P is a compact subset of R^K_+, the second condition actually implies the third one. Nevertheless, we state it explicitly to avoid misunderstandings.
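The stationarity condition (5.38) can be verified numerically at an optimum found by brute force. The sketch below assumes Ψ(x) = log x (so that g_k(p) = w_k/p_k), unit weights, and a single total power constraint; the gain matrix and noise powers are illustrative. At the constrained maximizer, the quantity g_k(p∗) − Σ_l v_{l,k} SIR_l(p∗) g_l(p∗) must equal one common multiplier λ∗ > 0 for both links:

```python
# Numerical check of the KT condition (5.38) for Psi(x) = log x,
# w = 1, and a single total power constraint p_1 + p_2 <= P.
# With Psi = log, g_k(p) = 1/p_k and the KT quantity simplifies to
# 1/p_k - sum_l v_{l,k}/I_l(p).
import math

V = [[0.0, 0.25], [0.4, 0.0]]
z = [0.1, 0.1]
P = 2.0

def interference(p, k):
    return sum(V[k][l] * p[l] for l in range(2)) + z[k]

def utility(p):
    return sum(math.log(p[k] / interference(p, k)) for k in range(2))

# brute-force maximizer on the active constraint p_1 + p_2 = P
best = max((utility([x, P - x]), x) for x in (i * P / 10000 for i in range(1, 10000)))
p_star = [best[1], P - best[1]]

lam = [1.0 / p_star[k] - sum(V[l][k] / interference(p_star, l) for l in range(2))
       for k in range(2)]
print(abs(lam[0] - lam[1]) < 1e-2, lam[0] > 0)  # -> True True
```

Restricting the search to p₁ + p₂ = P is justified here because the log utility is strictly increasing along rays from the origin, so the total power constraint is active at the optimum.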
Proof. If p∗ is a solution to (5.31), then p∗ ∈ P and, with SIR_k(p∗) = p_k∗/(Vp∗ + z)_k, k ∈ K, we have (I − D(p∗)V)p∗ = D(p∗)z. By Observation 5.9, p∗ ∈ P is a positive vector. So, by Theorem A.51 and the fact that z > 0, we have ρ(D(p∗)V) < 1 and p∗ = (I − D(p∗)V)⁻¹ D(p∗)z ∈ P. Furthermore, as ρ(D(p∗)V) = ρ(Vᵀ D(p∗)) < 1, Theorem A.55 implies that (I − Vᵀ D(p∗))⁻¹ = I + Σ_{j=1}^∞ (Vᵀ D(p∗))ʲ exists and is a nonnegative matrix. Thus, as a constraint qualification condition holds for the problem (5.31), the Kuhn–Tucker conditions are satisfied by p∗ and the theorem follows with (5.41) as the corresponding form of the Kuhn–Tucker conditions.

Now we show that the converse holds. In order to prove this, we apply the results of the subsequent section, where the power control problem is considered in a so-called QoS domain.

Theorem 5.20. Let p∗ satisfy (5.42) and let (C.5-4) (in addition to (C.5-2)–(C.5-3)) be true. Then, p∗ is a solution to (5.31).

Proof. As shown in the next section, there is a one-to-one relationship between solutions to (5.31) and (5.57). By Theorem 5.32, the problem (5.57) is convex so that every local solution is globally optimal. Due to the one-to-one relationship, this lets us conclude that (5.31) is globally solvable as well (Definition B.40). Hence, by the Kuhn–Tucker theorem (Theorem B.53), if p∗ satisfies the Kuhn–Tucker conditions (5.42) (with (5.39)), then p∗ is a global solution to (5.31).

A Remark on Continuously Differentiable Interference Functions

Before completing this section, we point out that Theorem 5.19 can be somewhat extended to a more general class of interference
functions that includes affine interference functions of the form I_k(p) = Σ_{l∈K} v_{k,l} p_l + z_k (see also (4.8)) as special cases. One example of a nonlinear interference function follows from Lemma 4.11 in Sect. 4.3.2. This lemma provides a closed-form expression for the interference power at the output of a receiver that is optimal in the sense of maximizing the signal-to-interference ratio for any given power vector. Other examples of interference functions can be found in Sect. 5.5.2, where we introduce an axiomatic definition of interference functions. In what follows, let us assume that I_k : R^K_+ → R, k ∈ K, is an interference function satisfying the following conditions.

(C.5-6) For each k ∈ K, I_k(p) > 0, p > 0.
(C.5-7) The maximum in (5.31) exists and is attained for some p∗ > 0.
(C.5-8) ∇I_k(p), k ∈ K, exists and the partial derivatives are continuous on R^K_{++}. Moreover, ∂I_k(p)/∂p_l ≥ 0 for each k, l ∈ K and all p > 0.

If the interference functions are continuously differentiable, then (C.5-6) and (C.5-8) are the most reasonable assumptions to make about them.
Since the interference functions are positive on R^K_{++} and the Kuhn–Tucker constraint qualification is satisfied, we can, just as in the derivation of (5.38)–(5.40), invoke the Kuhn–Tucker theorem to conclude that if p∗ solves (5.31) with interference functions I_k satisfying (C.5-6)–(C.5-8), then we can find a nonnegative vector λ∗ = (λ₁∗, …, λ_N∗) of the form defined in (5.41) such that

g_k(p∗) − Σ_{l∈K} (∂I_l/∂p_k)(p∗) SIR_l(p∗) g_l(p∗) − λ_n∗ = 0,   k ∈ K(n), n ∈ N       (5.43)
λ_n∗ f_n(p∗) = 0,  λ_n∗ ≥ 0,   n ∈ N       (5.44)
f_n(p∗) ≤ 0,   n ∈ N       (5.45)
where the constraint function f_n, n ∈ N, is given by (5.36). Hence, the only difference to (5.38)–(5.40) is that, in (5.43), we have (∂I_l/∂p_k)(p∗) instead of v_{l,k}. In particular, if I_l(p) = Σ_{j∈K} v_{l,j} p_j + z_l, then (∂I_l/∂p_k)(p∗) = v_{l,k}, in which case (5.43)–(5.45) is equal to (5.38)–(5.40). In matrix form, (5.43) becomes

(I − ∇I(p∗)ᵀ D(p∗)) g(p∗) = λ∗ ∈ R^K_+       (5.46)

where D(p) is already used in (5.41), I : R^K_+ → R^K_{++} is the vector-valued interference function given by I(p) = (I₁(p), …, I_K(p)), and ∇I : R^K_{++} → R^{K×K} is its Jacobian matrix whose kth row is the gradient of I_k(p) with respect to p. Now, it follows from SIR_k(p∗) = p_k∗/I_k(p∗) that p∗ ∈ P must fulfill p∗ = D(p∗)I(p∗) where, by (C.5-6) and (C.5-7), D(p∗) is positive definite and p∗ is positive. By (C.5-8), the Jacobian matrix ∇I(p) is nonnegative for all p > 0. Hence, since g(p∗) > 0, it can be deduced from (5.46) that ρ(D(p∗)∇I(p∗)) = ρ(∇I(p∗)ᵀ D(p∗)) ≤ 1 must hold. This is summarized in an observation.

Observation 5.21. Suppose that (C.5-6)–(C.5-8) hold. Then, if p∗ is a solution to (5.31), then there exists λ∗ ∈ R^K_+ fulfilling (5.44) such that

(I − ∇I(p∗)ᵀ D(p∗)) g(p∗) = λ∗
p∗ = D(p∗)I(p∗) ∈ P       (5.47)
ρ(D(p∗)∇I(p∗)) ≤ 1 .

It is important to emphasize that the last inequality in (5.47) does not need to be strict as we may have λ∗ = 0, which, by (5.44), is the case whenever f_n(p∗) < 0 for each n ∈ N. However, if λ∗ > 0, we could invoke Theorem A.51 to conclude that ρ(D(p∗)∇I(p∗)) < 1 must hold for (5.46) to be satisfied. This is also true if λ∗ ≥ 0 has at least one positive component and ∇I(p∗) is irreducible, which can be concluded from Theorem A.52.
5.3 Interpretation in the QoS Domain

In this section, we are going to formulate the power control problem in the QoS domain, where the function Ψ introduced in the previous section is interpreted as a SIR-QoS mapping that relates a QoS parameter of interest to the signal-to-interference ratio at the output of a linear receiver. In this interpretation, Ψ can be either strictly increasing or strictly decreasing, depending on whether a larger value of Ψ(SIR_k(p)), p ∈ P, implies a better QoS for flow k (as in the case of data rate)¹⁶ or smaller values of Ψ(SIR_k(p)) are desired (as in the case of delay). However, in order to conform to (C.5-2)–(C.5-4), the function Ψ is assumed to be strictly increasing. For simplicity and consistency with Chap. 6, strictly decreasing SIR-QoS mappings are represented by the negative version of Ψ:

ψ(x) := −Ψ(x),   x > 0 .       (5.48)
In other words, the SIR-QoS mapping is either Ψ or −Ψ where, in the latter case, we use ψ. Widely considered QoS parameters are delay, bit error rate and data rate, but other quantities such as effective bandwidth [101] and effective spreading gain [32] have also been considered in the literature (see also a brief discussion later in this section). Let Q ⊆ R be an interval on the real line and suppose that U : R_{++} → Q is a twice continuously differentiable and strictly monotonic (bijective) function. In contrast to the preceding section, we allow the function U(x) to be either strictly increasing or strictly decreasing. The function value at x ∈ R_{++} can be interpreted as the degree of satisfaction of a link flow with the service quality if the link rate is equal to x. Now suppose that ω_k ∈ Q is a QoS parameter value of flow k, and let ω = (ω₁, …, ω_K) ∈ Q^K be a QoS vector. This can be a vector of delays, data rates or other QoS parameter values of interest.¹⁷

Definition 5.22 (Feasibility). We say that a QoS vector ω is feasible if there exists a power vector p ∈ P such that

ω_k ≤ U(Φ(SIR_k(p)))   if U is strictly increasing
ω_k ≥ U(Φ(SIR_k(p)))   if U is strictly decreasing       (5.49)
where Φ : R_+ → R_+ is a given strictly increasing rate function discussed and defined in Sect. 4.3.4. The meaning of feasibility is explained in Sect. 5.5 in the context of QoS-based power control (see also (5.51) below). Note that this definition implies a one-to-one relationship between the QoS parameter value of interest and the

¹⁶ In fact, note that (5.20) can be viewed as an equivalent formulation of the problem (5.22) in the QoS domain when the considered QoS parameter is the data rate.
¹⁷ Notice that QoS parameter values are relative values and are not expressed in any absolute units like, for instance, seconds (delay) or bits per second (data rate).
signal-to-interference ratio at the receiver output. Let us characterize the set of all feasible QoS parameter values. To this end, define a function γ(x) with dom(γ) = Q as follows (the interpretation of the functions U, Φ, γ and Ψ is illustrated in Fig. 5.4) γ(U (x)) = Φ−1 (x), x ∈ R++ .
(5.50)
So γ(x) is positive for all x ∈ Q, which is in fact equivalent to assuming that each link rate is positive. Since Φ−1 (x) is strictly increasing, it is clear that if U is strictly increasing (decreasing), then γ is strictly increasing (decreasing) as well. Now combining (5.49) with (5.50), and then proceeding essentially as in Sect. 5.2.2 shows that ω ∈ QK is feasible if and only if there is a power vector p ∈ P such that ∀1≤k≤K γ(ωk ) ≤ SIRk (p)
(5.51)
which can be written in a matrix form as (see also Sect. 5.2.2 for the special case of the data rate) Γ(ω)z ≤ (I − Γ(ω)V)p
(5.52)
where Γ(ω) = diag(γ(ω1 ), . . . , γ(ωK )) represents the minimum signal-tointerference ratios that are necessary to provide the QoS vector ω to the flows. Now let Fγ (P) be a subset of QK defined as follows: Fγ (P) := ω ∈ QK : ρ(Γ(ω)V) < 1, p(ω) ∈ P (5.53) with
p(ω) = (I − Γ(ω)V)−1 Γ(ω)z .
(5.54)
By Theorem A.51 and Γ(ω)z > 0 (due to positivity of z and γ(ωk ) for each k ∈ K), we know that p(ω) ∈ RK ++ exists (and is unique) if and only if the spectral radius of the matrix Γ(ω)V satisfies ρ(Γ(ω)V) < 1. Hence, the condition ρ(Γ(ω)V) < 1 in (5.53) ensures that the inverse matrix in the definition of p(ω) exists and p(ω) is a positive vector. Also, it is worth pointing out that if ρ(Γ(ω)V) < 1, then Theorem A.55 implies that (I − Γ(ω)V) is an M-matrix, and hence, by Definition A.54, its inverse exists and is nonnegative. Remark 5.23. An alternative characterization of the feasible QoS region Fγ (P) can be found in Sect. 5.6.4. This alternative characterization, which is an immediate consequence of Theorem 5.74 (see also Remark 5.75), is equal to the intersection of sublevel sets of the spectral radii of certain nonnegative matrices. Thus, it may provide some advantages over (5.53) due to the applicability of the results presented in Chap. 1. Also, the subsequent results in this section may be easier to prove and comprehend using the characterization presented in Sect. 5.6.4.
152
5 Resource Allocation Problem in Communications Networks
The following lemma shows that the vector p(ω) is the component-wise minimum power vector for which (5.49) is satisfied. Furthermore, for later references, it also states that p(ω) is monotonic with respect to the partial ordering (A.1). Lemma 5.24. Suppose that ρ(Γ(ω)V) < 1 for a given ω ∈ QK . Then, equality holds in (5.49) if and only if p = p(ω), and p(ω) > 0 is the unique vector ˆ ∈ Fγ and such that p(ω) ≤ p for all p ≥ 0 satisfying (5.49). For any ω ˇ ∈ Fγ with ω ˆ ≤ ω, ˇ we have p(ω) ˆ ≤ p(ω) ˇ if γ is strictly increasing and ω ˇ ≤ p(ω) ˆ if γ is strictly decreasing. p(ω) Proof. If p(ω) > 0 exists, then, by the construction, it satisfies (5.49) with equality. By Theorem A.51 and Γ(ω)z > 0, p(ω) indeed exists and is unique. Moreover, by (5.52) and (5.54), we have 0 ≤ (I − Γ(ω)V)(p − p(ω)) for all p satisfying (5.49). Multiplying both sides of this inequality by (I−Γ(ω)V)−1 ≥ 0, which exists, yields 0 ≤ p−p(ω) for any p fulfilling (5.49). The second part of the lemma becomes apparent when we apply the Neumann series (Theorem A.16) to (5.54) and consider nonnegativeness of Γ(ω), V and z for all ω ∈ Fγ . If p(ω) ∈ P, then, of course, ω is feasible as, by Definition (5.54), the vector p = p(ω) satisfies (5.49) with equality. By the above lemma, the converse holds as well since P is a downward comprehensive set (see Remark 4.13). Therefore, if p fulfills (5.49) and p ∈ P, then p(ω) ∈ P. These observations are summarized in a theorem. Theorem 5.25. ω ∈ QK is feasible if and only if p(ω) ∈ P where p(ω) is defined by (5.54). The theorem justifies the following definition. Definition 5.26 (Feasible QoS Region). The set Fγ (P) ⊂ QK ⊆ RK is called the feasible QoS region. By the above discussion, it is clear that the set Fγ := {ω ∈ QK : ρ(Γ(ω)V) < 1} ⊂ Fγ (P)
(5.55)
is the feasible QoS region when there are no constraints on transmit powers. Note that if U (x) = x, which implies γ(x) = ex − 1, x > 0, then the closure cl(Fγ (P)) of Fγ (P) is the feasible rate region R defined by (5.15). Remark 5.27. Considering Chap. 2 and, in particular, Sect. 2.2 reveals that Fγ (P) is closely related to the feasibility set F(Pt ; P1 , . . . , PK ) defined by (2.13). Indeed, if X(ω) = Γ(ω)V and z = 1, then F(Pt ; P1 , . . . , PK ) = Fγ (Pt ∩ Pi ) K where Pt := {x ∈ RK + : x1 ≤ Pt } and Pi := {x ∈ R+ : ∀k∈K xk ≤ Pk }. In special cases of 1) no power constraints, 2) a total power constraint and 3) individual power constraints on each link, we have (respectively)
5.3 Interpretation in the QoS Domain
Fγ = F Fγ (Pt ) = F(Pt ) Fγ (Pi ) = F(P1 , . . . , PK )
(F defined by (2.5)) (F(Pt ) defined by (2.9)) (F(P1 , . . . , PK ) defined by (2.11))
153
(5.56)
where it is assumed that z = 1. For later references, we note the following one-to-one correspondences, which follow from the preceding remark and Lemma 2.11. Observation 5.28. There is a one-to-one correspondence between Fγ (P) and K P+ = P ∩ RK ++ as well as between Fγ and R++ . Fig. 5.4 illustrates the relationship between the feasible QoS region, the feasible rate region defined in Sect. 5.2.2 and the feasible SIR region, which is equal to cl(Fγ (P)) with γ(x) = x, x > 0 (see also Sects. 5.4.3 and 5.5.1). If we take the intersections of the feasible rate region and the feasible SIR region with RK ++ , then there are one-to-one correspondences between these regions and these correspondences are defined by the functions Φ and U , which in turn determine Ψ and its inverse γ. Ψ ω2
ν2
SIR2
U Φ Feasible QoS region
Feasible rate region ω1
Φ−1 Feasible SIR region ν1
SIR1
γ
Fig. 5.4: Any point in the feasible QoS region Fγ (P) can be expressed in terms of the rate vector as (U (ν1 ), . . . , U (νK )) for some unique ν = (ν1 , . . . , νK ) ∈ R where R is the feasible rate region defined by (5.11); it may also be written as (Ψ (SIR1 ), . . . , Ψ (SIRK )) for uniquely determined SIR levels SIRk , k ∈ K. The function γ is the inverse function of Ψ so that (γ(ω1 ), . . . , γ(ωK )) for some ω ∈ Fγ (P) is a point in the feasible SIR region. The function Φ and its inverse Φ−1 relate the feasible SIR region and the feasible rate region. In this example, each function is strictly increasing and the sets are downward comprehensive.
Up to this point, we have used the notion of downward comprehensivity in connection with the admissible power region and the feasible rate region (see Remark 4.13 and Sect. 5.2.2). The following definition straightforwardly extends the concept of comprehensivity for the feasible QoS region, which is needed in the subsequent observation and later in this chapter.
154
5 Resource Allocation Problem in Communications Networks
Definition 5.29. We say that Fγ (P) is downward (respectively, upward) comˆ (respectively, ˆ ∈ Fγ (P), both ω ∈ QK and ω ≤ ω prehensive if, for any ω ˆ ≤ ω) imply ω ∈ Fγ (P). ω The following simple observation is used in the remainder of this section. Observation 5.30. Fγ (P) is a connected set and its interior int(Fγ (P)) relative to RK is nonempty. Moreover, if γ is strictly increasing (respectively, decreasing), then Fγ (P) is downward (respectively, upward) comprehensive. The connectedness immediately follows from Observation 2.3 and the above discussion. The nonemptiness of int(Fγ (P)) follows from the nonemptiness of int(P) = int(P+ ) and the fact that p(ω) given by (5.54) is a biLipschitz bijection18 from Fγ (P) onto P+ . To see comprehensivity, consider Lemma 5.24 to conclude that if γ is strictly increasing (respectively, decreasing), then, for any ω ∈ Fγ (P), we have p(ω) ≥ p(u), ω ≥ u ∈ QK , (respectively, p(ω) ≥ p(u), ω ≤ u ∈ QK ). Thus, since P is downward comprehensive (Remark 4.13) and p(ω) ∈ P, one obtains u ∈ Fγ (P) implying that Fγ (P) is downward (respectively, upward) comprehensive. Now we are in a position to state the power control problem in the QoS domain. First assume that a larger value of ωk implies a better QoS for link k ∈ K. Then, the utility-based power control problem can be equivalently stated as the problem of finding a QoS vector ω ∗ ∈ Fγ (P) such that ω ∗ = arg max wT ω, ω∈Fγ (P)
w ∈ RK ++
(5.57)
where the maximum is assumed to exist and w > 0 is a given weight vector. A role of the weight vector in the considered power control problem is briefly described in Sect. 5.2.4. In contrast, if a smaller value of ωk implies a better QoS performance, then ω ∗ = arg min wT ω, ω∈Fγ (P)
w ∈ RK ++
(5.58)
where it is assumed that the minimum exists. Now it follows from (5.57) and (5.58) why convexity of the feasible QoS region is a highly desired property. Indeed, if Fγ (P) is a convex set, the problem in the QoS domain simply reduces to finding a vector ω ∗ (which exists by assumption) at the boundary of Fγ (P) where the hyperplane with the normal vector w supports the feasible QoS region. The corresponding power vector is then p(ω ∗ ) ∈ P where p(ω) is given by (5.54). Lemma 5.12 then implies that ω ∗ is a boundary point of Fγ (P). 18
Bijectivity is a consequence of Lemma 2.11 and bi-Lipschitz is then due to boundedness of the partial derivatives of p(ω) on Fγ (P), which exist by virtue of (C.5-3), (5.50) (see also (5.60)) and ρ(Γ(ω)V) < 1, ω ∈ Fγ (P).
5.3 Interpretation in the QoS Domain
155
Remark 5.31. For convenience, unless otherwise stated, the boundary ∂Fγ (P) of Fγ (P) and the boundary of cl(Fγ (P) always refer to points ω ∈ QK such that p(ω) satisfies some power constraints with equality. Formally, we have ∂cl(Fγ (P)) = cl(∂Fγ (P)) and , (5.59) pk (ω) = Pn . ∂Fγ (P) = ω ∈ QK : p(ω) ∈ P and ∃n∈N k∈K(n)
We are not interested in other boundary points (if they exist) because, by Lemma 5.12, utility-based power control (5.31) always correspond to points belonging to ∂Fγ (P) defined by (5.59). Thus, we have ω ∗ ∈ ∂Fγ (P). Also, note that according to this convention, Fγ (P) is strictly convex (in the sense of Definitions 1.44 and 2.16) if every boundary point of Fγ (P) cannot be written as a convex combination of any two other points of Fγ (P). Since P is a convex set, it follows from (5.53) that Fγ (P) is convex if ρ(Γ(ω)V) and pk (ω), k ∈ K, are convex functions of ω ∈ QK . Hence, by Corollary 1.42 and Theorem 2.6 (see also Corollary 2.9), we can conclude that both Fγ and Fγ (P) are convex sets if γ(x) is log-convex on Q. This raises the question of whether γ(x) is log-convex when (C.5-2)–(C.5-4) are satisfied. To answer this question, we combine the identity Ψ (Φ−1 (x)) = U (x), x > 0, (see Fig. 5.4) with (5.50) to obtain Ψ (γ(x)) ≡ x, x > 0 γ strictly increasing ψ(γ(x)) ≡ x, x > 0
γ strictly decreasing
(5.60)
where ψ is given by (5.48). By (5.60), if γ(x) is strictly increasing (respectively, decreasing), then Ψ (x) (respectively, ψ(x)) is its inverse function. Now, Theorem B.36 in App. B.3.1 asserts that γ(x) is log-convex if and only if Ψe (x) = Ψ (ex ) is concave or if and only if ψe (x) = ψ(ex ) is convex depending on whether γ(x) is a strictly increasing or decreasing function. Consequently, if (C.5-2)–(C.5-4) are satisfied, then γ(x) is log-convex, and therefore, by Theorem 2.6, the corresponding feasible QoS region is a convex set. Furthermore, if V ≥ 0 is irreducible, then Theorem 2.15 (see also Corollary 2.17) implies that Fγ (P) is strictly convex. Thus, we have proven the following. Theorem 5.32. If (C.5-2)–(C.5-4) hold, then Fγ (P) is a convex set, and hence both (5.57) and (5.58) are convex problems.19 Moreover, if V ≥ 0 is irreducible, then Fγ (P) is strictly convex in the sense of Definition 2.16. Since the problems (5.57) and (5.58) are just reformulations of the utilitybased power control problem (5.31), it follows from Lemma 5.12 and the above observations that the maximum in (5.57) and the minimum in (5.58) exist if γ is log-convex. Another interesting property of Fγ (P) is the following. 19
For a definition of a convex problem, see Sect. B.4.3.
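The convexity mechanism behind Theorem 5.32 can be probed numerically. The sketch below (a toy example with a hypothetical gain matrix V) checks midpoint convexity of ω ↦ ρ(Γ(ω)V) for the log-convex choice γ(x) = eˣ, which is one of the two conditions used above to conclude that Fγ(P) is convex:

```python
import numpy as np

rng = np.random.default_rng(0)

def rho(omega, V):
    # spectral radius of Gamma(omega)V with gamma(x) = exp(x)
    return max(abs(np.linalg.eigvals(np.diag(np.exp(omega)) @ V)))

K = 3
V = rng.uniform(0.1, 1.0, (K, K))   # hypothetical gain matrix
np.fill_diagonal(V, 0.0)            # no self-interference

# midpoint convexity: rho(Gamma((a+b)/2)V) <= (rho(Gamma(a)V) + rho(Gamma(b)V))/2
for _ in range(100):
    a, b = rng.uniform(-1, 1, K), rng.uniform(-1, 1, K)
    assert rho(0.5 * (a + b), V) <= 0.5 * (rho(a, V) + rho(b, V)) + 1e-12
```

In fact, ω ↦ ρ(Γ(ω)V) is log-convex for log-convex γ (Kingman's theorem), which is stronger than the midpoint inequality tested here.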
156
5 Resource Allocation Problem in Communications Networks
Theorem 5.33. Suppose that (C.5-2)–(C.5-4) hold and Ψ(γ(x)) = x, x > 0 (respectively, ψ(γ(x)) = x, x > 0). Then, each of the following holds. (i) Fγ(P) has a maximal (respectively, minimal) point, and every such point is a boundary point. (ii) If V is irreducible, then every boundary point of Fγ(P) is its maximal (respectively, minimal) point. For the definition of maximal (respectively, minimal) points, we refer to Definition B.4 in the appendix. Proof. See Sect. 5.10. Remark 5.34. Maximal and minimal points with respect to the partial ordering (A.1) (component-wise inequality) are called Pareto optimal. So, using this terminology, the theorem says that there is always a Pareto optimal point and, if V is irreducible, every boundary point of Fγ(P) is Pareto optimal. In the subsequent discussion, γ is assumed to be strictly increasing. So, under the assumption of a utility function Ψ satisfying (C.5-2)–(C.5-4) and irreducibility of V, the theorem states that, for every ω̂ ∈ ∂Fγ(P), if ω̂ ≤ ω for some ω ∈ Fγ(P), then ω = ω̂. Note that, depending on the power constraints, the irreducibility property might not be necessary for the boundary points to be maximal ones. For instance, in the case of a total power constraint Pt, every boundary point of Fγ(Pt) is maximal even if V = 0. On the other hand, if V = 0 and transmit powers are subject to individual power constraints Pi, then Fγ(Pi) with any strictly increasing function γ has only one maximal point, which is its maximum point (see the left part of Fig. 5.5 and the definition of a maximum point in App. B.1). This simple example shows that the condition on irreducibility of V cannot be relaxed in general, that is, for an arbitrary convex set P. In general, a point ω ∈ C is the (unique) maximum point of some (not necessarily convex) set C ⊂ R^K if and only if ω ∈ C is a unique maximizer of x → wᵀx over C for all w > 0 (as in the left part of Fig. 5.5).
Furthermore, it is known (see also the proof below) that if x ∈ C maximizes x → wᵀx over C for some w > 0, then x is a maximal point of C. The converse is not true in general, but if C is a convex set, then for any maximal point ω of this set, there exists w ≥ 0 (nonnegative) such that ω maximizes x → wᵀx over C [16, pp. 54–56]. By Theorem 5.33, every boundary point of Fγ(P) is a maximal (minimal) point of Fγ(P) if V is irreducible. The following theorem shows that every boundary point maximizes (minimizes) wᵀx over x ∈ Fγ(P) for some positive weight vector w. Theorem 5.35. Suppose that (C.5-2)–(C.5-4) hold. Let Ψ, γ, ψ be related to each other by (5.60), and let V be irreducible. Then, ω ∈ ∂Fγ(P) with γ being strictly increasing (respectively, decreasing) if and only if there exists w > 0 such that ω maximizes (respectively, minimizes) x → wᵀx over Fγ(P).
5.3 Interpretation in the QoS Domain
157
Proof. The proof can be found in Sect. 5.10. Theorems 5.33 and 5.35 imply that if their conditions are satisfied, γ is strictly increasing and V is irreducible, then every boundary point of Fγ(P) is both maximal and maximizes x → wᵀx over Fγ(P) for some w > 0 (if γ is strictly decreasing, the words "maximal" and "maximizes" should be replaced by "minimal" and "minimizes", respectively). The middle part of Fig. 5.5 depicts such a case. In the left part of Fig. 5.5, we have individual power constraints with mutually orthogonal links (V = 0), so that there are non-maximal points on the boundary of Fγ(P) that maximize x → wᵀx over Fγ(P) for some vector w having at least one zero entry. Finally, the right part of the figure shows the case where Fγ(P) is not convex.
Fig. 5.5: The feasible QoS regions Fγ(P) for three different choices of γ, P and V ≥ 0. Left: ω∗ is the (unique) maximum point of Fγ(P) as it maximizes x → wᵀx over Fγ(P) for any choice of w > 0. Other boundary points are not maximal although they maximize the inner product for a nonnegative weight vector. Middle: Every boundary point is maximal and maximizes the inner product for some positive weight vector. Right: Fγ(P) is not convex but has a maximal point ω̂. However, there exists no w ≥ 0 for which wᵀω̂ ≥ wᵀx for all x ∈ Fγ(P).
Note that because of the strict monotonicity property, an increase of the SIR always leads to a better quality of service. Below we present two interesting examples of strictly monotonic functions whose inverse functions are log-convex. (a) Data rate in the high SIR regime: When SIRk(p) ≫ 1, we have log(1 + SIRk(p)) ≈ log(SIRk(p)). Thus, at high SIR values, the relationship between the data rate and the signal-to-interference ratio is well approximated by Ψ(x) = log(x), x > 0. The inverse function γ(x) = eˣ is log-convex on x ∈ R, which together with Theorem 5.32 shows that the corresponding feasible QoS region Fγ(P) is a convex set. (b) Average customer time for an M/M/1 queuing system in the low SIR regime: If SIRk ≪ 1, then the data rate is approximately linear in the SIR since then log(1 + SIRk) ≈ SIRk. On the other hand, the average customer time is 1/(ν − λ), ν > λ, where ν and λ denote service rate and
arrival rate, respectively [2]. Thus, in the low SIR regime, the average customer time in an M/M/1 queuing system is the inverse function of the SIR. So, provided that λ ∈ (0, ν) is not too large, the utility-based power control strategy (5.31) with Ψ(x) = −ψ(x) = −1/x, x > 0, is a good approximation for minimizing the total delay. Of course, the inverse function of ψ(x) = 1/x, x > 0, is γ(x) = 1/x, x > 0, which is a log-convex function, implying that Fγ(P) is a convex set. Note that the convexity property holds even if ψ(x) = 1/(x − λ) for some λ ∈ (0, x). This, however, requires that data rates (under the linear approximation) greater than λ can be guaranteed on every link. If (λ, . . . , λ) does not lie in the feasible rate region R (which is equal to the feasible SIR region under the linear approximation), the problem could be resolved by an appropriate link scheduling, provided that (λ, . . . , λ) is in the convex hull of R.

Illustration of the Efficiency–Fairness Trade-off

In the remainder of this section, let us again turn our attention to the class of strictly increasing SIR-QoS mappings Ψ(x) = Ψα(x), x > 0, α ≥ 1, defined by (5.28). We use the preceding results to illustrate the impact of the parameter α on the fairness performance under utility-based power control considered in Sect. 5.2.5. In particular, we illustrate Observation 5.14, which states that a solution to (5.20), with R̃ replaced by the feasible rate region R, converges to the max-min fair rate allocation (Definition 5.16) as α → ∞ (see also the remark and the discussion after Observation 5.14). Recall that unless otherwise stated, the rate function is assumed to be Φ(x) = log(1 + x), x > 0, and the feasible rate region R is defined by (5.11). We consider a network with two links and assume that the QoS vector ω ∈ R² is feasible if and only if there exists p ∈ P such that

ω = (ω1, ω2) ≤ (log(SIR1(p)), log(SIR2(p))).   (5.61)

As a consequence, the feasible QoS region Fγ(P) is given by (5.53) with K = 2 and γ(x) = eˣ, x ∈ R. The latter is equivalent to saying that the SIR-QoS mapping is Ψ(x) = log(x), x > 0. For brevity, let us assume that P = {p ∈ R²₊ : ‖p‖₁ ≤ Pt} for some Pt > 0 (a sum power constraint). Moreover, it is assumed that

(C.5-9) there is a nonempty P̄ ⊂ P such that min_{k∈{1,2}} SIRk(p) ≫ 1 for all p ∈ P̄.

First of all, the assumption implies that the intersection of the nonnegative orthant R²₊ with the feasible QoS region Fγ(P) ⊂ R² defined by (5.61) is a nonempty set. But, more importantly, (C.5-9) ensures that Φ(SIRk(p)) = log(1 + SIRk(p)) ≈ log(SIRk(p)) for all p ∈ P̄, and hence Fγ(P̄) ⊂ Fγ(P) is a good approximation of the feasible rate region R in the so-called high SIR regime (see Sect. 5.4.2).
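Theorem 5.32 predicts that the region defined by (5.61) is convex for γ(x) = eˣ. A small numerical sketch (with hypothetical values for V, z and Pt) checks this via midpoint feasibility:

```python
import numpy as np

V = np.array([[0.0, 0.3], [0.4, 0.0]])  # hypothetical gain matrix
z = np.array([1.0, 1.0])                # noise variances
Pt = 10.0                               # sum power constraint

def feasible(omega):
    """(5.61)-feasibility: the minimal power meeting the SIR targets
    exp(omega) is p = (I - GV)^(-1) G z with G = diag(exp(omega)); omega
    is feasible iff this p exists and respects the power constraint."""
    G = np.diag(np.exp(omega))
    if max(abs(np.linalg.eigvals(G @ V))) >= 1.0:
        return False
    p = np.linalg.solve(np.eye(2) - G @ V, G @ z)
    return p.sum() <= Pt

rng = np.random.default_rng(1)
pts = [w for w in rng.uniform(-2.0, 1.0, (500, 2)) if feasible(w)]
for a, b in zip(pts[::2], pts[1::2]):
    assert feasible(0.5 * (a + b))  # midpoints of feasible points are feasible
```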
Figure 5.6 shows an example of Fγ(P) ∩ R²₊. In addition, the figure depicts the sets of all nonnegative QoS values that can be achieved in the sense of (5.61) by means of utility-based power control (5.31) with Ψ(x) = Ψα(x), α ≥ 2. In other words, each set depicted in Fig. 5.6 contains all QoS vectors ω = (ω1, ω2) such that 0 ≤ ω ≤ (log(SIR1(p∗)), log(SIR2(p∗))), where p∗ = p∗(α) is a solution to (5.31) with Ψ(x) = Ψα(x), α ≥ 1, for some positive weight vector w. Let us denote these sets by Rα, α ≥ 1, where R1 = Fγ(P) ∩ R²₊. Note that since Ψα satisfies (C.5-2)–(C.5-4) for every α ≥ 1, it follows from Theorem 5.35 that every boundary point of Rα can be achieved by allocating p∗ given by (5.31) (with Ψ(x) = Ψα(x)) for some w > 0. Also, notice that in the high SIR regime, Rα, α ≥ 1, approximates the sets of link rate vectors that can be achieved using the utility function Ψα given by (5.28).
Fig. 5.6: The sets Rα with α = αn, n = 1, . . . , 4, where α1 = 1 and α1 < α2 < α3 < α4, for a randomly chosen gain matrix V ≥ 0 with trace(V) = 0 and I + V > 0. R∞ is a limiting set that is approached as α → ∞. See also Remark 5.36.
Remark 5.36. The reader should not conclude that Rα+1 ⊆ Rα for every α ≥ 1, although the figure might suggest it. This is assumed in the figure for simplicity and lucidity, but we have no proof that it is true in general.

We see from Fig. 5.6 that, as α increases, Rα resembles more and more the set R∞, which is a rectangle whose right-upper vertex corresponds to the max-min fair rate allocation (this vertex is the maximum point of R∞; at this point, wᵀx attains its maximum over R∞ for all w > 0). Hence, increasing α leads to a fairness performance that is less sensitive to the choice of the weight vector w > 0. In fact, asymptotically as α → ∞, the hyperplane with the normal vector w supports R∞ at the point corresponding to max-min fairness, regardless of
the choice of w > 0. It is important to emphasize that the max-min fairness point is independent of α ≥ 1. This is simply because the set of solutions to the problem max_{p∈P} min_{k∈K} Ψ(SIRk(p)) = Ψ(max_{p∈P} min_{k∈K} SIRk(p)) is invariant to the choice of a strictly increasing function Ψ. In Sect. 5.9.1, it is shown that, in the noiseless case, the point depends only on the gain matrix V. In the case of noisy channels considered in Sect. 5.6, the max-min fairness point depends on both the gain matrix and the power constraints.
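This convergence can be illustrated with a toy two-link grid search. The sketch below assumes the usual α-fair form Ψα(x) = x^(1−α)/(1−α) for α > 1 (and log for α = 1) as a stand-in for (5.28); all channel parameters are hypothetical:

```python
import numpy as np

V = np.array([[0.0, 0.2], [0.5, 0.0]])   # hypothetical gain matrix
z = np.array([1.0, 0.5])                 # noise variances
Pt, w = 10.0, np.array([0.8, 0.2])       # sum power, fixed weight vector

def sirs(p):
    return p / (V @ p + z)

grid = [np.array([t, Pt - t]) for t in np.linspace(1e-3, Pt - 1e-3, 2001)]
maxmin = max(min(sirs(p)) for p in grid)  # max-min fair SIR level

gaps = []
for alpha in (1.0, 2.0, 8.0, 32.0):
    util = np.log if alpha == 1.0 else (lambda x: x ** (1 - alpha) / (1 - alpha))
    best = max(grid, key=lambda p: w @ util(sirs(p)))
    gaps.append(abs(min(sirs(best)) - maxmin))
# the weighted solution drifts toward the max-min point as alpha grows
print(gaps)
```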
5.4 Remarks on Joint Power Control and Link Scheduling

The objective of this section is to discuss some potential consequences of the results from Chap. 2 for throughput-optimal MAC policies. Our definition of the MAC layer includes two mechanisms for resource allocation and interference management, namely link scheduling and power control. The operation of dividing a frame into a number of shorter subframes and assigning links to each subframe is referred to as link scheduling. The power control protocol determines the transmit powers of the links in each subframe. The process of jointly optimizing these two mechanisms is called joint power control and link scheduling (JPCLS) (see Sect. 5.2.1). We say that a MAC policy does not involve any link scheduling if each link is either active or idle during the whole frame interval. In other words, there is no time sharing protocol between different points in the feasible rate region that prevents some links from being active concurrently. In order to gain a full understanding of the results, it is important to bear in mind that the links are subject to peak power constraints as discussed in Sect. 5.2.1.

5.4.1 Optimal Joint Power Control and Link Scheduling

We are interested in throughput-optimal strategies defined as follows.

Definition 5.37. Let w ≥ 0 be a given weight vector, and let p(Bn) ∈ P for every n ∈ Q. We say that (p∗, μ∗) is throughput-optimal if

(p∗, μ∗) = arg max_{(p,μ)} Σ_{k∈K} wk νk(p, μ)
where νk (p, μ) is given by (5.10). The corresponding JPCLS is referred to as throughput-optimal. As mentioned in Sect. 5.2.1, link rates under an optimal JPCLS correspond to some point on the boundary of the convex hull of the feasible rate region defined by (5.16). Thus, the problem of determining (p∗ , μ∗ ) is related to the computation of the points of the convex hull of R. In this section, our main concern is the question of when (if at all) a concurrent transmission of links
should be preferred. This is of interest as sophisticated link scheduling policies can be prohibitively complex to implement in a distributed manner. As already mentioned in Sect. 5.2.2, the question is directly linked to the geometry of the feasible rate region R ⊂ R^K₊. Recall that R includes all data rates achievable by means of power control when link scheduling is not implemented. Thus, since the data rates under any JPCLS policy are a convex combination of some points in R, pure geometrical reasoning shows that: (i) If R is a convex set, all boundary points of the convex hull of R can be achieved without resorting to link scheduling. More precisely, there exists an optimal JPCLS policy with μ∗(B1) = μ∗(B) = 1 and some power vector p∗ = p∗(B1) ∈ P. (ii) If R is strictly convex (see the remark in the previous section and Definition 2.16), an optimal strategy does not involve any link scheduling. Using the definitions of Sect. 5.2.1, this means that in the optimum, we must have μ∗(B1) = 1. Summarizing, we can say that the JPCLS problem becomes a pure power control problem if R is a convex set. To illustrate this, consider a network with mutually orthogonal links or, equivalently, with the gain matrix V = 0. It may be easily verified that in this case, the feasible rate region R is a convex set (Example 2.4). Figure 5.7 depicts the feasible rate region for two mutually orthogonal links
subject to a sum power constraint (p1 + p2 ≤ Pt). Note that if V = 0 and Σk pk ≤ Pt, R is a strictly convex set since Φ(x) = log(1 + x) is strictly concave on R₊. In the case of K ≥ 2 mutually orthogonal links, every point ν∗ on the boundary of R (see the remark on the boundary of Fγ(P) in
Fig. 5.7: The feasible rate region for two mutually orthogonal links subject to a sum power constraint. The region is a strictly convex set, so that link scheduling between arbitrary points on the boundary of the feasible rate region is suboptimal.
the previous section) is given by ν∗k = Φ(p∗k/zk), k ∈ K, where w ∈ R^K₊₊ is a fixed weight vector and

p∗ = arg max_{p ∈ R^K₊, ‖p‖₁ = Pt} Σ_{k∈K} wk Φ(pk/zk) = arg max_{p ∈ R^K₊, ‖p‖₁ = Pt} Σ_{k∈K} wk log(1 + pk/zk).
This is a weighted version of the standard water-filling problem [16], for which a closed-form solution can be easily found and is given by

p∗k = max{0, wk/λ − zk}, zk > 0, k ∈ K,

where the dual variable λ is chosen to satisfy Σ_{k∈K} max{0, wk/λ − zk} = Pt. In the special case when p∗k > 0 for each k ∈ K (no link is idle), one obtains

p∗k = wk (Pt + Σ_{l∈K} zl)/(Σ_{l∈K} wl) − zk > 0.

From this equation, we see that every optimal positive power vector is associated with a unique weight vector w normalized to ‖w‖₁ = 1. Thus, except for the points on the boundary where at least one of the links is idle (νk(p) = 0 for some k ∈ K), every rate vector ν∗ is associated with a unique (up to a scaling factor) positive weight vector w which is normal to the hyperplane supporting the feasible rate region at ν∗. Due to the strict convexity property of R in the example above, the points on the boundary of R cannot be achieved when link scheduling is involved. Indeed, if there are at least two subframes B1 and B2 with μ(B1) > 0 and μ(B2) > 0, and associated power vectors p(B1) ≠ p(B2), then the resulting data rate μ(B1)ν(B1) + μ(B2)ν(B2) with μ(B1) + μ(B2) = 1 is interior to R, where νk(Bn) = Φ(SIRk(p(Bn))), k ∈ K. If R is convex (but not strictly convex), an optimal MAC policy may involve link scheduling. To see this, consider two links with p1 + p2 ≤ Pt and

ν1(p) = log(1 + p1/(εp2 + 1)),  ν2(p) = log(1 + p2/(εp1 + 1))

for some ε > 0. In this case, the feasible rate region is equal to the closure of the feasibility set in Example 2.5. It follows that if ε = (√(1 + Pt) − 1)/Pt, the feasible rate region R is convex but not strictly convex. Now suppose that w1 = w2 = 1, in which case the objective is to maximize ν1(p) + ν2(p) subject to p1 + p2 ≤ Pt. Due to the symmetry of the objective function, it may be easily seen that the optimal power allocation is p1 = p2 = Pt/2, and the corresponding data rates are

ν1(p) = ν2(p) = log(1 + (Pt/2)/(εPt/2 + 1)) = log(1 + Pt/(√(1 + Pt) + 1)) = log(√(1 + Pt)) = ½ log(1 + Pt).
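Both closed-form expressions above lend themselves to a quick numerical check (toy parameter values): a bisection on the dual variable λ reproduces the water-filling solution, and the chosen ε indeed yields the per-link rate log(1 + Pt)/2 at p1 = p2 = Pt/2:

```python
import math
import numpy as np

# 1) Weighted water-filling: bisect on the dual variable lam so that
#    sum_k max(0, w_k/lam - z_k) = Pt (the excess is decreasing in lam).
def waterfill(w, z, Pt):
    lo, hi = 1e-12, w.sum() / Pt   # excess(lo) > 0 > excess(hi) since z > 0
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        excess = np.maximum(0.0, w / lam - z).sum() - Pt
        lo, hi = (lam, hi) if excess > 0 else (lo, lam)
    return np.maximum(0.0, w / lam - z)

w, z, Pt = np.array([2.0, 1.0, 1.0]), np.array([0.5, 0.5, 0.5]), 5.0
p = waterfill(w, z, Pt)
assert abs(p.sum() - Pt) < 1e-6
if (p > 0).all():  # closed form for the case with no idle link
    assert np.allclose(p, w * (Pt + z.sum()) / w.sum() - z, atol=1e-6)

# 2) With eps = (sqrt(1+Pt)-1)/Pt, the per-link rate at p1 = p2 = Pt/2
#    equals log(1+Pt)/2.
eps = (math.sqrt(1 + Pt) - 1) / Pt
rate = math.log(1 + (Pt / 2) / (eps * Pt / 2 + 1))
assert abs(rate - 0.5 * math.log(1 + Pt)) < 1e-9
```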
These rates can also be achieved if the links take turns transmitting at powers equal to Pt, such that one link is active during the first half of the frame interval and the second link during the other half. Formally, this means that μ(B1) = μ(B2) = 1/2 and p(B1) = (Pt, 0), p(B2) = (0, Pt).

5.4.2 High SIR Regime

We say that link k operates in the high SIR regime if νk(p) = Φ(SIRk(p)) = log(1 + SIRk(p)) ≈ log(SIRk(p)). Thus, in the high SIR regime, the data rate behaves like a logarithmic function of the SIR. As a result, a linear increase of the SIR results only in a logarithmic increase of the data rate. In this section, we discuss how this impacts optimal MAC strategies. To this end, define the feasible rate region with a common rate requirement α ≥ 0 as follows (note that α ≤ ω is equivalent to α1 ≤ ω; see the remarks on notation in App. A.1):

R(α) = {ω ∈ R^K₊ : α ≤ ω ≤ ν(p), p ∈ P}.

Clearly, R(α) ⊆ R for all α ≥ 0 with R(0) = R, and R(α) = ∅ for some sufficiently large α. Now suppose that α > 0 is chosen such that both (i) R(α) ≠ ∅, and (ii) for every ν ∈ R(α), there holds νk ≤ νk(p) ≈ log(SIRk(p)) for each k ∈ K and some p ∈ P. (If there is no such α, then the network cannot operate in the high SIR regime.) Under this assumption, R(α) is well approximated by the feasible QoS region Fγ(P) defined by (5.53) with γ(x) = eˣ, x ∈ [α, ∞). Now since the exponential function is log-convex, Corollary 2.9 implies that Fγ(P) is a convex set. This in turn implies that R(α) is a convex set under the assumption of the logarithmic relationship between data rate and SIR. Moreover, if V is irreducible, which is usually the case in practice, Corollary 2.17 ensures that Fγ(P) is a strictly convex set. Thus, by the discussion above, we can conclude that in the high SIR regime, no link scheduling should be involved. This is quite intuitive since a logarithmic increase in data rate due to a higher value of the SIR cannot compensate for a linear decrease due to a shorter transmission time (by virtue of partitioning a frame into subframes).
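This trade-off can be made concrete with a stylized numerical comparison, assuming (favorably for scheduling) that silencing the competing link at most doubles the SIR:

```python
import math

# half the frame at SIR 2s versus the full frame at SIR s: at high SIR,
# the halved transmission time dominates the single extra bit gained
# from doubling the SIR
for s in (100.0, 1000.0, 10000.0):
    concurrent = math.log2(1 + s)
    scheduled = 0.5 * math.log2(1 + 2 * s)
    assert scheduled < concurrent
```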
5.4.3 Low SIR Regime

Now let us assume that a network operates in the low SIR regime. In this case, the relationship between data rate and SIR can be well approximated by a linear function:
νk(p) = Φ(SIRk(p)) = log(1 + SIRk(p)) ≈ SIRk(p). Thus, when compared with the high SIR regime, we have an entirely different situation here, since a linear increase of the SIR entails a linear increase in data rate. This has a tremendous impact on the design of optimal MAC policies. The main driving force behind the operation in the low SIR regime is the ability to transmit at an energy per information bit close to the minimum [134, 135]. For instance, in wireless sensor networks, the energy consumption (rather than the spectral efficiency) is one of the major design criteria. Comparing (5.15) with (5.53) reveals that the feasible rate region in the low SIR regime is equal to the closure of the feasible QoS region Fγ(P) with γ(x) = x, x > 0. In all that follows, let us assume that γ(x) = x, x > 0 (zero is excluded from the definition of γ(x) only for compatibility with the previous definitions; because of this, we have to take the closure of Fγ(P) to obtain the feasible rate region). We refer to cl(Fγ(P)) as the feasible SIR region, while its complement Fᶜγ(P) = R^K₊ \ cl(Fγ(P)) is called the infeasible SIR region. Since the linear function is not log-convex, the results of Sect. 2.3 cannot be applied to the low SIR regime. Furthermore, it follows from Sect. 2.4 that the feasible SIR region is not a convex set. However, for K = 2, the convex hull of the feasible SIR region is a convex polygon, regardless of whether the links are subject to individual power constraints or are constrained in total power. This is easy to see from (2.16) and the following discussion in Sect. 2.4, which shows that p1(ω) and p2(ω) are both concave on Fγ = F. Thus, in the case of two links subject to a total power Pt, the convex hull of cl(Fγ(P)) is a triangle with the vertices given by (0, 0), (Pt/z1, 0) and (0, Pt/z2). The nonzero vertices are the points E and F in Fig. 5.8. Clearly, if V ≠ 0, a throughput-optimal MAC policy is then a simple link scheduling (or time sharing) protocol between the vertices (Pt/z1, 0) and (0, Pt/z2), which correspond to the power vectors p = (Pt, 0) and p = (0, Pt), respectively.
In other words, the links take turns transmitting at powers equal to Pt, such that only one link is active at any time. Using the definitions of Sect. 5.2.1, this means that each frame B is divided into (at most) two subframes B1 and B2 such that μ(B1) + μ(B2) = μ(B) = 1, with the corresponding power vectors being equal to p(B1) ∈ {(Pt, 0), (0, Pt)} and p(B2) = (Pt, Pt) − p(B1). The set function μ : A → [0, 1] with A = {B1, B2} indicates at which relative frequencies each of the power vectors is utilized in an optimal MAC policy. This then determines a family of optimal MAC policies parameterized by the function μ. The situation is slightly more complicated in the case of two links subject to individual power constraints P1 > 0 and P2 > 0 on each link. In this case, the convex hull of the feasible SIR region can be either a triangle spanned by the points (0, 0), (P1/z1, 0) and (0, P2/z2), or a convex quadrilateral whose vertices are (0, 0), (P1/z1, 0), (0, P2/z2) and (ω1∗, ω2∗) satisfying p1(ω∗) = P1 and p2(ω∗) = P2. Clearly, if the convex hull is the triangle ((ω1∗, ω2∗) is then a member of this triangle) and V ≠ 0, then an optimal MAC policy is similar to that for a total
power constraint, except that now it involves a time sharing protocol between the points (P1/z1, 0) and (0, P2/z2) (D and A in Fig. 5.8), with the corresponding power vectors being (P1, 0) and (0, P2). This again implies two subframes B1 and B2 in each frame such that μ(B1) + μ(B2) = 1, with the power vectors being given by p(B1) ∈ {(P1, 0), (0, P2)} and p(B2) = (P1, P2) − p(B1). When the convex hull is a convex quadrilateral, a time sharing protocol either between (P1/z1, 0) and (ω1∗, ω2∗) (D and G in Fig. 5.8) or between (0, P2/z2) and (ω1∗, ω2∗) (A and G) is optimal. Again, each frame is divided into two subframes B1 and B2 such that μ(B1) + μ(B2) = 1. However, the power vectors can be either

p(B1) ∈ {(P1, 0), (p1(ω∗), p2(ω∗))}, p(B2) ∈ {(P1, 0), (p1(ω∗), p2(ω∗))} \ {p(B1)}

or

p(B1) ∈ {(0, P2), (p1(ω∗), p2(ω∗))}, p(B2) ∈ {(0, P2), (p1(ω∗), p2(ω∗))} \ {p(B1)}.

At the point (ω1∗, ω2∗), both links are active and transmit at powers specified by the power vector p(ω∗). Again, the set function μ determines at which relative frequencies the power vectors are utilized in an optimal strategy. Now the question is whether it is possible to generalize these observations to a network with K links and transmit powers subject to some constraints. More precisely, we are interested in the following problem.

Problem 5.38. Is the convex hull of the feasible SIR region a convex polytope in R^K₊, regardless of the type of power constraints?

If this were true, then a MAC policy based on a time sharing protocol between the polytope vertices, similar to that for the two-dimensional case, would be optimal. However, this cannot be true in full generality. To see this, consider the feasible SIR region given by cl(Fγ(Pt) ∩ Fγ(Pi)), where Fγ(Pt) and Fγ(Pi) are defined by (5.56) for z = 1.
In words, while cl(Fγ(Pt)) is the feasible SIR region in a network constrained in total power, cl(Fγ(Pi)) denotes the feasible SIR region under individual power constraints on each link. The situation is illustrated in Fig. 5.8 for K = 2. In this two-dimensional example, the convex hull of the feasible SIR region is a convex pentagon generated by the points (0, 0), A, B, C and D. So, in this case, a time sharing protocol between some of the pentagon vertices A, B, C and D is optimal. While in A and D only one of the links is active, both links are active in B and C. Theorem 2.18, however, asserts that Fᶜγ(Pt) is not a convex set in general, so that the convex hull of Fγ(Pt) ∩ Fγ(Pi) need not be a convex polytope, which in turn may require alternative MAC policies to achieve points on the boundary of the convex hull. For instance, if Fᶜγ(Pt) were not convex in the two-user case, a time sharing protocol between the points B and C (Fig. 5.8) would not
Fig. 5.8: The feasible SIR region for two users under total power constraint Pt and individual power constraints on each link P1 < Pt and P2 < Pt . If there were no individual power constraints, a MAC policy involving a time sharing protocol between the points E and F , corresponding to power vectors (0, Pt ) and (Pt , 0), respectively, would be optimal. In contrast, when in addition individual power constraints are imposed, a time sharing protocol between A and D (that correspond to power vectors (0, P2 ) and (P1 , 0), respectively) is suboptimal. In this case, it is better to schedule either between A and B or between B and C or between C and D depending on the target signal-to-interference ratios.
be necessarily the best strategy, since the boundary of Fγ(Pt) may intersect the straight line connecting B and C, in which case another policy would be required to achieve the boundary points. It is interesting to point out that in a network constrained only in total power, the convex hull of the feasible SIR region cl(Fγ(Pt)) is a convex polytope, despite the fact that Fᶜγ(Pt) is not a convex set in general. Indeed, we have

F̃γ(Pt) = ConvexHull(Fγ(Pt))   (5.62)

where

F̃γ(Pt) = {ω ∈ R^K₊₊ : ωᵀz ≤ Pt}.

It is clear that cl(F̃γ(Pt)) is a convex polytope whose vertices are (0, . . . , 0) and (Pt/zk)ek with k = 1, . . . , K. Every point in F̃γ(Pt) can be achieved using a TDMA-like protocol that alternates between different links transmitting at power Pt, such that only one link is active in every subframe. More formally,
given B and A, any JPCLS policy (p∗, μ∗) with Σk μ(Bk) = 1 and p(Bk) = Pt ek, 1 ≤ k ≤ K, is optimal. To see (5.62), note that, for all ω ≥ 0 (ω ≠ 0) with ρ(Γ(ω)V) < 1, we have (I − Γ(ω)V)⁻¹ − I ≥ 0 with equality (in all entries) if and only if V = 0. From this, it follows that ωᵀz < ‖p(ω)‖₁ for any ω ∈ Fγ if V ≠ 0, and thus Fγ(Pt) is a proper subset of F̃γ(Pt) for any matrix V ≠ 0. Moreover, since any point in F̃γ(Pt) is a convex combination of some points in Fγ(Pt), we obtain (5.62).

5.4.4 Wireless Links with Self-Interference

Up to now, we have assumed that wireless links are only exposed to interference from other links (multiple access interference), and therefore no link interferes with itself whenever it is active (Condition (C.4-4)). The assumption is reasonable when multiple access interference is a dominant factor. Nevertheless, self-interference is usually present in wireless networks, mainly due to the time-dispersive nature of the radio propagation channel, but also due to the nonlinear characteristics of deployed components. Whatever the reason is, it is reasonable to assume that the self-interference is proportional to transmit power. In the case of a multipath propagation channel, the received signal may be composed of a strong signal path and some (usually weaker) delayed paths. When the time dispersion of the channel is sufficiently large and only the strong path is used to decode transmitted symbols, the other paths may cause relatively strong intersymbol interference. It is important to emphasize that it is not the absolute value of self-interference that matters but rather its relation to multiple access interference. Indeed, link scheduling can entail a noteworthy performance improvement only if multiple access interference is dominant. On the contrary, if self-interference is a dominant factor on each link, concurrent transmission may be preferable. But when exactly is self-interference dominant?
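The strict inequality ωᵀz < ‖p(ω)‖₁ used in this argument is easy to confirm numerically; the sketch below takes γ(x) = x, z = 1 and a hypothetical positive gain matrix V:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 3
V = rng.uniform(0.01, 0.4, (K, K))  # hypothetical gain matrix, V != 0
np.fill_diagonal(V, 0.0)
z = np.ones(K)

for _ in range(100):
    omega = rng.uniform(0.05, 0.5, K)            # SIR targets (gamma(x) = x)
    G = np.diag(omega)
    if max(abs(np.linalg.eigvals(G @ V))) >= 1.0:
        continue
    p = np.linalg.solve(np.eye(K) - G @ V, G @ z)  # exact power p(omega)
    assert omega @ z < p.sum()                     # omega^T z < ||p(omega)||_1
```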
Although it is difficult to give a definite answer to this question in a general context, Theorem 1.51 in Sect. 1.4.2 suggests some useful guidelines for the design of MAC strategies in the presence of relatively strong self-interference. This theorem asserts that the Perron root is convex if both the gain matrix V ∈ R^{K×K}₊ is symmetric positive semidefinite and γ : Q → R₊ defined by (5.50) is a convex function. This in turn implies (Corollary 1.52) that the feasible QoS region Fγ defined by (5.55) is a convex set if both V is symmetric positive semidefinite and γ is convex. Consequently, under the assumption of positive semidefiniteness of V, convexity of γ : Q → R₊ is sufficient for Fγ to be a convex set, which is a significantly weaker requirement than log-convexity. In particular, γ(x) = eˣ − 1, x ≥ 0, is a convex function, implying that the feasible rate region under no power constraints is a convex set when the gain matrix is symmetric positive semidefinite (for the power constraint case, see the remarks in Sect. 5.6.4).
If there is no self-interference or, equivalently, if trace(V) = 0, the gain matrix V cannot be positive semidefinite, which follows immediately from the nonnegativity of V. Roughly speaking, for the matrix V to be positive semidefinite, self-interference must be dominant on each link. Thus, if the gain matrix V is symmetric (or approximately symmetric) and self-interference is dominant on each link so that V is positive semidefinite, then the results mentioned above imply that all points on the boundary of the convex hull of the feasible rate region R (in an interference-limited scenario) can be achieved by power control with all links being active concurrently. Hence, in such cases, throughput-optimal MAC policies do not need to involve link scheduling. Recall from the discussion at the end of Sect. 4.3.3 that we have an interference-limited scenario if the interference is dominant so that the background noise can be neglected. For the case that V is not symmetric but its diagonal elements are dominant in the sense that Vs = (V + Vᵀ)/2 is positive semidefinite, numerical experiments suggest that similar conclusions may be possible. However, we have no proof that this is true in general. If the symmetric part Vs of V is positive definite, then it is known that ρ(V) ≤ ρ(Vs) for any nonnegative matrix V [9]. Note that each square matrix can be uniquely written as the sum of a symmetric matrix and a skew-symmetric one. If we view R^{K×K} as a Hilbert space with the inner product given by ⟨A, B⟩ = trace(AᵀB), then the sets of symmetric and skew-symmetric matrices are orthogonal complementary subspaces of R^{K×K}. Moreover, Vs is the orthogonal projection of V onto the space of K × K symmetric matrices. Thus, in this sense, Vs is the closest symmetric matrix to V. This suggests that Fγ is a "nearly convex set" when both Vs is positive semidefinite (self-interference dominant) and the distance between V and Vs is not too large.
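The cited bound ρ(V) ≤ ρ(Vs) and the projection interpretation can be probed numerically on random nonnegative matrices (example values only):

```python
import numpy as np

rng = np.random.default_rng(3)
rho = lambda M: max(abs(np.linalg.eigvals(M)))

for _ in range(100):
    V = rng.uniform(0.0, 1.0, (4, 4))  # random nonnegative matrix
    Vs = 0.5 * (V + V.T)               # symmetric part of V
    assert rho(V) <= rho(Vs) + 1e-10
    # Vs is the nearest symmetric matrix in the Frobenius norm:
    S = rng.uniform(0.0, 1.0, (4, 4))
    S = 0.5 * (S + S.T)                # an arbitrary symmetric competitor
    assert np.linalg.norm(V - Vs) <= np.linalg.norm(V - S) + 1e-10
```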
In these cases, a MAC policy without link scheduling should be preferred when maximizing the total throughput.
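The symmetric/skew-symmetric decomposition and the spectral-radius comparison above can be probed numerically. The sketch below (NumPy assumed; the gain matrix is made up, with a dominant diagonal mimicking dominant self-interference) projects V onto the symmetric matrices and checks ρ(V) ≤ ρ(Vs):

```python
import numpy as np

# Hypothetical nonnegative gain matrix with dominant diagonal
# (dominant self-interference); not taken from the book.
V = np.array([[0.8, 0.3, 0.1],
              [0.2, 0.9, 0.2],
              [0.1, 0.4, 0.7]])

Vs = (V + V.T) / 2        # orthogonal projection onto symmetric matrices
Vskew = (V - V.T) / 2     # V = Vs + Vskew, <Vs, Vskew> = trace(Vs^T Vskew) = 0
assert np.isclose(np.trace(Vs.T @ Vskew), 0.0)

# Is Vs positive semidefinite? (all eigenvalues of the symmetric part >= 0)
psd = bool(np.all(np.linalg.eigvalsh(Vs) >= -1e-12))

rho_V = max(abs(np.linalg.eigvals(V)))
rho_Vs = max(abs(np.linalg.eigvals(Vs)))
# Comparison rho(V) <= rho(Vs) discussed in the text (cf. [9]).
print(psd, bool(rho_V <= rho_Vs + 1e-12))
```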
5.5 QoS-based Power Control

The utility-based power control of Sect. 5.2.5 is a network-centric approach to the power control problem in wireless networks, where there is no guarantee for the links to achieve some target quality of service (QoS) expressed in terms of the signal-to-interference ratio. In fact, the objective is to allocate transmit powers to the links so that a certain real-valued measure of the network performance attains its maximum. The network performance measure can be interpreted as the aggregate utility perceived by the network operator. In contrast, QoS-based power control aims in the first instance at making sure that the signal-to-interference ratio on each link is above some link-specific threshold. The thresholds depend on the QoS requirements of the applications
whose traffic is carried across the network. For this reason, this kind of power control is a user-centric approach because the demands of the users (links) restrict the set of points at which the network is allowed to operate. Note that this set can be empty if the requirements are too strict and the channel quality too poor.

In this section, we deal with the most popular formulation of the QoS-based power control problem, where some given feasible QoS requirements need to be satisfied at a minimum total cost expressed in terms of a weighted sum of transmit powers. So, in the special case of equal weights, the total cost is just the overall power consumption in the network. The problem is analyzed within the framework of standard interference functions, which was introduced by [55] in the context of a QoS-based power control problem. The linear interference function Ik(p) = (Vp + z)k used in the previous sections (see (4.8)) is an example of a standard interference function. In the following subsection, we introduce some definitions and make some basic observations by assuming the linear interference function. The notion of standard interference functions is introduced in Sect. 5.5.2. At the end of Sect. 5.5.2, the reader will find a brief discussion of an alternative axiomatic framework for interference functions. Sect. 5.5.3 is devoted to QoS-based power control.

5.5.1 Some Definitions

Unless otherwise stated, it is assumed throughout Sect. 5.5 that

(C.5-10) z > 0 so that each entry of the noise variance vector z is positive.

For some results regarding the noiseless case z = 0, the reader is referred to Sect. 5.9.1. As in Sect. 5.3, we use Γ(ω) ∈ R^{K×K}_+ to denote

Γ(ω) := diag(γ(ω1), . . . , γ(ωK)) (5.63)

where γ : Q → R_{++} given by (5.50) is the inverse function of Ψ or ψ, depending on whether γ is strictly increasing or decreasing (see also Sect. 5.3 for further explanations and Remark 5.39).
For this section, it is sufficient to bear in mind that the value γk := γ(ωk) is the minimum SIR level which is necessary on link k to ensure the QoS requirement ωk ∈ Q. We refer to γ1, . . . , γK as the SIR targets and use γ = (γ1, . . . , γK). Since γ is a positive function, we have

(C.5-11) γk = γ(ωk) > 0 for each k ∈ K.

This is equivalent to saying that the diagonal matrix Γ(ω) is positive definite. Note that γk and γ(ωk) for some given ωk ∈ Q denote the same quantity; the latter notation is preferred whenever the dependence on ωk should be emphasized. Because the data rate is strictly increasing in the SIR (by (4.22)), the assumption (C.5-11) is equivalent to assuming that there is a certain rate requirement
on each link, which in turn entails some requirements on end-to-end rates. A power control problem in networks that support both best-effort links25 and links that have positive SIR targets (QoS links) is dealt with in Sect. 5.7.

Remark 5.39. The assumption that γ is the inverse function of the utility function Ψ or its negative version ψ is not necessary in the remainder of this book. We have introduced this definition in Sect. 5.3 to interpret the utility-based approach in the feasible QoS region and will adhere to it for simplicity. We must emphasize, though, that γ could be any continuously differentiable and strictly monotonic function mapping some open interval Q ⊆ R onto R_{++}, where Q is not necessarily the same set as in the definition of Ψ. With such function properties, we are able to choose any positive SIR target since, for every γk ∈ R_{++}, there exists exactly one ωk ∈ Q such that γk = γ(ωk). Finally, notice that, as in the case of the utility function Ψ (see Remark 5.8), we can choose different functions γk : Qk → R_{++}, k ∈ K, for different links, where Qk ⊆ R are some open sets.

In this and the next sections, we make use of the notion of valid and feasible power vectors defined as follows.

Definition 5.40 (Valid and Feasible Power Vectors). Let ω ∈ QK be given. Then, we say that p ∈ R^K_+ is a valid power vector if (and only if)

∀k∈K SIRk(p) ≥ γk = γ(ωk)  or, equivalently,  max_{k∈K} γ(ωk)/SIRk(p) ≤ 1 . (5.64)
Any valid power vector p ∈ R^K_+ such that p ∈ P is said to be feasible. The set of all valid power vectors P(ω) ⊂ R^K_+ and the set of all feasible power vectors

P◦(ω) := P(ω) ∩ P (5.65)
are called the valid power region and the feasible power region, respectively.

Remark 5.41. Note that, for a power vector to be valid, it is necessary and sufficient that the QoS requirements are satisfied. Thus, valid power vectors may be inadmissible as they may violate the power constraints. A valid power vector is said to be feasible if it is valid and admissible. Obviously, if there are no power constraints, every valid power vector is feasible (and vice versa). Similarly, if there are no QoS requirements, as in the previous sections, then every admissible power vector is feasible (and vice versa).

By Definition 5.40 and (4.4), the valid power region P(ω) (given a QoS vector ω ∈ QK) can be written as
25 If k is a best-effort link, then, by definition, γ(ωk) ≡ 0 (an extended-valued function), and thus γk = 0.
P(ω) := ∩_{k∈K} Pk(ω), where

Pk(ω) := {p ∈ R^K_+ : SIRk(p) ≥ γ(ωk)}
       = {p ∈ R^K_+ : u_k^T p ≤ −zk}
       = {p ∈ R^K_+ : ũ_k^T p̃ ≤ 0}, k ∈ K (5.66)
where uk = (vk,1, . . . , vk,k−1, −1/γk, vk,k+1, . . . , vk,K) ∈ R^K, ũk = (uk, zk) ∈ R^{K+1} and p̃ = (p, 1) ∈ R^{K+1}_+. Now, Definition 5.40 can be used to restate the notion of feasibility (see also Definition 5.22).

Definition 5.42 (Feasible QoS Vector and Feasible SIR Targets). We say that ω ∈ QK is feasible if (and only if) there exists a feasible power vector (in the sense of Definition 5.40). In such a case, γ1, . . . , γK are said to be feasible SIR targets.

Remark 5.43. We point out here that the notion of feasibility and the definition of P(ω) extend to the more general interference functions discussed in the following section. This means that Definition 5.40 maintains its validity except that Pk(ω) is not necessarily of the form (5.66).

The definition of P◦(ω) gives rise to the following necessary and sufficient condition for ω to be feasible.

Observation 5.44. ω ∈ QK is feasible if and only if P◦(ω) ≠ ∅. Moreover, P◦(ω) ≠ ∅ if and only if p(ω) ∈ P◦(ω) with p(ω) given by (5.54).

Proof. The first part follows directly from Definition 5.40. As for the second part, if p(ω) ∈ P◦(ω), then, obviously, P◦(ω) ≠ ∅. The converse is due to downward comprehensivity of P (Remark 4.13) and Observation 5.24, which states that p(ω) is the minimum point of P(ω) (Definition B.4).

It is important to notice that if (C.5-10) and (C.5-11) are satisfied, then p > 0 for every p ∈ P(ω), and therefore P(ω) ⊂ R^K_{++}. Figure 5.9 visualizes the definitions.

Observation 5.45. P(ω) ⊂ R^K_{++} and P◦(ω) ⊂ P(ω) are closed convex sets.
Proof. Since an empty set is convex by definition, we can assume that the sets are not empty. By (5.66), for any ω, Pk(ω) is the intersection of the nonnegative orthant R^K_+, which is a closed convex set, with a closed half-space. So, the observation follows since the intersection of closed convex sets is closed and convex.

Note that in general, P(ω) is not upward comprehensive (with respect to the partial ordering (A.1)), which can also be seen from Fig. 5.9. However, for any given ω̂ with P(ω̂) ≠ ∅, one has the following implication: If p = p(ω) ∈ P(ω̂), ω ≠ ω̂, with p(ω) = (I − Γ(ω)V)^{−1} Γ(ω)z (see also (5.54)), then
Fig. 5.9: The feasible power region for two links subject to individual power constraints P = {p ∈ R^2_+ : p1 ≤ P1, p2 ≤ P2}. The vector p(ω) defined by (5.54) is the minimum point (element) of P(ω).
γ(ω̂k) ≤ γ(ωk), k ∈ K,  and  ∃l∈K γ(ω̂l) < γ(ωl) . (5.67)
This follows from strict monotonicity of γ and Lemma 5.24, which states that p(ω̂) is the minimum point of P(ω̂), and thus it is the unique power vector that meets all the SIR requirements γ(ω̂k), k ∈ K, with equality. As a consequence of (5.67) and the fact that the spectral radius of a nonnegative matrix is monotonically increasing in the matrix entries, p(ω) ∈ P(ω̂) ≠ ∅ implies that

ρ(Γ(ω̂)V) ≤ ρ(Γ(ω)V) < 1 (5.68)
where the last inequality is a necessary and sufficient condition for the existence of the positive vector p(ω) as discussed in Sect. 5.3.

Sometimes, in the analysis of power control algorithms, it is useful to know whether or not P(ω) and P◦(ω) have nonempty interiors (with respect to R^K). First, it may be deduced from (5.68) together with the one-to-one correspondence between R^K_{++} and Fγ defined by (5.55) (see Lemma 2.11 and Observation 5.28) that if P(ω) ≠ ∅ for some given ω ∈ QK, then int(P(ω)) ≠ ∅. In contrast to that, int(P◦(ω)) ⊂ R^K_+ may be an empty set even if P◦(ω) ≠ ∅. For instance, we have such a situation if p(ω) happens to be a maximal point of P (Definition B.4), in which case P◦(ω) = {p(ω)} (a singleton set). This follows immediately from the fact that p(ω) > 0 is the (unique) minimum point of P(ω) (Observation 5.24), and thus it is also the minimum point of P◦(ω) as 0 ∈ P. So, we have p(ω) ≤ p for all p ∈ P◦(ω) with at least one strict inequality. But if p(ω) is additionally a maximal point of P, then p(ω) ≤ p for some p ∈ P◦(ω) implies p = p(ω), so that we must have P◦(ω) = {p(ω)}. However, notice that in general, the interior of P◦(ω) can be empty even if P◦(ω) is not a singleton set. On the other hand, if p(ω) ∈ int(P), then the minimum property of p(ω) and downward comprehensivity of P can be used (together with continuity of p(ω) in each component) to show that
int(P◦(ω)) ≠ ∅. The following observation shows that if V is irreducible, which is the most common case in wireless networks, we have either P◦(ω) = {p(ω)} or int(P◦(ω)) ≠ ∅, whenever P(ω) ≠ ∅.

Observation 5.46. If P(ω) ≠ ∅ for some given ω ∈ QK and V is irreducible, then either P◦(ω) = {p(ω)} (singleton set) or int(P◦(ω)) ≠ ∅, with the interior taken with respect to R^K.

Proof. As γ(ωk) > 0, we have P◦(ω) ⊂ R^K_{++}, and thus we can focus on P+ = P ∩ R^K_{++}. As aforementioned, if p(ω) ∈ int(P) = int(P+), then int(P◦(ω)) ≠ ∅. So let us assume that p̃ = p(ω) is a boundary point of P+. Since P+ is convex, Theorem B.5 shows that there is w ≥ 0, w ≠ 0, such that w^T p̃ ≥ w^T p for all p ∈ P+. Hence, for any p ∈ P+, there is at least one index k ∈ K such that p̃k ≥ pk. On the other hand, p̃ is the (unique) minimum point of P(ω) so that p̃ ≤ p for all p ∈ P(ω). By irreducibility of V, however, we can see from (5.67), Theorem A.16 and Lemma A.28 that, in fact, we have p̃ < p for any p ∈ P(ω), p ≠ p̃. From this and p̃k ≥ pk for some k, we conclude that P◦(ω) = {p̃} = {p(ω)}.

We complete this subsection by stating an alternative necessary and sufficient condition for the feasibility of a QoS vector. The main reason why we are going to formulate an additional condition is that it naturally extends to the case of more general interference functions considered in the next section. Note first that due to (C.5-10) and (C.5-11), we have min_{k∈K}(SIRk(p)/γk) > 0 for any p ∈ P+. Thus,

C(ω) := inf_{p∈P} C(ω, p) = inf_{p∈P+} C(ω, p) < +∞ (5.69)

where

C(ω, p) := max_{k∈K} γ(ωk)/SIRk(p), p > 0. (5.70)
Because the fulfillment of the K inequalities SIRk(p) ≥ γ(ωk), k ∈ K, is equivalent to satisfying C(ω, p) ≤ 1, it is evident that a necessary condition for some QoS vector ω to be feasible is C(ω) ≤ 1. To see the sufficiency of this condition, we first state the following simple observation.

Observation 5.47. Let ω ∈ QK be given. If (C.5-10) and (C.5-11) hold, then R^K_+ → R_+ : p → min_{k∈K}(SIRk(p)/γ(ωk)) is a continuous function.

The observation follows since a composition of continuous functions is continuous, the pointwise minimum R^K → R : x → min{x1, . . . , xK} is continuous and, by Observation 4.9, R^K_+ → R^K_+ : p → (SIR1(p), . . . , SIRK(p)) is continuous as well. It implies that 1/C(ω, p) = min_{k∈K}(SIRk(p)/γ(ωk)) ≥ 0 is continuous on the compact set P, and thus it attains its maximum over P. Moreover, we have max_{p∈P}(1/C(ω, p)) > 0 (due to (C.5-10) and (C.5-11)). So, the infimum in (5.69) is attained for some p ∈ P, p > 0. This shows the sufficiency of C(ω) ≤ 1 for ω to be feasible, thereby bringing us to the following observation.
Observation 5.48. ω is a feasible QoS vector if and only if

C(ω) ≤ 1 . (5.71)
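For a concrete feel of Observation 5.48, the following sketch (toy two-link numbers, not from the book; NumPy assumed) approximates C(ω) by sampling the admissible power region P and tests condition (5.71):

```python
import numpy as np

# Hypothetical two-link example: V = gain matrix (no self-interference),
# z = noise, gamma = SIR targets, P = box of admissible powers.
V = np.array([[0.0, 0.2],
              [0.3, 0.0]])
z = np.array([0.1, 0.1])
gamma = np.array([1.5, 1.0])
P_max = np.array([2.0, 2.0])

def C(p):
    """C(omega, p) = max_k gamma_k / SIR_k(p) with SIR_k(p) = p_k / (Vp + z)_k."""
    sir = p / (V @ p + z)
    return np.max(gamma / sir)

# Approximate C(omega) = inf_{p in P, p > 0} C(omega, p) by grid search over P.
grid = np.linspace(1e-3, 1.0, 200)
C_omega = min(C(np.array([a, b]) * P_max) for a in grid for b in grid)

# Observation 5.48: the QoS vector is feasible iff C(omega) <= 1.
print("C(omega) approx", round(C_omega, 3), "-> feasible:", C_omega <= 1)
```

Any power vector (approximately) attaining the infimum is a max-min SIR-balanced vector in the sense of the remark that follows.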
Remark 5.49. We refer to any power vector for which the infimum in (5.69) is attained as a max-min SIR-balanced power vector. The problem of characterizing such power vectors is addressed in Sect. 5.6.

Comparing the condition of Observation 5.48 with ρ(Γ(ω)V) < 1, which, as discussed in Sect. 5.3, is a necessary and sufficient condition for a QoS vector ω to be feasible in a network without power constraints, suggests that there is a close connection between C(ω) and the spectral radius.

Observation 5.50. Let C̃(ω) = inf_{p∈R^K_{++}} C(ω, p). Then, C̃(ω) = ρ(Γ(ω)V).
Proof. Let ω be arbitrary. Define V(t) := V + t·11^T and λ_p^(t)(ω) := ρ(Γ(ω)V(t)) + t > 0 for all t > 0. Let p(t) = (λ_p^(t)(ω)I − Γ(ω)V(t))^{−1} Γ(ω)z, t > 0. Due to λ_p^(t)(ω) > ρ(Γ(ω)V(t)), t > 0, (C.5-10) and (C.5-11), Theorem A.51 implies that p(t) exists and is positive for every t > 0. Now, as λ_p^(t)(ω) → ρ(Γ(ω)V) when t → 0 (by Theorem A.8) and C(ω) ≤ C(ω, p(t)) for all t > 0, one obtains

C(ω) ≤ lim inf_{t→0} C(ω, p(t)) ≤ lim inf_{t→0} (λ_p^(t)(ω) − t min_{k∈K} γ(ωk)) = ρ(Γ(ω)V) .

On the other hand, however, we have

C(ω) ≥ inf_{p∈R^K_{++}} max_{k∈K} (Γ(ω)Vp)k/pk = ρ(Γ(ω)V)

where the identity follows from Theorem A.47. This completes the proof.

5.5.2 Axiomatic Interference Functions

For most of this book, it is assumed that the interference at the kth receiver output is a linear combination of transmit powers of other links plus noise:

SIRk = pk/Ik(p), k ∈ K (5.72)

with the interference function Ik : R^K_+ → R_{++} being an affine function:

Ik(p) = (Vp + z)k = Σ_{l∈K} vk,l pl + zk . (5.73)
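For the affine model just introduced, both the feasibility condition ρ(Γ(ω)V) < 1 and Observation 5.50 can be checked numerically. A minimal sketch with made-up three-link gains (NumPy assumed):

```python
import numpy as np

# Hypothetical 3-link gains: zero diagonal (no self-interference), noise z > 0.
V = np.array([[0.0, 0.1, 0.2],
              [0.2, 0.0, 0.1],
              [0.1, 0.2, 0.0]])
z = np.array([0.05, 0.05, 0.05])
gamma = np.array([2.0, 1.0, 1.5])      # SIR targets
G = np.diag(gamma)

# rho(Gamma V) < 1 is necessary and sufficient for a positive solution of
# p = Gamma (Vp + z), i.e., for the targets to be met with equality.
rho = max(abs(np.linalg.eigvals(G @ V)))
assert rho < 1

# Minimum valid power vector (cf. (5.54)): p(omega) = (I - Gamma V)^{-1} Gamma z.
p_min = np.linalg.solve(np.eye(3) - G @ V, G @ z)
sir = p_min / (V @ p_min + z)
print("SIR targets met with equality:", np.allclose(sir, gamma))   # -> True

# Observation 5.50: inf_{p>0} max_k (Gamma V p)_k / p_k = rho(Gamma V);
# the infimum is attained at the Perron eigenvector of Gamma V.
w, X = np.linalg.eig(G @ V)
q = np.abs(X[:, np.argmax(w.real)].real)
print(np.isclose(np.max(G @ V @ q / q), rho))                      # -> True
```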
As explained in Sect. 4.3.1, the coefficients vk,l ≥ 0 with vk,k = 0 (due to the assumption of no self-interference) are determined by the transceiver structure, the wireless fading channel, etc. Recall from Sect. 4.3.1 that
vk,l = Vk,l/Vk ≥ 0, where Vk > 0 is called the signal power gain and Vk,l denotes the interference power gain. All these quantities are fixed, that is, independent of the allocated transmit powers. This is for instance the case when each link employs the matched-filter receiver or the linear SIC receiver, both of which are discussed in Sect. 4.3.2. However, one can go one step further and assume that the interference depends on some adaptive receive strategy. Given an arbitrary power allocation p ≥ 0, a fairly general model of the (effective) interference power at the output of an adaptive receiver k ∈ K is

ϕk(p, uk) = Σ_{l∈K} vk,l(uk) pl + zk(uk) (5.74)
where uk := uk(p) ∈ Uk may depend on p and is used to represent the receive strategy taken from a given compact set Uk of all possible receive strategies for link k ∈ K. We see that, in addition to the power gains vk,l, l ≠ k, the receive strategy uk also influences the effective noise power zk = σk²/Vk because Vk depends on uk. It is important to emphasize that any receive strategy uk is assumed to have an impact only on the interference of link k, which is a reasonable assumption in practice. Furthermore, we assume that

(C.5-12) vk,l(uk) and zk(uk) are continuous functions on Uk for each l ∈ K and each k ∈ K.

Consequently, Uk → R_{++} : u → ϕk(p, u), k ∈ K, is continuous for any p ≥ 0. Note that in general, we do not require differentiability with respect to u. An example of such a strategy can be found in Sect. 4.3.2, where, under the assumption of perfect synchronization (see Sect. 4.3), we introduced optimal receivers in the sense of maximizing each SIR for a given power vector. An optimal receiver of Sect. 4.3.2 is a vector in C^W, W ≥ 1, and can be normalized to be of unit norm, in which case Uk is the unit sphere in C^W for each k ∈ K. The resulting interference function is nonlinear and depends on the power vector. Precisely, if p is a vector of transmit powers, then the SIR at the output of the (normalized) optimal receiver derived in Sect. 4.3.2 is given by (4.16). Hence, in this case, the interference power at the receiver output is equal to

Ik(p) = (b_k^H Z_k^{−1} b_k)^{−1}, k ∈ K (5.75)
where bk ∈ C^W, W ≥ 1, is the effective transmit vector of transmitter k associated with receiver k (see Sect. 4.3.2 for explanations) and the matrix Zk defined by (4.14) depends on the power vector p. Since Zk is positive definite, the interference function above is positive for any choice of the effective transmit vector bk. It is also worth pointing out that (5.75) is only a function of the power allocation as the receivers are fixed whenever p is given. This stands in contrast to (5.74).
The function in (5.75) is a highly nonlinear function of the transmit powers, and the resulting power gains as well as the noise variances are influenced by the power vector. Because the corresponding receiver is optimal in the sense of maximizing the SIR over the set of all nonzero receivers (vectors) in C^W, the interference power Ik(p) with Ik of the form (5.75) is the minimum interference power under the power allocation p. As this is true for an arbitrary p ≥ 0, (5.75) falls into the following class of interference functions, the so-called minimum interference functions:

Ik(p) = min_{u∈Uk} ϕk(p, u), k ∈ K . (5.76)
It is important to emphasize that the definition (5.76) is more general than (5.75) since there is no presumption of perfect synchronization in (5.76). Moreover, the interference function in (5.76), which results from the minimization of (5.74) over an arbitrary compact set Uk, is not necessarily differentiable. The minimizer in (5.76) exists but does not need to be unique.

Instead of focusing on a particular interference model, the interference is often characterized by an axiomatic framework that captures basic properties common to a wide range of interference functions [55, 136] (and references in [136]). Such generic models contain the above examples (5.73), (5.75) and (5.76) as special cases. The following definition introduces the notion of a standard interference function [55], which is the most common model in the literature. Later in this section, we present another but closely related definition of interference functions.

Definition 5.51 (Standard Interference Function). For any fixed ω, we say that the (vector-valued) interference function I : R^K_+ → R^K given by26

I(p) := I(p, ω) := (γ(ω1)I1(p), . . . , γ(ωK)IK(p)) (5.77)

is standard if each of the following holds.

A1 I(p) > 0 for all p ≥ 0 (positivity).
A2 I(μp) < μI(p) for any p ≥ 0 and for all μ > 1 (scalability).
A3 I(p) ≥ I(p′) if p ≥ p′ ≥ 0 (monotonicity).

Ik : R^K_+ → R, k ∈ K, is said to be standard if it satisfies A1–A3.

Remark 5.52. Notice that, as Γ(ω) is positive definite by assumption (C.5-11), I : R^K_+ → R^K is a standard interference function if and only if each Ik is a standard interference function. Moreover, notice that A2 implies λI(p) < I(λp) for all λ ∈ (0, 1) and all p ≥ 0. It must be emphasized that A1 (positivity) and A2 (scalability) on R^K_+ are due to positivity of the noise vector z. So, for noiseless channels, the interference functions are not standard, but they are captured by the axiomatic framework presented at the end of this section.
26 In order to avoid confusion with the identity matrix, we violate our notational convention by using a non-boldface letter to denote the vector-valued function I.
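Axioms A1–A3 can be spot-checked numerically at random points. The sketch below (NumPy assumed; all gain and noise values are made up) tests the affine function (5.73) and a two-branch minimum interference function of the form (5.76):

```python
import numpy as np

rng = np.random.default_rng(0)
V = np.array([[0.0, 0.3], [0.2, 0.0]])
z = np.array([0.1, 0.2])                     # z > 0, as required by (C.5-10)

def I_affine(p):
    """Affine interference function (5.73): I(p) = Vp + z."""
    return V @ p + z

def check_standard(I, trials=1000, K=2):
    """Spot-check axioms A1-A3 of Definition 5.51 at random points."""
    for _ in range(trials):
        p = rng.uniform(0.0, 10.0, K)
        q = p + rng.uniform(0.0, 10.0, K)    # q >= p >= 0
        mu = 1.0 + rng.uniform(0.1, 10.0)    # mu > 1
        assert np.all(I(p) > 0)              # A1 positivity
        assert np.all(I(mu * p) < mu * I(p)) # A2 scalability
        assert np.all(I(q) >= I(p))          # A3 monotonicity
    return True

print(check_standard(I_affine))              # -> True

# A minimum interference function (5.76) built from two affine branches
# (two hypothetical receive strategies) is also standard:
V2 = np.array([[0.0, 0.1], [0.4, 0.0]])
z2 = np.array([0.3, 0.1])
I_min = lambda p: np.minimum(I_affine(p), V2 @ p + z2)
print(check_standard(I_min))                 # -> True
```

Such random tests cannot prove the axioms, but they are a cheap sanity check for any candidate interference model.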
It may be easily verified that both the affine interference function (5.73) and the minimum interference function (5.75) fulfill axioms A1–A3, and therefore they are standard interference functions. Another example of a standard interference function is the maximum interference function:

Ik(p) = max_{ξ∈X} ϱk(p, ξ) (5.78)
where R^K_+ → R_+ : p → ϱk(p, ξ) is a standard interference function for any ξ ∈ X. The value ϱk(p, ξ) is equal to the interference power given a power allocation p under some interference uncertainty ξ from some (suitable) compact set X. The interference uncertainty means here that the interference power may continuously vary depending on the choice of ξ ∈ X. Thus, interference functions of the form (5.78) can be used, for instance, to model the worst-case interference under imperfect channel knowledge [137]. The idea of the worst-case design can be extended to adaptive receivers by letting ϱk in (5.78) additionally depend on the receiver of link k. Precisely, let R^K_+ → R_+ : p → ϱk(p, u, ξ) be a standard interference function associated with link k under uncertainty ξ ∈ X when its receiver is u ∈ Uk; the dependence on (u, ξ) ∈ Uk × X is continuous. Now, the worst-case design with adaptive receivers minimizing the interference power leads to the following interference functions:

Îk(p) = min_{u∈Uk} max_{ξ∈X} ϱk(p, u, ξ)    Ǐk(p) = max_{ξ∈X} min_{u∈Uk} ϱk(p, u, ξ) (5.79)
where all the maxima and minima exist. Due to the assumptions on ϱk, both Îk and Ǐk are standard interference functions. In general, however, we have Ǐk(p) ≤ Îk(p) for any p ≥ 0, with equality if the corresponding maximizer and minimizer form a saddle point of Uk × X → R_+ : (u, ξ) → ϱk(p, u, ξ).

As shown below, axioms A1–A3 are sufficient to conclude continuity of standard interference functions [136], which is used in the next subsection to guarantee the convergence of certain QoS-based power control algorithms. Furthermore, assuming a standard interference function in the definition of the SIR (5.72), we can state a necessary and sufficient condition for the existence of a valid power allocation in the sense of Definition 5.40. The following theorem is a slight extension of the result presented in [136] to nonnegative power vectors.

Theorem 5.53. Let I : R^K_+ → R^K_{++} be a standard interference function. Then, I is component-wise continuous, that is, an arbitrary sequence {p(n)}n∈N0 with a limit p̃ = lim_{n→∞} p(n) fulfills
lim_{n→∞} I(p(n)) = I(p̃) . (5.80)

Proof. The reader can find the proof in Sect. 5.10.
As mentioned above, Theorem 5.53 can be used to state a necessary and sufficient condition for feasibility of some given SIR targets γ1, . . . , γK (according to Definition 5.42) or, equivalently, the existence of a feasible power vector (Definition 5.40). To this end, given ω ∈ QK, we define

CI(ω) := inf_{p∈P} CI(ω, p)    and    CI(ω, p) := max_{k∈K} γ(ωk)Ik(p)/pk (5.81)
and show that CI(ω), ω ∈ QK, is the desired measure of feasibility [136].27 Indeed, if ω is feasible, then, by Definition 5.42, there is p ∈ P so that CI(ω) ≤ CI(ω, p) ≤ 1, where the first inequality is due to the infimum in (5.81). Conversely, if CI(ω) ≤ 1, then ω is feasible since the infimum in (5.81) is attained on P (provided that P is compact, which is true by assumption). This is due to Theorem 5.53 and (C.5-11), which imply that 1/CI(ω, p) = min_{k∈K}(pk/(γ(ωk)Ik(p))) is continuous on P, and thus attains its supremum (due to Theorem B.11), and the supremum is positive. The consequence is that CI(ω, p) attains its infimum on P. We summarize these observations in a theorem.

Theorem 5.54. A QoS vector ω ∈ QK is feasible if and only if

CI(ω) ≤ 1 . (5.82)

Moreover, there exists a valid power vector if and only if

C̃I(ω) := inf_{p∈R^K_{++}} CI(ω, p) < 1 . (5.83)
Note that by Definition 5.40, the existence of a valid power vector implies that ω would be feasible if there were no power constraints. By the definition of the infimum as the greatest lower bound (see App. B.1), it should be obvious that (5.83) is sufficient for the existence of a valid power vector. On the other hand, by Definition 5.40, C̃I(ω) ≤ 1 is a necessary condition for a valid power vector to exist, but C̃I(ω) = 1 is not sufficient because the infimum in (5.83) is not attained. Indeed, due to axiom A2, we see that CI(ω, λp) is a strictly decreasing function of λ > 0 for any p > 0. The consequence is that C̃I(ω) can only be approached asymptotically as λ → ∞ for some appropriately chosen p ∈ R^K_{++} (see also [136, Theorem 4.2]).

It seems that besides continuity, no other properties that are relevant for our applications can be derived from axioms A1–A3 of Definition 5.51. However, if we consider sub-classes of this axiomatic model, then it may be possible to prove some additional properties for each sub-class. The following is an immediate consequence of the previous definitions.

Observation 5.55. Let Ik : R^K_+ → R, k ∈ K, be a standard interference function. Then, we have:
27 Note that CI(ω) is a straightforward extension of C(ω), given by (5.69), to any standard interference function.
(i) Ik is concave (log-concave) if Ik is given by (5.76) and p → ϕk(p, uk) is concave (log-concave) for every uk ∈ Uk.
(ii) Ik is convex (log-convex) if Ik is given by (5.78) and p → ϱk(p, ξ) is convex (log-convex) for every ξ ∈ X.

Proof. The observation follows from the well-known fact that the minimum (maximum) operator preserves the concavity and log-concavity (convexity and log-convexity) property. This can be shown by proceeding along similar lines as in [16, p. 80]. Note that for any fixed uk ∈ Uk, the interference ϕk given by (5.74) is affine in p, and thus it is concave and convex on R^K_+ simultaneously.

An Alternative Axiomatic Framework

In [136] (see also references therein), another axiomatic framework for interference modeling was introduced:

Definition 5.56. Let K̃ ≥ 1 be a given natural number and let k ∈ K be arbitrary. We say that Jk : R^K̃_+ → R is an interference function (of link k) if the following axioms are fulfilled:

A1′ Jk(p) > 0 if p > 0 (conditional positivity)
A2′ Jk(μp) = μJk(p) for all μ > 0 (scale invariance)
A3′ Jk(p) ≥ Jk(p′) if p ≥ p′ (monotonicity)

It is emphasized that p in the definition is of dimension K̃, which may be different from K (the number of links) (see [136]). This will become clear from the following discussion. Consider a simple example of the linear interference function

Jk(p) = (Vp)k . (5.84)
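The difference between scale invariance and scalability is easy to see numerically; a tiny sketch (hypothetical gains, NumPy assumed) for the noiseless function (5.84):

```python
import numpy as np

# Hypothetical noiseless gain matrix (zero diagonal, no noise term).
V = np.array([[0.0, 0.4],
              [0.5, 0.0]])

def J(p):
    """Noiseless linear interference function (5.84): J(p) = Vp."""
    return V @ p

p = np.array([1.0, 2.0])
mu = 3.0
# A2' (scale invariance) holds with equality ...
print(np.allclose(J(mu * p), mu * J(p)))        # -> True
# ... so the strict scalability A2 of Definition 5.51 fails:
print(bool(np.all(J(mu * p) < mu * J(p))))      # -> False
```

With a positive noise term z, the inequality J(μp) < μJ(p) would become strict for μ > 1, which is exactly why the affine function (5.73) is standard while (5.84) is not.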
Unlike the affine interference function in (5.73), the function (5.84) contains no noise so that it is linear in p and scale-invariant (A2′) but not scalable as required by axiom A2 in Definition 5.51. Thus, the function (5.84) is an interference function in the sense of Definition 5.56, but it is not a standard interference function. Note that such a noiseless model is appropriate for an interference-limited network, in which the background noise can be neglected on every link. As explained in Sect. 4.3.3 (see also Sect. 5.9), the power constraints in such networks have no impact on the network performance, and thus can be dropped without influencing the generality of the analysis. Despite the scale invariance property, the axiomatic framework for interference functions (according to Definition 5.56) is not confined to noiseless models. Indeed, defining an extended power vector

p̃ = (p, pK+1) = (p1, . . . , pK, 1) (5.85)
and an extended gain matrix Ṽ = (V, z) ∈ R^{K×K̃}_+, K̃ = K + 1, the linear function (5.84) becomes

Jk(p̃) = (Ṽp̃)k = (Vp + z)k (5.86)

which is equal to (5.73). Obviously, Jk : R^{K+1}_+ → R satisfies axioms A1′–A3′, and thus is an interference function in the sense of Definition 5.56. This simple example shows that certain standard interference functions can be modeled by the above alternative axiomatic framework. Another example can be found in [136]. In a forthcoming publication [138], it will be shown that in fact every standard interference function can be modeled within the axiomatic framework of Definition 5.56.

The framework of standard interference functions (Definition 5.51) is a useful theory that provides a basis for many iterative algorithms in the literature (see Sect. 5.5.3). However, there are some cases where the general framework of Definition 5.56 turns out to be more suitable.

(a) The scale invariance property A2′ is useful for the analysis of interference-limited networks, as discussed above. Studying a system without noise can provide deeper analytical insight because certain problems become easier to handle. The reader is for instance referred to Sect. 5.9.
(b) The framework of Definition 5.56 has been successfully used for the analysis of feasible QoS regions (Sect. 5.3). It was shown in recent works (see [138] and references therein) that many feasible QoS regions can be expressed as sublevel sets or superlevel sets of interference functions. There is a one-to-one correspondence between certain properties of interference functions and properties of QoS regions. Some important properties include comprehensiveness, convexity, and log-convexity. There are applications in resource allocation and axiomatic game theory.
(c) Many utility optimization problems can be studied within the axiomatic framework A1′–A3′. For example, consider the function

Ũ(w) = max_{u∈U} Σ_{k∈K} wk uk (5.87)
where w ∈ ΠK is a weight vector and the utility vector u = (u1, . . . , uK) is chosen from some compact set U ⊂ R^K_+. For instance, uk ≥ 0 could stand for a link rate and wk for a queue backlog, as explained in Sect. 5.2.4. Now it may be easily verified that Ũ(w) fulfills the axioms A1′–A3′, and thus it is an interference function in the sense of Definition 5.56.

These examples show that the axiomatic framework of Definition 5.56 offers some interesting analytical possibilities. Most of the results can be transferred to standard interference functions, as shown in [138].

5.5.3 QoS-Based Power Control Algorithms

As already explained at the beginning of Sect. 5.5, QoS-based power control is a user-centric approach, in which the QoS requirements (represented by
the QoS vector ω and expressed in terms of some SIR targets) need to be satisfied permanently. In other words, the objective is to find p ∈ P so that (5.64) is fulfilled, which is equivalent to finding a feasible power allocation. For simplicity, in this section, it is assumed that

(C.5-13) there are no power constraints,

implying that the feasible power region and the valid power region coincide (see Definition 5.40). The case of no power constraints is concisely captured by assuming that P = R^K_+, which violates the compactness of P assumed in Sect. 4.3.3. In contrast to utility-based power control problems, the neglect of power constraints is possible in the case of QoS-based power control and does not impact the generality of the presentation. Indeed, the reader will easily realize that if a point of attraction of any power control algorithm presented in this section is valid but infeasible due to the violation of some power constraints, then no feasible power vector exists and there is no way to simultaneously satisfy the given SIR targets and all power constraints. On the other hand, it must be pointed out that if the SIR targets are not feasible, then it may be impossible to implement the algorithms due to strict limitations on transmit powers in practical systems. A straightforward remedy for this problem is to project power updates onto the admissible power region P. We briefly address this issue in the context of the fixed-point algorithm (page 186), which is the only algorithm presented here that works for any standard interference function.

In the case of affine interference functions, the valid power region is denoted by P(ω) and is given by (5.66). Let PI(ω) be the valid power region for any given standard interference function. From (5.72) and (5.64), it follows that

PI(ω) = {p ∈ R^K_+ : ∀k∈K pk ≥ γk Ik(p)} . (5.88)

By Theorem 5.54 and Definition 5.40, PI(ω) ≠ ∅ if and only if C̃I(ω) < 1. Moreover, by (C.5-13), PI(ω) ⊂ R^K_+ is the feasible power region.
Finding a valid power allocation is a relatively easy task, but the problem is that not every valid power allocation provides the desired network performance. For instance, due to the limited battery life of wireless devices and constraints on the interference caused to other systems, an overall goal may be to satisfy given QoS requirements with links transmitting at powers as low as possible. A common approach is to minimize the total transmit power, which is equal to the sum of all transmit powers [63, 136]. The following result shows that, under the assumption of a standard interference function, this is equivalent to finding the minimum point of P_I(ω), which is also referred to as the minimum valid power vector (allocation). The following theorem is a minor extension of [55, Theorem 1] (it exploits Brouwer's theorem to conclude the existence of a fixed point).

Theorem 5.57. For any ω ∈ Q^K with P_I(ω) ≠ ∅, let I : R_+^K → R_{++}^K be a standard interference function defined by (5.77) such that SIR_k(p) = p_k/I_k(p), k ∈ K. Then, P_I(ω) has the minimum point (see Definition B.4)
5 Resource Allocation Problem in Communications Networks
and p* ∈ P_I(ω) is the minimum point of P_I(ω) if and only if p* = I(p*), which is equivalent to ∀k∈K SIR_k(p*) = γ_k.

Proof. Let a > 0 be any vector such that a ≥ I(a). Such a vector exists since P_I(ω) ≠ ∅, and its positivity follows from axiom A1 of Definition 5.51. Let T = {p ∈ R_+^K : p ≤ a} and note that T with T ∩ P_I(ω) ≠ ∅ is a convex and compact subset of R^K. Moreover, considering axiom A3 of Definition 5.51 shows that, for any p ∈ T, we have a ≥ I(a) ≥ I(p), which implies that the image of T under I is a subset of T (I maps T into T). Thus, since I is continuous (by Theorem 5.53), it follows from Brouwer's theorem [139, p. 26] that I has a fixed point p* = I(p*) ∈ T. By (5.88), p* ∈ P_I(ω). Now, proceeding essentially as in [55, Theorem 1], we prove by contradiction that p* is the minimum point of P_I(ω) (which is always unique). So, assume that there is p ∈ P_I(ω), p ≠ p*, with p_k < p*_k for some k ∈ K. Then, since p > 0 due to p ∈ P_I(ω), we can always find a real number μ > 1 and an index k0 ∈ K such that μp ≥ p* and μp_{k0} = p*_{k0}. Hence, by A2 and A3, one obtains

p*_{k0} = γ_{k0} I_{k0}(p*) ≤ γ_{k0} I_{k0}(μp) < μ γ_{k0} I_{k0}(p) ≤ μ p_{k0}

where the last inequality holds since p ∈ P_I(ω). This contradicts μ p_{k0} = p*_{k0}, and thus proves the theorem.

As P_I(ω) has the minimum point, it follows from Theorem B.5 that, in this point, the function R_+^K → R_+ : p ↦ w^T p attains its minimum over P_I(ω) for every fixed w > 0. Therefore, the problem of finding the minimum valid power allocation, which is the minimum point of P_I(ω), can be stated as follows.

Corollary 5.58. Let P_I(ω) ≠ ∅. Then, p* ∈ P_I(ω) is the minimum valid power allocation if and only if

p* = arg min_{p ∈ P_I(ω)} w^T p   for all w > 0 .    (5.89)
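For illustration, in the affine case the minimum point of P_I(ω) is the fixed point p* = Γ(ω)(Vp* + z), i.e., p* = (I − Γ(ω)V)^{−1}Γ(ω)z (cf. (5.54)). A small sketch with illustrative numbers, checking the statement of Corollary 5.58 against one other valid vector:

```python
import numpy as np

# Illustrative numbers; Gamma = I for simplicity.
V = np.array([[0.0, 0.1, 0.1],
              [0.1, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
z = np.array([0.01, 0.01, 0.01])
G = np.diag([1.0, 1.0, 1.0])            # Gamma(omega)

# Minimum valid power vector: fixed point of p -> Gamma(V p + z).
p_star = np.linalg.solve(np.eye(3) - G @ V, G @ z)
print(p_star)                            # approx [0.0125 0.0125 0.0125]

# Any other valid vector is component-wise larger, so p_star minimizes
# w^T p over P_I(omega) for every w > 0 (Corollary 5.58):
p = p_star + 0.001
assert np.all(p >= G @ (V @ p + z))      # p is also in P_I(omega)
w = np.array([1.0, 2.0, 3.0])
assert w @ p_star < w @ p
```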
Note that p* is the unique minimizer, regardless of the choice of w > 0. In particular, if we choose w = 1, then the corollary shows that, if it exists, the minimum valid power allocation minimizes the total sum of transmit powers ‖p‖_1 subject to the SIR targets. The problem (5.89) is easily solved in a centralized manner if I_k is an affine interference function of the form (5.73). As shown in Sect. 5.3 (Lemma 5.24 on page 152), the vector p(ω) given by (5.54) is the minimum point of P_I(ω) = P(ω) ≠ ∅. Recall that, by Theorem A.51, p(ω) > 0 is unique and exists if and only if C_I(ω) = ρ(Γ(ω)V) < 1 (see also Observation 5.50). For a general standard interference function, such an explicit characterization of the minimum point of P_I(ω) does not seem possible, and algorithms are usually applied to search for the minimum point iteratively. Also, if the interference function is twice continuously differentiable, as in the case of the affine interference function (5.73) and the minimum interference function
5.5 QoS-based Power Control
(5.75), then powerful Newton-based methods or primal-dual methods can be used to find the optimal power vector in the sense of (5.89). In the following, we present and briefly discuss algorithmic approaches to the problem (5.89). For a more elaborate discussion, the reader is referred to the literature [54, 55, 140, 141] and [136, Chap. 5]. We also remark that (5.89) can be solved by the generalized Lagrangian approach presented in Sect. 6.8, but for a slightly different class of interference functions (specified in Sect. 6.8).

Fixed-Point Algorithm

By Theorem 5.57, p is the minimum point of P_I(ω) if and only if p = I(p). In other words, p is the sought minimum valid power vector p* given by (5.89) if and only if it is the fixed point of the vector-valued interference function I given by (5.77). Note that, as a consequence of this, if P_I(ω) is not an empty set, which is assumed in all that follows, then, by Theorem 5.57 and the fact that the minimum point (if it exists) is always unique (see Definition B.4 and the remark thereafter), the fixed point exists and is unique. This suggests using a fixed-point power control algorithm to find the optimal solution to (5.89). Such a power control algorithm was proposed in [54] for affine interference functions. Later, [55] showed that the fixed-point iteration converges to the fixed point or, equivalently, the minimum point (if existent) for any interference function satisfying axioms A1–A3 of Definition 5.51. The algorithm of [55] is referred to as the standard power control algorithm.

Algorithm 5.1 Standard Power Control Algorithm
Input: n = −1, p(0) ≥ 0, ε > 0, ω ∈ Q^K such that P_I(ω) ≠ ∅.
Output: p* ∈ P_I(ω)
1: repeat
2:   n = n + 1
3:   p(n + 1) = I(p(n)) = (γ(ω_1) I_1(p(n)), . . . , γ(ω_K) I_K(p(n)))
4: until ‖p(n) − p(n + 1)‖_1 < ε
5: p* = p(n + 1)
The reader should bear in mind that the vector-valued interference function I(p), p ≥ 0, depends on the SIR targets (see (5.77)). Also, it is important to notice that the initial power allocation p(0) can be any nonnegative vector, even the zero vector 0 ∈ R^K. The basic intuition behind the algorithm is probably made most clear by writing the iteration for link k ∈ K as follows:

p_k(n + 1) = (γ(ω_k) / SIR_k(p(n))) p_k(n),   n ∈ N_0    (5.90)
where SIR_k(p) is defined by (5.72) with a standard interference function in the denominator. Thus, in every iteration, each link scales its current power iterate down or up depending on whether its SIR target is exceeded or missed, with no change if and only if the target is met with equality. The larger the discrepancy between the actual SIR and the corresponding SIR target, the larger the deviation of the scaling factor from unity. It is important to emphasize that the fixed-point algorithm may have one or more additional loops contained in the main loop when each iteration involves an additional optimization problem. This is usually the case when the respective interference functions are defined as a minimum or maximum of some intricate problem so that they are not known explicitly. For instance, the minimum interference function defined by (5.76) is, in most practical cases, not known explicitly due to the lack of a closed-form solution to the minimization problem in (5.76). For this reason, it is common practice to numerically compute the value of such interference functions in each iteration using some optimization or search method. This leads to an alternating structure of the algorithm, where the power updates and the computation of the interference function for a given power allocation are performed in an alternating fashion. So, step 3 of Algorithm 5.1 in fact consists of the following two sub-steps:

3a: Determine the value of I_k(p(n)) for each k ∈ K.
3b: p(n + 1) = I(p(n))

The sub-step 3a may involve a combinatorial (discrete) search method if, for every p ≥ 0, the interference functions I_k(p), k ∈ K, take a finite number of values. This may occur if, for instance, the optimization variable u in (5.76) is confined to some discrete set U_k. Here, one can think of receive beamforming vectors that, due to certain hardware constraints, must be chosen from a finite set of feasible beamforming vectors.
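For the affine special case I_k(p) = (Vp + z)_k, where sub-step 3a is trivial, Algorithm 5.1 can be sketched as follows (illustrative numbers; a minimal sketch, not the book's reference implementation):

```python
import numpy as np

# Sketch of Algorithm 5.1 for the affine special case
# I_k(p) = (V p + z)_k (illustrative numbers, Gamma = I).
V = np.array([[0.0, 0.1, 0.1],
              [0.1, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
z = np.array([0.01, 0.01, 0.01])
gamma = np.array([1.0, 1.0, 1.0])

def standard_power_control(p0, eps=1e-10, max_iter=10_000):
    """Fixed-point iteration p(n+1) = I(p(n)) with l1 stopping rule."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        p_next = gamma * (V @ p + z)        # step 3 of Algorithm 5.1
        if np.abs(p - p_next).sum() < eps:  # step 4
            return p_next
        p = p_next
    raise RuntimeError("no convergence -- SIR targets may be infeasible")

p_star = standard_power_control(np.zeros(3))  # p(0) = 0 is allowed
# At the fixed point, every SIR target is met with equality.
sir = p_star / (V @ p_star + z)
print(np.round(sir, 6))     # approx [1. 1. 1.]
```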
In contrast, if the interference functions are continuously differentiable on some connected compact set, then a standard optimization method can be applied to perform sub-step 3a (see also the remarks on Newton-based algorithms at the end of this section). The iteration p_k(n + 1) = γ_k I_k(p(n)) can be implemented in a distributed manner provided that the value of I_k(p(n)) can be determined from local measurements and without excessive coordination between the nodes. Typically, each receiver estimates or, in general, determines I_k(p(n)) during a training phase and/or the data transmission phase, and then feeds the estimate back to the corresponding transmitter over a low-rate control channel. Based on this knowledge, each transmitter can compute the (effective) interference power at the desired receiver, from which a new update of the power vector is obtained. So, the (n + 1)th power vector is obtained recursively from the initial power allocation as

p(n + 1) = I^{n+1}(p(0)) := I(I(· · · I(p(0)) · · · )) .    (5.91)
It is important to underline that perfect synchronism at the symbol level is not necessary for the algorithm to work in practice. All one needs is
a coarse synchronization between the links to ensure that each receiver can obtain a sufficiently good, non-outdated estimate of the interference power caused by the other links. The required degree of synchronization depends on the time period between two subsequent power updates. However, even if a sufficiently good synchronization cannot be established, there is an asynchronous version of the algorithm in [55], for which convergence to the minimum valid power vector is shown under some additional conditions.

Convergence of the Fixed-Point Algorithm

Now we address the problem of convergence. The following theorem, which was proven in [55], shows global convergence to the minimum valid power vector.

Theorem 5.59. If C_I(ω) < 1 and I : R_+^K → R_{++}^K is a standard interference function, then the sequence of power vectors {p(n)}_{n∈N_0} generated by Algorithm 5.1 converges to the minimum valid power vector, which is the unique minimizer of (5.89). Moreover,
(i) the sequence is component-wise strictly increasing if p(0) ≤ p for all p ∈ P_I(ω), and
(ii) the sequence is component-wise strictly decreasing if p(0) ∈ P_I(ω).

Proof. The proof is deferred to Sect. 5.10. See also [55].

A remarkable fact about this result is that global convergence is ensured for any standard interference function, provided that there is a solution to the problem (5.89). If there is no solution, in which case one has C_I(ω) ≥ 1 or, equivalently, P_I(ω) = ∅, then the sequence generated by Algorithm 5.1 diverges in the sense that p_k(n) → ∞ for each k ∈ K, which can be deduced from (5.90). In this case, a given standard interference function I : R_+^K → R_{++}^K has no fixed point since, for every p ≥ 0, we have p_k < I_k(p) for at least one k ∈ K. In practice, however, transmit powers cannot diverge to infinity due to the constraints on transmit powers. In other words, practical algorithms inherently perform some form of projection onto the feasible power region. References [55] and [142] incorporated power constraints into fixed-point algorithms and established their convergence. In the case of individual power constraints on each link, each transmitter, say transmitter k, sends at the maximum transmit power P_k > 0 whenever the power update p_k(n + 1) given by (5.90) exceeds the value P_k.

The convergence rate of Algorithm 5.1 is strongly influenced by the value of C_I(ω), which is an indicator of the effective system load²⁸ and must be strictly smaller than 1 for the algorithm to converge. However, the closer C_I(ω) is to 1, the slower the convergence [136] (see App. B.4.1 for common definitions of convergence rate). The main disadvantage of the fixed-point algorithm is the relatively strong dependence of the convergence rate on the effective system load. So, in congested wireless networks, in which either the number of links or their QoS requirements are relatively high, the algorithm is expected to converge slowly to the minimum valid power allocation.

²⁸ The effective system load indicates how large the overall interference level is in relation to the SIR targets.

Remarks on Power Constraints

As we neglected the power constraints (Assumption (C.5-13)), the fixed-point algorithm will violate them if the minimum valid power vector is not admissible. An obvious idea is to map the power updates p(n + 1), n ∈ N_0, of Algorithm 5.1 into the set of admissible power vectors P. In this context, the following observation is useful [55]: For any fixed p̄ ∈ P, p̄ = (p̄_1, . . . , p̄_K) > 0, the function Ī = (Ī_1, . . . , Ī_K) : R_+^K → R_{++}^K with

Ī_k(p) := Ī_k(p, ω) := min{p̄_k, γ(ω_k) I_k(p)},   k ∈ K    (5.92)

is standard whenever I_k is a standard interference function. Of course, this is also true if, for each k ∈ K,

Ī_k(p) := Ī_k(p, ω) := min{p̄*_k, γ(ω_k) I_k(p)},   p̄* = arg min_{q∈P} ‖q − I(p)‖_2    (5.93)
with I(p) given by (5.77). By Theorem B.45, p̄* > 0 in (5.93) is the projection of I(p) onto P, which exists and is unique as P is compact.²⁹ The standard property of Ī can be easily shown by verifying that both (5.92) and (5.93) satisfy axioms A1–A3 of Definition 5.51. Furthermore, notice that, for any p ∉ P, p ≥ 0, there is an index k ∈ K such that p_k > p̄_k. Thus, we must have C_Ī(ω) = inf_{p>0} max_{k∈K} (γ(ω_k) Ī_k(p)/p_k) < 1. As an immediate consequence of these observations and Theorem 5.59, we can conclude that the iteration

p(n + 1) = Ī(p(n)),   n ∈ N_0    (5.94)

converges to the unique fixed point p* ∈ P of Ī given by p* = Ī(p*), regardless of the choice of ω ∈ Q^K. Moreover, we have p(n) ∈ P for every n ∈ N_0. It is important to notice that the function Ī_k is not an interference function because the value Ī_k(p) is not necessarily equal to the interference power at the output of the kth receiver. The consequence is that p ≥ Ī(p) does not need to imply SIR_k(p) ≥ γ(ω_k), as p̄_k < γ(ω_k) I_k(p) may hold. But p ≥ Ī(p) does imply SIR_k(p) ≥ γ(ω_k) if (and only if) p̄ ≥ I(p), in which case the fixed point p* = Ī(p*) is the minimum valid power vector.
²⁹ Positivity of p̄* immediately follows from the fact that I(p) is positive for all p ≥ 0 (axiom A1).
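A quick numerical sketch of the clipped iteration (5.92)/(5.94) for the affine case. The numbers are illustrative; the cap on link 1 is deliberately set below the unconstrained fixed point, so that link 1 ends up power-limited and misses its SIR target while the other links still meet theirs:

```python
import numpy as np

# Clipped update (5.92)/(5.94), affine case, illustrative numbers.
V = np.array([[0.0, 0.1, 0.1],
              [0.1, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
z = np.array([0.01, 0.01, 0.01])
gamma = np.array([1.0, 1.0, 1.0])
pbar = np.array([0.012, 0.02, 0.02])     # admissible region: 0 <= p <= pbar

def I_bar(p):
    """(5.92): clip each power update at the individual limit pbar_k."""
    return np.minimum(pbar, gamma * (V @ p + z))

p = np.zeros(3)
for _ in range(200):                      # iteration (5.94)
    p = I_bar(p)

print(p <= pbar)       # all True: every iterate stays admissible
sir = p / (V @ p + z)
print(sir)             # link 1 below target (SIR_1 < 1), links 2 and 3 on target
```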
Remarks on Stochastic Fixed-Point Algorithms

A further and no less important aspect is the question of how to cope with the noisy measurements of the signal-to-interference ratios that are made available to the corresponding transmitters. In practice, there are at least two main sources of noise: the estimation noise due to erroneous SIR estimation at the receivers, and the quantization noise due to the limited capacity of the feedback channels. Now, if the observed SIRs at the transmitters are not sufficiently good estimates of the real SIRs, then the algorithm must be modified and the modified algorithm must be analyzed within the framework of stochastic approximation [143]. Here, we confine our attention to one possible modification of the fixed-point algorithm and discuss the basic idea behind stochastic approximation. For more details, the interested reader is referred to [143], which is the standard book on stochastic approximation. We also refer to Sect. 6.6.5, where a gradient-projection algorithm is analyzed under the assumption of a noisy gradient estimate.

Let Ĩ(p(n)) denote a noisy version of I(p(n)) provided to the transmitters at time n, so that {Ĩ(p(n)) − I(p(n))}_{n∈N_0} is the corresponding estimation noise. From a practical point of view, it is reasonable to make assumptions on the sequence of random variables {Ĩ(p(n))}_{n∈N_0}, such as boundedness for every time n or even some form of independence between different random variables (see [143] and Sect. 6.6.5). These assumptions are, however, irrelevant for explaining the basic idea, which is our main objective here. An intuitive approach
to the problem of noisy measurements is to utilize, in place of the instantaneous measurements Ĩ(p(n)), the averages (1/(n + 1)) Σ_{j=0}^{n} Ĩ(p(j)). More precisely, instead of the fixed-point algorithm, it may be meaningful to consider a stochastic version of this algorithm:

p(n + 1) = (1/(n + 1)) Σ_{j=0}^{n} Ĩ(p(j)),   n ∈ N_0 .    (5.95)
Note that if the random variables Ĩ(p(n)), n ∈ N_0, were mutually independent and identically distributed with a finite variance, then p(n + 1) would be the linear least squares estimate of their unknown mean value given {Ĩ(p(j))}_{j=0}^{n}. Now, Reference [144] realized that considering all previous observations as in (5.95) to compute new updates is unnecessary and inefficient, especially when one is not interested in the intermediate observations. In such cases, a more efficient algorithm is obtained if (5.95) is written in the following recursive form:

p(n + 1) = (n/(n + 1)) p(n) + (1/(n + 1)) Ĩ(p(n)) = p(n) − δ(n) (p(n) − Ĩ(p(n)))

where δ(n) = 1/(n + 1) and n ∈ N_0. From this, we learn that an appropriate use of a sequence of decreasing step sizes {δ(n)}_{n∈N_0} with δ(n) = 1/(n + 1) has the same effect as a direct averaging of all observations [144]. Yet the
sequence {δ(n)}_{n∈N_0}, δ(n) = 1/(n + 1), is only one possibility, and the choice of the step size sequence is a central problem in the design of recursive stochastic algorithms [143]. It is usually required that {δ(n)}_{n∈N_0} with δ(n) > 0, n ∈ N_0, is a non-increasing sequence with lim_{n→∞} δ(n) = 0 and Σ_{n=0}^{∞} δ(n) = +∞, where the last condition ensures that the step sizes do not decrease too fast, so as to exploit the effect of averaging. Now let us assume that {δ(n)}_{n∈N_0} is some appropriately chosen step size sequence. Then, the preceding discussion suggests stochastic fixed-point algorithms for the problem (5.89) of the form

p(n + 1) = J_{δ(n)}(p(n)),   p(0) ≥ 0    (5.96)

where

J_δ(p) := (1 − δ) p + δ Ĩ(p) = p − δ (p − Ĩ(p)) .    (5.97)
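The following minimal sketch illustrates how the decreasing steps δ(n) = 1/(n + 1) average out the measurement noise. The noise model (bounded, unbiased multiplicative errors) and all numbers are illustrative assumptions, not taken from the book:

```python
import numpy as np

# Stochastic update (5.96)-(5.97) with delta(n) = 1/(n+1).
rng = np.random.default_rng(0)
V = np.array([[0.0, 0.1, 0.1],
              [0.1, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
z = np.array([0.01, 0.01, 0.01])
gamma = np.array([1.0, 1.0, 1.0])

def I_noisy(p):
    """Noisy observation of I(p) = Gamma(V p + z) (hypothetical noise model)."""
    return gamma * (V @ p + z) * (1.0 + 0.1 * rng.uniform(-1, 1, 3))

p = np.full(3, 0.05)                           # arbitrary start p(0) >= 0
for n in range(5000):
    delta = 1.0 / (n + 1)                      # decreasing step sizes
    p = (1 - delta) * p + delta * I_noisy(p)   # J_delta(p), cf. (5.97)

# The decreasing steps average the noise out; p approaches the noiseless
# fixed point (I - Gamma V)^{-1} Gamma z = (0.0125, 0.0125, 0.0125).
print(p)
```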
Algorithms of the form (5.96) are called stochastic QoS-based power control algorithms. Such algorithms were already mentioned in [55] and analyzed in [140] and [141] under some assumptions on the noisy observations Ĩ(p(n)), n ∈ N_0. In [141], the notions of standard and quasi-standard stochastic interference functions were introduced, with the goal of designing so-called standard and quasi-standard stochastic power control algorithms.

Centralized Matrix-Based Algorithm

The fixed-point algorithm converges to the minimum valid power vector for any standard interference function. In particular, this is true for the minimum interference function of the form (5.75). This interference function is of great interest in practice since it models the interference power in networks that employ optimal linear receivers in the sense of maximizing the signal-to-interference ratio at each receiver output (see also Sect. 4.3.2). The matrix-based structure (5.76) of the minimum interference function can be exploited to design an iteration that offers a significantly improved rate of convergence compared with the fixed-point algorithm. The following alternating iteration was proposed in [136], where the improvement of the convergence rate is achieved at the expense of suitability for distributed implementation.

Remark 5.60. In the description of the matrix-based algorithm (Algorithm 5.2), the set U_k is assumed to be the unit sphere in C^W for some W ≥ 1. Under this assumption and the assumption of perfect synchronization, Sect. 4.3.2 derives the optimal receivers in the sense of step 3 of Algorithm 5.2.

Algorithm 5.2 Matrix-Based Algorithm
Input: n = −1, ω ∈ Q^K such that P_I(ω) ≠ ∅, p(0) ∈ P_I(ω), ε > 0.
Output: p* ∈ P_I(ω)
1: repeat
2:   n = n + 1
3:   u_k(n) = arg min_{u∈U_k} ( Σ_{l∈K} v_{k,l}(u) p_l(n) + z_k(u) ),   k ∈ K
4:   p(n + 1) = (I − Γ(ω)V(u(n)))^{−1} Γ(ω)z(u(n)),   u(n) = (u_1(n), . . . , u_K(n))
5: until ‖p(n) − p(n + 1)‖_1 < ε
6: p* = p(n + 1)

It is shown in [136] that the sequence of power vectors generated by Algorithm 5.2 converges to the minimum valid power vector p* defined by (5.89). It further turns out that, in each step, the above algorithm has a better convergence behavior than the fixed-point algorithm in the following sense: if both algorithms start with the same p(0) ∈ P_I(ω), then, in each step, the power vector computed by the matrix-based algorithm is component-wise smaller than or equal to the vector generated by the fixed-point iteration. Another advantage is that, after each power allocation step (step 4), the SIR targets, which are feasible, are met exactly, so that SIR_k(p) = γ(ω_k) for each k ∈ K. This is, however, no longer true after step 3. Using methods of [145], it can be shown that Algorithm 5.2 exhibits superlinear convergence [146] (see also Sect. B.4.1 for some basic definitions of convergence rate). Numerical experiments show very fast convergence to the optimal power vector, with the convergence behavior being, to a large extent, independent of the effective system load, which is represented by C_I(ω) < 1. The reader is referred to [136] for further discussion.

In comparison with the fixed-point iteration, however, Algorithm 5.2 has two serious disadvantages. First of all, the algorithm is clearly not amenable to distributed implementation. The main problem is the computation of the power update in step 4 of Algorithm 5.2, which requires knowledge of the current gain matrix, the noise vector and the SIR targets of all links. Here, note that the gain matrix depends on all transmit and receive strategies as well as on the state of the wireless channels. The second main disadvantage is that the starting point of the algorithm must be a valid power vector. Summarizing, we can say that the algorithm can be of interest only for centralized networks, where some pre-selected node (network controller or base station) has all the global information needed to compute the power update and to send the corresponding transmit powers to the other nodes.
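The alternating structure of Algorithm 5.2 can be sketched for a toy finite receiver set U_k = {0, 1}, where strategy 0 has higher cross gains but lower noise and strategy 1 the opposite. All gains, noise values and the receiver set are illustrative assumptions, not the book's beamforming model:

```python
import numpy as np

gamma = np.array([1.0, 1.0, 1.0])
G = np.diag(gamma)
cross = {0: 0.1, 1: 0.05}      # v_{k,l}(u) for l != k (hypothetical)
noise = {0: 0.01, 1: 0.012}    # z_k(u) (hypothetical)

def build(u):
    """Gain matrix V(u) and noise vector z(u) for receiver choices u."""
    K = len(u)
    Vu = np.zeros((K, K))
    zu = np.zeros(K)
    for k, uk in enumerate(u):
        Vu[k, :] = cross[uk]
        Vu[k, k] = 0.0
        zu[k] = noise[uk]
    return Vu, zu

p = np.full(3, 0.05)           # p(0) must be a valid power vector
for _ in range(20):
    # step 3: each link picks the receiver minimizing its interference
    u = [min((0, 1), key=lambda s: cross[s] * (p.sum() - p[k]) + noise[s])
         for k in range(3)]
    Vu, zu = build(u)
    # step 4: meet the SIR targets exactly for the chosen receivers
    p = np.linalg.solve(np.eye(3) - G @ Vu, G @ zu)

print(p)   # settles at the minimum valid power vector of the minimum function
```

After each step 4 the targets are met with equality for the current receivers; the subsequent receiver re-optimization can only reduce the interference, which drives the powers monotonically toward the optimum.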
Newton-based Algorithm

If the interference functions I_k : R_+^K → R_{++}, k ∈ K, are standard and continuously differentiable, Newton-like methods can be considered as an alternative centralized approach for computing the minimum valid power vector [147]. A crucial precondition for implementing a Newton-like iteration is the knowledge of all partial derivatives of the interference functions. In other words, the Jacobian matrix of the vector-valued interference function
I : R_+^K → R_{++}^K given by (5.77), or a sufficiently good estimate of it, is presumed to be known at a network controller. As before, let P_I(ω) ≠ ∅, and let p* be the unique fixed point of I such that p* = I(p*), which is the sought minimum valid power vector. Suppose that we are going to find the fixed point by searching for an equilibrium of the following discrete system

p(n + 1) = p(n) − (A(p(n)))^{−1} E(p(n)),   n ∈ N_0    (5.98)

where E : R^K → R^K is the error function defined to be

E(p) = p − I(p),   E(p*) = 0    (5.99)

and A(p(n)) ∈ R^{K×K}, n ∈ N_0, is a suitable matrix such that (A(p(n)))^{−1} exists for every n ∈ N_0. In what follows, we assume that

(C.5-14) the Jacobian ∇E(p) = (∂E_i(p)/∂p_j) exists and its entries are continuous.

Furthermore, it is assumed that ∇E(p*) is nonsingular so that (∇E(p*))^{−1} exists. This condition is, for instance, satisfied by (5.75), which is the interference function under the optimal receiver in a perfectly synchronized network. Now, the standard (or pure) Newton iteration is obtained from (5.98) by letting A(p(n)) = ∇E(p(n)), n ∈ N_0. Standard results (see for instance [146, pp. 90–91]) show that if p(0) ∈ B_r(p*) for a sufficiently small r > 0 (see Definition B.1), then the standard Newton algorithm converges to p* superlinearly. Moreover, if E is Lipschitz continuous on B_r(p*) and ‖(∇E(p))^{−1}‖_2 is bounded above on B_r(p*) by some constant, then we have quadratic quotient convergence to p* for some sufficiently small r > 0. The definition of quadratic quotient convergence is provided in App. B.4.1.

Although, by Theorem 5.57, p* is the unique nonnegative power vector satisfying E(p) = 0, global convergence of the standard Newton algorithm cannot be guaranteed in general, even if (C.5-14) holds. The main problem is that ∇E(p) may be indefinite and/or singular for some p ≥ 0. Consequently, (∇E(p(n)))^{−1} E(p(n)) may not be a desired direction, or the algorithm may break down completely due to the singularity. Further problems may appear when E(p) is not Lipschitz continuous on R^K and ‖(∇E(p))^{−1}‖_2 is not bounded above by some constant. In such cases, the algorithm can even violate the nonnegativity constraint of the power vector. For these reasons, modifications of the standard Newton iteration are usually needed, and they depend on the properties of the interference function at hand. It is in particular desired that A(p(n)) in (5.98) is a positive definite matrix and suitably bounded so that ρ(A(p(n))) < 1, n ∈ N_0.
This suggests the following modifications of the standard Newton iteration [146]:

1) Choose a diagonal positive definite matrix D(n) such that A(p(n)) = ∇E(p(n)) + D(n) is positive definite.
2) Choose the damping factors δ(n), n ∈ N_0, to be sufficiently small to ensure monotonicity.
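As a sanity check of the iteration (5.98), consider the affine case, where E(p) = p − Γ(Vp + z) and the Jacobian ∇E = I − ΓV is constant, so a single undamped Newton step already solves E(p) = 0. A minimal sketch with illustrative numbers:

```python
import numpy as np

V = np.array([[0.0, 0.1, 0.1],
              [0.1, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
z = np.array([0.01, 0.01, 0.01])
G = np.diag([1.0, 1.0, 1.0])

def E(p):
    """Error function (5.99) for the affine interference function."""
    return p - G @ (V @ p + z)

J = np.eye(3) - G @ V          # Jacobian of E (constant in the affine case)

p = np.zeros(3)
for _ in range(3):             # pure Newton steps with A(p(n)) = grad E
    p = p - np.linalg.solve(J, E(p))

print(np.abs(E(p)).max())      # essentially zero: the fixed point is reached
```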
Finally, we point out that if I_k : R_+^K → R_{++} is the minimum interference function defined by (5.76), then the function E : R^K → R^K is order-convex on its domain, in which case it is also referred to as being convex [148, Definition 13.3.1]. Thus, in this special case, it follows from [148, pp. 461–462] that the Newton algorithm (5.98) converges to p*, provided that (A(p))^{−1} and E(p) are suitably bounded on R^K (see the conditions in [148, Theorem 13.4.7]).
5.6 Max-Min SIR Balancing Power Control

This section is devoted to the problem of characterizing a max-min SIR-balanced power vector (allocation) defined as follows.

Definition 5.61 (Max-Min SIR-balanced power allocation). Let γ : Q → R_{++} be given and let ω ∈ Q^K be any QoS vector chosen such that (C.5-11) holds. Then, p̄(ω) is said to be max-min SIR-balanced if

p̄(ω) := arg max_{p∈P} min_{k∈K} (SIR_k(p) / γ(ω_k)) = arg max_{p∈P} min_{k∈K} (p_k / (γ(ω_k) I_k(p))) .    (5.100)
The function γ and the QoS vector ω are defined in Sect. 5.3. The reader is also referred to Sect. 5.5.1 and, especially, Remark 5.39. Even though I_k : R_+^K → R_{++} in (5.100) can be any standard interference function (Definition 5.51) or even an interference function in the sense of Definition 5.56 (see also Sect. 5.9), we confine our attention throughout this section to affine interference functions. So, we assume that

(C.5-15) I_k(p) = (Vp + z)_k = Σ_l v_{k,l} p_l + z_k,   p ≥ 0, k ∈ K.

The assumptions (C.5-10) and (C.5-11) continue to hold, so that we have z_k > 0 and γ_k := γ(ω_k) > 0, k ∈ K. Note that the latter assumption is equivalent to saying that the diagonal matrix Γ = diag(γ_1, . . . , γ_K) is positive definite. As in the previous section, γ_k and γ(ω_k) are used interchangeably if ω_k is given (with the latter notation used if we wish to emphasize the dependence on ω_k). The same holds for Γ and Γ(ω). It is important to notice that, due to (C.5-15) and Theorem B.11, the maximum in (5.100) exists, as min_{k∈K} (SIR_k(p)/γ_k) is continuous on the compact set P. Thus, one crucial difference to the previous section is that the value γ_k in the max-min SIR problem formulation is not necessarily met under optimal power control. This is simply because a max-min SIR power vector exists regardless of whether the SIR targets are feasible or not in the sense of the previous section. For this reason, γ_k can also be interpreted as a desired SIR value of link k. The goal of max-min SIR balancing power control is then to maximize the worst-case performance with respect to the ratio of the actual SIR value to the desired one. Notice that the question of which link is the "worst" one depends on the allocated transmit powers. Definition 5.61 is a straightforward extension of the following (widely used) definition to the case of weighted signal-to-interference ratios.
192
5 Resource Allocation Problem in Communications Networks
Definition 5.62 (Max-Min SIR power allocation). If γ := γ_1 = · · · = γ_K > 0, then p̄(ω) defined by (5.100) is called a max-min SIR power allocation.

By the definition, p̄(ω) given by (5.100) is a max-min SIR power vector if Γ = cI for some c > 0. The word "balanced" is used to emphasize the weighting in (5.100). However, if there is no risk of misunderstanding, we may drop the word "balanced" for brevity and refer to p̄(ω) as a max-min SIR power vector (allocation).

The max-min SIR-balanced approach is a widely studied strategy for allocating transmit powers in wireless networks [64, 109]. By Observation 5.48 (see also the discussion preceding Observation 5.48 and Remark 5.49), a key feature of this strategy is that any QoS vector ω is feasible in the sense of Definition 5.42 if and only if this QoS vector is met under a max-min SIR-balanced power allocation. Moreover, as pointed out in Sect. 5.2.6, the special case of max-min SIR power control (as specified in Definition 5.62) leads to max-min fairness in the sense of Definition 5.16, provided that the gain matrix fulfills some conditions.

Remark 5.63. Notice that in this book, max-min fairness always corresponds to the max-min fair rate allocation, which is achieved under the max-min fair power allocation considered in Sect. 5.2.6. Regarding the notion of max-min fairness in communications networks, we also refer the reader to the discussion in Sect. 5.1.1. Sect. 5.2.6 deals with the trade-off between throughput and fairness performance in wireless networks.

In the noiseless case with Γ = cI, c > 0, it can be seen from Sect. 5.9 that if V ≥ 0 is an irreducible matrix, then p̄(ω) is a positive right eigenvector of V, which is unique up to a scaling factor. We will see in Sect. 5.6.2 that a similar result holds in noisy channels subject to a sum power constraint.
Thus, the max-min SIR-balanced power control problem reduces in both cases to finding the unique positive right eigenvector of some nonnegative irreducible matrix. It seems that such an elegant characterization of p̄(ω) is not possible for general power constraints, in which case we can only provide a semi-analytical solution that involves a search for the maximum spectral radius in a set of N spectral radii corresponding to individual power constraints on different nodes. The case of general power constraints is considered in Sect. 5.6.3.

5.6.1 Some Preliminary Observations

First of all, we point out that p̄(ω) of Definition 5.61 is positive. This is simply because int(P) ≠ ∅ and, due to (C.5-10) and (C.5-11), we have min_{k∈K} (SIR_k(p)/γ_k) = 0 if and only if p ≥ 0 is not a positive power vector (see also Lemmas 5.65 and 5.70). It is further important to notice that p̄(ω) is not necessarily unique. For instance, for two mutually orthogonal links (V = 0) subject to individual power constraints P_1 > 0 and P_2 > 0 with
P_1 < P_2, γ(ω_1) = γ(ω_2) and z_1 = z_2, any pair (p_1, p_2) such that p_1 = P_1 and p_2 ∈ [P_1, P_2] solves the problem (5.100). This is illustrated in the left picture of Fig. 5.10. It is interesting to point out that in this toy example, (P_1, P_2) is the max-min fair power allocation (5.33) corresponding to the max-min fair rate allocation (Definition 5.16). Obviously, this vector also solves the max-min SIR balancing problem (5.100). As the example shows, however, the converse does not hold in general, so that a solution to the max-min SIR balancing problem is not necessarily the max-min fair power allocation. The reason for the lack of uniqueness in this example is that the links are completely decoupled. One possible coupling is
through common power constraints, as in the case of a sum power constraint Σ_{k∈K} pk ≤ Pt, where p̄(ω) can be shown to be unique for any V ≥ 0 (see also Sect. 5.6.2). For general power constraints, the uniqueness is ensured if V is irreducible, since then the links are mutually dependent through the interference. In order to see this, it is convenient to reformulate the max-min SIR balancing problem. This alternative problem formulation can also be used to find p̄(ω) numerically for general power constraints (Algorithm 5.4 in Sect. 5.6.3).

A simple but important observation is that the problem (5.100) is equivalent to finding the largest positive threshold t such that t ≤ SIRk(p)/γ(ωk) for all k ∈ K and some p ∈ P. The constraints can be equivalently written in matrix form as Γ(ω)z ≤ ((1/t)I − Γ(ω)V)p and p ∈ P. So, as p must be a positive vector (due to (C.5-11)), Theorem A.51 implies that the threshold t must satisfy

    ρ(Γ(ω)V) < 1/t .    (5.101)

Now, one particular solution to the max-min SIR balancing problem (5.100) is the vector p̄′(ω) given by

    p̄′(ω) = ((1/t)I − Γ(ω)V)^{-1} Γ(ω)z    (5.102)

where

    t = arg max_{t≥0} t  subject to  ((1/t)I − Γ(ω)V)p = Γ(ω)z,  p ∈ P .    (5.103)

Note that p̄′ = p̄′(ω) is a max-min SIR-balanced power vector such that SIRk(p̄′)/γk = SIRl(p̄′)/γl for each k, l ∈ K. This immediately follows from (5.102) when it is written as a system of K SIR equations. By (5.101), (5.102), (5.103) and Theorem A.51, p̄′(ω) > 0 exists and is the unique power vector corresponding to a point in the feasible SIR region that is farthest from the origin in the direction of the unit vector γ/‖γ‖1. This is illustrated in Fig. 5.10. The interpretation gives rise to a simple algorithm that is presented at the end of Sect. 5.6.3 (Algorithm 5.4). From Sect. 5.4.3, we know that the feasible SIR region is not a convex set in general, which can also be seen from Fig. 5.10.
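The computation in (5.102)-(5.103) is easy to sketch numerically: search for the largest t below 1/ρ(Γ(ω)V) for which the solved power vector stays inside the power region. The following sketch uses bisection and assumes individual per-link power constraints p ≤ p_max; the bisection itself, the constraint set, and all numbers are illustrative assumptions, not taken from the text.

```python
import numpy as np

def max_min_sir_power(Gamma, V, z, p_max, tol=1e-10):
    """Sketch of (5.102)-(5.103): find the largest threshold t with
    rho(Gamma V) < 1/t such that p(t) = ((1/t)I - Gamma V)^{-1} Gamma z
    stays inside the power region (here: 0 < p <= p_max)."""
    K = len(z)
    GV = Gamma @ V
    t_hi = 1.0 / max(np.max(np.abs(np.linalg.eigvals(GV))), 1e-16)  # 1/rho(Gamma V)
    p_of = lambda t: np.linalg.solve(np.eye(K) / t - GV, Gamma @ z)
    lo, hi = 0.0, t_hi
    for _ in range(200):               # p(t) grows with t, so bisect on feasibility
        mid = 0.5 * (lo + hi)
        p = p_of(mid)
        if np.all(p > 0) and np.all(p <= p_max):
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return lo, p_of(lo)

# toy example (illustrative numbers, not from the text)
Gamma = np.diag([1.0, 1.0])
V = np.array([[0.0, 0.2],
              [0.3, 0.0]])
z = np.array([0.1, 0.1])
t, p = max_min_sir_power(Gamma, V, z, p_max=np.array([1.0, 1.0]))
sir = p / (V @ p + z)   # balanced: SIR_k / gamma_k = t for every k
print(t, p, sir)
```

At the returned t, one power constraint is active and all ratios SIRk/γk coincide with t, which is the geometric picture described above: the SIR vector lies on the boundary of the feasible SIR region along the direction of γ.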
Fig. 5.10: The feasible SIR region (γ(x) = x, x > 0) under individual power constraints Pi and two different gain matrices V ≥ 0. The following notation is used: γ̄k = SIRk(p̄(ω)) and γ̄k′ = SIRk(p̄′(ω)), where p̄(ω) and p̄′(ω) are defined by (5.100) and (5.102), respectively. Left: V is chosen so that SIR2(p) = p2/z1, in which case p̄(ω) is not unique. Right: V is irreducible, in which case p̄(ω) is unique and equal to (5.102).

The nonconvexity property might be undesired since then the power vector given by (5.102) does not necessarily maximize a weighted sum of SIRs over the feasible SIR region. On the positive side, however, p̄(ω) can also be characterized by considering any continuous strictly increasing function of min_{k∈K} SIRk(p)/γk. As pointed out at the end of Sect. 5.3 (page 160; see also Sects. 1.2.4, 1.7 and 5.9.1), we have φ(min_{k∈K} SIRk(p)/γk) = min_{k∈K} φ(SIRk(p)/γk) for any strictly increasing function φ : R+ → R. If φ is strictly decreasing, we need to replace the min-operator by the max-operator. Thus, for any continuous function φ : R → R, one has

    p̄(ω) = arg max_{p∈P} min_{k∈K} φ(SIRk(p)/γ(ωk))   if φ is strictly increasing
    p̄(ω) = arg min_{p∈P} max_{k∈K} φ(SIRk(p)/γ(ωk))   if φ is strictly decreasing    (5.104)

where p̄(ω) > 0 is a max-min SIR-balanced power vector defined by (5.100). With (5.104) in hand, we can use the results of Sect. 5.3 to prove a sufficient condition for the uniqueness of p̄(ω) under general power constraints. To this end, given some continuous bijection φ : R++ → Q ⊆ R with the inverse function denoted by φ^{-1} : Q → R++, we define

    F = {ω ∈ Q^K : ∃ p ∈ P ∀ k ∈ K : φ^{-1}(ωk) ≤ SIRk(p)/γ(ωk)} ⊂ R^K .    (5.105)

According to Sect. 5.3, the set F can be interpreted as a feasible QoS region when the QoS value for link k ∈ K is defined to be φ(SIRk(p)/γ(ωk)). By Theorem 5.32, F ⊂ R^K is a convex set if φ^{-1} is log-convex. Moreover, the QoS vector induced by p̄′(ω) defined by (5.102) corresponds to a point on the boundary of F where all links have the same QoS values (as defined above). And finally, as in the case of the feasible QoS region, there is a one-to-one correspondence between F and P+ (see Observation 5.28). Now we use these facts in the proof of the following observation.
Observation 5.64. Suppose that z and Γ(ω) with (C.5-10)-(C.5-11) are given. If V ≥ 0 is irreducible, then p̄(ω) given by (5.100) is unique and equal to p̄′(ω) defined by (5.102).

Proof. Let p̄ = p̄(ω) ∈ P be any solution to (5.100) and let p̄′ = p̄′(ω) ∈ P be defined by (5.102). Since p̄ > 0 and p̄′ > 0, we can assume φ : R++ → Q ⊆ R in (5.104) to be any strictly increasing function such that its inverse function is log-convex. Suppose that F is given by (5.105) and note that, by Theorem 5.32, F is a convex downward comprehensive set. Let ω̄k = φ(SIRk(p̄)/γk), k ∈ K, and let ω̄k′ = φ(SIRk(p̄′)/γk), k ∈ K. Clearly, one has ω̄ = (ω̄1, . . . , ω̄K) ∈ F and ω̄′ = (ω̄1′, . . . , ω̄K′) ∈ F. By strict monotonicity of φ, it can be further seen from (5.102) with (5.103) that at least one power constraint is active at p̄′. So, by (5.59), ω̄′ ∈ ∂F, that is, ω̄′ is a boundary point of F. Thus, by irreducibility of V and Theorem 5.35, there must exist a weight vector w > 0 such that w^T(ω̄′ − ω̄) ≥ 0. Due to positivity of w, this implies that there is at least one index j ∈ K such that ω̄j′ ≥ ω̄j. On the other hand, however, we have ω̄′ ≤ ω̄. This is simply because ω̄1′ = · · · = ω̄K′ = min_{k∈K} φ(SIRk(p̄)/γk). Combining both inequalities (w^T(ω̄′ − ω̄) ≥ 0 and ω̄′ − ω̄ ≤ 0 with w > 0 force every component to agree) shows that ω̄ = ω̄′, and therefore, by the one-to-one correspondence between F and P+, we obtain p̄ = p̄′, which is unique by (5.102) and Theorem A.51.

5.6.2 Characterization under Sum Power Constraints

Our objective in this section is to characterize p̄(ω) ∈ P defined by (5.100) under the assumption that P = {p ∈ R+^K : ‖p‖1 ≤ Pt} for some given Pt > 0. As mentioned in Sect. 4.3.3, this assumption corresponds, for instance, to the important downlink communication scenario from a single base station to multiple users in cellular wireless networks.
Although the results of this section can be generalized to an arbitrary convex polytope P under additional constraints on the gain matrix, we handle the case of sum power constraints separately because it provides a good preparation for the more general setup. This special case, however, also stands out in that it leads to a unique positive vector p̄(ω) without the need for additional constraints on the gain matrix V ≥ 0. Thus, the results presented here are extended later to more general power constraints, but only under the assumption of an irreducible gain matrix V ≥ 0. A brief discussion of the role of the irreducibility property in the case of general power constraints can be found on page 200.

Given a diagonal positive definite matrix Γ = diag(γ1, . . . , γK) of desired link SIR values, a max-min SIR-balanced power vector p̄ := p̄(ω) in networks constrained on total power is given by

    p̄ = arg max_{p≥0, ‖p‖1≤Pt} min_{k∈K} SIRk(p)/γk .    (5.106)

Again, we point out that the maximum exists, as the SIR is continuous on P. The following simple lemma is crucial in the characterization of p̄.
Lemma 5.65. Let p̄ be given by (5.106). Then,
(i) p̄ > 0,
(ii) ‖p̄‖1 = Pt, and
(iii) γ1/SIR1(p̄) = · · · = γK/SIRK(p̄) = β for some β > 0.

Proof. See Sect. 5.10.

It should be emphasized that the lemma puts no constraints on the gain matrix V ≥ 0. In what follows, let β > 0 be the constant in part (iii) of the lemma. This together with part (ii) implies that

    β p̄ = ΓV p̄ + Γz
    1^T p̄ = Pt .    (5.107)

Putting the first equation into the second one yields

    β p̄ = ΓV p̄ + Γz
    β = (1/Pt) 1^T (ΓV p̄ + Γz) .    (5.108)

This set of K + 1 linear equations can be described by one matrix equation involving a (K+1) × (K+1) nonnegative matrix. Indeed, using an extended power vector p̃ = (p̄, 1) ∈ R+^{K+1}, it is easy to see that (5.108) can be equivalently written as

    β p̃ = A p̃,  β > 0    (5.109)

where the nonnegative matrix A ∈ R+^{(K+1)×(K+1)} is defined to be

    A = [ ΓV              Γz
          (1/Pt) 1^T ΓV   (1/Pt) 1^T Γz ] .    (5.110)

A characterization of p̄ similar to that in (5.109) was reported in the literature (see, for instance, [149]), where V was assumed to be irreducible. In such a case, A is primitive (Definition A.37) since then A^2 > 0 due to 1^T ΓV > 0 (by irreducibility of V), Γz > 0 and 1^T Γz > 0. So, if V is irreducible, it follows from the Perron theorem A.38 that p̃ > 0 is the positive right eigenvector of A and β > 0 is the associated simple eigenvalue, which is the Perron root of A. The converse holds as well, so that if p is the positive right eigenvector of A normalized such that the last component is 1 and λmax = ρ(A) is the associated Perron root, then λmax = β and p = p̃ = (p̄, 1). An immediate consequence of this is that p̄ is the optimal power vector in the sense of (5.106) if and only if p̃ = (p̄, 1) satisfies (5.109) with β = ρ(A). It is important to emphasize that due to the normalization of the last component, p̃ and p̄ are both unique vectors in R++^{K+1} and R++^K, respectively.

In the above discussion, we have assumed irreducibility of the gain matrix V, and hence we could refer to the Perron theorem. When V is reducible, we
can only invoke the weak form of the Perron-Frobenius theorem (Theorem A.39), which does not allow us to draw the above conclusions. The problem is that for general nonnegative matrices, all we can say is that there exists a nonnegative eigenvalue equal to the spectral radius and that the associated right eigenvector is nonnegative. In other words, the important positivity and uniqueness properties may now be missing. We show below that the statements carry over to any nonnegative matrix V, which is basically due to the fact that the last column of A defined by (5.110) is positive regardless of the choice of V. The result is obtained from an alternative matrix representation of the system of linear equations (5.107).

In order to arrive at this alternative representation, instead of writing the power constraint in (5.107) as an additional equation in the system of linear equations (5.109), we account for the sum power constraint by incorporating it into the first K equations. Indeed, as the power constraint can be written as 1 = (1/Pt) 1^T p̄, one obtains

    β p̄ = ΓV p̄ + Γz · 1 = ΓV p̄ + Γz · (1/Pt) 1^T p̄ = (ΓV + (1/Pt) Γz 1^T) p̄ = B p̄,  β > 0,  1^T p̄ = Pt    (5.111)

where B = ΓV + (1/Pt) Γz 1^T ∈ R+^{K×K}. This matrix can be written as

    B := ΓV + (1/Pt) Γz 1^T = Γ (V + (1/Pt) z 1^T) = Γ Ṽ    (5.112)

where Ṽ := V + (1/Pt) z 1^T. Consequently, both matrix equations (5.109) and (5.111) are reformulations of (ii) and (iii) of Lemma 5.65. We summarize these observations in a lemma.

Lemma 5.66. If p̄ solves the max-min SIR balancing problem (5.106), then it satisfies both (5.109) with p̃ = (p̄, 1) and (5.111).

In other words, the lemma says that if p̄ solves (5.106), then p̃ = (p̄, 1) > 0 and p̄ > 0 are eigenvectors of A and B associated with β > 0, respectively. The following result is used to prove the converse and strengthen the results.

Lemma 5.67. Given any constant c > 0, there is exactly one positive vector p with ‖p‖1 = c such that λp p = Bp for some λp > 0. Moreover, the associated eigenvalue λp is simple and λp = ρ(B) > |λ| for any λ ∈ σ(B) \ {λp}.

Proof. Because ΓV ≥ 0, it follows from (C.5-10) and (C.5-11) that B ≥ (1/Pt) Γz 1^T > 0. So, the lemma follows from the Perron theorem for primitive matrices (Theorem A.38) and the normalization ‖p‖1 = c > 0.

Now we can prove our main theorem in this section.

Theorem 5.68. The following statements are equivalent.
(i) p̄ solves the max-min SIR balancing problem (5.106).
(ii) p̄ is the (unique) right eigenvector of B associated with ρ(B) > 0 such that ‖p̄‖1 = Pt; ρ(B) is a simple eigenvalue of B.
(iii) p̃ = (p̄, 1) is the unique right eigenvector of A associated with ρ(A) > 0 such that its last entry is equal to one; ρ(A) is a simple eigenvalue of A.

Proof. The proof can be found in Sect. 5.10.

By the theorem, we have two slightly different characterizations of the max-min SIR power vector (5.106). The first characterization results from (5.109) with β = ρ(A), and thus is obtained in a vector space of dimension K + 1, where the additional dimension has been introduced to incorporate the constraint on the sum of transmit powers. The second characterization (5.111) is in a K-dimensional vector space. An examination of (5.112) shows that the effect of the sum power constraint is the addition of the vector Γz/Pt to each column of the matrix ΓV. Notice that as a consequence of this, the matrix B is positive for any nonnegative gain matrix V, which allows us to invoke the Perron-Frobenius theory for primitive matrices.

The characterization (5.111) with (5.112) also offers an interesting interpretation of how the sum power constraint impacts the network performance in terms of feasibility (Definition 5.42) and max-min SIR performance. First consider the following corollary, which is an immediate consequence of Theorem 5.68. The part ρ(B) ≤ 1 is in fact a special case of [150, Theorem 2 and Corollary 1]; the reader is also referred to Remark 5.75.

Corollary 5.69. A QoS vector ω ∈ Q^K is feasible if and only if

    ρ(A) = ρ(B) ≤ 1    (5.113)

where A and B are defined by (5.110) and (5.112), respectively.

Proof. By Observation 5.48 and (5.70), ω is feasible if and only if C(ω, p̄) ≤ 1 or, equivalently, if and only if max_{k∈K} γk/SIRk(p̄) ≤ 1. By (iii) of Lemma 5.65, we know that γk/SIRk(p̄) = β, k ∈ K, for β > 0 given by (5.109) or (5.111). This implies that ω is feasible if and only if β ≤ 1, and therefore, by Lemma 5.67 and Theorem 5.68, if and only if the inequality in (5.113) holds. Considering (ii) and (iii) in Theorem 5.68 with (5.109) and (5.111) (see also the proof of the implication (ii)→(iii)) reveals that ρ(A) = ρ(B) = β. This completes the proof.

By the corollary, the spectral radii of A and B give rise to necessary and sufficient conditions for a QoS vector to be feasible under sum power constraints. Thus, considering Sect. 5.3 (with (5.55)) and Sect. 5.9 (see also Observation 5.50) reveals that the matrix B defined by (5.112) plays a role analogous to that of the matrix ΓV in the noiseless or power-unconstrained case. From (5.112), we see that the only difference to the noiseless case is that the gain matrix V is substituted by the matrix Ṽ = V + (1/Pt) z 1^T, which is equal to V
when z = 0 and converges to V as Pt → ∞. Consequently, the impact of the power constraint is captured by adding the vector (1/Pt) z to each column of V. Now since the gain matrix V determines the effective interference coupling, we can interpret the power constraint as an additional source of interference, whose power is inversely proportional to the power constraint Pt. Finally, notice that if Pt is sufficiently small (or, equivalently, 1/Pt is sufficiently large), then we see from (5.110) and (5.112) that

    ρ(A) = ρ(B) ≈ (1/Pt) 1^T Γz = (1/Pt) Σ_{k∈K} γk zk .

So, for relatively small values of Pt, the necessary and sufficient condition (5.113) approximately becomes Σ_{k∈K} γk zk ≤ Pt. This corresponds to a noise-limited regime, where the interference can be neglected.

5.6.3 General Power Constraints

Under the assumption of irreducibility on V, this section extends the results presented in the previous section to more general power constraints. Precisely, we assume that P ⊂ R+^K is a convex polytope given by (4.18) or (4.19). So, throughout this section,

    ∀ n∈N  Σ_{k∈K(n)} pk ≤ Pn  ⇔  max_{n∈N} gn(p) ≤ 1

where K(n) ⊆ K is the set of links originating at node n ∈ N (see Sect. 4.1) and

    gn(p) := (1/Pn) 1_{K(n)}^T p,  n ∈ N .    (5.114)

Here and hereafter, 1_{K(n)} = (1_{1∈K(n)}, . . . , 1_{K∈K(n)}) and 1_{k∈K(n)} is used to denote the indicator function of the set K(n), defined as follows: 1_{k∈K(n)} = 1 if k ∈ K(n) and 1_{k∈K(n)} = 0 if k ∉ K(n). Thus, if K(1) = K (all links originate at the first node), then 1_{K(1)} = 1, which is exactly the case of sum power constraints considered in the previous section. At the other extreme, if |K(n)| = 1 for each n ∈ N (each origin node is an origin for exactly one link), then 1_{K(n)} = ej with j ∈ K(n). Now, using (5.114), the max-min SIR power vector p̄ = p̄(ω) defined by (5.100) can be written as

    p̄ = arg max_{p≥0} min_{k∈K} SIRk(p)/γk  subject to  max_{n∈N} gn(p) ≤ 1    (5.115)

where γk = γ(ωk) > 0, k ∈ K, is arbitrary but fixed. The following lemma is an extension of Lemma 5.65 to a general admissible power region P.

Lemma 5.70. Let p̄ be any power vector that solves (5.115). Then, the following holds:
(i) p̄ > 0 and

    max_{n∈N} gn(p̄) = 1 .    (5.116)

(ii) If V ≥ 0 is irreducible, then p̄ is unique and

    ∀ k∈K  γk/SIRk(p̄) = β    (5.117)

for some β > 0.

Proof. Positivity of p̄ is clear from the previous discussion, and (5.116) should also be obvious, since if we had gn(p̄) < 1 for all n ∈ N, then it would be possible to increase min_{k∈K} SIRk(p̄)/γk by allocating the power vector c p̄ ∈ P with c = 1/max_{n∈N} gn(p̄) > 1. In order to show part (ii), note first that due to (i), we can focus on positive power vectors p ∈ P+ = P ∩ R++^K. Suppose that F is given by (5.105) with φ^{-1} : Q → R++ being any strictly increasing and log-convex function. By Theorem 5.32, F is then a convex downward comprehensive set. Let φ^{-1}(ω̄k) = SIRk(p̄)/γk > 0, k ∈ K. Due to (5.116), we have ω̄ = (ω̄1, . . . , ω̄K) ∈ ∂F. Thus, by irreducibility of V and Theorem 5.33, ω̄ is a maximal point of F. Definition B.4 then allows us to conclude that if ω̄ ≤ ω for any ω ∈ F, then ω = ω̄. That is, there is no vector in F that is larger in all components than ω̄. On the other hand, by Sect. 5.5.1 (see the discussion in Sect. 5.6.1 and Fig. 5.10), ω̄ is a point where the hyperplane³⁰ in the direction of the vector (1/K, . . . , 1/K) intersects the boundary of F. As a result, ω̄1 = · · · = ω̄K, which together with the maximality property and strict monotonicity of φ (with φ^{-1}(φ(x)) = x, x > 0) shows that SIRk(p̄)/γk = β for each k ∈ K and some β, where β is positive due to (i). If V is irreducible, the uniqueness of p̄ follows from Observation 5.64. This completes the proof.

Because p̄ maximizes min_{k∈K}(SIRk(p)/γk) over P, it follows from (5.117) that 1/β > 0 is the corresponding maximum. It must be emphasized that (5.117) is not true for general nonnegative matrices V ≥ 0. In the lemma, we require the gain matrix to be irreducible, which is sufficient for (5.117) to hold but not necessary. The irreducibility property ensures that, regardless of the choice of P, there is no subnetwork that is completely decoupled from the rest of the network. The consequence is that the network is entirely coupled by interference (see also Sect. 5.9.1). However, the irreducibility property is not necessary in general. As already mentioned in the previous section, this is for instance the case if there is a constraint on total power, since then the network is entirely coupled by the power constraint: an increase of any of the components of p with ‖p‖1 = Pt entails that there is less power for all other links. Unless otherwise stated, it is assumed in the remainder of this section that V ≥ 0 is an arbitrary irreducible matrix. Due to (ii) of Lemma 5.70, this implies that the max-min SIR-balanced power vector is unique.

³⁰ We mean the hyperplane that is a vector subspace of R^K, and thus contains the origin of coordinates.
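As an aside, the sum-power characterization of Sect. 5.6.2 is easy to verify numerically: the Perron eigenvector of B = Γ(V + z1^T/Pt) from (5.111)-(5.112), scaled so that ‖p‖1 = Pt, balances all ratios γk/SIRk at the level ρ(B). The matrices and equations below are from the text; the numerical values are illustrative assumptions.

```python
import numpy as np

# Illustrative data (not from the text): K = 3 links, sum power budget Pt.
K, Pt = 3, 10.0
gamma = np.array([1.0, 2.0, 0.5])          # SIR targets, Gamma = diag(gamma)
V = np.array([[0.0, 0.1, 0.2],
              [0.1, 0.0, 0.1],
              [0.2, 0.1, 0.0]])            # gain matrix
z = np.array([0.05, 0.05, 0.05])           # noise powers

# B = Gamma (V + z 1^T / Pt)  (eq. (5.112)); positive for any V >= 0
B = np.diag(gamma) @ (V + np.outer(z, np.ones(K)) / Pt)

# Perron eigenvector of B, scaled to ||p||_1 = Pt  (Theorem 5.68 (ii))
eigvals, eigvecs = np.linalg.eig(B)
i = np.argmax(eigvals.real)
p = np.abs(eigvecs[:, i].real)
p *= Pt / p.sum()

sir = p / (V @ p + z)
print(np.round(gamma / sir, 6))   # all ratios equal beta = rho(B)
```

Since β p = Bp and ‖p‖1 = Pt imply β p = ΓVp + Γz, the printed ratios γk/SIRk(p) all coincide with the Perron root, exactly as parts (ii) and (iii) of Lemma 5.65 state.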
For brevity, in what follows, we slightly abuse the notation and define 1n := 1_{K(n)}. We further define

    N0(p) := {n0 ∈ N : gn0(p) = max_{n∈N} gn(p) = 1}    (5.118)

which includes the indices of those nodes for which the power constraints are active under the power vector p. For brevity, we also define

    N0 := N0(p̄)    (5.119)

which is well-defined due to uniqueness of p̄. Notice that by (i) of Lemma 5.70, the cardinality of N0 must be larger than or equal to 1. In the remainder of this section, let β > 0 be the constant in part (ii) of Lemma 5.70. Together with part (i) of this lemma, this implies that

    β p̄ = ΓV p̄ + Γz,   gn(p̄) = 1, n ∈ N0 .

Now, proceeding essentially as in the case of sum power constraints shows that if p̄ solves the max-min SIR balancing problem, then

    β p̃ = A(n) p̃,  β > 0,  p̃ ∈ R++^{K+1}    (5.120)

for each n ∈ N0, where p̃ = (p̄, 1) is the extended power vector and the nonnegative matrix A(n) ∈ R+^{(K+1)×(K+1)} is defined to be

    A(n) = [ ΓV               Γz
             (1/Pn) 1n^T ΓV   (1/Pn) 1n^T Γz ],  n ∈ N .    (5.121)

Arguing further as in the discussion leading up to (5.111), we alternatively obtain, for each n ∈ N0,

    β p̄ = B(n) p̄,  β > 0,  p̄ ∈ R++^K    (5.122)

where B(n) ∈ R+^{K×K}, n ∈ N, is defined to be

    B(n) := ΓV + (1/Pn) Γz 1n^T = Γ (V + (1/Pn) z 1n^T) = Γ Ṽ(n),  n ∈ N    (5.123)

and

    Ṽ(n) := V + (1/Pn) z 1n^T,  n ∈ N .    (5.124)

Comparing (5.121) and (5.123) with (5.110) and (5.112) shows that if N = {1} and K(1) = K (1n = 1), then A(n) = A and B(n) = B with n = 1. The consequence of the above derivations is that if p̄ is defined by (5.115), then (5.120) and (5.122) hold for each n ∈ N0. In other words, the solution to (5.115) in a network entirely coupled by interference must satisfy (5.120) and (5.122) for each node n ∈ N whose power constraint is active at the maximum. This is summarized in the following lemma, which is a generalization of Lemma 5.66 to general power constraints under the assumption that V is irreducible.
Lemma 5.71. If V ≥ 0 is irreducible and p̄ solves the max-min SIR balancing problem (5.115), then p̄ satisfies both (5.120) and (5.122).

Note that the lemma is an immediate consequence of parts (i) and (ii) of Lemma 5.70, from which (5.120) and (5.122) follow for an arbitrary n ∈ N0. Now we extend Lemma 5.67 to general power constraints under the assumption that the gain matrix V is irreducible.

Lemma 5.72. Suppose that V ≥ 0 is irreducible. Then, for any constants c1 > 0 and c2 > 0, the following holds.
(i) For each n ∈ N, there is exactly one positive vector p = p(n) ∈ R+^K with gn(p) = c1 satisfying β(n) p = B(n) p for some β(n) > 0. Moreover, β(n) is a simple eigenvalue of B(n) and β(n) = ρ(B(n)).
(ii) For each n ∈ N, there is exactly one positive vector p̃ = p̃(n) ∈ R+^{K+1} with p̃K+1 = c2 satisfying β(n) p̃ = A(n) p̃ for some β(n) > 0. Moreover, β(n) is a simple eigenvalue of A(n) and β(n) = ρ(A(n)).

Proof. The reader is referred to Sect. 5.10.

The lemma is a simple application of Theorem A.32 and says that, for each n ∈ N, the matrix equation β(n) p = B(n) p with β(n) > 0 and p ∈ R+^K is satisfied if and only if p is a positive right eigenvector of B(n) associated with β(n) = ρ(B(n)). Furthermore, if gn(p) = 1, then p is unique. Similarly, β(n) p = A(n) p with β(n) > 0 and p ∈ R+^{K+1} is satisfied if and only if p is a positive right eigenvector of A(n) associated with β(n) = ρ(A(n)), and there is exactly one such eigenvector whose last entry is equal to one. Furthermore, for each n ∈ N, β(n) = ρ(A(n)) = ρ(B(n)). This is because if ρ(B(n)) p = B(n) p holds for some n ∈ N, then we must have ρ(B(n)) p̃ = A(n) p̃ where p̃ = (p, 1) ∈ R++^{K+1}. Lemma 5.72 then implies

    ρ(A(n)) = ρ(B(n)),  n ∈ N .    (5.125)

Note that the solution to the max-min SIR balancing problem is not necessarily obtained for each n ∈ N since, at the optimum, some power constraints may be inactive. Indeed, the set N0^c = N \ N0 is in general not an empty set, where p̄ defined by (5.115) is unique due to (ii) of Lemma 5.70.

Now we combine Lemmas 5.71 and 5.72 to obtain the following [151].

Theorem 5.73. If V ≥ 0 is irreducible, then the following statements are equivalent.
(i) p̄ ∈ P solves the max-min SIR balancing problem (5.115).
(ii) For each n ∈ N0, p̄ is the unique positive right eigenvector of B(n) associated with β = ρ(B(n)) > 0 such that gn(p̄) = 1.
(iii) For each n ∈ N0, p̃ is the unique positive right eigenvector of A(n) associated with β = ρ(A(n)) > 0 such that p̃K+1 = 1.
Proof. (i)→(ii): By Lemma 5.71, p̄ ∈ P satisfies (5.122) for some β > 0. Thus, by Lemma 5.72, part (i) implies part (ii). (ii)→(iii): Given any n ∈ N0, it follows from (5.122) that ρ(B(n)) p̄ = B(n) p̄ with gn(p̄) = 1 is equivalent to ρ(B(n)) p̄ = ΓV p̄ + Γz, which in turn can be rewritten to give (5.120) with β = ρ(B(n)) and p̃ = (p̄, 1). Since p̄ is positive, so is p̃. Thus, p̃ with p̃K+1 = 1 is a positive right eigenvector of A(n) and the associated eigenvalue is equal to ρ(B(n)) > 0. So, considering part (ii) of Lemma 5.72 (or (5.125)), we can conclude that (iii) follows from (ii). (iii)→(i): By Lemma 5.72, for each n ∈ N0, there exists exactly one positive vector p̃ with p̃K+1 = 1 such that (5.120) is satisfied. Furthermore, β = ρ(A(n)), n ∈ N0, is a simple eigenvalue of A(n) belonging to p̃. Now considering Lemma 5.71 proves the last missing implication, and hence proves the theorem.

Theorem 5.73 implies that if V is irreducible, then p̄ > 0 is the (positive) right eigenvector of B(n) associated with ρ(B(n)) ∈ σ(B(n)) for each n ∈ N0. Alternatively, p̄ can be obtained from p̃ = (p̄, 1), which is the positive right eigenvector of A(n) associated with ρ(A(n)) for each n ∈ N0. The problem is, however, that N0 is not known, as this set is determined by the solution to the max-min SIR balancing problem, and hence its determination is itself a part of the problem. Before characterizing the set N0, let us introduce the following problems parameterized by n ∈ N:

    t∗n = arg max t  subject to  t ∈ Tn := {μ > 0 : gn(p(μ)) ≤ 1, p(μ) > 0}

where p(μ) = ((1/μ)I − ΓV)^{-1} Γz and the maximum exists for each n ∈ N. Note that by (5.101), we must have

    max_{n∈N} t∗n < 1/ρ(ΓV) .    (5.126)

Moreover, as V is irreducible, it follows from Observation 5.64 with (5.102) and (5.103) (see also the discussion in Sect. 5.6.1) that

    p̄(n) := p̄(t∗n) = ((1/t∗n)I − ΓV)^{-1} Γz,   gn(p̄(n)) = 1,  n ∈ N    (5.127)

would be the max-min SIR-balanced power vector if gn(p) ≤ 1 were the only constraint on transmit powers. Now we are in a position to characterize the set N0 as follows (see also the remark below).

Theorem 5.74. Suppose that V ≥ 0 is irreducible. Then,

    N0 = {n0 ∈ N : n0 = arg max_{n∈N} ρ(B(n))} = {n0 ∈ N : n0 = arg max_{n∈N} ρ(A(n))} .    (5.128)
Moreover, ω ∈ Q^K is feasible if and only if

    max_{n∈N} ρ(B(n)) = max_{n∈N} ρ(A(n)) ≤ 1 .    (5.129)

Remark 5.75. We point out that the theorem can be deduced from [150], where the authors used different tools to prove (5.129) (with respect to ρ(B(n))). Moreover, it follows from [150] that (5.129) is true for any nonnegative (not necessarily irreducible) gain matrix V.

Proof. The proof is deferred to Sect. 5.10.

Assuming an irreducible gain matrix, the theorem leads to the following procedure for computing the max-min SIR power vector p̄ defined by (5.115).

Algorithm 5.3 A max-min SIR balancing power control algorithm
Input: Γ = diag(γ1, . . . , γK) and V irreducible.
Output: p̄ ∈ P+
1: Find an arbitrary index n0 ∈ N such that n0 = arg max_{n∈N} ρ(B(n)), where B(n) is given by (5.123).
2: Define p̄ to be the unique vector for which ρ(B(n0)) p̄ = B(n0) p̄ and 1_{n0}^T p̄ = P_{n0}.
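Algorithm 5.3 is straightforward to sketch numerically under the stated irreducibility assumption: build B(n) = Γ(V + z 1n^T/Pn) for every node, pick the node with the largest spectral radius, and scale the corresponding Perron eigenvector so that the winning power constraint is active. The per-node link sets and all numbers below are illustrative assumptions, not taken from the text.

```python
import numpy as np

def maxmin_sir_general(gamma, V, z, groups, P):
    """Sketch of Algorithm 5.3: among the per-node matrices
    B(n) = Gamma(V + z 1_n^T / P_n)  (eq. (5.123)), pick the one with
    the largest spectral radius and scale its Perron eigenvector so
    that the n0-th power constraint is active (g_{n0}(p) = 1)."""
    K = len(gamma)
    Gamma = np.diag(gamma)
    ind = [np.isin(np.arange(K), g).astype(float) for g in groups]  # 1_{K(n)}
    B = [Gamma @ (V + np.outer(z, ind[n]) / P[n]) for n in range(len(P))]
    rho = [np.max(np.abs(np.linalg.eigvals(Bn))) for Bn in B]
    n0 = int(np.argmax(rho))                       # step 1
    w, X = np.linalg.eig(B[n0])
    p = np.abs(X[:, np.argmax(w.real)].real)       # Perron eigenvector
    p *= P[n0] / (ind[n0] @ p)                     # step 2: g_{n0}(p) = 1
    return p, rho

# toy network: 3 links; node 0 carries links {0, 1}, node 1 carries link {2}
gamma = np.array([1.0, 1.0, 1.0])
V = np.array([[0.0, 0.2, 0.1],
              [0.1, 0.0, 0.2],
              [0.2, 0.1, 0.0]])
z = np.array([0.1, 0.1, 0.1])
p, rho = maxmin_sir_general(gamma, V, z, groups=[[0, 1], [2]], P=[1.0, 1.0])
sir = p / (V @ p + z)
print(p, sir)
```

The resulting powers satisfy every power constraint, activate the one at the maximizing node n0, and balance all ratios SIRk/γk at the common value 1/ρ(B(n0)), in line with Theorems 5.73 and 5.74.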
Note that in the case of a sum power constraint discussed in Sect. 5.6.2, the first step in the above algorithm reduces to a trivial step, as N = {1} and B(1) = B is given by (5.112). Obviously, the procedure is not suitable for decentralized wireless networks, as it requires a central network controller (base station) with knowledge of global network parameters to compute the spectral radii and the appropriate eigenvector. The interpretation of the max-min SIR power vector presented in Sect. 5.6.1 suggests an alternative algorithm that is redolent of the filling procedure for deriving the max-min fair rate allocation (see Sect. 5.2.6). In words, the algorithm increases the SIR targets along the vector γ (using some sufficiently small steps) while utilizing the fixed-point power control algorithm to achieve the SIR targets. The SIR targets are increased until the first power constraint is achieved. Notice that the algorithm can be implemented in a distributed manner, provided that the coordinated (synchronized) increase of the SIR targets can be efficiently decentralized. A serious disadvantage may result from the fact that a distributed computation of intermediate transmit powers in step 4 involves an iterative process. Thus, the algorithm includes an iteration within another iteration, which may cause problems.

5.6.4 Some Consequences and Applications

We complete this section by pointing out some interesting consequences and applications of the results presented up to now. Note that throughout this
Algorithm 5.4 A max-min SIR balancing power control algorithm based on the fixed-point algorithm (Algorithm 5.1)
Input: m = −1, δ > 1 (sufficiently small), Γ = diag(γ1, . . . , γK) such that ρ(ΓV) < 1, V irreducible, z > 0.
Output: p̄ ∈ P
1: repeat
2:   m = m + 1
3:   if ρ(δ^m ΓV) < 1 then
4:     Find pk(m) = δ^m γk Ik(p(m)) for each k ∈ K using Algorithm 5.1.
5:   else
6:     break
7:   end if
8: until ∃ n∈N : gn(p(m)) ≥ 1
9: p̄ = Pn p(m)/‖p(m)‖1
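The loop structure of Algorithm 5.4 can be sketched as follows. This is a simplified single-process sketch under several assumptions not fixed by the algorithm box: the interference functions are the linear ones used throughout this section, Ik(p) = (Vp + z)k; the inner fixed-point iteration is simply run for a fixed number of steps; the final step scales the last iterate onto the boundary of the power region via the tightest constraint; and all numbers are illustrative.

```python
import numpy as np

def fixed_point(targets, V, z, iters=500):
    """Inner loop (Algorithm 5.1 stand-in): p <- diag(targets)(V p + z),
    which converges when rho(diag(targets) V) < 1."""
    p = np.zeros_like(z)
    for _ in range(iters):
        p = targets * (V @ p + z)
    return p

def maxmin_sir_fixed_point(gamma, V, z, groups, P, delta=1.01):
    """Outer loop of Algorithm 5.4 (sketch): inflate the SIR targets
    delta^m * gamma until some power constraint gn(p) >= 1, then scale
    the last iterate onto the boundary of the power region."""
    K = len(gamma)
    ind = [np.isin(np.arange(K), g).astype(float) for g in groups]
    gn = lambda q: np.array([ind[n] @ q / P[n] for n in range(len(P))])
    m, p = 0, None
    while True:
        t = delta**m * gamma
        if np.max(np.abs(np.linalg.eigvals(np.diag(t) @ V))) >= 1:
            break                      # targets no longer achievable
        p = fixed_point(t, V, z)
        if np.max(gn(p)) >= 1:
            break                      # first power constraint reached
        m += 1
    assert p is not None, "rho(Gamma V) >= 1 for the initial targets"
    return p / np.max(gn(p))           # activate the tightest constraint

gamma = np.array([1.0, 1.0, 1.0])
V = np.array([[0.0, 0.2, 0.1],
              [0.1, 0.0, 0.2],
              [0.2, 0.1, 0.0]])
z = np.array([0.1, 0.1, 0.1])
p = maxmin_sir_fixed_point(gamma, V, z, groups=[[0, 1], [2]], P=[1.0, 1.0])
print(p, p / (V @ p + z))
```

With a small step δ, the returned powers approximate the max-min SIR-balanced vector; the two nested loops visible here are exactly the "iteration within another iteration" that the text flags as the main practical drawback.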
subsection, we use γ(ωk) rather than γk in order to emphasize the dependence on ωk ∈ Q. The reason is that the matrix Γ(ω) = diag(γ(ω1), . . . , γ(ωK)) is used to specify different points in the feasible QoS region Fγ(P) (defined in Sect. 5.3). Thus, Γ(ω) is not fixed but takes on different values depending on the choice of ω ∈ Q^K. With this notation, Γ(ω)Ṽ and Γ(ω)Ṽ(n), n ∈ N, are equal to B and B(n), n ∈ N, given by (5.112) and (5.123), respectively. Unless otherwise stated, both Ṽ defined by (5.112) and Ṽ(n), n ∈ N, defined by (5.124) are irreducible matrices, which is true whenever the gain matrix V is irreducible.

The first interesting implication of Theorem 5.74 is an alternative characterization of the feasible QoS region under general power constraints: Given a strictly monotonic continuous function γ : Q → R++ determined by (5.60), it is an easy exercise to verify that Fγ(P) defined by (5.53) is equal to

    Fγ(P) = {ω ∈ Q^K : max_{n∈N} ρ(Γ(ω)Ṽ(n)) ≤ 1} = ∩_{n∈N} {ω ∈ Q^K : ρ(Γ(ω)Ṽ(n)) ≤ 1} .    (5.130)

Thus, we can write the feasible QoS region as the intersection of sublevel sets of ρ(Γ(ω)Ṽ(n)), n ∈ N, each of which corresponds to exactly one power constraint (see also Fig. 5.11). Notice that in the special case of sum power constraints, there is only one sublevel set of ρ(Γ(ω)Ṽ), so that, in this case, we have

    Fγ(P) = {ω ∈ Q^K : ρ(Γ(ω)Ṽ) ≤ 1} .

The advantage of this characterization is that it allows us to apply many results presented in Chap. 1 to noisy links subject to general power constraints. The consequences of the results from Sects. 1.2.4 and 1.2.5 are discussed in
more detail in the two subsections below. The following list briefly states other interesting consequences.

(a) Since the pointwise maximum preserves the convexity property, convexity of the feasible QoS region follows directly from Theorem 1.39 if γ is log-convex and V is irreducible, and from Observation 1.70 in the case of general nonnegative gain matrices. Theorem 1.2 with Corollary 1.3 can be used to characterize the boundary of the feasible QoS region.
(b) By Sect. 1.4.2, if Ṽ^(n), n ∈ N, is (symmetric) positive semidefinite, then ρ(Γ(ω)Ṽ^(n)) is a convex function of ω for any convex function γ. Thus, with the assumption of positive semidefiniteness, F_γ(P) is a convex set whenever γ is convex. In particular, this is true for the feasible rate region (with Φ(x) = log(1 + x), x ≥ 0) defined in Sect. 5.2.2.
(c) If Ṽ^(n), n ∈ N, is irreducible, so is also Ṽ^(n)Ṽ^(n)T, which is basically due to the positivity of the noise vector z and 1_n^T 1_n > 0 for each n ∈ N. Consequently, by Theorem 1.63, F_γ(P) is a strictly convex set (in the sense of Definitions 1.44 and 2.16) if Ṽ^(n) is irreducible and γ(x) = e^x, x ∈ R. It may also be verified that Ṽ^(n)Ṽ^(n)T, n ∈ N, is a positive matrix for any z > 0, regardless of the choice of V ≥ 0.

Max-Min SIR Power Allocation via Utility Maximization

Now we are going to use Theorem 1.29 of Sect. 1.2.4 (page 21) to show that the max-min SIR-balanced power vector (5.100) solves the utility-based power control problem (5.31) stated in Sect. 5.2.5, provided that the weight vector is chosen suitably. For brevity, we focus here on max-min SIR power control in the sense of Definition 5.62 and establish the connection under the assumption of an irreducible gain matrix; the general max-min SIR-balanced power allocation is treated in [151]. The irreducibility assumption is dropped in Sect. 5.9.1 in the noiseless setting; this section also presents other applications of the results from Sects. 1.2.4–1.2.5 and 1.7 as well as from App. A.4. The observations made here show that all the results presented in Sect. 5.9.1 can be straightforwardly extended to noisy links with general power constraints.

Suppose that V is an arbitrary irreducible matrix, and let us write the solution p* to the original utility-based power control problem (5.31) as the solution of an equivalent minimization problem:

    p*(w) = arg min_{p ∈ P_+} F(p, w) = arg min_{p ∈ P_+} Σ_{k∈K} w_k φ((Vp + z)_k / p_k)    (5.131)
where φ(x) = −Ψ(1/x), x > 0, with Ψ : R_++ → Q ⊆ R being any function satisfying (C.5-2)–(C.5-4). Notice that the minimum in (5.131) exists for any w > 0 (Lemma 5.12) and, as pointed out in Remark 5.13, p*(w) is unique for any weight vector w > 0. Without loss of generality, we assume w ∈ Π_K^+ so that F in (5.131) is a mapping from R_++^K × Π_K^+ into R.
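The sign flip built into φ makes (5.131) equivalent to maximizing the weighted aggregate utility, since φ((Vp + z)_k/p_k) = −Ψ(p_k/(Vp + z)_k) = −Ψ(SIR_k(p)). A minimal numerical check of this identity, assuming Ψ(x) = log x and a small 2-link gain matrix and noise vector invented purely for illustration:

```python
import numpy as np

V = np.array([[0.0, 0.3],      # invented 2-link gain matrix
              [0.2, 0.0]])
z = np.array([0.1, 0.1])       # invented noise powers
w = np.array([0.5, 0.5])       # weight vector on the simplex

Psi = np.log                   # a utility satisfying (C.5-2)-(C.5-4)
def phi(x):
    return -Psi(1.0 / x)       # phi(x) = -Psi(1/x)

p = np.array([0.4, 0.7])       # an arbitrary positive power vector
sir = p / (V @ p + z)          # SIR_k(p) for the affine interference model
F = float(np.sum(w * phi((V @ p + z) / p)))

# Minimizing F over p is the same as maximizing the weighted sum utility.
print(np.isclose(F, -np.sum(w * Psi(sir))))
```

The identity holds pointwise, so it is independent of the particular power vector chosen here.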
5.6 Max-Min SIR Balancing Power Control
For any n ∈ N, let us now define G_n : R_++^K × Π_K^+ → R to be

    G_n(p, w) := Σ_{k∈K} w_k φ((Ṽ^(n) p)_k / p_k),    n ∈ N    (5.132)
where the matrix Ṽ^(n) ≥ 0 is defined by (5.124), and hence it is irreducible. Since max_{n∈N} g_n(p) ≤ 1 for all p ∈ P_+ with g_n defined by (5.114), φ(x) is strictly increasing and z > 0, we have

    G_n(p, w) = Σ_{k∈K} w_k φ([(V + (1/P_n) z 1_n^T) p]_k / p_k) = Σ_{k∈K} w_k φ((Vp + z g_n(p))_k / p_k)
              ≤ Σ_{k∈K} w_k φ((Vp + z)_k / p_k) = F(p, w),    n ∈ N, p ∈ P_+, w ∈ Π_K^+    (5.133)

with equality if and only if n ∈ N_0(p), where N_0(p) is given by (5.118). On the other hand, as the function φ belongs to the function class G(Ṽ^(n)), n ∈ N, specified by Definition 1.27, an application of Theorem 1.29 shows that

    φ(ρ(Ṽ^(n))) ≤ G_n(p, w),    n ∈ N    (5.134)
holds for all p > 0 if and only if w = w^(n), where

    w^(n) := y^(n) ∘ x^(n) ∈ Π_K^+,    n ∈ N.    (5.135)
Here and hereafter, y^(n) > 0 and x^(n) > 0 with (y^(n))^T x^(n) = 1 are positive left and right eigenvectors of Ṽ^(n), respectively, which, by Theorem A.32, exist and are unique up to positive scaling. Moreover, it follows from Theorem 1.29 that whenever w = w^(n), then (5.134) holds with equality if and only if p = c x^(n) > 0 for some c > 0. Combining (5.133) and (5.134) shows that

    φ(ρ(Ṽ^(n))) ≤ G_n(p, w^(n)) ≤ F(p, w^(n)),    p ∈ P_+    (5.136)

with equalities in both cases if and only if both p = c x^(n) for some c > 0 and n ∈ N_0(p). Thus, by (5.131), (5.136) and Theorems^31 5.74 and 5.73, we can conclude that

    p̄ = p*(w)    (5.137)

whenever w = w^(n) for some n ∈ N_0, where p̄ is the max-min SIR power vector specified by Definition 5.62 and

^31 Note that both theorems are applied with B^(n) = Ṽ^(n) due to Γ(ω) = I.
    N_0 = {n ∈ N : ρ(Ṽ^(n)) = λ̄_p},    λ̄_p := max_{n∈N} ρ(Ṽ^(n)).    (5.138)
In words, the max-min SIR power vector p̄ is equal to p*(w) whenever w = w^(n), n ∈ N_0. By (5.136), (5.137), and (5.138), if w = w^(n) for some n ∈ N_0, then

    λ̄_p = min_{p∈P_+} F(p, w) = F(p*(w), w) = F(p̄, w).    (5.139)
It is important to emphasize that w = w^(n), n ∈ N_0, is sufficient but not necessary for (5.137) and (5.139) to be true. Indeed, as F(p, w) is linear in w > 0, both (5.137) and (5.139) hold for any w ∈ W where

    W = { w > 0 : w = Σ_{n∈N_0} c_n w^(n),  Σ_{n∈N_0} c_n = 1,  ∀_{n∈N_0} c_n ≥ 0 } ⊂ Π_K^+.    (5.140)
The converse is also true so that if (5.137) and (5.139) are true, then w ∈ W. Some of the observations are illustrated in Fig. 5.11.
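The weight construction (5.135) and the lower bound (5.134) can be checked numerically. The sketch below assumes φ(x) = log x (i.e., Ψ(x) = log x) and uses a small irreducible matrix, invented for illustration, in place of Ṽ^(n); `np.linalg.eig` supplies the left and right Perron eigenvectors y^(n) and x^(n):

```python
import numpy as np

# A small irreducible nonnegative matrix standing in for Vtilde^(n)
# (illustrative values only).
V = np.array([[0.0, 0.4, 0.2],
              [0.3, 0.0, 0.5],
              [0.2, 0.6, 0.0]])

vals, vecs = np.linalg.eig(V)
i = np.argmax(vals.real)                  # Perron root = spectral radius
rho = vals[i].real
x = np.abs(vecs[:, i].real)               # right Perron eigenvector, x > 0

vals_l, vecs_l = np.linalg.eig(V.T)
y = np.abs(vecs_l[:, np.argmax(vals_l.real)].real)  # left eigenvector, y > 0
y = y / (y @ x)                           # normalize so that y^T x = 1

w = y * x                                 # w^(n) = y^(n) o x^(n); sums to 1

def G(p):
    # G_n(p, w^(n)) with phi = log
    return float(np.sum(w * np.log((V @ p) / p)))

rng = np.random.default_rng(0)
gap = min(G(np.exp(rng.normal(size=3))) - np.log(rho) for _ in range(1000))
print(abs(w.sum() - 1.0) < 1e-12)         # w^(n) lies on the simplex
print(gap >= -1e-9)                       # lower bound (5.134): G >= phi(rho)
print(abs(G(2.0 * x) - np.log(rho)) < 1e-9)  # equality at p = c x^(n)
```

The random sampling is of course no proof; it merely illustrates that the bound (5.134) is tight exactly on the ray p = c x^(n).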
Fig. 5.11: An example of the feasible QoS region F_γ(P) defined by (5.105) with 2 users subject to individual power constraints, plotted over the coordinates ω_k = Ψ(SIR_k(p)). V is irreducible and F_γ(P) = ∩_{n∈N} {ω : ρ(Γ(ω)Ṽ^(n)) ≤ 1}, where Γ(ω) = diag(γ(ω_1), γ(ω_2)). The point ω̄ corresponds to the unique max-min SIR-balanced power allocation. The weight vector w is normal to a hyperplane which supports the feasible QoS region at ω̄ ∈ ∂F_γ(P). Note that N_0 = {1}, and thus 2 ∉ N_0 since the second constraint is not active at p̄.
Saddle Point Characterization

The observations in the previous subsections show that we can compute the max-min SIR power allocation by solving the utility-based power control problem (5.31). An advantage of the utility-based approach is that the distributed power control schemes presented in Chap. 6 can be used to compute the max-min SIR power vector, provided that each link knows how to select its weight. By the preceding results, however, a desired weight vector is determined by positive left and right eigenvectors of Ṽ^(n), n ∈ N_0, so that the links cannot choose their weights independently. Thus, as neither the eigenvectors nor the corresponding matrices are a priori known at any node, the presented approach for computing the max-min SIR power allocation is still not amenable to implementation in decentralized wireless networks.

A basic idea to overcome, or at least to alleviate, this problem is to let each link iteratively update its weight in parallel to the power control recursion. An example of such an algorithm can be found in [152], where the authors combine a gradient method for updating the weights with the fixed-point power control iteration presented in Sect. 5.5.3. The algorithm is not completely decentralized, as the weight vector must be normalized in every step such that its entries sum up to unity; the normalization, however, requires relatively little signaling overhead and coordination between nodes.

Another class of recursive algorithms can be developed that work directly on the saddle-point characterization of the Perron roots of Ṽ^(n). This was already pointed out in [151]. To see this, notice that if V is irreducible, we can utilize the results of Sects. 1.2.4, 1.7.4 and 5.6.3 to conclude the following:

    φ(ρ(B^(n))) = max_{w∈Π_K^+} min_{p>0} G_n(p, w) = min_{p>0} max_{w∈Π_K^+} G_n(p, w),    n ∈ N.    (5.141)
A saddle point exists and is given by (x^(n), w^(n)), where x^(n) > 0 and w^(n) are defined in (5.135). Note that by irreducibility, (x^(n), w^(n)) is unique up to positive multiples of x^(n) > 0. By the previous subsection and, in particular, (5.134), (5.136) and (5.139), one obtains

    ∀_{w∈Π_K^+}    min_{p∈P_+} F(p, w) ≤ min_{p∈P_+} F(p, w*) = λ̄_p    (5.142)
if and only if w* ∈ W. The minimum on the right-hand side of the inequality is attained if and only if p = p̄ ∈ P_+, where p̄ is the max-min SIR power allocation defined by Definition 5.62. It may be further deduced from (5.141) and the observations in the previous subsection that

    ∀_{p∈P_+}    λ̄_p = sup_{w∈Π_K^+} F(p*, w) ≤ sup_{w∈Π_K^+} F(p, w)    (5.143)

if and only if p* = p̄ = p*(w), w ∈ W. With this choice of the power vector, the supremum is attained if and only if w ∈ W. Combining both (5.142) and (5.143) shows the saddle-point property:
    ∀_{p∈P_+} ∀_{w∈Π_K^+}    F(p*, w) ≤ F(p*, w*) ≤ F(p, w*)    (5.144)
if and only if p* = p̄ and w* ∈ W. The existence of a saddle point in P_+ × Π_K^+ is ensured by the assumption of irreducibility of the gain matrix V.

The saddle-point characterization provides a basis for the design of distributed power control algorithms for saddle-point problems that converge to p̄. Basically, the idea is reminiscent of primal-dual algorithms that employ some optimization methods to minimize the Lagrangian over primal variables and to simultaneously maximize it over dual variables (see for instance Sect. 6.7.1). In particular, if strong duality holds (see App. B.4.3), then a simultaneous application of gradient methods converges to a saddle point of the associated Lagrange function. This approach can be applied to more general functions with the saddle-point property, if some additional conditions are satisfied [153, pp. 125–126].
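For intuition only (the distributed recursions themselves are the subject of Chap. 6): the inner minimizer in (5.141) is a positive multiple of the Perron eigenvector x^(n), at which all ratios (Ṽ^(n)p)_k/p_k are balanced at the Perron root. A centralized sketch using the plain power method on an irreducible, primitive matrix invented for illustration:

```python
import numpy as np

# Illustrative irreducible, primitive matrix in place of Vtilde^(n).
V = np.array([[0.0, 0.4, 0.2],
              [0.3, 0.0, 0.5],
              [0.2, 0.6, 0.0]])

p = np.ones(3)
for _ in range(300):          # power iteration converges to the Perron vector
    p = V @ p
    p /= p.sum()              # renormalize to keep the iterate bounded

ratios = (V @ p) / p          # at p = c x^(n), all ratios equal rho(Vtilde^(n))
print(ratios.max() - ratios.min() < 1e-9)
```

This is not one of the distributed schemes discussed above; it merely makes visible the balancing property that those schemes approach.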
5.7 Utility-based Power Control with QoS Support

In this section, we return to utility-based power control and extend the "pure best-effort" approach of Sects. 5.2–5.4 by incorporating potential QoS requirements of the links, expressed in terms of minimum SIR guarantees. We assume an affine interference function (5.73) so that the signal-to-interference ratio at the output of each receiver takes the form given by (4.4). We confine our attention to such interference functions since, to the best of our knowledge, the utility-based power control problem with general standard interference functions is not well understood at the time of writing this book. In Sect. 5.8, we will briefly address the original utility-based power control problem under the assumption of the minimum interference function (5.76). It can be seen later in Sect. 6.3 that this problem can be converted into a convex optimization problem if I_k(e^s) is a log-convex function of s ∈ R^K. Consequently, most of the results presented in this section extend straightforwardly to interference functions with such a log-convexity property.

The original utility-based power control problem formulated in Sect. 5.2.5 aims at optimizing the overall network performance with respect to some aggregate utility function. This approach has attracted a great deal of attention, mainly because it guarantees an efficient utilization of wireless resources. In view of many applications, however, its main drawback is that no QoS requirements expressed in terms of given SIR targets can be guaranteed, even if they are feasible in the sense of Definition 5.42.

One possible approach to the problem of incorporating QoS support into the utility maximization problem is to enforce the desired SIR levels by projecting the transmit powers on the valid power region. Recall from Definition 5.40 that every valid power vector provides the required SIR performance to each link. This approach is called hard QoS support as the SIR targets are
met whenever they are feasible. As far as the implementation in a decentralized network is concerned, the main problem seems to be the projection operation. Already the simple cyclic projection algorithm [66], where the links successively perform projections on individual sets corresponding to each SIR target, may require a lot of coordination in a network. Although the projection problem can be somewhat alleviated or even circumvented by primal-dual methods (see Sects. 6.7.1 and 6.8), there is another inherent problem associated with hard QoS support: the set of valid power allocations may be empty due to, for instance, poor channel conditions caused by fading effects.

A necessary and sufficient condition for given SIR targets to be feasible is given by (5.71) in the case of affine interference functions and, in the more general case of standard interference functions, by (5.82). If (5.82) is not satisfied, then no solution exists to a power control problem with hard QoS support, as it is impossible to meet the SIR targets. Note that this limitation pertains to any classical QoS-based power control strategy (including those discussed in Sect. 5.5) and may pose a significant challenge in distributed wireless networks: the communication overhead for admission control needed to cope with infeasibility can explode, requiring a lot of additional resources and thereby deteriorating the overall network performance significantly.

For this reason, we argue in favor of soft QoS support, in which case a solution to the corresponding power control problem attempts to approach the SIR targets closely, provided that the utility functions are chosen appropriately. Once the SIR targets are met, remaining power resources can be allocated to all or some users, with the goal of optimizing the overall network performance expressed in terms of some aggregate utility function.
A key feature of this approach as compared to hard QoS support is that a solution to the corresponding power control problem exists even if the SIR targets are infeasible. If necessary, soft QoS support can be easily combined with hard QoS support. Other link-specific system parameters (variables) such as queue backlogs can also be considered when allocating resources to the links. See also the further discussion in Sect. 5.7.2.

Remark 5.76. We point out that in practical systems, increasing the SIR above a certain threshold corresponding to the maximum feasible data rate will not improve the rate performance (but will in general reduce the bit error rate). For this reason, it may be reasonable to impose additional upper bounds on the SIR values. The problem is not addressed in this book, but the reader will realize that our results can be easily extended to incorporate any upper bounds on the SIR values. Such a problem generalization requires a slight modification of the primal-dual algorithms presented in Sect. 6.7.1. In fact, augmenting the new SIR constraints to an associated Lagrangian function would only increase the dimensionality of the dual variable. Alternatively, we could follow similar ideas to those in Sects. 5.7.2 and 6.7.2 to make large SIR values less "attractive" by introducing a suitable penalty function.
In the following, we address the problem of incorporating hard and soft QoS support into the utility-based power control problem (5.31). The only purpose of Sect. 5.7.1 is to formulate the utility-based power control problem with hard QoS support; algorithmic solutions to this problem can be found in Sect. 6.7.1. Sect. 5.7.2 is devoted to soft QoS support, while the corresponding algorithms are presented in Sect. 6.7.2.

5.7.1 Hard QoS Support

As aforementioned, utility-based power control with hard QoS support ensures that each link satisfies its QoS requirement, provided that the vector of QoS requirements ω is feasible according to Definition 5.42. In this book (see also Sect. 5.5.1), the satisfaction of a QoS requirement by link k ∈ K is equivalent to meeting its SIR target γ_k = γ(ω_k) ≥ 0 in the sense that SIR_k(p) ≥ γ_k for some p ∈ P. In order to conform with Sect. 5.3, the function γ : Q → R_++ is assumed to be the inverse function of the utility function Ψ : R_++ → Q ⊆ R or of ψ given by ψ(x) = −Ψ(x), x > 0. The assumption may appear restrictive, but this is not the case here and, in fact, it can be significantly relaxed (see Remark 5.39). Note that although the function γ is uniquely determined by Ψ, we are still free to choose any positive value for the SIR target γ_k = γ(ω_k) > 0 by suitably selecting the value of ω_k ∈ Q. For this reason, without loss of generality, γ can be assumed to be the inverse function of Ψ or ψ, depending on whether γ is strictly increasing or decreasing. The only minor technical problem is that γ is, by definition, a positive function. Thus, in order to incorporate best-effort links, we need to slightly extend the definition of γ, which is used throughout Sects. 5.7, 6.7 and 6.8.

Definition 5.77 (Best-Effort and QoS Links). A link k ∈ K is called a best-effort link if it has no QoS requirement, in which case we write γ(ω_k) ≡ 0 and γ_k = 0. If γ_k > 0, then link k ∈ K is said to be a QoS link.
By the definition, if γ(ω_k) ≡ 0, then ω_k represents no QoS requirement and link k is a best-effort link. Writing γ(ω_k) ≡ 0 is equivalent to saying that the SIR target γ_k is zero. Also, note that it is perfectly reasonable that best-effort links are allocated resources under utility-based power control. In fact, in the original problem formulation in Sect. 5.2.5, all links are best-effort links, which stands in clear contrast to the QoS-based power control considered in Sect. 5.5.3, where a link is allocated a positive transmit power only if it has a positive SIR target.

The notions of the valid power region P(ω) ⊂ R_+^K and the feasible power region P◦(ω) ⊂ R_+^K (Definition 5.40) trivially extend to the cases with best-effort links. Recall that P◦(ω) given by (5.65) is the subset of P(ω) consisting of those power vectors which satisfy the power constraints and are thus admissible. Formally, we have P◦(ω) = P(ω) ∩ P, where P is the admissible power region introduced in Sect. 4.3.3. However, if there are best-effort links, one has to be careful since then P(ω) contains nonpositive vectors, and thus it is at least theoretically possible that all members of P◦(ω) are nonpositive so that
P◦(ω) ∩ R_++^K = ∅. This does not need to be a problem in general, but we need to exclude this unlikely (pathological) case due to the consideration of utility functions that satisfy (C.5-2)–(C.5-4), and thus are defined only for the positive reals. So, for the problem to be well-defined, we assume in what follows that

(C.5-16) P◦_+(ω) ≠ ∅

where

    P◦_+(ω) := P◦(ω) ∩ R_++^K = P(ω) ∩ P ∩ R_++^K = P(ω) ∩ P_+    (5.145)

is the (positive) feasible power region. This definition is also used in the next section, which deals with soft QoS support. With these definitions, observations and assumptions in hand, the utility-based power control problem with hard QoS support is stated as follows:

    p*(ω) := arg max_{p∈P◦(ω)} Σ_{k∈K} w_k Ψ(SIR_k(p)).    (5.146)
Comparing (5.146) with (5.31) reveals that the only difference from the traditional utility-based power control is that the aggregate utility function is maximized over P◦(ω) instead of P. By Observation 5.45, P◦(ω) and P◦_+(ω) are convex sets, so the problem is convex and well-defined if (C.5-16) holds. In view of the primal-dual algorithms considered in Sects. 6.7.1 and 6.8, however, we are on the safe side by assuming that

(C.5-17) int(P◦(ω)) ≠ ∅.

From a practical point of view, this is a reasonable assumption since V is usually irreducible so that, by Observation 5.46 and (C.5-16), the interior of P◦(ω) is nonempty, unless P◦(ω) is a singleton, in which case the problem is trivial (and well-defined due to (C.5-16)). Also, note that (C.5-17) implies (C.5-16).

As aforementioned, as far as distributed implementation is concerned, the projection of power vectors onto the set P◦(ω) may pose a significant challenge. This is explained in Sect. 6.7.1, where we also present a primal-dual algorithm that alleviates the projection problem. Primal-dual algorithms based on a modified nonlinear Lagrangian are introduced and analyzed in Sect. 6.8.3. Section 6.7.1 considers the possibility of exploiting barrier properties of the utility functions to approximate a solution to (5.146).

5.7.2 Soft QoS Support

As mentioned at the beginning of this section, utility-based power control with soft QoS support attempts to provide the desired QoS to the links by taking the corresponding SIR targets into consideration. There are at least two ways of incorporating soft QoS support into the utility-based power control problem (5.31):
(a) (Dynamic) adjustment of the weight vector w in (5.31).
(b) Approximation of a max-min SIR-balanced power vector (Definition 5.61).

In this book, we focus on the second approach and generalize this approach in Sect. 5.7.2 to incorporate best-effort links, thereby improving the network performance with respect to some aggregate utility function. As far as the first approach (a) is concerned, we only mention that soft QoS support can be achieved by choosing the weight vector w ≥ 0 according to some priorities in such a way that links with high priority are assigned relatively large weights. In addition, the weights can be dynamically adjusted to a changing situation in a network. As mentioned in Sect. 5.2.4, a widely studied approach is to choose the weights depending on queue backlogs or on differential queue backlogs along the links [125, 120]. The weights can alternatively be chosen proportional to the average traffic load on the corresponding link (measured periodically over some time interval), which may reduce the frequency with which transmit powers must be readjusted.

A dynamic adjustment of the weight vector depending on the queue states plays a central role in the design of throughput-optimal strategies for elastic data applications with random arrival times [46]. Such strategies exploit the relative delay tolerance of data applications and channel fluctuations to improve the throughput performance. Throughput optimality means here that the length of each queue is kept finite whenever the average arrival rates at the queues admit a stabilizable network under some given system constraints [120, 154]. The objective is not to guarantee certain link qualities, even if such qualities could be provided using another strategy. This stands in contrast to the power control strategies presented in this section, which aim at enforcing, or at least closely approaching, given SIR targets regardless of the instantaneous channel states.
Approximation of a Max-Min SIR Balancing Power Allocation

Soft QoS support can be achieved by approximating the max-min SIR-balanced power vector specified by Definition 5.61. The approach is actually inspired by Observation 5.14, which states that if the utility function Ψ is chosen to be Ψα given by (5.28), then, as α tends to infinity, the link rates approach the max-min fair rate allocation given by Definition 5.16. Note that if V is irreducible, then all link rates are equal under the max-min fair rate allocation, which follows from Lemma 5.70 and (4.22) (see also the discussion in Sect. 5.2.6 on the filling procedure under an irreducible gain matrix). On the other hand, by Observation 5.48, we know that ω is feasible if and only if C(ω) = min_{p∈P} max_{k∈K}(γ(ω_k)/SIR_k(p)) ≤ 1 or, equivalently, if and only if the SIR targets are met under a max-min SIR-balanced power allocation p̄ := p̄(ω). As pointed out in Sect. 5.5.1, p̄ is positive if z > 0 and γ_k := γ(ω_k) > 0 for each k ∈ K, which is assumed to be true throughout this subsection. We again use γ(ω_k) to denote the SIR target γ_k only for the
purpose of emphasizing the dependence on the QoS vector ω. For brevity, p̄ is referred to as a max-min SIR power vector, despite differing SIR targets.

Our goal in this section is to show that one can arbitrarily closely approximate a solution p̄ to the max-min SIR balancing problem (5.100) by maximizing

    F_α(p) := F_{α,ω}(p) = Σ_{k∈K} w_k Ψα(SIR_k(p) / γ(ω_k))    (5.147)

over the admissible power region P for any w > 0 and some sufficiently large value of α ≥ 1, where Ψα(x), x > 0, is given by (5.28). Given α ≥ 1, any maximizer of F_α over P is denoted by p(α) so that one formally has

    p(α) := p(α, ω) = arg max_{p∈P} F_α(p).    (5.148)

Thus, we aim at showing that p(α) tends to some p̄ as α → ∞. Recall from Sect. 5.6.1 that p̄ is not necessarily unique unless the gain matrix V is irreducible (Lemma 5.70). Since Ψα : R_++ → Q in (5.147) satisfies (C.5-2)–(C.5-4) and γ_k, k ∈ K, are positive constants, it follows from Lemma 5.12 that the maximum in (5.148) exists.

Now consider the following theorem, which in fact restates Observation 5.14 in terms of max-min SIR balancing, and hence is an application of [70, Lemma 3 and Corollary 2] to power-controlled wireless networks.

Theorem 5.78. Let w > 0 and γ(ω_k) > 0, k ∈ K, be arbitrary. Suppose that Ψ(x) = Ψα(x), x > 0, where Ψα : R_++ → Q is defined by (5.28). Then, ω is feasible if and only if, for any ε > 0, there is α(ε) ≥ 1 such that

    min_{k∈K} SIR_k(p(α)) / γ(ω_k) ≥ 1 − ε    (5.149)

for all α ≥ α(ε), where p(α) is given by (5.148).

Proof. See Sect. 5.10.

By the above theorem, we can approximate a solution to the max-min SIR balancing problem by maximizing the function F_α over P_+. The accuracy of the approximation depends on the value of α, becoming better as this value increases. In practice, the theorem implies that if the QoS vector ω is feasible, then each link approximately meets its SIR target under the utility-based power control (5.148), provided that α is sufficiently large. The choice of α is influenced by the gain matrix V, and hence also by the state of the wireless channel. The consequence is that if α is fixed, given SIR targets may be violated for some realization of the channel, even if they are feasible given this channel realization.^32 For this reason, we are dealing here with soft QoS support rather than hard QoS support.
^32 In other words, if α is large but fixed, there may still be channel realizations for which the value of α is not large enough.
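The content of Theorem 5.78 can be illustrated numerically. The sketch below — a hypothetical 2-link example in which the gain matrix, noise vector, SIR targets, weights, and step-size choices are all invented — maximizes F_α by normalized projected gradient ascent in the log-power domain and compares min_k SIR_k(p(α))/γ_k against a brute-force grid estimate of the max-min balanced level:

```python
import numpy as np

V = np.array([[0.0, 0.3],        # invented 2-link gain matrix
              [0.2, 0.0]])
z = np.array([0.1, 0.1])         # noise powers
gam = np.array([1.0, 2.0])       # SIR targets gamma_k = gamma(omega_k)
w = np.array([0.5, 0.5])         # any weight vector w > 0
pmax = 1.0                       # individual power constraints p_k <= pmax

def min_ratio(p):
    return float(np.min(p / (V @ p + z) / gam))

def maximize_F_alpha(alpha, steps=8000, lr=0.01):
    # Normalized gradient ascent on F_alpha in s = log p (F_alpha is concave
    # in s), projected onto the box s <= log(pmax).
    s = np.zeros(2)              # start at full power
    for _ in range(steps):
        p = np.exp(s)
        q = V @ p + z
        u = s - np.log(q * gam)                  # u_k = log(SIR_k/gamma_k)
        r = w * np.exp((1.0 - alpha) * u)        # w_k * d(Psi_alpha)/du at u_k
        grad = r - p * (V.T @ (r / q))           # dF_alpha/ds
        s = np.minimum(s + lr * grad / (np.linalg.norm(grad) + 1e-12),
                       np.log(pmax))
    return np.exp(s)

# Brute-force estimate of the max-min SIR-balanced level over a power grid.
g = np.linspace(0.01, pmax, 200)
balanced = max(min_ratio(np.array([p1, p2])) for p1 in g for p2 in g)

p_alpha = maximize_F_alpha(alpha=12.0)
print(min_ratio(p_alpha) >= 0.85 * balanced)
```

Raising α pushes the minimum ratio closer to the balanced level, mirroring the ε–α(ε) statement of the theorem.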
Finally, note that algorithms for the problem (5.148) are slightly modified versions of the algorithms for the original utility-based power control problem. The necessary modification is explained at the beginning of Sect. 6.7.2.

Incorporation of Best-Effort Traffic

One drawback of the power control strategy (5.148) is, certainly, the inability to choose γ_k = 0, k ∈ K. This raises the question of how to incorporate best-effort links (Definition 5.77). Note also that once all the SIR targets γ_k > 0 are met for a sufficiently large α, the links with relatively high SIR targets are preferred when allocating "extra" resources.^33 In terms of throughput performance, the power control strategy (5.148) does not efficiently utilize these extra resources for large values of α ≥ 1. The inefficiency results from the fact that, for large values of α, this strategy is highly fair in the sense that it attempts to equalize the ratios SIR_k(p)/γ_k, γ_k > 0 (see also Lemma 5.70), thereby preferring links with high SIR targets regardless of the choice of the weight vector. As a result, the maximum throughput performance of (5.148) may be much worse than that of power control strategies with hard QoS support if Ψ(x) = log(x) is used as the utility function, which is a good approximation of the rate function in the high SIR regime (see Sect. 5.4.2). This is illustrated in Fig. 5.6 for the case of γ_1 = · · · = γ_K = 1. We can also see from the figure that the throughput performance (in the high SIR regime) can be the same for all values of α provided that the weight vectors are chosen suitably. See also Sects. 5.6.4 and 5.9.1.

In order to achieve more flexibility in the allocation of the extra resources, we consider a strategy that combines the power control policy (5.148) with the traditional utility-based approach (5.31). To this end, we define two link subsets A ⊆ K and B ⊆ K with A ∪ B = K as follows.

• A includes indices of those links k ∈ K for which γ_k = γ(ω_k) > 0.
Thus, by Definition 5.77, A represents all QoS links.
• B contains all best-effort links and may also contain any link belonging to A. In other words, in addition to the best-effort links, B may also include any QoS link.

The sets A and B are thus not necessarily disjoint. Without loss of generality, we can assume that the users are ordered so that

    A = {1, . . . , m_a},    B = {m_b + 1, . . . , K},    0 ≤ m_b ≤ m_a ≤ K

with A = ∅ if m_a = 0 and B = ∅ if m_b = K. For a better understanding of the approach proposed below, it may be helpful for the reader to notice that all links whose indices belong to A \ B are "fully satisfied" if their SIR targets are achieved with equality. In other words,
^33 By "extra" resources, we mean transmit powers that can be allocated in addition to the transmit powers necessary to satisfy the SIR targets.
for each link k ∈ A \ B, there is no or negligible gain when its SIR is increased above the SIR target. This is usually true for links carrying voice traffic (voice links), where the corresponding transmitter-receiver pairs support a fixed data rate using some fixed code. In such cases, the increase of the SIR above some desired value at which the system is designed to operate has a negligible impact, if any, on voice quality. From the network operator's perspective, there is also no reason for allocating extra resources to the "pure" QoS links specified by A \ B since such an allocation does not increase the operator's revenue. In contrast, the network operator can in general increase its revenue by allocating the resources to the links in B so as to maximize a suitable aggregate utility function. Note that link k ∈ B is "interested" in the extra resources even if γ_k > 0, in which case k ∈ A ∩ B. The set A ∩ B could, for instance, include links for video transmission, where some minimum service rate may be required but higher rates are usually desired to obtain satisfactory video quality.

With the above definitions in hand, the power control problem can be formulated as follows:

    p̃(α) := p̃(α, ω) = arg max_{p∈P} F̃_α(p)    (5.150)

where (see also Remark 5.80 below)

    F̃_α(p) = Σ_{k∈A} a_k Ψα(SIR_k(p) / γ_k) + Σ_{k∈B} b_k Ψ(SIR_k(p))    (5.151)

and where

(C.5-18) a ∈ R_++^|A| and b ∈ R_++^|B| are given weight vectors,
(C.5-19) Ψα : R_++ → Q, α ≥ 2, is given by (5.28), and
(C.5-20) Ψ : R_++ → Q is any function that satisfies (C.5-2)–(C.5-4).

Notice that (5.150) with A = ∅, B = K is the utility-based power control problem (5.31), and if we choose A = K, B = ∅, the problem reduces to (5.148). Before explaining our approach in more detail, we state the following observation for completeness.

Observation 5.79. Each of the following is true.
(i) The maximum in (5.150) exists.
(ii) The Kuhn–Tucker constraint qualification is satisfied at p̃(α), α ≥ 1.
(iii) The Kuhn–Tucker conditions are necessary and sufficient for p̃(α), α ≥ 1, to be a global maximizer of F̃_α over P.

Part (i) can be shown using the same arguments as in the proof of Lemma 5.12. Parts (ii) and (iii) immediately follow from Sect. 5.2.7. For the corresponding definitions, the reader is referred to App. B.4.3.
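To make the structure of (5.151) concrete, the fragment below evaluates F̃_α for a hypothetical 3-link network with one pure QoS link (A \ B), one mixed link (A ∩ B), and one pure best-effort link (B \ A); all numerical values are invented, and Ψ(x) = log x is used for the best-effort term:

```python
import numpy as np

V = np.array([[0.0, 0.2, 0.1],     # invented gain matrix
              [0.1, 0.0, 0.2],
              [0.2, 0.1, 0.0]])
z = np.full(3, 0.1)                # invented noise powers

A = [0, 1]                         # QoS links (0-based indices)
B = [1, 2]                         # links competing for "extra" resources
gam = {0: 1.0, 1: 0.5}             # SIR targets for the links in A
a = {0: 1.0, 1: 1.0}               # constant weight vector a
b = {1: 0.7, 2: 0.3}               # weight vector b

def F_tilde(p, alpha):
    sir = p / (V @ p + z)
    qos = sum(a[k] * (sir[k] / gam[k]) ** (1.0 - alpha) / (1.0 - alpha)
              for k in A)                            # sum over A with Psi_alpha
    effort = sum(b[k] * np.log(sir[k]) for k in B)   # sum over B with Psi = log
    return float(qos + effort)

p_ok = np.array([0.5, 0.6, 0.4])   # meets both SIR targets
p_bad = np.array([0.1, 0.6, 0.4])  # violates link 0's target

# When the targets are met, the Psi_alpha penalty fades as alpha grows;
# when a target is missed, it dominates the objective.
print(F_tilde(p_ok, 50.0) > F_tilde(p_ok, 2.0))
print(F_tilde(p_bad, 50.0) < F_tilde(p_bad, 2.0))
```

The two printed comparisons preview the barrier interpretation developed at the end of this section.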
Remark 5.80. It is interesting to point out that the power control problem (5.150) can be generalized by letting each link k ∈ A be assigned a distinct function Ψ_{α_k}(x), x > 0, given by (5.28) with α = α_k ≥ 2. The subsequent results are still valid for the more general case if we define α ≥ 2 to be α := min_{k∈A} α_k. With this definition, we obviously have α_k → ∞ for each k ∈ A when α → ∞. This generalization offers some advantages in terms of more efficient distributed implementation since the links in A do not need to agree on the choice of a common utility function. Similarly, each link in B can choose its utility function for the second term on the right-hand side of (5.151), regardless of the choices made by other links. This, however, seems to offer no advantages. See the subsequent remarks and discussion on this issue at the end of this section.
The desired degree of fairness is however achieved by attempting to provide to each QoS link its desired SIR performance, for which the first term on the right-hand side of (5.151) is responsible. For this reason, we think that a logarithmic function is the best choice for Ψ , with the weight vector w reflecting, for instance, queue states along the links or relative importance of the links. As mentioned above, the first weighted sum on the right-hand side of (5.151) is used to ensure that each QoS link meets its SIR target, provided that α is large enough and the SIR targets are feasible. Note that if α is large, then the choice of the weight vector a > 0 has a negligible impact on a solution to (5.150). Indeed, a careful examination of the proof of Theorem 5.78 shows that the first term on the right-hand side of (5.151) is approximately equal to a1 mink∈A (SIRk (p)/γk ) for any p ∈ P+ , provided that α ≥ 2 is sufficiently large. The reader can further observe from Fig. 5.6 that increasing α makes the performance less sensitive to the choice of the weight vector. On this account, it appears reasonable to choose a constant weight vector a > 0. The problem formulation (5.150) can be interpreted as a kind of barrier method approach, where the first sum term on the right-hand side of (5.151) acts as a penalty (barrier) function for not meeting the SIR targets. Intuitively, this can be explained if one writes the power control problem with hard QoS
support (5.146) as follows:

p*(ω) = arg max_{p∈P} H(p),   H(p) = Σ_{k∈K} w_k Ψ(SIR_k(p)) + D(p)   (5.152)

where D : R^K₊₊ → {−∞, 0} is a penalty (or indicator) function defined to be

D(p) = −∞ if ∃ k ∈ A : SIR_k(p) < γ_k, and D(p) = 0 otherwise.

Considering (5.151) with B = K, A ≠ ∅, and b = w shows that

F̃_α(p) − H(p) = Σ_{k∈A} a_k Ψ_α(SIR_k(p)/γ_k) − D(p)

for any p ∈ R^K₊₊. Now, an examination of Ψ_α(x) = x^{1−α}/(1−α), x > 0, α ≥ 2, shows that, as α → ∞, Ψ_α(x) → −∞ if x < 1 and Ψ_α(x) → 0 if x ≥ 1. Thus, since D(p) = −∞ if ∃ k ∈ A : SIR_k(p)/γ_k < 1 and D(p) = 0 otherwise, it follows that F̃_α(p) (with B = K and b = w) approximates H(p) for any fixed p > 0, and the approximation accuracy becomes better as α increases. Equivalently, we can say that the first sum term in (5.151) approximates the indicator function of the (positive) feasible power region P°₊(ω) defined by (5.145). The objective of the following lemma and the subsequent theorem is to make the above intuitive arguments rigorous.
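Before turning to the formal statements, the limiting behavior of Ψ_α can be checked numerically. The sketch below is ours, not the book's; it only evaluates Ψ_α(x) = x^{1−α}/(1−α) at a few points to show the barrier effect: below the target ratio (x < 1) the penalty blows up as α grows, while at or above the target (x ≥ 1) it vanishes.

```python
def psi_alpha(x: float, alpha: float) -> float:
    # Psi_alpha(x) = x^(1-alpha) / (1-alpha), defined for x > 0, alpha >= 2
    return x ** (1.0 - alpha) / (1.0 - alpha)

# x < 1: penalty grows without bound as alpha increases;
# x >= 1: penalty shrinks toward 0 (indicator-like behavior)
for alpha in (2.0, 10.0, 50.0):
    print(alpha, psi_alpha(0.8, alpha), psi_alpha(1.0, alpha), psi_alpha(1.5, alpha))
```

Note that Ψ_α(1) = 1/(1 − α) ≥ −1 for every α ≥ 2, which is the bound used in the proof of Lemma 5.81 below.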
Lemma 5.81. If P°₊(ω) ≠ ∅, then there are constants 0 < c₁ and c₂ < +∞ (independent of α ≥ 2) such that

c₁ ≤ min_{k∈K} SIR_k(p̃(α)) ≤ max_{k∈K} SIR_k(p̃(α)) ≤ c₂  for all α ≥ 2   (5.153)

where p̃(α) > 0, α ≥ 2, is defined by (5.150). Thus, if P°₊(ω) ≠ ∅, then there are constants −∞ < c₃ and c₄ < +∞ (independent of α ≥ 2) such that c₃ ≤ F̃_α(p̃(α)) ≤ c₄ for all α ≥ 2.

Proof. The upper bound in (5.153) immediately follows from boundedness of P₊ ⊂ P and positivity of z: for every p ∈ P₊, we have 0 < max_{k∈K} SIR_k(p) ≤ c₂ = max_{n∈N} P_n / min_{k∈K} z_k < +∞. Thus, for all α ≥ 2 and all p ∈ P₊, strict increasingness of Ψ and Ψ_α implies that

F̃_α(p) ≤ ‖a‖₁ Ψ_α(c₂ / min_{k∈A} γ_k) + ‖b‖₁ Ψ(c₂) ≤ c₄ = ‖b‖₁ Ψ(c₂) < +∞

where we used the fact that Ψ_α(x) < 0, x > 0, for all α ≥ 2. The lower bounds hold as, for all p ∈ P°₊(ω) ≠ ∅ and all α ≥ 2, we have Ψ_α(SIR_k(p)/γ_k) ≥ Ψ_α(1) ≥ −1, k ∈ A. Thus, since P°₊(ω) ⊂ P₊ and p̃(α) is a maximizer of F̃_α over P₊, we have

F̃_α(p̃(α)) ≥ F̃_α(p) ≥ ‖a‖₁ Ψ_α(1) + Σ_{k∈B} b_k Ψ(SIR_k(p)) ≥ ‖b‖₁ Ψ(min_{k∈B} SIR_k(p)) − ‖a‖₁ = c₃ > −∞
for all α ≥ 2 and any fixed p ∈ P°₊(ω) ≠ ∅. From this, (C.5-3) and SIR_k(p) ≥ γ_k > 0 for each k ∈ A and every p ∈ P°₊(ω), we conclude the lower bound in (5.153). This completes the proof.

Notice that the bounds in the lemma are independent of α. This simple observation is used to show the following.

Theorem 5.82. Let ω be given and let γ_k = γ(ω_k) > 0, k ∈ A. Suppose that A ≠ ∅, B ≠ ∅ and A \ B ≠ ∅. Then, for any ε > 0 and an irreducible matrix V ≥ 0, there exists α(ε, V) ≥ 2 such that

max_{k∈A\B} SIR_k(p̃(α))/γ_k ≤ 1 + ε   (5.154)

for all α ≥ α(ε, V). Moreover, if P°₊(ω) ≠ ∅ (as assumed by (C.5-16)), then, for any nonnegative but not necessarily irreducible matrix V,

1 − ε ≤ min_{k∈A} SIR_k(p̃(α))/γ_k .   (5.155)
Proof. The proof is deferred to Sect. 5.10.

Notice that irreducibility of V (no isolated subnetworks; see Sect. 5.9.1) is a key ingredient in the proof of (5.154). Figure 5.12 illustrates Theorem 5.82. As α increases, the SIRs of the QoS links in A \ B approach their SIR targets in accordance with the theorem. If α is sufficiently large, then the SIR targets are met. The available "extra" resources are allocated to both other link subsets to maximize the aggregate utility function: (a) to the QoS links in A ∩ B in addition to their SIR target and (b) to the best-effort links in B \ A. These links benefit from the "extra" resources since they are allocated to maximize the second term on the right-hand side of (5.151). As pointed out in Remark 5.80, if α = min_{k∈A} α_k, α_k ≥ 2, then the assertion of Theorem 5.82 remains true for the following more general problem:

p̃(α) := arg max_{p∈P} F̃_α(p)   (5.156)

where α = (α₁, ..., α_{|A|}) ≥ 2 and

F̃_α(p) = Σ_{k∈A} a_k Ψ_{α_k}(SIR_k(p)/γ_k) + Σ_{k∈B} b_k Ψ(SIR_k(p)).   (5.157)
Since each link is assigned a distinct parameter α_k, a solution to the above problem is more suitable for distributed settings because each link in A, say link k, can increase its parameter α_k autonomously, that is, without the need for exchanging any information with the other links. Figure 5.13 illustrates Theorem 5.82 and the advantage of assigning distinct utility functions to the links.
[Fig. 5.12 plots the SIRs of five links (K = 5, a = b = 1, SNR = 40 dB) against α ∈ [2, 14], with curves labeled A ∩ B, B \ A and A \ B and the common SIR target marked.]

Fig. 5.12: The SIR performance of five links as a function of α for some irreducible matrix V and Ψ(x) = log(x), x > 0.
[Fig. 5.13 depicts a region in the (ω₁, ω₂)-plane with three marked boundary points: 1 (max-min fairness), 2 (utility-based power control) and 3 (utility-based power control with soft QoS support); point 2 moves toward point 3 as α → ∞.]

Fig. 5.13: An illustration of power control policy (5.156). The depicted set is the intersection of a feasible QoS region F_γ(P), γ(x) = e^x, x ∈ R, with R²₊. We have two links such that the first link is a best-effort link (B \ A = B = {1}) and the second link is a "pure" QoS link (A \ B = A = {2}). The dashed line represents the QoS requirement of the second link: All points above or on the line satisfy the QoS requirement. The corresponding SIR levels are larger than or equal to the SIR target of link 2.
Point 1 corresponds to max-min fairness (see Sects. 5.2.6 and 5.6), while point 2 is achieved by means of utility-based power control (5.31) with Ψ(x) = log(x), x > 0. As this point is below the dashed line, the SIR target of link 2 is not met in point 2, although the target is clearly feasible. Now, using power control with soft QoS support (5.156), Theorem 5.82 shows that SIR₂(p̃(α)) converges to γ₂ as α = α₂ → ∞. In other words, for large values of α, the operating point of the scheme is close to point 3 in Fig. 5.13. Note that this is true even if point 2 is somewhere between points 1 and 3, which is due to (5.154) and the fact that 2 ∉ B (no overshoot of link 2, which is a pure QoS link). In contrast, if we had 2 ∈ A ∩ B, then, for sufficiently large α, the operating point could be any boundary point on the left side of point 3. The exact operating point depends on the choice of the weight vector b in (5.157).
5.8 Utility-Based Joint Power and Receiver Control

In this section, we consider the possibility of maximizing an aggregate utility function over a joint space of power vectors and receivers. This problem is much more intricate than the corresponding QoS-based power control problem addressed in Sect. 5.5.3. To the best of our knowledge, it is in particular not known which class of utility functions allows for a convex formulation of the problem so that a global solution can be efficiently found in distributed wireless networks. A simple example of a perfectly synchronized network with two links and the logarithmic utility function Ψ(x) = log(x), x > 0, shows [155] that Conditions (C.5-2)–(C.5-4) are not sufficient to convert the problem into a convex optimization problem using the logarithmic transformation that, as shown in Sect. 6.3, convexifies the utility-based power control problem with fixed receivers. Simulations further show that the problem is indeed locally solvable if Ψ(x) = log(x), x > 0 (Definition B.40). Nevertheless, some optimization is still possible in the sense that, starting at some point, the algorithm increases the aggregate utility function until convergence. In this section, we first reformulate the joint power and receiver control problem as a "pure" power control problem under optimal adaptive receivers. If perfect synchronization at the output of each receiver is a reasonable assumption, such an equivalent problem formulation may be of interest since then an optimal receiver can be obtained in closed form for any given power vector, as was shown in Sect. 4.3.2. The reader will however realize in Sect. 5.8.2 that an efficient implementation of a gradient projection algorithm for such an equivalent power control problem can be difficult to accomplish in decentralized wireless networks.
As a remedy, we suggest decomposing the overall problem into two coupled sub-problems, namely a utility-based power control problem for fixed receivers and a traditional receiver control problem for given transmit powers; these sub-problems are addressed in an alternating fashion. A similar idea is used in the case of a QoS-based power control under the minimum interference function (see the part of Sect. 5.5.3 about the matrix-based algorithm).

5.8.1 Problem Statement

As in Sect. 4.3.2, we use c_k ∈ C^W, k ∈ K, to represent the kth receiver. The physical meaning of c_k and the constant W ≥ 1 is explained in Sects. 4.3.2 and
4.3.5. The receivers of all links are collected in a matrix C = (c₁, ..., c_K) ∈ C^{W×K}. Without loss of generality, it is assumed that C ∈ C where

C = {U = (u₁, ..., u_K) ∈ C^{W×K} : ∀ k∈K ‖u_k‖₂ = 1}

is the compact set of all W × K complex-valued matrices normalized such that their columns are of unit norm. Hence, each receiver is a vector on the unit sphere in C^W, denoted by S^{W−1}; that is, we have c_k ∈ S^{W−1}, k ∈ K. As C is to be optimized together with the power vector p, the SIR is not only a function of the power vector p ∈ P but also of the receivers C. This can be easily seen from the part of Sect. 4.3.2 where we derived an optimal receiver under the assumption of perfect synchronization.³⁴ The crucial property is, however, that the kth receiver impacts only the kth signal-to-interference ratio. This stands in clear contrast to transmit powers, which in general influence all links in a network. In order to emphasize the dependence of the kth signal-to-interference ratio on the kth receiver, we write SIR_k(p, c_k), which is in accordance with Sect. 4.3.2. From a practical point of view, it is reasonable to make the following assumption:

(C.5-21) R^K₊ × S^{W−1} → R₊ : (p, c_k) ↦ SIR_k(p, c_k) is continuous.
In particular, it follows from Theorem 5.53 and Definition 5.51 that the SIR is continuous for any standard interference function. An immediate consequence of (C.5-21) and Theorem B.11 is the following.

Observation 5.83. If (C.5-21) is satisfied, then the SIR has a maximum on the compact set P × S^{W−1}. Furthermore, for any fixed p ≥ 0, there exists c*_k(p) such that

SIR*_k(p) := SIR_k(p, c*_k(p)) = max_{x∈S^{W−1}} SIR_k(p, x).   (5.158)

As the SIR is a continuous function, reasoning along the same lines as in the proof of Lemma 5.12 proves the following observation.

Observation 5.84. Suppose that (C.5-21) is true and that Ψ : R₊₊ → Q ⊆ R fulfills Conditions (C.5-2)–(C.5-4). Then, for any weight vector w = (w₁, ..., w_K) > 0, the function G : R^K₊₊ × C → R given by

G(p, C) := Σ_{k∈K} w_k Ψ(SIR_k(p, c_k))   (5.159)

attains its maximum on P₊ × C. Furthermore, at the maximum, every power vector is positive.
³⁴ See also Sect. 5.5.2, where the interference may be a function of the receiver design.
By the observation, we can state the joint power and receiver control problem as follows: find a pair (p*, C*) ∈ P × C such that

(p*, C*) = arg max_{p∈P} max_{C∈C} G(p, C).   (5.160)

Now, using (5.158) as well as considering the facts that Ψ is strictly increasing and the kth receiver impacts only the SIR of link k, this problem can be written as a power control problem under optimal adaptive receivers:

max_{p∈P} max_{C∈C} G(p, C) = max_{p∈P} Σ_{k∈K} max_{c_k∈S^{W−1}} w_k Ψ(SIR_k(p, c_k))
  = max_{p∈P} Σ_{k∈K} w_k Ψ(max_{c_k∈S^{W−1}} SIR_k(p, c_k))
  = max_{p∈P} Σ_{k∈K} w_k Ψ(SIR*_k(p))   (5.161)
  = max_{p∈P} G(p, C*(p)) = G(p*, C*)

where SIR*_k(p) is defined by (5.158) and C*(p) = (c*₁(p), ..., c*_K(p)) denotes an optimal receiver matrix for a given p ≥ 0. By Observation 5.83, we have

c*_k(p) = arg max_{c∈S^{W−1}} SIR_k(p, c),   k ∈ K

and c*_k = c*_k(p*). These simple observations imply that (p*, C*) = (p*, C*(p*)) and

p* = arg max_{p∈P} F(p) = arg max_{p∈P} Σ_{k∈K} w_k Ψ(SIR*_k(p))   (5.162)
where F(p) = Σ_{k∈K} w_k Ψ(SIR*_k(p)) is the aggregate utility function similar to that introduced in Sect. 5.2.5. In words, the original problem (5.160) is equivalent to the power control problem (5.162) once the optimal receivers for a given power vector are known explicitly. As a consequence of (5.162), the joint power and receiver control problem could be treated in a similar way as the power control problem with fixed receivers if we knew SIR*_k(p) as a function of p ≥ 0 and if SIR*_k(p) was a continuously differentiable function. Note that since c*_k(p) maximizes the signal-to-interference ratio for a given power vector p, it actually minimizes the interference power at the kth receiver output. Consequently, we have SIR*_k(p) = p_k/I_k(p), k ∈ K, where I_k is the minimum interference function given by (5.76) with U_k = S^{W−1}.

5.8.2 Perfect Synchronization

If time asynchronism of observed signals at the input to each receiver is small relative to the inverse signal bandwidth, then it may be reasonable to assume
perfect synchronization when optimizing the receivers. For instance, a sufficiently good synchronization must be ensured in systems using orthogonal frequency division multiplexing (OFDM) as a part of the air interface. The reason is that, in OFDM-based systems, the subcarrier signal bandwidth is narrow and the symbol duration on each subcarrier is relatively large. Thus, if OFDM is used in combination with multiple antenna systems where transmit/receive beamforming is employed to reduce the interference on each subcarrier, the design of the receive beamforming vectors (on a per-carrier basis) can be based on the assumption of perfect synchronization.³⁵

Under the assumption of perfect synchronization, it follows from Lemma 4.11 that

SIR*_k(p) = p_k b_k^H Z_k^{−1}(p) b_k,   k ∈ K   (5.163)

where the link-specific positive definite matrix Z_k(p) is defined by (4.14) and where b_k^{(l)} ∈ C^W, with b_k^{(l)} ≠ 0 and b_k := b_k^{(k)}, is the effective transmit vector of transmitter l associated with receiver k. The effective transmit vectors are assumed to be arbitrary but fixed. The notion of effective transmit vectors is explained in Sect. 4.3.2. See also the examples in Sect. 4.3.5. As a result, in the case of perfect synchronization, SIR*_k is known explicitly so that the problem (5.162) can be written as

p* = arg max_{p∈P} Σ_{k∈K} w_k Ψ(p_k b_k^H Z_k^{−1}(p) b_k).   (5.164)

Furthermore, it follows from Lemma 4.11 that an optimal receiver matrix is

C* = (a₁ Z₁^{−1}(p*) b₁, ..., a_K Z_K^{−1}(p*) b_K) ∈ C   (5.165)

with constants a₁, ..., a_K > 0 chosen such that c*_k = a_k Z_k^{−1}(p*) b_k satisfies ‖c*_k‖₂ = 1 for each k ∈ K. Because p* > 0, we can focus on P₊.

By Lemma 4.11, we know that c*_k(p) is unique for every p > 0 and Z_k(p) is positive definite for all p > 0. As a consequence of this and the fact that each entry of Z_k(p) is a continuously differentiable function of p, all partial derivatives of SIR*_k(p) given by (5.163) exist and are continuous functions on R^K₊₊. This in turn is because the inverse matrix Z_k^{−1}(p) exists for all p > 0, regardless of the choice of effective transmit vectors, and the entries in Z_k^{−1}(p) vary continuously with the entries in Z_k(p). We can thus apply the following identity³⁶ to compute the kth partial derivative ∇_k F(p) := (∇F(p))_k = ∂F(p)/∂p_k of the aggregate utility function F(p):

dA^{−1}(x)/dx = −A^{−1}(x) (dA(x)/dx) A^{−1}(x).

³⁵ Recall from Sect. 4.3.5 that the receivers can in particular stand for receive beamforming vectors.
³⁶ Note that the identity holds for any invertible and differentiable matrix function A(x), x ∈ R.
Indeed, a straightforward application of this identity to compute the partial derivatives yields (for each k ∈ K)

∇_k F(p) = w_k Ψ'(SIR*_k(p)) g_{k,k}(p) − Σ_{l∈K_k} w_l p_l Ψ'(SIR*_l(p)) |g_{l,k}(p)|²   (5.166)

where g_{l,k}(p) = b_l^H Z_l^{−1}(p) b_k ∈ C for (k, l) ∈ K × K. Note that g_{k,k}(p) > 0 for any p ≥ 0 and b_k ∈ C^W, b_k ≠ 0. The inverse function I_k(p) = 1/g_{k,k}(p), p ≥ 0, is the minimum standard interference function defined by (5.75). Equation (5.166) reveals that a distributed computation of the partial derivatives of the aggregate utility function F(p) may be difficult and extremely costly in terms of wireless resources. The main problem appears to be how the lth transmitter or the lth receiver should compute or estimate |b_l^H Z_l^{−1}(p) b_k|² without excessive overhead signaling between the nodes. Note that this quantity depends on all effective transmit vectors associated with receiver k through the matrix Z_l^{−1}(p), where Z_l(p) is defined by (4.14). Thus, methods for a direct computation of p* and, based on this, c*_k = c*_k(p*) are unlikely to be amenable to distributed implementation.
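To illustrate the closed-form receiver numerically, the sketch below is ours and does not reproduce the exact definition (4.14); it builds a generic interference-plus-noise matrix of the assumed MMSE form (noise term σ²I plus weighted outer products of hypothetical effective transmit vectors) and checks that c*_k ∝ Z_k^{−1}(p) b_k attains the closed-form SIR p_k b_k^H Z_k^{−1}(p) b_k and beats randomly drawn unit-norm receivers.

```python
import numpy as np

rng = np.random.default_rng(0)
W, K = 4, 3                      # receiver length and number of links (toy values)
sigma2 = 0.1                     # assumed noise variance
p = rng.uniform(0.5, 2.0, K)     # fixed positive transmit powers
# b[k, l]: hypothetical effective transmit vector of transmitter l at receiver k
b = rng.standard_normal((K, K, W)) + 1j * rng.standard_normal((K, K, W))

def Z(k):
    # interference-plus-noise matrix at receiver k (assumed generic MMSE form)
    M = sigma2 * np.eye(W, dtype=complex)
    for l in range(K):
        if l != k:
            M += p[l] * np.outer(b[k, l], b[k, l].conj())
    return M

def sir(k, c):
    # SIR_k(p, c) achieved by a unit-norm receiver c
    return p[k] * abs(np.vdot(c, b[k, k])) ** 2 / np.vdot(c, Z(k) @ c).real

k = 0
c_star = np.linalg.solve(Z(k), b[k, k])      # c*_k proportional to Z_k^{-1}(p) b_k
c_star /= np.linalg.norm(c_star)             # normalized so that ||c*_k||_2 = 1
closed_form = (p[k] * np.vdot(b[k, k], np.linalg.solve(Z(k), b[k, k]))).real

best_random = 0.0                            # compare against random unit-norm receivers
for _ in range(300):
    v = rng.standard_normal(W) + 1j * rng.standard_normal(W)
    best_random = max(best_random, sir(k, v / np.linalg.norm(v)))
```

The comparison reflects the generalized Rayleigh-quotient argument: no unit-norm receiver can exceed the SIR of the scaled Z_k^{−1}(p) b_k solution.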
5.8.3 Decentralized Alternating Computation

Utility-based algorithms for a joint optimization of transmit powers and receivers can be made more suitable for being implemented in decentralized networks if we alternate the computation of the transmit powers with the computation of the receivers. The basic principle is to increase the value of the function G(p, C) defined by (5.159) in an alternating fashion. More precisely, each iteration, say iteration n ∈ N₀, consists of the following two sub-steps:

(i) Power control step: For some given receiver matrix C(n), a power vector p(n + 1) is computed such that G(p(n), C(n)) ≤ G(p(n + 1), C(n)).
(ii) Receiver control step: Given p(n + 1), the receiver matrix is updated to C(n + 1) such that G(p(n + 1), C(n)) ≤ G(p(n + 1), C(n + 1)).

The process is repeated until convergence. Note that the idea is essentially the same as in the case of the matrix-based QoS-based power control algorithm presented in Sect. 5.5.3 (Algorithm 5.2). The only difference lies in the power control step. The algorithm converges since, by Observation 5.84, G has a maximum on P₊ × C and we have G(p(n), C(n)) ≤ G(p(n + 1), C(n)) ≤ G(p(n + 1), C(n + 1)) for every n ∈ N₀. In other words, the algorithm generates a non-decreasing sequence {G(p(n), C(n))}_{n∈N₀}, which is bounded above by G(p*, C*). Consequently, the algorithm must converge to some limit, called a point of attraction of the algorithm. It is however important to emphasize that this point does not necessarily correspond to a global maximum. In fact, the above arguments do not
even show that the point of attraction of the algorithm is a local maximizer of the problem (5.160). A popular method for proving the convergence to a local maximum is the verification of the Second-Order Sufficiency Conditions (Definition B.55). Both sub-steps (i) and (ii) can be implemented in a distributed manner. As far as the sub-step (i) is concerned, any of the power control algorithms presented in Chap. 6 can be applied to obtain a new power update for a given set of receivers. Since, in practice, only a finite and often relatively small number of power control iterations can be performed, monotonicity of the power control algorithm is desired to ensure that an improvement is achieved after the algorithm is stopped. Distributed algorithms for computing optimal receivers in the step (ii) are widely established. These algorithms are based either on blind or pilot-based methods for receiver estimation. The reader is referred to [97, 156] for more information and to [97, pp. 326–327] for an extensive list of references. In the case of a pilot-based estimation scheme, in addition to a sequence of data symbols, each transmitter sends a known and link-specific pilot sequence, which is typically a pseudo-noise deterministic sequence with good autocorrelation properties (see for instance [94, 157, 158]). The length of all pilot sequences depends on the number of participating links. In contrast, blind estimation methods estimate the optimal receivers during the data transmission and can be used instead of, or in addition to, a pilot-based estimation scheme. A practical scheme could work as follows: At the beginning of every frame, the power control step is executed only once.37 Then, the resulting transmit powers are used for data transmission. During this time, the receivers are updated online after each transmitted symbol using a blind estimation method. If necessary, a pilot-based estimation method can be used for fine tuning subsequently in each frame. 
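The two-sub-step iteration can be sketched numerically. The toy code below is an illustration under our own assumptions, not the book's algorithm: a generic MMSE-form interference covariance, the logarithmic utility, and a backtracking gradient step standing in for the power control algorithms of Chap. 6. Its only purpose is to exhibit the monotonicity argument: the sequence G(p(n), C(n)) never decreases.

```python
import numpy as np

rng = np.random.default_rng(1)
W, K = 4, 3          # receiver length and number of links (toy values)
sigma2 = 0.1         # assumed noise variance
Pmax = 2.0           # per-link power limit defining the box P
w = np.ones(K)       # utility weights
# b[k, l]: hypothetical effective transmit vector of transmitter l at receiver k
b = rng.standard_normal((K, K, W)) + 1j * rng.standard_normal((K, K, W))

def Z(k, p):
    # interference-plus-noise covariance at receiver k (assumed MMSE form)
    M = sigma2 * np.eye(W, dtype=complex)
    for l in range(K):
        if l != k:
            M += p[l] * np.outer(b[k, l], b[k, l].conj())
    return M

def sir(k, p, c):
    return p[k] * abs(np.vdot(c, b[k, k])) ** 2 / np.vdot(c, Z(k, p) @ c).real

def G(p, C):
    # aggregate utility G(p, C) with Psi(x) = log(x)
    return sum(w[k] * np.log(sir(k, p, C[k])) for k in range(K))

def receiver_step(p):
    # step (ii): per-link optimal receivers c_k ~ Z_k(p)^{-1} b_k, unit norm
    C = []
    for k in range(K):
        c = np.linalg.solve(Z(k, p), b[k, k])
        C.append(c / np.linalg.norm(c))
    return C

def power_step(p, C, step=0.5, eps=1e-6):
    # step (i): one projected gradient-ascent step with backtracking, so that
    # G never decreases even when the iteration is stopped early
    g = np.array([(G(p + eps * np.eye(K)[k], C) - G(p, C)) / eps for k in range(K)])
    while step > 1e-9:
        q = np.clip(p + step * g, 1e-4, Pmax)   # projection onto the power box
        if G(q, C) >= G(p, C):
            return q
        step /= 2.0
    return p

p = np.full(K, 1.0)
C = receiver_step(p)
history = [G(p, C)]
for n in range(15):
    p = power_step(p, C)        # (i) power control step
    C = receiver_step(p)        # (ii) receiver control step
    history.append(G(p, C))
```

As in the text, monotone boundedness only guarantees convergence of the utility values, not that the limit is a global (or even local) maximizer.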
In any case, stochastic algorithms should be applied due to noisy measurements/estimations (see Sect. 6.6.5). Numerical experiments suggest that the scheme should not rely exclusively on blind methods to estimate the optimal receivers with a sufficient accuracy. The reader can find an example of convergence behavior in [155].

5.8.4 Max-Min SIR Balancing

Let us further consider the alternating algorithm presented in the preceding section. Suppose that for every n ∈ N₀, the power vector p(n + 1), which is computed in the nth power control step (i) for a given receiver matrix C(n), is equal to the max-min SIR-balanced power vector defined by (5.100) on page 191. Furthermore, assume that the resulting gain matrix is irreducible for every C(n), n ∈ N₀, which is a reasonable assumption in practice. It may then be shown using the results of Sect. 5.6.2 and [159] that the joint power
³⁷ Alternatively, the steps (i) and (ii) could be repeated one or several times, depending on the available resources and coherence time of the wireless channel.
and receiver control algorithm converges to a max-min SIR-balanced solution over the joint space of admissible power vectors and receivers. More precisely, it converges to a global solution of the max-min SIR balancing problem over P × C:

max_{(p,C)∈P×C} min_{k∈K} SIR_k(p, c_k)/γ_k .

The algorithm can be easily implemented in centralized networks since, as shown in Theorem 5.73, the max-min SIR-balanced power vector under fixed receivers is equal to an appropriately normalized positive right eigenvector of a certain nonnegative irreducible matrix. One can alternatively obtain this vector via utility-based power control, as shown in Sect. 5.6.4 for the special case of equal values γ₁ = · · · = γ_K > 0. The main problem thus remains an efficient distributed computation of the max-min SIR-balanced power vector. Here, the saddle point characterization pointed out in Sect. 5.6.4 can be helpful to solve this problem.
5.9 Additional Results for a Noiseless Case

In this section, we are going to apply the results of Sects. 1.2.4 and 1.2.5 as well as of Sect. 1.7 to gain further insight into the trade-off between efficiency and fairness (Sect. 5.9.1). Moreover, in Sect. 5.9.2, we utilize the results of Sect. 1.6 to study the problem of existence and uniqueness of a log-SIR fair power allocation. In doing so, the interference function of each link is assumed to be a (noiseless) linear interference function:
(C.5-22) I_k(p) = Σ_{l∈K} v_{k,l} p_l and SIR_k(p) = SIR⁰_k(p) for each k ∈ K, where SIR⁰_k(p) is defined by (4.20).

Thus, each SIR has the ray property (4.21): SIR_k(p) = SIR_k(cp) for any p > 0 and c > 0. It is interesting to notice that the noiseless interference functions of the form given by (C.5-22) do not satisfy axioms A1–A3 of Definition 5.51, and thus the interference functions considered in this section are not standard interference functions. In fact, they are interference functions in the sense of Definition 5.56, which provides an alternative axiomatic framework for interference functions. For a more detailed discussion, the reader is referred to Sect. 5.5.2. It is also important to emphasize that the neglect of the background noise does not necessarily mean that, in practical systems, the channel must be noiseless in order to apply the results. In fact, the background noise is always present but, when designing power control algorithms, the noise might be neglected for the following two reasons:

1.
The multiple access interference is dominant in the sense that z_k ≪ Σ_{l∈K} v_{k,l} p_l for each k ∈ K and for all p > 0 in some neighborhood of an optimal power allocation. In such a case, the background noise can be omitted as it has a negligible impact on optimal transmit powers.
2. The noise variance is not known, so that it might be better to neglect the noise and to allocate transmit powers as if the channel were noiseless.
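The ray property in (C.5-22) is easy to check numerically. The minimal sketch below uses an arbitrary hypothetical gain matrix; scaling all transmit powers by a common factor c > 0 leaves every noiseless SIR unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 4
V = rng.uniform(0.1, 1.0, (K, K))   # hypothetical gain matrix with positive entries
np.fill_diagonal(V, 0.0)            # no self-interference on the diagonal
p = rng.uniform(0.5, 2.0, K)        # arbitrary positive power vector

def sir(q):
    # noiseless SIRs: SIR_k(q) = q_k / sum_l v_{k,l} q_l
    return q / (V @ q)

ray_ok = np.allclose(sir(p), sir(7.5 * p))   # SIR_k(cp) = SIR_k(p) for c = 7.5
```

This scale invariance is exactly what fails once a positive noise term z_k is added to the denominator.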
The noiseless assumption is widely used in the literature [49, 50, 51, 53, 52, 57, 58, 61]. An overview can be found in [64, 60]. For the analysis, an important observation is that the existence of noise in our model is a key ingredient in the proof of Lemma 5.12. As a consequence, the statement of this lemma is not necessarily true in the case of noiseless channels, and some additional constraints on the gain matrix V ≥ 0 are necessary to ensure that the aggregate utility function attains its maximum. Also, Observation 5.47 regarding the continuity of the minimum signal-to-interference ratio on the set of nonnegative power vectors is true in general if the noise variance is positive. Throughout this section, we assume the following.

(C.5-23) V is a member of N⁺_K defined by (1.88) and is chosen such that the supremum of the aggregate utility function over R^K₊₊ with z = 0 is bounded (see (5.167) in the following subsection for a restatement of the problem).
Notice that if V ∈ N⁺_K ⊂ N_K, then one has Σ_{l∈K} v_{k,l} p_l > 0 for each k ∈ K and any p > 0. This ensures that the signal-to-interference ratios are well-defined despite the lack of noise. However, the reader should not think that (C.5-23) is sufficient to ensure that some objective functions considered in this section attain their maxima or minima. For any given V ≥ 0, the sets R_K(V) ⊂ R^K₊, L_K(V) ⊂ R^K and E_K(V) ⊂ R^K₊ are defined by (1.86) and are all nonempty sets. Moreover, recall that R⁺_K(V) = R_K(V) ∩ R^K₊₊, L⁺_K(V) = L_K(V) ∩ R^K₊₊ and E⁺_K(V) = E_K(V) ∩ R^K₊₊. In contrast, these sets are empty for some V ∈ N⁺_K. See also the discussion in Sect. 1.7 and App. A.4.

5.9.1 The Efficiency–Fairness Trade-off

In Sect. 5.1.1, we described the fundamental efficiency–fairness trade-off in wired communications networks. It was pointed out that throughput-optimal policies may lead to significant rate deviations among competing flows. It is even possible that some flows are denied access to the links under throughput-optimal policies. For instance, the example in Fig. 5.1 shows a simple scenario where the longer flow is allocated zero source rate under a throughput-optimal rate allocation. Because this is in general not tolerable, the network designers are forced to address the issue of fairness. The most common understanding of fairness is max-min fairness, where, roughly speaking, the aim is to make the QoS performance measures of interest of all links as equal as possible in the sense that no link can improve its QoS performance without deteriorating the performance of any other link that has the same or worse QoS performance. If the QoS performance measure of interest is the data rate, which, without loss of generality, is assumed throughout this section, then max-min fairness is achieved under the max-min fair rate allocation given by Definition 5.16.
On the other hand, however, it is quite intuitive that, for instance, an allocation of equal data rates to all links is suboptimal in terms of
throughput or some other (suitably chosen) aggregate utility function. Since the value of the utility function can be identified as some measure of overall efficiency of the network, the outlined trade-off situation can be referred to as the efficiency–fairness trade-off. The discussion below aims at highlighting the potential incompatibility between efficiency and fairness issues. In fact, in the case of wired networks, this trade-off depends on the network topology. Simple examples show that there exist wired network topologies where an "ideal" combination of fairness and efficiency is possible [1]. The constraints of wired network topology together with some fixed link capacities are, in the case of wireless networks, replaced by some constraints on transmit powers and the structure of the gain matrix V ≥ 0, which describes the crosstalk between different links (see Sect. 4.3.1 for exact definitions). Under the assumptions of Sect. 5.2.4 (no link scheduling and an established network topology) and (C.5-22), the utility maximization problem in wireless networks becomes a power control problem of the form (see App. B.1 for the definition of "arg sup" and "arg inf")

p* := arg sup_{p∈P} Σ_{k∈K} w_k Ψ(SIR_k(p)) = arg sup_{p∈R^K₊₊} Σ_{k∈K} w_k Ψ(SIR_k(p))   (5.167)

for some given weight vector w ≥ 0, w ≠ 0, where the function Ψ : R₊₊ → Q ⊆ R is assumed to satisfy (C.5-2)–(C.5-4). The vector p* ≥ 0 is referred to as a (w, Ψ)-fair power allocation (see Sect. 5.2.5). Notice that, in contrast to the problem formulation in Sect. 5.2.5, the weight vector in (5.167) may contain zero entries. Also, p* as defined in (5.167) may have zero components. Under a (w, Ψ)-fair power allocation given by (5.167), the data rate of link k is ν_k(p*) = Φ(SIR_k(p*)) (see Sect. 4.3.4 for the definition of the rate function Φ : R₊ → R₊) and, of course, the data rates are in general different for different links. This raises the following interesting question: Defining a max-min SIR power vector (allocation) p̄ as

p̄ := arg max_{p∈P} min_{k∈K} SIR_k(p) = arg max_{p∈R^K₊₊} min_{k∈K} SIR_k(p)   (5.168)
under which conditions (if at all) is a (w, Ψ)-fair power allocation p* a max-min SIR power allocation? In other words, we are asking whether there exists a power vector p that is both (w, Ψ)-fair and max-min SIR in the sense of (5.168). Unless otherwise stated, it is assumed without loss of generality that

(C.5-24) w ∈ Π_K, p̄ ∈ Π_K, p* ∈ Π_K, which means ‖w‖₁ = ‖p̄‖₁ = ‖p*‖₁ = 1.

Remark 5.85. Notice that the identities in (5.167) and (5.168) are immediate consequences of (C.5-22), which is assumed to hold throughout this section. In (5.168), we assumed that min_{k∈K} SIR_k(p) attains its maximum on the set of positive power vectors R^K₊₊. From App. A.4.3, we know that this does not need to be true for general nonnegative matrices, as in this case a max-min
SIR power allocation may not exist. The restriction to positive power vectors is reasonable in light of the requirement that a max-min SIR power allocation should be fair in the sense that no link is denied access to wireless resources. See also the discussion before Observation 5.88.

Remark 5.86. We point out that the definition of a max-min SIR power allocation in (5.168) is a "noiseless" version of the notion of max-min SIR-balanced power allocation considered in Sect. 5.6 with Γ = diag(γ₁, ..., γ_K) = I (see also Definitions 5.61 and 5.62 as well as the next remark). As mentioned before, however, the lack of the background noise will require some additional assumptions on the gain matrix (in addition to (C.5-23)) to ensure that a max-min SIR power allocation (5.168) exists. In contrast, a max-min SIR-balanced power vector of Definition 5.61 exists regardless of the choice of the gain matrix.

Remark 5.87. Note that all the results presented in this section straightforwardly extend to the max-min SIR balancing power control problem, in which case the objective is to maximize min_{k∈K}(SIR_k(p)/γ_k) for some positive definite matrix Γ (see also Definition 5.61). All we need to do is to consider the matrix ΓV instead of V. Because all diagonal elements of Γ are positive, ΓV and V have the same properties which are relevant for the theory presented here. In particular, V has a positive right (respectively, left) eigenvector associated with ρ(V) ∈ σ(V) if and only if ΓV has a positive right (respectively, left) eigenvector associated with ρ(ΓV) ∈ σ(ΓV).

If V is irreducible, the existence of the maximum in (5.168) immediately follows from the Collatz–Wielandt formula (Theorem A.35). Furthermore, it follows from this theorem that the maximum is attained at p̄ ∈ R^K₊₊ if and only if p̄ is a positive right eigenvector of V associated with ρ(V).
Thus, any only if p positive right eigenvector of an irreducible gain matrix V, which is unique up to positive scaling, is a max-min SIR power allocation. Finally, in such a case, we have p) = . . . = SIRK (¯ p) = 1/ρ(V) < +∞ (5.169) 0 < SIR1 (¯ ¯ > 0 is a right eigenvector of which immediately follows from the fact that p V associated with ρ(V) > 0. Obviously, (5.169) implies that ν1 (¯ p) = . . . = p). νK (¯ It is worth pointing out that irreducibility of V can be explained by means of the directed graph G(V) associated with V. Indeed, it is well known (see App. A.4.1) that V is irreducible if and only if G(V) is strongly connected. In terms of interference, we can interpret strong connectivity of G(V) as a kind of interference coupling spanning the entire network. Therefore, when V is irreducible, a network is said to be entirely coupled . In the case of general, not necessarily irreducible gain matrix V, the problem of determining a max-min SIR power allocation is much more subtle. The weak form of the Perron–Frobenius theorem (Theorem A.39) ensures
5 Resource Allocation Problem in Communications Networks
merely that, for any V ≥ 0, there exists a nonnegative right eigenvector of V associated with ρ(V). Thus, the gain matrix may have no positive right eigenvector associated with the spectral radius, which is equivalent to writing that R^+_K(V) = ∅. The existence of such a positive eigenvector, however, is necessary to obtain the fairness condition (5.169) with all links being active. Indeed, if R^+_K(V) is an empty set, then the state of perfectly balanced signal-to-interference ratios being equal to 1/ρ(V) = sup_{p∈R^K_{++}} min_{k∈K} SIR_k(p) can only be approached by a sequence of positive power vectors that converges to a vector in R_K(V), which is the set of all nonnegative right eigenvectors of V associated with ρ(V). Although the SIR values SIR_k(p̄), k ∈ K, with p̄ ∈ R_K(V), are positive and finite, some of them in fact represent the limits of ratios whose numerators and denominators tend to zero. For practical power control algorithms, this may mean that some links must be deactivated in order to be sufficiently close to the desired balanced state (5.169). Thus, if R^+_K(V) = ∅, we say that a max-min SIR power allocation does not exist because the fairness condition (5.169) cannot be achieved with all links being active simultaneously. Obviously, whenever R^+_K(V) ≠ ∅, as in the particular case of irreducibility of V, no such difficulty is encountered and any member of R^+_K(V) is a max-min SIR power allocation satisfying (5.169). The above discussion is summarized in the following observation.

Observation 5.88. The maximum in (5.168) exists if and only if there is a positive right eigenvector of V associated with ρ(V), which is equivalent to R^+_K(V) ≠ ∅. Thus, irreducibility of V is sufficient but not necessary for the maximum in (5.168) to exist. The irreducibility property, however, ensures that there is a unique maximizer (up to positive multiples) in (5.168). General nonnegative matrices may lack the uniqueness property.
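The two regimes distinguished by Observation 5.88 can be reproduced numerically. The following sketch (NumPy; both gain matrices are hypothetical examples, not taken from the book) first computes the right Perron eigenvector of an irreducible V and checks that it balances all noiseless SIRs at 1/ρ(V) as in (5.169), and then exhibits a reducible V whose only Perron eigenvector is (0, 1), so that R^+_K(V) = ∅ and the balanced state can only be approached by letting one power vanish:

```python
import numpy as np

# Case 1: a hypothetical irreducible gain matrix (zero diagonal,
# positive off-diagonal entries, hence strongly connected G(V)).
V1 = np.array([[0.0, 0.5, 0.2],
               [0.3, 0.0, 0.4],
               [0.1, 0.6, 0.0]])
lam, R = np.linalg.eig(V1)
i = np.argmax(lam.real)                 # Perron root = eigenvalue of largest real part
rho1 = lam.real[i]
p_bar = np.abs(R[:, i].real)            # positive right Perron eigenvector
sir1 = p_bar / (V1 @ p_bar)             # SIR_k(p) = p_k / (Vp)_k in the noiseless case
assert np.allclose(sir1, 1.0 / rho1)    # all SIRs balanced at 1/rho(V), cf. (5.169)

# Case 2: a hypothetical reducible gain matrix with rho(V2) = 1 whose only
# Perron eigenvector is (0, 1), i.e. R_K^+(V2) is empty.
V2 = np.array([[1.0, 0.0],
               [1.0, 1.0]])
for t in [1.0, 1e-2, 1e-4, 1e-6]:
    p = np.array([t, 1.0])              # positive power vectors with p1 -> 0
    sir2 = p / (V2 @ p)
    print(t, sir2.min())                # min SIR creeps up towards 1/rho(V2) = 1 ...
assert sir2.min() < 1.0                 # ... but is never attained for p > 0
```

In Case 2, link 1 must effectively be deactivated (p₁ → 0) to approach the balanced state, which is exactly the situation described in the text.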
Efficiency of a Max-Min SIR Power Allocation

In order to conform with the results presented in Sects. 1.2.4–1.2.5 and 1.7, we rewrite the utility-based power control problem (5.167) as an equivalent minimization problem

p* = arg inf_{p∈R^K_{++}} Σ_{k∈K} w_k ψ(SIR_k(p)) = arg inf_{p∈R^K_{++}} Σ_{k∈K} w_k φ(1/SIR_k(p)) = arg inf_{p∈R^K_{++}} G(p, w)   (5.170)

where φ(x) = ψ(1/x) = −Ψ(1/x) and G : R^K_{++} × R^K_+ → R is defined to be

G(p, w) := Σ_{k∈K} w_k φ(1/SIR_k(p)) = Σ_{k∈K} w_k φ((Vp)_k/p_k).   (5.171)
Now since (C.5-2)–(C.5-4) hold and the infimum is finite (by (C.5-23)), it is clear by Sect. 6.3 that φ belongs to the function class G(V) (Definition 1.27 with the modifications in (C.1-3)). Indeed, based on the results of Sect. 6.3, it may be seen that under (C.5-2)–(C.5-4) the problem can be converted into an equivalent convex optimization problem. This in turn implies that every local infimum is a global one.

The problem falls into the theoretical framework presented in Sects. 1.2.4–1.2.5, 1.7 and A.4.3. For the purposes of the following discussion, the most important results are provided by Theorems 1.71, A.43, A.45 and 1.77. These theorems are extensions of the results of Sects. 1.2.4–1.2.5 to more general nonnegative matrices that belong to the set N^+_K (Condition (C.5-23)). By Theorem 1.71, we know that if w ∈ E_K(V) ⊂ Π_K, then^38

Σ_{k∈K} w_k φ((Vp)_k/p_k) ≥ φ(ρ(V))

for all p ∈ R^K_{++}. Moreover, we have

inf_{p∈R^K_{++}} Σ_{k∈K} w_k φ((Vp)_k/p_k) = φ(ρ(V))
if and only if w ∈ E_K(V). This implies that if w happens to be equal to the Hadamard product of nonnegative left and right eigenvectors of V associated with ρ(V), then p* given by (5.170) is a nonnegative right eigenvector of V associated with ρ(V), that is, p* ∈ R_K(V). At the same time, Theorem A.35 and strict monotonicity of φ imply that, for any p ∈ R_K(V),

max_{k∈K} φ((Vp)_k/p_k) = φ(max_{k∈K} (Vp)_k/p_k) = φ(inf_{s∈R^K_{++}} max_{k∈K} (Vs)_k/s_k) = φ(ρ(V)).

So it follows that whenever w ∈ E_K(V) and R^+_K(V) ≠ ∅, then any power vector p ∈ R^+_K(V) is both a (w, Ψ)-fair power allocation and a max-min SIR power allocation. Moreover, if w ∉ E_K(V), then p ∈ R^+_K(V) is suboptimal in terms of the utility-based power control problem (5.167) or, equivalently, (5.170). This means that we can always find some power allocation s = s(w) ∈ R^K_{++}, s ∉ R^+_K(V), such that

Σ_{k∈K} w_k φ((Vp)_k/p_k) > Σ_{k∈K} w_k φ((Vs)_k/s_k),   w ∉ E_K(V), p ∈ R^+_K(V).
The preceding discussion prompts us to conclude the following.

^38 Note that in contrast to the notation used here, the first part of the book uses p to denote a nonnegative right eigenvector of X ∈ X_K associated with ρ(X). In this part of the book and, in particular, in this section, p is at first used to denote a power vector that may be equal to a nonnegative right eigenvector of V ≥ 0.
Observation 5.89. Given any V ∈ N^+_K, suppose that the weight vector w ≥ 0 in (5.170) is a member of E_K(V). If there exists p ∈ R^+_K(V), then p* = c₁p = c₂p̄ for some constants c₁ > 0 and c₂ > 0.

In the particular case of an irreducible gain matrix V, the existence of unique eigenvectors (up to a positive scaling factor) is ensured by the Perron–Frobenius theorem (Theorem A.32). Thus, in this case, the (normalized) weight vector w ∈ Π_K for which a max-min SIR power allocation is (w, Ψ)-fair (and vice versa) is positive and unique.

Existence of a Max-Min SIR Power Allocation and a Positive Weight Vector

In the light of Observations 5.88 and 5.89, one is interested in characterizing the entire class of interference scenarios in networks, including entirely coupled networks, for which a max-min SIR power allocation exists. In the context of Sect. 1.7.3, this is equivalent to characterizing the set of matrices B_K (Definition 1.73). Such a characterization is possible using a normal form of the gain matrix (Definition A.41) as well as the notions of maximal and isolated diagonal blocks (Definition A.42). A normal form of a gain matrix at hand corresponds to a network with links k ∈ K labeled in such a manner that the gain matrix has the following block lower triangular form

V = ( V^(1)     0        · · ·   0
      V^(2,1)   V^(2)    · · ·   0
      · · ·     · · ·    · · ·   · · ·
      V^(S,1)   V^(S,2)  · · ·   V^(S) )   (5.172)

where, without loss of generality, the diagonal blocks V^(s), 1 ≤ s ≤ S, are assumed to be irreducible matrices of not necessarily the same dimensions. A result of such a labeling is that the links are grouped into a number S ≥ 1 of (disjoint) link subsets of K, which can be seen as forming (disjoint) entirely coupled subnetworks with subnetwork gain matrices V^(s), 1 ≤ s ≤ S. Obviously, in the case of an irreducible gain matrix V, one has S = 1 and V^(1) = V (entirely coupled network).
Let us refer to the (entirely coupled) subnetworks extracted by the above labeling by their indices s, 1 ≤ s ≤ S. Note that the nondiagonal blocks in a normal form, which are not necessarily square matrices, determine the interference between different subnetworks. If a diagonal block V^(s) in the normal form (5.172) is isolated in the sense of Definition A.42, then the sth subnetwork does not perceive any interference from other subnetworks, and thus it can be referred to as being interference-isolated. Accordingly, block-irreducibility in the sense of Definition A.44 means that a network consists solely of a number of interference-isolated subnetworks.

Let us group all interference-isolated and entirely coupled subnetworks in the index set I ⊆ {1, . . . , S}. Further, let M ⊆ {1, . . . , S} denote the set of
all those entirely coupled subnetworks (not necessarily interference-isolated) whose subnetwork gain matrices, corresponding to some diagonal block in the normal form, are maximal in the sense of Definition A.42. Now, using Observation 5.88, we can paraphrase Theorem A.43, proved originally in [6], to characterize those interference scenarios for which a max-min SIR power allocation exists.

Observation 5.90. A max-min SIR power allocation exists if and only if I = M.

According to this observation, a max-min SIR power allocation exists for any network with gain matrix V if and only if the following two conditions are fulfilled.

(C.5-25) Any interference-isolated and entirely coupled subnetwork s ∈ I is also maximal so that ρ(V^(s)) = ρ(V).
(C.5-26) Any entirely coupled subnetwork s which perceives nonzero interference from another entirely coupled subnetwork (in the sense that some link receiver in subnetwork s perceives interference from some link transmitter not in s) is not maximal so that ρ(V^(s)) < ρ(V).

For any interference scenario violating either (C.5-25) or (C.5-26), no max-min SIR power allocation exists. Note that the fulfillment of the conditions (C.5-25) and (C.5-26), and thus the existence of a max-min SIR power allocation, is solely a feature of the gain matrix V and cannot be influenced by system parameters such as power constraints.

Combining Observation 5.89 with Observation 5.90 shows that Conditions (C.5-25)–(C.5-26) and w ∈ E_K(V), where w is the weight vector in the utility maximization problem (5.170), are necessary and sufficient conditions for any member of R_K(V) to be both a max-min SIR power allocation and a (w, Ψ)-fair power allocation. However, even if the interference coupling satisfies (C.5-25) and (C.5-26), there is no guarantee that E^+_K(V) ≠ ∅ because there may be no positive left eigenvector of V associated with ρ(V), that is, L^+_K(V) may be an empty set.
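Conditions (C.5-25) and (C.5-26) are easy to check mechanically once a normal form is given. The sketch below (hypothetical values; NumPy assumed) assembles a gain matrix in normal form (5.172) with two irreducible 2×2 diagonal blocks, computes the index sets I (isolated blocks) and M (maximal blocks) as Boolean lists, and tests I = M:

```python
import numpy as np

# Hypothetical normal form with S = 2 entirely coupled subnetworks:
# block 1 is interference-isolated with rho = 1 (maximal);
# block 2 hears interference from block 1 and has rho = 0.5 (not maximal).
blocks = [np.array([[0.0, 1.0], [1.0, 0.0]]),      # V^(1), rho = 1
          np.array([[0.0, 0.5], [0.5, 0.0]])]      # V^(2), rho = 0.5
coupling = np.array([[0.1, 0.0], [0.0, 0.1]])      # V^(2,1): inter-subnetwork gains
V = np.block([[blocks[0], np.zeros((2, 2))],
              [coupling,  blocks[1]]])

rho_V = np.max(np.abs(np.linalg.eigvals(V)))       # = 1 here
isolated = [True, not np.any(coupling)]            # I: blocks hearing no other block
maximal = [bool(np.isclose(np.max(np.abs(np.linalg.eigvals(B))), rho_V))
           for B in blocks]                        # M: rho(V^(s)) = rho(V)
# Observation 5.90: a max-min SIR power allocation exists iff I = M.
print(isolated, maximal, isolated == maximal)
```

In this example I = M = {1}, so both (C.5-25) and (C.5-26) hold and a max-min SIR power allocation exists; raising ρ(V^(2)) to 1 while keeping the coupling would violate (C.5-26).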
In the context of the utility-based power control problem (5.170), non-strict positivity of the weight vector w would mean the allocation of nonzero transmit power to links with zero weights, which is a waste of resources and does not make much sense. For this reason, we are interested in the characterization of interference scenarios for which both positive left and right eigenvectors associated with ρ(V) can be constructed for a given gain matrix V. In terms of Sect. 1.7.3, this is equivalent to characterizing the set of gain matrices B̄_K (Definition 1.74). Such a characterization is provided by Theorem A.45, which we can paraphrase using Observation 5.88 as follows.

Observation 5.91. Consider the normal form (5.172). Then, E^+_K(V) ≠ ∅ if and only if I = M = {1, . . . , S}, which is equivalent to saying that V is block-irreducible (Definition A.44).
In other words, under the following condition, there is a power allocation that is both a max-min SIR power allocation and a (w, Ψ)-fair power allocation with w ∈ R^K_{++}.

(C.5-27) All entirely coupled subnetworks are both interference-isolated and maximal.

For any network violating (C.5-27), E^+_K(V) is an empty set so that there is no pair of positive left and right eigenvectors of V associated with ρ(V).

The Efficiency–Fairness Trade-off as a Saddle Point

In this section, we assume that the weight vector w ∈ Π_K in (5.170) is a variable and study the impact of this vector on a solution to the utility-based power control problem (5.170). The main ingredients in the study are provided by Theorem 1.77, which can be reformulated as follows.

Observation 5.92. For any V ∈ N^+_K, we have

inf_{s∈R^K_{++}} max_{w∈Π_K} G(s, w) = φ(ρ(V)) = max_{w∈Π_K} inf_{s∈R^K_{++}} G(s, w).

Additionally, if q ∈ L^+_K(V) and p ∈ R^+_K(V) with q^T p = 1, then (p, q ∘ p) is a saddle point of G such that

min_{s∈R^K_{++}} max_{w∈Π^+_K} G(s, w) = G(p, q ∘ p) = max_{w∈Π^+_K} min_{s∈R^K_{++}} G(s, w).   (5.173)

By the observation, −Ψ(1/ρ(V)) is a saddle value of the aggregate utility function (5.171). Moreover, it follows from Theorem 1.77 that

(p, u) = arg inf_{s∈R^K_{++}} sup_{w∈Π^+_K} G(s, w) = arg inf_{s∈R^K_{++}} max_{w∈Π^+_K} G(s, w)

if and only if (p, u) ∈ R_K(V) × E_K(V). Such a saddle point exists whenever R^+_K(V) ≠ ∅ and E^+_K(V) ≠ ∅, in which case any pair (p, w) ∈ R^+_K(V) × E^+_K(V) is a saddle point of the aggregate utility function in (5.167) and there are no saddle points of the type (5.173) other than those included in R^+_K(V) × E^+_K(V). Observation 5.91 implies that there is such a saddle point if and only if each entirely coupled subnetwork is interference-isolated and maximal. Note that in view of the discussion in the preceding subsection, we confine our attention to "positive" saddle points, that is, pairs in R^K_{++} × R^K_{++}, since only such points are of interest in practice.

The saddle point property stated in Observation 5.92 is an illustration of the nature of the efficiency–fairness trade-off or, more precisely, of the incompatibility of (w, Ψ)-fairness and fairness in the max-min SIR sense. The first equality in (5.173) characterizes the max-min SIR power allocation p̄ ∈ R^K_{++} as the one which maximizes the aggregate utility function in (5.167)
under a "worst-case" weight vector w ∈ Π^+_K. In other words, p̄ is a (w, Ψ)-fair power allocation when w maximally degrades the value of the aggregate utility, that is, when w ∈ E^+_K(V). On the other hand, from the second equality in (5.173), we see that any weight vector w ∈ E^+_K(V) makes the maximal value of the aggregate utility function as small as possible so that the worst (w, Ψ)-fair performance (among all weight vectors w ∈ Π^+_K) is achieved under a max-min SIR power allocation.

The above saddle point interpretation makes an inherent trade-off between (w, Ψ)-fairness/efficiency and max-min SIR fairness (in the sense of (5.168)) apparent: Max-min SIR fairness and (w, Ψ)-fairness coincide if and only if w ∈ E^+_K(V), which corresponds to the worst-case network performance expressed in terms of the aggregate utility function in (5.167). This feature can be expressed compactly in the form of the following chain of inequalities: For any p ∈ R^+_K(V) (max-min SIR power allocation) and w ∈ E^+_K(V), one has

inf_{s∈R^K_{++}} Σ_{k∈K} u_k φ((Vs)_k/s_k) < Σ_{k∈K} u_k φ((Vp)_k/p_k)
  = inf_{s∈R^K_{++}} Σ_{k∈K} w_k φ((Vs)_k/s_k)
  = Σ_{k∈K} w_k φ((Vp)_k/p_k) < Σ_{k∈K} w_k φ((Vs)_k/s_k)

whenever s ∉ R^+_K(V) and u ∉ E^+_K(V), (s, u) ∈ R^K_{++} × Π_K.
Fairness Gap

In Sect. 5.6 and in this section so far, the focus of our study has been on the max-min SIR power allocation and its relation to utility-based power control. As explained in Sect. 5.9.1, such a power allocation is of key interest as it ensures the largest possible improvement of the worst SIR among all links and leads to max-min fairness (Definition 5.16) if the gain matrix is irreducible. Now, under the assumption of the noiseless case (C.5-22), let us turn our attention to the following power control problem:

inf_{p∈P+} max_{k∈K} SIR_k(p) = inf_{p∈R^K_{++}} max_{k∈K} SIR_k(p).   (5.174)

The problem can be interpreted as a problem of finding a positive power vector (if it exists) that maximally deteriorates the best SIR among all links. In analogy to the terminology used in the previous section, the problem is referred to as the min-max SIR power control problem. If the infimum in (5.174) is attained, then the vector

p = arg min_{p∈R^K_{++}} max_{k∈K} SIR_k(p)   (5.175)
is called a min-max SIR power allocation. At first glance, the usefulness and applicability of such a problem formulation seems to be very limited. The largest possible degradation of the best SIR is usually not a desired optimality criterion for resource allocation in wireless networks, and it can even be viewed as the "worst-case" approach to the resource allocation problem. However, note that there is a specific general relation between the worst SIR performance under a max-min SIR power allocation and the best SIR performance when p given by (5.175) is used. Precisely, for any given gain matrix V, Lemma A.46 shows that

sup_{p∈R^K_{++}} min_{k∈K} SIR_k(p) ≤ 1/ρ(V) ≤ inf_{p∈R^K_{++}} max_{k∈K} SIR_k(p).   (5.176)
In light of this relationship, the min-max SIR power control problem becomes worth considering whenever both inequalities in (5.176) hold with equalities and both optimizers are the same. In such cases, a min-max SIR power allocation and a max-min SIR power allocation achieve the same SIR performance, which corresponds to max-min fairness. An answer to the question whether these two conditions are satisfied depends on the interference coupling, which is determined by the gain matrix. The max-min and min-max SIR power control problems define two in some sense complementary notions of fairness. For this reason, in what follows, we refer to networks with gain matrices for which equality in (5.176) holds as networks with no fairness gap. In this context, it is important to bear in mind that a power allocation which is both max-min SIR and min-max SIR does not need to exist even in networks with no fairness gap, that is, even if both inequalities in (5.176) hold with equalities. However, whenever the set of optimizers is the same for both problems, we can take advantage of having the possibility to choose between (5.168) and (5.175) as a basis for computing a max-min SIR power allocation.^39 In practice, potential algorithms for the problem (5.175) may turn out to be more efficient than the algorithms for the original max-min SIR power control problem.

First let us focus on a class of gain matrices for which there is no fairness gap. Again, the characterization of such matrices is obtained using the normal form of a given gain matrix and the notions of maximal and isolated diagonal blocks (see Sect. 5.9.1). It follows from the previous subsections (see also Theorem A.47) that the first inequality in (5.176) is always an equality. Thus, for any V ≥ 0, we have

sup_{p∈R^K_{++}} min_{k∈K} SIR_k(p) = sup_{p∈R^K_{++}} min_{k∈K} p_k/(Vp)_k = 1/ρ(V).   (5.177)
This means that the minimum SIR under a max-min SIR power allocation always equals 1/ρ(V). Moreover, a max-min SIR power allocation exists if

^39 Note that such an alternative problem formulation is not possible when the noise variances are positive since then there is no min-max SIR power allocation other than p = 0.
and only if the supremum in (5.177) is attained, in which case the maximizer corresponds to a positive right eigenvector of V (Observation 5.88). Obviously, in this case, all SIRs are equal as in the case of irreducible matrices (5.169).

As far as the second inequality in (5.176) is concerned, a necessary and sufficient condition for equality is provided by Theorem A.48. By this result, for any V, one has

inf_{p∈R^K_{++}} max_{k∈K} p_k/(Vp)_k = 1/ρ(V)

if and only if I ⊆ M. Together with (5.177), this leads to the following paraphrase of Theorems A.47 and A.48 in terms of interference coupling.

Observation 5.93. There is no fairness gap in a network with the gain matrix V if and only if I ⊆ M.

In words, a necessary and sufficient condition for no fairness gap is that any isolated diagonal block in the normal form of the gain matrix is also maximal. This is equivalent to saying that

(i) any of the entirely coupled and interference-isolated subnetworks s ∈ I is maximal, that is, we have ρ(V^(s)) = ρ(V), s ∈ I.

At the same time, the observation implies that we have a fairness gap if (and only if)

(ii) there is an entirely coupled and interference-isolated subnetwork s ∈ I which is not maximal so that ρ(V^(s)) < ρ(V).

Note that the existence or nonexistence of a fairness gap is governed by an interplay between spectral and combinatorial properties^40 of the interference coupling in a network. This subtle interplay can be illustrated by the interference coupling examples in Fig. 5.14. It may be easily verified using Observation 5.93 that the network on the left hand side of Fig. 5.14 has no fairness gap. The network on the right hand side differs from the one on the left hand side only by a smaller but still positive spectral radius of the gain matrix of one entirely coupled and interference-isolated subnetwork (with triangles). This is however sufficient to lose the "no fairness gap" feature of the left hand side network, regardless of how small the decrease in spectral radius is.
In contrast to this, a decrease of the spectral radius of the gain matrix of the subnetwork with squares would not influence the fairness gap at all. As explained before, even if there is no fairness gap, there may be no power allocation that is optimal in the sense of both the max-min SIR problem and the min-max SIR problem. The existence of such optimizers requires slightly stronger conditions on the gain matrix than those required in Observation 40
^40 We say "combinatorial properties" because it is only of interest whether there is interference between some links or not. It does not matter how strong the interference is.
[Figure: two panels, each annotating the four subnetworks with their spectral radii, ρ(V^(s)) = 1 or ρ(V^(s)) < 1, s = 1, . . . , 4.]

Fig. 5.14: Exemplary networks with four entirely coupled subnetworks. The arrows model the interference between the links of different subnetworks (inter-subnetwork interference), with the arrow heads directed to the receivers where the interference is perceived. The interference within the subnetworks (intra-subnetwork interference) is not depicted and can be assumed arbitrary.
5.93. These conditions are provided by Theorem A.49. This theorem states that, for any given gain matrix V, a power allocation p̄ ∈ R^K_{++} satisfying

p̄ = arg max_{p∈R^K_{++}} min_{k∈K} p_k/(Vp)_k = arg min_{p∈R^K_{++}} max_{k∈K} p_k/(Vp)_k   (5.178)

exists if and only if I = M. Moreover, (5.178) holds if and only if p̄ is a right eigenvector of V associated with ρ(V). This leads us to the following observation.

Observation 5.94. Given any V, there exists a power allocation p̄ ∈ R^K_{++} which is both max-min SIR and min-max SIR if and only if I = M. Moreover, p̄ is max-min SIR and min-max SIR if and only if p̄ ∈ R^+_K(V).

Comparing Observations 5.90 and 5.94 reveals a very elegant property: Given a particular interference scenario, a max-min SIR power allocation exists if and only if there is a min-max SIR power allocation, in which case both power allocations are equivalent. By the definition of the fairness gap, this further implies that the "no fairness gap" property is necessary for the existence of a max-min SIR power allocation. Consequently, we can conclude from the discussion in Sect. 5.9.1 that a network with the gain matrix V for which a power vector exists that is both max-min SIR and min-max SIR has to satisfy the following conditions.

(i) Any interference-isolated and entirely coupled subnetwork s ∈ I is also maximal.
(ii) Any entirely coupled subnetwork s which perceives nonzero interference from some other entirely coupled subnetwork is not maximal.
Using a simple example of interference coupling, Fig. 5.15 illustrates how subtle the existence conditions of Observation 5.94 are. The network in the left picture of Fig. 5.15 can be readily seen to satisfy the conditions. Now, whenever there is a link in a non-maximal subnetwork (some subnetwork l such that ρ(V^(l)) < 1) causing interference to some single link within a maximal subnetwork (see the right picture of Fig. 5.15), the conditions are not fulfilled anymore.
[Figure: two panels, each annotating the four subnetworks with their spectral radii, ρ(V^(s)) = 1 or ρ(V^(s)) < 1, s = 1, . . . , 4.]

Fig. 5.15: Exemplary networks with four entirely coupled subnetworks. The arrows model the inter-subnetwork interference, with the arrow heads directed to the receivers where the interference is perceived. The intra-subnetwork interference is not depicted as it can be assumed arbitrary.
5.9.2 Existence and Uniqueness of Log-SIR Fair Power Allocation

In this section, we address the problem of the existence and uniqueness of a (w, Ψ_α)-fair power allocation when α = 1, that is, when Ψ_α(x) = Ψ(x) = log(x), x > 0. In doing so, in addition to (C.5-22), it is assumed throughout this section that

(C.5-28) V is irreducible.

This implies (C.5-23), and hence ensures that each SIR is well defined on the set of positive power vectors. For brevity and because of the use of the logarithmic function as the utility function, the optimal power vector p* given by (5.167) (see also (5.31) and the discussion in Sect. 5.2.5) is referred to as a log-SIR fair power vector (allocation). To be precise and avoid confusion, we introduce the following definition.

Definition 5.95 (Log-SIR Fair Power Vector). We say that p* > 0 is log-SIR fair if
p* = arg max_{p>0} F(p)   (5.179)

where (and throughout this section) F : R^K_{++} → R is defined to be

F(p) := Σ_{k∈K} w_k log(SIR_k(p))

for a given weight vector w = (w₁, . . . , w_K) > 0.

Log-SIR fair policies are of great interest since they closely approximate throughput-optimal policies in the high SIR regime, as discussed in Sects. 5.2.5 and 5.4.2. However, if the noise is neglected, the following important questions arise immediately.

(i) Does the maximum in (5.179) exist? This is equivalent to asking whether the supremum sup_{p>0} F(p) is bounded and attained for some p > 0. Note that in the noiseless case, the existence of a maximum cannot be concluded from Lemma 5.12.
(ii) Is p* unique up to positive multiples^41 if a maximum exists?

The answer to the above questions strongly depends on the choice of the weight vector w > 0 and a realization of the matrix V ≥ 0 or, more precisely, on the number and positions of zero entries in the matrix V ≥ 0. Of course, the number of zeros cannot be too large as V is required to be irreducible. But is the irreducibility property sufficient for the maximum in (5.179) to exist? The answer is "no". To see this, consider the irreducible matrix

V = ( 0 1
      1 0 )

which is equal to the matrix in (1.16) with K = 2. In this case, the aggregate utility function F(p) yields F(p) = (w₁ − w₂) log(p₁/p₂). So, if w₁ > w₂, then the right-hand side tends to infinity as p₂/p₁ → 0. This shows that the aggregate utility function F(p) does not need to be bounded above on the set of positive power vectors, and therefore a maximum in (5.179) may not exist. Even if a maximum exists, p* is not necessarily unique. For instance, if w₁ = w₂, then the function in this toy example is identically zero. Thus, in this case, any positive power vector is log-SIR fair.

In order to shed some light on the problem and to establish some connections to the results from Sect. 1.6, let us formulate the problem (5.179) in the QoS domain (see also Sect. 5.3):

ω* = arg max_{ω∈F_γ} w^T ω   (5.180)
where γ(x) = e^x, x ∈ R, which is the inverse function of the logarithmic function. With this choice of the function γ, we refer to this feasible QoS region F_γ as the feasible log-SIR region. Note that once the problem (5.180) is solved, we can readily obtain a solution to (5.179) by choosing p* to be equal to a positive right eigenvector of Γ(ω*)V associated with the Perron

^41 Positive multiples have no impact on F due to (C.5-22).
root λ_p(ω*). By the Perron–Frobenius theorem A.32, such a positive right eigenvector exists and is unique up to positive multiples.

Since the exponential function is log-convex, it follows from Corollary 1.42 that the feasible log-SIR region F_γ is a convex set. Therefore, as discussed in Sect. 5.3, the problem (5.180) reduces to finding a vector ω* at the boundary of F_γ (provided that such a vector exists) where the hyperplane with the normal vector w > 0 supports the feasible log-SIR region. The maximum however does not need to exist, as the example above illustrates. Indeed, if F_γ is convex but not strictly convex in the sense of Definition 1.44, then^42

(i) there may be no ω* at the boundary of F_γ where the hyperplane with the normal vector w > 0 supports the feasible log-SIR region, or
(ii) every point at the boundary may satisfy this condition, and thus be optimal in the sense of (5.180).

Considering the simple example with two users and trace(V) = 0, we see that the case w₁ = w₂ corresponds to (ii), while w₁ ≠ w₂ corresponds to (i), in which case the aggregate utility function F : R^K_{++} → R is unbounded. Both cases are illustrated in Fig. 5.16.

[Figure: the half-plane F_γ in the (ω₁, ω₂)-plane for V = ( 0 1 ; 1 0 ), with two normal vectors w^(1) and w^(2).]
Fig. 5.16: The feasible log-SIR region Fγ (the half-plane). The region is convex but not strictly convex (see Definition 1.44). Hence, it is not possible to find a point on ∂Fγ where a hyperplane with the normal vector w(1) supports Fγ . In contrast, the hyperplane with the normal vector w(2) supports Fγ at every point on ∂Fγ .
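The half-plane geometry of Fig. 5.16 can be confirmed directly. The sketch below (NumPy assumed) evaluates ρ(Γ(ω)V) for the example matrix V = ( 0 1 ; 1 0 ): since Γ(ω)V has eigenvalues ±e^{(ω₁+ω₂)/2}, the boundary ∂F_γ = {ω : ρ(Γ(ω)V) = 1} is exactly the line ω₁ + ω₂ = 0, so F_γ is the (non-strictly convex) half-plane ω₁ + ω₂ ≤ 0:

```python
import numpy as np

V = np.array([[0.0, 1.0],
              [1.0, 0.0]])

def rho_gamma(omega):
    # Spectral radius of Gamma(omega) V with Gamma(omega) = diag(e^omega_k)
    Gamma = np.diag(np.exp(omega))
    return np.max(np.abs(np.linalg.eigvals(Gamma @ V)))

for w1 in np.linspace(-3.0, 3.0, 13):
    omega = np.array([w1, -w1])                 # points with omega_1 + omega_2 = 0
    assert np.isclose(rho_gamma(omega), 1.0)    # all lie on the boundary of F_gamma
assert rho_gamma(np.array([-1.0, 0.0])) < 1.0   # omega_1 + omega_2 < 0: feasible
assert rho_gamma(np.array([1.0, 0.0])) > 1.0    # omega_1 + omega_2 > 0: infeasible
print("boundary of F_gamma: omega_1 + omega_2 = 0")
```

Because the boundary is a straight line, a hyperplane with normal vector not proportional to (1, 1) supports F_γ nowhere, which is case (i) above.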
These examples could entice the reader to think that the problem is directly connected to the question whether the feasible log-SIR region F_γ is strictly convex or not. An example of a strictly convex region^43 is depicted in Fig. 5.17. From this example, we see that, at least in the two-dimensional

^42 As explained in Sect. 5.3, note that F_γ is a special case of the feasibility set.
^43 It is important to mention that, in the two-dimensional case, self-interference on at least one link (as assumed in Fig. 5.17) is necessary for F_γ to be strictly convex. If K = 2 and trace(V) = 0, then F_γ is convex as depicted in Fig. 5.16.
case, the situation seen in Fig. 5.16 could not occur if the region was strictly convex.

[Figure: the strictly convex region F_γ in the (ω₁, ω₂)-plane for V = ( c 1 ; 1 c ), c > 0, with boundary points ω̂ and ω̌ and the optimizer ω*.]

Fig. 5.17: The feasible log-SIR region F_γ under self-interference. In this case, the region is strictly convex, and hence a convex combination of two arbitrary points on ∂F_γ := {ω : ρ(Γ(ω)V) = 1} is an interior point of F_γ (except for the points on the boundary). Furthermore, the maximum in (5.180) exists and ω* is a unique point on ∂F_γ where the hyperplane with normal vector w = (w₁, . . . , w_K) > 0 supports F_γ.
Although strict convexity of F_γ is certainly a key ingredient in the existence and uniqueness of a log-SIR fair power allocation, this property is not sufficient for this to be true in general. As V is assumed to be irreducible, Theorem 1.63 provides a necessary and sufficient condition for the feasible log-SIR region to be a strictly convex set. According to this theorem, F_γ is strictly convex if and only if VV^T is irreducible. As shown in Sect. 1.6, there exist irreducible matrices V for which VV^T is reducible, and it may be easily verified that the matrix used in the example of Fig. 5.16 belongs to this class. Now we construct an example to show that irreducibility of VV^T is not sufficient for the existence of a log-SIR fair power allocation.

Example 5.96. Suppose that w = 1 = (1, . . . , 1) and

V = ( 0 1 0              ( 1 0 1
      1 0 1   ⇒  VV^T =    0 2 1
      1 1 0 )              1 1 2 ).

As both V and VV^T are irreducible, Theorem 1.63 implies that F_γ is strictly convex. Furthermore, the aggregate utility function F : R^K_{++} → R defined by (5.179) is bounded above:
$$F(p) = \sum_{k=1}^{3} \log\frac{p_k}{(Vp)_k} = \log\frac{p_1}{p_2} + \log\frac{p_2}{p_1+p_3} + \log\frac{p_3}{p_1+p_2} < \log\frac{p_1}{p_2} + \log\frac{p_2}{p_3} + \log\frac{p_3}{p_1} = 0$$
for any p > 0. A consequence of this is that $\sup_{p>0} F(p) < +\infty$. The supremum is, however, not attained on $\mathbb{R}_{++}^K$, which can be shown by contradiction: Suppose that there exists p* > 0 at which F attains its maximum. As there are no constraints except for the positivity requirements, the Kuhn–Tucker conditions (Definition B.50) imply that (with trace(V) = 0)

$$\left.\frac{\partial F(p)}{\partial p_j}\right|_{p=p^*} = 0, \quad j = 1, 2, 3$$
$$\Rightarrow\quad \sum_{k \neq j} \frac{v_{k,j}\, p_j^*}{(Vp^*)_k} = 1, \quad j = 1, 2, 3\,.$$
Hence, with our choice of the matrix V, one obtains for j = 2

$$\frac{v_{1,2}\, p_2^*}{(Vp^*)_1} + \frac{v_{3,2}\, p_2^*}{(Vp^*)_3} = 1 + \frac{v_{3,2}\, p_2^*}{(Vp^*)_3} = 1\,,$$

and hence $v_{3,2}\, p_2^*/(Vp^*)_3 = 0$. Now since V is irreducible and p* > 0, we have (Vp*)_3 > 0. Thus, as p_2^* > 0, it follows that v_{3,2} = 0, which is a contradiction as v_{3,2} > 0. This example shows that strict convexity is not sufficient for p* defined by (5.179) to exist; at the time of writing this book, a necessary and sufficient condition appears to be an open problem. Nevertheless, the strict convexity property is useful in that it ensures a unique solution to (5.179) whenever it exists.

Theorem 5.97. Suppose that (C.5-28) holds. Let s = log(p),⁴⁴ p ∈ ℝ₊₊^K, and

$$F_e(s) := F(e^s) = \sum_{k\in K} w_k \log\frac{e^{s_k}}{(Ve^s)_k}, \quad w > 0\,.$$

Then, F_e(s) is a strictly concave function of s ∈ ℝ^K for any w > 0 if and only if VV^T ≥ 0 is irreducible.
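The claims surrounding Example 5.96 and Theorem 5.97 can be checked numerically. The sketch below is our own illustration (helper names and the probing sequence p = (δ, δ², 1) are assumptions): it confirms that V and VV^T are irreducible and that F stays strictly negative while approaching its supremum 0.

```python
import math

V = [[0, 1, 0],
     [1, 0, 1],
     [1, 1, 0]]  # gain matrix of Example 5.96, trace(V) = 0

def irreducible(M):
    """M is irreducible iff its directed graph is strongly connected
    (checked via a boolean reachability closure of I + M)."""
    K = len(M)
    reach = [[1 if i == j or M[i][j] > 0 else 0 for j in range(K)] for i in range(K)]
    for _ in range(K - 1):  # repeated boolean squaring
        reach = [[1 if any(reach[i][m] and reach[m][j] for m in range(K)) else 0
                  for j in range(K)] for i in range(K)]
    return all(all(row) for row in reach)

VVt = [[sum(V[i][m] * V[j][m] for m in range(3)) for j in range(3)] for i in range(3)]

def F(p):
    # F(p) = sum_k log(p_k / (Vp)_k) with w = 1, cf. (5.179)
    return sum(math.log(p[k] / sum(V[k][l] * p[l] for l in range(3)))
               for k in range(3))

vals = [F((d, d * d, 1.0)) for d in (1e-1, 1e-3, 1e-6)]
print(irreducible(V), irreducible(VVt), vals)
# F < 0 everywhere, yet F -> 0 along this sequence: the supremum is not attained
```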
Proof. Since log-convexity is preserved under sums [16, p. 105], $\log(\sum_l a_l e^{s_l})$ is a convex function of s ∈ ℝ^K for any nonnegative vector a = (a_1, . . . , a_K). As an immediate consequence of this, F_e is concave on ℝ^K, so that we only need to show strict concavity. By Sect. 1.6.1, we know that if VV^T ≥ 0 is irreducible, then there are p̂, p̌ > 0 with p̂ ≠ c p̌ for all c > 0 such that (1.80) holds for at least one k ∈ K. So, if VV^T is irreducible, then, by (1.80) for some k ∈ K, there must exist ŝ, š ∈ ℝ^K with ŝ ≠ c1 + š for all c ∈ ℝ such that
⁴⁴ The component-wise logarithmic transformation is well-defined as the entries of the power vector are positive. The transformation is extensively used in Sect. 6.3 to arrive at a convex formulation of some power control problems.
$$F_e\big(s(\tfrac{1}{2})\big) = \sum_{k\in K} w_k \log\frac{(e^{s(1/2)})_k}{(Ve^{s(1/2)})_k} > \frac{1}{2}\sum_{k\in K} w_k \left[\log\frac{(e^{\hat s})_k}{(Ve^{\hat s})_k} + \log\frac{(e^{\check s})_k}{(Ve^{\check s})_k}\right] = \frac{1}{2}\big(F_e(\hat s) + F_e(\check s)\big)$$
where s(μ) = (1 − μ)ŝ + μš, μ ∈ (0, 1). Thus, since F_e(s) is concave, we can conclude that if VV^T is irreducible, then F_e(s) is strictly concave.
To prove the converse, assume that there is a weight vector w̃ > 0 for which $\sum_{k\in K} \tilde w_k \log\frac{(e^{s})_k}{(Ve^{s})_k}$ is not strictly concave. Then, there are ŝ, š with ŝ ≠ c1 + š for all c ∈ ℝ such that

$$F_e(\tilde s) = \frac{1}{2}\big(F_e(\hat s) + F_e(\check s)\big)\,. \tag{5.181}$$
where s̃ = ½(ŝ + š). Now since

$$\frac{(e^{\tilde s})_k}{(Ve^{\tilde s})_k} \;\ge\; \left(\frac{(e^{\hat s})_k}{(Ve^{\hat s})_k}\right)^{1/2} \left(\frac{(e^{\check s})_k}{(Ve^{\check s})_k}\right)^{1/2} \quad \text{for all } k \in K \tag{5.182}$$

and w̃ > 0,
(5.181) implies that the inequality in (5.182) holds with equality for each k ∈ K. So, by Lemma 1.61 and Theorem 1.63, VV^T is reducible.
Combining Theorem 5.97 with Theorem 1.63 yields the following corollary.

Corollary 5.98. Fγ is strictly convex if and only if F_e(s) is strictly concave on ℝ^K.

Now assume that F_e is bounded above on ℝ^K and that its supremum is attained for some s* ∈ ℝ^K. Assume further that Fγ is strictly convex or, equivalently, that VV^T is irreducible. Then, by the results above, s* is unique (up to addition of a constant). By strict monotonicity of the logarithmic function, this implies that p* = e^{s*} is a unique (up to positive scaling) log-SIR fair power allocation.
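Conversely, the failure of strict concavity when VV^T is reducible is tangible in the setting of Fig. 5.16. The sketch below is our own illustration (K = 2 and w = (1, 2) are assumed): here VV^T = I is reducible and F_e turns out to be affine, so the midpoint inequality holds with equality.

```python
import math

V = [[0, 1], [1, 0]]  # K = 2, trace(V) = 0 as in Fig. 5.16; V V^T = I is reducible
w = (1.0, 2.0)

def Fe(s):
    """Fe(s) = sum_k w_k log(e^{s_k} / (V e^s)_k)."""
    p = [math.exp(x) for x in s]
    return sum(w[k] * math.log(p[k] / sum(V[k][l] * p[l] for l in range(2)))
               for k in range(2))

sa, sb = (0.0, 1.0), (3.0, -2.0)
smid = tuple(0.5 * (x + y) for x, y in zip(sa, sb))
gap = Fe(smid) - 0.5 * (Fe(sa) + Fe(sb))
print(gap)  # 0 up to rounding: Fe is concave but not strictly concave here
```

Indeed, for this V one has F_e(s) = (w_1 − w_2)(s_1 − s_2), so no strictly concave behaviour is possible.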
5.10 Proofs

Proof of Observation 5.17

Given any power vector p ∈ P, the rate of link k ∈ K is equal to Φ(SIR_k(p)). As Φ : ℝ₊ → ℝ₊ is a continuous and strictly increasing function (Sect. 4.3.4), a uniform increase of the rates is achieved by uniformly increasing the SIRs of the links. Now, for some sufficiently small t ≥ 0, let p(t) ≥ 0 be such that

$$t = \mathrm{SIR}_k(p(t)) = \frac{p_k(t)}{(Vp(t) + z)_k}, \quad k \in K \tag{5.183}$$
which, in matrix form, can be written as tz = (I − tV)p(t) (see Sect. 5.2.2). Due to z > 0, we have t = 0 if and only if p(t) = 0. Obviously, the zero vector is not a max-min fair power vector as 0 = max_{k∈K} SIR_k(0) < min_{k∈K} SIR_k(p) for any p > 0, p ∈ P. When t > 0, we can rewrite (5.183) as (1/t I − V)p(t) = z > 0, from which, together with Theorem A.51, we can conclude that p(t) ≥ 0 exists if and only if ρ(V) < 1/t. Moreover, in this case, p(t) > 0 is unique and given by

$$p(t) = \left(\tfrac{1}{t} I - V\right)^{-1} z = t\left(I - tV\right)^{-1} z, \quad t \in \big(0, 1/\rho(V)\big)\,. \tag{5.184}$$

By (5.183) and the above discussion, increasing t ∈ (0, 1/ρ(V)) causes a uniform increase of all rates. Moreover, considering the Neumann series (Theorem A.16) and (5.184) shows that each entry of p(t) is strictly increasing in t and p_k(t) → ∞ for some k ∈ K as t → 1/ρ(V), which can be easily deduced from
Theorem A.12. Thus, there are t_1 ∈ (0, 1/ρ(V)) and N_1 ⊆ N such that $\sum_{l\in K(n)} p_l(t_1) = P_n$ for each n ∈ N_1. Now let K_1 = ∪_{n∈N_1} K(n) ∪ J_1, where J_1 ⊆ K is defined as follows: We have l ∈ J_1 if and only if there are k ∈ ∪_{n∈N_1} K(n), some M = M(k, l) ∈ ℕ, and a sequence of natural numbers {j_m}_{m=0}^M with j_0 = l and j_M = k such that v_{j_r, j_{r−1}} > 0 for each r = 1, . . . , M. As a consequence, for any l ∈ K_1^c := K \ K_1 and k ∈ K_1, we have v_{k,l} = 0, implying that the transmit power of each link with an index belonging to K_1^c can be increased without decreasing any SIR_i(p(t_1)), i ∈ K_1. Thus, if K_1^c = ∅ or, equivalently, K_1 = K, then ν̄ with ν̄_k = Φ(SIR_k(p(t_1))) is a max-min fair rate allocation. As p(t_1) is unique and positive, so is also the max-min fair rate allocation. Otherwise, if K_1^c ≠ ∅, let us assume (without loss of generality) that K_1^c = {1, . . . , |K_1^c|} and define

$$V' = (v_{k,l})_{1\le k,l\le |K_1^c|} \in \mathbb{R}_+^{|K_1^c|\times|K_1^c|}, \qquad z' = \Big(z_k + \sum_{l\in K_1} v_{k,l}\, p_l(t_1)\Big)_{1\le k\le |K_1^c|} \in \mathbb{R}_{++}^{|K_1^c|}\,. \tag{5.185}$$
Since the transmit powers of the links with indices in K_1^c have no impact on the SIR performance of the links described by K_1, the gain matrix V has the following form (under the assumption that K_1^c = {1, . . . , |K_1^c|}):

$$V = \begin{pmatrix} V' & X \\ 0 & U \end{pmatrix}, \qquad X \in \mathbb{R}_+^{|K_1^c|\times|K_1|}, \quad U \in \mathbb{R}_+^{|K_1|\times|K_1|}\,. \tag{5.186}$$

Using $p'(t) = (p_k(t))_{1\le k\le |K_1^c|} \in \mathbb{R}_{++}^{|K_1^c|}$ to denote the power vector associated with the links in K_1^c, it follows with (5.185) and (5.186) that (1/t_1 I − V′)p′(t_1) = z′. This together with Theorem A.51 implies that ρ(V′) < 1/t_1 or, equivalently, t_1 < 1/ρ(V′). We can further conclude from Theorem A.51 that, for every t ∈ (t_1, 1/ρ(V′)), there is a unique positive vector p′(t) given by
$$p'(t) = \left(\tfrac{1}{t} I - V'\right)^{-1} z'\,.$$

Now increasing t leads to a uniform increase of the rates of those links whose indices belong to the set K_1^c, while the rates of the links designated by K_1 remain unchanged due to (5.186). Again, since each entry of p′(t) is strictly increasing in t and p_k(t) → ∞ as t → 1/ρ(V′) for some k ∈ K_1^c, there must be t_2 ∈ (t_1, 1/ρ(V′)) and N_2 ⊆ N, N_2 ∩ N_1 = ∅, such that $\sum_{l\in K(n)} p_l(t_2) = P_n$, n ∈ N_2.
Now we can proceed in this way until there are no links left, which is achieved because the number of links is finite and each link is subject to some power constraints (P is compact). In each step, a positive unique sub-vector of the power vector is determined so that the resulting power vector is positive and unique. Let this vector be denoted by p̄. Then, ν̄ with ν̄_k = Φ(SIR_k(p̄)) is unique and positive. Moreover, by Definition 5.16 (consider also the derivation procedure following Definition 5.16), ν̄ is max-min fair as no rate ν_k can be increased without decreasing some ν_l, l ≠ k, which is smaller than or equal to ν_k. This completes the proof.

Remark 5.99. Notice that if the gain matrix V is of the form (5.186), then, by Definition A.27, the matrix is reducible. The links with the indices in K_1 (or, in other words, associated with the matrix U) perceive no interference from the other links, but they may cause interference to them since X is not a zero matrix. If U is irreducible (Definition A.27), then the associated links form an entirely coupled subnetwork or, more precisely, a subnetwork entirely coupled by interference. If V is irreducible, then the network is said to be entirely coupled by interference (see also Sect. 5.9). In this case, it is easy to deduce from the proof of Observation 5.17 that all link rates are equal under max-min fairness and the corresponding max-min fair power vector is given by (5.184) for some t ∈ (0, 1/ρ(V)). Irreducibility of V is, however, not necessary for such a property.
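The balanced-rate property for an entirely coupled network is easy to demonstrate numerically. The sketch below is our own illustration (the matrix V, noise z and total power P_t are assumed, and γ = 1 so that the construction reduces to (5.184) under a single sum power constraint): driving t up until the sum power is exhausted yields identical SIRs on all links.

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

V = [[0.0, 0.2, 0.1],
     [0.3, 0.0, 0.2],
     [0.1, 0.2, 0.0]]  # irreducible gain matrix (assumed example)
z = [0.1, 0.1, 0.1]
Pt = 5.0               # total power budget (assumed)

def p_of(t):
    """p(t) = (1/t I - V)^{-1} z, cf. (5.184)."""
    A = [[(1.0 / t if i == j else 0.0) - V[i][j] for j in range(3)] for i in range(3)]
    return solve(A, z)

def total(t):
    p = p_of(t)
    return sum(p) if all(x > 0 for x in p) else float("inf")

lo, hi = 1e-6, 1.0
while total(hi) < Pt:     # grow hi toward 1/rho(V) until the budget is exceeded
    hi *= 1.25
for _ in range(200):      # bisect for sum p(t) = Pt
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if total(mid) < Pt else (lo, mid)
t_star = lo
p_star = p_of(t_star)
sirs = [p_star[k] / (sum(V[k][l] * p_star[l] for l in range(3)) + z[k])
        for k in range(3)]
print(sirs)  # all equal (to t_star): the rates are balanced under max-min fairness
```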
Indeed, if, for instance, transmit powers are subject to a sum power constraint, then the above proof shows that the max-min fair power vector is of the form (5.184) for some t ∈ (0, 1/ρ(V)), regardless of the choice of V ≥ 0.

Proof of Theorem 5.33

It is sufficient to focus on a strictly increasing function γ (the proof for decreasing ones proceeds analogously). Since ω* in (5.57) exists and maximizes w^T ω over ω ∈ Fγ(P) for some w > 0, we can conclude from Theorem B.5 (see also [16, pp. 54–56]) that ω* is a maximal point of Fγ(P). If ω̂ ∈ Fγ(P) is maximal, then, by Definition B.4, there is no ω ∈ Fγ(P), ω ≠ ω̂, such that ω̂ ≤ ω. Thus, (5.59), Observation 5.28 and the component-wise monotonicity of p(ω) (Lemma 5.24) imply that ω̂ ∈ ∂Fγ(P), that is, ω̂ is a boundary point of Fγ(P). This proves (i). The proof of (ii) is by contradiction. So, given any irreducible matrix V, assume that ω̂ ∈ ∂Fγ(P) is not maximal, so that ω̂ ≤ ω̌ for some ω̌ ∈ Fγ(P), ω̌ ≠ ω̂. Let ω(μ) = (1 − μ)ω̂ + μω̌ and note that, as Fγ(P) is convex (by Theorem 5.32), p(ω(μ)) > 0 exists and p(ω(μ)) ∈ P, μ ∈ (0, 1).
Moreover, since ω̂ ≤ ω(μ) ≤ ω̌, μ ∈ (0, 1), Lemma 5.24 and (5.59) show that there is n ∈ N such that $P_n = \sum_{k\in K(n)} p_k(\hat\omega) \le \sum_{k\in K(n)} p_k(\omega(\mu)) \le \sum_{k\in K(n)} p_k(\check\omega) \le P_n$, μ ∈ (0, 1), where the last inequality is due to p(ω̌) ∈ P. Combining this with Theorem 2.15, which states that p_k(ω) is strictly log-convex if V is irreducible, implies that, for all μ ∈ (0, 1),

$$P_n = \sum_{k\in K(n)} p_k(\omega(\mu)) < (1-\mu)\sum_{k\in K(n)} p_k(\hat\omega) + \mu \sum_{k\in K(n)} p_k(\check\omega) \le (1-\mu)P_n + \mu P_n = P_n\,.$$

This contradicts the assumption and hence proves the theorem.

Proof of Theorem 5.35

Again, we can confine our attention to a strictly increasing function γ. If ω maximizes x ↦ w^T x over Fγ(P) for some given w > 0, then, by Theorem B.5, ω is maximal and then, by (i) of Theorem 5.33, ω ∈ ∂Fγ(P). To show the converse, let ω ∈ ∂Fγ(P) be arbitrary, so that, by Theorem 5.33, ω is maximal, and hence, by convexity of Fγ(P) (Theorem 5.32) and Theorem B.5, it maximizes x ↦ w^T x over Fγ(P) for some w ≥ 0, w ≠ 0. So it remains to show that w must in fact be positive. Let us assume by contradiction that there is j ∈ K such that w_j = 0, and let ω(ε) = (ω_1, . . . , ω_{j−1}, ω_j − ε, ω_{j+1}, . . . , ω_K) for some sufficiently small ε > 0 so that ω(ε) ∈ Q^K (such ε exists as Q ⊆ ℝ is an open set). Clearly, we have ω(ε) ≤ ω, and hence ω(ε) ∈ Fγ(P) due to the downward comprehensivity of Fγ(P) (Observation 5.30). Moreover, as w^T ω = w^T ω(ε) holds, ω(ε) maximizes x ↦ w^T x over Fγ(P). This implies that ω(ε) ∈ ∂Fγ(P), since otherwise we could increase the value of w^T ω(ε) by increasing the entries of ω(ε) corresponding to the positive weights. Note that this immediately follows from the definition of ∂Fγ(P) in (5.59), Observation 5.28 and the component-wise monotonicity of p(ω) (Lemma 5.24). So, as ω(ε) ∈ ∂Fγ(P) and V is irreducible, (ii) of Theorem 5.33 shows that ω(ε) is maximal. Thus, by Definition B.4 and ω(ε) ≤ ω, we have ω(ε) = ω, which is a contradiction.

Proof of Theorem 5.53

Let p̃ ≥ 0 be arbitrary and let {p(n)}_{n∈ℕ₀} with p(n) ≥ 0 be any sequence such that lim_{n→∞} p(n) = p̃. Define the sequence p̲(n) ≥ 0 by p̲_k(n) = inf_{l≥n} p_k(l), k ∈ K. Obviously, we have p̲_k(n) = 0 for all n ∈ ℕ₀ if p̃_k = 0. Otherwise, if p̃_k > 0, then there exists n_0 such that p̲_k(n) > 0 for all n ≥ n_0. Thus, for every n ≥ n_0, there exists a constant α(n) ≥ 1 such that p̃ ≤ α(n)p̲(n). So, as p̲(n) ≤ p(n), it follows from axioms A1–A3 of Definition 5.51 that
$$+\infty > \frac{I_k(p(n))}{I_k(\tilde p)} \ge \frac{I_k(\underline p(n))}{I_k(\tilde p)} \ge \frac{I_k(\underline p(n))}{I_k(\alpha(n)\underline p(n))} \ge \frac{I_k(\underline p(n))}{\alpha(n)\, I_k(\underline p(n))} = \frac{1}{\alpha(n)} > 0$$
for all n ≥ n_0. On the other hand, let {p̃(n)}_{n∈ℕ₀} with p̃(n) > 0, n ∈ ℕ₀, be any sequence such that p̃(n) > p̃(n + 1) for all n ∈ ℕ₀ and lim_{n→∞} p̃(n) = p̃. An example of such a sequence is p̃(n) = p̃ + \tfrac{1}{n+1}·1, n ∈ ℕ₀. Now, due to positivity of p̃(n), for every n ∈ ℕ₀, there must exist a constant β(n) ≥ 1 such that p(n) ≤ β(n)p̃(n). Thus, axioms A1–A3 of Definition 5.51 imply that

$$0 < \frac{I_k(p(n))}{I_k(\tilde p)} \le \frac{I_k(\beta(n)\tilde p(n))}{I_k(\tilde p)} \le \beta(n)\,\frac{I_k(\tilde p(n))}{I_k(\tilde p)} < +\infty$$
for all n ∈ ℕ₀. As p̃(n) > p̃(n + 1) for all n ∈ ℕ₀, it follows from axioms A2–A3 that I_k(p̃(0)) ≥ I_k(p̃(1)) ≥ · · · ≥ I_k(p̃) > 0, and hence

$$\lim_{n\to\infty} \frac{I_k(\tilde p(n))}{I_k(\tilde p)} = 1\,. \tag{5.187}$$
So, since p(n) converges to p̃, we have α(n) → 1 and β(n) → 1 as n → ∞. Combining this and (5.187) with the lower and upper bounds above yields

$$\lim_{n\to\infty} \frac{I_k(p(n))}{I_k(\tilde p)} = 1$$
from which the theorem follows.

Proof of Theorem 5.59

We use the same techniques as in [55]. Let p(0) ≥ 0 be arbitrary, and let p′ > 0 be the fixed point of I so that p′ = I(p′). If C_I(ω) < 1, it follows from Theorem 5.54 that P_I(ω) ≠ ∅, and hence, by Theorem 5.57, p′ exists and is equal to the (unique) minimum valid power vector. As p′ is positive and p(0) nonnegative, there is a constant μ ≥ 1 such that

$$\underline p(0) := \tfrac{1}{\mu}\, p(0) \;\le\; p(0) \;\le\; \bar p(0) := \mu p' \qquad \text{and} \qquad \underline p(0) \le p'\,.$$
Thus, as p′ is the minimum point of P_I(ω), one has SIR_k(p̲(0)) ≤ γ_k for each k ∈ K. This together with (5.90) implies that p̲(0) ≤ p̲(1). Moreover, A3 implies that p′ = I(p′) ≥ I(p̲(0)) = p̲(1), so that p̲(1) ≤ p′. Now let n ∈ ℕ be arbitrary and assume that p̲(n) ≤ p′. Again, an application of (5.90) and A3 shows that p̲(n) ≤ p̲(n + 1) and p′ = I(p′) ≥ I(p̲(n)) = p̲(n + 1), from which we obtain p̲(n) ≤ p̲(n + 1) ≤ p′. So, the sequence {p̲(n)}_{n∈ℕ₀} is non-decreasing and bounded by p′, and therefore, by (5.91), it must converge to a fixed point of I as n → ∞. By Theorem 5.57, the fixed point p′ is unique, so that p̲(n) → p′ as n → ∞. The sequence is component-wise strictly increasing to the minimum valid power vector since, for each k ∈ K, we have p̲_k(n) < p̲_k(n + 1) unless p̲_k(n) = p′_k. On the other hand, A2 implies that p̄(0) = μp′ = μI(p′) ≥ I(μp′) = I(p̄(0)) = p̄(1), μ ≥ 1. Furthermore, since p′ ≤ p̄(0), it follows from A3 that
p′ = I(p′) ≤ I(p̄(0)) = p̄(1). Now let n ∈ ℕ be arbitrary and assume that p̄(n) = μp′ for some μ > 1. Repeating the same arguments (with n instead of 0) shows that p̄(n) ≥ p̄(n + 1) and p′ ≤ p̄(n + 1). Thus, the sequence {p̄(n)}_{n∈ℕ₀} is non-increasing and bounded below by p′. So, by (5.91) and Theorem 5.57, it converges to p′, which is the unique fixed point of I. By (5.90), the sequence is component-wise strictly decreasing to the minimum valid power vector since p̄_k(n) > p̄_k(n + 1) for each k ∈ K unless p̄_k(n) = p′_k. Now, since p̲(n) ≤ p(n) ≤ p̄(n) for every n ∈ ℕ₀, axiom A3 implies that I(p̲(n)) ≤ I(p(n)) ≤ I(p̄(n)), n ∈ ℕ₀. Hence, by continuity of I (Theorem 5.53), one obtains

$$\lim_{n\to\infty} I(\underline p(n)) = \lim_{n\to\infty} I(p(n)) = \lim_{n\to\infty} I(\bar p(n)) = p'\,.$$
If p(0) ≤ p′, we can define p(n) = p̲(n), n ∈ ℕ₀, so that (i) follows from the strict increasingness of {p̲(n)}_{n∈ℕ₀}. Similarly, if p(0) ∈ P_I(ω), then (ii) follows from the strict decreasingness of p(n) = p̄(n), n ∈ ℕ₀. This completes the proof.

Proof of Lemma 5.65

The positivity of p̄ is obvious, as we have min_{k∈K} SIR_k((P_t/K)·1) > 0 and min_{k∈K} SIR_k(p) = 0 for any nonnegative vector p that is not positive. Also, (ii) should be obvious, since if we had ‖p̄‖₁ < P_t, then it would be possible to increase min_{k∈K} SIR_k(p̄)/γ_k by allocating the power vector cp̄ ∈ P with c = P_t/‖p̄‖₁. In contrast, (iii) might not be so obvious. The proof is by contradiction, and hence assume that there are K̃ ⊂ K, K̃ ≠ ∅, and ε > 0 such that
˜c k∈K
SIRk (¯ p) SIRj (¯ p) SIRj (¯ p) − = min = min ˜ j∈K γk γ γ j∈K j j
˜ c = K \ K. ˜ We define p ˜ (s, t) as where K ˜c Pt s · t · p¯k k ∈ K
t ∈ T(s) = 1,
p˜k (s, t) := ˜ s k∈K˜ c p¯k + k∈K˜ p¯k t · p¯k k∈K ˜ (s, t) > 0, p ˜ (1, 1) = p ¯ where s ∈ (0, 1]. By construction, (i) and (ii), we have p ˜ (s, t) ∈ P for all t ∈ T(s) and s ∈ (0, 1]. Moreover, for any s ∈ (0, 1], one and p has SIRk (˜ p(s, 1)) SIRk (¯ p) SIRj (¯ p) ≥ min = min . min ˜ ˜ j∈K γ γ γ k∈K k∈K k k j ¯ ), (b) Thus, since (a) int(T(s)) = ∅ for any s ∈ (0, 1) (by positivity of p p → mink∈K (SIRk (p)/γk ) is continuous on RK + (Observation 5.47) and (c), p(s, t))/γk ) is strictly for any fixed s ∈ (0, 1], the function t → mink∈K (SIRk (˜ increasing in t ≥ 1, we can conclude that there exists s() ∈ (0, 1) (sufficiently close to one) and t() ∈ T(s()), t() > 1, such that
252
5 Resource Allocation Problem in Communications Networks
min k∈K
SIRk (¯ p) SIRk (˜ p(s(), 1)) SIRk (˜ p(s(), t())) ≤ min < min k∈K k∈K γk γk γk
where p̃(s(ε), t(ε)) ∈ P. This, however, contradicts (5.106) and completes the proof.

Proof of Theorem 5.68

As p̄ > 0, considering Lemmas 5.66 and 5.67 shows that (i) implies (ii). Now let us prove (ii)→(iii). An examination of (5.111) shows that ρ(B)p̄ = Bp̄ with 1^T p̄ = P_t is equivalent to ρ(B)p̄ = ΓVp̄ + Γz, which in turn can be rewritten to give (5.109) with β = ρ(B) and p̃ = (p̄, 1). Since p̄ is positive, so is also p̃. Thus, p̃ with p̃_{K+1} = 1 is a positive right eigenvector of A, and the associated eigenvalue is equal to ρ(B) > 0. It remains to show that ρ(A) = ρ(B) and that there are no positive eigenvectors of A other than p̃. Since A is nonnegative, it follows from Theorem A.39 that there exists ũ = a · (u, c) ∈ ℝ₊^{K+1} with u ∈ ℝ₊^K and c = c(u) ≥ 0 such that ρ(A)ũ = Aũ for all a > 0. So, by (5.110), one obtains

$$\rho(A)\,u = \Gamma V u + \Gamma z \cdot c$$
$$\rho(A)\,c = \tfrac{1}{P_t}\, 1^T \Gamma V u + \tfrac{1}{P_t}\, 1^T \Gamma z \cdot c\,.$$

Multiplying both sides of the first equation by (1/P_t)1^T and comparing the result to the second equation shows that c = c(u) = (1/P_t) 1^T u. Putting this into the first equation yields ρ(A)u = Bu, from which it follows that au ≥ 0 is a nonnegative eigenvector of B associated with ρ(A) ≥ 0 for any constant a > 0. From (5.112), we see that B is positive, and hence also irreducible. Part (v) of Theorem A.32 (see also Remark A.34) then implies that ρ(A) = ρ(B). It further implies that there are no nonnegative eigenvectors of B other than u > 0 and its positive multiples au, a > 0. So, choosing a > 0 such that u = p̄ > 0 and c = (1/P_t) 1^T u = 1 shows that p̃ = (u, c) = (p̄, 1) is the unique positive right eigenvector of A associated with ρ(A) > 0 such that p̃_{K+1} = 1. (iii)→(i): The proof of the previous implication shows that there exists exactly one positive vector p̃ ∈ ℝ₊₊^{K+1} with p̃_{K+1} = 1 such that (5.109) holds. Furthermore, β = ρ(A) is a simple eigenvalue of A. Thus, by Lemma 5.66, (iii) must imply (i), since otherwise the max-min SIR-balanced power vector could not satisfy (5.109).
This completes the proof.

Proof of Lemma 5.72

Let n ∈ N be arbitrary. First we prove part (i). Since (1/P_n)z1_n^T ≥ 0 and V is irreducible, we can conclude from (5.123) that B^{(n)} ≥ 0 is irreducible as well. Thus, by the Perron–Frobenius theorem for irreducible matrices (Theorem A.32), there exists a positive vector p which is an eigenvector of B^{(n)}
associated with ρ(B^{(n)}), and there are no nonnegative eigenvectors of B^{(n)} associated with ρ(B^{(n)}) other than p and its positive multiples. Among all the positive eigenvectors, there is exactly one eigenvector p > 0 such that g_n(p) = c_1. This proves part (i). In order to prove (ii), note that if A^{(n)} were irreducible, then we could invoke Theorem A.32 and proceed essentially as in part (i) to conclude (ii) (with the uniqueness property resulting from the normalization of the eigenvector so that its last component is equal to c_2 > 0). In order to show that A^{(n)} is irreducible, let G(A^{(n)}) be the associated directed graph on {1, . . . , K + 1} nodes (see the definition preceding Definition A.29). Since ΓV is irreducible, it follows from Observation A.30 that the subgraph G(ΓV) is strongly connected. Furthermore, as the vector Γz is positive, we can conclude from (5.121) that there is a directed edge leading from node K + 1 to each node n < K + 1 belonging to the subgraph G(ΓV). Finally, note that as ΓV is irreducible, each row of ΓV has at least one positive entry. Hence, the vector (1/P_n) 1_n^T ΓV must have at least one positive entry. From this and (5.121), we can conclude that there is a directed edge leading from a node belonging to G(ΓV) to node K + 1. So, G(A^{(n)}) is strongly connected and, by Observation A.30, A^{(n)} is irreducible. This proves the lemma.

Proof of Theorem 5.74

Because of (5.125), we can confine our attention to the matrices B^{(n)}, n ∈ N. Let j ∈ N_0 and m ∉ N_0 be arbitrary. Rearranging (5.127) shows that (1/t_n^*) p̄^{(n)} = B^{(n)} p̄^{(n)} with g_n(p̄^{(n)}) = 1. Thus, by Lemma 5.72 and p̄^{(n)} > 0, we have 0 < ρ(B^{(n)}) = 1/t_n^* for each n ∈ N and, by Theorem 5.73, ρ(B^{(j)}) = 1/t^*, where t^* = arg max t subject to t ∈ T = ∩_{n∈N} T_n. Equivalently, we can write t^* = min_{n∈N} t_n^*, implying that t^* ≤ t_n^* for each n ∈ N. In order to conclude (5.128), it remains to show that t^* = t_j^* < t_m^*. To this end, note that since m ∉ N_0, the mth power constraint is inactive under p̄, so that g_m(p̄) < P_m. Thus, as V is irreducible, Observation 5.64 and Theorem 5.73 imply that 1_m^T p̄ = 1_m^T p̄(t^*) = 1_m^T (\tfrac{1}{t^*} I − ΓV)^{−1} Γz < P_m. On the other hand, by (5.127), we must have 1_m^T p̄^{(m)} = 1_m^T (\tfrac{1}{t_m^*} I − ΓV)^{−1} Γz = P_m. Thus, considering the Neumann series (Theorem A.16) yields
$$t^*\, 1_m^T \sum_{l=0}^{\infty} (t^* \Gamma V)^l\, \Gamma z \;<\; t_m^*\, 1_m^T \sum_{l=0}^{\infty} (t_m^* \Gamma V)^l\, \Gamma z$$
where both series converge due to (5.126). Consequently, we have

$$0 < \frac{1}{\rho(B^{(j)})} = t^* < t_m^* = \frac{1}{\rho(B^{(m)})} < +\infty \quad\Leftrightarrow\quad \rho(B^{(j)}) > \rho(B^{(m)})\,.$$
This proves (5.128), and (5.129) is an immediate consequence of SIR_k(p̄)/γ_k = t^*, k ∈ K. The proof is complete.
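Theorem 5.74 suggests a simple computational recipe: evaluate ρ(B^{(n)}) for every power constraint and let the largest spectral radius determine t* and the active constraint. The sketch below is our own illustration with per-link power limits and B^{(n)} formed as ΓV + (1/P_n)Γz1_n^T — our reading of (5.111)/(5.123); all numbers and helper names are assumptions.

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def spectral_radius(M, iters=5000):
    """Power iteration with sup-norm normalization (fine for primitive matrices)."""
    x, r = [1.0] * len(M), 1.0
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(len(M))) for i in range(len(M))]
        r = max(y)
        x = [v / r for v in y]
    return r

V = [[0.0, 0.2, 0.1],
     [0.3, 0.0, 0.2],
     [0.1, 0.2, 0.0]]
gamma = [1.0, 2.0, 1.0]
z = [0.1, 0.2, 0.1]
P = [5.0, 4.0, 6.0]  # individual power limits, one per link (assumed)

def B(n):
    return [[gamma[i] * V[i][j] + (gamma[i] * z[i] / P[n] if j == n else 0.0)
             for j in range(3)] for i in range(3)]

rhos = [spectral_radius(B(n)) for n in range(3)]
t_star = 1.0 / max(rhos)           # (5.128): the largest spectral radius wins
j = rhos.index(max(rhos))          # index of the active power constraint
A = [[(1.0 / t_star if i == k else 0.0) - gamma[i] * V[i][k] for k in range(3)]
     for i in range(3)]
p_bar = solve(A, [gamma[i] * z[i] for i in range(3)])
print(j, [p_bar[k] / P[k] for k in range(3)])  # ratio j equals 1, the rest are below
```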
Proof of Theorem 5.78

If, for every ε > 0, there is α(ε) ≥ 1 such that (5.149) holds, then, by Definition 5.42, ω is feasible. So, we only need to prove the converse. To this end, let g_k(p) = SIR_k(p)/γ_k, k ∈ K. By Observation 5.9, we can confine our attention to positive power vectors in P₊. For any p ∈ P₊, we define K(p) := {k ∈ K : g_k(p) = min_{l∈K} g_l(p)} and K^c(p) = K \ K(p). Since multiplying the aggregate utility function by any positive constant has no impact on the set of maximizers, without loss of generality, we can assume that $\sum_{k\in K(p)} w_k = 1$. Moreover, for brevity, we consider ψ_α(x) = −Ψ_α(x), x > 0, and f_α(p) = −F_α(p). So, by strict monotonicity of ψ_α and the fact that w_k ψ_α(g_k(p)) > 0 for every k ∈ K, p ∈ P₊, and α ≥ 2, we have (for any p ∈ P₊ and with $\sum_{k\in K(p)} w_k = 1$)

$$R_\alpha(p) = \frac{f_\alpha(p)}{\psi_\alpha(g_{k_0}(p))} = 1 + \sum_{k\in K^c(p)} w_k \left(\frac{g_{k_0}(p)}{g_k(p)}\right)^{\alpha-1} \ge 1$$

where k_0 ∈ K(p) is arbitrary. As g_{k_0}(p) < g_k(p) for all k ∈ K^c(p), we see that R_α(p) → 1 as α → ∞ for any p ∈ P₊. In particular, we have R_α(p̄) → 1 as α → ∞ for any max-min SIR power allocation p̄ given by (5.69). On the other hand, we have

$$R_\alpha(\bar p) = \frac{f_\alpha(\bar p)}{\max_{k\in K} \psi_\alpha(g_k(\bar p))} \ge \frac{\min_{p\in P_+} f_\alpha(p)}{\max_{k\in K} \psi_\alpha(g_k(\bar p))} = \frac{f_\alpha(p(\alpha))}{\min_{p\in P_+} \max_{k\in K} \psi_\alpha(g_k(p))} = \frac{\max_{k\in K} \psi_\alpha(g_k(p(\alpha)))\, R_\alpha(p(\alpha))}{\min_{p\in P_+} \max_{k\in K} \psi_\alpha(g_k(p))} \ge R_\alpha(p(\alpha)) \ge 1$$

where we used the fact that p̄ minimizes max_{k∈K} ψ_α(g_k(p)) over P₊. Combining this with lim_{α→∞} R_α(p̄) = 1 shows that

$$\frac{\max_{k\in K} \psi_\alpha(g_k(p(\alpha)))}{\max_{k\in K} \psi_\alpha(g_k(\bar p))} \to 1 \quad \text{as } \alpha \to \infty\,.$$

Hence, there is p̄ given by (5.100) such that ‖p(α) − p̄‖₂ → 0 as α → ∞. This and the continuity of ℝ₊^K → ℝ₊ : p ↦ 1/C(ω, p) with C(ω, p) > 0 defined by (5.70) (Observation 5.47) imply lim_{α→∞} |1/C(ω, p̄) − 1/C(ω, p(α))| = 0. This in turn implies that for every ε > 0, there exists α(ε) ≥ 2 such that |1/C(ω, p̄) − 1/C(ω, p(α))| = |1/C(ω) − 1/C(ω, p(α))| ≤ ε for all α ≥ α(ε), where we used (5.69). Thus, 1/C(ω, p(α)) = min_{k∈K} SIR_k(p(α))/γ_k ≥ 1/C(ω) − ε for all ε > 0 and α ≥ α(ε), and the theorem follows from Observation 5.48.
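This approximation behaviour can be observed on a toy instance. The sketch below is our own illustration — the network data, the α-fair choice ψ_α(x) = x^{1−α}/(α − 1), individual limits p_k ≤ 1, γ = 1 and a brute-force grid minimizer are all assumptions. It compares min_k SIR_k(p(α)) with the max-min level as α grows.

```python
import math

V = [[0.0, 0.5], [0.4, 0.0]]
z = [0.1, 0.1]  # gamma = (1, 1) assumed, so g_k = SIR_k

def sir(p):
    return [p[k] / (sum(V[k][l] * p[l] for l in range(2)) + z[k]) for k in range(2)]

def f_alpha(p, alpha):
    """f_alpha(p) = sum_k psi_alpha(SIR_k(p)), psi_alpha(x) = x^(1-alpha)/(alpha-1)."""
    return sum(x ** (1.0 - alpha) / (alpha - 1.0) for x in sir(p))

def argmin_on_grid(alpha, n=200):
    best, best_p = float("inf"), None
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            p = (i / n, j / n)  # individual power limits p_k <= 1
            v = f_alpha(p, alpha)
            if v < best:
                best, best_p = v, p
    return best_p

def p_of(t):
    """p(t) = t(I - tV)^{-1} z in closed form for K = 2."""
    p1 = (t * z[0] + t * t * V[0][1] * z[1]) / (1.0 - t * t * V[0][1] * V[1][0])
    return [p1, t * (V[1][0] * p1 + z[1])]

lo, hi = 0.0, 0.999 / math.sqrt(V[0][1] * V[1][0])
for _ in range(100):  # bisect for the max-min level: largest t with p(t) <= 1
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if max(p_of(mid)) < 1.0 else (lo, mid)
t_star = lo
mins = {a: min(sir(argmin_on_grid(a))) for a in (2, 16)}
print(t_star, mins)  # min SIR of p(alpha) approaches t_star from below
```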
Proof of Theorem 5.82

Without loss of generality, the weight vectors a and b are assumed to be a = b = 1. Furthermore, assume that α ≥ 2. First, we prove (5.155) by contradiction. Thus, suppose that there are ε₁ ∈ (0, 1) and A₀ ⊆ A such that 0 < a := max_{k∈A₀} SIR_k(p̃(α))/γ_k < 1 − ε₁ for all α ≥ 2. One then obtains

$$\tilde F_\alpha(\tilde p(\alpha)) \le |A_0|\,\Psi_\alpha(a) + \sum_{k\in A\setminus A_0} \Psi_\alpha\!\left(\frac{\mathrm{SIR}_k(\tilde p(\alpha))}{\gamma_k}\right) + \sum_{k\in B} \Psi\big(\mathrm{SIR}_k(\tilde p(\alpha))\big) = |A_0|\,\Psi_\alpha(a) + \phi(\tilde p(\alpha)) < |A_0|\,\Psi_\alpha(1-\varepsilon_1) + \phi(\tilde p(\alpha)), \quad \alpha \ge 2 \tag{5.188}$$

where φ(p̃(α)) denotes the sum of the last two terms, the inequalities are due to the strict increasingness of Ψ_α, and φ(p̃(α)) is bounded for all α ≥ 2 as P₊°(ω) ≠ ∅ (by Lemma 5.81). Now observe that, as α → ∞, we have Ψ_α(x) → −∞, x ∈ (0, 1), and Ψ_α(x) → 0, x ≥ 1. Moreover, F̃_α(p) is bounded below for any p ∈ P₊°(ω) by a constant that is independent of α. This together with (5.188) implies that

$$\tilde F_\alpha(\tilde p(\alpha)) < |A_0|\,\Psi_\alpha(1-\varepsilon_1) + \phi(\tilde p(\alpha)) \le \tilde F_\alpha(p), \quad p \in P_+^\circ(\omega) \neq \emptyset$$

for sufficiently large values of α. This, however, contradicts the fact that p̃(α) maximizes F̃_α(p) over P₊ ⊂ P (with P₊°(ω) ⊂ P₊), and thus proves (5.155). In order to prove (5.154), we first note that this bound trivially holds if P°(ω) = ∅, but also if P°(ω) ≠ ∅ with P₊°(ω) = ∅, since p̃(α) is a positive vector. So, in what follows, assume that P₊°(ω) ≠ ∅. By Observation 5.79, the Kuhn–Tucker conditions are necessary and sufficient for p̃(α) to be a global maximizer of F̃_α(p). So, for any fixed α ≥ 2, it follows from (5.42) that

$$u(\alpha) = \big(I - V^T D(\alpha)\big)^{-1} t(\alpha), \qquad \tilde p(\alpha) = \big(I - D(\alpha)V\big)^{-1} D(\alpha)\,z \in P_+ \subset P\,. \tag{5.189}$$

The notation in (5.189) is defined as follows:

(a) D(α) := D(p̃(α)) = diag(d₁(α), . . . , d_K(α)) where

$$d_k(\alpha) = \mathrm{SIR}_k\big(\tilde p(\alpha)\big), \quad k \in K\,. \tag{5.190}$$

Note that (5.153) implies that d_k(α) > 0 for each k ∈ K regardless of the choice of α ≥ 2, and hence D(α) is positive definite for all α ≥ 2.

(b) u(α) := u(p̃(α)) = (u₁(α), . . . , u_K(α)) > 0 with

$$u_k(\alpha) = \begin{cases} \phi_{\alpha,k}(\tilde p(\alpha)) & k \in A\setminus B \\ \phi_{\alpha,k}(\tilde p(\alpha)) + \theta_k(\tilde p(\alpha)) & k \in A\cap B \\ \theta_k(\tilde p(\alpha)) & k \in B\setminus A \end{cases} \tag{5.191}$$

where
$$\phi_{\alpha,k}(p) = \left(\frac{\gamma_k}{\mathrm{SIR}_k(p)}\right)^{\alpha-1} > 0, \quad \alpha \ge 2$$
$$\theta_k(p) = \Psi'\big(\mathrm{SIR}_k(p)\big)/I_k(p) > 0\,.$$

(c) t(α) := t(p̃(α)) = (t₁(α), . . . , t_K(α)) ≥ 0 depends on the Lagrange multipliers and results from the power constraints p ∈ P₊ (inequality constraints). Note that we must have t(α) ≠ 0, since otherwise there would be no positive solution u(α) > 0 to (5.189), regardless of the choice of V. However, by Theorem A.52 in the appendix, we see that there exists a positive vector u(α) satisfying (5.189) if and only if ρ(D(α)V) = ρ(V^T D(α)) < 1, which is satisfied due to the second equality in (5.189) and the existence of p̃(α) > 0.

Since D(α) is positive definite for all α ≥ 2 and V is irreducible, the matrix V^T D(α), α ≥ 2, is irreducible as well. Let y(α) and x(α) with y(α)^T x(α) = 1 be positive left and right eigenvectors of V^T D(α) associated with ρ(V^T D(α)) > 0, respectively. Then, by Theorem A.56 in the appendix and the fact that ρ(V^T D(α)) = ρ(D(α)V), the first equality of (5.189) can be written as
1 Z(α)t(α) + R(α)t(α) 1 − ρ(D(α)V)
(5.192)
where Z(α) = x(α)y(α)T is positive and R(α) ∈ RK×K follows from (A.46). ˜ (α) ∈ P+ belongs to a bounded Since, for all α ≥ 2, ρ(VT D(α)) < 1 and p subset of RK ++ , it follows from the second equality in (5.189) and Theorem A.60 that there exists a constant c1 ∈ (0, 1) independent of α such that 1 ≤ 1/(1 − ρ(D(α)V)) ≤ 1/c1 < +∞ for all α ≥ 2. This and (A.46) further show that R(α) is bounded (in any matrix norm) for all α ≥ 2. From this and (5.192), we then have u(α)1 ≤ Z(α)t(α)1 /c1 + R(α)t(α)1 . By (5.153) and (5.191), the left-hand side is bounded (consider uk (α) for any k ∈ B = ∅). So, by the discussion above and the fact that induced matrix norms are compatible with their underlying vector norms, one obtains t(α)1 ≥
u(α)1 ≥ c2 > 0 . Z(α)1 /c1 + R(α)1
Thus, ‖t(α)‖₁ is bounded away from zero for all α ≥ 2. On the other hand, considering the Neumann series $u(\alpha) = (I - V^T D(\alpha))^{-1} t(\alpha) = \sum_{j=0}^{\infty} (V^T D(\alpha))^j\, t(\alpha)$, we obtain

$$u_k(\alpha) = \sum_{j=0}^{\infty} \sum_{l\in K} a_{k,l}^{(j)}(\alpha)\, t_l(\alpha) \tag{5.193}$$
where $a_{k,l}^{(j)}(\alpha) = (A^j(\alpha))_{k,l}$ and $A(\alpha) = (a_{k,l}(\alpha)) = V^T D(\alpha)$. Now assume that there exist k₀ ∈ A \ B and ε₂ > 0 such that SIR_{k₀}(p̃(α))/γ_{k₀} > 1 + ε₂ for all α ≥ 2, which contradicts (5.154). This is equivalent to saying that

$$\forall_{\alpha\ge 2}\quad 0 < b(\alpha) := \gamma_{k_0}/\mathrm{SIR}_{k_0}(\tilde p(\alpha)) < 1 - \varepsilon_3, \quad k_0 \in A\setminus B$$

for some ε₃ = ε₂/(1 + ε₂) ∈ (0, 1). Consequently, if α goes to infinity, (5.191) implies that the left-hand side of (5.193) with k = k₀ ∈ A \ B tends to zero. However, the right-hand side of (5.193) is bounded away from zero by some constant for all α ≥ 2, and hence (5.154) must hold. To see that the right-hand side of (5.193) cannot be arbitrarily close to zero, let $t_{l_0}(\alpha) = \max_l t_l(\alpha) \ge \|t(\alpha)\|_1/K \ge c_3 := c_2/K > 0$ and note that since V^T D(α) is irreducible, Lemma A.28 implies that, for each pair (k, l), 1 ≤ l, k ≤ K, there is a finite j = j(k, l) ≥ 0 such that $a_{k,l}^{(j)}(\alpha) > 0$. Moreover, since the SIRs cannot be arbitrarily small in the minimum due to Lemma 5.81, there exist some j ≥ 0 and c₄ > 0 such that $a_{k_0,l_0}^{(j)}(\alpha) \ge c_4$, from which we have $u_{k_0}(\alpha) \ge c_3 \cdot c_4 > 0$ for all α ≥ 2. This completes the proof.
6 Power Control Algorithms
6.1 Introduction

This chapter presents algorithmic solutions to the power control problems stated in the previous chapter. We primarily focus on utility-based power control algorithms with and without QoS support. First, we consider recursive gradient-based algorithms with a constant step size [160, 16]. Although much more powerful algorithms can be devised to solve the problem, such methods are of great interest in practice because of their simplicity. The significance of simple iterative algorithms that allow an efficient distributed implementation cannot be emphasized enough in the case of wireless networks, where a judicious assessment of the complexity–performance trade-off is particularly important. Given the limited and costly nature of wireless resources, minimizing the control message overhead for each iteration step must be a high priority. In the case of gradient-based algorithms, one of the major challenges is the computation of the gradient vector of the aggregate utility function in a distributed manner. In general, due to the mutual dependence of links, this computation involves coordination and exchange of global information between all network nodes. Therefore, the use of classical flooding protocols to exchange this information results in a relatively high cost in terms of wireless resources (see also Sect. 6.6.4). In this book, we present a scheme based on the use of a so-called adjoint network to efficiently distribute some locally measurable quantities to all other transmitters. A network is said to be adjoint to a given (primal) network with gain matrix V if it has the same network topology and its gain matrix is V^T. The overall scheme may be referred to as cooperative flooding as, in a broad sense, nodes cooperate by transmitting their local information to other nodes.
More precisely, instead of each node sending its message separately as in the case of classical flooding protocols, nodes cooperate by transmitting simultaneously over the adjoint network in such a way that each node can estimate its gradient component based on some local measurements.

S. Stanczak et al., Fundamentals of Resource Allocation in Wireless Networks, Foundations in Signal Processing, Communications and Networking 3, 2nd edn., © Springer-Verlag Berlin Heidelberg 2009. DOI 10.1007/978-3-540-79386-1_6
Gradient projection methods belong to the so-called primal optimization methods. These methods work purely on the optimization variable itself. In contrast, iterative optimization methods that operate on both the primal and dual variables of an associated Lagrangian function are referred to as primal-dual methods. Sect. 6.7.1 deals with a primal-dual algorithm for finding a stationary point of some Lagrangian function for a utility-based power control problem with QoS support. As mentioned in Sect. 5.7.1, the main reason for considering a primal-dual algorithm here is to alleviate an inherent projection problem so that the scheme is amenable to distributed implementation. However, the potential of the Lagrangian optimization approach, with its key concept of Lagrangian duality and the possibility of incorporating the dual variables into the iteration, is not fully exploited until Sect. 6.8, where we focus on two primal-dual power control iterations that rely on a certain construction of a generalized nonlinear Lagrangian function. The proposed Lagrangian and the related algorithmic solutions fall into a large framework of generalized Lagrangian optimization. In particular, we will show how primal-dual methods can provide quadratic convergence while still allowing for efficient implementation in decentralized wireless networks. The distributed implementation of the considered primal-dual algorithms relies, again, on the concept of the adjoint network, as in the case of the gradient projection methods.
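The role of the adjoint (transposed-gain) network in gradient computation can be made concrete. In the sketch below — our own illustration with the example utility ψ(x) = −log x; the network data and helper names are assumptions — each link k combines a locally computable term with the component (V^T s)_k, where s collects quantities that the other links can feed back through the adjoint network; the result matches a finite-difference check.

```python
import math

V = [[0.0, 0.2, 0.1],
     [0.3, 0.0, 0.2],
     [0.1, 0.2, 0.0]]
z = [0.1, 0.2, 0.1]
w = [1.0, 2.0, 1.0]

def psi(x):  return -math.log(x)   # example utility, psi = -Psi
def dpsi(x): return -1.0 / x

def I(p, k):
    return sum(V[k][l] * p[l] for l in range(3)) + z[k]

def F(p):
    return sum(w[k] * psi(p[k] / I(p, k)) for k in range(3))

def grad(p):
    g = [p[k] / I(p, k) for k in range(3)]
    # s_l: the quantity link l would broadcast over the adjoint network (gain V^T)
    s = [-w[l] * dpsi(g[l]) * g[l] / I(p, l) for l in range(3)]
    return [w[k] * dpsi(g[k]) / I(p, k) + sum(V[l][k] * s[l] for l in range(3))
            for k in range(3)]

p, h = [0.5, 0.8, 0.6], 1e-6
num = []
for k in range(3):  # central finite differences as a sanity check
    up, dn = p[:], p[:]
    up[k] += h
    dn[k] -= h
    num.append((F(up) - F(dn)) / (2 * h))
print(grad(p), num)  # analytic and numeric gradients agree
```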
6.2 Some Basic Definitions

Throughout this chapter, the utility function is assumed to have the form given by (5.23), where Ψ : R_{++} → Q fulfills (C.5-2)–(C.5-4). The weight vector w is any fixed vector satisfying (C.5-1) (positivity). As before, the transmit powers are assumed to be subject to individual power constraints on each node (4.18). In order to conform to the classical formulation as a minimization problem, we define ψ : R_{++} → Q to be (see also (5.48))¹

ψ(x) := −Ψ(x) ,  x ∈ R_{++} .  (6.1)

Thus, using

F(p) := Σ_{k∈K} w_k ψ(SIR_k(p)) ,  w_k > 0 for all k ∈ K ,  (6.2)
with the signal-to-interference ratio (SIR) defined by (4.4), the power control problem in (5.31) can be rewritten in an equivalent form as

p* = arg min_{p∈P} F(p) .  (6.3)
Note that due to Lemma 5.12 and ψ(x) = −Ψ(x), the minimum exists. Throughout this chapter, it is assumed that

trace(V) = 0  (6.4)

¹ The reader is also referred to Sect. 5.3 for definitions and other interpretations of the functions Ψ(x) and ψ(x) = −Ψ(x).
which implies that there is no self-interference on each link (see also Sect. 5.4). However, we point out that this requirement does not impact the generality of the analysis and could easily be dropped. It is convenient to define the interference function I(p) > 0 as

I_k(p) := (Vp + z)_k = Σ_{l∈K} v_{k,l} p_l + z_k = Σ_{l∈K_k} v_{k,l} p_l + z_k ,  k ∈ K  (6.5)

where the last equality follows from (6.4). Hence,

SIR_k(p) = p_k / I_k(p) .  (6.6)
For completeness, the list below summarizes key properties of the function ψ, which are an immediate consequence of (6.1) and (C.5-2)–(C.5-4).

(C.6-1) ψ : R_{++} → Q is a twice continuously differentiable and strictly decreasing function.
(C.6-2) We have

lim_{x→0} ψ(x) = +∞  ⟹  lim_{x→0} ψ'(x) = lim_{x→0} (dψ/dx)(x) = −∞ .  (6.7)

This requirement guarantees that p* given by (6.3) is positive.
(C.6-3) ψ_e(x) := ψ(e^x) is convex on R. Since ψ is twice continuously differentiable and e^x > 0 is strictly monotonic on R, Theorem B.24 implies that this is equivalent to

ψ_e''(x) = (d²ψ_e/dx²)(x) ≥ 0 ,  x ∈ R .  (6.8)

It is worth pointing out that the last condition implies that ψ is strictly convex. To see this, let x̂, x̌ ∈ R with x̂ ≠ x̌ be arbitrary, and let x(μ) = (1 − μ)x̂ + μx̌, μ ∈ (0, 1), be their convex combination. By convexity of ψ_e, we have

ψ_e(x(μ)) = ψ(e^{x(μ)}) ≤ (1 − μ)ψ(e^{x̂}) + μψ(e^{x̌})

for all μ ∈ [0, 1]. On the other hand, it is a well-known fact that the arithmetic mean bounds the geometric mean from above (B.18). Hence, we have

e^{x(μ)} = (e^{x̂})^{1−μ} (e^{x̌})^{μ} ≤ (1 − μ) e^{x̂} + μ e^{x̌}  (6.9)

for all μ ∈ (0, 1). Equality holds if and only if e^{x̂} = e^{x̌} or, equivalently, if and only if x̂ = x̌. Thus, combining this and (6.9) with the previous inequality, as well as taking into account that ψ is strictly decreasing, yields

ψ((1 − μ)ẑ + μž) < (1 − μ)ψ(ẑ) + μψ(ž) ,  ẑ, ž > 0, ẑ ≠ ž

for all μ ∈ (0, 1), where we used ẑ = e^{x̂} > 0 and ž = e^{x̌} > 0. This proves the claim. This observation, however, should not tempt the reader to conclude that F(p) is a convex function of p.
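The chain of implications above is easy to check numerically. The following sketch is a made-up example: the specific choice ψ(x) = 1/x is only for illustration (it is strictly decreasing, and ψ_e(x) = e^{−x} is convex on R); it verifies the strict convexity inequality for ψ and the convexity inequality for ψ_e at a sample pair of points:

```python
import math

# Hypothetical utility choice for illustration: psi(x) = 1/x, so
# psi_e(x) = psi(e^x) = e^(-x), which is convex on R.
psi = lambda x: 1.0 / x
psi_e = lambda x: psi(math.exp(x))

# Strict convexity of psi, as derived via the AM-GM argument in the text:
z_hat, z_check, mu = 0.5, 4.0, 0.3
lhs = psi((1 - mu) * z_hat + mu * z_check)
rhs = (1 - mu) * psi(z_hat) + mu * psi(z_check)
assert lhs < rhs  # psi((1-mu)z^ + mu z') < (1-mu)psi(z^) + mu psi(z')

# Convexity of psi_e along the same points in the log domain:
x_hat, x_check = math.log(z_hat), math.log(z_check)
mid = (1 - mu) * x_hat + mu * x_check
assert psi_e(mid) <= (1 - mu) * psi_e(x_hat) + mu * psi_e(x_check)
```

Note that the converse direction fails: strict convexity of ψ does not imply convexity of ψ_e, which is why (C.6-3) is stated as a separate requirement.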
Remark 6.1. We point out that all the results presented in this chapter can be straightforwardly extended to the case where different links are assigned different utility functions ψk : R++ → Qk ⊆ R, provided that each ψk satisfies (C.6-1)–(C.6-3).
6.3 Convex Statement of the Problem

The main objective of this section is to show that the power control problem in (6.3) can be transformed into a convex problem, provided that (C.6-1)–(C.6-3) are fulfilled. A key ingredient in this formulation is the fact that SIR_k(e^s) is a log-concave function of the logarithmic power vector [161]

s := log(p) ,  p ∈ P_+ = P ∩ R^K_{++}  (6.10)

where the logarithm is taken elementwise. There is no loss of generality in assuming that p ∈ P_+ (a positive vector) since, in the minimum, every link must be assigned a positive transmit power (see (6.7) and Observation 5.9). Thus, we have

min_{p∈P} F(p) = inf_{p∈P_+} F(p) = min_{p∈P_+} F(p) .

By strict monotonicity of the logarithm function, we see that every p ∈ P_+ is associated with a unique s ∈ S, where S ⊂ R^K is the set of all admissible logarithmic transmit powers, that is,

S := {s ∈ R^K : s = log(p), p ∈ P_+} = {s ∈ R^K : e^s ≤ p for some p ∈ P}  (6.11)

where the last equality is due to the fact that P (and hence also S) is downward comprehensive (Remark 4.13). Following the terminology used for P, we refer to S as the admissible power region. Consequently, if F(p) attains its minimum on P_+ at some p*, then, equivalently,

F_e(s) := F(e^s) = Σ_{k∈K} w_k ψ(SIR_k(e^s))  (6.12)
attains its minimum on S for s* = log(p*).

Lemma 6.2. Let s(μ) := (1 − μ)ŝ + μš, μ ∈ (0, 1). Then,

SIR_k(e^{s(μ)}) ≥ SIR_k(e^{ŝ})^{1−μ} SIR_k(e^{š})^{μ} ,  1 ≤ k ≤ K  (6.13)

for all ŝ, š ∈ R^K and μ ∈ [0, 1].
Proof. By Hölder's inequality (Theorem A.4),

I_k(e^{s(μ)}) = Σ_{l∈K_k} v_{k,l} e^{s_l(μ)} + z_k = Σ_{l∈K_k} (v_{k,l} e^{ŝ_l})^{1−μ} (v_{k,l} e^{š_l})^{μ} + z_k^{1−μ} z_k^{μ}
 ≤ ( Σ_{l∈K_k} v_{k,l} e^{ŝ_l} + z_k )^{1−μ} ( Σ_{l∈K_k} v_{k,l} e^{š_l} + z_k )^{μ} = I_k(e^{ŝ})^{1−μ} I_k(e^{š})^{μ}

for all μ ∈ [0, 1]. Thus, considering (6.6) yields

SIR_k(e^{s(μ)}) ≥ e^{(1−μ)ŝ_k + μš_k} / ( I_k(e^{ŝ})^{1−μ} I_k(e^{š})^{μ} ) = (e^{ŝ_k})^{1−μ} (e^{š_k})^{μ} / ( I_k(e^{ŝ})^{1−μ} I_k(e^{š})^{μ} ) = SIR_k(e^{ŝ})^{1−μ} SIR_k(e^{š})^{μ}

which completes the proof.

An immediate consequence of Lemma 6.2 is that the logarithmic SIR

h_k(s) := log(SIR_k(e^s)) ,  1 ≤ k ≤ K  (6.14)

is a concave function of s ∈ S. Now we use this result to show that F_e(s) is convex on R^K [127, 128].

Theorem 6.3. F_e(s) is convex on R^K, i.e., we have

F_e(s(μ)) ≤ (1 − μ)F_e(ŝ) + μF_e(š)  (6.15)

for all ŝ, š ∈ R^K and μ ∈ (0, 1).

Proof. Let ŝ, š ∈ R^K with ŝ ≠ š be arbitrary. For all μ ∈ (0, 1), we have

F_e(s(μ)) = Σ_{k∈K} w_k ψ(SIR_k(e^{s(μ)}))
 (a)≤ Σ_{k∈K} w_k ψ( SIR_k(e^{ŝ})^{1−μ} SIR_k(e^{š})^{μ} )
 = Σ_{k∈K} w_k ψ( e^{(1−μ)h_k(ŝ) + μh_k(š)} )
 = Σ_{k∈K} w_k ψ_e( (1 − μ)h_k(ŝ) + μh_k(š) )  (6.16)
 (b)≤ Σ_{k∈K} w_k [ (1 − μ)ψ_e(h_k(ŝ)) + μψ_e(h_k(š)) ]
 = (1 − μ)F_e(ŝ) + μF_e(š) .

While inequality (a) follows from Lemma 6.2 and strict monotonicity of ψ (the function is strictly decreasing), inequality (b) is due to convexity of ψ_e(x) = ψ(e^x).
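Both Lemma 6.2 and Theorem 6.3 can be checked numerically. The sketch below uses a small made-up 3-link example (the values of V, z, and w are arbitrary, and ψ(x) = −log(x) is just one admissible choice of utility) and verifies (6.13) and (6.15) at sample points:

```python
import math

# Illustrative (made-up) 3-link network: gain matrix V with zero trace, noise z, weights w.
V = [[0.0, 0.1, 0.2], [0.3, 0.0, 0.1], [0.2, 0.1, 0.0]]
z = [0.1, 0.1, 0.1]
w = [1.0, 2.0, 0.5]
psi = lambda x: -math.log(x)  # psi = -Psi with Psi(x) = log(x)

def sir(s, k):  # SIR_k(e^s), with p = e^s taken elementwise
    I_k = sum(V[k][l] * math.exp(s[l]) for l in range(3)) + z[k]
    return math.exp(s[k]) / I_k

def F_e(s):  # cf. (6.12)
    return sum(w[k] * psi(sir(s, k)) for k in range(3))

s_hat, s_check, mu = [0.2, -1.0, 0.5], [-0.3, 0.7, -0.8], 0.4
s_mu = [(1 - mu) * a + mu * b for a, b in zip(s_hat, s_check)]

# Lemma 6.2, (6.13): SIR_k(e^{s(mu)}) >= SIR_k(e^s^)^{1-mu} * SIR_k(e^s')^mu
for k in range(3):
    assert sir(s_mu, k) >= sir(s_hat, k) ** (1 - mu) * sir(s_check, k) ** mu - 1e-12

# Theorem 6.3, (6.15): F_e(s(mu)) <= (1-mu)F_e(s^) + mu*F_e(s')
assert F_e(s_mu) <= (1 - mu) * F_e(s_hat) + mu * F_e(s_check) + 1e-12
```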
So, F_e is convex on S ⊂ R^K, and therefore, with the (obvious) convexity of S, we arrive at an equivalent convex formulation of the power control problem in (6.3).

Corollary 6.4. Suppose that (C.6-1)–(C.6-3) hold, and let s = log(p) be the logarithmic power vector. Then, the power control problem

s* = arg min_{s∈S} F(e^s) = arg min_{s∈S} F_e(s)  (6.17)

is a convex optimization problem.

These results establish a strong connection to the results in the first part of the book. Indeed, by Theorem B.36, we see that ψ_e is convex on R (as required by (C.6-3)) if and only if γ : Q → R_{++} with γ(ψ(x)) = x, x > 0, is log-convex. In other words, if the inverse function of ψ is log-convex, then the power control problem can be transformed into a convex optimization problem. By Sect. 5.3, the log-convexity property implies that the feasible QoS region is a convex set.
6.4 Strong Convexity Conditions

In this section, we strengthen Theorem 6.3 by proving sufficient conditions for strong convexity of F_e(s). For more information about strong convexity, the reader is referred to Sect. B.2.1. The main motivation behind the analysis is to ensure a linear convergence of the algorithm (Sect. 6.5.2).

First we are going to show that if ψ_e : R → R is strongly convex on any bounded interval on the real line, then F_e is strongly convex on any bounded convex set S̄ ⊂ R^K. To this end, let I ⊂ R be a bounded interval chosen such that h_k(s) ∈ I for each k ∈ K and all s ∈ S̄. Since h_k : R^K → R is continuous, it is clear that such an interval exists. Therefore, by the assumption (see also Definition B.25), there exists some constant c > 0 such that

ψ_e( (1 − μ)h_k(ŝ) + μh_k(š) ) ≤ (1 − μ)ψ_e(h_k(ŝ)) + μψ_e(h_k(š)) − (c/2) μ(1 − μ) ( h_k(ŝ) − h_k(š) )² ,  1 ≤ k ≤ K,

for all μ ∈ (0, 1) and ŝ, š ∈ S̄. Incorporating this into inequality (b) in (6.16) yields

F_e(s(μ)) ≤ (1 − μ)F_e(ŝ) + μF_e(š) − (c/2) μ(1 − μ) ‖h(ŝ) − h(š)‖²_W

for all μ ∈ (0, 1) and ŝ, š ∈ S̄, where h(s) := (h_1(s), . . . , h_K(s)) is the vector of the logarithmic SIRs, W = diag(w_1, . . . , w_K) is positive definite, and ‖u‖²_W = uᵀWu. Since W is positive definite and all norms are equivalent on finite-dimensional metric spaces (Theorem A.3 in the appendix), we deduce that there exists a constant c₁ > 0 such that

F_e(s(μ)) ≤ (1 − μ)F_e(ŝ) + μF_e(š) − (c₁/2) μ(1 − μ) ‖h(ŝ) − h(š)‖²₂

for all μ ∈ (0, 1) and ŝ, š ∈ S̄. Now note that h : R^K → R^K is a bijection. This immediately follows from the fact that R_{++} → R : x → log(x) is a bijection
and p(ω) defined by (5.54) is a bijection, provided that z_k > 0 for each k ∈ K. Moreover, h(s) is Lipschitz continuous on S̄ (Definition B.46) since the Jacobian matrix of h(s) is bounded in the matrix 2-norm on the bounded set S̄. Therefore, as h is a bijection, it is actually bi-Lipschitz continuous, implying that there exists a constant 0 < M < +∞ with

(1/M) ‖ŝ − š‖₂ ≤ ‖h(ŝ) − h(š)‖₂ ≤ M ‖ŝ − š‖₂

for all ŝ, š ∈ S̄. Combining this with the inequality above implies that there exists a constant c₂ > 0 such that

F_e(s(μ)) ≤ (1 − μ)F_e(ŝ) + μF_e(š) − (c₂/2) μ(1 − μ) ‖ŝ − š‖²₂

for all μ ∈ (0, 1) and ŝ, š ∈ S̄. We summarize these observations in a lemma.

Lemma 6.5. Let (C.6-1)–(C.6-3) be satisfied, let z_k > 0, k ∈ K, and let V be an arbitrary nonnegative matrix. In addition, suppose that, for any bounded interval I ⊂ R, there exists a constant c > 0 (dependent on I) such that ψ_e(x) − (c/2)x² is convex on I. Then, F_e(s) is strongly convex on any bounded convex subset of R^K.

Proof. The lemma follows from the discussion above and Observation B.26, which says that a continuous function f : R^K → R is strongly convex (with modulus of strong convexity c) if and only if f(x) − (c/2)‖x‖²₂ is convex.

Since ψ is assumed to be twice continuously differentiable, the requirement on strong convexity of ψ_e is equivalent to (Theorem B.28)

c ≤ ψ_e''(x) = (d²ψ_e/dx²)(x) = e^x ( ψ''(e^x) e^x + ψ'(e^x) ) ,  x ∈ I ⊂ R .  (6.18)

If Ψ : R_{++} → Q is given by (5.28) with α > 1, then (6.18) is satisfied by ψ(x) = −Ψ(x). Indeed, taking the second derivative of ψ_e(x) gives ψ_e''(x) = (α − 1)e^{x(1−α)}, α > 1, which is positive and bounded away from zero on any bounded interval I ⊂ R. The strong convexity condition is also satisfied by ψ(x) = −Ψ(x) with Ψ given by (5.29), since then ψ_e''(x) = (α − 1)e^x/(1 + e^x)^α, α ≥ 2. In contrast, the requirement is not met when Ψ(x) = log(x), x > 0, in which case the second derivative of ψ_e(x) = −x is identically zero on R.

Note that, in the lemma above, there are no additional limitations on the choice of the gain matrix V ≥ 0. In the extreme case, V may even be the zero matrix, in which case F_e(s) = Σ_{k∈K} w_k ψ(e^{s_k}/z_k) = Σ_{k∈K} w_k ψ_e(s_k − log(z_k)). Thus,

∇²F_e(s) = diag( w_1 ψ_e''(s_1 − log(z_1)), . . . , w_K ψ_e''(s_K − log(z_K)) )

when V = 0. Now we see that if (6.18) holds and V = 0, the Hessian of F_e(s) is positive definite (∇²F_e(s) ≻ 0) on any bounded subset of R^K. In contrast, if ψ(x) = −log(x), x > 0, we obtain F_e(s) = Σ_k w_k log(z_k) − wᵀs, which is affine in s ∈ R^K, and therefore not strongly convex. As Ψ(x) = log(x), x > 0, is of great interest for wireless applications (see Sects. 5.2.3 and 5.3), we prove a sufficient condition under which the strong convexity property of F_e(s) is guaranteed on any bounded convex subset of R^K, provided that (C.6-1)–(C.6-3) are satisfied. It turns out that some very mild restrictions on the gain matrix V ≥ 0 are sufficient to reestablish the strong convexity property of Lemma 6.5, also for the logarithmic function.
Lemma 6.6. Suppose that (C.6-1)–(C.6-3) hold. Let V ≥ 0 be a matrix such that, for each l ∈ K, there exists k ≠ l with v_{k,l} > 0. Then, F_e(s) is strongly convex on any bounded convex subset of R^K.

In other words, each column of the matrix V is required to have at least one positive entry.

Proof. If ψ(x) = −log(x), x > 0, then ψ_e(x) = −x, x ∈ R. So, since ψ_e(x) is convex and strictly decreasing, it is sufficient to consider ψ_e(x) = −x. In other words, if the lemma holds for the linear function, then it holds for any function satisfying (C.6-1)–(C.6-3).

Suppose that S̄ is any bounded convex subset of R^K. Let ŝ, š ∈ S̄ with ŝ ≠ š be arbitrary. Note that the lemma has the same setup as Theorem 2.13 and R → R_{++} : x → e^x is a log-convex function. Hence, proceeding essentially as in the proof of Theorem 2.13 shows that there exists k₀ ∈ K such that

f_{k₀}(μ) := I_{k₀}(e^{s(μ)})  and  g_{k₀}(μ) := log(f_{k₀}(μ))

are strictly log-convex and strictly convex functions of μ ∈ (0, 1), respectively. In fact, g_{k₀}(μ) is strongly convex. To see this, let l₀ ∈ K with ŝ_{l₀} ≠ š_{l₀} be arbitrary, and let k₀ ∈ K, k₀ ≠ l₀, be such that v_{k₀,l₀} > 0. Note that, by assumption, such an index exists. Taking the second derivative of g_{k₀}(μ) yields

g_{k₀}''(μ) = [ Σ_l v_{k₀,l} (š_l − ŝ_l)² e^{s_l(μ)} · I_{k₀}(e^{s(μ)}) − ( Σ_l v_{k₀,l} (š_l − ŝ_l) e^{s_l(μ)} )² ] / ( I_{k₀}(e^{s(μ)}) )² .

Now it may be verified that, for any x₁, . . . , x_n ∈ R and nonnegative constants a₁, . . . , a_n, we have

Σ_i a_i x_i² Σ_j a_j − ( Σ_i a_i x_i )² = (1/2) Σ_{i,j} a_i a_j (x_i − x_j)² ≥ 0 .

Hence, as v_{k₀,l₀} > 0, z_{k₀} > 0, and ŝ, š are members of a bounded set S̄, there exists a constant c > 0 such that

g_{k₀}''(μ) ≥ z_{k₀} Σ_l v_{k₀,l} (š_l − ŝ_l)² e^{s_l(μ)} / ( I_{k₀}(e^{s(μ)}) )² ≥ c (š_{l₀} − ŝ_{l₀})²

for all μ ∈ (0, 1) and ŝ, š ∈ S̄. From this, it follows that g_{k₀}(μ) is strongly convex, and hence h_{k₀}(s(μ)) = s_{k₀}(μ) − g_{k₀}(μ) is strongly concave on S̄. This is equivalent to saying that there exists a constant c > 0 such that

h_{k₀}(s(μ)) ≥ (1 − μ)h_{k₀}(ŝ) + μh_{k₀}(š) + (c/2) μ(1 − μ) ‖š − ŝ‖²₂

for all μ ∈ (0, 1) and ŝ, š ∈ S̄. So, with ψ_e(x) = −x, we obtain

ψ_e(h_{k₀}(s(μ))) ≤ (1 − μ)ψ_e(h_{k₀}(ŝ)) + μψ_e(h_{k₀}(š)) − (c/2) μ(1 − μ) ‖ŝ − š‖²₂

for all μ ∈ (0, 1) and ŝ, š ∈ S̄. This implies that for any fixed ŝ, š ∈ S̄, there exists at least one addend in Σ_k w_k ψ_e(h_k(s(μ))) for which the inequality above is satisfied, with an appropriately chosen positive constant c > 0. From this, strong convexity of F_e(s) on any bounded convex subset of R^K follows.

Let us summarize both lemmas in a theorem.
Theorem 6.7. Let (C.6-1)–(C.6-3) be satisfied, and let z > 0. Suppose that one of the following holds.

(i) ψ_e is strongly convex on any bounded interval on the real line.
(ii) Each column of V has at least one positive entry.

Then, F_e is strongly convex on any bounded convex subset of R^K.

It is important to emphasize that in the setup of Lemma 6.6, the gain matrix is not necessarily irreducible. To illustrate the result, consider the following matrix

V = [ 0        v_{1,2}   v_{1,3}
      v_{2,1}  0         0
      0        0         0 ] ,   v_{1,2}, v_{1,3}, v_{2,1} > 0 .

The matrix is reducible and satisfies the condition of Lemma 6.6. It may be verified that with this choice of V, the Hessian of F_e(s) is positive definite on any bounded subset of R^K. The explanation is basically the same as in Sect. 2.3.3: for each l, there is a k such that v_{k,l} > 0. This implies that each link is an interferer to some other link. Thus, since the noise term is positive for all k ∈ K, it follows that for each l ∈ K, there must exist k ∈ K such that I_k(e^s) is strictly log-convex along the lth coordinate of s. This in turn implies that F_e(s) is strongly convex on any bounded convex set. However, this is no longer true if we take the transpose of the matrix above

V = [ 0        v_{1,2}   0
      v_{2,1}  0         0
      v_{3,1}  0         0 ] .

Now link 3 is exposed to interference from link 1, but it is an interferer to no other link. Consequently, there is no k such that I_k(e^s) is strictly log-convex along the third coordinate of s. Choosing ψ(x) = −log(x), x > 0, yields F_e(s) that is linear in s₃, and hence the function cannot be strongly convex.

By Theorem B.36, we know that the inverse function of ψ is strictly log-convex on Q if and only if ψ_e is strictly convex on R, which is true if ψ_e is strongly convex on any bounded interval on the real line. Therefore, Lemma 6.5 corresponds in some sense to Theorem 2.12.
In contrast, Lemma 6.6 corresponds to Theorem 2.13, which has the same setup and asserts that, for every ŝ, š ∈ F_γ, there exists k ∈ K such that p_k(ω(μ)) is a strictly log-convex function of μ ∈ [0, 1]. The reader should notice a striking analogy between these results. In fact, the proof of Lemma 6.6 is based on the proof of Theorem 2.13. The last theorem in Sect. 2.3.3 (Theorem 2.15) suggests that each addend in Σ_k w_k ψ_e(h_k(s)) would be strongly convex on any bounded convex set if V were irreducible, regardless of the choice of ψ_e for which (C.6-1)–(C.6-3) hold. Indeed, if V is irreducible, an examination of the proof of Theorem 2.15 reveals that I_k(e^s) is strictly log-convex on R^K for each k ∈ K. Therefore, proceeding essentially as in the proof of Lemma 6.6 shows that ψ_e(h_k(s)) is
strongly convex on any bounded convex subset of R^K, provided that V is an irreducible matrix. This may have a positive effect on the convergence rate of our algorithms.
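The contrast between the two example matrices discussed after Theorem 6.7 can be checked numerically for ψ(x) = −log(x). The sketch below (made-up positive entries, unit weights, arbitrary tolerances) verifies that F_e has a strictly positive second difference along every coordinate when each column of V has a positive entry, and that F_e is affine in s₃ when the third column is zero:

```python
import math

w = [1.0, 1.0, 1.0]
z = [0.1, 0.1, 0.1]

def F_e(V, s):  # F_e(s) for psi(x) = -log(x)
    total = 0.0
    for k in range(3):
        I_k = sum(V[k][l] * math.exp(s[l]) for l in range(3)) + z[k]
        total += w[k] * (math.log(I_k) - s[k])  # -log SIR_k(e^s)
    return total

def second_diff_along(V, s, coord, h=1e-4):
    # Finite-difference second derivative of F_e along one coordinate.
    sp = list(s); sp[coord] += h
    sm = list(s); sm[coord] -= h
    return (F_e(V, sp) - 2 * F_e(V, s) + F_e(V, sm)) / h**2

s = [0.3, -0.2, 0.1]
# Every column has a positive entry: curvature in every coordinate (Lemma 6.6).
V_ok = [[0.0, 0.5, 0.4], [0.6, 0.0, 0.0], [0.0, 0.0, 0.0]]
assert all(second_diff_along(V_ok, s, i) > 1e-3 for i in range(3))

# Column 3 is all-zero: link 3 interferes with no one, and F_e is affine in s_3.
V_bad = [[0.0, 0.5, 0.0], [0.6, 0.0, 0.0], [0.4, 0.0, 0.0]]
assert abs(second_diff_along(V_bad, s, 2)) < 1e-6
```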
6.5 Gradient Projection Algorithm

Under the assumption of (C.6-1)–(C.6-3), we consider the following recursive gradient projection algorithm with a constant step size δ > 0 (small enough) [160]:

s(n + 1) = Π_S[ s(n) − δ∇F_e(s(n)) ] ,  s(0) ∈ S, n ∈ N₀  (6.19)

where Π_S[x] denotes the projection of x ∈ R^K on S (with respect to the Euclidean norm; see Theorem B.45). The kth partial derivative ∇_k F_e(s) = (∇F(p))_k (see Definition B.13 and the footnote about the notation) is equal to

∇_k F_e(s) = (∂F_e/∂s_k)(s) = e^{s_k} ( g_k(e^s) − Σ_{l∈K_k} v_{l,k} SIR_l(e^s) g_l(e^s) ) ,  k ∈ K,  (6.20)

with s = log(p) and

g_k(p) = w_k ψ'(SIR_k(p)) / I_k(p) ,  k ∈ K .  (6.21)

The operation of projecting a vector in R^K on S can be easily parallelized. We will discuss this in Sect. 6.5.4. In contrast, the problem of computing ∇_k F_e(s) in parallel is anything but trivial. This problem is addressed in Sect. 6.6. In the following section, we show that the sequence {s(n)} generated by (6.19) converges to a stationary point of F_e on S (Definition B.44) that minimizes F_e(s) over S.

6.5.1 Global Convergence

Although the problem is convex, it is not obvious that the algorithm converges to a stationary point of the objective function (Definition B.44). This is because, in the general case, some step size control is necessary to obtain convergence. In view of distributed implementation, however, the step size is assumed to be constant. It is well known [160, 162] (see also Sect. B.4.2) that the gradient projection algorithm with a constant step size converges to a stationary point if each of the following is satisfied:

(i) F_e(s) is bounded below on S,
(ii) F_e(s) is continuously differentiable and the gradient ∇F_e(s) is Lipschitz continuous on S (see Definition B.46), and
(iii) 0 < δ < 2/M, where M is the Lipschitz constant. Note that this condition can be satisfied only if ∇F_e(s) is Lipschitz continuous.

Whereas the first condition is satisfied by assumption, the Lipschitz continuity condition is not met on S in general. Indeed, if we let the kth entry of ŝ = š + c ∈ S, c ∈ R, ŝ ≠ š, tend to −∞ while keeping all the other entries constant, it is easy to see from (6.20) and (6.7) that ‖∇F_e(ŝ) − ∇F_e(š)‖₂ may grow without bound.² However, the problem stems from the unboundedness of S and can be evaded by letting the step size depend on the starting point. Indeed, it is intuitive to expect that for every given s(0) ∈ S, ∇F_e(s) satisfies the Lipschitz continuity condition on

S̄ := {x ∈ S : F_e(x) ≤ F_e(s(0)) < +∞} .  (6.22)

Obviously, S̄ is bounded and, by convexity of F_e(s), a convex set for every s(0) ∈ S. As shown below, the Lipschitz continuity property is a consequence of the fact that the Hessian ∇²F_e(s) exists and is continuous on S.

Lemma 6.8. Suppose that (C.6-1)–(C.6-3) hold, s(0) ∈ S is arbitrary, and S̄ is given by (6.22). Then, ∇F_e(s) is Lipschitz continuous on S̄, that is, there exists a constant M > 0 such that

‖∇F_e(ŝ) − ∇F_e(š)‖₂ ≤ M ‖ŝ − š‖₂  (6.23)

for all ŝ, š ∈ S̄.

Proof. Let ŝ, š ∈ S̄ be arbitrary. By the twice continuous differentiability of ψ_e(x), x ∈ R, each entry of the Hessian matrix ∇²F_e(s) is a continuous function on R^K. This implies that the gradient ∇F_e : R^K → R^K is Gateaux differentiable (Definition B.14), and hence, by [148, p. 69], one has

‖∇F_e(ŝ) − ∇F_e(š)‖₂ ≤ sup_{0≤μ≤1} ‖∇²F_e(ŝ + μ(š − ŝ))‖₂ ‖ŝ − š‖₂

where ‖A‖₂ = √λ_max is the induced matrix 2-norm and λ_max is the largest eigenvalue of AᵀA (see (A.9) in Sect. A.2 for the definition of a matrix induced norm). Now, because each entry of s ∈ S̄ is bounded, it is obvious that the Hessian ∇²F_e(s) is bounded above over S̄ in the matrix 2-norm. Defining this bound as M yields (6.23).

In fact, ∇F_e(s) satisfies the Lipschitz continuity condition on every bounded subset of S and, in particular, on the convex set S̄ for any s(0) ∈ S. Thus, by Lemma 6.8 and Theorem B.48, if

0 < δ < 2/M ,  M = sup_{s∈S̄} ‖∇²F_e(s)‖₂  (6.24)
² This is not the case when ψ(x) = log(1/x), x > 0. See the brief discussion at the end of this section.
the sequence {s(n)} generated by (6.19) will stay within the set S̄ for every n ∈ N₀. Moreover, the algorithm will decrease the value of the objective function F_e, unless a stationary point s* ∈ S has been reached. This point satisfies (s − s*)ᵀ∇F_e(s*) ≥ 0 for every s ∈ S (Theorem B.43 and Definition B.44). In fact, due to the convexity of F_e(s) shown in Sect. 6.3, we can conclude that s* minimizes F_e(s) over S, and therefore p* = e^{s*} minimizes F(p) over P. Let us summarize these observations in a theorem [82, 163].

Theorem 6.9. Let (C.6-1)–(C.6-3) be fulfilled, and let {s(n)} be a sequence generated by (6.19). Then, for sufficiently small δ > 0, {s(n)} converges to a point s* ∈ S. Moreover, the point of attraction s* minimizes F_e(s) over S, and hence is given by (6.17).

Remark 6.10. It is important to emphasize that the choice of δ in (6.19) depends on the starting point s(0), which is evident from (6.24). However, this should not pose a significant problem in wireless networks, where successful transmission requires some minimum SIR at the output of each linear receiver. This information could be used to predetermine a worst-case step size that would work under any feasible scenario, at the possible expense of the convergence rate. In order to ensure some signal-to-interference ratios, nodes may start the iteration process with predefined transmit powers and interference powers not exceeding some predefined threshold. This could be achieved by limiting the number of active links with relatively small signal power gains.

The Hessian of F_e(s) can be easily calculated to give

∇²F_e(s) = Σ_{k∈K} w_k ψ_e''(h_k(s)) ∇h_k(s)∇h_k(s)ᵀ + Σ_{k∈K} w_k ψ_e'(h_k(s)) ∇²h_k(s) .  (6.25)
and A∞ is defined by (A.10). Hence, ρ(A) ≤ min{maxi j |ai,j |, maxj i |ai,j |}. Now since the Hessian matrix is symmetric, we obtain |(∇2 Fe (s))i,j | . λmax (∇2 Fe (s)) ≤ κ(s) := max i
j
Therefore, choosing 0 < δ < 2/M with M = sup_{s∈S̄} κ(s) ensures the convergence of the algorithm. When compared with (6.24), the Lipschitz constant here is significantly easier to estimate due to the simple relationship between κ(s) and the entries of the Hessian matrix.

In the special case when Ψ(x) = −ψ(x) = log(x), x > 0, an examination of (6.25) reveals that ∇²F_e(s) = −Σ_k w_k ∇²h_k(s) where

(∇²h_k(s))_{i,j} =
  [ (e^{s_i} v_{k,i})² − e^{s_i} v_{k,i} I_k(e^s) ] / I_k(e^s)² ,   i = j
  e^{s_i} v_{k,i} e^{s_j} v_{k,j} / I_k(e^s)² ,                    elsewhere .

Since I_k(e^s) ≥ z_k > 0 for all s ∈ S, the Hessian matrix is bounded above in the matrix 2-norm on S. Therefore, in this special case, the Lipschitz continuity condition is satisfied on the entire set S. This in turn implies that there is a step size δ that works for any starting point.

6.5.2 Rate of Convergence

The rate (or speed) of convergence says how fast the method approaches the point of attraction, and thus a high convergence rate is strongly desired. In this subsection, we apply the strong convexity results from Sect. 6.4 to guarantee a linear convergence of the gradient projection algorithm (6.19). General definitions of the rate of convergence can be found in App. B.4.1. Here, we focus on one special definition of convergence rate for convex problems, considered, for instance, in [162].

Remark 6.11. It is important to point out that, due to the dynamic nature of wireless networks as well as strict limitations on wireless resources, only a relatively small number of iterations can be carried out in real-world wireless networks. For this reason, the algorithms are frequently required to have a fast initial convergence. Note that this additional requirement is necessary because the initial rate of convergence is not captured by the traditional notion of convergence rate considered in this book, due to its asymptotic nature.

As far as the rate of convergence is concerned, much more powerful algorithms than (6.19) can be devised to solve the power control problem [162]. Usually, these algorithms require excessive coordination and signaling between nodes in a network. For instance, the step size often needs to be optimized in every iteration to improve the rate of convergence.
Such a step size control may require the exchange of a significant amount of information between nodes, and hence waste scarce wireless resources. Also, "traditional" Newton-like methods may provide a superlinear convergence but require the inverse of the Hessian matrix, which is prohibitive for wireless network applications. An exception here is the primal-dual approach based on a generalized Lagrangian function.

In the case of convex problems (each local optimum is a global one), the rate of convergence is often evaluated in terms of an error function e : R^K → R satisfying e(x) ≥ 0 for all x ∈ R^K and e(x) = 0 if and only if x = x*, where x* is a global optimizer which attracts the iterates [162]. A typical choice of an error function, assumed in this section, is the Euclidean distance

e(x) = ‖x − x*‖₂ .  (6.26)

As for the power control algorithm presented above, all we can guarantee is a linear (or geometric) convergence, defined as follows [162].

Definition 6.12. A sequence of real-valued vectors {x(n)} is said to converge linearly to x* if there exist constants a > 0 and β ∈ (0, 1) such that

e(x(n)) ≤ a β^n .  (6.27)

Provided that δ is sufficiently small, each iteration update of the power control algorithm in (6.19) stays within the bounded convex set S̄ defined by (6.22). At the same time, Theorem 6.7 asserts that F_e is strongly convex on any bounded subset of R^K, provided that the conditions of the theorem are satisfied. Thus, considering Theorem B.49 and Definition (6.27) yields the following corollary.

Corollary 6.13. Suppose that one of the following is satisfied:

(i) ψ_e : R → Q is strongly convex on any bounded interval in R,
(ii) each column of V has at least one positive entry.

Then, provided that δ is chosen positive and small enough, the sequence {s(n)} generated by (6.19) converges linearly to s* given by (6.17).

Linear convergence is obtained if lim sup_{n→∞} e(x(n+1))/e(x(n)) ≤ β for some β ∈ (0, 1) (see also App. B.4.1). Thus, asymptotically (n → ∞), the error is reduced at each iteration by at least the factor β ∈ (0, 1). So, a linear convergence is a fairly satisfactory rate of convergence, provided the factor β is not too close to unity. Among other things, this factor is influenced by the step size δ in the case of iteration (6.19). For this reason, it may be beneficial to determine an appropriate step size at the beginning of each frame interval.³ As already mentioned, the step size cannot be too large, since otherwise divergence will occur. On the other hand, a small step size ensures linear convergence, but the convergence in the sense of speed of descent may be very slow. To speed up the convergence, an appropriate scaling can be performed.
³ Dynamic power control may periodically adjust transmit powers to changing channel and network conditions. For more information regarding the assumed frame structure, we refer the reader to Sects. 4.3 and 5.2.1.
6.5.3 Diagonal Scaling

The rate of convergence of gradient methods depends on the condition number of ∇²F_e(s(n)), which, for a positive semidefinite matrix, is defined as the ratio of the largest eigenvalue of the Hessian to its smallest one [162]. If the Hessian is positive definite on S̄ (a linear convergence), the condition number is finite, but it can still be relatively large, causing gradient methods to converge very slowly. In such cases, the problem can often be alleviated by appropriately scaling the update direction. The scaled power control algorithm with a constant step size takes the form

s(n + 1) = Π_S^n[ s(n) − δ D(n)∇F_e(s(n)) ] ,  s(0) ∈ S  (6.28)

where D(n) is a symmetric positive definite matrix for every n. The projection in (6.28) is performed with respect to a different norm given by ‖x‖_{D(n)} = √(xᵀD(n)x). Thus, for any fixed x ∈ R^K, Π_S^n[x] is the unique vector that minimizes ‖y − x‖_{D(n)} over all y ∈ S. Ideally, D(n) should be the inverse of the Hessian matrix of F_e(s(n)), but this would require extensive global coordination and centralized computation [162]. Thus, a reasonable choice of D(n) is a matrix for which all the diagonal entries of D(n)^{1/2} ∇²F_e(s(n)) D(n)^{1/2} are approximately equal to unity. This may be achieved by a diagonal matrix D(n) whose kth diagonal element d_k(n) is given by

d_k(n) = ( (∂²F_e/∂s_k²)(s(n)) )^{−1}

where the second partial derivatives of F_e(s) follow from (6.25).

6.5.4 Projection on a Closed Convex Set

In general, gradient projection algorithms are not amenable to distributed implementation, as the computation of the projection (6.19) may involve all components of the update vector. Fortunately, the geometric structure of S makes a parallel implementation possible. First of all, Theorem B.45 asserts that the projection exists and is unique since S is a closed convex set (Definitions B.1 and B.21).
By Theorem B.45, given an arbitrary n ∈ N₀, the projection Π_S[u(n)] of the update vector u(n) = s(n) − δ∇F_e(s(n)) on S with respect to the Euclidean norm is equal to

Π_S[u(n)] = arg min_{x∈S} ‖u(n) − x‖²₂ .
|K(m)| |K(m)|
x∈R
:
k=1
xk
e
≤ Pm .
276
6 Power Control Algorithms
Here Pm is the individual power constraint on node m ∈ N . Each of these sets, say set Sm , is a closed subset of R|K(m)| with RK = R|K(1)| × · · · × R|K(N )| . Therefore, it follows that (see also [162]) the projection of u on S can be accomplished by projecting u(m) on Sm ⊂ R|K(m)| where u(m) ∈ R|K(m)| is a subvector of u such that u(m) = (uk )k∈K(m) . Obviously, the projection of u(m) on Sm can be carried out at node m ∈ N without any coordination with other nodes. In other words, each node, say node m, must solve the following problem ΠSm [u(m) ] = arg minu(m) − x22 x∈Sm
which is a standard quadratic optimization problem over a closed convex set Sm ⊂ R|K(m)| . Obviously, if u(m) ∈ Sm , then ΠSm [u(m) ] = u(m) . Finally, note that in the special case when there are individual power constraints on each link P1 , . . . , PK , then the projection of u on S is obtained by projecting the kth component of u on (−∞, log(Pk )] (the projection on a box). So, in this case, the projection is a straightforward operation.
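As a concrete sketch of the per-node projection (our own implementation; the book does not prescribe a particular solver), the KKT conditions of min ‖u − x‖² subject to Σ_k e^{x_k} ≤ P_m can be solved by a scalar Newton iteration per coordinate combined with a bisection on the Lagrange multiplier:

```python
import math

def project_onto_Sm(u, P, iters=100):
    """Euclidean projection of u onto S_m = {x : sum_k exp(x_k) <= P}.

    Sketch (ours, not the book's): the KKT condition
    x_k - u_k + lam * exp(x_k) = 0 is solved per coordinate by Newton's
    method; lam >= 0 is found by bisection so that the sum constraint holds
    with equality whenever it is active.
    """
    if sum(math.exp(t) for t in u) <= P:
        return list(u)                      # already feasible: projection is u itself

    def x_of(lam):
        out = []
        for uk in u:
            x = uk                          # f(x)=x-uk+lam*e^x is convex increasing and
            for _ in range(60):             # f(uk)>0, so Newton descends to the root
                ex = math.exp(x)
                x -= (x - uk + lam * ex) / (1.0 + lam * ex)
            out.append(x)
        return out

    constraint = lambda lam: sum(math.exp(t) for t in x_of(lam)) - P
    lo, hi = 0.0, 1.0
    while constraint(hi) > 0:               # grow hi until the constraint is satisfied
        hi *= 2.0
    for _ in range(iters):                  # bisection on the multiplier
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if constraint(mid) > 0 else (lo, mid)
    return x_of(hi)

# With per-link constraints the projection is just clipping to (-inf, log(P_k)]:
project_box = lambda u, Pks: [min(uk, math.log(Pk)) for uk, Pk in zip(u, Pks)]
```

The box projection in the last line corresponds to the simple special case mentioned above; the bisection routine handles the per-node sum-power constraint.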
6.6 Distributed Implementation

An essential advantage of the gradient projection algorithm is its amenability to efficient implementation in distributed networks. In particular, there is no need for step size control or complex operations such as matrix inversion. The projection operation can be performed without any coordination between nodes. Actually, the major problem is to parallelize the computation of ∇F_e(s) in such a way that each node, say node n, can calculate ∇_k F_e(s) = ∂F_e/∂s_k (s) for all k ∈ K(n) without resorting to extensive internode communication. The parallelization can be seen as separating the algorithm into K local algorithms operating concurrently at different transmitter–receiver pairs. In this section, we focus on the problem of a decentralized implementation of the gradient projection algorithm presented in Sect. 6.5. The ideas are, however, applicable to other algorithms and, in fact, are utilized later in Sects. 6.7 and 6.8 to develop other distributed gradient-based algorithms as well as distributed primal-dual algorithms.

6.6.1 Local and Global Parts of the Gradient Vector

Consider the nth iteration in (6.19) and assume that p = p(n) = e^{s(n)} is the nth power vector. Using s = log(p), it is easy to see that ∇F_e(s) defined by (6.20) is a version of ∇F(p) scaled with the positive definite diagonal matrix P = diag(p₁, …, p_K) = diag(e^{s₁}, …, e^{s_K}):

∇F_e(s) = P ∇F(p),  s = log(p).    (6.29)
Thus, ∇_k F_e(s) can be easily obtained from ∇_k F(p) by multiplying it with p_k = e^{s_k}. In what follows, we focus on ∇F(p). Considering (6.20) reveals that we can rewrite the gradient vector as follows:

∇F(p) = (I + Γ(p))g(p) − (I + Vᵀ)Γ(p)g(p) = η(p) − θ(p),    (6.30)

where η(p) < 0 and θ(p) < 0. Here and hereafter, η(p) = (η₁(p), …, η_K(p)), θ(p) = (θ₁(p), …, θ_K(p)), g(p) := (g₁(p), …, g_K(p)) with g_k(p) defined by (6.21), and Γ(p) := diag(SIR₁(p), …, SIR_K(p)). So the problem of computing ∇_k F(p) at the kth transmitter is equivalent to the computation of both η_k(p) and θ_k(p) at this transmitter. Let us first focus on η_k(p). The problem of computing θ(p) is deferred to the next section. It follows from (6.21) that

η_k(p) = (1 + SIR_k(p)) g_k(p) = (1 + SIR_k(p)) · w_k ψ′(SIR_k(p)) / I_k(p).    (6.31)
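The decomposition (6.30)–(6.31) can be sanity-checked numerically. In the sketch below (ours, not the book's; the concrete choice ψ(x) = −log x, the weights, the noise powers, and the interference model I_k(p) = σ_k + Σ_l v_{k,l} p_l are illustrative assumptions), the closed-form gradient g − VᵀΓg is compared with central finite differences:

```python
import numpy as np

# Numerical check (our own) of ∇F(p) = η(p) − θ(p) = g(p) − Vᵀ Γ(p) g(p)
# for F(p) = Σ_k w_k ψ(SIR_k(p)) with ψ(x) = −log(x), v_{k,k} = 0 and
# I_k(p) = σ_k + Σ_l v_{k,l} p_l (assumed model).
rng = np.random.default_rng(1)
K = 4
V = rng.uniform(0.01, 0.2, (K, K)); np.fill_diagonal(V, 0.0)
w = rng.uniform(0.5, 2.0, K)
sigma = rng.uniform(0.01, 0.1, K)

def F(p):
    I = sigma + V @ p
    return float(np.sum(w * -np.log(p / I)))

p = rng.uniform(0.5, 2.0, K)
I = sigma + V @ p
SIR = p / I
g = w * (-1.0 / SIR) / I                    # g_k = w_k ψ'(SIR_k)/I_k < 0
eta = (1.0 + SIR) * g                       # local part, cf. (6.31)
theta = SIR * g + V.T @ (SIR * g)           # global part, cf. (6.32)
grad = eta - theta                          # = g − Vᵀ Γ g

eps = 1e-6                                   # central finite differences
num = np.array([(F(p + eps * np.eye(K)[k]) - F(p - eps * np.eye(K)[k])) / (2 * eps)
                for k in range(K)])
print(np.max(np.abs(grad - num)))
```

The maximum deviation between the analytic and the numeric gradient is on the order of the finite-difference error, confirming the split into a local part η and a global part θ.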
Hence, η_k(p) can be easily calculated at a node where link k originates, provided that the signal-to-interference ratio SIR_k(p) is known at this node. In fact, knowledge of a good estimate of the signal-to-interference ratio in every iteration and for each (logical) transmitter–receiver pair is crucial for the algorithm to be implemented. In order to obtain an estimate of SIR, each transmitter may send a training sequence⁴ that is known to the respective receiver. It is important to emphasize that all transmitters must be synchronized so that the transmission can take place simultaneously on all links. Using some standard estimation method (see for instance [156] for further information and references), each receiver estimates the signal-to-interference ratio on its link and sends the estimate back to the corresponding transmitter (node) using a reliable low-rate feedback channel. In this book, we do not address the problem of estimating SIR. Instead, it is assumed that a good estimate of SIR is available at the corresponding transmitter and receiver side. Based on this information, each transmitter, say the transmitter on link k, is able to calculate the estimate of g_k(p) < 0 and η_k(p). We also assume that the kth receiver is able to determine the interference I_k(p) based on the SIR estimate.

Remark 6.14 (Remark on Control Channels). In this book, we assume the availability of a dedicated low-rate control channel (link) for each transmitter–receiver pair that is used to exchange some local measurements or quantities
⁴ By a training sequence, we mean a deterministic sequence of symbols generated by a pseudorandom number generator. We assume that the elements of this sequence "approximate" zero-mean independent and identically distributed random variables.
between the associated transmitter and receiver. All control links are assumed to be mutually orthogonal and orthogonal to all data links. In other words, each control link perceives no or negligible interference and causes no or negligible interference to any other link.

6.6.2 Adjoint Network

In contrast to η_k(p), the problem of computing

θ_k(p) = SIR_k(p) g_k(p) + Σ_{l∈K_k} v_{l,k} SIR_l(p) g_l(p)    (6.32)
is significantly more tricky. Interestingly, θ_k(p) can be estimated at each link (node) by a scheme that relies on the concept of an adjoint network, defined as follows [164, 128, 163].

Definition 6.15 (Adjoint Network). Consider an arbitrary wireless network with K (logical) links and the gain matrix V. Let us call it the primal network. Then, a network with K (logical) links and the gain matrix U ∈ R₊^{K×K} is said to be adjoint to the primal network if U = Vᵀ.

Note that for any given primal network, an adjoint network is not unique in general. The definition above merely states that the gain matrix of an adjoint network is the transpose of the gain matrix of the primal network. In the special case V = Vᵀ, any network is adjoint to itself. In what follows, assume that a primal network with the gain matrix V is given. The reason for introducing the definition becomes clearer if we have a look at θ(p) in (6.30) or (6.32). We see that θ(p) results from the multiplication of the vector Γ(p)g(p) with (I + Vᵀ). This suggests that the entries of the vector θ(p) may be made available to some nodes in the network by transmitting appropriately scaled pilot symbols over an adjoint network. Obviously, the following two conditions should be satisfied: (i) the kth transmitter in an adjoint network has access to the kth coordinate of the vector Γ(p)g(p), and (ii) the kth coordinate of (I + U)Γ(p)g(p) corresponds to the kth transmitter in the primal network, where U is the gain matrix of an adjoint network. In order to satisfy both conditions, we consider a so-called reversed network, defined as follows.

Definition 6.16 (Reversed Network). We call a network reversed if the roles of transmitters and receivers on each link in a primal network are reversed. In a reversed network, link k ∈ K is a link between the kth receiver and the kth transmitter (in the primal network).
By the reversed roles we mean that, in each transmitter–receiver pair, say the pair on link k, the kth transmitter becomes the kth receiver and vice versa. The corresponding link in a reversed network is labeled by k. A nice feature of a reversed network is that condition (i) can be easily met. In fact, the kth transmitter in a reversed network knows SIR_k(p) and I_k(p) since it is the kth receiver in the primal network (see the previous section). Consequently, since ψ is common for all links, the kth transmitter in the primal network only needs to inform the kth receiver about its weight w_k. This, however, must be done only once before starting the iteration process. Condition (ii) is automatically satisfied by a reversed network. See Example 6.17 for an illustration of the definition. All distributed power control schemes presented in this book assume that all links are reciprocal with respect to the power in the following sense:

(C.6-4) Given any primal network, suppose that V_k > 0 and V_{k,l} ≥ 0 are the signal power gain of link k and the interference power gain between transmitter l and receiver k, respectively. Then, in the reversed network, the signal power gain of link k and the interference power gain between transmitter k and receiver l (in the reversed network, that is, after the change of the roles) are, respectively, equal to V_k and V_{k,l}.

This property is called power reciprocity and is significantly weaker than the usual channel reciprocity, in which case the phases of the complex-valued channel coefficients must coincide as well. Unfortunately, a reversed network is in general not adjoint to the primal network. To see this, let us write V as V = DG, where D = diag(1/V₁, …, 1/V_K) is a diagonal matrix of the inverse signal path gains (see Sect. 4.3.1) and

G = (V_{k,l})_{1≤k,l≤K} ∈ R₊^{K×K},  trace(G) = 0,

incorporates the interference path gains (interference or crosstalk factors) between different links. Now since the roles of transmitters and receivers are reversed on each link in a reversed network, its gain matrix U is equal to U = DGᵀ. Obviously, U ≠ Vᵀ unless D is a scaled identity αI for some α > 0. The matrix D, however, is not a scaled identity in general, which is simply due to different channel realizations on different links. D is a scaled identity if the wireless channel is an additive white Gaussian noise (AWGN) channel for each link, provided that all receivers are normalized appropriately.

Example 6.17. To illustrate Definition 6.16, suppose that c_k ∈ C^W is the kth (logical) receiver and b_l^(k) ∈ C^W is the lth effective transmit vector associated with receiver k (see the definitions in Sects. 4.3.1 and 4.3.2). Thus, we have
V_k = |c_kᵀ b_k^(k)|² > 0,   V_{k,l} = |c_kᵀ b_l^(k)|² ≥ 0,   (k, l) ∈ K × K,
with V_{k,k} = 0. Two examples from Sect. 4.3.5:
• In the case of multiple antenna systems, c_k is the kth receive beamforming vector, which acts as a linear receiver, and b_l^(k) = H^(k,l) u_l, where u_l ∈ C^W is the lth transmit beamforming vector, which acts as a (logical) transmitter, and H^(k,l) ∈ C^{W×W} is the multiple antenna channel from transmitter l to receiver k.
• In the case of a standard flat-fading CDMA system, we have b_l^(k) = h_{k,l} u_l ∈ C^W, where h_{k,l} ∈ C is a channel coefficient between transmitter l and receiver k, u_l ∈ C^W is used to denote the lth spreading sequence, and W ≥ 1 is the spreading factor.
In the reversed network, c_k is the kth transmitter, while u_k acts as the kth receiver. Consequently, due to the power reciprocity of all the channels (C.6-4), the signal power gains V_k = |c_kᵀ b_k^(k)|² are the same in both the primal network and the reversed network. In contrast, the interference power gains are different in general, as V_{k,l} ≠ V_{l,k}. It is important to emphasize that in the reversed network, the same vectors as in the primal network are used, but with the reversed roles. We further see from the example that D = αI if V_k = 1/α for each k ∈ K, which is usually not satisfied due to the impact of the channels. Using the above definitions, the gain matrix of an adjoint network is Vᵀ = Gᵀ D. Comparing this with the gain matrix DGᵀ of a reversed network, we see that instead of multiplying Gᵀ by D on the right, the matrix is multiplied by D on the left. A straightforward examination shows that the right-hand side multiplication is achieved if each transmitter in a reversed network inverts its "own" wireless channel such that the resulting signal power gain between each transmitter–receiver pair is equal to 1 (or, equivalently, such that V_k = 1). The effect of this on the gain matrix U = DGᵀ of the reversed network is that D = I (due to the channel inversions) and (Gᵀ)_{k,l} = V_{l,k}/V_l (due to the effect of the channel inversions on other links).
Thus, in this case, we obtain U = Vᵀ. We summarize these observations in a theorem.

Theorem 6.18. Assume a flat-fading wireless channel. Then, a reversed network is adjoint to a given primal network if each transmitter in the reversed network, say transmitter k ∈ K, inverts its channel by multiplying its transmit symbols by 1/√V_k > 0.

We point out that these ideas can be extended to frequency-selective fading channels, provided that all channels are invertible.⁵ In such a case, each symbol should be convolved with the inverse impulse response of the corresponding wireless channel. Given some primal network, in all that follows, the adjoint
⁵ The channel is invertible if its transfer function has no zeros on the unit circle.
network refers to the reversed network in which each transmitter performs the channel inversion as described above. To illustrate the theorem, we neglect the background noise (see the remark on the noisy case in Section 6.6.3) and assume, for simplicity, that V_{k,l} = |h_{k,l}|² > 0 and V_k = |h_{k,k}|² > 0, where h_{k,l} ∈ C is a given channel coefficient. In other words, h_{k,l}, 1 ≤ k, l ≤ K, are realizations of the wireless channel at the beginning of the considered frame interval. Now if all transmitters in the reversed network concurrently transmit sequences of zero-mean random symbols X_k multiplied by 1/√V_k = 1/|h_{k,k}|, then by Theorem 6.18 the resulting network has the gain matrix Vᵀ. This is illustrated in Fig. 6.1 under the assumption of noiseless links.

[Fig. 6.1 depicts a two-link primal network and the corresponding adjoint network, with nodes S1, S2, E1, E2 and channel coefficients h_{1,1}, h_{1,2}, h_{2,1}, h_{2,2}.]

Fig. 6.1: In the primal network, the received signal samples at E1 and E2 are y₁ = h_{1,1}X₁ + h_{1,2}X₂ and y₂ = h_{2,2}X₂ + h_{2,1}X₁, respectively, where X₁, X₂ are zero-mean independent information-bearing symbols with E[|X₁|²] = p₁, E[|X₂|²] = p₂. In the adjoint network, the roles change such that E1 and E2 transmit X₁/|h_{1,1}| and X₂/|h_{2,2}|, respectively. As a result, the received signal samples are ỹ₁ = h_{1,1}/|h_{1,1}| X₁ + h_{2,1}/|h_{2,2}| X₂ and ỹ₂ = h_{2,2}/|h_{2,2}| X₂ + h_{1,2}/|h_{1,1}| X₁, respectively.

In this example, the signal-to-interference ratios in the primal network at E1 and E2 are SIR₁(p) = p₁/(v_{1,2} p₂) with v_{1,2} = |h_{1,2}|²/|h_{1,1}|² > 0 and SIR₂(p) = p₂/(v_{2,1} p₁) with v_{2,1} = |h_{2,1}|²/|h_{2,2}|² > 0, respectively. So, the gain matrix is

V = ( 0                      |h_{1,2}|²/|h_{1,1}|²
      |h_{2,1}|²/|h_{2,2}|²  0                     ).

In the adjoint network, if E[|X₁|²] = p₁ and E[|X₂|²] = p₂, we have SIR₁(p) = p₁/(v_{2,1} p₂) (at node S1) and SIR₂(p) = p₂/(v_{1,2} p₁). The gain matrix for the adjoint network is therefore given by

( 0                      |h_{2,1}|²/|h_{2,2}|²
  |h_{1,2}|²/|h_{1,1}|²  0                     ) = Vᵀ.
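The adjoint-network principle of this example can also be checked by simulation. In the sketch below (ours, not from the book; real-valued channels and noiseless links are simplifying assumptions), each reversed-network transmitter scales its symbols by 1/|h_{k,k}|, and the measured average received powers match q + Vᵀq, i.e., the reversed network with channel inversion realizes the gain matrix Vᵀ:

```python
import numpy as np

# Monte-Carlo sketch (ours) of Theorem 6.18: with channel inversion by
# 1/|h_kk| at each reversed-network transmitter, the average power received
# on reversed link l equals q_l + (Vᵀ q)_l, so the reversed network realizes
# the adjoint gain matrix Vᵀ. Noiseless, real-valued channels are assumed.
rng = np.random.default_rng(7)
K, N = 3, 200_000
h = rng.normal(size=(K, K))                  # h[k, l]: primal tx l -> primal rx k
V = h**2 / np.diag(h)[:, None]**2            # primal normalized gains v_{k,l}
np.fill_diagonal(V, 0.0)

q = np.array([0.5, 1.0, 2.0])                # transmit powers in the adjoint network
X = rng.normal(size=(K, N)) * np.sqrt(q)[:, None]   # zero-mean symbols, E|X_k|^2 = q_k

# Reversed link l receives sum_k h[k, l]/|h[k, k]| * X_k (power reciprocity).
gains = h / np.abs(np.diag(h))[:, None]      # row k: inverted channel of reversed tx k
y = gains.T @ X                              # y[l, :]: received samples on reversed link l
measured = np.mean(y**2, axis=1)             # estimated received powers
expected = q + V.T @ q                       # own power + adjoint-network interference
print(measured, expected)
```

This is exactly the measurement principle exploited in the next section: the sum Σ_k v_{k,l} q_k appears directly in the received power, without any explicit knowledge of the gain matrix at the receiving node.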
This procedure straightforwardly extends to networks with an arbitrary number of links, provided that the network and signal model introduced in Chap. 4 holds. Finally, we point out that the channel inversion in the adjoint network may cause some problems. Indeed, if Vk is too small, then the transmit power on link k in the adjoint network can be unacceptably high, thereby violating some power constraints. A simple but effective solution to this problem is to define a certain threshold and prevent those links from transmission for which the signal power gain falls below this threshold. In addition, each transmitter in the adjoint network, say the transmitter on link k, could scale its training √ sequence by α/ Vk for some 0 < α ≤ 1 common to all links. The effect of such a scaling could be easily corrected at the receiver, provided that all transmitters use the same scaling factor and all receivers know the value of α. Obviously, a good choice of α depends on the realization of the wireless channel, and thus such an approach would require some coordination between nodes. 6.6.3 Distributed Handshake Protocol The basic idea is to use the primal network and the adjoint network alternately to obtain an estimate of ∇k F at the kth transmitter in the primal network. To illustrate the principle, let us consider the nth iteration of the power control algorithm in (6.19). Before starting the iteration process, each transmitter, say transmitter k, reports the current weight wk to its receiver. Primal Network: Assume that the “local part” ηk (p(n)) of the gradient vector has already been estimated using the procedure described in Sect. 6.6.1. Let ηˆk (p(n)) be the estimate such that ηˆk (p(n)) ≈ ηk (p(n)) = gk (p(n)) + SIRk (p(n))gk (p(n)) . By the procedure of Sect. 6.6.1, both SIRk (p(n)) and Ik (p(n)) are known to the kth receiver. Based on this information as well as on the knowledge of wk , the kth receiver computes gk (p(n)) < 0 given by (6.21). 
Adjoint Network: All transmitters concurrently send sequences of zero-mean independent (training) symbols X_k (not necessarily known to the receivers) with E[|X_k|²] = |SIR_k(p(n)) g_k(p(n))| for each k ∈ K. Note that this involves the channel inversion as specified in Theorem 6.18, so that the actual transmit powers are higher than |SIR_k(p(n)) g_k(p(n))|. Each receiver, say receiver k, estimates the received power by averaging over all symbol intervals and multiplies the result by −1 (since g_k(p(n)) < 0). If the Gaussian noise is not negligible (when compared with the multiple access interference), then the noise variance must be estimated (if not known) and subtracted from the estimated received power to obtain

θ̂_k(p(n)) ≈ SIR_k(p(n)) g_k(p(n)) + Σ_{l∈K_k} v_{l,k} SIR_l(p(n)) g_l(p(n)).
Now since the kth receiver in the adjoint network is the kth transmitter in the primal network, the latter computes

∇_k F̂(p(n)) = η̂_k(p(n)) − θ̂_k(p(n)),  1 ≤ k ≤ K,

which is "close" to ∇_k F(p(n)), provided that the estimates are accurate enough. Algorithm 6.1 summarizes the whole procedure for the nth iteration. In the following description, "transmitter" and "receiver" refer to transmitters and receivers in the primal network. We assume that the function ψ is known at all nodes and that the weight w_k is known at both sides of link k ∈ K.

Algorithm 6.1 Distributed gradient projection algorithm.
Input: w > 0, n = 0, s(0) ∈ S, constant or non-increasing step size sequence {δ(n)}_{n∈N₀} (see the following section).
Output: s ∈ S
1: repeat
2: Concurrent transmission of training sequences at powers (p₁(n), …, p_K(n)).
3: Receiver-side estimation of the signal-to-interference ratios and interferences. Based on these estimates, each receiver calculates g_k(p(n)), k ∈ K.
4: All receivers feed the signal-to-interference ratios back to the corresponding transmitters using a per-link control channel. Transmitter-side computation of g_k(p(n)), and then η_k(p(n)), for each k ∈ K.
5: Concurrent transmission of sequences of zero-mean independent symbols X_k with E[|X_k|²] = |SIR_k(p(n)) · g_k(p(n))|, k ∈ K, over the adjoint network. Note that the transmission over the adjoint network involves channel inversion (Theorem 6.18).
6: Transmitter-side estimation of the received power and subtraction of the noise variances from the estimates to obtain θ_k(p(n)). Since η_k(p(n)) and θ_k(p(n)) are known at transmitter k, the transmitter computes
∇_k F̂(p(n)) = η_k(p(n)) − θ_k(p(n)) = g_k(p(n)) − (Vᵀ Γ(p(n)) g(p(n)))_k,
where we assumed that all the variables have been estimated perfectly.
7: Update of transmit powers according to (6.19) with s(n) = log(p(n)).
8: n = n + 1
9: until |F_e(s(n)) − F_e(s(n − 1))| < ε
10: s = s(n)
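A compact, centralized simulation of Algorithm 6.1 may look as follows. This is our own sketch, not the book's implementation: it assumes perfect estimates, the concrete choice ψ(x) = −log x (so that g_k = −w_k/p_k), per-link power constraints so that the projection is a simple clipping, and illustrative values for δ and ε:

```python
import numpy as np

# Centralized simulation of the distributed loop (our sketch, perfect estimates).
rng = np.random.default_rng(3)
K = 4
V = rng.uniform(0.01, 0.1, (K, K)); np.fill_diagonal(V, 0.0)   # normalized gains
w = rng.uniform(0.5, 1.5, K)                                    # link weights
sigma = np.full(K, 0.1)                                         # noise powers
Pmax = np.full(K, 1.0)                                          # per-link constraints
delta, eps = 0.05, 1e-9                                         # our illustrative values

def Fe(s):
    p = np.exp(s)
    return float(np.sum(w * -np.log(p / (sigma + V @ p))))      # F_e with ψ = −log

s = np.log(np.full(K, 0.1))
history = [Fe(s)]
for n in range(5000):
    p = np.exp(s)
    SIR = p / (sigma + V @ p)
    g = -w / p                           # g_k = w_k ψ'(SIR_k)/I_k with ψ'(x) = −1/x
    grad_F = g - V.T @ (SIR * g)         # steps 2–6: η_k − θ_k = g − (VᵀΓg)_k
    s = np.minimum(s - delta * p * grad_F, np.log(Pmax))   # (6.29) scaling + box projection
    history.append(Fe(s))
    if abs(history[-1] - history[-2]) < eps:               # stopping rule of step 9
        break
print(len(history), history[0], history[-1])
```

In the real protocol, the line computing `grad_F` is replaced by the two over-the-air phases: the SIR feedback in the primal network and the received-power measurement in the adjoint network.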
6.6.4 Some Comparative Remarks We presented the concept of adjoint network at three conferences in 2005: [128, 164, 163]. This work led to a journal article [82] that appeared in 2007 (see
also [165]). Later, similar ideas were published by [83]; the conference version of this article was presented in 2006. To the best of our knowledge, all other related power control schemes rely on a flooding protocol that delivers some local information from each node to all other nodes. In [90], for instance, users "announce" prices to all other users, while in [78] each node passes messages obtained from locally measurable quantities to all other nodes. Whereas the approach of [90] is a game-theoretic one, the power control scheme of [78] utilizes gradient-projection methods, and is therefore closely related to our scheme. Let us first discuss the main differences between our approach and that of [78]. Thereafter, we briefly discuss the approach of [90]. Reference [78] suggested that in the nth iteration step, each transmitter, say transmitter l ∈ K, calculates the message⁶ m_l(n) = g_l(p(n)) SIR_l(p(n)) and passes it to all other transmitters using a flooding protocol. Obviously, m_l(n) can be computed at transmitter l because all the required ingredients are locally measurable quantities. After passing these
messages, each transmitter k ∈ K can compute the gradient component Σ_l v_{l,k} m_l(n) (with v_{k,k} = 0), provided that v_{l,k} is known at transmitter k for each l ∈ K. The entries of V, however, depend on the realization of the wireless channels, so that they must be estimated and distributed over the network in such a way that transmitter k knows all entries in the kth column of V. Note that this may be infeasible in practice. In contrast, in any scheme
based on Algorithm 6.1, transmitter k knows neither the addends in the sum Σ_l v_{l,k} m_l(n) nor an estimate of the kth column of V. Instead, the messages m_k(n), k ∈ K, determine the powers of random symbols transmitted concurrently by the receivers over the adjoint network, so that each transmitter can estimate the corresponding sum (gradient component) from the received signal power (Steps 4-5 of Algorithm 6.1). Compared with [78], the overall signaling overhead is expected to be reduced significantly because, in addition to a low-rate feedback for each transmitter–receiver pair, only the received powers in the adjoint network must be estimated on each link separately. In both schemes (and many other ones), the transmitters must be coarsely synchronized to estimate the SIRs. Consequently, if the transmission in the primal network is coarsely synchronized, then no additional signaling overhead is necessary to have a coarse synchronization in the adjoint network. The overall overhead can be further reduced by strongly limiting the estimation time in the adjoint network. This, however, will lead to the problem of noisy estimates, which is addressed in the following section. An interesting approach to network-centric optimization based on pricing, a general concept originating from the game theory framework, is proposed in [90]. The authors study precisely the problem formulation discussed in Sect. 6.3 in the context of an ad-hoc network. The iterative price and transmit power update discussed in [90] is called the Asynchronous Distributed Pricing Algorithm
⁶ This is a straightforward generalization of [78], where only Ψ(x) = log(x), x > 0, is considered.
(ADP) and consists, on each link, of a sequence of link-asynchronous power and price updates. A price, or payment, can be interpreted as the unit increase in performance/utility per unit decrease of the perceived interference. In contrast to the feedback scheme based on the adjoint network, links need knowledge of all power gains and of the other links' prices to perform the ADP. The authors propose to provide such knowledge by periodic flooding or beacon broadcast. On the other hand, the ADP update requires, by definition, no synchronization at all. It is shown in [90] that the set of points of attraction of the ADP includes the Kuhn–Tucker points of the general problem at hand. The precise convergence features of the ADP are addressed in [90] by recasting the problem in the framework of so-called supermodular games. Global convergence of the ADP is shown for a class of utility functions that are included in the set determined by (C.5-2)–(C.5-4) in Sect. 5.2.5 (see also the remarks in Sect. 5.2.5). An interesting observation is that the function class ensuring the global convergence of the ADP is precisely the same class which allows for the global convergence of the power control iteration presented later in Sect. 6.8.5 (see the conditions in Corollary 6.48). The reader is referred to [90] for numerous interesting examples of utility functions which ensure global convergence of the ADP, and thus also of all utility-based power control algorithms presented in this book.

6.6.5 Noisy Measurements

In the previous section, we assumed that all unknown variables, such as the received powers or the signal-to-interference ratios, can be estimated with an accuracy allowing for the treatment of the algorithm within the framework of deterministic optimization theory. However, due to estimation errors and other distorting factors such as quantization noise, this assumption is not adequate for many real-world wireless networks.
Even if all the estimators are consistent⁷ or strongly consistent, larger estimation inaccuracies in steps 3 and 6 of the above scheme may appear simply by virtue of a strongly limited estimation time. Also, the neglect of the background noise in the computation steps or an erroneous assumption about the noise variance in the adjoint network may result in biased estimates. Indeed, since information conveyed over the adjoint network is contained in the average received power, the gradient estimates will converge (say, in probability if the estimator is consistent) to the true value plus the noise variance when the background noise is not subtracted from the estimates and is independent of the estimator and the received signal. Clearly, in such a case, the algorithm will not converge to the optimum even under the assumption of perfect estimation. In the case of such uncertainties, the proposed algorithm has to be analyzed in the more general context of stochastic approximation theory. The topic is too
⁷ An estimator is said to be consistent if the estimate converges in probability to the quantity being estimated as the estimation time grows. It is said to be strongly consistent if the estimate converges to the true value almost surely [166].
broad to be considered here in more detail. Moreover, it requires mathematical tools and concepts that are different from those used so far in this book. For these reasons, we only mention some basic ideas from this theory. Reference [163] provides a preliminary analysis of the above scheme in this context. A comprehensive reference for stochastic approximation algorithms is [143]. The reader is also referred to [144, 167, 3, 160]. For simplicity, the uncertainty of the estimation of SIRs in step 3 of the above procedure is neglected. So, we focus on the problem of estimating the gradient components in step 6, which are assumed to be random variables of the form

Δ_k(n) = ∇_k F_e(s(n)) + M_k(n),  k ∈ K, n ∈ N₀.

In general, the estimation noise processes {M_k(n)}, k ∈ K, depend on the estimator type, the estimation time, and the receiver noise process. In the literature, the following assumptions are often made to simplify the analysis.

(C.6-5) The receiver noise processes are martingale differences uncorrelated with transmit symbols [168, 143] and have variances σ_k² < ∞. In particular, this includes additive white Gaussian noise.
(C.6-6) The estimation noise is zero-mean and exogenous, in the sense that its future evolution, given the past history of the iterates and the receiver noise, depends only on the noise.

Note that while (C.6-5) is not restrictive, (C.6-6) is not necessarily fulfilled by Algorithm 6.1. For instance, if an erroneous assumption about the receiver noise variance is made, the estimates of ∇F_e(s(n)) may be biased. Also, (C.6-6) can be violated as the evolution of the estimation noise may depend on the iterate. This is simply because the transmit power in the adjoint network (and thus also the estimation accuracy in the case of a limited estimation time) depends on s(n). This dependence can be reduced by extending the estimation time of each Δ_k(n), k ∈ K, n ∈ N₀.
In what follows, we assume that the two assumptions (C.6-5)–(C.6-6) are satisfied. So, for each k ∈ K, we can write

M_k(n+1) = Δ_k(n+1) − E[Δ_k(n+1) | s(0), Δ_k(m), m ≤ n],

where E[Δ_k(n+1) | s(0), Δ_k(m), m ≤ n] = ∇_k F_e(s(n)), and we let Δ̄_k(n) = Σ_{m=0}^{n−1} M_k(m). In words, the estimation noise process {M_k(n)}, k ∈ K, is a martingale difference independent of transmit symbols and with finite variance. Moreover, {Δ̄_k(n)}, k ∈ K, is a martingale. Recall that X(n) = Y(n+1) − Y(n) is said to be a martingale difference if the process {Y(n)} (a sequence of random variables) is a martingale, that is to say, if E[Y(n+1) | Y(i), i ≤ n] = Y(n) with probability one for all n [168, 143]. Thus, the expectation of X(n) conditioned on the past is zero. Moreover, since the variance is finite, the martingale differences are uncorrelated in that, for m ≠ n, we have E[(Y(n+1) − Y(n))(Y(m+1) − Y(m))] = 0.
It can be inferred from the landmark paper [144] that taking many observations of noise-corrupted gradient samples in each iteration and then averaging them to obtain a good estimate of the gradient vector is in general inefficient. Instead, the authors proposed considering a diminishing step size sequence δ(n) > 0, n ∈ N₀. Thus, in the case of noisy measurements, the power control algorithm (6.19) usually takes the form

s(n+1) = Π_S[ s(n) − δ(n)Δ(n) ],  s(0) ∈ S,    (6.33)

where Δ(n) = (Δ₁(n), …, Δ_K(n)) is the vector of noisy measurements of the gradient in the nth iteration and {δ(n)} is a non-increasing sequence of positive real numbers satisfying

(C.6-7) Σ_{n=0}^∞ δ(n) = ∞ and
(C.6-8) lim_{n→∞} δ(n) = 0.

The algorithm (6.33) is called the stochastic power control algorithm (see also Sect. 5.5.3). The choice of the step size sequence {δ(n)} is central to the effectiveness of this algorithm. A typical choice of the step size sequence satisfies Σ_{n=1}^∞ δ²(n) < ∞. However, this condition can often be weakened considerably [143, 167]. A standard example of a step size sequence that satisfies (C.6-7) and (C.6-8) is δ(n) = C/(n+1) for any C > 0. Under the assumptions (C.6-5)–(C.6-6), some standard results can be exploited to deduce convergence of the stochastic power control algorithm (6.33) to a stationary point of F_e. Usually, two types of convergence are of interest: convergence in distribution (or weak convergence) and almost sure convergence [143, 168]. Convergence in the mean square sense was considered in [160]. For instance, with (C.6-5)–(C.6-6), it follows from [143] that the algorithm converges weakly to a stationary point if δ(n) = c·n^{−μ} for some μ ∈ (0, 1]. Almost sure convergence of the algorithm follows from [167, 143] when the noise process is subject to some additional constraints. In [143], the reader can find further useful results for correlated and/or non-exogenous (state-dependent) noise processes. Below we present two exemplary conditions for the almost sure and weak convergence of (6.33).

Observation 6.19. Suppose that F_e and S are defined by (6.12) and (6.11). Let (C.6-1)–(C.6-8) be satisfied and suppose that either of the following holds.
(i) There is an even p ∈ N such that Σ_{n=0}^∞ (δ(n))^{p/2+1} < ∞ and sup_{n∈N} E[|Δ_k(n)|^p] < ∞, k ∈ K.
(ii) Σ_{n=0}^∞ e^{−q/δ(n)} < ∞ for each q > 0.

Then, a sequence of random vectors {s(n)} generated by (6.33) converges to a stationary point of F_e on S almost surely and, by Theorem 6.9, every stationary point is a global minimizer.
Proof. The observation follows from [143, Sect. 5, Theorem 3.1].

An example of a step size sequence satisfying the conditions of Observation 6.19 is δ(n) = C/(n+1), n ∈ N₀, for any finite constant C > 0. Almost sure convergence, however, is an asymptotic notion, and hence it may be of little value in cases where the iteration is truncated after some finite number of steps. In such cases, the interest should be in the weakest notion of probabilistic convergence, referred to as weak convergence (or convergence in distribution) [143]. Below we state a convergence result for the particular case of a slowly changing step size sequence and refer to the literature for extensions [143].

Observation 6.20. Suppose that (C.6-1)–(C.6-8) hold and that there is an integer sequence {a(n)} with lim_{n→∞} a(n) = ∞ and lim_{n→∞} sup_{0≤i≤a(n)} |δ(n+i)/δ(n) − 1| = 0. Let the estimator be such that {Δ_k(n)}, k ∈ K, is uniformly integrable. Then, (6.33) converges weakly to a stationary point of F_e on S.

Proof. The observation follows from [143, Theorem 2.3, Sect. 8].

Finally, in [3], methods for averaging the iterates (in parallel to the stochastic recursion (6.33)) are presented to improve the convergence rate of the algorithm. In [163], it is shown that the averaging scheme of [3] significantly decreases the variance of the convergence curve of the algorithm (6.33) for reasonable SIR values. In general, the optimal averaging by [3] takes the form

s̄_k(r) = (1/W(δ(r))) Σ_{n=r}^{r+W(δ(r))−1} s_k(n),  r ∈ N₀, k ∈ K,
with the window size sequence {W (δ(r))} determined by the step size sequence. For instance, if δ(r) = O(1/r), {W (δ(r))} increases and, under a fixed step size, the window size remains fixed. The fundamental insight from [3] is that the iterate averaging provides the optimal convergence rate (in the sense of minimizing the trace of some error covariance matrix) if δ(n) decreases slower than O(1/n), while for faster decreasing step size sequences, the gain from parallel averaging is negligible.
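The gain from iterate averaging is easy to reproduce on a toy problem. The following Python sketch is purely illustrative (a one-dimensional quadratic objective with Gaussian gradient noise stands in for the stochastic recursion (6.33); the concrete step and window rules are assumptions, not taken from [3]): it applies the window average s̄_k(r) above with a step size decreasing slower than O(1/n), the regime in which averaging helps most.

```python
import random

def noisy_sgd(n_steps, step, seed=0):
    """Toy stochastic iteration s(n+1) = s(n) - delta(n)*(grad + noise),
    minimizing F(s) = s**2/2 (true gradient s); a stand-in for (6.33)."""
    rng = random.Random(seed)
    s, iterates = 5.0, []
    for n in range(n_steps):
        s = s - step(n) * (s + rng.gauss(0.0, 1.0))
        iterates.append(s)
    return iterates

def window_average(iterates, r, window):
    """Iterate averaging: s_bar(r) = (1/W) * sum_{n=r}^{r+W-1} s(n)."""
    return sum(iterates[r:r + window]) / window

# Step size decreasing slower than O(1/n), as recommended for averaging.
iterates = noisy_sgd(2000, step=lambda n: (n + 1) ** -0.7)
s_last = iterates[-1]
s_avg = window_average(iterates, r=1000, window=1000)
```

With such a slowly decaying step size, the window average typically fluctuates much less around the minimizer than the raw last iterate, which is the variance reduction reported in [163].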
6.7 Incorporation of QoS Requirements

Sect. 5.7 deals with the problem of incorporating SIR targets of links into utility-based power control. One straightforward approach to achieving this goal is to optimize the aggregate utility function over the feasible power region given by Definition 5.40 (see also Remarks 4.12 and 5.41 on pages 106 and 170, respectively). This approach is called utility-based power control with hard QoS support since each link is guaranteed to meet its SIR target whenever the targets are feasible. Compared with traditional utility-based approaches
without any SIR guarantee considered in the previous sections, an additional challenge is to perform the projection on the feasible power region, which, in contrast to the projection on the admissible power region S, may require a great deal of coordination between nodes in decentralized networks. This is illustrated later in this section. Furthermore, it is shown that primal-dual algorithms for finding stationary points of an associated Lagrangian function provide interesting alternatives to primal algorithms such as gradient projection algorithms that operate exclusively on the power vector (primal variable). In this section, we analyze a primal-dual algorithm based on a standard Lagrangian function and show that it can be efficiently implemented in a distributed manner. Sect. 6.8, in contrast, deals with primal-dual algorithms based on a generalized Lagrangian approach. The reader will also find there a discussion on different aspects of primal and primal-dual methods. In Sect. 5.7.2, we introduced the concept of soft QoS support, which, in addition to the need for mitigating the projection problem mentioned above, is motivated by the fact that the SIR targets might be infeasible for some channel states, in which case the utility-based power control problem with hard QoS support has no solution. The idea is based on the observation that any feasible SIR targets can be satisfied provided that the utility functions are chosen suitably in the traditional utility-based power control. All this is discussed in detail in Sect. 5.7.2, to which the reader is referred for further information. Later in Sect. 6.7.2, we consider algorithmic aspects of utility-based power control with soft QoS support.

6.7.1 Hard QoS Support

This section presents algorithmic solutions for the utility-based power control problem with hard QoS support (5.146) stated in Sect. 5.7.1.
We first reformulate the problem as a minimization problem in the logarithmic power domain:

s*(ω) := arg min_{s ∈ S°(ω)} F_e(s)   (6.34)
where F_e : R^K → R is given by (6.12). In what follows, the sets S°(ω) ⊂ R^K and S(ω) ⊆ R^K are the logarithmically transformed sets P°_+(ω) = P°(ω) ∩ R^K_{++} defined by (5.145) and P_+(ω) = P(ω) ∩ R^K_{++}, where the feasible power region P°(ω) and the valid power region P(ω) are introduced in Sect. 5.5.1. In words, every p ∈ P°_+(ω) corresponds to exactly one s ∈ S°(ω) with s = log(p). Similarly, every p ∈ P_+(ω) has exactly one counterpart s = log(p) ∈ S(ω). The following is a formal definition of validity and feasibility in the logarithmic power domain (see also Remark 6.22 later).

Definition 6.21 (Valid and Feasible Power Vectors). Let ω ∈ Q^K be given and let γ_k = γ(ω_k) for each k ∈ K. Then, we say that s ∈ R^K is a valid power vector if SIR_k(e^s) ≥ γ_k, k ∈ K. The set S(ω) := ∩_{k∈K} S_k(ω) with

S_k(ω) := {s ∈ R^K : SIR_k(e^s) = e^{s_k}/I_k(e^s) ≥ γ_k}   (6.35)
is called the valid power region. The feasible power region is defined to be

S°(ω) := S(ω) ∩ S .   (6.36)
Any member of S°(ω) is called a feasible power vector. If S°(ω) ≠ ∅, then ω is said to be feasible.

Analogous to Sect. 5.5, valid power vectors are those vectors s ∈ R^K for which SIR_k(e^s) ≥ γ_k := γ(ω_k) holds for each k ∈ K. If additionally s satisfies the power constraints, we call it feasible. The set S_k(ω) can in turn be interpreted as the valid power region when the SIR target of link k is equal to γ(ω_k) and all other SIR targets are zero. Note that if γ(ω_k) ≡ 0 or, equivalently, γ_k = 0, then link k is called a best-effort link; otherwise, it is called a QoS link (Definition 5.77). Regarding the choice of the function γ, the reader may find it interesting to consider Remark 5.39 on page 170 and the discussion in Sect. 5.7.1.

Remark 6.22. Definition 6.21 is a straightforward extension of Definition 5.40 to logarithmic power vectors. Note that s ∈ S(ω) implies that p = e^s ∈ P(ω). The converse holds as well, provided that each link has a positive SIR target, since then p ∈ P_+ for every p ∈ P(ω).^8 In contrast, if γ_k = 0 for some k ∈ K, then the converse is not true in general, as vectors with zero entries have no counterparts in S(ω). We may thus have S°(ω) = ∅ even if P°(ω) ≠ ∅, which is true whenever P°_+(ω) = ∅: there is no feasible logarithmic power vector although the SIR targets are feasible in the sense of Definition 5.40. In such cases, QoS links can satisfy their SIR targets, but only if at least one best-effort link is denied access to the resources.

One should be aware of the subtle differences mentioned in the remark above. From a practical point of view, however, it is reasonable to exclude the "pathological" cases by assuming that P°_+(ω) ≠ ∅ (Condition (C.5-16)). Note that whenever this is true, ω is feasible if and only if S°(ω) ≠ ∅. For some technical reasons, throughout this section, we even assume the slightly stronger condition (C.5-17), which implies that

(C.6-9) int(S°(ω)) ≠ ∅.

As a consequence, ω is feasible. As discussed in Sect.
5.5.1, the interior of P°(ω) can be empty. Observation 5.46 implies that if V is irreducible and P°_+(ω) ≠ ∅, then P°_+(ω) is either a singleton, in which case the problem (6.34) is trivial, or int(P°_+(ω)) ≠ ∅. So, from a practical point of view, the above assumption is reasonable.

Observation 6.23. Both S(ω) and S°(ω) are closed convex sets.
^8 As z > 0, Theorem A.51 implies (see also Chap. 2 as well as Sects. 5.2.2 and 5.3) that p(ω) > 0 if γ_k > 0 for each k ∈ K. Furthermore, we see from (5.54) that p_k(ω) = 0 if and only if γ_k = 0.
Proof. Since the empty set is closed and convex by definition, we can assume S(ω) ≠ ∅ and S°(ω) ≠ ∅. By Observation 5.45, P(ω) is a convex set. So, it is obvious that S(ω), which is a logarithmic transformation of P(ω) ∩ R^K_{++}, is a convex set as well. Furthermore, from (6.35), we can deduce that S_k(ω) is closed (but not bounded). Thus, the observation follows from the fact that an intersection of closed convex sets is closed and convex.

An immediate consequence of Observation 6.23 and (C.6-9) is the following.

Observation 6.24. Suppose that (C.6-9) holds. Then, (6.34) is a convex optimization problem and Slater's constraint qualification is satisfied.

By Theorem 6.3, the function F_e(s) is convex on R^K, so that convexity of the problem (6.34) immediately follows from Observation 6.23. Slater's constraint qualification in turn is a direct consequence of Observation 6.23 and (C.6-9), which implies that there exists s ∈ int(S°(ω)).

Remark 6.25. Slater's condition is a simple constraint qualification which, together with the convexity property of the objective function, ensures that strong duality holds for the problem (6.34). Note, however, that if the constraints can be written as a set of linear inequalities, then Slater's condition reduces to a feasibility condition [16, pp. 226–227]. The reader is also referred to B.4.3.

In the remainder of this section, we first present two simple algorithmic approaches to the problem (6.34). After that, we consider the possibility of exploiting barrier properties of the function ψ to guarantee the QoS requirements. In this case, however, it must be emphasized that the limiting point of the presented algorithm only approximates a solution of (6.34).

Gradient Projection Algorithm

A straightforward approach is to apply a gradient projection method to the problem (6.34).
In this case, the iteration takes the form

s(n + 1) = Π_{S°(ω)}( s(n) − δ(n)∇F_e(s(n)) )   (6.37)

where Π_{S°(ω)} : R^K → S°(ω) denotes the projection on S°(ω) given by (B.27), and the step size sequence δ(n) > 0 either satisfies

(C.6-10) δ(n) = δ, n ∈ N_0, for some sufficiently small δ > 0,

or it satisfies (C.6-7)–(C.6-8). The kth entry of the gradient vector is given by (6.20), and therefore the gradient vector can be computed in a distributed manner using the scheme presented in Sect. 6.6. In fact, the only difference to the algorithm presented in Sects. 6.5–6.6 is that, instead of S, the projection
is now performed on S°(ω). Notice that, by Observation 6.23 and Theorem B.45, the projection exists and is unique. Now, considering Observations 6.23 and 6.24 shows that the statements of Theorem 6.9 and Corollary 6.13 apply to (6.37) as well, which leads to the following result.

Observation 6.26. Let (C.6-1)–(C.6-3) and (C.6-9) be fulfilled, and let {s(n)}, n ∈ N_0, be a sequence generated by (6.37). Then, for a sufficiently small δ(n) = δ > 0, n ∈ N_0, {s(n)} converges to some s*(ω) ∈ S°(ω). Moreover, s*(ω) minimizes F_e(s) over S°(ω). If the conditions of Corollary 6.13 are satisfied, then the convergence rate is linear.

From a practical point of view, the projection of a power vector s ∈ R^K on S°(ω) may be significantly more elaborate than the projection on S. To see this, let us write S°(ω) as the intersection of the convex sets S_k(ω) ∩ S, k = 1, ..., K, where S_k(ω) is the logarithmically transformed set P_k(ω) ∩ R^K_{++} defined by (5.66). So, with trace(V) = 0, we have

S_k(ω) := {s ∈ R^K : a_k^T e^s ≤ −γ_k z_k}   (6.38)

where a_k = (γ_k v_{k,1}, ..., γ_k v_{k,k−1}, −1, γ_k v_{k,k+1}, ..., γ_k v_{k,K}). This shows that the projection operation can be somewhat alleviated by applying a so-called cyclic projection algorithm [66]: the links successively perform their "own projections", with link k projecting on S_k(ω) ∩ S, which is in general easier to accomplish than projecting on S°(ω). For more details, the reader is referred to [66] (see also references therein). The problem is, however, that even if we ignore the power constraints, the projection on S_k(ω) may still be difficult to perform in a distributed environment. To see this, we write the projection on S_k(ω) in terms of a_k and the other parameters to obtain

Π_{S_k(ω)}(s) = log( e^s − ( max{0, a_k^T e^s + γ_k z_k} / ‖a_k‖²₂ ) a_k )   (6.39)
which may violate the power constraints. From (6.39), we see that the cyclic projection algorithm is in general not suitable for distributed implementation, as the operation (6.39) may require a lot of coordination between the nodes. The main problem is the lack of knowledge of the vector a_k.

A Primal-dual Algorithm

In order to mitigate or even avoid the projection problem, one can consider a primal-dual method to find stationary points of an associated Lagrangian function. In this section, we focus on a primal-dual algorithm that operates on a linear Lagrangian function over S × R^A_+. Primal-dual algorithms based on generalized Lagrangian functions are presented later in Sect. 6.8 and compared to the conventional ones. Note that in the context of Lagrangian optimization theory, the problem (6.34) is called the primal problem and the power vector s is referred to as the primal variable.
Following the notation in Sect. 5.7.2, we use A = {k ∈ K : γ_k > 0} to denote the index set of users having (positive) SIR targets. Let A := |A| ≥ 1, and let L : R^K × R^A → R be a Lagrangian function associated with the problem (6.34), defined to be (see also App. B.4.3)

L(s, λ) := F_e(s) + ∑_{k∈A} λ_k f_k(s),  (s, λ) ∈ R^K × R^A .   (6.40)
Here and hereafter, λ = (λ_1, ..., λ_A) ∈ R^A is the dual variable and f_k : R^K → R, k ∈ A, are the constraint functions associated with the QoS requirements. An important observation is to rewrite the QoS constraints such that the constraint functions take the following form:

f_k(s) := I_k(e^s)/e^{s_k} − 1/γ(ω_k),  k ∈ A .   (6.41)
So, the function f_k determines the set S_k(ω) given by (6.38). Note that the Lagrangian function (6.40) is obtained by augmenting the objective function with a weighted sum of constraint functions that only take the QoS constraints into account; the power constraints are, in contrast, not considered in the Lagrangian function for simplicity reasons.^9 Thus, to be more precise, the Lagrangian function (6.40) should rather be referred to as a partial Lagrangian function associated with the problem. For brevity, though, we do not use the word "partial" in this context and refer to (6.40) as a Lagrangian function associated with the problem (6.34).

Observation 6.27. The function f_k : R^K → R given by (6.41) is convex for each k ∈ A. Moreover, the Lagrangian function (6.40) is a convex-concave function [153] (see also Definition B.60).

Proof. The first part immediately follows from the proof of Lemma 6.2, which shows that I_k(e^s)/e^{s_k} is log-convex on R^K and hence also convex. So, by Observation 6.24 and the fact that the sum of convex functions is convex, L is a convex function with respect to the power vector s ∈ R^K for any fixed λ ∈ R^A. Thus, the second part follows from Definition B.60, since the Lagrangian function is concave in the dual variable λ for any fixed s ∈ R^K.

Since f_k(s) ≤ 0, k ∈ A, for any fixed s ∈ S(ω) ≠ ∅, we have L(s, λ) ≤ L(s, 0) for all s ∈ S(ω) and all λ ≥ 0. Hence, for any given s ∈ S°(ω) ≠ ∅, L(s, λ) attains its maximum (which exists) at λ = 0. From this and the unboundedness of λ ↦ L(s, λ) on R^A_+ for any s ∉ S(ω), it follows that

min_{s∈S°(ω)} F_e(s) = min_{s∈S} max_{λ≥0} L(s, λ)

from which we have

^9 In Sect. 6.8, the power constraints are taken into account under the assumption of individual power constraints on each link.
s*(ω) = arg min_{s∈S} max_{λ≥0} L(s, λ) .   (6.42)
The corresponding dual problem to (6.34) is defined to be (see App. B.4.3 for some definitions related to the Lagrangian duality theory)^10

λ*(ω) = arg max_{λ≥0} min_{s∈S} L(s, λ)   (6.43)

where the minimum (for any λ ≥ 0) and the maximum can be shown to exist. Now, an application of standard results from convex optimization theory [169, pp. 355–371] and [153] (see also App. B.4.4 and App. B.4.3) together with Observations 6.24 and 6.27 proves the following.

Observation 6.28. If (C.6-9) is true, then

max_{λ≥0} min_{s∈S} L(s, λ) = min_{s∈S} max_{λ≥0} L(s, λ)   (6.44)
and the complementary slackness conditions hold, so that we have λ*_k(ω) = 0 for each k ∈ T(s*(ω)), where T(s) := {k ∈ K : f_k(s) < 0}. The pair (s*(ω), λ*(ω)) ∈ S°(ω) × R^A_+ is a saddle point of the Lagrangian function with respect to S × R^A_+:

L(s*(ω), λ) ≤ L(s*(ω), λ*(ω)) ≤ L(s, λ*(ω))   (6.45)
for all s ∈ S and λ ∈ R^A_+.

Considering Observations 6.24 and 6.28 as well as App. B.4.3, we can conclude that the Kuhn–Tucker conditions (see Definition B.50 and the remark below) provide necessary and sufficient conditions for optimality. In other words, the pair (s*(ω), λ*(ω)) provides optimal solutions to the primal and dual problems if and only if it satisfies the Kuhn–Tucker conditions. As a consequence, we can solve the problem (6.34) by solving the Kuhn–Tucker conditions, which is equivalent to satisfying the complementary slackness conditions and finding a stationary point of the Lagrangian function (6.40) (on S × R^A_+). The Kuhn–Tucker points are saddle points (s*(ω), λ*(ω)) ∈ S°(ω) × R^A_+ of the Lagrangian function (6.40).

Remark 6.29. Note that the Kuhn–Tucker conditions of Definition B.50 are formulated for the case when the primal variable in the Lagrangian function is unconstrained. In this section, however, the primal variable (power vector) s ∈ R^K is confined to belong to S because, as aforementioned, we have not taken the power constraints into account in the definition of the Lagrangian function. As a consequence, the condition that the gradient vanishes must be substituted by a condition analogous to that in (B.26). More precisely, s*(ω) ∈ S must be a stationary point of the Lagrangian function (6.40) on S × R^A_+ in the sense of Definition B.44: (∇L(s*(ω), λ*(ω)))^T (z − s*(ω)) ≥ 0 for all z ∈ S, which reduces to ∇L(s*(ω), λ*(ω)) = 0 if S = R^K.
^10 Note that this is not a classical formulation of the dual problem due to the power constraints s ∈ S.
In order to find a saddle point of L with respect to X := S × R^A_+, one can consider a primal-dual algorithm of the following form:

x(n + 1) = Π_X( x(n) − δ [I_K 0; 0 −I_A] ∇L̃(x(n)) ),  n ∈ N_0 .   (6.46)

The notation in (6.46) is defined as follows: x = (s, λ) ∈ R^{K+A} is a stacked vector (see the remarks on notation on p. 347 in App. A.1), L̃ : R^{K+A} → R is an extension of the Lagrangian function to the domain R^{K+A} such that L̃(x) = L(s, λ), (s, λ) ∈ R^K × R^A, δ > 0 is a sufficiently small step size,^11 I_m is the identity matrix of dimension m, 0 is a suitable zero matrix, ∇L̃(x) is the gradient of L̃ (with respect to x ∈ R^{K+A}), and Π_X : R^K × R^A → X (see (B.27)) yields

Π_X(x) = ( Π_S(s), max{0, λ_1}, ..., max{0, λ_A} ),  x = (s, λ) .   (6.47)

Note that the projection of the primal variable s on S is necessary since, as mentioned above, the constraints on transmit powers are not included in the Lagrangian function. However, in contrast to the projection on S°(ω), the operation given by (6.47) can be performed in a distributed manner, that is, the nodes can compute the desired value without any additional coordination between them. This is explained in more detail in Sect. 6.5.4. In the remainder of this subsection, let us assume individual power constraints on each link, in which case S reduces to

(C.6-11) S = {s ∈ R^K : e^{s_k} ≤ P_k, k ∈ K} for some given P_k > 0, k ∈ K.

Note that this assumption is made for clarity reasons only; the algorithm is amenable to distributed implementation for any admissible power region having the Cartesian product structure discussed in Sect. 6.5.4. However, with (C.6-11), S has a particularly simple structure and the projection operation (6.47) simplifies to

Π_X(x) := ( min{s_1, log P_1}, ..., min{s_K, log P_K}, max{0, λ_1}, ..., max{0, λ_A} ) .   (6.48)

The partial derivatives of F_e with respect to s_k are given by (6.20), whereas the other partial derivatives in ∇L̃(x) yield (note that trace(V) = 0 as discussed in Sect. 4.3.1)

∂φ/∂s_k (s, λ) = −λ_k I_k(e^s)/e^{s_k} + ∑_{l∈K_k} λ_l v_{l,k} e^{s_k}/e^{s_l},  k ∈ K

∂F_e/∂λ_k (s) = 0,  k ∈ A,   ∂φ/∂λ_k (s, λ) = f_k(s),  k ∈ A .
^11 As in the case of the gradient projection algorithm, one could also assume a non-increasing step size sequence {δ(n)}, n ∈ N_0, satisfying (C.6-7)–(C.6-8).
where we used φ : RK × RA → R to denote φ(s, λ) = k∈A λk fk (s). Hence, under the assumption (C.6-11) implying (6.48), the algorithm (6.46) takes the form , ⎧ ⎪ s (n + 1) = min s (n) − δ gk (s(n))esk (n) k k ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ μk (n)Ik (es(n) ) ⎪ ⎪ − ⎪ ⎪ ⎪ esk (n) ⎨ (6.49) + esk (n) Δk s(n), μ(n) , log(Pk ) , k∈K ⎪ ⎪ ⎪ ⎪ ⎪ λk (n + 1) = max 0, λk (n) + δfk (s(n)) , k∈A ⎪ ⎪ ⎪ ⎪ ⎪ μk (n + 1) = λk (n + 1), k∈A ⎪ ⎪ ⎩ μk (n + 1) = 0, k ∈K\A where the iterations are performed simultaneously, gk : RK → R is defined by (6.21) and Δk : RK × RK + → R is given by (note that vk,k = 0) μ l Δk (s, μ) = vl,k s − SIRl (es )gl (s) el l∈K μl vl,k s + SIRl (es )gl (s) = (6.50) el l∈K
=
vl,k ml (s, μl ),
μ = (μ1 , . . . , μK ) .
l∈K
where
μl + SIRl (es )gl (s) . esl The second step in (6.50) follows from the fact that gl (s) defined by (6.21) is negative on RK since ψ is strictly decreasing. Also it is worth pointing out that if A = ∅, then the algorithm reduces to the gradient-projection algorithm presented in Sect. 6.5. ml (s, μl ) =
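For concreteness, the coupled updates (6.49)–(6.50) can be run on a toy instance. In the Python sketch below, all numbers are invented: a two-link network with gain matrix V, noise z, targets γ, and individual caps P_k as in (C.6-11). The utility ψ(x) = −log x is an illustrative assumption under which g_k from (6.21) is taken to be −w_k/(SIR_k(e^s) I_k(e^s)); here both links are QoS links, so A = K.

```python
import math

# Invented 2-link example: cross gains V (zero diagonal), noise z,
# weights w, SIR targets gamma, per-link power caps P, step size delta.
V = [[0.0, 0.1], [0.2, 0.0]]
z = [0.01, 0.01]
w = [1.0, 1.0]
gamma = [1.0, 1.5]
P = [2.0, 2.0]
delta = 0.05
K = 2

def interference(p, k):           # I_k(p) = sum_l v_{k,l} p_l + z_k
    return sum(V[k][l] * p[l] for l in range(K)) + z[k]

def sir(p, k):
    return p[k] / interference(p, k)

def g(p, k):                      # assumed form of g_k for psi(x) = -log(x)
    return -w[k] / (sir(p, k) * interference(p, k))

def primal_dual_step(s, lam):
    p = [math.exp(x) for x in s]
    mu = lam[:]                   # mu_k = lambda_k (every link is in A here)
    # m_l(s, mu_l) = mu_l/e^{s_l} + |SIR_l(e^s) g_l(s)| as in (6.50)
    m = [mu[l] / p[l] + sir(p, l) * abs(g(p, l)) for l in range(K)]
    Delta = [sum(V[l][k] * m[l] for l in range(K)) for k in range(K)]
    # primal update of (6.49), projected on the individual power caps
    s_new = [min(s[k] - delta * (g(p, k) * p[k]
                                 - mu[k] * interference(p, k) / p[k]
                                 + p[k] * Delta[k]),
                 math.log(P[k])) for k in range(K)]
    # dual update of (6.49) with f_k from (6.41)
    f = [interference(p, k) / p[k] - 1.0 / gamma[k] for k in range(K)]
    lam_new = [max(0.0, lam[k] + delta * f[k]) for k in range(K)]
    return s_new, lam_new

s, lam = [math.log(0.5)] * K, [0.0] * K
for _ in range(3000):
    s, lam = primal_dual_step(s, lam)
p = [math.exp(x) for x in s]
```

In this deliberately easy instance the SIR targets are loose, so the multipliers remain at zero and both powers are driven to their caps; with tighter targets the λ_k(n) become active and steer the iterates into S°(ω).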
Theorem 6.30. If V is irreducible, then the sequence (s(n), λ(n)) generated by (6.46) converges to a saddle point (of the Lagrangian function) (s*(ω), λ*(ω)) ∈ S°(ω) × R^A_+ given by (6.42) and (6.43). Moreover, s*(ω) minimizes F_e(s) over S°(ω).

Proof. As the problem is convex and Slater's conditions are satisfied, the Kuhn–Tucker conditions (Definition B.50) are necessary and sufficient for the minimum in (6.34). By Observation 6.28, the Kuhn–Tucker points are saddle points of the Lagrangian (6.40). Due to the irreducibility of V and the positivity of z, Lemma 6.5 and Lemma 6.6 imply that the function R^K → R : s ↦ L(s, λ) is strictly convex for any λ ≥ 0. On the other hand, proceeding as in the proof of Lemma 1.32 shows that the function R^A_+ → R : λ ↦ L(s*(ω), λ) is strictly concave at any λ = 0. So, by [153, pp. 125–126], we can conclude that a simultaneous application of gradient methods to minimize the Lagrangian over
primal variables and to maximize it over dual variables (which is accomplished by (6.46)) converges to a saddle point (s*(ω), λ*(ω)). Moreover, by the Kuhn–Tucker theorem (Theorem B.53), s*(ω) ∈ S minimizes the objective function F_e over S°(ω). An example of the convergence behavior of the algorithm can be found in [170]. The algorithm can be implemented in distributed wireless networks using a scheme based
on the adjoint network presented in Sect. 6.6. Except for Δ_k(s(n), μ(n)) = ∑_{l∈K} v_{l,k} m_l(s(n), μ_l(n)) given by (6.50), all the other quantities in (6.49) (such as the weights w_k) are either known locally or can be computed from local measurements (such as the SIR) and, if necessary, conveyed to the corresponding transmitter/receiver by means of a low-rate control channel. On the other hand, Δ_k(s(n), μ(n)) can be estimated from the received signal power in the adjoint network with the exchanged roles of transmitters/receivers and channel inversion on each link, as described in Sect. 6.6. The only difference to the scheme of Sect. 6.6 is that each receiver, say receiver l ∈ K, transmits in the nth iteration a sequence of independent zero-mean random symbols with the variance being equal to m_l(s(n), μ_l(n)) defined in (6.50). The main steps of the distributed power control scheme/algorithm are summarized below (Algorithm 6.2).

Algorithm 6.2 Distributed primal-dual algorithm based on Lagrangian (6.40)
Input: w > 0, ε > 0, n = 0, s(0) ∈ S, ω with S°(ω) ≠ ∅, δ > 0 (sufficiently small).
Output: s ∈ S
1: repeat
2: Concurrent transmission at transmit powers e^{s_k(n)}, k ∈ K, with receiver-side estimation of I_k(e^{s(n)}) and SIR_k(e^{s(n)}).
3: Each transmitter-receiver pair independently exchanges the necessary estimates and variables, including I_k(e^{s(n)}), s_k(n), k ∈ K, and λ_k(n), k ∈ A.
4: Concurrent transmission in the adjoint network with transmitter-side estimation of the received power at each transmitter (or receiver in the adjoint network). The variance of the zero-mean input symbols is m_k(s(n), μ_k(n)) given by (6.50).
5: Transmitter-side computation of s_k(n + 1), k ∈ K, and λ_k(n + 1), k ∈ A, according to (6.46) with (6.47), or (6.49) in the case of individual power constraints on each link.
6: n = n + 1
7: until |L(s(n), λ(n)) − L(s(n − 1), λ(n − 1))| < ε
The algorithm assumes that the weight wk is known at both the transmitter side and the receiver side of link k ∈ K. Also, all transmitters and receivers must know the function ψ and its derivative. If different functions ψk , k ∈ K, are associated with distinct links (see Remark 5.8 and the discussion at the
end of Sect. 5.7.2), then it is sufficient that each transmitter-receiver pair has only local knowledge of the function associated with the respective link.

Simple Barrier Algorithms

Another possibility for enforcing the SIR targets is to exploit barrier properties of the function ψ(x) = −Ψ(x) and, with it, of the objective function F_e : R^K → R given by (6.12). The idea is very simple: by Definition 6.21 of the valid power region S(ω), we know that s ∈ S(ω) if SIR_k(e^s) − γ_k > 0 for each k ∈ K. Thus, since ψ is assumed to fulfill (C.6-2) and (C.6-9) is satisfied, this suggests using ψ as a barrier function to ensure some given SIR targets. It must, however, be emphasized that the approach is not a standard barrier method, where the objective function is extended by an additional barrier function to enforce the feasibility of the power vector updates [16, pp. 562–578] (see also remarks in Sect. 5.7.2). Here, we only exploit the fact that the objective function itself has some barrier properties. More precisely, the idea is to minimize the function^12

F̄(s) := ∑_{k∈K} w_k ψ( Θ_k(s) ),  s ∈ D := int(S(ω)) ≠ ∅   (6.51)

over D° := int(S°(ω)), where Θ_k : R^K → R is given by

Θ_k(s) := SIR_k(e^s) − γ_k,  γ_k = γ(ω_k), k ∈ K .

Notice that Θ_k(s) > 0 for all s ∈ D, so that the function (6.51) is well defined. Formally, the problem is formulated as follows:

ŝ*(ω) := arg min_{s∈D°} F̄(s)   (6.52)
where (C.6-9) is assumed to hold. By Lemma 6.2, SIR_k(e^s), k ∈ K, is log-concave on S, and hence Θ_k(s), k ∈ K, is a log-concave function of s ∈ D. Thus, proceeding as in the proof of Theorem 6.3 shows that F̄ : D → R is convex when (C.6-1)–(C.6-3) are satisfied. Since D° ⊂ R^K is a convex set, we have the following observation.

Observation 6.31. Suppose that (C.6-1)–(C.6-3) are satisfied. Then, the problem (6.52) is a convex optimization problem.

The problem (6.52) can be solved using, for instance, a gradient projection algorithm similar to that presented in Sect. 6.5. In fact, the algorithm has the form (6.19) except that:

(a) A start point s(0) must be chosen such that s(0) ∈ D.
^12 For convenience, in this section, we use the symbols D and D° to denote the interiors of the valid power region and the feasible power region, respectively.
(b) The function F_e is replaced by F̄, with the entries of the gradient vector ∇F̄ : D → R^K given by

∇_k F̄(s) = ∂F̄/∂s_k (s) = e^{s_k} ( g_k(e^s) − ∑_{l∈K} v_{l,k} SIR_l(e^s) g_l(e^s) )   (6.53)

for each k ∈ K.
(c) D → R : s ↦ g_k(e^s) is given by

g_k(e^s) = w_k ψ′(Θ_k(s)) / I_k(e^s),  k ∈ K .   (6.54)
Reasoning along the same lines as in Sect. 6.5 shows that ∇F̄ is Lipschitz continuous on any compact subset of D, since F̄ is twice continuously differentiable on D. Thus, if the step size is sufficiently small, the resulting gradient projection algorithm based on (6.53) converges to a stationary point of F̄ on D° (Definition B.44), regardless of the choice of a start point s(0) ∈ D. By Observation 6.31 and Theorem B.43 (with Definition B.44), we can then conclude that any point of attraction of the algorithm minimizes F̄ over D°. Note that due to Condition (C.6-2), the function ψ acts as a kind of barrier (function) for the QoS constraints; it enforces that the sequence of logarithmic power vectors generated by the gradient projection algorithm remains within the interior of the valid power region D, provided that the step size is small enough. As a result, every iterate is a valid power vector and only the projection on S is necessary. The main disadvantage of this approach is that a start point must be a vector belonging to D, and thus it must satisfy each SIR target with strict inequality. Such a power vector can be computed distributedly using, for instance, one of the QoS-based schemes discussed in Sect. 5.5.3. It is also important to point out that the step size is strongly influenced by the choice of a start point. Roughly speaking, the closer the corresponding SIRs are to the SIR targets, the smaller the step size must be. The reader can easily observe that the algorithm can be implemented in a distributed manner using the handshake protocol presented in Sect. 6.6 and, especially, in Sect. 6.6.3. All the differences to the scheme of Sect. 6.6 result from using Θ_k(s) as the basic performance measure of link k ∈ K, instead of the signal-to-interference ratio SIR_k(e^s). In particular, in order to compute g_k(e^s) given by (6.54), one has to estimate the value Θ_k(s), which is easily obtained by subtracting γ_k from the corresponding SIR estimate.
Note that the variance of random symbols transmitted over the adjoint network is equal to |SIRl (es )gl (es )|, with the function gl given by (6.54). Summarizing, we can say that distributed implementation of the “barrier” algorithm takes the same form as Algorithm 6.1 except that gk (p) is now defined by (6.54) and Fe is replaced by F¯ whenever necessary.
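A compact numerical sketch of the barrier variant may help. In the Python toy below, the two-link data are invented, ψ(x) = −log x is an assumed utility satisfying (C.6-2) (so ψ′(Θ_k) in (6.54) becomes −1/Θ_k), and individual power caps play the role of S; the gradient projection iteration with ∇F̄ from (6.53) starts from a point that meets every SIR target with strict inequality.

```python
import math

# Invented 2-link data: cross gains V, noise z, weights w, SIR targets gamma,
# per-link caps P; psi(x) = -log(x) is an assumed barrier-type utility.
V = [[0.0, 0.1], [0.2, 0.0]]
z = [0.01, 0.01]
w = [1.0, 1.0]
gamma = [1.0, 1.5]
P = [2.0, 2.0]
delta = 0.01
K = 2

def interference(p, k):
    return sum(V[k][l] * p[l] for l in range(K)) + z[k]

def sir(p, k):
    return p[k] / interference(p, k)

def theta(p, k):                  # Theta_k(s) = SIR_k(e^s) - gamma_k
    return sir(p, k) - gamma[k]

def g_bar(p, k):                  # (6.54) with the assumed psi'(x) = -1/x
    return -w[k] / (theta(p, k) * interference(p, k))

def F_bar(p):                     # (6.51) with the assumed psi(x) = -log(x)
    return sum(-w[k] * math.log(theta(p, k)) for k in range(K))

def grad(s):                      # entries of grad F_bar per (6.53)
    p = [math.exp(x) for x in s]
    return [math.exp(s[k]) * (g_bar(p, k)
            - sum(V[l][k] * sir(p, l) * g_bar(p, l) for l in range(K)))
            for k in range(K)]

s = [math.log(0.5), math.log(0.5)]        # start point in D (strict SIRs)
F0 = F_bar([math.exp(x) for x in s])
for _ in range(5000):
    gvec = grad(s)
    # projection on the (assumed individual) power caps only
    s = [min(s[k] - delta * gvec[k], math.log(P[k])) for k in range(K)]
p = [math.exp(x) for x in s]
```

Because ψ blows up as Θ_k → 0, every iterate keeps SIR_k(e^s) strictly above γ_k, so only the trivial projection on the power caps is needed, mirroring the discussion above.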
6.7.2 Soft QoS Support

In this section, we extend the gradient projection algorithm from Sect. 6.5 to solve the utility-based power control problem with soft QoS support, which is analyzed in Sect. 5.7.2. We focus on the problem (5.150), being a combination of the utility-based power control policy from Sect. 5.2.5 and a power control approach that approximates a max-min SIR balancing solution. The latter approach is discussed at the beginning of Sect. 5.7.2, and the gradient projection algorithm for this problem is a minor modification of the algorithm presented in Sect. 6.5. In fact, everything remains the same except that g_k(p) from (6.20) must be replaced by

g_k(p) = w_k ψ′( SIR_k(p)/γ_k ) / ( γ_k I_k(p) ),  γ_k = γ(ω_k) > 0 .

The power control problem is clearly convex and can be solved in a distributed manner using the scheme in Sect. 6.6.2, provided that the receivers in the adjoint network know the SIR targets γ_k > 0.

Now let us address the problem of computing a power vector p̃(α) defined by (5.150). An examination of Observation 5.9 (page 138) shows that p̃(α) > 0. Hence, as in the previous sections, the problem can be formulated in the logarithmic power domain as follows:^13

s̃ := s̃(α) = arg min_{s∈S} F̃_e(s)   (6.55)
where

F̃_e(s) := ∑_{k∈A} a_k ψ_α( SIR_k(e^s)/γ_k ) + ∑_{k∈B} b_k ψ( SIR_k(e^s) ) .   (6.56)
The notation in (6.56) is defined as follows: a ∈ R^{|A|}_{++} and b ∈ R^{|B|}_{++} are given weight vectors (see also (C.5-18)), ψ_α(x) = −Ψ_α(x), x > 0, where Ψ_α : R_{++} → Q is given by (5.28), and ψ(x) = −Ψ(x), x > 0, is any function that satisfies (C.6-1)–(C.6-3). In Sect. 5.7.2, the reader can find an elaborate discussion regarding a suitable choice of the functions used in (6.56) and their impact on the overall performance. Reasoning along the same lines as in Sects. 5.2.5 and 6.3–6.5 shows that (i) the minimum in (6.55) exists, (ii) the problem is convex, and thus globally solvable, and (iii) ∇F̃_e(s) is Lipschitz continuous on every bounded subset of S. By Sects. 6.3 and 6.5, the statements (i)–(iii) are true for either of the sum terms in (6.56), so that they also hold for F̃_e(s) and the problem (6.55).
^13 In this section, it is not necessary to emphasize the dependence of a solution and the objective function on the parameter α, so we drop it for simplicity.
The gradient-projection algorithm for the problem (6.55) is

s(n + 1) = Π_S( s(n) − δ∇F̃_e(s(n)) ),  s(0) ∈ S, n ∈ N_0   (6.57)
where the projection operator Π_S : R^K → S is defined by (B.27) and δ > 0 is a constant step size that is small enough. Standard results from convex optimization theory together with the properties (i)–(iii) show that, for a sufficiently small δ > 0, the sequence {s(n)} generated by (6.57) converges to s̃ ∈ S given by (6.55). References [170, 171] present some results of numerical tests of the algorithm. The kth entry of the gradient vector ∇F̃_e(s) is given by (note that v_{k,k} = 0)

∇_k F̃_e(s) = e^{s_k} ( φ_k(s) + Δ_k(s) )   (6.58)

where

Δ_k(s) = −∑_{l∈K} v_{l,k} SIR_l(e^s) φ_l(s) = ∑_{l∈K} v_{l,k} |SIR_l(e^s) φ_l(s)| = ∑_{l∈K} v_{l,k} m_l(s) .   (6.59)
The structure of the function φ_k : R^K → R depends on the type of link k:

φ_k(s) = u_k(s),  k ∈ A \ B
φ_k(s) = u_k(s) + v_k(s),  k ∈ A ∩ B   (6.60)
φ_k(s) = v_k(s),  k ∈ B \ A

where

u_k(s) = a_k ψ′_α( SIR_k(e^s)/γ_k ) / ( γ_k I_k(e^s) )
v_k(s) = b_k ψ′( SIR_k(e^s) ) / I_k(e^s) .

Thus, a distributed realization is again similar to that discussed in Sect. 6.7.1. The value Δ_k(s) defined by (6.59) can be estimated using a scheme based on the concept of the adjoint network, as described in Sect. 6.7.1. The only difference is that each receiver, say receiver l ∈ K, transmits in the nth iteration a sequence of independent zero-mean random symbols with the variance equal to m_l(s(n)) = |SIR_l(e^{s(n)}) φ_l(s(n))|, with φ_l defined by (6.60). All other quantities that need to be known to update the power vectors are either known locally or can be computed from local measurements and, if necessary, conveyed to the corresponding transmitter/receiver by means of a low-rate control channel. As mentioned at the end of Sect. 5.7.2 (see also Remark 5.80), the QoS links belonging to the set A can be assigned different functions ψ_{α_k}, α_k ≥ 2, such that ψ_{α_k}(x) = ψ_α(x), x > 0, for some α ≥ 2. In this case, each transmitter, say transmitter k ∈ A, must inform its receiver about α_k. For relatively
6 Power Control Algorithms
large values of α_k, it might be that the step size has to be relatively small due to rapid variations of the gradient, resulting in slow progress of the algorithm. One possible remedy is to let such links start with small values of α_k, k ∈ A, and then increase α_k gradually as the algorithm proceeds, until either some predefined maximum value is achieved or the SIR targets are met with satisfactory accuracy. As already mentioned in Sect. 5.7.2, each QoS link can increase its α_k, k ∈ A, independently, so that no information exchange with the other users is required.
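The projected-gradient update (6.57) admits a compact numerical sketch. The snippet below is an illustration only, not the book's implementation: a toy quadratic stands in for F̃_e, and the power constraints are assumed to be the box e^{s_k} ≤ p̂_k, for which the projection Π_S reduces to componentwise clipping in the logarithmic domain.

```python
import numpy as np

def project_box(s, s_max):
    # Projection onto S = {s : s_k <= log(p_hat_k)}: componentwise clipping.
    return np.minimum(s, s_max)

def projected_gradient(grad, s0, s_max, step=0.1, iters=500):
    # Iteration (6.57): s(n+1) = Pi_S(s(n) - delta * grad(s(n))).
    s = project_box(np.asarray(s0, dtype=float), s_max)
    for _ in range(iters):
        s = project_box(s - step * grad(s), s_max)
    return s

# Toy stand-in for the objective: F(s) = 0.5 * ||s - a||^2, so grad F(s) = s - a.
a = np.array([1.0, -1.0, 0.5])
s_max = np.array([0.0, 0.0, 1.0])            # log(p_hat)
s_star = projected_gradient(lambda s: s - a, np.zeros(3), s_max)
# For this separable toy problem, the solution is min(a, s_max) componentwise.
```

The same template applies to the actual objective once ∇F̃_e(s) from (6.58) is available from the (adjoint-network) measurements; only the `grad` callback changes.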
6.8 Primal-Dual Algorithms

Up to now, when studying utility-based and QoS-based power control problems in this book, we have primarily focused on so-called primal optimization methods. These methods work purely on the optimization variable itself: the power vector p ∈ R^K_{++} or the logarithmic power vector s = log(p) ∈ R^K. The only exception is Sect. 6.7.1, where we considered a primal-dual algorithm for finding a stationary point of some Lagrangian function for the utility-based power control problem with hard QoS support. So far, however, we have not fully exploited the potential of the Lagrangian optimization approach, with its key concept of Lagrangian duality and the possibility of incorporating the dual variables into the iteration (see App. B.4.3). Iterative optimization methods that operate on both the primal and dual variables of some associated Lagrangian function are referred to as primal-dual methods. Analogously, iterative methods that operate purely on the dual variables are called dual methods. For the general theory, we refer here to [16, 162, 172] for the case of convex programming and to [173] for the particular case of linear programming. This section completes the framework for utility-based power control of this chapter by presenting selected primal-dual approaches to the power control problems. As in the entire book, the concepts here are discussed with the goal of designing power control algorithms that combine general efficiency (an acceptable trade-off between computational effort and convergence rate) with amenability to adaptive/real-time application. The latter feature consists basically in the possibility of decentralized implementation. First, in Sect. 6.8.1, we exhibit the benefits of the conventional primal-dual approach to power control in terms of applicability and (distributed) implementation issues. We consider a problem formulation which unifies, and slightly generalizes, the formulations from Sects. 5.5, 6.3 and 6.7.
The primal-dual algorithms presented in this section represent an interesting alternative to the primal algorithms considered before, as well as to the primal-dual algorithm of Sect. 6.7.1. In particular, they may have certain advantages in terms of convergence and applicability. Next, in Sects. 6.8.2–6.8.4, we focus on two primal-dual power control iterations which rely on a certain construction of a generalized nonlinear Lagrangian function. The proposed Lagrangian and the related algorithmic solutions fall into the large framework of generalized Lagrangian optimization, which has its origin in the simple concept of Hestenes [174]. It is worth noting that the framework of generalized Lagrangians was in the focus of interest of applied optimization theory in the 1970s (see, for instance, [174], [175], [176], [177]). The corresponding concepts fell out of the mainstream of research with the proliferation of interior point methods (pushed by [178], but actually invented earlier). The potential and performance of interior point methods predefined the later trend in optimization theory towards highly efficient and fast convergent solutions to convex or linear problem re-formulations. Such a tendency often stands in opposition to amenability to efficient use in resource allocation iterations in wireless networks: As is already evident from the preceding sections, a power control iteration needs a well-balanced compromise between computational effort, fast convergence, robustness (to noisy measurements and other inaccuracies) and efficient decentralized implementation (to allow real-time/adaptive application under fading and changing traffic characteristics). In this light, resorting to generalized Lagrangian concepts for resource allocation purposes in wireless networks appears to be attractive, and is in fact demonstrated to be so in Sects. 6.8.2–6.8.4. Next, in Sect. 6.8.5, we show how a primal-dual power control approach can provide significantly improved convergence, while still being amenable to efficient implementation in a wireless network. For this purpose, we again apply a slight modification of the Lagrangian and combine it with some further tricky reformulations. In Sects.
6.8.1–6.8.5 we generalize the focus of our considerations to finding local solutions to power allocation problems, and discuss convex problems (or problems with only global solutions) only as particular instances. Finding only local solutions might remain the only practically accomplishable goal when the considered link performance functions, and/or other conditions, are fixed and do not allow a convex problem (re-)formulation. While the preceding sections present some fundamental concepts that are used throughout the book, this section has rather the character of an add-on overview of potential benefits resulting from the use of Lagrangian duality and related concepts. Moreover, in order to limit the technicality, we only provide outlines of the proofs and refer the interested reader to the literature for the complete proofs. Most of the non-standard results of this section and further references can be found in [179, 84].

Remark 6.32 (Some definitions and remarks). We assume individual power constraints on each (logical) link, so that the set of admissible power vectors P defined in Sect. 4.3.3 takes the form

(C.6-12) P = {p ∈ R^K_+ : p_k ≤ p̂_k for all k ∈ K} for some predefined p̂_k > 0, k ∈ K.

The set of logarithmically transformed positive admissible transmit powers S defined by (6.11) is then S = {s ∈ R^K : e^{s_k} ≤ p̂_k for all k ∈ K}.
The power vector is later referred to as the primal variable. The tuple of all dual variables is called the dual variable. Given any x ∈ R^K, the set B(x) = B_r(x) denotes an open ball of radius r > 0 centered at x (Definition B.1), and r > 0 is assumed to be chosen appropriately small in each case considered in this section. We make frequent use of the notions of Lagrangian optimization, such as the Lagrangian function, Lagrange multiplier, dual variable, Kuhn–Tucker conditions, Second-Order Sufficiency Conditions, or constraint qualification, for which the necessary background can be found in App. B.4.3. Throughout this section, we use “∗” as a superscript to distinguish the Kuhn–Tucker points or SOSC points. For instance, sometimes we simply write (s, λ), but sometimes (s∗, λ∗) to distinguish a Kuhn–Tucker or SOSC point from other points.

6.8.1 Improving Efficiency by Primal-Dual Methods

In this section, we present two kinds of conventional primal-dual power allocation iterations. In the next sections, we confront them, in terms of implementation issues, applicability and robustness, with a specialized concept of a primal-dual iteration applied to a certain, highly nonlinear Lagrangian function. For this purpose, we first reformulate the power control problem so as to unify the QoS-based and utility-based power control problems, which have so far been to a large extent considered separately.

A Generalized Formulation of the Power Control Problem

Consider the power control problem
min_{p∈R^K} F(p)  subject to  p − p̂ ≤ 0,  −p ≤ 0,  θ_k(p) − θ̂_k ≤ 0, k ∈ A  (6.61)
where the minimum is assumed to exist, where p̂ = (p̂_1, …, p̂_K) (so that the first two constraint inequalities are equivalent to p ∈ P), and where the following notation is used.
(a) θ_k : R^K_+ → R is a twice Fréchet-differentiable¹⁴ (link) performance function of link k ∈ A. In this section, it is assumed that a smaller value of θ_k(p) implies a better performance, and therefore it is desired that θ_k(p), k ∈ A, be as small as possible. Obviously, when large values of some performance function of interest are desired, we need to change the sign of the performance function to satisfy the assumption.
¹⁴ For simplicity, in this section, all functions are assumed to be continuously differentiable, and Fréchet-differentiable functions are simply referred to as differentiable functions. See also Definition B.14 and Remark B.15.
(b) θ̂_k ∈ R is a number that represents a hard constraint (value) on the performance of link k ∈ A, measured by the performance function θ_k, k ∈ A.
(c) A ⊆ K denotes a subset of logical links that are subject to additional constraints with respect to the performance functions θ_k, k ∈ A.
(d) F : R^K_+ → R is a twice differentiable objective function. In particular, we can define

F(p) = ∑_{k∈K} w_k ψ(SIR_k(p))  (6.62)

for some predefined w ∈ R^K_{++} and strictly decreasing ψ : R_+ → Q ⊆ R, when we are interested in a utility-based power control considered at many places throughout the book. Similarly, given a weight vector w ∈ R^K_{++}, we can set

F(p) = ∑_{k∈K} w_k p_k  (6.63)

when a user-centric approach to power control is considered, in which case we usually set A = K (Chap. 5.5). More general objectives representing, for instance, combinations/mixtures of (6.62) and (6.63) can be utilized if the interest is in utility-based power allocation under additional QoS constraints (Sect. 5.7). For instance, choosing F(p) to be of the form (6.63) and

θ_k(p) = −SIR_k(p),  −θ̂_k = γ(ω_k),  k ∈ A = K  (6.64)
then the problem (6.61) becomes the QoS-based power control problem of Sect. 5.5.3 (see Sects. 5.3 and 5.5.3 for the meaning of the function γ). It is also worth pointing out that if (6.62) is considered, then the constraints on link performance can be considered optional, and in the special case A = ∅, the problem (6.61) with (6.62) is nothing but the “pure” utility-based power control problem (6.3) with a continuously differentiable function ψ. Moreover, if in problem (6.61) we set (6.62) and assume (6.64), with the difference that A ≠ ∅ is any subset of K, then we obtain the equivalent minimization form of the utility-based power control problem with hard QoS support (5.146). Some algorithmic solutions to this problem were presented and discussed in Sect. 6.7.1. Also note that, under the assumption of (6.64), the link performance constraints can be written as

γ^{−1}(SIR_k(p)) − ω_k ≤ 0,  k ∈ A  (6.65)
where γ^{−1} : R_+ → Q is the inverse function of γ (Theorem B.7). Note that in the preceding sections and chapters of this book, ψ (or Ψ) is the inverse function of γ, but in fact γ can be any strictly decreasing function defined on R_+ (see also Remark 5.39 in Sect. 5.5.1). A different kind of constraint, which has not been discussed so far in this book, is the constraint related to limitations on the power spectral density.
Such limitations appear to be of particular interest in networks that share a common bandwidth with other wireless communication systems. As an example, consider a wireless mesh network that consists of several mesh access clusters. Such mesh access clusters often represent distinct wireless systems sharing a common bandwidth and space [180]. To mitigate the inter-system interference in such a mesh, one has to adjust the power spectral density of the mesh access clusters to a specified spectral mask which ensures acceptable system coexistence. The task of mitigating the inter-system interference consists basically in the adjustment of the transmit power constraints of links according to the topology of the mesh access clusters and wave propagation conditions, as well as in restricting the maximum received power on links with critically located receivers. The subclass of links with interference-critical receivers can be modeled by A ⊆ K, and the corresponding received power constraints can be incorporated into the problem (6.61) by defining the link performance functions as

θ_k(p) = (Vp)_k + p_k + σ_k²,  k ∈ A.  (6.66)

Hereby, the values of θ̂_k, k ∈ A, shall represent the adjusted received power constraints (maximum allowed powers at the link receivers k ∈ A). It must be emphasized that the received power constraints (6.66) make no sense in connection with the objective (6.63), since then a solution to the power control problem (6.61) is trivially p = 0. In what follows, we restrict our attention to objective and performance functions satisfying the following condition.

(C.6-13) If p∗ ∈ R^K_+ is a local solution to the problem (6.61) in the sense p∗ = arg min_{p∈B(p∗)} F(p) and p∗ satisfies the constraints in (6.61), then p∗ ∈ R^K_{++}.

The condition characterizes a non-idling property in the sense that no link is allocated zero power under the locally optimal power allocation.
Recall that in the previous sections such a non-idling property (Observation 5.9) is a consequence of the assumptions (C.5-2)–(C.5-3) on the utility function Ψ(x) = −ψ(x), x > 0. So, the assumption (C.6-13) can be seen as a relaxation of the previous constraints on the utility functions. In particular, unless otherwise stated, we do not require that a local solution to the problem (6.61) be a global one. Under the assumption (C.6-13), we can apply a logarithmic transform of the power vector as in Sect. 6.3, so that the problem (6.61) can be rewritten in the equivalent form

min_s F_e(s)  subject to  f_k(s) ≤ 0,  k ∈ L  (6.67)
where we defined F_e : R^K → R such that F_e(s) = F(e^s), s ∈ R^K. Here and hereafter, the constraint functions f_k : R^K → R, with k ∈ L = {1, …, L} and L := K + |A|, are defined as

f_k(s) = e^{s_k} − p̂_k,  1 ≤ k ≤ K,
f_k(s) = θ_k(e^s) − θ̂_k,  K + 1 ≤ k ≤ L
in order to obtain a uniform notation for all constraint inequalities in (6.61). As in the preceding sections, we say that a (logarithmic) power vector s ∈ R^K is feasible if f_k(s) ≤ 0 for each k ∈ L. Obviously, we need to make here the assumption that the set of feasible power vectors is not empty:

(C.6-14) There exists some s ∈ R^K such that f_k(s) ≤ 0 for each k ∈ L.

The set of constraint inequalities satisfied with equality (tight constraints) at s ∈ R^K is denoted by T(s) = {k ∈ L : f_k(s) = 0}. Due to the rather weak assumptions with respect to the functions F and θ_k, k ∈ A, the problem (6.61) is not necessarily globally solvable in the sense of Definition B.40. In other words, there may exist local minimizers that are not global ones.

Definition 6.33 (Locally optimal power vector). A power vector s∗ ∈ R^K is said to be locally optimal (in the sense of problem (6.67)) if

s∗ = arg min_{s∈B(s∗)} F_e(s),  f_k(s∗) ≤ 0, k ∈ L.  (6.68)
Here and hereafter, given a vector x belonging to some Euclidean space, the set B(x) = B_r(x) is used to denote an open ball of radius r > 0 centered at x (Definition B.1), where r > 0 is assumed to be chosen appropriately in each case considered. Furthermore, in this section we assume the following.

(C.6-15) Any local minimizer (6.68) satisfies the Second-Order Sufficiency Conditions (SOSC, Definition B.55). Precisely, if s∗ is given by (6.68), then s∗ satisfies the SOSC for some λ∗ ∈ R^L, and we say simply that (s∗, λ∗) is an SOSC point of problem (6.67).

Conversely, it is known that any SOSC point (s∗, λ∗) implies (6.68) (App. B.4.3). The SOSC strengthen the Kuhn–Tucker conditions (Definition B.50), since they separate local problem solutions from other “uninteresting” Kuhn–Tucker points [175, 176]. Although it is possible to state a power control problem with local minimizers violating the SOSC, the assumption (C.6-15) is in general nonrestrictive in the interesting/relevant problem statements. Such an assumption is required by a large part of the theory of general primal-dual algorithms [175, 176, 162].
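The uniform constraint notation f_1, …, f_L can be made concrete with a short sketch. Everything below is illustrative (invented gain matrix, noise powers and caps, with the received-power functions (6.66) as the example θ_k): it merely stacks the K power constraints and the |A| performance constraints of (6.61) in the logarithmic domain and reports the tight set T(s).

```python
import numpy as np

def stacked_constraints(s, p_hat, V, sigma2, theta_hat, A):
    """Vector (f_1(s), ..., f_L(s)) with L = K + |A|, as in (6.67):
    f_k(s) = exp(s_k) - p_hat_k            for 1 <= k <= K,
    f_k(s) = theta_k(exp(s)) - theta_hat_k for the |A| extra constraints,
    where theta_k(p) = (V p)_k + p_k + sigma2_k as in (6.66)."""
    p = np.exp(s)
    power_part = p - p_hat
    perf_part = (V @ p + p + sigma2)[A] - theta_hat
    return np.concatenate([power_part, perf_part])

def tight_set(f_vals, tol=1e-9):
    # T(s): constraints satisfied with equality (up to a numerical tolerance).
    return [k for k, v in enumerate(f_vals) if abs(v) <= tol]

# Example data: K = 2 links, A = {0} (only link 0 has a received-power cap).
p_hat = np.array([2.0, 2.0])
V = np.array([[0.0, 0.1],
              [0.2, 0.0]])                   # v_kk = 0
sigma2 = np.array([0.1, 0.1])
theta_hat = np.array([2.0])
s = np.log(np.array([1.0, 2.0]))             # link 1 transmits at full power
f = stacked_constraints(s, p_hat, V, sigma2, theta_hat, [0])
```

Here s is feasible and only the power constraint of link 1 is tight (index 1 in the 0-based code), so T(s) contains exactly that constraint.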
Conventional Primal-Dual Power Control

We first turn our attention to conventional primal-dual algorithms based on the linear Lagrangian function. An example of such an algorithm was already presented in Sect. 6.7.1 as one possible approach to utility-based power control with hard QoS support. As is apparent from Sect. 6.7.1, the operation of projecting a power vector onto the feasible power region may not be amenable to distributed implementation, since it may require a lot of coordination between the nodes. Furthermore, one can conclude that under general performance functions θ_k, k ∈ A, in (6.61), the primal power control algorithms may have to resort to projection operations that are too computationally demanding to be efficiently implemented in practice. Algorithmic optimization theory therefore suggests the extension of the primal methods by barrier functions, or the use of primal-dual methods, when the problem form is general [181, 172, 162]. In the case when barrier methods are used and the barrier functions are fixed, there is a hardly tractable trade-off between the inaccuracy of the achieved solution and numerical problems due to the barrier slope. On the other hand, a sequential update of the barrier function, as in the case of the primal interior point methods (see [16, pp. 569–578] and also Sect. 6.7.2), leads to an intricate loop-in-loop structure. Furthermore, it is not clear how to efficiently determine an appropriate sequence of barrier functions that depends, for instance, on the gain matrix V (here we can resort only to general V-independent barriers, such as self-concordant functions [16]). For these reasons, the primal-dual iterations seem to be an attractive alternative when the resource allocation optimization (6.67) is considered in its full generality.
Certainly, a detriment of primal-dual algorithms, when compared to purely primal iterations such as the gradient projection algorithm, is the higher-dimensional update involving an update of both the power vector and the dual variable. The conventional (linear) Lagrangian function (in short, Lagrangian) associated with the problem (6.67) takes the form

L̄(s, λ) = F_e(s) + ∑_{k∈L} λ_k f_k(s),  (s, λ) ∈ R^K × R^L  (6.69)
where the nonnegative vector λ = (λ_1, …, λ_L) is the dual variable (see App. B.4.3). Note that, in contrast to the partial Lagrangian from Sect. 6.7.1, L̄ is obtained by adding to the objective function a weighted sum of all constraint functions of the problem (6.61). A kind of canonical form of a primal-dual iteration, which is sometimes simply referred to as the primal-dual iteration [175], [182], is given by

s(n + 1) = arg min_{s∈R^K} L̄(s, λ(n))  (primal iteration)
λ(n + 1) = max{λ(n) + δ∇_λ L̄(s(n + 1), λ(n)), 0}  (dual iteration)  (6.70)
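The canonical iteration (6.70) can be sketched on a one-dimensional toy convex problem (invented here for illustration; it is not the power control problem). The inner arg min is approximated by running the gradient descent (6.71) to near convergence, after which the dual variable takes one projected ascent step.

```python
# Toy problem: minimize (s - 2)^2 subject to s - 1 <= 0.
# Lagrangian: L(s, lam) = (s - 2)^2 + lam * (s - 1); saddle at (s*, lam*) = (1, 2).
g = lambda s: s - 1.0                        # the single constraint function

def argmin_lagrangian(lam, s0=0.0, t=0.1, inner=200):
    # Primal step of (6.70): unconstrained minimization of L(., lam),
    # realized by the gradient iteration (6.71).
    s = s0
    for _ in range(inner):
        s -= t * (2.0 * (s - 2.0) + lam)     # grad_s L(s, lam)
    return s

lam, delta = 0.0, 0.5
for _ in range(200):
    s = argmin_lagrangian(lam)
    # Dual step of (6.70): projected gradient ascent; grad_lam L = g(s).
    lam = max(lam + delta * g(s), 0.0)
```

For this convex toy problem the iterates approach s ≈ 1 and λ ≈ 2; with a non-convex F_e the inner loop would return only a local minimizer, which is exactly the caveat discussed around (6.71).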
6.8 Primal-Dual Algorithms
309
for n ∈ N₀ and some step size δ > 0, where the maximum in the dual iteration is taken component-wise. Note that the primal iteration, the update of the primal variable, involves a global minimization problem, which can be too difficult to solve efficiently if, for instance, the Lagrangian L̄(s, λ(n)) has local minima that are not global ones. In practice, the nth iterate of the primal variable (which is the power vector) in (6.70) is usually computed by some locally convergent method of unconstrained minimization, which converges to a local minimizer of L̄(s, λ(n)) over R^K. For instance, the power vector update s(n + 1) in (6.70) can be determined by applying the gradient minimization method

s(m + 1) = s(m) − t∇_s L̄(s(m), λ(n)),  m ∈ N₀  (6.71)
with t > 0 denoting a step size. In general, a sequence generated by this algorithm does not converge to arg min_{s∈R^K} L̄(s, λ(n)), but the point obtained from (6.71) can be used to subsequently perform the dual iteration in (6.70). Obviously, if both F_e and θ_k, k ∈ A, are convex functions, then the Lagrangian (6.69) is convex in s ∈ R^K for any λ(n), so that, in this particular case, any locally convergent iteration such as (6.71) finds a globally optimal primal update s(n + 1). For instance, by Sect. 6.7.1, (6.69) is a convex function of s ∈ R^K if the weighted objective (6.62) is assumed, the function ψ in (6.62) satisfies (C.6-1)–(C.6-3), and the link performance functions θ_k, k ∈ A, are given by (6.64). The update of the dual variable in (6.70), referred to as the dual iteration/update, is a gradient projection iteration applied to the function

D̄ : R^L → R,  λ ↦ D̄(λ) := min_{s∈R^K} L̄(s, λ)  (6.72)
which is the classical dual function in Lagrangian optimization theory ([181, 183, 162] and App. B.4.3). As a consequence, the conventional primal-dual iteration (6.70) is a maximization iteration of the dual function (6.72) over R^L_+ and solves the problem max_{λ∈R^L_+} min_{s∈R^K} L̄(s, λ). According to Definition B.56 in App. B.4.3, the latter problem is precisely the dual problem of the original primal problem (6.67). The power control algorithm (6.70) is known to have any SOSC point of the problem (6.67) as its point of attraction, provided that the function (6.72) is well defined for any obtained iterate λ(n), n ∈ N₀ [175]. A necessary condition for such a feature is that the SOSC point of attraction, say (s∗, λ∗) ∈ R^K × R^L_+, is a saddle point of the Lagrangian (6.69). By the definitions in Appendices B.4.4 and B.4.3, we know that (s∗, λ∗) ∈ R^K × R^L_+ is a saddle point of the Lagrangian L̄ with respect to R^K × R^L_+ if and only if

L̄(s∗, λ∗) = max_{λ∈R^L_+} inf_{s∈R^K} L̄(s, λ) = min_{s∈R^K} sup_{λ∈R^L_+} L̄(s, λ)

or, equivalently,

min_{s∈R^K} L̄(s, λ∗) = max_{λ∈R^L_+} L̄(s∗, λ).  (6.73)
Whereas a point of attraction of (6.70) achieves the value on the left-hand side, the value on the right-hand side of (6.73) is, in fact, achieved by the solution to (6.67), since

sup_{λ∈R^L_+} L̄(s, λ) = F_e(s) if f_k(s) ≤ 0, k ∈ L, and sup_{λ∈R^L_+} L̄(s, λ) = ∞ otherwise.

Clearly, if (6.67) has several local minima and a locally convergent iteration, such as (6.70), is applied, then we have only a local saddle point of the type L̄(s∗, λ∗) = min_{s∈B(s∗)} L̄(s, λ∗) = max_{λ∈B(λ∗)} L̄(s∗, λ). The global saddle point property (6.73) is satisfied basically under convexity of the problem (6.67), in which case the SOSC point is unique and (6.73) represents the classical strong duality property in Lagrangian optimization [181, 162] (App. B.4.3). For instance, in the case of the utility-based power control problems with and without QoS support (see Sects. 6.3 and 6.7, where we take (6.62) with certain ψ and optionally (6.64)), the sequence of power vectors obtained by (6.70) converges to s∗, which is the unique global solution to (6.67). If the saddle point property holds at an SOSC point (s∗, λ∗) and the step size δ is sufficiently small, then it is known [175] that the quotient convergence of the iteration (6.70) to (s∗, λ∗) is linear (see App. B.4.1). On the other hand, when none of the SOSC points (s∗, λ∗) of the problem (6.67) satisfies (6.73), power control according to (6.70) is not applicable, since then its iterates are not well defined. As an alternative primal-dual iteration to (6.70), one can consider

y(n + 1) = y(n) + δ diag(−I_K, I_L) ∇L̄(y(n)),  λ(n + 1) = max{λ(n + 1), 0}  (6.74)

for n ∈ N₀, where y = (s, λ) ∈ R^{K+L} and the maximum operation in the second equation is taken component-wise. A similar primal-dual approach is used in Sect. 6.7.1 to solve the utility-based power control problem with hard QoS support. An iteration process of the type (6.74) also appears in [184] (see also the references therein).
As far as convergence is concerned, the algorithm in (6.74) exhibits similar properties to the canonical primal-dual method (6.70): Any SOSC point (s∗, λ∗) ∈ R^K × R^L_+ which is a saddle point of L̄ with respect to R^K × R^L_+ is a point of attraction of the algorithm. Under utility-based power control from Sects. 6.3 and 5.7, where (6.62) with certain functions ψ and optionally (6.64) is taken, the sequence of power vectors generated by (6.74) converges to the unique global solution to (6.67), as in such a case we have a convex problem. In contrast to the canonical primal-dual method (6.70), the sequence of iterates generated by (6.74) remains, in general, well defined even if no SOSC
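For comparison, the concurrent update (6.74) on the same kind of toy convex problem (again invented data, not the book's algorithm): primal and dual variables each take a single gradient step per iteration, i.e. the matrix diag(−I_K, I_L) applied to ∇L̄, followed by the componentwise projection of λ.

```python
# Toy problem: minimize (s - 2)^2 subject to s - 1 <= 0,
# L(s, lam) = (s - 2)^2 + lam * (s - 1), saddle point (s*, lam*) = (1, 2).
def grad_L(s, lam):
    return 2.0 * (s - 2.0) + lam, s - 1.0    # (dL/ds, dL/dlam)

s, lam, delta = 0.0, 0.0, 0.1
for _ in range(3000):
    gs, glam = grad_L(s, lam)
    s, lam = s - delta * gs, lam + delta * glam   # concurrent step of (6.74)
    lam = max(lam, 0.0)                           # projection on the dual part
```

Each sweep costs one (K + L)-dimensional gradient evaluation rather than a full inner minimization; the price is that convergence now hinges on the saddle point property discussed above.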
6.8 Primal-Dual Algorithms
311
point has a desired saddle point property. However, in such cases, the convergence of (6.74) is unspecified, and the algorithm may diverge or converge to undesired/incidental points [184], [148]. Furthermore, from the implementation point of view, the iteration (6.74) has two important advantages over (6.70). First of all, both the primal (with respect to s ∈ R^K) and the dual (with respect to λ ∈ R^L_+) iterations in (6.74) are performed concurrently. The resulting computational effort of each such update is equal to the effort of a (K + L)-dimensional gradient projection iteration. In comparison to that, a single iteration in the canonical primal-dual method (6.70) involves an L-dimensional gradient iteration with respect to the dual variable λ ≥ 0 and a subsequent completion of a K-dimensional unconstrained optimization. Assuming that a gradient method accomplishes such an optimization in, say, N ∈ N steps, we can recognize the following property.

(P.1) The per-iteration computational effort of (6.74) is by the order (K + L)/(NK + L) = O(1/N), N → ∞, smaller than the per-iteration effort of the canonical primal-dual algorithm (6.70).

The second advantage of (6.74) over (6.70) is that the concurrent update of the primal and dual variables in (6.74) allows for the termination of the iteration by an arbitrary rule, just as in the case of standard gradient methods [181]. A similar robustness of the convergence rate to termination rules is not offered by the iteration (6.70). Due to the separate primal and dual updates and the minimization required for the primal iteration, the iteration (6.70) offers a linear convergence rate under exact/unterminated conduction of the minimization in any single update of (6.70). More generally, for the linear convergence of (6.70) it is at least required that the sequence of accuracies

ε(n) = ‖s(n + 1) − arg min_{s∈R^K} L̄(s, λ(n))‖,  n ∈ N₀  (6.75)
satisfies a specific technical summation condition (this can be shown by arguing along similar lines as, for instance, in [175]). Otherwise, the convergence of (6.70) becomes sublinear, so that, in view of practical implementation, we have the following important properties.

(P.2) The convergence of the power control algorithm (6.74) is robust to termination conditions, while the linear convergence of the iteration (6.70) is highly dependent on the sequence of accuracies (6.75), and therefore is strongly influenced by the termination of the involved optimization.

(P.3) The iteration (6.74) can be efficiently implemented in a distributed manner using the adjoint network as described in Sect. 6.7.1. The distributed scheme is similar to the power control schemes presented in the subsequent sections. See also the discussion in Sect. 6.8.4.

6.8.2 Generalized Lagrangian

Now, instead of (6.69), we associate the following nonlinear Lagrangian with the problem (6.67).
Definition 6.34. For the problem (6.67), let the Lagrangian function be

L(s, μ, c) = F_e(s) + ∑_{k∈L} ϕ(φ(μ_k) f_k(s) + c),  (s, μ, c) ∈ R^K × R^L × R  (6.76)
where the functions ϕ : R → R and φ : R → R_+ are both twice differentiable and satisfy the following conditions.

(C.6-16) ϕ′(y) > 0, y ∈ R (strictly increasing),
(C.6-17) ϕ″(y) > 0, y ∈ R (strictly convex),
(C.6-18) ϕ″(y)/(ϕ′(y))², y ∈ R, is strictly decreasing and unbounded from above,
(C.6-19) φ(y) = φ(−y), y ∈ R (even),
(C.6-20) φ(y) ≥ 0, y ∈ R (nonnegative),
(C.6-21) φ(y) = 0 if and only if φ′(y) = 0 if and only if y = 0, where φ″(0) > 0 (the unique local extremum is a minimum with value 0 at 0).

It is easily verified that (C.6-20)–(C.6-21) imply strict increasingness of φ(y) for y ≥ 0. The functions ϕ(y) = e^y and φ(y) = y²
are the two simplest examples that satisfy all the conditions (C.6-16)–(C.6-21), and they yield ϕ″(y)/(ϕ′(y))² = e^{−y}, y ∈ R. In the Lagrangian (6.76), the vector μ ∈ R^L plays the role of λ ∈ R^L in the linear Lagrangian (6.69), and thus it is referred to as the dual variable. The additional variable c ∈ R has no analog in the classical Lagrangian (6.69). The following result shows the first key property of the proposed Lagrangian (6.76).

Theorem 6.35. Let ±μ ∈ R^L denote any of the 2^L vectors such that (±μ)_k = μ_k or (±μ)_k = −μ_k, independently for k ∈ L. If (s, λ) ∈ R^K × R^L is a Kuhn–Tucker point of problem (6.67) (Definition B.51), then (s, ±μ), with ±μ depending on c ∈ R and defined by

λ_k = ϕ′(c) φ((±μ)_k),  k ∈ L  (6.77)
is a stationary point of Lagrangian (6.76) for any c ∈ R. Conversely, given any c ∈ R, if (s, ±μ) ∈ RK × RL , with s feasible, is a stationary point of Lagrangian (6.76), then (s, λ), with λ given by (6.77), is a Kuhn–Tucker point of problem (6.67). Proof (Outline). Using the complementary slackness condition (pertaining to the set of Kuhn–Tucker conditions, Definition B.50), Conditions (C.6-16)– (C.6-21) and relation (6.77) one can prove that
∇_s L(s, ±μ, c) = ∇F_e(s) + ∑_{k∈L\T(s)} φ((±μ)_k) ϕ′(φ((±μ)_k) f_k(s) + c) ∇f_k(s) + ∑_{k∈T(s)} φ((±μ)_k) ϕ′(c) ∇f_k(s) = 0

and

∂L(s, ±μ, c)/∂(±μ)_k = ϕ′(φ((±μ)_k) f_k(s) + c) φ′((±μ)_k) f_k(s) = 0,  k ∈ L
for any c ∈ R, so that the stationary point property follows. Conversely, from the condition ∇_z L(z, c) = 0 and the relation (6.77), we obtain

∇F_e(s) + ∑_{k∈L} φ((±μ)_k) ϕ′(φ((±μ)_k) f_k(s) + c) ∇f_k(s) = 0,
ϕ′(φ((±μ)_k) f_k(s) + c) φ′((±μ)_k) f_k(s) = 0,  k ∈ L
where the latter equality is (according to Conditions (C.6-16)–(C.6-18)) equivalent to complementary slackness, and the first equality can be (according to (6.77)) written as ∇_s L̄(s, λ) = 0. The condition λ ≥ 0 is a consequence of (6.77) and Conditions (C.6-16)–(C.6-21). Thus, with feasibility of s, the Kuhn–Tucker conditions hold. For the full proof, the reader is referred to [179, App. B].

By Theorem 6.35, a single Kuhn–Tucker point (s, λ) ∈ R^K × R^L_+ of the problem (6.67) corresponds to a family of 2^L unconstrained stationary points of the generalized Lagrangian (6.76), which are parameterized by c ∈ R and differ only by component signs (recall Condition (C.6-19)). Obviously, in the particular case of a convex problem, we have a unique Kuhn–Tucker point, and thus only one 2^L-tuple of associated stationary points of (6.76). Precisely, letting φ_± be the restriction of φ to the sub-domain R_± and denoting by φ_±^{−1} its inverse, the dual variables in the Lagrangian (6.76) associated via (6.77) with λ ∈ R^L_+ are

μ_k = φ_±^{−1}(λ_k/ϕ′(c)),  k ∈ L.

Note that the transform (6.77) does not influence the primal variable, so that Theorem 6.35 implies the following observation.

(P.4) Finding a power allocation s∗ ∈ R^K representing, together with some Lagrange multiplier λ∗ ∈ R^L_+, a Kuhn–Tucker point of the problem (6.67) reduces to finding any of the associated unconstrained stationary points of the Lagrangian (6.76).

Such a change of optimization objectives will later prove to offer the main benefit to power control algorithms based on the Lagrangian (6.76): A constrained primal-dual optimization iteration that finds s∗ ∈ R^K such that (s∗, λ∗) is a Kuhn–Tucker point reduces basically to an unconstrained numerical iteration finding a solution to the equation
∇_z L(z, c) = 0,  z = (s, μ) ∈ R^K × R^L
where (6.77) holds. In fact, note that a Kuhn–Tucker point, say (s, λ) ∈ R^K × R^L, is constrained since λ ∈ R^L_+, while the stationary points of (6.76) associated with (s, λ) via (6.77) are any members of R^K × R^L. The second key feature of the generalized Lagrangian (6.76) concerns the second-order characteristics and can be seen as a specialization of Property (P.4).

Theorem 6.36. If (s, λ) ∈ R^K × R^L satisfies strict complementarity (Definition B.54) and is an SOSC point of the problem (6.67) (Definition B.55), then for a stationary point z = (s, μ) of the Lagrangian (6.76), with μ depending on c ∈ R and given by (6.77), there exists some c₀ ∈ R, c₀ = c₀(z), such that

∇²_s L(z, c) ≻ 0,  c ≤ c₀.  (6.78)
Conversely, if (6.78) is satisfied at a stationary point z of Lagrangian (6.76) and s is feasible, then (s, λ), with λ given by (6.77), is an SOSC point of the problem (6.67).

Proof (Outline). For the Hessian matrix of (6.76) we have

    ∇²_s L(z, c) = ∇²F_e(s) + Σ_{k∈L} φ²(μ_k) ϕ''(φ(μ_k) f_k(s) + c) ∇f_k(s) ∇^T f_k(s)
                 + Σ_{k∈L} φ(μ_k) ϕ'(φ(μ_k) f_k(s) + c) ∇² f_k(s),   c ∈ R,

where by complementary slackness (pertaining to the Kuhn–Tucker conditions), by (6.77) and by Conditions (C.6-16)–(C.6-18) one can write

    ∇²_s L(z, c) = ∇²_s L̄(s, λ) + Σ_{k∈T(s)} φ²(μ_k) ϕ''(c) ∇f_k(s) ∇^T f_k(s)
                 = ∇²_s L̄(s, λ) + Σ_{k∈T(s)} λ²_k (ϕ''(c)/(ϕ'(c))²) ∇f_k(s) ∇^T f_k(s),   c ∈ R.   (6.79)

For T(s) ≠ ∅, the sufficiency proof then follows from strict complementarity, Condition (C.6-18) and Debreu's theorem [185], which says that there exists some ϕ₀ = ϕ₀(z) ≥ 0 such that ∇²_s L̄(s, λ) + Σ_{k∈T(s)} λ²_k (ϕ''(c)/(ϕ'(c))²) ∇f_k(s) ∇^T f_k(s) ≻ 0 whenever ϕ''(c)/(ϕ'(c))² ≥ ϕ₀ (for the case T(s) = ∅ the proof of sufficiency is immediate). Conversely, by (6.79) and Condition (C.6-18), we obtain that

    y^T ∇²_s L̄(s, λ) y + Σ_{k∈T(s)} φ²(μ_k) ϕ''(c) (y^T ∇f_k(s))² > 0,   y ≠ 0,   c ≤ c₀,

for some c₀ ∈ R, which, together with the feasibility of s and Theorem 6.35, is shown to imply the SOSC at z. The full proof can be found in [179, App. C].
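The convexification step in the proof — the rank-one terms λ²_k (ϕ''(c)/(ϕ'(c))²) ∇f_k ∇^T f_k overpowering an indefinite ∇²_s L̄ as c decreases — can be checked numerically. The sketch below is only an illustration: it uses the assumed choice ϕ(y) = e^y, for which ϕ''(c)/(ϕ'(c))² = e^{−c}, together with made-up 2×2 data (one active constraint, λ_k = 1).

```python
import math

def eigs_sym_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr - 4.0 * det)
    return (tr - disc) / 2.0, (tr + disc) / 2.0

# Indefinite Hessian of the linear Lagrangian at a hypothetical SOSC point
# (made-up data), with one active constraint gradient and lambda_k = 1.
H_bar = [[1.0, 0.0], [0.0, -1.0]]
grad_f = (0.0, 1.0)
lam = 1.0

def convexified(c):
    """H_bar + lam^2 * (varphi''(c)/varphi'(c)^2) * grad_f grad_f^T,
    for varphi(y) = exp(y), i.e. the scaling factor is exp(-c)."""
    t = lam * lam * math.exp(-c)
    return [[H_bar[0][0] + t * grad_f[0] * grad_f[0],
             H_bar[0][1] + t * grad_f[0] * grad_f[1]],
            [H_bar[1][0] + t * grad_f[1] * grad_f[0],
             H_bar[1][1] + t * grad_f[1] * grad_f[1]]]

eigs_large_c = eigs_sym_2x2(convexified(2.0))    # scaling exp(-2) too weak
eigs_small_c = eigs_sym_2x2(convexified(-2.0))   # scaling exp(2) convexifies
```

For c = 2 the smallest eigenvalue stays negative, while for c = −2 the matrix is positive definite — the threshold behavior asserted by Theorem 6.36.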
6.8 Primal-Dual Algorithms
The strict complementarity condition required in the statement should pose no problems under “reasonable” objective functions F and performance functions θ_k, k ∈ A, in (6.67). As a rule of thumb, we can say that strict complementarity is satisfied at (s, λ) if no irregularity of the objective function, such as for instance a saddle point or local extremum, coincides with some s for which some constraints in (6.67) are satisfied with equality. Thus, preventing irregularities of this type in the objective function will also avoid the technical intricacies of strict complementarity in most cases (see also the discussion in App. B.4.3).

The theorem says that the Hessian of (6.76) with respect to the power vector becomes positive definite at any stationary point associated via (6.77) with an SOSC point of the problem (6.67), provided that the parameter c ∈ R is chosen sufficiently small. Roughly speaking, Theorem 6.36 results from the feature that the function ϕ''(c)/(ϕ'(c))², c ∈ R, occurs in the considered Hessian of Lagrangian (6.76) in a form analogous to the scaling factor in the theorem of Debreu [185]. This theorem and Condition (C.6-18) directly imply the existence of some threshold c₀ ∈ R below which the Hessian ∇²_s L(z, c) is positive definite [179]. As a consequence (App. B.2), we have strict convexity of Lagrangian (6.76) as a function of s ∈ B(s*) as long as c ≤ c₀, where z* = (s*, μ*) is a stationary point of (6.76) associated with an SOSC point of problem (6.67). Thus, the mechanism described by Theorem 6.36 can informally be referred to as convexification [176, 175, 177]. On the other hand, by inspection of the Hessian of (6.76) with respect to the dual variable μ ∈ R^L, it is readily seen that this Hessian is a diagonal matrix for any z ∈ R^{K+L} and c ∈ R.
Now, letting z* = (s*, μ*) be a particular stationary point considered in Theorem 6.36, we obtain

    (∇²_μ L(z*, c))_kk = 0   for k ∈ T(s*),
    (∇²_μ L(z*, c))_kk = ϕ'(c) φ''(0) f_k(s*)   for k ∉ T(s*),

and thus

    ∇²_μ L(z*, c) ≼ 0,   c ∈ R,   (6.80)
since s* is feasible, as (s*, λ*) is, for some λ* ∈ R^L_+, an SOSC point by Theorem 6.36. This further implies that Lagrangian (6.76) is concave as a function of the dual variables on B(μ*), with z* = (s*, μ*) as a stationary point of (6.76) considered in Theorem 6.36. It is interesting to point out that, in contrast to the strict convexity property from Theorem 6.36, this concavity holds regardless of the choice of c ∈ R. A combination of the strict convexity from Theorem 6.36 with the above concavity around a stationary point of Lagrangian (6.76) leads directly to the following saddle point property of such a point.

Corollary 6.37. Let (s*, λ*) ∈ R^K × R^L_+ satisfy strict complementarity. If (s*, λ*) is an SOSC point of the problem (6.67), then z* = (s*, μ*) is a stationary point of Lagrangian (6.76), with μ depending on c ∈ R and given by (6.77), such that there exists some c₀ ∈ R, c₀ = c₀(z*), satisfying
    L(s*, μ, c) ≤ L(z*, c) < L(s, μ*, c),   (s, μ) ∈ B(s*) × B(μ*),   c ≤ c₀.   (6.81)
The corollary follows straightforwardly by (6.80), (6.78) and Definition B.58 with the remarks thereafter.¹⁵ In other words, the corollary states that an SOSC point of the problem (6.67) corresponds via (6.77) to saddle points of Lagrangian (6.76), isolated over B(s*), whenever c ∈ R is chosen sufficiently small. Thus, as the SOSC at (s*, λ*) are sufficient conditions for a local solution (6.68), we can conclude the following property.

(P.5) The problem of finding a local solution to the problem (6.67) reduces to finding an associated saddle point of the type (6.81) of Lagrangian (6.76) under an appropriately small choice of the parameter c ∈ R.

By Theorem 6.35 and Corollary 6.37, a single local minimizer (6.68) is associated precisely with a 2^L-tuple of saddle points of the type (6.81). Since the saddle points in such a tuple differ merely in the signs of the components of the dual variables, it is irrelevant which of them is a point of attraction of a power control iteration applied to the problem (6.67).

Corollary 6.37 will later prove to be the essential ingredient of the proposed power control iterations. The key advantage implied hereby is that the saddle point property (6.81) holds for arbitrary (twice differentiable) objective and performance functions F and θ_k, k ∈ A, provided that c ∈ R is chosen appropriately. Together with Property (P.4), one can see that a constrained optimization iteration which finds a locally optimal power allocation (6.68) transforms to an unconstrained iteration which is attracted by a stationary point z* ∈ R^{K+L} being an isolated saddle point of the type (6.81). Indeed, an SOSC point itself does not correspond, in general, to a saddle point of the classical Lagrangian (6.69) unless, for instance, the problem is convex (see the discussion in App. B.4.3). Furthermore, an SOSC point, say (s*, λ*) ∈ R^K × R^L, is subject to the constraint λ* ∈ R^L_+, while the saddle points (6.81) are unrestricted in R^K × R^L (Property (P.4)).
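The sign ambiguity behind the 2^L-tuple can be made concrete in a short sketch. Assuming, purely for illustration, φ(y) = y² (so that φ_±^{-1}(x) = ±√x) and ϕ(y) = e^y — hypothetical choices, not the functions the text prescribes — the dual vectors associated with one Kuhn–Tucker multiplier λ via μ_k = φ_±^{-1}(λ_k/ϕ'(c)) are:

```python
import math
from itertools import product

def associated_dual_points(lam, c):
    """All 2^L dual vectors mu associated with a Kuhn-Tucker multiplier
    lam via mu_k = phi_{+/-}^{-1}(lam_k / varphi'(c)), assuming the
    illustrative phi(y) = y^2 and varphi(y) = exp(y)."""
    radii = [math.sqrt(l / math.exp(c)) for l in lam]   # |mu_k|
    # component signs are free, giving the 2^L-tuple of stationary points
    return [[sgn * r for sgn, r in zip(signs, radii)]
            for signs in product((1.0, -1.0), repeat=len(lam))]

pts = associated_dual_points([4.0, 1.0], c=0.0)   # 2^2 = 4 sign variants
```

All four points share the same component magnitudes and the same primal variable, which is why it does not matter which of them an iteration is attracted to.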
In order to emphasize the role of the saddle point property and the resulting Property (P.5), we put it into the context of Lagrangian duality. By Theorem B.59, (6.81) implies

    max_{μ∈B(μ*)} inf_{s∈B(s*)} L(s, μ, c) = min_{s∈B(s*)} sup_{μ∈B(μ*)} L(s, μ, c),   c ≤ c₀,   (6.82)
for any local saddle point (s*, μ*) ∈ R^K × R^L from Corollary 6.37. In analogy to the well-known strong duality notion in the linear Lagrangian case (App. B.4.3), the global version of equality (6.82) (with B(s*) = R^K and B(μ*) = R^L) can be seen as a strong duality property of the generalized Lagrangian (6.76). Since Corollary 6.37 holds for general functions incorporated in the Lagrangian (6.76), the equality (6.82) is ensured for any choice of (twice differentiable) objective functions F and performance functions θ_k, k ∈ A. At the same time, considering the linear Lagrangian (6.69), the SOSC point (s*, λ*) ∈ R^K × R^L corresponding to (s*, μ*) via (6.77) satisfies, under general F and θ_k, k ∈ A, only the fundamental inequality [186]

    max_{λ∈B(λ*)} inf_{s∈B(s*)} L̄(s, λ) ≤ min_{s∈B(s*)} sup_{λ∈B(λ*)} L̄(s, λ).   (6.83)

For B(s*) = R^K and B(λ*) = R^L_+, inequality (6.83) is equivalent to the classical weak Lagrangian duality [181], or equivalently, to the nonzero duality gap in general [182] (App. B.4.3).

Let some primal-dual iteration which locally solves the problem on the left-hand side of (6.82) be given. Recall that we could apply here, for instance, the canonical primal-dual method (6.70), as it is known from Sect. 6.8.1 to solve the dual problem to the problem (6.67). Then, according to (6.82) and the property

    sup_{μ∈R^L} L(s, μ, c) = F_e(s) + L·ϕ(c)   if f_k(s) ≤ 0, k ∈ L,
    sup_{μ∈R^L} L(s, μ, c) = ∞                 otherwise,

such an iteration finds a local solution to the problem (6.67) regardless of the functions F and θ_k, k ∈ A, provided only a sufficiently small choice of c ∈ R. In simple words, when using the generalized Lagrangian (6.76), which enforces the saddle point property (6.82), an applied primal-dual iteration is amenable to solving the general problem form locally. On the other hand, in the conventional Lagrangian case, a corresponding iteration solving the problem on the left-hand side of (6.83), if well defined, does not find a desired point in general; there is a gap between the objective value assumed at such a point and the locally optimal value.¹⁶ Summarizing, the effect of the incorporated convexification mechanism of Lagrangian (6.76) (Theorem 6.36) is the elimination of the gap between the achieved values of the original and the dual problem for general functions incorporated in the problem (6.67). On the other hand, we already know from Sect. 6.8.1 that under the linear Lagrangian (6.69) this gap is closed and (6.83) changes to (6.73) basically only under convexity of the problem (6.67).¹⁷ For more discussion on the duality relations of Lagrangians (6.69) and (6.76) we refer to [179].

¹⁵ Note here that (6.80) is never true for a stationary point z* of (6.76) such that s* is not feasible.
¹⁶ We prefer not to use the notion of duality gap here to describe the difference between the left- and right-hand side in the local duality relation (6.83). Complying with the convention, this notion is reserved for the global version of the duality relation, where B(s*) = R^K, B(λ*) = R^L_+, as discussed in App. B.4.3.
¹⁷ Reference [177] claims that nonconvex problems for which a zero duality gap is retained can be classified as “freakish”.
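The contrast between (6.82) and (6.83) can be observed numerically on a small made-up instance. The sketch below uses the illustrative choices φ(y) = y², ϕ(y) = e^y in a (6.76)-style Lagrangian for a one-dimensional nonconvex problem: the classical dual function is unbounded below, the generalized one is finite, and the min-sup side reproduces the F_e(s) + L·ϕ(c) offset quoted above.

```python
import math

# Illustrative nonconvex problem (not from the book):
#   minimize F(x) = -x^2  subject to  f1(x) = x - 1 <= 0,  f2(x) = -x - 1 <= 0,
# with optimal value -1 attained at x = +/-1 (two constraints, L = 2).
F = lambda x: -x * x
f = (lambda x: x - 1.0, lambda x: -x - 1.0)

def L_bar(x, lam):
    """Classical linear Lagrangian: unbounded below in x for every lam,
    since -x^2 dominates any linear term (nonzero duality gap)."""
    return F(x) + sum(l * fk(x) for l, fk in zip(lam, f))

def L_gen(x, mu, c):
    """Generalized Lagrangian of the type (6.76) with the illustrative
    choices phi(y) = y^2 and varphi(y) = exp(y)."""
    return F(x) + sum(math.exp(m * m * fk(x) + c) for m, fk in zip(mu, f))

c = -5.0
xs = [i / 100.0 for i in range(-300, 301)]

# Generalized dual function D(mu, c) = inf_x L is finite for mu != 0:
D_mu = min(L_gen(x, (2.0, 2.0), c) for x in xs)

# The sup-over-mu formula quoted in the text:
#   sup_mu L(x, mu, c) = F(x) + L*varphi(c) if x is feasible, infinity otherwise.
def sup_mu(x):
    return F(x) + 2.0 * math.exp(c) if all(fk(x) <= 0.0 for fk in f) else math.inf

min_sup = min(sup_mu(x) for x in xs)   # -> optimal value -1 as c -> -infinity
```

Here min_sup equals −1 + 2e^{−5}, i.e. the locally optimal value up to the vanishing L·ϕ(c) offset, while the classical dual value is −∞.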
Some Notes on Alternative Concepts

Lagrangian (6.76) is constructed especially for the purposes of power control in wireless networks. In a more general context it can be seen as a particular concept from the framework of generalized Lagrangian optimization. It seems to us that the most significant contributions to this framework are provided in [174, 176, 175, 177] and references therein. In particular, (6.76) shares most of its properties with the Lagrangian function from [176], which, if formulated for the problem (6.67), takes the form

    L̂(s, ν, d) = F_e(s) + Σ_{k∈L} ( ϕ(f_k(s)d + ν_k) H(f_k(s)d + ν_k) − ϕ(ν_k) ),   (s, ν, d) ∈ R^K × R^L × R_{++},   (6.84)

where the function ϕ : R → R satisfies certain axioms and H : R → {0, 1} denotes the Heaviside step function, so that H(x) = 0 if x < 0 and H(x) = 1 otherwise. Another Lagrangian proposed in [177] is actually an instance of (6.84) with ϕ(y) = |y|²/(2d). Due to such similarities, we could base the power control iterations proposed in the next section on Lagrangian (6.84) instead of (6.76). Lagrangian (6.84) is technically slightly more involved than (6.76), and the class of potential functions ϕ in (6.84) is substantially different from the function class specified by Conditions (C.6-16)–(C.6-21). The complexity of (6.84) is, however, balanced out by the advantage that it associates all of its stationary points with Kuhn–Tucker points (by a relation of the type (6.77), see [176]). Recall from Theorem 6.35 that such a relation to Kuhn–Tucker points of problem (6.67) is provided by Lagrangian (6.76) merely for those stationary points with feasible power vectors. Evidently, it is common to all power allocation algorithms in real-world wireless networks that they are started at a feasible power allocation. For this reason, they are most likely to converge to points associated with feasible power vectors as well (if such exist) whenever the step size and related parameters are chosen reasonably.

Thus, the above difference in the relation to Kuhn–Tucker points should play a minor role in the practical convergence behavior of power control algorithms relying on (6.76) and (6.84). For completeness, we mention here that although there are many known generalized Lagrangians having a simpler structure than (6.84) and (6.76) [175], most of them are applicable to problems with equality constraints only. This might be a significant restriction in power control considerations. Finally, it has to be remarked that the strict decreasingness condition (C.6-18), required for the Lagrangian (6.76), can be loosened to strict monotonicity and unboundedness of the map ϕ''(y)/(ϕ'(y))², y ∈ R. The properties of the Lagrangian expressed in Theorems 6.35, 6.36, Corollary 6.37, and Properties (P.4), (P.5) are then retained, with the generalization that the values of c ∈ R in Theorem 6.36 and Corollary 6.37 have to be chosen either appropriately large or appropriately small, depending on the type of monotonicity of ϕ''(y)/(ϕ'(y))² [179].
6.8.3 Primal-Dual Algorithms

This section describes two power control algorithms operating on the proposed Lagrangian (6.76) which, in terms of structure, parallel the conventional primal-dual iterations from Sect. 6.8.1. A comparison of the features of both iteration types shows precisely how the properties of the Lagrangian (6.76) translate to iteration advantages. First consider the following iteration

    s(n+1) = arg min_{s∈R^K} L(s, μ(n), c)
    μ(n+1) = μ(n) + δ ∇_μ L(s(n+1), μ(n), c),   n ∈ N₀,   (6.85)

with a sufficiently small step size δ > 0, which is an analog to the canonical primal-dual method (6.70). Similarly to (6.70), the minimization in the primal iteration will in practice be accomplished by some locally convergent method. If the start point of such a minimization in the primal step is taken as the last power vector iterate, that is, s(n) in (6.85), then any such method finds

    s(n+1) = arg min_{s∈B(s(n))} L(s, μ(n), c),

which is, in general, not the global minimizer required in (6.85). It is likely that particular gradient-based methods, such as the gradient minimization method

    s(m+1) = s(m) − t ∇_s L(s(m), μ(n), c),   m ∈ N₀,   t > 0,

are applied to yield the primal update s(n+1). Obviously, in the special case of a convex problem, this method finds the required global solution in (6.85). The function

    D : R^L × R → R,   D(μ, c) = min_{s∈R^K} L(s, μ, c),

can now be seen as a generalization of the dual function (6.72), and the maximization of this function among all μ ∈ R^L as the corresponding (generalized) dual problem. By Theorem 6.36, for any choice of c ∈ R below some c₀ = c₀(z*), every stationary point z* ∈ R^{K+L} of Lagrangian (6.76) is an SOSC point of the problem (6.67). As a result, the sequence of power vector iterates generated by (6.85) is well defined as long as it remains within some neighborhood of such a stationary point and c ∈ R is sufficiently small.¹⁸ At the same time, Corollary 6.37 implies that z* becomes an isolated saddle point of Lagrangian (6.76) of the type (6.81). This property is the key ingredient to ensure the following convergence behavior.
¹⁸ In many cases, such a neighborhood can be enlarged by further decreasing c ∈ R, which, however, may entail numerical problems.
Theorem 6.38. Let s* ∈ R^K be a local minimizer (6.68) such that the strict complementarity condition and the constraint qualification from Lemma B.52 are satisfied at z* = (s*, μ*) for some μ* ∈ R^L. Then, z* is a point of attraction of the iteration (6.85) for a sufficiently small step size δ > 0 and for c ≤ c₀, with some c₀ = c₀(z*) ∈ R. The convergence of (6.85) to z* is linear in quotients (App. B.4.1).

The proof of the theorem utilizes standard techniques related to Ostrowski's Theorem [148, Sect. 10.1]: It goes along precisely the same lines as the convergence proof of Algorithm 4.9 in [176] (see [176, Theorem 4.10]) and the proof of the penalty method from [175] (see [175, Proposition 2, Corollary 2.1]).
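A minimal sketch of iteration (6.85) is given below, again with the assumed illustrative choices φ(y) = y² and ϕ(y) = e^y (not the text's prescribed functions), on a made-up one-dimensional convex instance so that the inner global minimization is tractable by bisection.

```python
import math

# Toy convex instance (made up for illustration):
#   minimize (x - 2)^2  subject to  x - 1 <= 0,
# solution x* = 1 with multiplier lambda* = 2.  With the assumed choices
# phi(y) = y^2, varphi(y) = exp(y), the generalized Lagrangian reads
#   L(x, mu, c) = (x - 2)^2 + exp(mu^2 * (x - 1) + c).

def d_L_dx(x, mu, c):
    return 2.0 * (x - 2.0) + mu * mu * math.exp(mu * mu * (x - 1.0) + c)

def d_L_dmu(x, mu, c):
    return 2.0 * mu * (x - 1.0) * math.exp(mu * mu * (x - 1.0) + c)

def inner_min(mu, c):
    """Primal step of (6.85): here L is convex in x, so the stationarity
    equation d_L_dx = 0 is solved by bisection (d_L_dx is increasing)."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if d_L_dx(mid, mu, c) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def iterate_685(mu0=1.0, c=0.0, delta=0.3, steps=100):
    """Sketch of (6.85): exact primal minimization, then dual ascent."""
    mu, x = mu0, 0.0
    for _ in range(steps):
        x = inner_min(mu, c)
        mu = mu + delta * d_L_dmu(x, mu, c)
    return x, mu

x_685, mu_685 = iterate_685()
```

The iterate pair approaches x = 1 and μ² = λ*/ϕ'(0) = 2, i.e. exactly the dual point predicted by the transform (6.77) (with either sign of μ admissible).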
It is therefore apparent that (P.6) while the applicability of primal-dual power control according to (6.70) and (6.74) is basically restricted to convex forms of the problem (6.67), the power control iteration (6.85) is successful in finding locally optimal power allocations (6.68) under general objective functions F and general performance functions θk , k ∈ A, in (6.61). Iteration (6.85) is evidently an unconstrained iteration which does not incorporate mechanisms such as mappings or projections of the dual variable iterates. In fact, by Theorem 6.35, any desired point of attraction of iteration (6.86) is an unconstrained stationary point of Lagrangian (6.76) belonging to RK × RL . On the other hand, the primal-dual methods (6.70) and (6.74) are designed to converge to SOSC points, where the dual variable is constrained to nonnegative vectors and thus the projection on the nonnegative orthant is inevitable. Such a projection may deteriorate the step-by-step descent performance of the algorithm in the sense that the observed descent of the objective
function can slow down after the first application of the projection.¹⁹ Thus, we can conclude that

(P.7) while the primal-dual methods (6.70) and (6.74) require a projection of the dual variable, the algorithm (6.85) is unconstrained.

Now, let us apply an analog of the primal-dual iteration (6.74) to the Lagrangian (6.76). For a given sufficiently small step size δ > 0, the algorithm then takes the form

    z(n+1) = z(n) + δ diag(−I_K, I_L) ∇_z L(z(n), c),   n ∈ N₀,   (6.86)

where z = (s, μ).

Theorem 6.39. Let s* ∈ R^K be a local minimizer of the problem (6.67) such that the strict complementarity condition and the constraint qualification from Lemma B.52 are satisfied at z* = (s*, μ*) for some μ* ∈ R^L. Then, for any step size δ such that

    0 < δ < 2 min_{1≤k≤K+L} |Re{λ_k(diag(−I_K, I_L) ∇²_z L(z*, c))}| / |λ_k(diag(−I_K, I_L) ∇²_z L(z*, c))|²,   (6.87)

where λ_k(·) denotes the k-th eigenvalue, z* is a point of attraction of iteration (6.86) for c ≤ c₀, with some c₀ = c₀(z*) ∈ R. Furthermore, the convergence to z* is linear in quotients (App. B.4.1).

Proof (Outline). Any point z* such that s* is a local minimizer of (6.67) is a stationary point of (6.76) (Theorem 6.35) and an equilibrium point of the map

    z ↦ G(z) = z + δ diag(−I_K, I_L) ∇_z L(z, c),   (z, c) ∈ R^{K+L} × R.

By Ostrowski's Theorem [148], z* is then a point of attraction of the iteration (6.86) if max_{1≤k≤K+L} |λ_k(∇G(z*))| < 1 or, equivalently, if

    max_{1≤k≤K+L} Re{λ_k(diag(−I_K, I_L) ∇²_z L(z*, c))} < 0,   (6.88)

where

    Re{λ_k(diag(−I_K, I_L) ∇²_z L(z*, c))} = −Re{v_k^T ∇²_s L(z*, c) v_k} + Re{w_k^T ∇²_μ L(z*, c) w_k},   (6.89)

with u_k = (v_k, w_k) ∈ R^{K+L} as the k-th eigenvector associated with λ_k. While the nonpositivity of the left-hand side in (6.88) is obvious for c ≤ c₀ and some c₀ ∈ R (Corollary 6.37), the negativity follows from Lemma B.52 and the fact

¹⁹ The step-by-step descent performance should not, however, be confused with the notion of convergence rate, measured in roots or quotients, see App. B.4.1.
that constraint qualification is satisfied. Indeed, if the left-hand side of (6.88) were zero, then, by (6.89), we would have

    −∇²_s L(z*, c) v_k − ∇²_{s,μ} L(z*, c) w_k = 0,
    (∇²_{s,μ} L(z*, c))^T v_k + ∇²_μ L(z*, c) w_k = 0,   c ≤ c₀,

which yields further (by ∇²_s L(z*, c) ≻ 0 and Corollary 6.37) that v_k = 0. This implies that Σ_{i∈T(s*)} ϕ'(c) φ'(μ*_i) (w_k)_i ∇f_i(s*) = 0 with ϕ'(c) φ'(μ*_j) (w_k)_j > 0 for some j ∈ T(s*), which contradicts the condition from Lemma B.52. We refer to [179, App. D] for the detailed proof.

In other words, the iteration (6.86) shows basically the same convergence features as iteration (6.85): given a sufficiently small step size (6.87) and c ∈ R not exceeding some threshold, the iterate sequence is attracted by the saddle points of (6.76) associated with local solutions to problem (6.67), irrespective of the considered objective function and performance functions. Similarly to (6.85), iteration (6.86) is unconstrained, so the analogs of both advantages (P.6) and (P.7) over the primal-dual methods (6.70) and (6.74) remain true for (6.86). Moreover, iteration (6.86) benefits from the joint primal-dual update structure. This is the analog of advantage (P.1) obtained by iteration (6.74); it means that the iteration is superior to both the canonical primal-dual method (6.70) and the iteration (6.85) in terms of computational effort per update. As a further consequence of the concurrent primal-dual update, iteration (6.86) also shares advantage (P.2) with iteration (6.74): in contrast to the separate-update iterations (6.70), (6.85), the linear convergence of (6.86) is robust to termination rules.

Remark 6.40. It has to be remarked that the existence of stationary points of the proposed Lagrangian (6.76) associated with infeasible power vectors cannot be excluded in general. Thus, for some objectives F or some performance functions θ_k, k ∈ A, in (6.61), the proposed iterations (6.85) and (6.86) may have infeasible, and thus undesired, points of attraction in addition to the desired saddle points (6.81). As already discussed at the end of Sect. 6.8.2, this is expected to pose no problems in practice for “reasonable” objective functions and link performance functions.
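For comparison, the joint update (6.86) can be sketched on the same made-up toy instance used for (6.85) (one power variable, one constraint; φ(y) = y², ϕ(y) = e^y assumed, so diag(−I_K, I_L) reduces to diag(−1, +1)):

```python
import math

# Joint primal-dual update (6.86) on a made-up toy instance:
#   minimize (x - 2)^2 s.t. x - 1 <= 0,
#   L(x, mu, c) = (x - 2)^2 + exp(mu^2 * (x - 1) + c),
#   z <- z + delta * diag(-1, +1) * grad L(z).

def grad_L(x, mu, c):
    g = math.exp(mu * mu * (x - 1.0) + c)
    return 2.0 * (x - 2.0) + mu * mu * g, 2.0 * mu * (x - 1.0) * g

def iterate_686(x=0.0, mu=1.0, c=0.0, delta=0.05, steps=3000):
    for _ in range(steps):
        dx, dmu = grad_L(x, mu, c)
        x, mu = x - delta * dx, mu + delta * dmu   # concurrent update
    return x, mu

x_686, mu_686 = iterate_686()
```

Unlike (6.85), no inner minimization loop is needed: both coordinates are updated concurrently from one gradient evaluation, which is the per-update cost advantage referred to above; the iterates are attracted by the saddle point x = 1, μ² = 2.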
Alternatively, one can apply the proposed primal-dual iterations (6.85), (6.86) to the Lagrangian (6.84) to prevent infeasible points of attraction completely [176].

6.8.4 Decentralized Implementation

It is readily seen that the primal-dual methods (6.70), (6.74) correspond to the iterations (6.85), (6.86), respectively, under the particular settings c = 0, φ(y) = y, ϕ(y) = y, y ∈ R, where additionally the nonnegative projection of the dual variable iterates is applied. Except for the projection, this makes evident that the proposed iterations (6.85), (6.86) are generalizations of the primal-dual methods (6.70), (6.74), as the proposed Lagrangian (6.76) generalizes the
linear Lagrangian (6.69). Furthermore, it is readily seen that iteration (6.85) which uses the particular gradient method (6.71) in the primal update resorts to exactly the same implementation procedures as does iteration (6.86). In this light, and due to the fact that the gradient-based primal update (6.71) is likely used in practice, we can focus here on the decentralized feedback scheme realizing iteration (6.86). We confine our attention to the problem instance with the (negative) aggregate utility function of the type (6.62).

The handshake scheme realizing (6.86) relies, again, on the concept of the adjoint network, as in the case of the primal gradient method (Sect. 6.5), but extends the approach from Sect. 6.6 by accounting for the subclass of constrained links A. We consider A precisely as either the subclass of links requiring hard QoS support (performance function (6.64)) or the subclass of links with critically located receivers restricted in the received power (performance function (6.66)). Analogous to Sect. 6.6, it is justified to assume some elementary parameter knowledge on each node. Precisely, we assume the functions ψ, φ, ϕ and an appropriate c ∈ R (Theorem 6.39) to be set in advance at all link transmitters and receivers. A constraint p̂_k is known to the corresponding link transmitter k ∈ K, and any constraint θ̂_k, on SIR or interference power, is adjusted to the traffic class or to the restrictions on the power spectrum at the receiver of link k ∈ A. A link weight w_k can be either fixed on both sides of the link according to the priority of link k ∈ K, or adjusted to the corresponding traffic class. Obviously, any receiver k ∈ K has estimates of its own power gain and the received power from the associated transmitter, and thus can estimate/compute the transmit power e^{s_k}, SIR_k(e^s) and the received sum-power (6.66), which is equal to e^{s_k}(1 + 1/SIR_k(e^s)).
Main Ingredients

Given (6.62), we can write the generalized Lagrangian (6.76) as

    L(s, ν, η, c) = Σ_{k∈K} w_k ψ(SIR_k(e^s)) + Σ_{k∈A} ϕ(φ(η_k)(θ_k(e^s) − θ̂_k) + c) + Σ_{k∈K} ϕ(φ(ν_k)(e^{s_k} − p̂_k) + c),   (6.90)

with (s, ν, η, c) ∈ R^K × R^K × R^{|A|} × R. Let us assume that either (6.64) or (6.66) is true. Then, elementary calculations show that a gradient component of the first term in the Lagrangian (6.90) for the n-th iterate (s(n), ν(n), η(n)) ∈ R^{2K+|A|} can be written, for any link k ∈ K, as

    ∂/∂s_k Σ_{j∈K} w_j ψ(SIR_j(e^{s(n)})) = m_k(n)(1/SIR_k(e^{s(n)}) − 1) + e^{s_k(n)} ( m_k(n) + Σ_{j∈K_k} v_{jk} m_j(n) ),   (6.91)

where the bracketed term is referred to as (6.91)-(a), and with

    m_k(n) = −w_k ψ'(SIR_k(e^{s(n)})) SIR²_k(e^{s(n)}) e^{−s_k(n)},   k ∈ K,   (6.92)

where we have m_k(n) ≥ 0, k ∈ K, due to the strict decreasingness of ψ. It is obvious that the term m_k(n)(1/SIR_k(e^{s(n)}) − 1) can be made available to the kth logical transmitter by receiver-side estimation of the corresponding SIR and a reliable low-rate feedback channel (see Sect. 6.6.1). Furthermore, the adjoint network (Sects. 6.6.2, 6.6.3) can be used to estimate the term (a) in (6.91) at the kth logical transmitter (in the primal network), provided that the allocated transmit powers in the adjoint network are given by (6.92). For a gradient component of the second term in (6.90) with respect to the transmit powers we obtain, with k ∈ K,

    ∂/∂s_k Σ_{j∈A} ϕ(φ(η_j(n))(θ_j(e^{s(n)}) − θ̂_j) + c) = e^{s_k(n)} Σ_k(n) + e^{s_k(n)} ( r_k(n) + Σ_{j∈A, j≠k} v_{jk} r_j(n) ),   (6.93)

where the bracketed term is referred to as (6.93)-(b), and where in the case of constraints on link SIR (6.64) we have

    r_k(n) = ϕ'(φ(η_k(n))(θ_k(e^{s(n)}) − θ̂_k) + c) φ(η_k(n)) θ²_k(e^{s(n)}) / e^{s_k(n)}   for k ∈ A,   r_k(n) = 0   for k ∉ A,   (6.94)

and

    Σ_k(n) = r_k(n)(1/θ_k(e^{s(n)}) − 1),   k ∈ K,   (6.95)

while under constraints on received power (6.66) we get Σ_k(n) = 0, k ∈ K, and

    r_k(n) = ϕ'(φ(η_k(n))(θ_k(e^{s(n)}) − θ̂_k) + c) φ(η_k(n))   for k ∈ A,   r_k(n) = 0   for k ∉ A.   (6.96)

In either case, we have r_k(n) ≥ 0, k ∈ K, which follows from Conditions (C.6-16)–(C.6-21). Again, the value of (6.95) can be approximately computed from the corresponding SIR estimate. An estimate of (b) in (6.93) at the kth transmitter can be obtained using, again, the scheme based on the adjoint
network described in Sects. 6.6.2, 6.6.3. The only difference is that now the transmit powers in the adjoint network are equal either to (6.94) or to (6.96), depending on the performance functions θ_k, k ∈ A, under consideration. Note that the computation of (6.94), (6.95) and (6.96) in the above way requires the knowledge of η_k(n) at the link receiver k ∈ K. A gradient component with respect to the dual variable,

    ∂/∂η_k Σ_{j∈A} ϕ(φ(η_j(n))(θ_j(e^{s(n)}) − θ̂_j) + c) = ϕ'(φ(η_k(n))(θ_k(e^{s(n)}) − θ̂_k) + c) φ'(η_k(n)) (θ_k(e^{s(n)}) − θ̂_k),   (6.97)

where either (6.64) or (6.66) holds, is known at the corresponding link receiver k ∈ A if the iterate η_k(n) is known at this receiver. As for the third term in (6.90), a gradient component with respect to the link power,

    ∂/∂s_k Σ_{j∈K} ϕ(φ(ν_j(n))(e^{s_j(n)} − p̂_j) + c) = ϕ'(φ(ν_k(n))(e^{s_k(n)} − p̂_k) + c) φ(ν_k(n)) e^{s_k(n)},   (6.98)

and a gradient component with respect to the dual variable,

    ∂/∂ν_k Σ_{j∈K} ϕ(φ(ν_j(n))(e^{s_j(n)} − p̂_j) + c) = ϕ'(φ(ν_k(n))(e^{s_k(n)} − p̂_k) + c) φ'(ν_k(n)) (e^{s_k(n)} − p̂_k),   (6.99)

are both available to the corresponding link transmitter k ∈ K, provided the transmitter's knowledge of ν_k(n).

Distributed Handshake Protocol

The above discussion leads to the following implementation scheme for the power allocation iteration. Recall from Remark 6.14 that the availability of a dedicated low-rate control link for each transmitter-receiver pair is assumed.

Remark 6.41. Note that if the estimates needed in steps 2, 6, and 8 of Algorithm 6.3 are not accurate enough to treat them as deterministic variables, then the algorithm can be studied within the standard framework of stochastic approximation [143]. In fact, the results presented in Sect. 6.6.5 for the gradient projection algorithm can be straightforwardly extended to the case of noisy measurements in Algorithm 6.3: Using similar techniques as in Sect. 6.6.5, one can show that the sequence {s(n)}_{n∈N₀} generated by Algorithm 6.3 under an appropriately chosen sequence of step sizes {δ(n)}_{n∈N₀} converges to a local solution to the problem (6.61) weakly or almost surely, depending on the conditions posed on the estimation noise.
Algorithm 6.3 Distributed implementation of Algorithm (6.86)
Input: w > 0, ψ, ϕ, φ, n = 0, s(0) ∈ R^K, a sufficiently small step size δ > 0.
Output: s ∈ S
1: repeat
2:   Concurrent transmission of links k ∈ K using transmit power vector e^{s(n)}.
3:   Receiver-side estimation of transmit power e^{s_k(n)}, SIR_k(e^{s(n)}) and received power e^{s_k(n)}(1 + 1/SIR_k(e^{s(n)})) on any link k ∈ K.
4:   Receiver-side computation of component (6.97) on any link k ∈ A and transmitter-side computation of component (6.99) on any link k ∈ K.
5:   Per-link feedback of SIR_k(e^{s(n)}) on any link k ∈ K and, under link SIR constraints, per-link feedback of component (6.95), given (6.94), on any link k ∈ A.
6:   Concurrent transmission of the adjoint network using transmit powers (6.92).
7:   Transmitter-side estimation of the received power (6.91)-(a) and transmitter-side computation of component (6.91) on any link k ∈ K.
8:   Concurrent transmission of the adjoint network using either transmit powers (6.94) under link SIR constraints, or transmit powers (6.96) under received power constraints.
9:   Transmitter-side estimation of the received power (6.93)-(b) and transmitter-side computation of component (6.93) on any link k ∈ A.
10:  Transmitter-side computation of ∂/∂s_k L(s(n), ν(n), η(n), c) and transmitter-side update
        s_k(n+1) = s_k(n) − δ ∂/∂s_k L(s(n), ν(n), η(n), c)
        ν_k(n+1) = ν_k(n) + δ ∂/∂ν_k L(s(n), ν(n), η(n), c)
     on any link k ∈ K, and receiver-side update
        η_k(n+1) = η_k(n) + δ ∂/∂η_k L(s(n), ν(n), η(n), c)
     on any link k ∈ A.
11: n → n + 1.
12: until some termination condition is satisfied.
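The per-link gradient components used in step 10 can be sanity-checked against finite differences. The sketch below does this for the components (6.98) and (6.99) of the third term of (6.90), under the assumed illustrative choices φ(y) = y², ϕ(y) = e^y and made-up constraint data (these are not the text's prescribed functions or parameters).

```python
import math

# Central-difference check of components (6.98) and (6.99), using the
# assumed choices phi(y) = y^2, varphi(y) = exp(y) and made-up data.
phat = [1.5, 0.8]          # hypothetical power constraints p_hat_k
c = -1.0
s = [0.2, -0.3]            # current log-power iterate
nu = [0.7, -0.4]           # current dual iterate

def third_term(s_, nu_):
    """Third term of (6.90): sum_k varphi(phi(nu_k)(e^{s_k} - p_hat_k) + c)."""
    return sum(math.exp(n * n * (math.exp(sk) - pk) + c)
               for sk, n, pk in zip(s_, nu_, phat))

def comp_698(k):           # analytic (6.98)
    u = nu[k] * nu[k] * (math.exp(s[k]) - phat[k]) + c
    return math.exp(u) * nu[k] * nu[k] * math.exp(s[k])

def comp_699(k):           # analytic (6.99), with phi'(nu) = 2*nu
    u = nu[k] * nu[k] * (math.exp(s[k]) - phat[k]) + c
    return math.exp(u) * 2.0 * nu[k] * (math.exp(s[k]) - phat[k])

def num_d(in_s, k, h=1e-6):
    """Central difference w.r.t. s_k (in_s=True) or nu_k (in_s=False)."""
    hi_s, lo_s, hi_n, lo_n = s[:], s[:], nu[:], nu[:]
    if in_s:
        hi_s[k] += h; lo_s[k] -= h
    else:
        hi_n[k] += h; lo_n[k] -= h
    return (third_term(hi_s, hi_n) - third_term(lo_s, lo_n)) / (2.0 * h)
```

Because the third term is separable per link, each component depends only on quantities locally available at transmitter k, which is what makes the transmitter-side updates of step 10 possible without extra signaling.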
6.8.5 Min-max Optimization Framework

In this section, we propose a specific splitting of the optimization variables in the original utility-based power control problem (6.17). This variable splitting is combined with a modified Lagrangian function, related to the framework presented in the preceding section, in order to obtain an algorithm with improved convergence properties in comparison to the iteration (6.86). The resulting power control algorithm is shown to be amenable to distributed implementation by means of a feedback scheme similar to that of Sect. 6.6.2. As in the preceding sections, we give only outlines of the proofs in this section and refer the interested reader to [84].
A Class of Nonlinear Interference Functions

Similarly to Sect. 5.5.2 (see (5.72)), let us consider a generalized definition of the SIR function of the form

    SIR_k(p) = p_k / I_k(p),   p ∈ R^K_+,  k ∈ K   (6.100)

where I_k : R^K_+ → R_++ is a given interference function. In Sect. 5.5.2, we presented an axiomatic model that embraces a variety of interference scenarios. The axiomatic model, however, is very general, so that further restrictions on interference functions are usually made. In this section, we confine our attention to interference functions satisfying the following condition.

(C.6-22) I_k : R^K_+ → R_++ is a twice differentiable function such that

    ∂²I_k(e^s)/∂s_k∂s_j = (∇²I_k(e^s))_{kj} = 0,   s ∈ S,  j, k ∈ K,  j ≠ k.   (6.101)
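As a quick illustration, Condition (C.6-22) can be spot-checked numerically for the affine special case (5.73): the sketch below (with illustrative, assumed values for the coupling matrix V and the noise powers) estimates the mixed second derivatives of s ↦ I_k(e^s) by central differences and confirms that the off-diagonal Hessian entries vanish.

```python
import numpy as np

# Numerical spot-check of (6.101) for an affine interference function
# I_k(p) = (Vp)_k + sigma_k^2: in the log-power domain s = log(p), the
# mixed second derivatives of s -> I_k(e^s) vanish. V, sigma2 and the
# test point are arbitrary illustrative values, not data from the text.
rng = np.random.default_rng(0)
K = 4
V = rng.uniform(0.0, 0.2, (K, K))   # interference coupling matrix
np.fill_diagonal(V, 0.0)
sigma2 = np.full(K, 0.01)           # noise powers

def I(s, k):
    """Interference at receiver k as a function of log-powers s."""
    return V[k] @ np.exp(s) + sigma2[k]

def mixed_second(s, k, j, l, h=1e-4):
    """Central finite-difference estimate of d^2 I_k(e^s) / ds_j ds_l."""
    def d(sign_j, sign_l):
        t = s.copy(); t[j] += sign_j * h; t[l] += sign_l * h
        return I(t, k)
    return (d(+1, +1) - d(+1, -1) - d(-1, +1) + d(-1, -1)) / (4 * h * h)

s0 = rng.normal(size=K)
offdiag = max(abs(mixed_second(s0, k, j, l))
              for k in range(K) for j in range(K) for l in range(K) if j != l)
print("largest mixed second derivative:", offdiag)  # ~0 up to FD error
```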
It is readily seen that Condition (C.6-22) characterizes a class of receivers for which the interference power at the kth receiver output can be expressed as

    I_k(p) = Σ_{l∈K} ϑ_{kl}(p_l) + c,   p ∈ P,  c > 0,  k ∈ K,

with a twice differentiable function ϑ_{kl} : R_+ → I, for some I ⊆ R_+ and for each k, l ∈ K. Obviously, the linear interference function (5.73) satisfies Condition (C.6-22) and is thus included in the considerations of this section. On the other hand, the class specified by (C.6-22) is larger than the class of affine interference functions, since ϑ_{kl} may be a nonlinear function of the transmit power p_l ≥ 0. The nonlinearity of ϑ_{kl} may result from hardware-related nonlinear effects in transceiver signal processing.^20 Summarizing, the model (6.100) under Condition (C.6-22) slightly generalizes the SIR model (4.4), which incorporates an affine interference function.

Min-Max Formulation of the Power Control Problem

In what follows, we first restrict the domain of ψ from (6.62) to R_++, and then extend this function by defining ψ(x) = ∞ for x ≤ 0.^21 It is further assumed that ψ : R_++ → R satisfies (C.6-1)–(C.6-3). As each link is assumed to be subject to individual power constraints (C.6-12), we can rewrite the utility-based power control problem (6.17) as

^20 Note, however, that a model that sufficiently accurately accounts for, e.g., amplifier nonlinearity causing intermodulation needs to be much more complex.
^21 Thus, ψ is an extended-valued function for the original map defined on R_++, which is in the spirit of convex analysis, see [186].
6 Power Control Algorithms
    min_{s∈R^K} max_{u∈R^K} Σ_{k∈K} w_k ψ(e^{s_k}/u_k)
    subject to  e^s − p̂ ≤ 0,  u − t ≤ 0,  I_k(e^s) − t_k = 0,  k ∈ K   (6.102)

where u = (u_1, ..., u_K) ∈ R^K is a vector of interference variables. The purpose of the telescope variable t = (t_1, ..., t_K) ∈ R^K is only to separate the constraint inequalities for the minimization variable s from the constraint inequalities for the maximization variable u [184, Sect. 4]. In fact, the constraint inequalities of the problem (6.102) are easily seen to be equivalent to e^s − p̂ ≤ 0 and u_k − I_k(e^s) ≤ 0, k ∈ K. As the function ψ is assumed to be strictly decreasing, it may be easily verified that the maximum in (6.102) is attained for the component-wise largest vector u for which the constraints in (6.102) are satisfied. Thus, for a solution of (6.102), it is necessary that u = t, and thus u_k = I_k(e^s), k ∈ K, making the equivalence of (6.102) with the original utility-based power control problem min_{s∈R^K} Σ_{k∈K} w_k ψ(e^{s_k}/I_k(e^s)) subject to e^s − p̂ ≤ 0 evident. For a better understanding of the results presented in this section, the reader should recall from Sect. 4.3.3 that a power vector s ∈ R^K is called admissible if it satisfies the individual power constraints e^s − p̂ ≤ 0, that is, if we have s ∈ S. Note that an admissible power vector may violate the other constraints in (6.102), depending on the values of u and t.

A Reduced Lagrange–Newton Method

Definition 6.42. Given the problem (6.102), let the associated Lagrangian function be

    L(s, u, μ, λ^u, λ, t) = Σ_{k∈K} w_k ψ(e^{s_k}/u_k) + Σ_{k∈K} φ(μ_k)(e^{s_k} − p̂_k)
        + Σ_{k∈K} λ_k (I_k(e^s) − t_k) − Σ_{k∈K} φ(λ^u_k)(u_k − t_k),
        (s, u, μ, λ^u, λ, t) ∈ R^{6K}   (6.103)

where φ : R → R is a twice differentiable function satisfying the following conditions.

(C.6-23) φ(μ) = φ(−μ), μ ∈ R (even)
(C.6-24) φ(μ) ≥ 0, μ ∈ R (nonnegative)
(C.6-25) φ''(μ) > 0, μ ∈ R (strictly convex)
(C.6-26) μ = 0 if and only if φ(μ) = 0 if and only if φ'(μ) = 0 (unique irregularity as a minimum with value 0 at 0).

Among numerous functions satisfying Conditions (C.6-23)–(C.6-26), the simplest one appears to be, again,

    φ(μ) = μ²,   μ ∈ R.
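As a small sanity check, the choice φ(μ) = μ² can be verified against Conditions (C.6-23)–(C.6-26) numerically; the grid of test points below is an arbitrary illustrative choice.

```python
import numpy as np

# Spot-check that phi(mu) = mu^2 satisfies Conditions (C.6-23)-(C.6-26):
# even, nonnegative, strictly convex (phi'' > 0), and phi(mu) = 0 as well
# as phi'(mu) = 0 exactly at mu = 0.
phi   = lambda m: m**2
dphi  = lambda m: 2.0 * m
d2phi = lambda m: 2.0 + 0.0 * m

mus = np.arange(-5.0, 5.01, 0.5)                   # includes mu = 0 exactly
even   = np.allclose(phi(mus), phi(-mus))          # (C.6-23)
nonneg = np.all(phi(mus) >= 0)                     # (C.6-24)
convex = np.all(d2phi(mus) > 0)                    # (C.6-25)
root   = np.all((phi(mus) == 0) == (mus == 0)) and \
         np.all((dphi(mus) == 0) == (mus == 0))    # (C.6-26)
print(bool(even and nonneg and convex and root))   # True
```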
By comparison with Sect. 6.8.2 we observe that the general Lagrangian (6.103), not restricted to the problem (6.102), can be seen as a particular version of the general Lagrangian construction (6.76), not restricted to problem (6.67), for which ϕ(x) = x, x ∈ R, is set and only convex functions φ (Condition (C.6-25)) are considered. The following theorem is a straightforward consequence of Conditions (C.6-23)–(C.6-26) and the definition of the Kuhn–Tucker conditions (Definition B.50), and parallels Theorem 6.35.

Theorem 6.43. Let ±μ ∈ R^K be defined as (±μ)_k = μ_k or (±μ)_k = −μ_k, independently for k ∈ K. If (s, u, ν, η, λ, t) ∈ R^{2K} × R^{2K}_+ × R^{2K} is a Kuhn–Tucker point of problem (6.102), then (s, u, ±μ, ±λ^u, λ, t), with

    ν_k = φ(±μ_k),   η_k = φ(±λ^u_k) = λ_k,   k ∈ K   (6.104)

is a stationary point of Lagrangian (6.103). Conversely, if (s, u, μ, λ^u, λ, t) ∈ R^{6K}, with s ∈ S, is a stationary point of Lagrangian (6.103), then the point (s, u, ν, η, λ, t), with ν, η given by (6.104), is a Kuhn–Tucker point of problem (6.102).

Proof (Outline). For a Kuhn–Tucker point (s, u, ν, η, λ, t), it is evident from (6.103), (6.69) and (6.104) that the equality ∇_{(s,u,t)} L̄(s, u, ν, η, λ, t) = 0 implies ∇_{(s,u,t)} L(±z) = 0, with ±z = (s, u, ±μ, ±λ^u, λ, t). Condition (C.6-26), the expressions (6.112) and the complementary slackness (pertaining to the set of Kuhn–Tucker conditions, Definition B.50) are easily shown to yield further ∇_μ L(±z) = 0, ∇_{λ^u} L(±z) = 0, while ∇_λ L(±z) = 0 is immediate. Conversely, for a stationary point z = (s, u, μ, λ^u, λ, t), it is again obvious from (6.104) that ∇_{(s,u,t)} L(z) = 0 implies ∇_{(s,u,t)} L̄(s, u, ν, η, λ, t) = 0. The complementary slackness conditions

    ν_k (e^{s_k} − p̂_k) = 0,   η_k (u_k − t_k) = 0,   k ∈ K

and ν ≥ 0, η ≥ 0 follow immediately from ∇_{(μ,λ^u)} L(z) = 0 using the expressions (6.112) and Conditions (C.6-23)–(C.6-26). Then, strict decreasingness of ψ, the assumption w > 0 and ∇_u L(z) = 0 give u = t, while I_k(e^s) − t_k = 0, k ∈ K, is immediate from ∇_λ L(z) = 0 (e^s − p̂ ≤ 0 holds by assumption). The detailed proof can be found in [84, App. C].

By Theorem 6.43, a stationary point of the Lagrangian (6.103), associated via (6.104) with a Kuhn–Tucker point, can be found by unconstrained iterations. This parallels the property of the generalized Lagrangian (6.76) and stands again in contrast to the iterations based on the conventional Lagrangian, such as (6.70) and (6.74), which solve the nonnegatively constrained dual problem. The duality theory for the linear Lagrangian (App. B.4.3) extends to the modified Lagrangian (6.103) in a similar way as for the Lagrangian
(6.76) in the preceding section. Precisely, from the viewpoint of local problem solution by primal-dual methods, the interest is in the saddle points z∗ = (s∗, u∗, μ∗, λ^{u∗}, λ∗, t∗) of Lagrangian (6.103) which satisfy

    z∗ = arg max_{(μ,λ)∈B(μ∗,λ∗)} min_{λ^u∈B(λ^{u∗})} inf_{s∈B(s∗)} sup_{u∈B(u∗)} L(z)   (6.105)

and

    z∗ = arg min_{s∈B(s∗)} max_{u∈B(u∗)} sup_{(μ,λ)∈B(μ∗,λ∗)} inf_{λ^u∈B(λ^{u∗})} L(z).   (6.106)
Analogously to the classical Lagrangian and the already discussed Lagrangian (6.76), it is easily seen that if z∗ ∈ R^{6K} satisfies (6.105) and (6.106), then it is a saddle point of (6.103) such that (s∗, u∗) is a local solution to the problem (6.102). The Lagrangian (6.103) does not incorporate, however, the convexification mechanism which is provided by Lagrangian (6.76) through a suitable choice of the parameter c ∈ R. Thus, for general functions ψ and I_k, k ∈ K, a local problem solution to (6.102) does not, in general, satisfy the saddle point property (6.105), (6.106): Similarly to the conventional weak Lagrangian duality discussed in Sect. 6.8.1, we have in general an inequality between the values L(z∗) achieved at (6.105) and (6.106), respectively.

For finding the above saddle point of (6.103), we consider the iteration of the form (with n ∈ N_0)

    [ s(n+1); μ(n+1) ] = [ s(n); μ(n) ] − (∇²_{(s,μ)} L(z(n)))^{-1} ∇_{(s,μ)} L(z(n))
    ∇_{(u,λ^u,λ,t)} L(z(n+1)) = 0   (6.107)

where z(n) = (s(n), u(n), μ(n), λ^u(n), λ(n), t(n)) ∈ R^{6K}. Iteration (6.107) consists of the Newton update with respect to the variables (s, μ) ∈ R^{2K} and, concurrently, enforces a stationary point of the Lagrangian (6.103) with respect to the remaining variables (u, λ^u, λ, t) ∈ R^{4K}. Thus, the iteration can be classified as a conditional Newton iteration on the Lagrangian, or a Newton iteration on the Lagrangian under reduced dimensionality, or simply a reduced Lagrange–Newton iteration. By condition (6.101), it may be easily verified that

    ∂²L(z)/∂s_k∂s_j = 0,   ∂²L(z)/∂μ_k∂μ_j = 0,   ∂²L(z)/∂μ_k∂s_j = 0,
    z ∈ R^{6K},  k, j ∈ K,  k ≠ j.   (6.108)

Thus, the blocks ∇²_s L(z(n)), ∇²_μ L(z(n)), ∇²_{s,μ} L(z(n)) in the Hessian matrix ∇²_{(s,μ)} L(z(n)) in (6.107) have the crucial property of being diagonal matrices. By the standard four-block inverse expression [187], we have
    (∇²_{(s,μ)} L(z))^{-1} = [ Δ_1(z)  Δ_2(z); Δ_2^T(z)  Δ_3(z) ]   (6.109)

where z ∈ R^{6K} and

    Δ_1(z) = (∇²_s L(z) − ∇²_{s,μ} L(z)(∇²_μ L(z))^{-1}(∇²_{s,μ} L(z))^T)^{-1}
    Δ_2^T(z) = ((∇²_{s,μ} L(z))^T(∇²_s L(z))^{-1}∇²_{s,μ} L(z) − ∇²_μ L(z))^{-1}(∇²_{s,μ} L(z))^T(∇²_s L(z))^{-1}
    Δ_3(z) = (∇²_μ L(z) − (∇²_{s,μ} L(z))^T(∇²_s L(z))^{-1}∇²_{s,μ} L(z))^{-1}.
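The four-block inverse expressions above can be verified numerically. The following sketch builds a symmetric two-by-two block matrix with diagonal blocks, mimicking the structure implied by (6.108) (all numerical values are illustrative), and checks that the stated Δ-blocks reproduce the true inverse.

```python
import numpy as np

# Numerical check of the four-block inverse below (6.109): for the
# symmetric matrix M = [[A, B], [B^T, C]], the Delta_1, Delta_2^T and
# Delta_3 expressions reproduce M^{-1}. Diagonal blocks mimic (6.108);
# sizes and values are illustrative.
rng = np.random.default_rng(4)
K = 4
A = np.diag(rng.uniform(1.0, 2.0, K))        # ~ Hessian block w.r.t. s
C = np.diag(rng.uniform(-2.0, -1.0, K))      # ~ Hessian block w.r.t. mu
B = np.diag(rng.uniform(0.1, 0.5, K))        # ~ mixed block
M = np.block([[A, B], [B.T, C]])

inv = np.linalg.inv
D1 = inv(A - B @ inv(C) @ B.T)
D2T = inv(B.T @ inv(A) @ B - C) @ B.T @ inv(A)
D3 = inv(C - B.T @ inv(A) @ B)
Minv = np.block([[D1, D2T.T], [D2T, D3]])

print(np.allclose(Minv, inv(M)))   # True
```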
So, by the diagonality of ∇²_s L(z(n)), ∇²_μ L(z(n)) and ∇²_{s,μ} L(z(n)), the inverse Hessian, if existent, has the same structure as the Hessian itself: The two blocks on its block-diagonal and the two outer blocks are all diagonal matrices. As a crucial consequence, any entry of the update vector in the Newton update in iteration (6.107), say entry k ∈ K, is a linear combination of only two gradient components: (∇_s L(z))_k and (∇_μ L(z))_k. Another important property of (6.107) and Lagrangian (6.103) is that the update (u(n+1), λ^u(n+1), λ(n+1), t(n+1)) can be written explicitly as a function of (s(n+1), μ(n+1)). These properties lead to the following reformulation of (6.107).

Lemma 6.44. Given any n ∈ N_0, the power control iteration (6.107) can be written, for each k ∈ K, as

    s_k(n+1) = s_k(n) + (e^{s_k(n)} − p̂_k)/Σ_k(n) · [ φ''(μ_k(n)) ( w_k ψ_e'(log(e^{s_k(n)}/u_k(n)))
        + φ(μ_k(n)) e^{s_k(n)} + Σ_{j∈K} λ_j(n) ∂/∂s_k I_j(e^{s(n)}) ) − (φ'(μ_k(n)))² e^{s_k(n)} ]

    μ_k(n+1) = μ_k(n) + φ'(μ_k(n))/Σ_k(n) · [ (e^{s_k(n)} − p̂_k) ( w_k ψ_e''(log(e^{s_k(n)}/u_k(n)))
        + φ(μ_k(n)) e^{s_k(n)} + Σ_{j∈K} λ_j(n) ∂²/∂s_k² I_j(e^{s(n)}) )
        − e^{s_k(n)} ( w_k ψ_e'(log(e^{s_k(n)}/u_k(n))) + φ(μ_k(n)) e^{s_k(n)} + Σ_{j∈K} λ_j(n) ∂/∂s_k I_j(e^{s(n)}) ) ]

    u_k(n+1) = I_k(e^{s(n+1)})

    λ_k(n+1) = −(w_k/u_k(n+1)) ψ_e'(log(e^{s_k(n+1)}/u_k(n+1)))   (6.110)

where additionally

    φ(λ^u_k(n+1)) = λ_k(n+1),   t_k(n+1) = u_k(n+1),   k ∈ K,

and where, for each k ∈ K, we defined

    Σ_k(n) = (φ'(μ_k(n)))² e^{2s_k(n)} − φ''(μ_k(n)) (e^{s_k(n)} − p̂_k) ( w_k ψ_e''(log(e^{s_k(n)}/u_k(n)))
        + φ(μ_k(n)) e^{s_k(n)} + Σ_{j∈K} λ_j(n) ∂²/∂s_k² I_j(e^{s(n)}) ).   (6.111)
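To make the per-link structure of Lemma 6.44 concrete, the sketch below assembles the gradient and Hessian entries (6.112)–(6.113) for the affine interference model (6.120) with ψ = −log and φ(μ) = μ², and checks that the stacked Newton direction of (6.107) coincides with K independent 2×2 solves. All numerical values are illustrative assumptions, not data from the text.

```python
import numpy as np

# Under (C.6-22), the Hessian blocks of (6.103) w.r.t. (s, mu) are
# diagonal, so the Newton direction in (6.107) decomposes into
# independent per-link 2x2 solves, as in Lemma 6.44. Entries are built
# from (6.112)-(6.113) for affine interference, psi = -log, phi = mu^2.
rng = np.random.default_rng(3)
K = 3
V = rng.uniform(0.0, 0.1, (K, K)); np.fill_diagonal(V, 0.0)
sigma2 = np.full(K, 0.01); w = np.ones(K); p_hat = np.ones(K)

s, mu = np.log(0.5 * p_hat), 0.2 * np.ones(K)
p = np.exp(s)
u = V @ p + sigma2
lam = w / u                    # lambda_k = -(w_k/u_k) psi_e'(h_k); psi_e' = -1

coup = p * (V.T @ lam)         # sum_j lambda_j dI_j(e^s)/ds_k for affine I_j
g_s = -w + mu**2 * p + coup    # (6.112) with psi_e'(y) = -1
g_mu = 2 * mu * (p - p_hat)
H_ss = mu**2 * p + coup        # (6.113); psi_e'' = 0, and d2I = dI here
H_mm = 2 * (p - p_hat)
H_sm = 2 * mu * p

# (i) full Newton direction on the stacked (s, mu) system
H = np.block([[np.diag(H_ss), np.diag(H_sm)],
              [np.diag(H_sm), np.diag(H_mm)]])
full = np.linalg.solve(H, np.concatenate([g_s, g_mu]))

# (ii) independent per-link 2x2 solves
det = H_ss * H_mm - H_sm**2
per_link = np.concatenate([(H_mm * g_s - H_sm * g_mu) / det,
                           (H_ss * g_mu - H_sm * g_s) / det])
print(np.allclose(full, per_link))   # True
```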
The Lemma is obtained by applying elementary calculus to (6.103), (6.107) and (6.109). The main steps are the following. Given z = (s, u, μ, λ^u, λ, t) ∈ R^{6K}, we have, for each k ∈ K,

    (∇_s L(z))_k = ∂L(z)/∂s_k = w_k ψ_e'(log(e^{s_k}/u_k)) + φ(μ_k) e^{s_k} + Σ_{j∈K} λ_j ∂/∂s_k I_j(e^s)
    (∇_μ L(z))_k = ∂L(z)/∂μ_k = φ'(μ_k)(e^{s_k} − p̂_k)
    (∇_u L(z))_k = ∂L(z)/∂u_k = −w_k ψ'(e^{s_k}/u_k) e^{s_k}/u_k² − φ(λ^u_k)
    (∇_{λ^u} L(z))_k = ∂L(z)/∂λ^u_k = φ'(λ^u_k)(u_k − t_k)
    (∇_λ L(z))_k = ∂L(z)/∂λ_k = I_k(e^s) − t_k
    (∇_t L(z))_k = ∂L(z)/∂t_k = λ_k − φ(λ^u_k).   (6.112)

By the property ψ_e''(y) = ψ''(e^y) e^{2y} + ψ'(e^y) e^y, y ∈ R, and by (6.101), we obtain further (6.108) and

    (∇²_s L(z))_{kk} = ∂²L(z)/∂s_k² = w_k ψ_e''(log(e^{s_k}/u_k)) + φ(μ_k) e^{s_k} + Σ_{j∈K} λ_j ∂²/∂s_k² I_j(e^s)
    (∇²_μ L(z))_{kk} = ∂²L(z)/∂μ_k² = φ''(μ_k)(e^{s_k} − p̂_k)
    (∇²_{s,μ} L(z))_{kk} = ∂²L(z)/∂s_k∂μ_k = φ'(μ_k) e^{s_k}   (6.113)

for each k ∈ K.

Local Convergence and Duality

Now we prove that the algorithm given by (6.107), or equivalently the algorithm from Lemma 6.44, offers (local) quadratic quotient convergence, according to App. B.4.1. It is important to emphasize here that the quadratic
convergence is the fastest achievable one when no higher than second-order derivatives are used in the iteration [148].

Theorem 6.45. Let z∗ = (s∗, u∗, μ∗, λ^{u∗}, λ∗, t∗) ∈ R^{6K} be a stationary point of Lagrangian (6.103) which corresponds via (6.104) to a Kuhn–Tucker point of problem (6.102) and is such that ∇²_{(s,μ)} L(z) is continuous for z ∈ B(z∗) and nonsingular for z = z∗. Then, z∗ is a point of attraction of iteration (6.107), and if additionally the function ψ and the partial derivatives ∂²/∂s_k² I_j(e^s), s ∈ R^K, k, j ∈ K, are continuous, then we have quadratic quotient convergence in the sense that O_Q(z∗) ≥ 2 (Definition B.42).

Proof (Outline). Let φ_+ be the nonnegative part^22 of φ and let φ_+^{-1} be its inverse function (Definition B.7). Define continuous maps R^K → R^K : s ↦ F(s), R^K → R^K : s ↦ H(s) such that (F(s))_k = I_k(e^s) and (H(s))_k = −w_k ψ'(e^{s_k}/I_k(e^s)) e^{s_k}/I_k²(e^s), k ∈ K. Then, the considered stationary point can be expressed as (φ_+^{-1}(H(s)) is to be understood component-wise)

    z∗ = z∗(s∗, μ∗) = (s∗, F(s∗), μ∗, ±φ_+^{-1}(H(s∗)), H(s∗), F(s∗)).   (6.114)

Considering Lemma 6.44 and the map

    (s, μ) ↦ G(s, μ) = (s^T μ^T)^T − (∇²_{(s,μ)} L(z(s, μ)))^{-1} ∇_{(s,μ)} L(z(s, μ)),   (s, μ) ∈ R^{2K}

(which is well-defined on B(s∗, μ∗) by the nonsingularity and continuity assumption), one can rewrite (6.107) as (s(n+1), μ(n+1)) = G(s(n), μ(n)), (u(n+1), λ^u(n+1), λ(n+1), t(n+1)) = (F(s(n+1)), ±φ_+^{-1}(H(s(n+1))), H(s(n+1)), F(s(n+1))). By ∇_z L(z∗) = 0, z∗ is a fixed point and a stationary point of the map G, so that, by [148, Theorem 10.1.6], (s∗, μ∗) is a point of attraction of the Newton update in (6.107) (consequently, by (6.114), z∗ is a point of attraction of (6.107)). As the quotient convergence order of the iteration (6.107) is equivalent to the corresponding order of the conventional Newton iteration (s(n+1), μ(n+1)) = G(s(n), μ(n)), n ∈ N_0, we finally obtain O_Q(z∗) ≥ 2 by [148, Theorem 10.2.2] and the nonsingularity and continuity assumptions. The full proof can be found in [84, App. E].

The unconstrained nature of stationary points of Lagrangian (6.103) is the key ingredient of the proof of Theorem 6.45, which underlines the role of the modified Lagrangian (Theorem 6.43): An analogous application of the iteration (6.107) to the conventional Lagrangian would guarantee local convergence only under additional projection mechanisms, and thus the resulting convergence would not be, in general, quadratic anymore.

^22 Precisely, φ_+ : dom(φ_+) → R_+ where dom(φ_+) = {x ∈ R : φ(x) ≥ 0}.
It has to be noted that the algorithm (6.107) does not ensure that the sequence of objective function values {F_e(s(n))}_{n∈N_0} is monotonically decreasing, where the latter property is desired in practice, especially when the number of iterations is predefined and relatively small. Nevertheless, such monotonicity can be enforced by introducing in the Newton update in (6.107) a sequence of damping factors a(n), n ∈ N_0, such that for some n_0 ∈ N we have a(n) = 1 for all n ≥ n_0. A modification of this type retains the quadratic convergence of iteration (6.107). For more details, the interested reader is referred to [148].

As already discussed, in order to find a local minimizer

    (s∗, u∗) = arg min_{s∈B(s∗)} max_{u∈B(u∗)} Σ_{k∈K} w_k ψ(e^{s_k}/u_k)
    subject to  e^s − p̂ ≤ 0,  u_k − I_k(e^s) ≤ 0,  k ∈ K   (6.115)

by means of (6.107), it is desired that the point of attraction of (6.107) is a max-min and min-max point (saddle point) satisfying (6.105) and (6.106). This case is characterized in the following result.

Theorem 6.46. A stationary point (s∗, u∗, μ∗, λ^{u∗}, λ∗, t∗) ∈ R^{6K} of Lagrangian (6.103) corresponds to a local problem solution (6.115) if either of the following holds.

(i) Lagrangian (6.103) is a min-max function of s ∈ R^K, u ∈ R^K on some B(s∗, u∗) with admissible s∗, which is equivalent to

    w_k ϕ_e(log(e^{s∗_k}/u∗_k)) ≥ −φ(μ∗_k) e^{s∗_k} − Σ_{j∈K} λ∗_j ∂²/∂s_k² I_j(e^{s∗}),
    ψ_e''(log(e^{s∗_k}/u∗_k)) + ψ_e'(log(e^{s∗_k}/u∗_k)) < 0,   k ∈ K,  s∗ ∈ S.   (6.116)

(ii) Lagrangian (6.103) is a convex-concave function of s ∈ R^K, u ∈ R^K on some B(s∗, u∗) with admissible s∗, which is equivalent to

    w_k ψ_e''(log(e^{s∗_k}/u∗_k)) ≥ −φ(μ∗_k) e^{s∗_k} − Σ_{j∈K} λ∗_j ∂²/∂s_k² I_j(e^{s∗}),
    ψ_e''(log(e^{s∗_k}/u∗_k)) + ψ_e'(log(e^{s∗_k}/u∗_k)) ≤ 0,   k ∈ K,  s∗ ∈ S,   (6.117)

where we defined

    ϕ_e(y) = ψ_e'(y) ψ_e''(y) / (ψ_e'(y) + ψ_e''(y))  if  ψ_e'(y) + ψ_e''(y) ≠ 0,  and  ϕ_e(y) = 0  otherwise,   y ∈ R.
Proof (Outline). Using the feature that a local problem solution (6.115) is associated with a saddle point satisfying, in particular, (6.106), it is first assumed by contradiction that (6.103) is a convex-concave or min-max function of s ∈ R^K, u ∈ R^K on B(s∗, u∗), while (6.115) is not satisfied. By the assumption ∇_{λ^u} L(z∗) = 0, by the expressions (6.112) and by Condition (C.6-26), we first obtain u∗ = t∗ and, as a consequence, ∇²_{λ^u} L(z∗) = 0 (note here that w > 0 and ψ is strictly decreasing). Similarly, by the expressions (6.112), (6.113), Condition (C.6-25) and admissibility of s∗, it follows that ∇²_{(μ,λ)} L(z∗) ⪯ 0. As a consequence of both features, (6.103) is a min-max-concave-convex function or a convex-concave-concave-convex function of s ∈ R^K, u ∈ R^K, (μ, λ) ∈ R^{2K}, λ^u ∈ R^K on some B(z∗), so that a stationary point z∗ satisfies (6.106), which is the desired contradiction (Theorems B.63 and B.61). The equivalence to conditions (6.116) and (6.117) is obtained now from Definitions B.62 and B.60, respectively, by applying elementary calculus, the assumptions w > 0 and (C.6-22), and the property ψ_e''(y) = ψ''(e^y) e^{2y} + ψ'(e^y) e^y, y ∈ R. Particular intermediate steps in the min-max case are here the formulation of the first inequality in Definition B.62 for z = z∗ as

    (ψ_e''(log(e^{s∗_k}/u∗_k)) + ψ_e'(log(e^{s∗_k}/u∗_k))) (φ(μ∗_k) e^{s∗_k} + Σ_{j∈K} λ∗_j ∂²/∂s_k² I_j(e^{s∗}))
        + w_k ψ_e'(log(e^{s∗_k}/u∗_k)) ψ_e''(log(e^{s∗_k}/u∗_k)) ≤ 0,   k ∈ K
and the application of the definition of the map ϕ_e. For the detailed proof we refer to [84, App. F].

By Theorems 6.45 and 6.46 it is now evident that if the iterates generated by (6.107) are attracted by a particular stationary point of (6.103) in whose neighborhood the Lagrangian is either a min-max or a convex-concave function of the power vector and the interference vector, then the iteration converges to a local solution of (6.102). In such a case, under continuity of the derivatives as specified in Theorem 6.45, the convergence to a local solution is quadratic in quotients. From the proof of Theorem 6.46 one can observe that a stationary point of (6.103) is a desired point of attraction of iteration (6.107) if it is the specific saddle point (6.105), (6.106).

Corollary 6.47. A stationary point z∗ = (s∗, u∗, μ∗, λ^{u∗}, λ∗, t∗) ∈ R^{6K} of Lagrangian (6.103) corresponds to a local problem solution (6.115) if z∗ is a saddle point satisfying (6.105) and (6.106).

Finally, we remark that by the theory of convex-concave and min-max functions (see App. B.4.4 for some definitions and references) the above saddle point property characterizing a desired point of attraction is equivalent to another property: convex-concavity or the min-max property of Lagrangian (6.103) as a function of s and u on some B(s∗, u∗) ⊆ R^K × R^K, and concave-convexity of (6.103) as a function of (μ, λ) and λ^u on some B(μ∗, λ^{u∗}, λ∗) ⊆ R^{2K} × R^K.

The Uniqueness Case

As already discussed, the algorithm (6.107) or, equivalently, (6.110) can have points of attraction that are uninteresting, or useless, from the viewpoint of the power control problem. An interesting question is under which conditions on the function ψ and the interference functions I_k, k ∈ K, every Kuhn–Tucker point of the problem (6.102) corresponds to a global problem solution given by (6.115) with B(s∗, u∗) = R^{2K}. In other words, we are now interested in the case of global solvability (Definition B.40), which was the main focus of the preceding sections of Chap. 6. By Theorem 6.43, global solvability is equivalent to having any stationary point of (6.103), with s ∈ S, associated with a global problem solution. If such a property is guaranteed under a suitable choice of ψ and I_k, then the proposed power allocation algorithm (6.107) or (6.110) solves the optimization problem (6.102) globally in the following sense: Any point of attraction of iteration (6.107) with admissible s corresponds to a global problem solution (6.115), B(s∗, u∗) = R^{2K}. The characterization of a class of functions ψ and I_k, k ∈ K, for which the algorithm provides a global solution in the above sense follows from Theorem 6.46.

Corollary 6.48. Any stationary point z∗ = (s∗, u∗, μ∗, λ^{u∗}, λ∗, t∗) ∈ R^{6K} of the Lagrangian L given by (6.103), with s∗ ∈ S, corresponds to a global solution to the problem (6.102) if L is a convex-concave function of s ∈ R^K, u ∈ R^K whenever λ ≥ 0, u − t ≤ 0, s ∈ S, which is implied by

    ψ_e''(y) + ψ_e'(y) ≤ 0,   ψ_e''(y) ≥ 0,   y ∈ R   (6.118)

and

    ∇²I_k(e^s) ⪰ 0,   s ∈ R^K,  k ∈ K.   (6.119)

Additionally, a global solution to the problem (6.102) is unique if the second inequality in (6.118) is strict.

The corollary follows from the properties of convex-concave functions outlined in App. B.4.4 and an elementary reformulation of the conditions of Theorem 6.46 in the setting λ ≥ 0, u − t ≤ 0. The full proof can be found in [84, App. G].

It is easily verified that ψ(y) = −log(y), y > 0, and ψ(y) = 1/y, y > 0, satisfy, together with the affine interference function

    I_k(p) = (Vp)_k + σ_k²,   p ∈ P,  k ∈ K   (6.120)
the conditions of Corollary 6.48. Thus, such functions guarantee that (6.107) is globally convergent to a solution to (6.102) in the sense discussed above. Moreover, if ψ(y) = 1/y, y > 0, then we have ψ_e''(y) > 0, y ∈ R, so that there is a unique global solution to the problem (6.102) and, by Theorem 6.43, the point of attraction of (6.107) with admissible s is unique up to the component signs.

The interpretation of the global solution (6.115), with B(s∗) = R^K, B(u∗) = R^K, under the convex-concavity from Corollary 6.48 follows immediately by Corollary 6.47. In such a case, the stationary point (s∗, u∗, μ∗, λ^{u∗}, λ∗, t∗) is a saddle point of Lagrangian (6.103) in the sense that

    sup_{(μ,λ)∈R^{2K}} inf_{λ^u∈R^K} L(s∗, u∗, μ, λ^u, λ, t∗) = inf_{s∈R^K} sup_{u∈R^K} L(s, u, μ∗, λ^{u∗}, λ∗, t∗)   (6.121)

and such a point is unique up to the component signs of μ∗ and λ^{u∗}, if the second inequality in (6.118) is strict. The global saddle point property (6.121) is an analog of the classical strong Lagrangian duality (6.73) and of the saddle point property (6.82), with B(s∗) = R^K, B(μ∗) = R^L, of the Lagrangian (6.76).

An Alternative Problem Form

We already know that Lagrangian (6.103) is convex-concave in s ∈ R^K, u ∈ R^K whenever ψ satisfies (6.118) and s ↦ I_k(e^s), k ∈ K, is convex on R^K. This is, in particular, true for the case of affine interference functions and when ψ(y) = −log(y), y > 0, or ψ(y) = 1/y, y > 0. For such convex-concavity, the central property of convex-concave functions from Theorem B.61 together with Theorem 6.43 implies the following equivalence of optimization problems.

Corollary 6.49. If condition (6.118) is satisfied and if s ↦ I_k(e^s), s ∈ R^K, is convex for each k ∈ K, then a solution to problem (6.102) exists if and only if a solution to the problem

    max_{u} min_{s} Σ_{k∈K} w_k ψ(e^{s_k}/u_k)
    subject to  e^s − p̂ ≤ 0,  u_k − I_k(e^s) ≤ 0,  k ∈ K   (6.122)

exists, and the solutions to (6.102) and (6.122) are equal. Moreover, if the second inequality in (6.118) is strict, the solution to (6.102) and (6.122) is unique.

In other words, under the conditions from Corollary 6.48, we have at our disposal an alternative expression of the min-max power allocation problem (6.102) in the max-min form (6.122). This becomes of interest in the context of the application of solution methods, different from (6.107), specialized to either max-min problems or min-max problems (see, for instance, [188]). The convergence
behavior of such method types is likely to be sensitive to the particular "curvature" (min-max or max-min) of the problem. By Corollary 6.49, however, we are free to apply either of such method types to the problem (6.102) and choose the best performing method, as the solutions to (6.102) and (6.122) are equivalent under the conditions of Corollary 6.48. It may be easily verified that these conditions are satisfied under the affine interference function (6.120) and ψ(x) = −log(x), x > 0, or ψ(x) = 1/x, x > 0.

Distributed Handshake Protocol

We show that the algorithm (6.107) can be realized efficiently in a distributed manner using a scheme based on the concept of the adjoint network discussed in Sect. 6.6.2. Important keys to efficient distributed execution are the variable splitting in the problem formulation (6.102) and the diagonal structure (6.101) of the Hessian matrix of the interference function (Assumption (C.6-22)). Recall that the class of interference functions with this property includes the most relevant case of affine interference functions (6.120). Complying with the feedback schemes presented so far in this book, we confine our attention to the affine interference function (6.120) also in the following scheme. An important property of this interference function is that

    ∂²/∂s_k² I_j(e^s) = ∂/∂s_k I_j(e^s),   s ∈ R^K,  k, j ∈ K.   (6.123)
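Property (6.123) can be spot-checked by finite differences for the affine interference function; V and σ² below are illustrative values.

```python
import numpy as np

# Numerical check of (6.123) for affine interference I_j(p) = (Vp)_j +
# sigma_j^2: in the log-power domain, the first and second partial
# derivatives of s -> I_j(e^s) with respect to s_k coincide, so a single
# adjoint-network transmission yields both first- and second-order
# information. V and sigma2 are illustrative values.
rng = np.random.default_rng(6)
K = 4
V = rng.uniform(0.0, 0.2, (K, K)); np.fill_diagonal(V, 0.0)
sigma2 = np.full(K, 0.01)

I = lambda s, j: V[j] @ np.exp(s) + sigma2[j]
s0 = rng.normal(size=K)
h = 1e-4

ok = True
for j in range(K):
    for k in range(K):
        e = np.zeros(K); e[k] = h
        d1 = (I(s0 + e, j) - I(s0 - e, j)) / (2 * h)
        d2 = (I(s0 + e, j) - 2 * I(s0, j) + I(s0 - e, j)) / h**2
        ok = ok and (abs(d1 - d2) < 1e-4)
print(ok)   # True
```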
This property offers the advantage that it is sufficient for the nodes to transmit over the adjoint network only once in order to obtain both the first- and second-order information about the problem. By some elementary transformations and by (6.123), the iteration (6.110) can be rewritten as
    s_k(n+1) = s_k(n) + (1/Σ_k(n)) ( (ψ_e'(h_k(n)) − Δ_k(n))_{(a)} + (Σ_{j∈K} v_{jk} Δ_j(n) + Δ_k(n))_{(b)} + F_1(s_k(n), μ_k(n), w_k) )

    μ_k(n+1) = μ_k(n) + (1/Σ_k(n)) ( F_2(s_k(n), μ_k(n), w_k, p̂_k) (ψ_e''(h_k(n)) − ψ_e'(h_k(n)))_{(a)}
        + F_3(μ_k(n), w_k) ( (ψ_e'(h_k(n)) − Δ_k(n))_{(a)} + (Σ_{j∈K} v_{jk} Δ_j(n) + Δ_k(n))_{(b)} ) )   (6.124)

    u_k(n+1) = Σ_{j∈K} v_{kj} e^{s_j(n+1)} + σ_k²
    Δ_k(n+1) = −ψ_e'(h_k(n+1)) e^{s_k(n+1)}/u_k(n+1)   (6.125)

    φ(λ^u_k(n+1)) = (w_k/e^{s_k(n+1)}) Δ_k(n+1)
    t_k(n+1) = u_k(n+1)

for each k ∈ K and every n ∈ N_0, where we defined h_k(n) = log(e^{s_k(n)}/u_k(n)), Δ_k(n) = e^{s_k(n)} λ_k(n)/w_k, and

    Σ_k(n) = F_4(s_k(n), μ_k(n), w_k, p̂_k) + (Δ_k(n) − ψ_e''(h_k(n)))_{(a)} + (Σ_{j∈K} v_{jk} Δ_j(n) + Δ_k(n))_{(b)},   k ∈ K.   (6.126)
The functions F_i, 1 ≤ i ≤ 4, are introduced merely to group several terms from (6.110), (6.111) and are easily deduced. While the first iterate u_k(n+1) in (6.125) is the interference power at the kth receiver output under the power allocation s(n+1), the second one, Δ_k(n+1), is a function of the corresponding SIR value e^{s_k(n+1)}/u_k(n+1). Thus, both iterates in (6.125) can be computed independently at any kth receiver based on the
corresponding estimates of the interference power and the SIR. Subsequently, the terms denoted by (a) in the iterates (6.124) and (6.126) can be computed at any kth transmitter after the SIR estimate from the corresponding receiver is fed back to the transmitter using a link-specific low-rate control channel. Further, we observe that the term (b) occurring in (6.124) and (6.126) can be estimated at the kth transmitter in the adjoint network (i.e., at the kth receiver in the original network) as in Algorithm 6.3, except that now the transmit powers in the adjoint network are determined by Δ_k(n), k ∈ K. Finally, it is obvious from (6.124) and (6.126) that the values of the functions F_i, 1 ≤ i ≤ 4, are computable at the kth transmitter when the corresponding iterate values s_k(n) and μ_k(n) are locally stored. This, in summary, ensures the decentralized computation of the iteration (6.110) by the following procedure.

Algorithm 6.4 Distributed implementation of Algorithm (6.107)
Input: w > 0, ψ, φ, n = 0, s(0) ∈ R^K.
Output: s ∈ S
1: repeat
2:   Concurrent transmission using transmit powers e^{s_k(n)}, k ∈ K.
3:   Receiver-side estimation of the interference power u_k(n) = Σ_{j∈K} v_{kj} e^{s_j(n)} + σ_k² and the SIR e^{s_k(n)}/u_k(n), and computation of Δ_k(n) = −ψ_e'(log(e^{s_k(n)}/u_k(n))) e^{s_k(n)}/u_k(n) on each link k ∈ K.
4:   Per-link feedback of the SIR e^{s_k(n)}/u_k(n) to the corresponding transmitter on each link k ∈ K.
5:   Transmitter-side computation of ψ_e'(log(e^{s_k(n)}/u_k(n))), ψ_e''(log(e^{s_k(n)}/u_k(n))) and Δ_k(n) on each link k ∈ K.
6:   Concurrent transmission of the adjoint network using transmit powers Δ_k(n), k ∈ K.
7:   Transmitter-side (i.e. adjoint network receiver-side) estimation of the received power Σ_{j∈K} v_{jk} Δ_j(n) + Δ_k(n) on each link k ∈ K.
8:   Transmitter-side update (6.124) on each link k ∈ K.
9:   n → n + 1.
10: until Some termination condition is satisfied.
Algorithm 6.4 assumes that the functions ψ and φ are globally known by all transmitters and receivers. Furthermore, the power constraint p̂_k is known by transmitter k ∈ K, and the weight w_k > 0 is fixed according to the link priority or traffic class at the kth transmitter and kth receiver. Finally, as in all the previous power control schemes, a dedicated control channel is assumed for each transmitter-receiver pair (see Remark 6.14).

As in the discussion of Algorithm 6.3, we finally point out that the methods of stochastic approximation [143] can be used to cope with the problem of noisy estimates/measurements. Indeed, reasoning along the lines of Sect. 6.6.5 shows that under some conditions on the estimation noise the sequence of iterates {(s(n), u(n))}_{n∈N_0} obtained by Algorithm 6.4 exhibits either weak or almost sure convergence to a local solution (6.115).

Relations to Other Iterations

Finally, we would like to put forward some non-rigorous arguments to indicate that the combination of quadratic convergence and decentralized realization offered by iteration (6.107) may be hard to obtain within the class of Newton-related iteration concepts. Consider first the conventional Newton iteration

    z(n+1) = z(n) − (∇²L(z(n)))^{-1} ∇L(z(n)),   n ∈ N.
Quadratic convergence of this Newton method to stationary points of the Lagrangian (6.103) is obvious [148], but an efficient decentralized implementation is unlikely to be designed: The inverse Hessian matrix provides, in general, a coupling among all link-specific component updates (∇L(z(n)))_k, k ∈ K (in particular, since the block ∇²_{s,λ^u} L(z) is not a diagonal matrix), and the inverse itself requires matrix manipulations with global network knowledge. On the other hand, we can think of a conditional Newton iteration similar to the proposed iteration (6.107), but with any other choice of conditioning variables: For instance, we could apply

    [ s(n+1); u(n+1) ] = [ s(n); u(n) ] − (∇²_{(s,u)} L(z(n)))^{-1} ∇_{(s,u)} L(z(n))
    ∇_{(μ,λ,λ^u,t)} L(z(n+1)) = 0   (6.127)

to the problem (6.122). Relying on the concept of the adjoint network, the design of a decentralized implementation of such an iteration type is possible, since the inverse Hessian is blockwise diagonal and easily expressible in the desired component-wise way; any kth component of the update vector is a linear combination of only (∇_s L(z(n)))_k and (∇_u L(z(n)))_k. However, by standard techniques it can be shown that quadratic convergence is prevented due to the lack of the so-called strongly consistent approximation property of the update matrix in an equivalent reformulation of the iteration (we refer here to [148] for the theory framework and to [188] for some related iteration examples). Finally, one can think of a power control iteration which applies the Newton update only in subdomains. For instance, such a simplified iteration form related to (6.107) could be

    [ s(n+1); μ(n+1) ] = [ s(n); μ(n) ] − [ ∇²_s L(z(n))  0; 0  ∇²_μ L(z(n)) ]^{-1} ∇_{(s,μ)} L(z(n))
    ∇_{(u,λ,λ^u,t)} L(z(n+1)) = 0.   (6.128)
The inverse Hessian is blockwise diagonal and expressible component-wise in the desired way as well, so that a decentralized implementation scheme based on the adjoint network concept is again possible. However, as in the case of (6.127), the standard theory of numerical iterations implies that quadratic quotient convergence of (6.128) is prevented [148].
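The loss incurred by the subdomain update (6.128) can be seen already on a single link: dropping the mixed entry (∇²_{s,μ} L(z))_{kk} changes the direction relative to the full per-link 2×2 solve used by (6.107). The scalar values below are purely illustrative.

```python
import numpy as np

# Contrast of the per-link update directions of (6.107) and of the
# simplified iteration (6.128): (6.128) inverts only the block-diagonal
# part and so generally yields a different direction than the full 2x2
# Newton solve. The scalar entries are illustrative stand-ins for the
# Hessian/gradient entries (6.112)-(6.113).
H_ss, H_mm, H_sm = 1.2, -0.8, 0.3      # Hessian entries on one link
g_s, g_mu = 0.5, -0.2                  # gradient entries on one link

# (6.107): solve the full 2x2 system
det = H_ss * H_mm - H_sm**2
full = ((H_mm * g_s - H_sm * g_mu) / det, (H_ss * g_mu - H_sm * g_s) / det)

# (6.128): invert only the diagonal blocks (mixed entry dropped)
simplified = (g_s / H_ss, g_mu / H_mm)

print(np.allclose(full, simplified))   # False
```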
6.8.6 Simulation Results

We evaluate the algorithm (6.85) by simulations. We consider the utility-based power control problem (6.61) with ψ(x) = −log(x), x > 0, and ψ(x) = 1/x, x > 0, in the objective function (6.62). Further, we assume predefined (traffic class-specific) constraints on link SIR on selected links k ∈ A by setting (6.64) in (6.61). With the theory from Sect. 6.3 and a reformulation of the link SIR constraints, this results in convex instances of the problem (6.61). It is easily proven that strict complementarity and the constraint qualification condition from Lemma B.52 are satisfied at any SOSC point of such problem forms. In the proposed Lagrangian (6.103) we set ϕ(x) = eˣ and φ(x) = x², x ∈ R. Because of the problem convexity, the equality (6.82) holds for any c₀ ∈ R, and thus the algorithm (6.85) converges regardless of the particular choice of the parameter c ∈ R. Figures 6.2 and 6.3 show the convergence of exemplary iterate sequences obtained by the decentralized implementation of iteration (6.85) through the feedback scheme from Algorithm 6.3. Both figures suggest that Algorithm 6.3 ensures reliable linear convergence even under quite rough/noisy estimates.

[Plot: objective F(p(n)) versus iteration index n; K = 15, |A| = 1, ψ(x) = −log(x), c = 0, δ(n) = 0.4, n ∈ N]
Fig. 6.2: Exemplary convergence of the objective (6.62) obtained by Algorithm 6.3 with averaging of noisy estimates according to [3]. The variance of the estimates in steps 3, 7, 9 of Algorithm 6.3 is 0.3·σ_k², k ∈ K.
Note that the slight oscillation of the performance metric in the transient phase of convergence is, besides the influence of noisy estimates, a result of the unconstrained character of the update (6.85): The consecutive iterates z(n) = (s(n), μ(n)), n ∈ N, are allowed to be temporarily infeasible for several n ∈ N before reaching the point of attraction. Thus, the weighted aggregate performance happens to be superior to the actual optimal value at the point
[Plot: objective F(p(n)) versus iteration index n; K = 15, |A| = 1, ψ(x) = 1/x, c = 0, δ(n) = 0.4, n ∈ N]
Fig. 6.3: Exemplary convergence of the objective (6.62) obtained by Algorithm 6.3 with averaging of noisy estimates according to [3]. The variance of the estimates in steps 3, 7, 9 of Algorithm 6.3 is 0.15·σ_k², k ∈ K.
of attraction, since the power vector and the values of the link SIR functions may temporarily exceed the nominal constraints p̂_k, k ∈ K, and θ̂_k, k ∈ A, respectively. The algorithm (6.107) is verified by simulations for the particular setting of the affine interference function (6.120) and ψ(x) = −log(x), respectively ψ(x) = 1/x, x ≥ 0, in the problem (6.122). That is, we consider again the utility-based approach to power control (Chap. 6), aiming at maximization of the weighted (approximate) throughput or the weighted sum of averaged error rates under Rayleigh fading. According to Corollary 6.48 and the discussion thereafter, such settings ensure that any point of attraction of (6.107) associated with a feasible power allocation corresponds to a global solution to the problem (6.122). In the Lagrangian (6.103) we take φ(x) = x², x ∈ R. Figure 6.4 shows exemplary convergence of iteration (6.107) in two quite large networks. This convergence is compared with that of the conventional gradient optimization method applied to the equivalent minimization problem (6.61) with (6.62) and A = ∅. The step size of the gradient method is optimized here to achieve the fastest possible descent. Analogously to the case of iteration (6.85), the observable slight oscillation in the transient phase of convergence is a result of the unconstrained character of the iteration, which allows the objective value for some iterates s(n), n ∈ N, to be temporarily superior to the actual optimum. As could be expected, the quadratically convergent iteration (6.107) significantly outperforms the (linearly convergent) gradient method. It can also be readily concluded that the reduced Lagrange–Newton method is largely invariant with respect to the number of optimized links. This is a valuable feature which is compatible with the requirement of scalability posed on resource allocation procedures in distributed networks. Broadly, scalability means the (approximate) invariance of performance and/or complexity of such procedures under a variable number of nodes and/or links in the network.
[Plots: objective F(p(n)) versus iteration index n for K = 50 (left) and K = 100 (right), both with ψ(x) = −log(x)]
Fig. 6.4: Comparison of convergence of the algorithm (6.107) (solid lines) with convergence of the conventional gradient projection algorithm (dashed lines), with optimally chosen constant step size, applied to the equivalent problem form (6.61) with A = ∅.
Figure 6.5 shows the convergence of exemplary iterate sequences obtained by the proposed feedback scheme from Algorithm 6.4. It can be concluded that the feedback scheme offers good robustness to estimation noise for the simulated realistic estimate variances.
[Plots: objective F(p(n)) versus iteration index n for K = 50 (left) and K = 100 (right), both with ψ(x) = 1/x]
Fig. 6.5: Convergence of Algorithm 6.4, with no averaging of iterates. The variance of the estimates in steps 3 and 7 of Algorithm 6.4 is 0.1·σ_k² for the interference power estimates and received power estimates, and 0.05·σ_k² for the estimates of transmit powers, k ∈ K.
A Some Concepts and Results from Matrix Analysis
The appendix provides some (very) basic concepts and results from linear algebra that are vital to understanding the theory presented in this book. It is also a good opportunity to introduce the notation used throughout the book. Proofs are provided only for the most important results, such as the Perron–Frobenius theorem. For other proofs and a detailed treatment of this material, the reader is referred to any linear algebra book and to [7, 5, 9, 4, 189, 6, 107].
A.1 Vectors and Vector Norms

Vectors and matrices can be defined over an arbitrary field K. It could be R, the field of real numbers, or C, the field of complex numbers; these are the most common choices. Unless otherwise stated, all matrices in this section are over K = R. Elements of K are called scalars. In addition, throughout the book we use R−, R+, R++ to denote the sets of nonpositive, nonnegative and positive reals, respectively. The set of all n-tuples over R with the two algebraic operations called vector addition and scalar multiplication forms an n-dimensional vector space denoted by Rn. Elements of Rn are referred to as vectors and are written as u = (u1, . . . , un), which, in connection with vectors or matrices, should be regarded as a column vector. Given any two vectors u ∈ Rn and v ∈ Rm, the notation (u, v) is used to denote either an ordered pair of the vectors u and v or a (column) vector obtained by appending the (column) vector v ∈ Rm to the (column) vector u ∈ Rn.¹ The meaning follows from the context. Also note that, for any u ∈ Rn and scalar c ∈ R, the notation u + c is used throughout the book to denote u + (c, . . . , c), where (c, . . . , c) ∈ Rn. A similar convention may also be used for matrices. A subset S of Rn is a subspace of Rn if it is a vector space over R. If C is a subset of Rn, then the span of C is the set span(C) = {a1 u1 + a2 u2 + · · · + an un :
In the latter case, x = (uT vT )T ∈ Rn+m is a more precise notation.
ai ∈ R, ui ∈ C, 1 ≤ i ≤ n}, which is a subspace of Rn. Given any b ∈ R and c ∈ Rn, the sets {u ∈ Rn : Σ_{i=1}^n ui ci ≤ b} and {u ∈ Rn : Σ_{i=1}^n ui ci ≥ b} are the (affine closed) half-spaces of Rn attached to (c, b). An important concept in linear algebra is that of linear dependence (independence).

Definition A.1 (Linear Dependence). Let S ⊆ Rn be a vector space over R and u1, . . . , un be arbitrary vectors of S. If there are numbers a1, . . . , an in R, not all zero, such that Σ_{i=1}^n ai ui = 0, then the vectors u1, . . . , un are said to be linearly dependent. Otherwise, if Σ_{i=1}^n ai ui = 0 implies a1 = · · · = an = 0, then the vectors are linearly independent.

In the book, we use the following notation: 0 = (0, . . . , 0) is the zero vector, 1 = (1, . . . , 1) is the vector of ones, and ei = (0, . . . , 0, 1, 0, . . . , 0) is a unit vector with 1 in the ith position and zeros elsewhere. Furthermore, given any two vectors u ∈ Rn and v ∈ Rm for some n, m ≥ 1, (u, v) ∈ Rn+m is to be regarded, if necessary, as a column vector that is obtained by appending the (column) vector v to the (column) vector u so that (u, v) = (u1, . . . , un, v1, . . . , vm). Throughout the book, we use a partial ordering on Rn defined as follows: For any u, v ∈ Rn, there holds

  u ≤ v ⇔ ∀1≤i≤n ui ≤ vi
  u < v ⇔ ∀1≤i≤n ui < vi        (A.1)
  u = v ⇔ ∀1≤i≤n ui = vi .
Note that this is a special case of the generalized inequality associated with the cone Rn+ [16, Sect. 2.4]. When ∀1≤i≤n ui ≥ c for some constant c, we write u ≥ c, with equality if and only if ui = c for each 1 ≤ i ≤ n. So we have v ≥ u if and only if s = v − u ≥ 0, and v > u if and only if s = v − u > 0, where s ∈ Rn. Moreover, if u ≤ v and v ≤ s for any u, v, s ∈ Rn, then u ≤ s. We say that Rn is a normed vector space if there is a map Rn → R : x → ‖x‖ (called a norm on Rn) satisfying the following properties:

  ∀u∈Rn ‖u‖ ≥ 0 and ‖u‖ = 0 ⇔ u = 0
  ∀λ∈R,u∈Rn ‖λu‖ = |λ| · ‖u‖        (A.2)
  ∀u,v∈Rn ‖u + v‖ ≤ ‖u‖ + ‖v‖ .

All the norms used in this book are lp-norms and the maximum norm: For any p ≥ 1, the lp-norm and the maximum norm of u ∈ Rn are defined to be

  ‖u‖p = (Σ_{i=1}^n |ui|^p)^{1/p}  and  ‖u‖∞ = max(|u1|, . . . , |un|),

respectively. A vector space equipped with an lp-norm is called the lp space. Minkowski's inequality establishes that the lp spaces are normed vector spaces.

Theorem A.2 (Minkowski's inequality). For 1 ≤ p ≤ ∞,

  ‖u + v‖p ≤ ‖u‖p + ‖v‖p .        (A.3)
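As a quick numerical illustration (a sketch in Python/NumPy; `numpy.linalg.norm` implements the lp-norms through its `ord` parameter, and the test vectors are arbitrary), the triangle inequality (A.3) can be checked for several values of p:

```python
import numpy as np

u = np.array([1.0, -2.0, 3.0])
v = np.array([0.5, 4.0, -1.0])
# Minkowski's inequality (A.3) for a few lp-norms and the maximum norm
for p in (1, 2, 3, np.inf):
    lhs = np.linalg.norm(u + v, ord=p)
    rhs = np.linalg.norm(u, ord=p) + np.linalg.norm(v, ord=p)
    assert lhs <= rhs + 1e-12
# the maximum norm picks the largest entry in absolute value
print(np.linalg.norm(u, ord=np.inf))  # → 3.0
```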
The lp spaces are also metric spaces, with the distance d(u, v) from u ∈ Rn to v ∈ Rn given by d(u, v) = ‖u − v‖ for some lp-norm on Rn. An important result is the equivalence of norms in finite dimension [190].

Theorem A.3. If S is a normed finite-dimensional vector space, then all norms on S are equivalent: If S → R : x → ‖x‖p and S → R : x → ‖x‖q are any two norms on S ⊆ Rn, then there exist positive real numbers c1, c2 such that

  c1 ‖x‖q ≤ ‖x‖p ≤ c2 ‖x‖q        (A.4)

for all x ∈ S.

The vector space Rn becomes an inner product space if it is equipped with the inner product defined as

  ⟨u, v⟩ = Σ_{i=1}^n ui vi ,  u, v ∈ Rn .

Hölder's inequality provides a relationship between the inner product of two vectors and their norms. The following theorem presents Hölder's inequality for nonnegative vectors. Hölder's inequalities for general complex vectors (along with the equality conditions) can be found in [133, pp. 50–54].

Theorem A.4 (Hölder's Inequality). Let u ∈ Rn+ and v ∈ Rn+ be arbitrary. Then,

  ⟨u, v⟩ ≤ ‖u‖p ‖v‖q        (A.5)

where p, q > 1 are chosen so that 1/p + 1/q = 1. Equality holds if and only if vk = c uk^{p−1}, 1 ≤ k ≤ n, for some constant c > 0.

When p = 2, Rn is called Euclidean space (or, more precisely, Euclidean n-space) with the Euclidean distance d(u, v) = ‖u − v‖2 and the Euclidean norm ‖u‖2 = √⟨u, u⟩. The Euclidean space Rn is a Hilbert space since it is complete with respect to the norm induced by the inner product. When p = q = 2, the general form of the inequality (A.5) is better known as the Cauchy–Schwarz inequality:

  |⟨u, v⟩| ≤ ‖u‖2 ‖v‖2        (A.6)

for all u, v ∈ Rn, with equality if and only if u = cv for some constant c ≠ 0. If not otherwise stated, throughout the book Rn can be regarded as the Euclidean space, on which also other norms can be defined.
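Hölder's inequality (A.5) and its equality condition can likewise be verified numerically. In the sketch below the vectors, the seed, and the choice p = 3, q = 3/2 (so that 1/p + 1/q = 1) are arbitrary illustrations:

```python
import numpy as np

u = np.random.default_rng(0).random(5)   # nonnegative vector
p, q = 3.0, 1.5                          # conjugate exponents: 1/3 + 2/3 = 1
v = 2.0 * u ** (p - 1)                   # equality case: v_k = c u_k^(p-1), c = 2
ip = u @ v
bound = np.linalg.norm(u, p) * np.linalg.norm(v, q)
assert abs(ip - bound) < 1e-9            # Hölder holds with equality here
# for a generic nonnegative vector the inequality is strict
w = np.random.default_rng(1).random(5)
assert u @ w <= np.linalg.norm(u, p) * np.linalg.norm(w, q) + 1e-12
```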
A.2 Matrices and Matrix Norms

If n, m ≥ 1, a matrix X of size n × m with entries in K is an array with n rows and m columns:
      ⎛ x1,1 · · · x1,m ⎞
  X = ⎜  ⋮    ⋱    ⋮   ⎟ .
      ⎝ xn,1 · · · xn,m ⎠

We write X = (xi,j)_{1≤i≤n,1≤j≤m}, or simply X = (xi,j) if the matrix size is clear. The entries of X are also denoted by (X)i,j. The set of all n × m matrices over R forms a vector space denoted by Rn×m. The zero matrix 0 is a matrix all of whose entries are equal to zero. If m = n, the matrix X is said to be square. An n × n diagonal matrix X, denoted by X = diag(x) = diag(x1, . . . , xn), is a matrix with diagonal entries x1, . . . , xn and all off-diagonal entries equal to zero. In particular, I = diag(1) = diag(1, . . . , 1) is referred to as the identity matrix or, simply, the identity. If the dimension of the identity matrix I ∈ Rn×n is not clear from the context, then we write In. Given any matrix X = (xi,j) ∈ Rn×n, we use diag(X) to denote the n × n diagonal matrix such that (diag(X))i,i = xi,i, 1 ≤ i ≤ n. Thus, X is a diagonal matrix if and only if X = diag(X). For two matrices X ∈ Rn×n and Y ∈ Rm×m, we slightly abuse the formalism by defining

  diag(X, Y) = ⎛ X  0 ⎞
               ⎝ 0T Y ⎠        (A.7)

where 0 ∈ Rn×m and 0T ∈ Rm×n (see also Definition A.5). For square matrices, the following standard definitions are used: A matrix X ∈ Rn×n is said to be (a) a permutation matrix if exactly one entry in each row and column is equal to 1 and all other entries are 0, (b) an invertible matrix if there exists a matrix X−1 ∈ Rn×n, called the inverse (matrix) of X, such that XX−1 = I, (c) an upper (respectively, lower) triangular matrix if xi,j = 0 whenever j < i (respectively, j > i). If xi,j = 0 whenever j ≤ i (respectively, j ≥ i), then X is said to be strictly upper (respectively, lower) triangular. As in the case of vectors, we define a partial ordering on Rn×m as follows: For any A ∈ Rn×m and B ∈ Rn×m,

  A ≤ B ⇔ ∀i,j ai,j ≤ bi,j
  A < B ⇔ ∀i,j ai,j < bi,j
  A = B ⇔ ∀i,j ai,j = bi,j .

Again, if ∀i,j ai,j ≥ c for some constant c, we write A ≥ c. For any two matrices A, B ∈ Rn×m, their Hadamard product A ◦ B is the entry-wise product of A and B.
When considered in connection with matrices, vectors are to be regarded as column vectors. Given a matrix X ∈ Rn×m, a matrix norm of X is denoted by ‖X‖. General matrix norms satisfy the conditions in (A.2), with the vector u replaced by some matrix. Additionally, if the product AB exists, we have
  ‖AB‖ ≤ ‖A‖ ‖B‖ .

The simplest notion of a matrix norm of X ∈ Rn×m is the Frobenius norm given by

  ‖X‖F² = Σ_{i,j} |xi,j|² = trace(XT X)        (A.8)

where trace(X) = Σ_i xi,i is the trace of a matrix X. Other widely used matrix norms are induced matrix norms: For any X ∈ Rn×m, define

  ‖X‖ = max_{u∈Rm, ‖u‖=1} ‖Xu‖        (A.9)

where ‖ · ‖ denotes any vector norm satisfying (A.2). If the l²-norm is used, then ‖X‖2 is the matrix 2-norm, equal to √λmax, where λmax is the largest eigenvalue of XT X (see below) and XT is the transpose of X defined as follows.
Definition A.5 (Matrix Transpose). The transpose matrix of X = (xi,j) ∈ Rn×m is defined as the matrix XT ∈ Rm×n such that (XT)i,j = xj,i.

Furthermore, we have

  ‖X‖1 = max_{‖u‖1=1} ‖Xu‖1 = max_j Σ_{i=1}^n |xi,j|
  ‖X‖∞ = max_{‖u‖∞=1} ‖Xu‖∞ = max_i Σ_{j=1}^m |xi,j| .        (A.10)
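For a concrete matrix, the closed forms in (A.10) and the characterization of the matrix 2-norm as √λmax can be confirmed with a short NumPy check (the example matrix is an arbitrary illustration):

```python
import numpy as np

X = np.array([[1.0, -2.0], [3.0, 4.0]])
# (A.10): the induced 1-norm is the maximum absolute column sum,
# the induced ∞-norm is the maximum absolute row sum
assert np.linalg.norm(X, 1) == np.abs(X).sum(axis=0).max()       # → 6.0
assert np.linalg.norm(X, np.inf) == np.abs(X).sum(axis=1).max()  # → 7.0
# the 2-norm equals the square root of the largest eigenvalue of X^T X
lam_max = np.linalg.eigvalsh(X.T @ X).max()
assert np.isclose(np.linalg.norm(X, 2), np.sqrt(lam_max))
```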
Finally, we point out that every matrix X ∈ Rn×m can be considered as a linear map from Rm into Rn . Recall that a map f : Rm → Rn is said to be linear if f (u + v) = f (u) + f (v) and f (au) = a f (u) for every u, v ∈ Rm and a ∈ R. Because any linear map between finite-dimensional spaces can be represented as a matrix, we call such maps matrix transformations and write Rm → Rn : u → Xu = v. The image of u under X is the vector v. The image of Rm under X is called the range of X and is denoted by R(X). The kernel or nullspace of X is the set ker(X) of those u ∈ Rm for which Xu = 0. The rank of X denoted by rank(X) ≤ min{m, n} is the greatest number of linearly independent rows (or columns). The column space of the matrix X is equal to R(X), whereas the row space of X is the range of XT .
A.3 Square Matrices and Eigenvalues

From now on we focus on square matrices of size n × n over R.
Definition A.6 (Eigenvalues and Eigenvectors). For an arbitrary n × n matrix X, scalars λ ∈ C and n-vectors p ≠ 0 (with p ∈ Cn) satisfying λp = Xp (over C) are called eigenvalues and right eigenvectors of X, respectively. The pair (λ, p) is an eigenpair of X or, equivalently, p is an eigenvector of X associated with an eigenvalue λ. A left eigenvector q ≠ 0 of X associated with λ is an n-vector satisfying λq = XT q.

It is important to emphasize that even if X is a real-valued matrix, its eigenvalues are in general complex numbers. It is easy to see that the eigenvalues of X are exactly those scalars λ ∈ C for which both of the following hold: (i) the matrix A = λI − X is singular, that is, A is not invertible, meaning that there is no matrix A−1 ∈ Rn×n such that AA−1 = A−1A = I; (ii) p(λ) = det(λI − X) = 0 in C, where det(A) denotes the determinant of A. Recall that

  det(A) = Σ_σ ε(σ) Π_{j=1}^n aj,σj

where the sum is taken over all permutations σ = (σ1, . . . , σn) of (1, 2, . . . , n) and ε(σ) = ±1 is equal to 1 if σ is the product of an even number of transpositions, and −1 otherwise. The polynomial p(λ) is called the characteristic polynomial of X. Hence the second statement says that the eigenvalues of X are the roots of its characteristic polynomial in the field of complex numbers. By the Cayley–Hamilton theorem, if p is the characteristic polynomial of X, then p(X) = 0. Any polynomial whose value at X is the zero matrix is said to annihilate X or to be an annihilating polynomial of X. So the characteristic polynomial is an example of an annihilating polynomial of X. The monic polynomial q(λ) of least degree that annihilates X is called the minimal polynomial of X, which is known to be unique for a given matrix. Note that every annihilating polynomial of X is divisible without remainder by the minimal polynomial; in particular, this is true for the characteristic polynomial. As the degree of the characteristic polynomial is n, we see that X ∈ Rn×n has altogether n eigenvalues, some of which, however, can be repeated.

Definition A.7 (Algebraic and Geometric Multiplicity). The multiplicity of λ as a root of the characteristic polynomial is called the algebraic multiplicity of the eigenvalue λ. The geometric multiplicity of λ is the dimension of ker(λI − X) in Rn.

The geometric multiplicity is smaller than or equal to the algebraic multiplicity. Eigenvalues with algebraic multiplicity 1 are called simple. The following fundamental result is invoked throughout the book whenever we say that the eigenvalues of a matrix are continuous functions of its entries [7].
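For a 2 × 2 example, the characteristic polynomial reduces to p(λ) = λ² − trace(X)λ + det(X). The following sketch (an arbitrary illustrative matrix) checks that its roots are exactly the eigenvalues and that p annihilates X, as the Cayley–Hamilton theorem asserts:

```python
import numpy as np

X = np.array([[2.0, 1.0], [0.0, 3.0]])
# characteristic polynomial p(λ) = λ² − trace(X)λ + det(X)
coeffs = [1.0, -np.trace(X), np.linalg.det(X)]
roots = np.roots(coeffs)
assert np.allclose(np.sort(roots), np.sort(np.linalg.eigvals(X)))  # roots = eigenvalues
# Cayley–Hamilton: p(X) is the zero matrix
pX = X @ X - np.trace(X) * X + np.linalg.det(X) * np.eye(2)
assert np.allclose(pX, 0.0)
```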
Theorem A.8. Let X ∈ Rn×n be arbitrary, and let us fix some norm on Rn. Suppose that λ ∈ C is any eigenvalue of X, with multiplicity μ, and d is the distance from λ to the other eigenvalues of X. Let Bρ(λ) be an open disk of radius ρ with 0 < ρ < d centered at λ. Then, there exists ε > 0 such that if A ∈ Rn×n and ‖A‖ < ε, the sum of the algebraic multiplicities of the eigenvalues of X + A in Bρ(λ) is equal to μ.

For a definition of an open disk, the reader is referred to Sect. B.1. As mentioned above, the geometric multiplicity of each eigenvalue is always smaller than or equal to its algebraic multiplicity. Now a square matrix X ∈ Rn×n is said to be simple if and only if, for each distinct eigenvalue, the geometric multiplicity is equal to the algebraic multiplicity. An important consequence of this is that a simple matrix X has n linearly independent left and right eigenvectors. The sets of right and left eigenvectors are quasi-biorthogonal, so that, if Q = (q1, . . . , qn) and P = (p1, . . . , pn) are the matrices of left and right eigenvectors, respectively, then QT P = I and PQT = I. Finally, we point out that any simple matrix is similar to a diagonal matrix.

Definition A.9 (Similarity). We say that the matrices A, B ∈ Rn×n are similar if there exists a nonsingular matrix T ∈ Rn×n such that B = T−1AT.

Notice that similar matrices have the same eigenvalues.

A.3.1 Matrix Spectrum, Spectral Radius and Neumann Series

First let us define the matrix spectrum.

Definition A.10 (Matrix Spectrum). The set of distinct eigenvalues of X is referred to as the spectrum of X and is denoted by σ(X).

Since the roots of a polynomial with real coefficients occur in conjugate pairs, λ ∈ σ(X) implies that λ̄ ∈ σ(X), where x̄ denotes the complex conjugate of x. Furthermore, we have σ(X) = σ(XT). The following result expresses a simple matrix X ∈ Rn×n in terms of the matrix spectrum.

Theorem A.11. Suppose that X ∈ Rn×n is a simple matrix and p is a scalar polynomial. Let Q = (q1, . . . , qn) and P = (p1, . . . , pn) be the left and right eigenvectors of X corresponding to the eigenvalues λ1, . . . , λn. Then,

  p(X) = Σ_{j=1}^n p(λj) pj qjT        (A.11)

where p(X) := a0 Xn + a1 Xn−1 + · · · + an−1 X + an I for p(λ) = a0 λn + a1 λn−1 + · · · + an.

Using the theorem and the fact that det(λI − X) = Π_{j=1}^n (λ − λj) if X is simple, one can show that
  (λI − X)−1 = Σ_{j=1}^n (1 / (λ − λj)) pj qjT ,  λ ≠ λj        (A.12)

holds for any simple matrix X ∈ Rn×n, where the notation of Theorem A.11 was used. The spectral theorem for simple matrices (Theorem A.11) can be generalized to find the spectral resolution of a function defined on the spectrum of an arbitrary matrix X ∈ Rn×n [107, Theorem 5.4.1]. This spectral resolution in turn implies the following generalization of (A.12).

Theorem A.12. Let X ∈ Rn×n be arbitrary, and let σ(X) = {λ1, . . . , λs} be its spectrum. Suppose that q(λ) = Π_{k=1}^s (λ − λk)^{mk} is the minimal polynomial of X. Then, there exist matrices Zk,j ∈ Rn×n, 1 ≤ j ≤ mk, 1 ≤ k ≤ s, such that

  (λI − X)−1 = Σ_{k=1}^s Σ_{j=1}^{mk} ((j − 1)! / (λ − λk)^j) Zk,j .        (A.13)
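The expansion (A.12) can be verified numerically for a randomly drawn matrix (which is simple with probability one); here the left eigenvectors are normalized via Q = (P⁻¹)ᵀ so that QᵀP = I, and the seed, size, and evaluation point z are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 4))             # generically a simple matrix
lam, P = np.linalg.eig(X)          # columns of P: right eigenvectors p_j
Q = np.linalg.inv(P).T             # columns of Q: left eigenvectors with Q^T P = I
z = 5.0 + 0.0j                     # any point outside the spectrum
resolvent = np.linalg.inv(z * np.eye(4) - X)
# (A.12): resolvent as a sum of rank-one terms p_j q_j^T / (z − λ_j)
series = sum(np.outer(P[:, j], Q[:, j]) / (z - lam[j]) for j in range(4))
assert np.allclose(resolvent, series)
```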
The matrices Zk,j ≠ 0 are linearly independent and commute with X.

The matrices Zk,j in (A.13) are called principal component matrices [36] of X or simply components of X [107, p. 174]. It is important to notice that the principal component matrices are an intrinsic property of the matrix X. By [107, Theorem 5.5.2], there holds

  Zk,j = (1 / (j − 1)!) (X − λk I)^{j−1} Zk,1 ,  1 ≤ j ≤ mk, 1 ≤ k ≤ s        (A.14)

where the matrices Zk,1, 1 ≤ k ≤ s, are idempotent and their column spaces are given by [107, Theorem 5.5.3]

  R(Zk,1) = ker[(X − λk I)^{mk}] .        (A.15)

Further characterizations and properties of these matrices can be found in [107, pp. 176–180]. For our purposes in Sect. A.4.4, it is sufficient to bear in mind that the Zk,1 are idempotent, meaning that Zk,1² = Zk,1, 1 ≤ k ≤ s. So the eigenvalues of Zk,1 are either 0 or 1. The agreement between the range (column space) of Zk,1 and the kernel (null space) of (X − λk I)^{mk} given by (A.15) is of interest insofar as it can be used, together with the idempotency property, to show that if p ∈ Rn and q ∈ Rn are right and left eigenvectors of X, then Z1,1 p = p and qT Z1,1 = qT (see also [107, p. 179]).

Definition A.13 (Spectral Radius). For any square matrix X ∈ Rn×n, we define ρ : Rn×n → R as

  ρ(X) := max{|λ| : λ ∈ σ(X)} .        (A.16)

The real number ρ(X) is called the spectral radius of X.
Thus, in order to obtain an upper bound on the magnitudes of all eigenvalues of X, it is sufficient to bound the spectral radius from above. A crude upper bound is given by the following observation.

Observation A.14. For any matrix norm and X ∈ Rn×n, there holds

  ρ(X) ≤ ‖X‖ .        (A.17)

Proof. Suppose that (λ, p) with p ≠ 0 is any eigenpair of X. Let ‖ · ‖ be any matrix norm, and let P ∈ Rn×n be a matrix all of whose columns are equal to p. Then, |λ| ‖P‖ = ‖λP‖ = ‖XP‖ ≤ ‖X‖ ‖P‖. Hence, |λ| ≤ ‖X‖ for every eigenvalue λ, and therefore ρ(X) ≤ ‖X‖ for any matrix norm.

In particular, the observation implies that ρ(X) = ρ(Xk)^{1/k} ≤ ‖Xk‖^{1/k} for any k ≥ 1. This and the equivalence of norms in finite-dimensional spaces are key ingredients in proving the following result.

Theorem A.15. Let X ∈ Rn×n be arbitrary. Then,

  ρ(X) = lim_{k→∞} ‖Xk‖^{1/k}        (A.18)

for every matrix norm ‖ · ‖.

Given any X ∈ Rn×n, the convergence of the Neumann series Σ_{k=0}^∞ Xk is fundamental for some of the concepts introduced in this book. The following result provides a necessary and sufficient condition for the Neumann series to converge.

Theorem A.16. Let X ∈ Rn×n be arbitrary. Then, the following statements are equivalent:
(i) Σ_{k=0}^∞ Xk converges.
(ii) ρ(X) < 1.
(iii) lim_{k→∞} Xk = 0.
In these cases, (I − X)−1 exists, and (I − X)−1 = Σ_{k=0}^∞ Xk.
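Both Theorem A.15 and Theorem A.16 lend themselves to a direct numerical check. In the sketch below (seed and size are arbitrary) the matrix is rescaled so that ρ(X) = 0.9 < 1:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((3, 3))
X = 0.9 * X / max(abs(np.linalg.eigvals(X)))   # rescale so that ρ(X) = 0.9
rho = max(abs(np.linalg.eigvals(X)))
assert rho < 1
# Theorem A.16: partial sums of the Neumann series approach (I − X)^{-1}
S = np.eye(3)
term = np.eye(3)
for _ in range(400):
    term = term @ X
    S = S + term
assert np.allclose(S, np.linalg.inv(np.eye(3) - X))
# Theorem A.15 (Gelfand's formula): ||X^k||^{1/k} approaches ρ(X)
k = 300
gelfand = np.linalg.norm(np.linalg.matrix_power(X, k), 2) ** (1.0 / k)
assert abs(gelfand - rho) < 0.05
```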
∞ In these cases, (I − X)−1 exists, and (I − X)−1 = k=0 Xk . A.3.2 Orthogonal, Symmetric and Positive Semidefinite Matrices A matrix X ∈ Rn×n is said to be normal if X commutes with XT . In other words, if X is normal, then XXT = XT X. Important subsets of normal matrices are the sets of orthogonal and symmetric matrices. Definition A.17 (Orthogonal and Symmetric Matrices). We say that X ∈ Rn×n is an orthogonal matrix if XT X = XXT = I. It is said to be symmetric if XT = X. If X is orthogonal and p = 0 is an eigenvector of X associated with λ ∈ σ(X), then |λ|2 p22 = (λp)T (λp) = pT XT Xp = p22 . Hence, the eigenvalues of any orthogonal matrix are (in general) complex numbers of modulus one. For symmetric matrices, we have the following standard results.
Theorem A.18. The eigenvalues of symmetric matrices are real.

Theorem A.19. Symmetric matrices are orthogonally similar to a diagonal matrix. In other words, given a symmetric matrix X, there exists an orthogonal matrix U such that UXUT is diagonal.

Theorem A.20. Let X̂, X̌ be arbitrary symmetric matrices, and let X(μ) = (1 − μ)X̂ + μX̌ for μ ∈ [0, 1]. Then,

  ρ(X(μ)) ≤ (1 − μ) ρ(X̂) + μ ρ(X̌)        (A.19)

for all μ ∈ (0, 1).

In words, the theorem says that the spectral radius is convex on the set of symmetric matrices (for the definition of convexity, the reader is referred to App. B). This immediately becomes obvious when one considers that, for symmetric X = XT,

  ρ(X) = sup{|uT Xu| : uT u = 1} .

Another fundamental notion in matrix analysis is that of positive semidefiniteness.

Definition A.21 (Positive Semidefinite Matrix). We say that a symmetric matrix X ∈ Rn×n is positive semidefinite if uT Xu ≥ 0 for all u ∈ Rn. If the inequality is strict for all u ∈ Rn with u ≠ 0, then X is said to be positive definite. We write

  X ⪰ 0 ⇔ 0 ⪯ X  and  X ≻ 0 ⇔ 0 ≺ X        (A.20)

if X is positive semidefinite and positive definite, respectively. Moreover, we write X ⪰ Y and X ≻ Y if X − Y ⪰ 0 and X − Y ≻ 0, respectively.

Clearly, every symmetric positive semidefinite matrix is orthogonally similar to a diagonal matrix and all its eigenvalues are real. In addition, however, all eigenvalues are nonnegative.

Theorem A.22. A symmetric matrix X ∈ Rn×n is positive semidefinite if and only if σ(X) is a subset of the nonnegative reals. It is positive definite if and only if σ(X) is a subset of the positive reals.

An immediate consequence of Definition A.21 is that AAT ∈ Rn×n is positive semidefinite for any (not necessarily square) matrix A ∈ Rn×m. This is because uT AAT u = ‖AT u‖2² ≥ 0 for any u ∈ Rn. In particular, for any u ∈ Rn with u ≠ 0, the outer product uuT ∈ Rn×n is a positive semidefinite matrix with rank 1. Moreover, as uuT u = ‖u‖2² u, the vector au, for any a ≠ 0, is an eigenvector of uuT associated with the positive eigenvalue ‖u‖2² > 0.
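Theorem A.22 and the remarks on AAT and outer products can be illustrated as follows (a sketch with arbitrary random data; `eigvalsh` exploits symmetry and returns real eigenvalues in ascending order):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
X = A @ A.T                          # A A^T is symmetric positive semidefinite
eig = np.linalg.eigvalsh(X)          # real eigenvalues, ascending order
assert np.all(eig >= -1e-10)         # Theorem A.22: σ(X) ⊂ nonnegative reals
u = rng.standard_normal(4)
assert u @ X @ u >= -1e-10           # u^T X u ≥ 0 for any u
# rank-one outer product uu^T: single positive eigenvalue ||u||², rest zero
lam = np.linalg.eigvalsh(np.outer(u, u))
assert np.isclose(lam[-1], u @ u) and np.allclose(lam[:-1], 0.0, atol=1e-10)
```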
Remarks on Complex-valued Matrices

The above definitions extend to matrices over the field of complex numbers C if the matrix transpose is substituted by the matrix conjugate transpose.

Definition A.23 (Matrix Conjugate Transpose). Let X = (xi,j) ∈ Cn×m be an n × m matrix whose entries are complex numbers. Then, the conjugate transpose matrix of X is defined as the matrix XH ∈ Cm×n such that (XH)i,j = x̄j,i, where ā denotes the complex conjugate of a ∈ C.

Now a matrix X ∈ Cn×n is called (a) unitary if XH X = XXH = I, (b) Hermitian if XH = X, and (c) positive semidefinite (definite) if it is Hermitian and uH Xu ≥ 0 (uH Xu > 0) for all u ∈ Cn (with u ≠ 0). With these definitions, Theorems A.18 and A.19 hold for Hermitian matrices. Theorem A.22 extends to complex-valued matrices by substituting the matrix transpose by the matrix conjugate transpose. The same is true for the remark after Theorem A.22.
A.4 Perron–Frobenius Theory

In this section, we focus on vectors and n × n matrices defined over R+ ⊂ R and R++ ⊂ R+.

Definition A.24. Any square matrix X ∈ Rn×n whose entries are nonnegative (positive) reals is called a nonnegative (positive) matrix. The set of all n × n nonnegative (positive) matrices is denoted by Nn (Mn).

Later we will additionally introduce the set of all n × n irreducible matrices, which is denoted by Xn. We have Mn ⊂ Xn ⊂ Nn. The following implications can be easily verified:

  ∀k≥1 X ≥ 0 ⇒ Xk ≥ 0 and X > 0 ⇒ Xk > 0        (A.21)
  X > 0, u ≥ 0, u ≠ 0 ⇒ Xu > 0        (A.22)
  X ≥ 0, u > 0, Xu = 0 ⇒ X = 0 .        (A.23)

If |X| = (|xi,j|), then we further have |Xm| ≤ |X|m for all m ∈ N, and if 0 ≤ A ≤ B, then 0 ≤ Am ≤ Bm. Moreover, if |A| ≤ |B|, then ‖A‖F ≤ ‖B‖F, where ‖ · ‖F is the Frobenius norm defined by (A.8). Finally, it is clear that ‖X‖F = ‖ |X| ‖F. Combining these observations yields

  ‖Am‖F^{1/m} ≤ ‖ |A|m ‖F^{1/m} ≤ ‖Bm‖F^{1/m}

for all m ∈ N and A, B ∈ Rn×n such that |A| ≤ B. Now if we let m → ∞ and apply Theorem A.15, we obtain the following important result.
Theorem A.25. Let A, B ∈ Rn×n. If |A| ≤ B, then ρ(A) ≤ ρ(|A|) ≤ ρ(B). Moreover, if |A| < B, then ρ(|A|) < ρ(B).

Therefore, if X ≥ 0, then the spectral radius ρ(X) is monotonic with respect to the matrix entries. An important set of nonnegative matrices is the set of stochastic matrices.

Definition A.26 (Stochastic Matrix). X ∈ Nn is said to be stochastic if

  ∀1≤i≤n Σ_{j=1}^n xi,j = 1 .

If both X and XT are stochastic, then X is said to be doubly stochastic.²

It follows from the definition that, for any stochastic matrix X, we have X1 = 1, so that 1 is an eigenvalue of X. Furthermore, since ρ(X) ≤ ‖X‖∞ = 1 (by (A.10) and (A.17)), it can be concluded that ρ(X) = 1 for any stochastic matrix X. The Perron–Frobenius theory addresses the problem to what extent the nonnegativity property of a matrix is inherited by its eigenvalues and eigenvectors [4, 5, 6, 189]. Intuitively, it can be expected that the spectral properties, in terms of positivity of the spectral radius and of the associated eigenvectors, somehow depend on the number and position of the positive entries of X. For instance, consider X = ( 0 1 ; 0 0 ), in which case there is neither a positive eigenvalue (σ(X) = {0}) nor a positive eigenvector. In contrast, X = ( 0 1 ; 1 0 ) has significantly stronger properties (in terms of Perron–Frobenius theory) than the previous example, although the matrices differ from each other in only one position: The spectrum is now σ(X) = {+1, −1}, so that the matrix has a simple positive eigenvalue λp = ρ(X) = 1. Moreover, (1/√2, 1/√2) is a positive right eigenvector associated with λp. It should be emphasized that these positivity properties (that is, the existence of a positive right eigenvector associated with a simple positive eigenvalue equal to the spectral radius) are not exclusively due to the increased number of positive entries in X. In fact, what really matters is a combination of the number of positive entries and their “suitable positions”. This becomes clear when considering the matrix X = ( 1 1 ; 0 0 ). Despite having the same number of positive entries as the previous example, it is not possible to associate a positive eigenvector with λp = ρ(X) = 1. An important difference between these two matrices is captured by the notion of reducibility.

A.4.1 Perron–Frobenius Theorem for Irreducible Matrices

Definition A.27 (Reducible and irreducible matrices).
A nonnegative matrix X ∈ Nn, n ≥ 1, is said to be reducible if
² Notice that stochastic and doubly stochastic matrices are also sometimes referred to as semi-stochastic and stochastic matrices, respectively.
(i) n = 1 and X = 0, or
(ii) there exists a permutation matrix P such that

  PT XP = ( A 0 ; B C )

where A and C are both square matrices. Otherwise, X is said to be irreducible. The set of all n × n nonnegative irreducible matrices is denoted by Xn ⊂ Nn.

Note that, by the definition, any positive scalar is an irreducible matrix and the zero scalar is a reducible one. Hereafter, unless explicitly stated, we assume square matrices to be of dimension greater than 1. One-dimensional matrices are used in the definition of a normal form (A.29) of a nonnegative matrix. The following result, used later in the proof of Theorem A.52, is sometimes stated as the definition of irreducible matrices [4].

Lemma A.28. We have X ∈ Xn if and only if, for each (i, j) with 1 ≤ i, j ≤ n, there exists k ≥ 0 such that x_{i,j}^{(k)} := (Xk)_{i,j} > 0.

Let G(X), with X ∈ Nn, be defined as the directed graph on N = {1, . . . , n} nodes in which there is a directed edge leading from node j ∈ N to node i ∈ N if and only if xi,j > 0.

Definition A.29 (Strongly Connected Graph). We say that G(X) is strongly connected if for each pair of nodes (i, j) there is a sequence of directed edges leading from i to j.

The following connection between irreducibility and strong connectivity is well known [189].

Observation A.30. We have X ∈ Xn if and only if G(X) is strongly connected.

This observation is, for instance, useful to verify the irreducibility of relatively small matrices. So it may be easily seen that X = ( 0 1 ; 1 0 ) is irreducible, whereas the other two examples above are not. The following lemma (which is directly connected to the previous one) shows that irreducible matrices can be easily converted into positive ones. The proof can be found in many textbooks (see the references at the beginning of this appendix).

Lemma A.31. If X ∈ Xn, then (I + X)n−1 > 0.

Now we use this result to prove the Perron–Frobenius theorem, which is of central importance for the theory presented in this book.
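Before turning to the theorem, Lemma A.31 and Observation A.30 can be checked on small examples; in this sketch (Python/NumPy, with arbitrary example matrices) a cyclic permutation matrix is irreducible because its graph is a single directed cycle, while a triangular matrix is reducible:

```python
import numpy as np

n = 4
# cyclic shift: node j has an edge to node j+1 (mod n), so G(X) is a
# directed cycle and hence strongly connected → X is irreducible
X = np.roll(np.eye(n), 1, axis=0)
# Lemma A.31: (I + X)^(n-1) > 0 for irreducible X
assert np.all(np.linalg.matrix_power(np.eye(n) + X, n - 1) > 0)
# a reducible (lower triangular) example: already of the form (A 0; B C)
Y = np.array([[1.0, 0.0], [1.0, 1.0]])
assert not np.all(np.linalg.matrix_power(np.eye(2) + Y, 1) > 0)
```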
Theorem A.32 (Perron–Frobenius theorem). Let X ∈ Xn . Then, there exists an eigenvalue λp ∈ σ(X) such that
A Some Concepts and Results from Matrix Analysis
(i) λp = ρ(X) > 0, and hence λp ≥ |λ| for any eigenvalue λ ≠ λp,
(ii) positive left and right eigenvectors can be associated with λp,
(iii) the eigenvectors associated with λp are unique up to positive multiples,
(iv) λp is a simple root of the characteristic equation of X,
(v) if λ ∈ σ(X) and λu = Xu for some u ≥ 0, u ≠ 0, then λ = λp.

A proof of this theorem is provided by most books on matrix theory. We present a complete proof here because of its key role in the theory of nonnegative matrices.

Proof. Let s ∈ R^n_+ with s ≠ 0, and let

$$\lambda_p(\mathbf{s}) := \min_{i} \frac{(\mathbf{X}\mathbf{s})_i}{s_i}$$

where the ratio is assumed to be +∞ if s_i = 0. It is clear that 0 ≤ λp(s) < +∞ for all s ≥ 0, s ≠ 0. Moreover, it follows that λp(s) s_i ≤ (Xs)_i for all 1 ≤ i ≤ n. Therefore, λp(s) s ≤ Xs, from which we obtain λp(s) ≤ 1^T X s / 1^T s ≤ ‖X‖₁. In words, λp(s) is uniformly bounded above for all s ≥ 0 with s ≠ 0. Since λp(1) > 0, this implies that

$$\lambda_p = \sup_{\substack{\mathbf{s}\in\mathbb{R}^n_+ \\ \mathbf{s}\neq\mathbf{0}}}\; \min_{i} \frac{(\mathbf{X}\mathbf{s})_i}{s_i} \qquad (A.24)$$

satisfies 0 < λp(1) ≤ λp ≤ ‖X‖₁ < +∞. Furthermore, as λp(αs) = λp(s) for all α > 0, one has

$$\lambda_p = \sup_{\mathbf{s}\in C}\; \min_{i} \frac{(\mathbf{X}\mathbf{s})_i}{s_i}$$

where C = {s ∈ R^n_+ : s^T s = 1} ⊂ R^n. Hence, λp is attained for some vector p ∈ R^n_+. In other words, there must exist p ∈ R^n_+ with p ≠ 0 such that λp = min_i (Xp)_i/p_i. From this it follows that (Xp)_i ≥ λp p_i for all 1 ≤ i ≤ n, with equality for some i. Thus, u = Xp − λp p ∈ R^n_+. Suppose that u ≥ 0, u ≠ 0. Then, by Lemma A.31 and (A.22),

$$(\mathbf{I}+\mathbf{X})^{n-1}\mathbf{u} = (\mathbf{I}+\mathbf{X})^{n-1}\big(\mathbf{X}\mathbf{p} - \lambda_p\mathbf{p}\big) > \mathbf{0} \;\Longrightarrow\; \mathbf{X}\mathbf{y} - \lambda_p\mathbf{y} > \mathbf{0}$$

where y = (I + X)^{n−1} p. So the last inequality yields λp < (Xy)_i/y_i for all 1 ≤ i ≤ n. But this contradicts (A.24), and therefore u = 0 or, equivalently, Xp = λp p. This implies that λp is a real-valued eigenvalue of X.

Now we show that λp ≥ |λ| for all λ ∈ σ(X). Let λs = Xs, s ≠ 0. Taking the modulus of both sides yields |λ||s| ≤ X|s|, where |s| = (|s₁|, . . . , |sₙ|). Therefore, by (A.24),

$$|\lambda| \le \min_{1\le i\le n} \frac{(\mathbf{X}|\mathbf{s}|)_i}{|s_i|} \le \lambda_p.$$
This completes the proof of (i).

(ii) Multiplying Xp = λp p by (I + X)^{n−1} > 0 gives (I + X)^{n−1} Xp = X(I + X)^{n−1} p = Xy = λp y. Therefore, by (A.22), a right eigenvector of X associated with λp is positive. Obviously, X ∈ Xn if and only if X^T ∈ Xn, so the proof of (i) can be applied to X^T. In particular, since σ(X) = σ(X^T), there must exist q ∈ R^n_+, q ≠ 0, such that λp q = X^T q with λp ≥ |λ| for all λ ∈ σ(X^T). Finally, proceeding as above shows that it is possible to associate with λp a positive left eigenvector.

(iii) Assume that the claim is not true, and let s > 0 and u > 0 be two linearly independent right eigenvectors of X associated with λp. Then, there must exist constants α and β such that p = αs + βu ≥ 0 has at least one zero coordinate and p ≠ 0 is a right eigenvector of X associated with λp. However, this contradicts the positivity of nonnegative eigenvectors associated with λp established in the proof of (ii), so the eigenvector must be unique up to positive multiples. The same reasoning applies to q.

(iv) Let A(λ) ∈ R^{n×n} with λ ∈ R be the adjugate matrix of the characteristic matrix (λI − X). So we have A(λ) = (a_{i,j}(λ)) = p(λ)(λI − X)^{−1}, where p(λ) is the characteristic polynomial of the matrix X. Furthermore, the derivative of p(λ) is p′(λ) = Σ_{j=1}^n a_{j,j}(λ). We are going to show that p′(λp) ≠ 0. Due to (ii) and (iii), λp has a unique (up to positive multiples) positive right eigenvector p > 0. Therefore, A(λp) ≠ 0 and each column of A(λp) is either the zero vector or a vector all of whose elements are nonzero and have the same sign. Considering the transpose X^T shows that the same applies to the rows of A(λp). Thus, since A(λp) ≠ 0, it follows that all entries of A(λp) are nonzero and have the same sign, say s ∈ {−1, +1}. This implies that s · p′(λp) = s Σ_{j=1}^n a_{j,j}(λp) > 0. So p′(λp) ≠ 0, and therefore λp is a simple root of p(λ) = 0 or, equivalently, a simple eigenvalue of X. (In fact, p(λ) is monotonically increasing for all λ ≥ λp since λp is the largest root of p(λ) = 0 over R; hence, we have p′(λp) > 0.)

(v) Let λ ∈ σ(X) and let u ≥ 0, u ≠ 0, be a right eigenvector of X associated with λ. Suppose that y > 0 satisfies λp y = X^T y and note that, due to (ii), such a vector exists. Multiplying both sides of λp y^T = y^T X by u ≥ 0 yields λp y^T u = y^T X u = λ y^T u. Thus, since y^T u > 0 (due to the positivity of y), we obtain λ = λp.

Definition A.33 (Perron Root and Perron Eigenvector). λp = ρ(X) > 0 in Theorem A.32 is called the Perron root of X ∈ Xn. We say that p is the right Perron eigenvector of X if (and only if)

$$\mathbf{X}\mathbf{p} = \lambda_p \mathbf{p}, \quad \mathbf{p} \in \mathbb{R}^n_{++}, \quad \|\mathbf{p}\|_1 = 1. \qquad (A.25)$$

If X is replaced by X^T in (A.25), then p is called the left Perron eigenvector of X.

Remark A.34. It is important to notice that part (v) of Theorem A.32 implies that if X is irreducible, then X has no nonnegative right eigenvectors except
for positive scalar multiples of the Perron eigenvector. Obviously, the same statement is true for left eigenvectors of an irreducible matrix.

The proof of (i) in Theorem A.32 gives rise to the so-called Collatz–Wielandt formula for the Perron root of irreducible matrices.

Theorem A.35 (Collatz–Wielandt Formula). For every X ∈ Xn, one obtains

$$\lambda_p = \rho(\mathbf{X}) = \max_{\mathbf{s}\in\mathbb{R}^n_{++}}\, \min_{1\le i\le n} \frac{(\mathbf{X}\mathbf{s})_i}{s_i} = \min_{\mathbf{s}\in\mathbb{R}^n_{++}}\, \max_{1\le i\le n} \frac{(\mathbf{X}\mathbf{s})_i}{s_i}. \qquad (A.26)$$

The maximum and minimum are attained if and only if s is a positive right eigenvector of X associated with ρ(X).

The proof of the min-max characterization in (A.26) proceeds along similar lines as the proof of the max-min representation, which is itself a part of Theorem A.32. Finally, we point out the following characterization of the Perron root.

Theorem A.36. Let X ∈ Xn be arbitrary, and let λp = ρ(X) be the Perron root of X. Then, for each 1 ≤ i ≤ n and every s > 0, we have

$$\lim_{k\to\infty} \frac{1}{k}\log \sum_{j=1}^{n} (\mathbf{X}^k)_{i,j}\, s_j = \lim_{k\to\infty} \frac{1}{k}\log \sum_{j=1}^{n} (\mathbf{X}^k)_{j,i}\, s_j = \log \lambda_p. \qquad (A.27)$$
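The Collatz–Wielandt formula (A.26) is easy to verify numerically: at the Perron eigenvector the max-min and min-max ratios coincide with ρ(X), while any other positive test vector only brackets it. A sketch assuming NumPy (the matrix is an illustrative choice, not from the text):

```python
import numpy as np

# An irreducible (in fact primitive) nonnegative matrix; illustrative choice
X = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])

vals, vecs = np.linalg.eig(X)
i = np.argmax(vals.real)
rho = vals.real[i]                   # Perron root rho(X)
p = np.abs(vecs[:, i].real)          # positive right Perron eigenvector
p /= p.sum()                         # normalization ||p||_1 = 1 as in (A.25)

# At s = p, all ratios (Xs)_i / s_i equal rho(X), so max-min = min-max = rho(X)
ratios = X @ p / p
print(np.allclose(ratios, rho))      # True

# For any other positive s, the min and max ratios merely bracket rho(X)
s = np.array([1.0, 2.0, 3.0])
r = X @ s / s
print(bool(r.min() <= rho <= r.max()))  # True
```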
The proof of this theorem can be found in [15, pp. 72–73].

A.4.2 Perron–Frobenius Theorem for Primitive Matrices

The set of primitive matrices constitutes an important subset of the irreducible matrices.

Definition A.37 (Primitive Matrices). A nonnegative matrix X ∈ Nn is said to be primitive if there exists k ≥ 1 such that X^k > 0.

Comparing this definition with Lemma A.28 reveals that every primitive matrix is irreducible. The converse, however, need not hold. Furthermore, a primitive matrix need not be positive, although every positive matrix is primitive. For example,

$$\mathbf{X} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \quad\text{is primitive since}\quad \mathbf{X}^2 = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} > 0.$$

For primitive matrices, there is a somewhat stronger version of the Perron–Frobenius theorem. Since primitivity is not necessary for the results presented in the book, we omit the proof.

Theorem A.38 (Perron theorem). Let X ∈ Nn be primitive. Then, there exists an eigenvalue λp ∈ σ(X) such that

(i) λp = ρ(X) > 0 and λp > |λ| for any eigenvalue λ ≠ λp,
(ii) positive left and right eigenvectors can be associated with λp,
(iii) the eigenvectors associated with λp are unique up to positive multiples,
(iv) λp is a simple root of the characteristic equation of X,
(v) if λ ∈ σ(X) and λu = Xu for some u ≥ 0, u ≠ 0, then λ = λp.

As any primitive matrix is irreducible, λp = ρ(X) > 0 and (ii)–(v) follow from Theorem A.32. So, the only difference to the case of irreducible matrices is that λp > |λ| for any λ ∈ σ(X) with λ ≠ λp. In other words, the Perron root of any primitive matrix is the only eigenvalue on the boundary of the disk centered at zero with radius ρ(X); all other eigenvalues are interior points of this disk.

A.4.3 Some Extensions to Reducible Matrices

The following result, which is sometimes referred to as the weak form of the Perron–Frobenius theorem [7], shows that some of the spectral properties of irreducible matrices carry over to general nonnegative matrices.

Theorem A.39 (Weak Form of the Perron–Frobenius theorem). If X ∈ Nn, then λp = ρ(X) is an eigenvalue of X associated with a nonnegative eigenvector p ≠ 0.

Note that except for the nonnegativity property, there are no additional constraints on X.

Proof. It is clear that any nonnegative matrix X can be written as the limit of a sequence of positive matrices {X^{(k)}}_{k∈N}:

$$\mathbf{X} = \lim_{k\to\infty} \mathbf{X}^{(k)} \quad\text{with}\quad \mathbf{X}^{(k)} > \mathbf{X}^{(k+1)} > 0,\; k \in \mathbb{N}. \qquad (A.28)$$

Let λp^{(k)} = ρ(X^{(k)}) for every X^{(k)} > 0. By Theorem A.38, λp^{(k)} is an eigenvalue of X^{(k)}, that is, λp^{(k)} ∈ σ(X^{(k)}). Moreover, Theorem A.25 implies that {λp^{(k)}}_{k∈N} is a strictly decreasing sequence of positive real numbers that is bounded below by λp = ρ(X) ≥ 0. Thus, r = lim_{j→∞} λp^{(k_j)} exists with r ≥ λp for any subsequence {k_j}_{j∈N} of {1, 2, 3, . . . }. Now let p^{(k)} > 0 be the right Perron eigenvector of X^{(k)}, k ∈ N, whose existence is guaranteed by the Perron–Frobenius theorem. Since the sequence {p^{(k)}}_{k∈N} is bounded, the Bolzano–Weierstrass theorem for sequences [191, p. 98] implies that it has a subsequence {p^{(k_j)}}_{j∈N} converging to some p* ≥ 0 with p* ≠ 0, because p^{(k_j)} > 0 and ‖p^{(k_j)}‖₁ = 1. Thus, by (A.28), one obtains Xp* = lim_{j→∞} X^{(k_j)} p^{(k_j)} = lim_{j→∞} λp^{(k_j)} p^{(k_j)} = lim_{j→∞} λp^{(k_j)} · lim_{j→∞} p^{(k_j)} = r p*. So, r ∈ σ(X), from which we have r ≤ λp. Combining everything yields r = λp, and hence Xp* = λp p*, p* ≥ 0, p* ≠ 0. This completes the proof.

Remark A.40. Note that in contrast to the case of irreducible matrices (Theorem A.32 and Definition A.33), Theorem A.39 does not allow us to conclude the existence of unique left and right Perron eigenvectors. Thus, in this section, q and p denote some nonnegative left and right eigenvectors of X ≥ 0 associated with ρ(X).
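Theorem A.39 guarantees for a reducible matrix only a nonnegative, not necessarily positive, eigenvector. A quick numerical illustration, assuming NumPy (the matrix is an illustrative choice):

```python
import numpy as np

# Reducible (block lower triangular) matrix; rho(X) = 2 comes from the second block
X = np.array([[1.0, 0.0],
              [1.0, 2.0]])

vals, vecs = np.linalg.eig(X)
i = np.argmax(vals.real)
rho = vals.real[i]
p = vecs[:, i].real
p = p / p[np.argmax(np.abs(p))]   # scale so that the largest entry equals 1

print(rho)   # close to 2.0
print(p)     # approximately [0, 1]: nonnegative, but not strictly positive
```

Here the eigenvector associated with ρ(X) = 2 vanishes on the first coordinate, exactly the situation Theorem A.43 below characterizes (the isolated first block is not maximal).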
The eigenvalue λp = ρ(X) can be expressed in terms of the Perron roots of the irreducible diagonal blocks of X. Indeed, if X is reducible, it follows from Definition A.27 that there exists a permutation matrix P such that

$$\mathbf{P}^T\mathbf{X}\mathbf{P} = \begin{pmatrix} \mathbf{X}^{(1)} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{X}^{(2,1)} & \mathbf{X}^{(2)} & \ddots & \mathbf{0} \\ \cdots & \cdots & \cdots & \cdots \\ \mathbf{X}^{(s,1)} & \mathbf{X}^{(s,2)} & \cdots & \mathbf{X}^{(s)} \end{pmatrix} \qquad (A.29)$$

where each diagonal block X^{(j)} ∈ R_+^{n_j × n_j}, 1 ≤ j ≤ s, satisfies one of the following:

(C.A-1) n_j = 1 and X^{(j)} = 0.
(C.A-2) n_j ≥ 1 and X^{(j)} is an irreducible matrix.

Thus, the diagonal blocks in (A.29) are either nonnegative irreducible matrices or zero scalars (and hence reducible). For brevity and without loss of generality, we assume in this book that every diagonal block is a nonnegative irreducible matrix. In this case, for any s > 0, we have

$$(\mathbf{X}\mathbf{s})_i = \sum_{j=1}^{n} x_{i,j}\, s_j > 0 \quad\text{for all } 1 \le i \le n. \qquad (A.30)$$
Definition A.41 (Normal Form). Given X ∈ Nn, any matrix of the form (A.29) subject to either (C.A-1) or (C.A-2) is called a normal form of X.

Notice that the normal form of X in (A.29) is a block lower triangular matrix. It is well known [192, p. 311] that the spectrum of any block triangular matrix is the union of the spectra of its diagonal blocks. Thus, from (A.29), we have σ(X) = ∪_{j=1}^s σ(X^{(j)}). As an immediate consequence, the spectral radius ρ(X) is given by (see also [6])

$$\rho(\mathbf{X}) = \max\{\rho(\mathbf{X}^{(i)}) : i = 1, 2, \ldots, s\} \qquad (A.31)$$

where X^{(i)}, 1 ≤ i ≤ s, is the i-th irreducible diagonal block of X and ρ(X^{(i)}) its Perron root. The following definition is needed to formulate a necessary and sufficient condition for the existence of a positive right eigenvector of X.

Definition A.42 (Maximal and Isolated Diagonal Blocks). In the matrix (A.29), we say that the i-th diagonal block X^{(i)} is

(i) maximal if ρ(X^{(i)}) = ρ(X), and
(ii) isolated if X^{(i,j)} = 0 for each 1 ≤ j < i.

Now we are in a position to state the following result [6].
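Equation (A.31) can be verified directly on a matrix already in normal form: the spectral radius of the whole matrix equals the largest Perron root among the diagonal blocks. A sketch assuming NumPy (the blocks are illustrative choices):

```python
import numpy as np

def spectral_radius(A):
    return np.abs(np.linalg.eigvals(A)).max()

# A normal form with two irreducible diagonal blocks and a nonnegative coupling block
X1 = np.array([[0.0, 1.0],
               [1.0, 0.0]])          # rho(X1) = 1
X2 = np.array([[1.0, 2.0],
               [2.0, 1.0]])          # rho(X2) = 3
C = np.ones((2, 2))                  # plays the role of X^(2,1) in (A.29)
X = np.block([[X1, np.zeros((2, 2))],
              [C,  X2]])

# (A.31): rho(X) equals the maximum of the blocks' Perron roots
print(np.isclose(spectral_radius(X),
                 max(spectral_radius(X1), spectral_radius(X2))))  # True
```

Note that the coupling block C has no influence on the spectrum, in line with σ(X) = ∪_j σ(X^{(j)}).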
Theorem A.43. Suppose that I ⊆ {1, . . . , s} and M ⊆ {1, . . . , s} are the sets of indices corresponding to isolated and maximal diagonal blocks, respectively. Thus, {X^{(i)}}_{i∈I} is the set of isolated blocks and {X^{(i)}}_{i∈M} is the set of maximal blocks. Then, there exists a vector p > 0 with ρ(X)p = Xp if and only if I = M.

In words, the theorem says that there exists a positive right eigenvector of X associated with ρ(X) if and only if every isolated block is maximal and there are no other maximal diagonal blocks. However, it is important to emphasize that this is not sufficient for the matrix X to have both positive right and left eigenvectors associated with ρ(X). In order to completely identify the set of nonnegative matrices for which there exist positive left and right eigenvectors associated with ρ(X), we consider the following definition.

Definition A.44 (Block-Irreducibility). We say that X ∈ Nn is block-irreducible if there exists a permutation matrix P such that

$$\mathbf{P}^T\mathbf{X}\mathbf{P} = \begin{pmatrix} \mathbf{X}^{(1)} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{X}^{(2)} & \ddots & \vdots \\ \vdots & \ddots & \ddots & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{X}^{(s)} \end{pmatrix} \qquad (A.32)$$

where all X^{(i)}, 1 ≤ i ≤ s, are square irreducible matrices.

In other words, X is block-irreducible if every diagonal block in the matrix (A.29) is isolated. In the case of block-irreducible matrices, we can further strengthen the weak form of the Perron–Frobenius theorem [6].

Theorem A.45. Let X ∈ Nn be arbitrary, and let λp = ρ(X). The eigenvalue λp can be associated with positive left and right eigenvectors if and only if X is block-irreducible and every diagonal block is maximal.

Since a positive eigenvector associated with ρ(X) may not exist, it is not clear whether, and if so how, the Collatz–Wielandt characterization of the Perron root (Theorem A.35) extends to general nonnegative matrices. It is well known [5, Theorem 8.1.26] that, given any X ∈ Nn,

$$\min_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} \le \rho(\mathbf{X}) \le \max_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} \quad\text{for all } \mathbf{s}\in\mathbb{R}^n_{++}. \qquad (A.33)$$
To show the lower bound, one can proceed as follows. Let s ∈ R^n_{++} be arbitrary and define A = (a_{i,j}) ∈ R_+^{n×n} with a_{i,j} = c x_{i,j} s_j / Σ_{j=1}^n x_{i,j} s_j, where c = min_{1≤i≤n}((Xs)_i/s_i). As Σ_{j=1}^n a_{i,j} = c for each 1 ≤ i ≤ n, (1/c)A is stochastic (Definition A.26), and hence we have ρ(A) = c. Moreover, A ≤ S^{−1}XS with S = diag(s₁, . . . , sₙ), where the inverse exists due to the positivity of s. Hence, by Theorem A.25, one obtains

$$\min_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} = \rho(\mathbf{A}) \le \rho(\mathbf{S}^{-1}\mathbf{X}\mathbf{S}) = \rho(\mathbf{X}\mathbf{S}\mathbf{S}^{-1}) = \rho(\mathbf{X})$$

which proves the lower bound. The upper bound can be established in a similar manner. Now, as an immediate consequence of (A.33), we have the following lemma, which is used later to prove an "inf-max" characterization of ρ(X).

Lemma A.46. If X ∈ Nn, then

$$\sup_{\mathbf{s}\in\mathbb{R}^n_{++}}\, \min_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} \le \rho(\mathbf{X}) \le \inf_{\mathbf{s}\in\mathbb{R}^n_{++}}\, \max_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s})_k}{s_k}. \qquad (A.34)$$
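The bracketing property (A.33) behind Lemma A.46 is easy to confirm by sampling: for every positive s, the min and max ratios enclose ρ(X). A sketch assuming NumPy (random matrix for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 4))               # a random nonnegative matrix (illustrative)
rho = np.abs(np.linalg.eigvals(X)).max()

ok = True
for _ in range(1000):
    s = rng.random(4) + 1e-3         # random s in R^4_{++}
    r = X @ s / s
    ok &= bool(r.min() <= rho + 1e-9) and bool(rho <= r.max() + 1e-9)
print(ok)  # True: every sampled positive s satisfies (A.33)
```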
As shown below (Theorem A.47), the inequality on the right-hand side of (A.34) holds with equality for an arbitrary nonnegative (not necessarily irreducible) matrix X. In contrast, the first inequality can be strict for some nonnegative matrices. For instance, consider a 2 × 2 diagonal matrix X = diag(x₁, x₂) with 0 < x₁ ≤ x₂, so that

$$\sup_{\mathbf{s}\in\mathbb{R}^2_{++}}\, \min_{1\le k\le 2} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} = x_1, \qquad \inf_{\mathbf{s}\in\mathbb{R}^2_{++}}\, \max_{1\le k\le 2} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} = x_2.$$

Hence, if x₁ < x₂, we have strict inequality in (A.34). Theorem A.35 implies that there is no "gap" between the left-hand side and the right-hand side of (A.34) if there exists a positive right eigenvector of X associated with the spectral radius ρ(X) [5, Corollary 8.1.31]. For general nonnegative matrices, however, there is only an "inf-max" characterization of the spectral radius, as shown by the following theorem.

Theorem A.47. For any X ∈ Nn, we have

$$\rho(\mathbf{X}) = \inf_{\mathbf{s}\in\mathbb{R}^n_{++}}\, \max_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s})_k}{s_k}. \qquad (A.35)$$
Proof. We show that, for every ε > 0, there exists some z := z(ε) > 0 such that max_{1≤k≤n}((Xz)_k/z_k) ≤ ρ(X) + O(ε, 0), where O(ε, 0) denotes a nonnegative term that vanishes as ε → 0. Together with Lemma A.46, this will prove the theorem. As discussed before, there is no loss in generality in assuming that X is in the normal form given by (A.29), with every diagonal block being an irreducible matrix. We use S(j) ⊂ {1, . . . , n}, 1 ≤ j ≤ s, with n_j = |S(j)|, to denote the ordered set of row/column indices corresponding to the j-th diagonal block X^{(j)} in the normal form of X. Note that the sets S(1), . . . , S(s) are pairwise disjoint, ∪_{j=1}^s S(j) = {1, . . . , n} and Σ_{j=1}^s n_j = n.

Let p̃^{(j)} ∈ R^{n_j}_{++} be a positive right eigenvector of the j-th diagonal block X^{(j)}. Since all the diagonal blocks are irreducible matrices, Theorem A.32 ensures that, for each 1 ≤ j ≤ s, such a positive vector exists and we have

$$\rho(\mathbf{X}^{(j)})\,\tilde{\mathbf{p}}^{(j)} = \mathbf{X}^{(j)}\tilde{\mathbf{p}}^{(j)}, \quad 1 \le j \le s.$$

Now, define p̃^{(j,0)} = (0, . . . , 0, p̃^{(j)}, 0, . . . , 0) ∈ R^n_+, where p̃_i^{(j,0)} = 0 if and only if i ∉ S(j), and suppose that

$$\mathbf{z}(\epsilon) = \sum_{j=1}^{s} \epsilon^{-j}\, \tilde{\mathbf{p}}^{(j,0)}, \quad \epsilon > 0 \qquad (A.36)$$

which, by construction, is positive for all ε > 0. If s = 1, the proof is finished since then z is a positive right eigenvector of X ∈ Xn, and thus (A.35) for s = z(ε), ε > 0, is obvious by Theorem A.35. Otherwise, for any k ∈ S(j), 1 ≤ j ≤ s, one obtains (with l = k − min(S(j)) + 1, where min(S(j)) is the smallest number in the index set)

$$\frac{(\mathbf{X}\mathbf{z}(\epsilon))_k}{z_k(\epsilon)} = \frac{\epsilon^{-j}\big(\mathbf{X}^{(j)}\tilde{\mathbf{p}}^{(j)}\big)_l + \sum_{i=1}^{j-1} \epsilon^{-i}\big(\mathbf{X}^{(j,i)}\tilde{\mathbf{p}}^{(i)}\big)_l}{\epsilon^{-j}\,\tilde{p}_l^{(j)}} = \frac{\epsilon^{-j}\rho(\mathbf{X}^{(j)})\,\tilde{p}_l^{(j)}}{\epsilon^{-j}\,\tilde{p}_l^{(j)}} + O(\epsilon, 0) = \rho(\mathbf{X}^{(j)}) + O(\epsilon, 0) \le \rho(\mathbf{X}) + O(\epsilon, 0), \quad \epsilon > 0$$
where the last step follows from (A.31). Since this holds for each 1 ≤ k ≤ n, we have

$$\lim_{\epsilon\to 0}\, \max_{1\le k\le n} \frac{(\mathbf{X}\mathbf{z}(\epsilon))_k}{z_k(\epsilon)} \le \rho(\mathbf{X})$$

where z(ε) defined by (A.36) is positive for all ε > 0. Combining this with the upper bound of Lemma A.46 proves (A.35).

The next result proves a condition under which the lower bound in (A.34) holds with equality, thereby providing a "sup-min" characterization of the spectral radius for some class of nonnegative matrices.

Theorem A.48. Consider a normal form (A.29) of X ∈ R_+^{n×n}. Suppose that {X^{(i)}}_{i∈I} is the set of isolated diagonal blocks and {X^{(i)}}_{i∈M} is the set of maximal diagonal blocks. Then, one has

$$\rho(\mathbf{X}) = \sup_{\mathbf{s}\in\mathbb{R}^n_{++}}\, \min_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} \qquad (A.37)$$

if and only if I ⊆ M.

Proof. Without loss of generality, assume that the diagonal blocks X^{(j)}, 1 ≤ j ≤ s, in the normal form (A.29) are irreducible matrices. Let S(j), n_j, p̃^{(j)} > 0 and p̃^{(j,0)} = (0, . . . , 0, p̃^{(j)}, 0, . . . , 0) ∈ R^n_+, for 1 ≤ j ≤ s, be defined as in the proof of Theorem A.47. Define

$$r(\mathbf{s}) := \min_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s})_k}{s_k}.$$
First we prove that for (A.37) to be true, each isolated block must be maximal, that is, I ⊆ M must hold. To this end, assume that there exists j ∈ I such that j ∉ M. The goal is then to show that the supremum in (A.37) is strictly smaller than ρ(X). Since the j-th diagonal block X^{(j)} is isolated and irreducible, it follows from (A.29), (A.30) and the Collatz–Wielandt formula for irreducible matrices (A.26) that

$$\sup_{\mathbf{s}\in\mathbb{R}^n_{++}} r(\mathbf{s}) \le \sup_{\mathbf{s}\in V_j} r(\mathbf{s}) = \rho(\mathbf{X}^{(j)}) < \rho(\mathbf{X})$$

where V_j = {s ∈ R^n_+ : s_k > 0 if and only if k ∈ S(j)}. The first inequality and the equality hold because X^{(j)} is an isolated diagonal block and min_{1≤k≤n}((Xs)_k/s_k) ≤ min_{k∈S(j)}((Xs)_k/s_k) for all s ∈ R^n_{++}, whereas the last strict inequality follows from the assumption that j ∉ M. Thus, sup_{s∈R^n_{++}} r(s) is bounded above by the spectral radius of any isolated diagonal block. This proves the necessity of I ⊆ M.

In order to prove the converse, we show that if I ⊆ M, then there exists s := s(ε) > 0 such that r(s) = ρ(X) + O(ε, 0). This together with Lemma A.46 will prove the theorem. First assume that each diagonal block is isolated
so that I = {1, . . . , s}. In this case, s = Σ_{j=1}^s p̃^{(j,0)} is a sought vector. Indeed, by the construction and the fact that each diagonal block is irreducible, s is positive. Moreover, with this choice of s, we have r(s) = ρ(X), which yields (A.37).

Now assume that I ⊂ {1, . . . , s} (a proper subset of {1, . . . , s}), in which case there is at least one non-isolated diagonal block. Let p ≥ 0 be a right eigenvector of X associated with ρ(X) such that p is not a multiple of p̃^{(j,0)} for any j ∈ I, that is, p ≠ c p̃^{(j,0)} for each j ∈ I and all c > 0. Such an eigenvector exists, which immediately follows from a normal form and the existence of a non-isolated block. Moreover, we have p_k > 0 for each k ∈ S(j), j ∉ I. To see this, let p^{(j)} ∈ R^{n_j}_+ be the j-th sub-vector of p corresponding to the j-th diagonal block. Then, we observe the following:

(i) For any j ∉ I, j ∈ M, the eigenvalue equation ρ(X)p = Xp with the normal form (A.29) implies that ρ(X)p^{(j)} = X^{(j)}p^{(j)} + t for some t ≥ 0. Since ρ(X) = ρ(X^{(j)}) and X^{(j)} is irreducible, Theorem A.52 shows that the equation can hold if and only if t = 0 and p^{(j)} is a right eigenvector of X^{(j)}, which is positive due to the irreducibility property.

(ii) For any j ∉ M, the eigenvalue equation as above yields ρ(X)p^{(j)} = X^{(j)}p^{(j)} + t for some t ≥ 0, t ≠ 0. So, since X^{(j)} is irreducible and ρ(X) > ρ(X^{(j)}), Theorem A.52 implies that p^{(j)} is positive.

Now let

$$\mathbf{s} := \mathbf{s}(\epsilon) = \epsilon \sum_{j\in I} \tilde{\mathbf{p}}^{(j,0)} + \mathbf{p}, \quad \epsilon > 0 \qquad (A.38)$$

which by the construction (and the above discussion) is positive. With this choice of s, for any k ∈ S(j) with j ∈ I, we have (with l = k − min(S(j)) + 1
where min(S(j)) is the smallest number in the index set)

$$\frac{(\mathbf{X}\mathbf{s})_k}{s_k} = \frac{\epsilon\big(\mathbf{X}^{(j)}\tilde{\mathbf{p}}^{(j)}\big)_l + \big(\mathbf{X}^{(j)}\mathbf{p}^{(j)}\big)_l}{\epsilon\,\tilde{p}_l^{(j)} + p_l^{(j)}} = \rho(\mathbf{X}^{(j)}) + O(\epsilon, 0) = \rho(\mathbf{X}) + O(\epsilon, 0)$$

where the last step follows since X^{(j)} is maximal due to j ∈ I ⊆ M, with O(ε, 0) ≥ 0 and p̃^{(j)} > 0. On the other hand, for any k ∈ S(j) with j ∉ I, one has

$$\frac{(\mathbf{X}\mathbf{s})_k}{s_k} = \frac{\epsilon \sum_{i\in I} \big(\mathbf{X}^{(j,i)}\tilde{\mathbf{p}}^{(i)}\big)_l + (\mathbf{X}\mathbf{p})_k}{p_k} \ge \frac{(\mathbf{X}\mathbf{p})_k}{p_k} = \rho(\mathbf{X}).$$

So, as O(ε, 0) ≥ 0, combining both cases yields

$$\lim_{\epsilon\to 0} r(\mathbf{s}(\epsilon)) = \lim_{\epsilon\to 0}\, \min_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s}(\epsilon))_k}{s_k(\epsilon)} \ge \rho(\mathbf{X})$$

with s(ε) given by (A.38). As an immediate consequence of this and the lower bound of Lemma A.46, we obtain (A.37). This finishes the proof.

Theorems A.47 and A.48 together imply that the inequalities on the right-hand side and left-hand side of (A.34) hold with equality if and only if I ⊆ M. This however does not mean that the supremum and the infimum are attained. And even if they are attained, the set of maximizers and the set of minimizers may be disjoint. The following result provides a necessary and sufficient condition for the existence of a positive vector that attains both the supremum and the infimum in (A.34).

Theorem A.49. Let {X^{(i)}}_{i∈I} and {X^{(i)}}_{i∈M} be, respectively, the sets of isolated and maximal diagonal blocks in the normal form (A.29) of X ∈ Nn. Then, the following is true.

(i) There exists s̃ ∈ R^n_{++} such that

$$\tilde{\mathbf{s}} = \arg\sup_{\mathbf{s}\in\mathbb{R}^n_{++}}\, \min_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} = \arg\inf_{\mathbf{s}\in\mathbb{R}^n_{++}}\, \max_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} \qquad (A.39)$$

if and only if I = M.

(ii) Moreover, s̃ is given by (A.39) if and only if s̃ = p, that is, if and only if s̃ > 0 is a right eigenvector of X associated with ρ(X).

Proof. By Theorem A.43, I = M implies the existence of a positive right eigenvector p > 0 of X associated with ρ(X). Since (Xp)_k/p_k = ρ(X), 1 ≤ k ≤ n, it follows from (A.34) that if I = M, then there is s̃ > 0 given by (A.39) and p = s̃. This proves one direction of both parts. To prove the
converse of (i) and (ii), suppose that s̃ > 0 satisfying (A.39) exists. Then, by Theorem A.47,

$$\inf_{\mathbf{s}\in\mathbb{R}^n_{++}}\, \max_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} = \min_{\mathbf{s}\in\mathbb{R}^n_{++}}\, \max_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} = \max_{1\le k\le n} \frac{(\mathbf{X}\tilde{\mathbf{s}})_k}{\tilde{s}_k} = \rho(\mathbf{X}). \qquad (A.40)$$
The Collatz–Wielandt formula (Theorem A.35) now implies that s̃ = p > 0, and thus min_{1≤k≤n}(Xs̃)_k/s̃_k = ρ(X). This proves the other direction of part (ii) and, together with Theorem A.48, shows that I ⊆ M. However, as s̃ is a positive right eigenvector of X associated with ρ(X), we can deduce from Theorem A.43 that I = M. This proves the converse of part (i) and completes the proof.

Finally, we point out that an alternative "max-min" characterization of the spectral radius is obtained for general nonnegative matrices when, instead of R^n_{++}, the supremum is taken over R^n_+ and zero components are excluded in the denominator of the ratio (A.37) [5, Corollary 8.3.3].

Theorem A.50. For any X ∈ Nn, we have

$$\rho(\mathbf{X}) = \max_{\substack{\mathbf{s}\in\mathbb{R}^n_+ \\ \mathbf{s}\neq\mathbf{0}}} r(\mathbf{s}), \qquad r(\mathbf{s}) := \min_{\substack{1\le k\le n \\ s_k\neq 0}} \frac{(\mathbf{X}\mathbf{s})_k}{s_k}$$

where the maximum is attained for s = p ≠ 0.

Proof. Let X(ε) = X + ε11^T, ε > 0, and let q := q(ε) > 0 be a left (positive) eigenvector of the positive matrix X(ε) associated with ρ(X(ε)). Suppose that s ∈ R^n_+, s ≠ 0, is arbitrary. Then, by the definition of r(s), we have Xs − r(s)s ≥ 0, and hence, due to (A.22), one obtains 0 ≤ Xs − r(s)s < X(ε)s − r(s)s for all ε > 0. As q is positive, this implies that 0 < q^T(X(ε)s − r(s)s) = (ρ(X(ε)) − r(s))q^T s and 0 < q^T s. So, we obtain ρ(X(ε)) > r(s) for all ε > 0. Therefore, since ρ(X(ε)) converges to ρ(X) as ε goes to zero, we have ρ(X) ≥ r(s) for all s ≥ 0, s ≠ 0. Now choosing s = p ≠ 0, where the existence of such p is guaranteed by Theorem A.39, shows that the upper bound is attained. This completes the proof.

Now considering both Theorems A.47 and A.50 yields

$$\max_{\substack{\mathbf{s}\in\mathbb{R}^n_+ \\ \mathbf{s}\neq\mathbf{0}}}\, \min_{\substack{1\le k\le n \\ s_k\neq 0}} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} = \rho(\mathbf{X}) = \inf_{\mathbf{s}\in\mathbb{R}^n_{++}}\, \max_{1\le k\le n} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} \qquad (A.41)$$

for any X ∈ Nn. Note that a "min-max" characterization of the spectral radius with the minimum taken over R^n_+ (as in the case of the "max-min" characterization on the left-hand side of (A.41)) does not exist for general nonnegative matrices [5, p. 504]. Indeed, in general, we have
$$\min_{\substack{\mathbf{s}\in\mathbb{R}^n_+ \\ \mathbf{s}\neq\mathbf{0}}}\, \max_{\substack{1\le k\le n \\ s_k\neq 0}} \frac{(\mathbf{X}\mathbf{s})_k}{s_k} \neq \rho(\mathbf{X}). \qquad (A.42)$$
To see this, consider the matrix

$$\mathbf{X} = \begin{pmatrix} 1 & 0 \\ 1 & 2 \end{pmatrix}$$

in which case ρ(X) = 2, while the min-max expression on the left-hand side of (A.42) is equal to 1. It is interesting to see that, in this example, the necessary and sufficient condition of Theorem A.48 for the "sup-min" characterization, I ⊆ M, is not satisfied.

A.4.4 The Existence of a Positive Solution p to (αI − X)p = b

Chapter 2 deals with a positive solution to a system of linear equations with nonnegative coefficients. This section provides two known results on the existence of such a solution [4], which in turn is equivalent to the existence of certain M-matrices. In addition, we show a simple approximation of the vector p for values of α sufficiently close to ρ(X) and prove one auxiliary result that is used to prove Theorem 5.82 in Sect. 5.7.2. The reader should take care to note that in this section, p ∈ R^n_+ and q ∈ R^n_+ are not used to denote nonnegative eigenvectors of X associated with ρ(X); instead, we use u and v, respectively.

Theorem A.51. Let X ∈ Nn be arbitrary, and let α > 0 be any scalar. A necessary and sufficient condition for a solution p ≥ 0, p ≠ 0, to

$$(\alpha\mathbf{I} - \mathbf{X})\mathbf{p} = \mathbf{b} \qquad (A.43)$$

to exist for any b > 0 is that α > r = ρ(X). In this case, there is only one solution p, which is strictly positive and given by p = (αI − X)^{−1}b.

We emphasize that in the setup of the theorem, X is an arbitrary nonnegative matrix; the vector b, in contrast, is required to be positive.

Proof. Assume that a solution p ≥ 0 to (A.43) exists. Since b > 0, it follows from αp = Xp + b that Xp < αp. Clearly, as Xp ≥ 0, this can hold only if α > 0 and p > 0. Now let v ≥ 0, v ≠ 0, be a left eigenvector of X associated with r. By Theorem A.39, such an eigenvector exists. Hence, v^T X p = r v^T p < α v^T p, from which it follows that r < α since p > 0 and v ≥ 0, and therefore v^T p > 0.

Now assume that α > r = ρ(X). By Theorem A.16, the following Neumann series converges:

$$(\alpha\mathbf{I} - \mathbf{X})^{-1} = \alpha^{-1}(\mathbf{I} - \alpha^{-1}\mathbf{X})^{-1} = \alpha^{-1}\sum_{l=0}^{\infty} (\alpha^{-1}\mathbf{X})^{l}.$$

By nonnegativity of X, we have (αI − X)^{−1} ≥ 0. Furthermore, since (α^{−1}X)^0 = I, each row of (αI − X)^{−1} has at least one positive entry. Hence, since b > 0, we must have (αI − X)^{−1}b > 0 for any b > 0. Putting p = (αI − X)^{−1}b > 0 yields the sought vector, which is unique.

A combination of the positivity of b and the nonnegativity of X guarantees the existence of a positive solution p to (A.43), provided that ρ(X) < α. It is
clear that if b is an arbitrary nonnegative vector (not necessarily positive) and ρ(X) < α, then the solution p ≥ 0 with p = (αI − X)^{−1}b exists but need not be positive. If b is an arbitrary nonnegative vector, the following result shows that the positivity of p is recovered when X ∈ Nn is irreducible.

Theorem A.52. Let α > 0 be any scalar, and let X ∈ Nn be irreducible. A necessary and sufficient condition for a solution p ≥ 0, p ≠ 0, to

$$(\alpha\mathbf{I} - \mathbf{X})\mathbf{p} = \mathbf{b} \qquad (A.44)$$

to exist for any b ≥ 0, b ≠ 0, is that α > r = ρ(X). In this case, there is only one solution p, which is strictly positive and given by p = (αI − X)^{−1}b.

First we prove the following lemma.

Lemma A.53. Let X ∈ Nn be irreducible, α > 0, and y ≥ 0, y ≠ 0, a vector satisfying

$$\mathbf{X}\mathbf{y} \le \alpha\,\mathbf{y}. \qquad (A.45)$$

Then y > 0 and α ≥ r = ρ(X), where r is the Perron root of X. Moreover, α = r if and only if Xy = αy.

Proof. First suppose that y is not positive and y_i = 0 for some fixed 1 ≤ i ≤ n. By (A.45), X^k y ≤ α^k y, and therefore Σ_{j=1}^n x^{(k)}_{i,j} y_j ≤ α^k y_i. As X ≥ 0 is irreducible, we know from Lemma A.28 that, for each 1 ≤ j ≤ n, there is a natural number k such that x^{(k)}_{i,j} > 0. Thus, since y_j > 0 for some j, there holds x^{(k)}_{i,j} y_j > 0 for some k ≥ 1 and j. This in turn implies y_i > 0, thereby contradicting the assumption y_i = 0. As a consequence, we must have y > 0. Now letting v be any positive left eigenvector of X yields α v^T y ≥ v^T X y = r v^T y. From this, we have α ≥ r since v^T y > 0. Now suppose that α = r and Xy ≤ ry with strict inequality in at least one place. Then, since v > 0, r v^T y = v^T X y < r v^T y, which is impossible; hence α = r implies Xy = αy. Conversely, if Xy = αy, then y ≥ 0, y ≠ 0, is an eigenvector of X, and part (v) of Theorem A.32 yields α = r.

Now we are in a position to prove Theorem A.52.

Proof. First assume that p ≥ 0, p ≠ 0, exists. Since b ≥ 0, b ≠ 0, and b + Xp = αp, we have Xp ≤ αp with strict inequality in at least one component. Of course, this can be satisfied only if α > 0. Moreover, by Lemma A.53, we have α > r = ρ(X) > 0. Conversely, if α > r = ρ(X), then the Neumann series converges:

$$(\alpha\mathbf{I} - \mathbf{X})^{-1} = \alpha^{-1}\sum_{l=0}^{\infty} (\alpha^{-1}\mathbf{X})^{l}.$$

Furthermore, as X is irreducible, Lemma A.28 implies that (αI − X)^{−1} > 0. Therefore, by (A.22), (αI − X)^{−1}b > 0 for any b ≥ 0 with b ≠ 0. Defining p = (αI − X)^{−1}b > 0 completes the proof.

Now we point out a connection to the existence of so-called M-matrices.
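The solution formula p = (αI − X)^{−1}b of Theorems A.51 and A.52 is easy to check numerically: for α > ρ(X) the solution is strictly positive and coincides with the (truncated) Neumann series. A sketch assuming NumPy (matrix and b are illustrative choices):

```python
import numpy as np

X = np.array([[0.0, 0.5],
              [0.5, 0.0]])            # irreducible, rho(X) = 0.5
b = np.array([1.0, 2.0])
alpha = 1.0                           # alpha > rho(X), so Theorems A.51/A.52 apply

n = X.shape[0]
p = np.linalg.solve(alpha * np.eye(n) - X, b)
print(bool((p > 0).all()))            # True: the unique solution is strictly positive

# Truncated Neumann series (alpha*I - X)^{-1} = alpha^{-1} sum_l (X/alpha)^l
S = sum(np.linalg.matrix_power(X / alpha, l) for l in range(60)) / alpha
print(np.allclose(S @ b, p))          # True: the truncated series matches
```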
Definition A.54. A real nonsingular matrix is said to be an M-matrix if all off-diagonal entries are nonpositive and its inverse is a nonnegative matrix.

Theorem A.55. Let A ∈ R^{n×n} be any nonsingular matrix with nonpositive off-diagonal entries. The following statements are equivalent.

(i) A is an M-matrix.
(ii) There is a matrix X ≥ 0 and a real number α > ρ(X) such that A = αI − X.
(iii) All principal minors of A are positive.
(iv) Re(λ) > 0 for all λ ∈ σ(A).

In the setup of Theorems A.51 and A.52, there exists a positive solution p to (αI − X)p = b if and only if ρ(X) < α. On the other hand, by (ii) in the above theorem, αI − X with X ≥ 0 is an M-matrix if and only if ρ(X) < α. So we can conclude that a positive solution p exists if and only if αI − X is an M-matrix.

The following corollary is a consequence of the spectral decomposition (Theorem A.12) as well as of the fact that the Perron root of any irreducible matrix is simple and equal to the spectral radius.

Corollary A.56. Let X ∈ Nn be irreducible and q(λ) = Π_{k=1}^s (λ − λ_k(X))^{m_k} its minimal polynomial, with the eigenvalues ordered so that |λ₁(X)| ≥ |λ₂(X)| ≥ · · · ≥ |λ_s(X)|. Then, there exist matrices Z_{k,j} ∈ R^{n×n}, 1 ≤ j ≤ m_k, 2 ≤ k ≤ s, such that

$$(\alpha\mathbf{I} - \mathbf{X})^{-1} = \frac{1}{\alpha - \rho(\mathbf{X})}\,\mathbf{u}\mathbf{v}^T + \sum_{k=2}^{s}\sum_{j=1}^{m_k} \frac{(j-1)!}{(\alpha - \lambda_k(\mathbf{X}))^j}\,\mathbf{Z}_{k,j} \qquad (A.46)$$

where v > 0 and u > 0 with v^T u = 1 are the positive left and right eigenvectors of X associated with the Perron root ρ(X).

Proof. By Theorem A.12 and parts (i) and (iv) of the Perron–Frobenius theorem (Theorem A.32), it follows that

$$(\alpha\mathbf{I} - \mathbf{X})^{-1} = \frac{1}{\alpha - \rho(\mathbf{X})}\,\mathbf{Z}_{1,1} + \sum_{k=2}^{s}\sum_{j=1}^{m_k} \frac{(j-1)!}{(\alpha - \lambda_k(\mathbf{X}))^j}\,\mathbf{Z}_{k,j} \qquad (A.47)$$

where Z_{1,1} is idempotent (the idempotency property is explained on page 354). By (A.15) (with m₁ = 1 due to the simplicity of ρ(X) ∈ σ(X)), the range (column space) of Z_{1,1} is spanned by the right eigenvectors associated with ρ(X). Since X is irreducible, each of these eigenvectors is positive and unique up to positive scaling ((iii) of Theorem A.32), so that Z_{1,1} is a rank-1 matrix. Thus, as Z_{1,1}u = u and v^T Z_{1,1} = v^T hold (see the remarks after (A.15) on page 354), we must have Z_{1,1} = uv^T. Moreover, since Z_{1,1} is idempotent, uv^T uv^T = uv^T, and hence we have v^T u = 1. This completes the proof.
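The decomposition (A.46) implies that (α − ρ(X)) · (αI − X)^{−1} → uv^T as α approaches ρ(X) from above, since the double sum stays bounded. This is easy to check numerically for a small irreducible matrix (NumPy assumed; the matrix is an illustrative choice):

```python
import numpy as np

X = np.array([[0.0, 1.0],
              [2.0, 0.0]])            # irreducible; rho(X) = sqrt(2)
rho = np.abs(np.linalg.eigvals(X)).max()

# Right and left Perron eigenvectors u, v, normalized so that v^T u = 1
vals, vecs = np.linalg.eig(X)
u = np.abs(vecs[:, np.argmax(vals.real)].real)
wals, wecs = np.linalg.eig(X.T)
v = np.abs(wecs[:, np.argmax(wals.real)].real)
v = v / (v @ u)

# By (A.46), (alpha - rho) * (alpha*I - X)^{-1} tends to the rank-1 matrix u v^T
alpha = rho + 1e-7
R = (alpha - rho) * np.linalg.inv(alpha * np.eye(2) - X)
print(np.allclose(R, np.outer(u, v), atol=1e-5))  # True
```

This rank-1 dominance is what makes the approximation of p for α close to ρ(X), discussed next, work.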
374
A Some Concepts and Results from Matrix Analysis
Some properties of the principal component matrices Zk,j ≠ 0 are discussed in Section A.3.1. The above corollary can be used to obtain a useful approximation of the positive solution to (A.44) when ρ(X) is sufficiently close to α. To see this, we first need to show that the second addend on the right-hand side of (A.46) remains bounded when the spectral radius approaches α.

Theorem A.57. Let $q(\lambda) = \prod_{k=1}^{s} (\lambda - \lambda_k(X))^{m_k}$ be the minimal polynomial of $X \in \mathbb{R}^{n \times n}_{+}$ with the eigenvalues ordered so that |λ1(X)| ≥ |λ2(X)| ≥ · · · ≥ |λs(X)|. For any given α > 0 and β ∈ (0, α), define

$$U := U(\alpha, \beta) = \bigl\{ X \in X_n : \rho(X) < \alpha,\ \forall_{1 \le i,j \le n}\ x_{i,j} \ge \beta \ \text{or}\ x_{i,j} = 0 \bigr\} \tag{A.48}$$

where β is sufficiently small so that U ≠ ∅. Then, there exists δ = δ(α, β) > 0 such that

$$\forall_{2 \le k \le s} \quad \lambda_k(X) \notin \{ z \in \mathbb{C} : |z| \le \alpha,\ |z - \alpha| < \delta \} \tag{A.49}$$

for all X ∈ U.

Remark A.58. Note that each positive entry of any matrix in the set U ⊂ Xn is bounded away from zero by some constant β > 0. In other words, no matrix in U has entries belonging to (0, β). As a result, no limit point of U is a reducible matrix.

For the proof of the theorem, we need the following lemma.

Lemma A.59. Suppose that U ⊂ Xn is defined by (A.48). Then, for any α > 0 and β > 0 for which U ≠ ∅, there exists a constant c ∈ (0, ∞) such that $\max_{1 \le i,j \le n} x_{i,j} \le c$ for all X ∈ U.

Proof. Let X ∈ U be arbitrary and let (k0, l0), 1 ≤ k0, l0 ≤ n, be such that c := x_{k0,l0} is a maximal entry of X, meaning that all entries of X are smaller than or equal to c = x_{k0,l0} > 0. Hence, as X ∈ U,

$$\alpha p_{k_0} > \rho(X)\, p_{k_0} = \sum_{l=1}^{n} x_{k_0,l}\, p_l \ge x_{k_0,l_0}\, p_{l_0} = c\, p_{l_0} \tag{A.50}$$

where p > 0 is a right eigenvector of X. Now, as X is irreducible, p is unique up to positive multiples and there exists a sequence of pairs {(sj, sj+1)}0≤j<m with s0 = l0 and sm = k0 for some m ∈ {1, . . . , n} such that x_{sj,sj+1} ≥ β for each 0 ≤ j < m. Thus, we have α p_{sj} > ρ(X) p_{sj} ≥ β p_{sj+1}, which, when combined with (A.50) and β/α < 1, implies α p_{k0} > c (β/α)^m p_{k0} ≥ c (β/α)^n p_{k0}. As p_{k0} > 0, one obtains c < α^{n+1}/β^n. So, c > 0 is bounded and the bound depends only on α, β and n. This proves the lemma.

Now we are in a position to prove Theorem A.57.
Proof. Assume that the theorem is not true. Then, there exist n0 ∈ {2, . . . , s} and a sequence of irreducible matrices {X(m)}m∈N in U so that

$$|\alpha - \lambda_{n_0}(X(m))| \le 1/m, \qquad m \in \mathbb{N}. \tag{A.51}$$

Since, by Lemma A.59, all entries of X(m), m ∈ N, are bounded above, the sequence {X(m)}m∈N is bounded (in any matrix norm). As a result, we can invoke the Bolzano–Weierstrass theorem [191, p. 98] to conclude that there exists a subsequence {X(mj)}j∈N of {X(m)}m∈N such that {X(mj)}j∈N converges to some X∗ ≥ 0. It can be further deduced that X∗ is irreducible as no limit point of the set U is a reducible matrix (see also Remark A.58). This together with (A.51) implies that limj→∞ λn0(X(mj)) = λn0(X∗) = α. As |λn0(X)| ≤ |λ1(X)| for all X ∈ U, we have

$$\lim_{j \to \infty} \lambda_{n_0}(X(m_j)) = \lim_{j \to \infty} \lambda_{1}(X(m_j)) = \lambda_{n_0}(X^*) = \lambda_{1}(X^*) = \alpha\,.$$

This, however, contradicts the fact that λ1(X∗) is a simple eigenvalue as X∗ is irreducible. This proves the theorem.

In words, the above theorem states that there is an open disk of radius δ > 0 centered at the point α in the complex plane such that all the non-maximal eigenvalues of every matrix in U given by (A.48) lie outside of this disk. Now, let U be given by (A.48) and let ∂U = {X ∈ cl(U) : ρ(X) = α}. Suppose that L(X̂) ⊂ U is any sequence of matrices in U that ends at a point X̂ ∈ ∂U. Then, by Corollary A.56 and Theorem A.57, we have

$$\lim_{\substack{X \to \hat{X} \\ X \in L(\hat{X})}} \bigl(\alpha - \rho(X)\bigr)\, p(X) = u v^T b, \qquad v^T u = 1 \tag{A.52}$$
where p(X) is used to denote p(X) = (αI − X)−1 b. This implies that, for any b > 0 and an irreducible matrix X, the positive solution p to (αI − X)p = b > 0 with α > ρ(X) can be approximated as

$$p \approx \frac{1}{\alpha - \rho(X)}\, u v^T b, \qquad v^T u = 1 \tag{A.53}$$

provided that ρ(X) is sufficiently close to α.

Theorem A.60. Suppose that $\bar{U} = \{ X \in N_n : (\alpha I - X)^{-1} b \in \mathsf{P} \}$ is not an empty set for a given bounded set P ⊂ Rn++, a positive vector b and some constant α > 0. Then, there exists a constant c = c(α, b, P) > 0 such that

$$0 \le \rho(X) \le \alpha - c \tag{A.54}$$

for all X ∈ Ū. So, we have $\sup_{X \in \bar{U}} \rho(X) < \alpha$.
Proof. Let X ∈ Ū be arbitrary, and let u ≥ 0 be any right eigenvector of X associated with ρ(X) and normalized such that ‖u‖1 = 1ᵀu = 1. Note that by Theorem A.51, we must have 0 ≤ ρ(X) < α. If ρ(X) = 0, then the theorem holds trivially. So, assume that ρ(X) ∈ (0, α). As b is positive, it is clear that there is a constant c1 > 0 such that u ≤ b/c1. Moreover, from the boundedness of the set P, we can conclude that there is a constant c2 > 0 such that 1ᵀ(αI − X)−1 b ≤ c2. Considering these two inequalities yields

$$0 \le 1^T (\alpha I - X)^{-1} u \le \frac{1}{c_1}\, 1^T (\alpha I - X)^{-1} b \le \frac{c_2}{c_1}\,.$$

On the other hand, by the Neumann series (Theorem A.16), one obtains

$$1^T (\alpha I - X)^{-1} u = \frac{1}{\alpha}\, 1^T \Bigl( I - \frac{1}{\alpha} X \Bigr)^{-1} u = \frac{1}{\alpha} \sum_{l=0}^{\infty} \frac{1}{\alpha^l}\, 1^T X^l u = \frac{1}{\alpha} \sum_{l=0}^{\infty} \Bigl( \frac{\rho(X)}{\alpha} \Bigr)^{l} 1^T u = \frac{1}{\alpha}\, \frac{1}{1 - \rho(X)/\alpha} = \frac{1}{\alpha - \rho(X)}\,.$$

Combining this identity with the previous inequality yields 1/(α − ρ(X)) ≤ c2/c1 or, equivalently, ρ(X) ≤ α − c1/c2. Since X has been chosen to be any matrix in Ū, this proves the theorem with c = c1/c2 ∈ (0, α).
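The approximation (A.53) can be checked numerically (our sketch, not from the book). Take the irreducible matrix X = [[0, 2], [1/2, 0]] with eigenvalues ±1, so ρ(X) = 1, right eigenvector u = (2, 1), and left eigenvector v = (1/4, 1/2) scaled so that vᵀu = 1. The helper `solve2` (ours) solves the 2×2 system by Cramer's rule; the ratio between the rank-one approximation and the exact solution should approach 1 as α approaches ρ(X).

```python
def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule (helper for this sketch)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]

X = [[0.0, 2.0], [0.5, 0.0]]       # irreducible, eigenvalues +1 and -1, rho(X) = 1
u = [2.0, 1.0]                      # right eigenvector: X u = u
v = [0.25, 0.5]                     # left eigenvector, scaled so that v^T u = 1
b = [1.0, 1.0]
ratios = {}
for alpha in (1.5, 1.001):
    A = [[alpha, -2.0], [-0.5, alpha]]                  # alpha*I - X
    p_exact = solve2(A, b)                              # p = (alpha*I - X)^{-1} b
    vb = v[0] * b[0] + v[1] * b[1]                      # v^T b
    p_approx = [ui * vb / (alpha - 1.0) for ui in u]    # rank-one formula (A.53)
    ratios[alpha] = [p_approx[i] / p_exact[i] for i in range(2)]
print(ratios)   # componentwise ratios tend to 1 as alpha -> rho(X)
```

For α = 1.5 the ratios are only within about 7% of 1, while for α = 1.001 they agree to better than 0.1%, as (A.52) predicts.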
B Some Concepts and Results from Convex Analysis
In this chapter, we collect definitions, notational conventions and several results from convex analysis that may be helpful in better understanding the material covered in this manuscript. Proofs are provided only for selected results concerning the notion of log-convexity and the convergence of gradient projection algorithms. For other proofs, the reader is referred to any standard analysis book (e.g., [39]) and [160, 162, 16].
B.1 Sets and Functions

Now suppose Rn is a metric space. Let ‖·‖ : Rn → R be a norm defined on Rn and d(p, q) = ‖p − q‖ the distance from p ∈ Rn to q ∈ Rn (App. A.1). Let A be some subset of Rn, which is also a metric space equipped with the same metric. In the definition below, elements of Rn are referred to as points. These points are vectors if Rn is a vector space. The definition below can be generalized to any metric space.

Definition B.1. Given the space Rn, we have the following definitions.
(a) An open ball Br(p) of radius r > 0 centered at point p is Br(p) := {q ∈ Rn : d(p, q) < r}. Br(p) is an open interval if n = 1 and an open disk if n = 2.
(b) A point p is a limit point, or a contact point, of the set A if, for all r > 0, Br(p) contains a point q ≠ p with q ∈ A.
(c) A is closed if every limit point x of A satisfies x ∈ A.
(d) A point p is an interior point of A if there is r > 0 such that Br(p) ⊂ A. The set int(A) of all interior points of the set A is referred to as the interior of A.
(e) A is open if every point of A is an interior point of A, i.e., A = int(A).
(f) If ∂A is the set of all limit points of A, then cl(A) = A ∪ ∂A is said to be the closure of A. A is closed if and only if cl(A) = A.

S. Stanczak et al., Fundamentals of Resource Allocation in Wireless Networks, Foundations in Signal Processing, Communications and Networking 3, 2nd edn., © Springer-Verlag Berlin Heidelberg 2009, DOI 10.1007/978-3-540-79386-1 8
(g) The complement of A in Rn, denoted by Ac, is the set of all points p ∈ Rn such that p ∉ A.
(h) A is bounded if there is a real number M and a point q ∈ Rn such that d(p, q) < M for all p ∈ A. Otherwise, it is said to be unbounded.
(i) The sets A and B are said to be separated if both cl(A) ∩ B = ∅ and A ∩ cl(B) = ∅.
(j) A is said to be connected if we cannot write A = B ∪ C with B, C separated.

Throughout the book, for any a < b with a, b ∈ R, [a, b] is called a closed interval on the real line, [a, b) and (a, b] are half-open intervals, and (a, b) is an open interval. Special intervals are (a, +∞), [a, +∞), (−∞, b), (−∞, b] and R = (−∞, ∞), which are either half-open or open intervals. In contrast, cl(R) := [−∞, +∞] is used to denote a closed interval that includes the values −∞ and +∞. Also, note that, for any −∞ < a and b < +∞, the sets (intervals) (a, +∞) ⊂ [a, +∞) and (−∞, b) ⊂ (−∞, b] are said to be bounded below and above, respectively.

Now let us introduce the notion of compactness.

Definition B.2 (Open Cover and Compact Set). An open cover of A in Rn is a collection {Gα} of open subsets Gα ⊂ Rn such that A ⊂ ∪α Gα. A subset A of Rn is said to be compact if every open cover of A contains a finite subcover. In other words, if {Gα} is an open cover of A, then there are finitely many indices αi, i ∈ N, i ≤ m, such that A ⊂ ∪i≤m Gαi.

The Heine–Borel theorem stated below is implicitly invoked in the book whenever the existence of maxima and/or minima must be guaranteed (see Theorem B.11).

Theorem B.3 (Heine–Borel). For a subset A of the Euclidean space Rn, the following are equivalent.
(i) A is closed and bounded.
(ii) A is compact.

The Euclidean space is defined in App. A. It should be emphasized that (i) and (ii) cease to be equivalent in general metric spaces. In the book, we sometimes use the following definition.

Definition B.4 (Maximal and maximum points). Given some set A ⊂ Rn, we say that x ∈ A is a maximal (respectively, minimal) point of A, with respect to the partial ordering (A.1), if y ∈ A and x ≤ y (respectively, y ≤ x) imply x = y. A point x ∈ A is said to be the maximum (respectively, minimum) point of A if y ≤ x (respectively, x ≤ y) for every y ∈ A.

Note that if a set has a maximum (minimum) point, then it is unique. In contrast, a set may have more than one maximal (minimal) point. For a more detailed discussion of minimal/maximal and minimum/maximum points, the reader is referred to [16, pp. 45–59]. Here we only state the following result, which is utilized in the book.
Theorem B.5. x∗ ∈ A is the maximum (respectively, minimum) point of A if and only if x∗ is the unique maximizer (respectively, minimizer) of x → wT x over A for every fixed w > 0. If x∗ ∈ A maximizes (respectively, minimizes) x → wT x over A for some w > 0, then x∗ is a maximal (respectively, minimal) point of A. Conversely, if x∗ ∈ A is a maximal (respectively, minimal) point of A and A is convex, then there is w ≥ 0 for which x∗ maximizes (respectively, minimizes) x → wT x over A.

Note that for any function (map) from the set A into the set B, we write f : A → B or A → B : x → f(x). For brevity, we sometimes skip the specification of A and/or B and simply write x → f(x), f, or f(x), x ∈ A. The set A is called the domain of f, and the elements f(x) of B are called values of f. The set of all values of f is called the range of f, denoted by f(A) ⊆ B. If f(A) = B, we say that f maps A onto B. If f(x1) ≠ f(x2) whenever x1 ≠ x2, for x1, x2 ∈ A, then f is said to be a one-to-one mapping (function) from A into B. At some points in the book, we also use dom(f) to denote the domain of a function f. Thus, if f : A → B, then dom(f) = A.

Definition B.6 (Bijective Function). We say that f is bijective if f : A → B is a one-to-one map from A onto B (one-to-one and onto). In such a case, we also say that there is a one-to-one correspondence between A and B.

We use the notion of bijectivity, specialized to real-valued functions, explicitly or implicitly at many points in the book.

Theorem B.7. Let A, B ⊆ R. A function f : A → B is bijective if and only if there is a function g : B → A such that g(f(x)) = x, x ∈ A, and f(g(x)) = x, x ∈ B, where g is referred to as the inverse function of f.

Another feature of functions frequently encountered in the book is the ray property.

Definition B.8 (Ray Property). A function f : A → R is said to have the ray property if f(cx) = f(x) for any x ∈ A and any c ∈ R such that cx ∈ A.
In what follows, the domain of f is a subset of a metric space Rn and, except for Sect. B.4, is denoted by D. For brevity, the range of f is the set of all reals R so that f : D → R and D ⊆ Rn.

Definition B.9 (Function Limit). Suppose that f : D → R, and p is a limit point of D. We write f(x) → q as x → p, or, equivalently, limx→p f(x) = q if there is a point q ∈ R with the following property: For every ε > 0, there exists δ > 0 such that |f(x) − q| < ε for all x ∈ D with d(x, p) < δ.

Note that in the definition above, p does not need to be a member of D. So f does not need to be defined at p, but p necessarily needs to be a limit/contact point of D (Definition B.1). In the book, we use two Landau symbols that describe the limiting behavior of a function. Formally, for a given limit point x0 of D and a function f : D → R, the symbols represent the following sets:

$$\begin{aligned} O(f, x_0) &:= \{ g : D \to \mathbb{R} : \exists_{C>0}\, \exists_{\delta>0}\, \forall_{x \in B_\delta(x_0) \cap D}\ |g(x)| \le C |f(x)| \} \\ o(f, x_0) &:= \{ g : D \to \mathbb{R} : \forall_{\epsilon>0}\, \exists_{\delta>0}\, \forall_{x \in B_\delta(x_0) \cap D}\ |g(x)| \le \epsilon |f(x)| \}\,. \end{aligned} \tag{B.1}$$
The widely used but sloppy notations g = O(f, x0) and g = o(f, x0) mean that g ∈ O(f, x0) and g ∈ o(f, x0), respectively. Moreover, g(x) = h(x) + O(f, x0) and g(x) = h(x) + o(f, x0) express the same as g(x) − h(x) ∈ O(f, x0) and g(x) − h(x) ∈ o(f, x0), respectively. If n = 1 and D is unbounded (with ∞ as a limit point), then O(f) := O(f, ∞) and o(f) := o(f, ∞) can be extended as follows:

$$\begin{aligned} O(f) &:= \{ g : D \to \mathbb{R} : \exists_{C>0}\, \exists_{\delta>0}\, \forall_{x \in [\delta,\infty) \cap D}\ |g(x)| \le C |f(x)| \} \\ o(f) &:= \{ g : D \to \mathbb{R} : \forall_{\epsilon>0}\, \exists_{\delta>0}\, \forall_{x \in [\delta,\infty) \cap D}\ |g(x)| \le \epsilon |f(x)| \}\,. \end{aligned} \tag{B.2}$$
For completeness, the following definition introduces the notion of continuity.

Definition B.10 (Continuous Function). Suppose that f : D → R, and p ∈ D. Then, f is said to be continuous at p if for every ε > 0, there exists δ > 0 such that |f(x) − f(p)| < ε for all points x ∈ D for which d(x, p) < δ. If f is continuous at all points in D, then f is said to be continuous on D, or simply continuous.

The infimum and supremum of a nonempty set A ⊂ R, denoted by inf A and sup A, respectively, are the greatest lower bound (the infimum) and the least upper bound (the supremum) of A. Since A is a subset of R (a totally ordered set), such an infimum and supremum exist and are unique. Moreover, we have sup R = +∞ and inf R = −∞. These definitions are used in the next theorem.

Theorem B.11. Suppose that f : D → R is continuous and D is compact. Let the least upper bound and the greatest lower bound of all f(x) with x ranging over D be written as

$$M = \sup_{x \in D} f(x) := \sup\{ f(x) : x \in D \} \quad \text{and} \quad m = \inf_{x \in D} f(x) := \inf\{ f(x) : x \in D \} \tag{B.3}$$

respectively. Then, there exist points p, q ∈ D such that f(p) = M and f(q) = m.

In words, the theorem asserts that a continuous function f defined on a compact set D ⊂ Rn attains its minimum and maximum on this set. As D is a subset of the Euclidean space Rn, Theorem B.3 implies that the set D in Theorem B.11 is closed and bounded.
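As a small numerical illustration of Theorem B.11 (our sketch, not part of the book): a continuous function on a compact interval attains both extrema, and a dense grid search over the interval locates approximate extremizers.

```python
# Theorem B.11 sketch: f(x) = (x - 0.3)**2 is continuous on the compact set
# D = [0, 1], so it attains its minimum and maximum there.
f = lambda x: (x - 0.3) ** 2
grid = [i / 10000 for i in range(10001)]   # dense sample of D = [0, 1]
q = min(grid, key=f)                        # approximate minimizer (near 0.3)
p = max(grid, key=f)                        # approximate maximizer (boundary point 1.0)
print(q, p)
```

The minimum is attained in the interior (near x = 0.3) and the maximum on the boundary (x = 1), both guaranteed to exist because [0, 1] is closed and bounded (Theorem B.3).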
Remark B.12. If f : D → R attains its minimum and/or maximum over D, then we write inf_{x∈D} f(x) = min_{x∈D} f(x) and/or sup_{x∈D} f(x) = max_{x∈D} f(x) as well as q = arg min_{x∈D} f(x) and/or p = arg max_{x∈D} f(x), where p, q are any members of D such that f(q) = min_{x∈D} f(x) and f(p) = max_{x∈D} f(x). As there may be no minimum and maximum, we define

$$q = \arg\inf_{x \in D} f(x) \qquad \qquad p = \arg\sup_{x \in D} f(x)$$

to belong to the closure cl(D) of D ⊆ Rn so that inf_{x∈D} f(x) = lim_{n→∞} f(q(n)) and sup_{x∈D} f(x) = lim_{n→∞} f(p(n)) for any sequences {q(n)}n∈N and {p(n)}n∈N with q(n) ∈ D, p(n) ∈ D for all n ∈ N and lim_{n→∞} q(n) = q and lim_{n→∞} p(n) = p.

Later, in Sect. B.4.1, we use the notion of the limit superior of a sequence of real-valued functions. Given a sequence of functions {fn}n∈N, fn : D → R, the limit superior of the sequence at x ∈ D is defined to be

$$\limsup_{n \to \infty} f_n(x) = \Bigl( \limsup_{n \to \infty} f_n \Bigr)(x) = \lim_{n \to \infty} \sup_{m \ge n} f_m(x), \qquad x \in D \tag{B.4}$$
where sup_{m≥n} f_m(x) = sup{f_m(x) : m ≥ n}. The limit inferior is defined analogously.

Now let us fix x ∈ D, where D is some open nonempty subset of Rn, and let u ∈ Rn be a vector of unit norm (‖u‖ = 1). Define

$$D_u f(x) = \lim_{t \to 0} \frac{f(x + t u) - f(x)}{t} \tag{B.5}$$

provided that the limit exists. We call D_u f(x) the directional derivative of f at x, in the direction of the unit vector u.

Definition B.13 (Partial Derivatives and Gradient). Let D be an open set, let f : D → R be given, and let e_i be the ith unit vector (all components are zero except for the ith component, which is 1). For any x ∈ D, we define

$$\nabla_i f(x) = (\nabla f(x))_i = \frac{\partial f}{\partial x_i}(x) = \lim_{h \to 0} \frac{f(x + h e_i) - f(x)}{h} \tag{B.6}$$

provided that the limit exists.¹ We call (∇f(x))_i the ith partial derivative of f at point x ∈ D. Assuming that the partial derivative exists for each 1 ≤ i ≤ n, the gradient of f at x is defined as

$$\nabla f(x) = \Bigl( \frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_n}(x) \Bigr)\,. \tag{B.7}$$

¹ Note that if there is no risk of ambiguity and confusion, we use ∇_i f(x) = (∇f(x))_i to denote the ith partial derivative of f. The only risk is that this can be confused with ∇_x f(x, y). See Remark B.18.
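The limits in (B.6) suggest a simple numerical check (our sketch, not from the book): central finite differences along the unit vectors e_i approximate the partial derivatives, and stacking them approximates the gradient (B.7). For f(x) = x₁² + 3x₁x₂ the analytic gradient at (1, 2) is (2·1 + 3·2, 3·1) = (8, 3).

```python
def grad_fd(f, x, h=1e-6):
    """Approximate the gradient (B.7) by central differences along each e_i."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h                 # x + h*e_i
        xm[i] -= h                 # x - h*e_i
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
g = grad_fd(f, [1.0, 2.0])
print(g)   # close to the analytic gradient (8, 3)
```

Central differences are exact (up to rounding) for quadratics, so the agreement here is essentially to machine precision.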
Definition B.14 (Gateaux Differentiability). We say that f is (Gateaux) differentiable at x ∈ D if the gradient exists and satisfies ∇f(x)ᵀu = D_u f(x). The function f is called differentiable on D if it is differentiable at every x ∈ D.

Remark B.15. If f is differentiable on an open set D and the gradient ∇f(x) is a continuous function on D, then f is said to be continuously differentiable. Continuously differentiable functions are Frechet differentiable, which implies Gateaux differentiability [39]. Throughout the book, all functions are assumed to be continuously differentiable on some open set. In this case, Gateaux differentiability is equivalent to Frechet differentiability, and therefore we make no distinction between these two concepts [39, pp. 211–220].

Now assume that each of the partial derivatives of a function f : D → R is a continuously differentiable function at x ∈ D. We use the notation

$$(\nabla^2 f(x))_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \frac{\partial}{\partial x_i} \frac{\partial f}{\partial x_j}(x)$$

to indicate the partial derivative of ∂f/∂x_j with respect to x_i at a point x ∈ D.
Definition B.16 (Hessian Matrix). The matrix ∇²f(x) = ((∇²f(x))_{ij}) ∈ Rn×n is called the Hessian matrix (or simply the Hessian) of f : D → R at x ∈ D ⊆ Rn. The function f is said to be twice (Gateaux) differentiable if ∇²f(x) exists for all x ∈ int(D).

Remark B.17 (Standard Notation for D ⊆ R). When D ⊆ R, the first derivative (if it exists) of f : D → R at x ∈ D is denoted by f′(x) = df/dx(x). If f′ is itself differentiable, we denote the derivative of f′ at x by f′′(x) = d²f/dx²(x) and call f′′ the second derivative of f. When f′′(x) exists for all x ∈ D, f is said to be twice differentiable. If, in addition, the second derivative is a continuous function, we say that f is twice continuously differentiable. Note that for f′′(x) to exist at x, f′(y) must exist in some open ball centered at x (we consider functions over an open interval) and f′ must be differentiable at x.

Remark B.18. Given a function Rn × Rm → R : (x, y) → f(x, y) for some n, m ≥ 1, ∇_x f(x, y) ∈ Rn and ∇_y f(x, y) ∈ Rm denote the gradient of f at (x, y) with respect to x and y, respectively. The operator ∇f(x, y) is the usual gradient of f at z = (x, y) with respect to z ∈ Rn+m (as defined by (B.7)). Similarly, if f is a twice Frechet-differentiable function, ∇²f(x, y) denotes its (standard) Hessian at z = (x, y) with respect to z. In contrast, ∇²_x f(x, y) and ∇²_{x,y} f(x, y) are the Hessians of f at (x, y) defined as

$$\bigl( \nabla^2_x f(x, y) \bigr)_{kl} = \frac{\partial^2}{\partial x_k \partial x_l} f(x, y), \qquad 1 \le k, l \le n$$
$$\bigl( \nabla^2_{x,y} f(x, y) \bigr)_{kl} = \frac{\partial^2}{\partial x_k \partial y_l} f(x, y), \qquad 1 \le k \le n,\ 1 \le l \le m$$
respectively. ∇²_y f(x, y) is defined in an analogous way. Moreover, the operator ∇²_{(x,y),y} f(x, y) is equal to ∇²_{z,y} f(x, y) with z = (xᵀ yᵀ)ᵀ, and similarly for ∇²_{(x,y),x} f(x, y).

One of the fundamental results of analysis is the mean value theorem.

Theorem B.19 (Mean Value Theorem). Suppose that f : D → R is continuously differentiable on an open interval D ⊆ R. Then, for every a, b ∈ D with a < b, there exists some ξ ∈ [a, b] such that

$$f(b) - f(a) = f'(\xi)(b - a)\,. \tag{B.8}$$

There is also a corresponding mean value theorem for Gateaux differentiable functions f : D → R with D ⊆ Rn. The latter is, however, not generalizable to maps f : D → Rm, m ≥ 2 [148]. Finally, we define monotonic functions.

Definition B.20 (Monotonic Functions). Let D ⊆ R be an open interval (segment). Then, f is said to be monotonically increasing (respectively decreasing) on D if x < y for any x, y ∈ D implies f(x) ≤ f(y) (respectively f(x) ≥ f(y)). If the last inequality is strict, then we say that f is strictly increasing (decreasing).

If f : D → R is differentiable and D ⊆ R, then f is monotonically increasing (decreasing) on D if and only if its first derivative is nonnegative (nonpositive) on this set. If the first derivative is positive (negative), then f is strictly increasing (decreasing). The converse, however, does not hold. At this point, it has to be noted that some authors identify the notion of increasingness (decreasingness) defined above with non-decreasingness (non-increasingness), and analogously, the above strict increasingness (strict decreasingness) with increasingness (decreasingness). In this book, we use "non-increasingness" and "non-decreasingness" with respect to sequences.
B.2 Convex Sets and Functions

Definition B.21 (Convex Set). We say that D ⊆ Rn is a convex set if

$$(1 - \mu)\hat{x} + \mu \check{x} \in D \tag{B.9}$$

for all μ ∈ (0, 1) and x̂, x̌ ∈ D.

Throughout this section, unless otherwise stated, it is assumed that D ⊆ Rn is a nonempty convex set. Moreover, we denote by x(μ) the convex combination of some x̂ ∈ Rn and x̌ ∈ Rn, that is, x(μ) = (1 − μ)x̂ + μx̌, μ ∈ (0, 1).
Definition B.22 (Convex Function). A function f : D → R is said to be convex (on/over D) if

$$f(x(\mu)) \le (1 - \mu) f(\hat{x}) + \mu f(\check{x}) \tag{B.10}$$

for all μ ∈ (0, 1) and x̂, x̌ ∈ D. The function f is said to be strictly convex if there is a strict inequality in (B.10) for all μ ∈ (0, 1) and x̂ ≠ x̌. We say that a function f is concave if −f is convex.

Now we provide necessary and sufficient conditions for differentiable and twice continuously differentiable functions to be convex.

Theorem B.23 (First-order Condition). Let f : D → R be differentiable (over D ⊆ Rn). Then, f is convex if and only if

$$\forall_{x, z \in D} \quad f(z) \ge f(x) + (z - x)^T \nabla f(x)\,. \tag{B.11}$$

The function f is strictly convex whenever there is strict inequality in (B.11) for all z ≠ x.

Theorem B.24 (Second-order Condition). Let D have a nonempty interior. Suppose that f : D → R is a twice continuously differentiable function. Then, f is convex if and only if ∇²f(x) ⪰ 0, x ∈ D, that is, if and only if the Hessian matrix is positive semidefinite for all x ∈ D.

Note that if ∇²f(x) is positive definite for all x ∈ D (∇²f(x) ≻ 0), then f is strictly convex. The converse, however, does not need to hold. On the other hand, the Hessian is positive definite if f is strongly convex according to the definition below.

B.2.1 Strong Convexity

Strongly convex functions are defined as follows.

Definition B.25. A function f : D → R is said to be strongly convex (with modulus of strong convexity c) if there exists c > 0 such that

$$f(x(\mu)) \le (1 - \mu) f(\hat{x}) + \mu f(\check{x}) - \frac{1}{2} c \mu (1 - \mu) \|\hat{x} - \check{x}\|_2^2 \tag{B.12}$$

for all x̂, x̌ ∈ D ⊆ Rn and μ ∈ (0, 1).

The following simple observation establishes a one-to-one relationship between the notions of strong convexity and convexity [11].

Observation B.26. A function f : D → R is strongly convex with modulus of strong convexity c if and only if $g(x) = f(x) - \frac{1}{2} c \|x\|_2^2$ is convex.
Proof. Let x̂, x̌ ∈ D be arbitrary. Suppose that f is strongly convex. Then, by the definition, we have

$$\begin{aligned} f((1-\mu)\hat{x} + \mu\check{x}) &\le (1-\mu) f(\hat{x}) + \mu f(\check{x}) - \frac{1}{2} c \mu (1-\mu) \|\hat{x} - \check{x}\|_2^2 \\ &= (1-\mu) f(\hat{x}) + \mu f(\check{x}) - \frac{1}{2} c \mu (1-\mu) \bigl( \|\hat{x}\|_2^2 - 2 \langle \hat{x}, \check{x} \rangle + \|\check{x}\|_2^2 \bigr) \\ &= (1-\mu) f(\hat{x}) + \mu f(\check{x}) + \frac{1}{2} c (1-\mu)^2 \|\hat{x}\|_2^2 - \frac{1}{2} c (1-\mu) \|\hat{x}\|_2^2 \\ &\quad + \frac{1}{2} c \mu^2 \|\check{x}\|_2^2 - \frac{1}{2} c \mu \|\check{x}\|_2^2 + c \mu (1-\mu) \langle \hat{x}, \check{x} \rangle \\ &= (1-\mu) f(\hat{x}) + \mu f(\check{x}) + \frac{1}{2} c \|(1-\mu)\hat{x} + \mu\check{x}\|_2^2 - \frac{1}{2} c (1-\mu) \|\hat{x}\|_2^2 - \frac{1}{2} c \mu \|\check{x}\|_2^2 \end{aligned}$$

for all μ ∈ (0, 1). Hence,

$$f((1-\mu)\hat{x} + \mu\check{x}) - \frac{1}{2} c \|(1-\mu)\hat{x} + \mu\check{x}\|_2^2 \le (1-\mu) f(\hat{x}) + \mu f(\check{x}) - \frac{1}{2} c (1-\mu) \|\hat{x}\|_2^2 - \frac{1}{2} c \mu \|\check{x}\|_2^2\,,$$

which is just convexity of $g(x) = f(x) - \frac{1}{2} c \|x\|_2^2$. Assuming convexity of g and proceeding in reverse order proves the converse.

It is readily seen that any strongly convex function is strictly convex. However, as already mentioned above, the converse does not hold in general. A standard example of a strictly convex function that is not strongly convex is R → R+ : x → x⁴. Another example is the function R++ → R++ : x → 1/x, which is strictly convex on R++ but not strongly convex. Yet, the latter function is strongly convex on any closed bounded interval on R++.

When f : D → R is continuously differentiable, the following can be shown to hold [162].

Theorem B.27. Suppose that f : D → R is continuously differentiable. Then, f is strongly convex (with modulus of strong convexity c) if and only if there exists a constant c > 0 such that

$$\bigl( \nabla f(x) - \nabla f(y) \bigr)^T (x - y) \ge c \|x - y\|_2^2 \tag{B.13}$$

for all x, y ∈ D.

When f : D → R is twice continuously differentiable, we have the following result [162].

Theorem B.28. Let D ⊆ Rn have a nonempty interior. Then, f : D → R is strongly convex (with modulus of strong convexity c) if and only if

$$\nabla^2 f(x) \succeq c I, \qquad x \in D\,. \tag{B.14}$$
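For a quadratic f(x) = ½xᵀQx with symmetric Q, inequality (B.13) reads (x − y)ᵀQ(x − y) ≥ c‖x − y‖²₂, which holds with c equal to the smallest eigenvalue of Q, in line with Theorem B.28, since ∇²f = Q ⪰ λ_min(Q)I. The sketch below (ours, not from the book) spot-checks this for Q = [[3, 1], [1, 2]], whose smallest eigenvalue is (5 − √5)/2 ≈ 1.382.

```python
import random

Q = [[3.0, 1.0], [1.0, 2.0]]                        # symmetric positive definite
grad = lambda x: [Q[0][0] * x[0] + Q[0][1] * x[1],
                  Q[1][0] * x[0] + Q[1][1] * x[1]]   # gradient of f(x) = x^T Q x / 2
c = (5 - 5 ** 0.5) / 2                               # smallest eigenvalue of Q

random.seed(0)
ok = True
for _ in range(1000):
    x = [random.uniform(-5, 5), random.uniform(-5, 5)]
    y = [random.uniform(-5, 5), random.uniform(-5, 5)]
    d = [x[0] - y[0], x[1] - y[1]]
    gx, gy = grad(x), grad(y)
    lhs = (gx[0] - gy[0]) * d[0] + (gx[1] - gy[1]) * d[1]   # left-hand side of (B.13)
    ok = ok and lhs >= c * (d[0] ** 2 + d[1] ** 2) - 1e-9   # small tolerance for rounding
print(ok)   # True: (B.13) holds with c = lambda_min(Q)
```

No random pair violates (B.13) with this modulus, while any c larger than λ_min(Q) would fail along the corresponding eigenvector direction.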
B.2.2 Majorization and Schur-Convexity

The goal of this section is to introduce two basic definitions from majorization theory used in this book. For more information about majorization and Schur-convexity, we refer the reader to [132], which is a comprehensive reference on majorization and its applications.

Definition B.29 (Majorization). We say that x ∈ Rn is majorized by y ∈ Rn (or y majorizes x) if

$$\sum_{j=1}^{k} x_{[j]} \le \sum_{j=1}^{k} y_{[j]},\ 1 \le k < n \qquad \text{and} \qquad \sum_{j=1}^{n} x_j = \sum_{j=1}^{n} y_j \tag{B.15}$$

where x[1] ≥ x[2] ≥ · · · ≥ x[n] denote the entries of x in non-increasing order. If x is majorized by y, we write x ≺ y.²

Note that two vectors are comparable by majorization only if their entries sum up to the same value, which is implied by the definition (so, "≺" is a partial order on Rn). For general vectors in Rn, there is a notion of weak majorization [132].

Definition B.30 (Schur-Convexity). A real-valued function f : D → R is said to be Schur-convex on D ⊆ Rn if D is a permutation symmetric set and x ≺ y on D implies f(x) ≤ f(y). If f(x) < f(y) whenever x ≺ y and x is not a permutation of y, then we say that f is strictly Schur-convex. Moreover, f is said to be Schur-concave if and only if −f is Schur-convex.

In simple words, Schur-convex functions are real-valued functions that preserve the majorization order. A trivial but important observation is that the vector 1 is majorized by all vectors in Rn+ whose entries sum up to n. Hence, for all x ∈ Rn+ such that ‖x‖₁ = n, we have f(1) ≤ f(x) if f : Rn+ → R+ is Schur-convex and f(x) ≤ f(1) if f is Schur-concave.
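Definition B.29 translates directly into a small test (our sketch, not part of the book): sort both vectors in non-increasing order, compare the partial sums for k < n, and require equal totals. We also spot-check Definition B.30 with the Schur-convex function f(x) = Σᵢ xᵢ².

```python
def majorized(x, y, tol=1e-9):
    """Return True if x ≺ y in the sense of (B.15)."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if abs(sum(xs) - sum(ys)) > tol:
        return False                       # equal total sums are required
    cx = cy = 0.0
    for k in range(len(xs) - 1):           # partial-sum inequalities, 1 <= k < n
        cx, cy = cx + xs[k], cy + ys[k]
        if cx > cy + tol:
            return False
    return True

f = lambda x: sum(t * t for t in x)        # a Schur-convex function on R^n
x, y = [2.0, 1.0, 1.0], [3.0, 1.0, 0.0]
print(majorized(x, y), majorized(y, x))    # x ≺ y holds, y ≺ x does not
print(f(x) <= f(y))                        # Schur-convexity: f(x) = 6 <= f(y) = 10
```

Here x ≺ y because the sorted partial sums of x (2, then 3) never exceed those of y (3, then 4) and both vectors sum to 4; the sum of squares then preserves the order, as Schur-convexity requires.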
B.3 Log-Convex Functions

For the results in the current section, we retain the assumption that D is a convex set in Rn and the notation x(μ) = (1 − μ)x̂ + μx̌, μ ∈ (0, 1), for the convex combination of x̂, x̌ ∈ D.

Definition B.31 (Log-convex function). A function f : D → R++ is said to be log-convex if log f is convex, i.e., if we have

$$\log f(x(\mu)) \le (1 - \mu) \log f(\hat{x}) + \mu \log f(\check{x}) \tag{B.16}$$

² Notice that if used in connection with matrices, the signs "≺" and "⪯" have other meanings. See Definition A.21.
for all μ ∈ (0, 1) and x̂, x̌ ∈ D. If there is strict inequality in (B.16) for all μ ∈ (0, 1), we say that f is strictly log-convex. Moreover, we say that f is log-concave if log f is concave. Note that f is log-concave if and only if 1/f is log-convex. The list below presents some examples of log-convex and log-concave functions [16].

(i) f(x) = e^{cx}, x ∈ R, is both log-convex and log-concave for any real c.
(ii) f(x) = x^c on R++ is log-convex for c ≤ 0 and log-concave for c ≥ 0.
(iii) f(x) = e^x/(1 − e^x), x < 0, is log-convex.
(iv) The Gamma function $f(x) = \int_0^\infty u^{x-1} e^{-u}\, du$ is log-convex for x ≥ 1.

In all that follows, we exclusively focus on log-convex functions. For more information about log-concavity, the reader is referred to [16].

Remark B.32. As the logarithm is not defined for nonpositive arguments, any log-convex function f is by definition positive. However, it is often convenient to allow f to take on the value zero, in which case one takes log(0) = −∞ [16]. A nonnegative function f is said to be log-convex if the so-extended function log f is convex.

Observation B.33. Let D ⊆ Rn be a convex nonempty set. A positive function f : D → R++ is log-convex on D if and only if

$$f(x(\mu)) \le f(\hat{x})^{1-\mu} f(\check{x})^{\mu} \tag{B.17}$$

for all μ ∈ (0, 1) and x̂, x̌ ∈ D.

Proof. Let x̂, x̌ ∈ D and μ ∈ (0, 1) be arbitrary and note that, due to convexity of D, x(μ) ∈ D. Writing the right-hand side of (B.16) as log(f(x̂)^{1−μ} f(x̌)^{μ}) and considering the monotonicity of the logarithm yields (B.17). Conversely, taking the logarithm on both sides of (B.17) and rearranging gives (B.16).

The next result relates log-convexity to convexity. It is an application of the (generalized) geometric–arithmetic-mean inequality: For any positive constants a, b, one has [193]

$$a^{1-\mu} b^{\mu} \le (1 - \mu) a + \mu b, \qquad \mu \in (0, 1)\,. \tag{B.18}$$

In words, this inequality says that the arithmetic mean bounds (from above) the geometric mean. The reader should, however, be careful since one usually refers to the following inequality as the geometric–arithmetic-mean inequality: $\bigl( \prod_{i=1}^{n} x_i \bigr)^{1/n} \le \frac{1}{n} \sum_{i=1}^{n} x_i$, n ≥ 1, for any positive real numbers x1, x2, . . . , xn. So, (B.18) generalizes this inequality for two constants (n = 2).

Theorem B.34. Let f : D → R++ be any log-convex function. Then,
(i) f is convex,
(ii) f is strictly convex on D ⊆ R if f is strictly monotonic.
Proof. Let x̂, x̌ ∈ D be arbitrary, and let f be log-convex. Considering (B.18), it follows from (B.17) that f(x(μ)) ≤ f(x̂)^{1−μ} f(x̌)^{μ} ≤ (1 − μ)f(x̂) + μf(x̌) for μ ∈ (0, 1). Hence, f is convex. To prove (ii), suppose that the strict convexity assertion is false. Then, there exist x̂, x̌ ∈ D ⊆ R with x̂ ≠ x̌ and μ ∈ (0, 1) such that f(x(μ)) = f(x̂)^{1−μ} f(x̌)^{μ} = (1 − μ)f(x̂) + μf(x̌). Since equality holds in (B.18) if and only if a = b, this implies that f(x̂) = f(x̌) = c for some positive c. Hence, by strict monotonicity, x̂ = x̌, which contradicts x̂ ≠ x̌.

It is important to note that the converse to Theorem B.34 does not hold since log-convexity is stronger than convexity. For instance, R → R++ : x → e^x − 1 is convex but not log-convex. In the case of twice continuously differentiable functions, we have the following result.

Theorem B.35. Let D ⊆ Rn be an open convex set and suppose that f : D → R++ is twice continuously differentiable. Then, f is log-convex on D if and only if

$$\nabla f(x) \nabla f(x)^T \preceq f(x) \nabla^2 f(x) \tag{B.19}$$

for all x ∈ D.

Proof. Let g(x) = log f(x), x ∈ D. Since f is positive, g : D → R is well defined and twice continuously differentiable on D. Thus, the theorem immediately follows by utilizing Theorem B.24.

Finally, notice that log-convexity is preserved under multiplication and addition [16, pp. 105–106].

B.3.1 Inverse Functions of Monotonic Log-Convex Functions

In what follows, we assume that f : D → R++ is a continuous bijection where D ⊆ R is any open interval on the real line and g : R++ → D is the inverse function of f according to Theorem B.7. Thus, f is a strictly monotonic (either increasing or decreasing) function. Moreover, f is strictly increasing (decreasing) if and only if g is strictly increasing (decreasing).

Theorem B.36. Define g_e : R → D such that g_e(x) = g(e^x), x ∈ R. Then, f is log-convex if and only if
(i) g_e is convex when f is strictly decreasing,
(ii) g_e is concave when f is strictly increasing.

Moreover, f is strictly log-convex if and only if g_e satisfying (i) or (ii) is strictly convex or strictly concave, respectively.
Proof. Let x̂, x̌ ∈ D be arbitrary, and let f be log-convex. Then, by Observation B.33, we have f((1−μ)x̂ + μx̌) ≤ f(x̂)^{1−μ} f(x̌)^μ for all μ ∈ (0, 1). Combining this with the property of the inverse map (Theorem B.7) yields

    (1−μ)x̂ + μx̌ ≤ g( f(x̂)^{1−μ} f(x̌)^μ )    f strictly increasing
    (1−μ)x̂ + μx̌ ≥ g( f(x̂)^{1−μ} f(x̌)^μ )    f strictly decreasing.

Define ẑ = log f(x̂) ∈ R and ž = log f(x̌) ∈ R. Then, x̂ = g(e^ẑ), x̌ = g(e^ž) and f(x̂) = e^ẑ, f(x̌) = e^ž, from which it follows that, for all ẑ, ž ∈ R and μ ∈ (0, 1), one has

    (1−μ) g(e^ẑ) + μ g(e^ž) ≤ g( e^{(1−μ)ẑ + μž} )    f strictly increasing
    (1−μ) g(e^ẑ) + μ g(e^ž) ≥ g( e^{(1−μ)ẑ + μž} )    f strictly decreasing.

By Definition B.22 and the definition of g_e, this proves one direction of the theorem. Reversing the order of the reasoning proves the converse. The proof in the case of strict convexity is identical except that strict inequalities are used and strict monotonicity is utilized.

In the case of twice continuously differentiable functions, we have the following relationship between f and g.

Theorem B.37. Suppose that f and g are twice continuously differentiable. Then, f is log-convex if and only if

    0 ≤ g'(x) + x g''(x)    f strictly decreasing
    0 ≥ g'(x) + x g''(x)    f strictly increasing    (B.20)

for all x > 0.

Proof. By Theorem B.36, f is log-convex on D if and only if g_e(x) = g(e^x) is either convex or concave depending on whether f is strictly decreasing or strictly increasing. Taking the second derivative of g_e yields g_e''(x) = e^x ( g''(e^x) e^x + g'(e^x) ) for all x ∈ R. Thus, by Theorems B.24 and B.36, f is log-convex if and only if

    0 ≤ g'(e^x) + e^x g''(e^x)    f strictly decreasing
    0 ≥ g'(e^x) + e^x g''(e^x)    f strictly increasing

for all x ∈ R. Since R → R_++ : x ↦ e^x is bijective and e^x > 0 for all x ∈ R, this is equivalent to (B.20).
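As a numerical sanity check (an illustration added here, not part of the original development), consider f(x) = 1/x on R_++: it is strictly decreasing and log-convex since log f(x) = −log x is convex, and its inverse is g(y) = 1/y. The decreasing branch of (B.20) then requires g'(x) + x g''(x) ≥ 0 for all x > 0, which the following sketch verifies by central finite differences:

```python
# Check the decreasing branch of (B.20) for f(x) = 1/x on (0, inf):
# f is strictly decreasing and log-convex, its inverse is g(y) = 1/y,
# so (B.20) demands g'(x) + x*g''(x) >= 0 for all x > 0.
g = lambda y: 1.0 / y
h = 1e-4
checks = []
for x in [0.5, 1.0, 2.0, 5.0]:
    g1 = (g(x + h) - g(x - h)) / (2.0 * h)            # central difference for g'
    g2 = (g(x + h) - 2.0 * g(x) + g(x - h)) / h ** 2  # central difference for g''
    checks.append(g1 + x * g2)                        # analytically equals 1/x^2

assert all(c > 0 for c in checks)
```

Analytically, g'(x) + x g''(x) = −1/x² + 2/x² = 1/x² > 0, consistent with Theorem B.37.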
B.4 Basics of Optimization Theory

As a basis for our presentation in this section, we consider the following general minimization problem
    inf_{x∈R^n} f(x)   subject to   h_k(x) ≤ 0, k ∈ L := {1, …, L}
                                    g_k(x) = 0, k ∈ J := {1, …, J}    (B.21)

where f : R^n → R, h_k : R^n → R and g_k : R^n → R are given continuous functions. In the context of Lagrangian optimization theory, (B.21) is referred to as the primal problem.

Definition B.38 (Convex Problem). We refer to (B.21) as a convex (optimization) problem if the functions f and h_k, k ∈ L, are convex and g_k, k ∈ J, are linear. In the case of a maximization problem with sup_{x∈R^n} f(x) instead of inf_{x∈R^n} f(x), the problem is said to be convex if f is concave.

The set of all feasible points (vectors) for the problem (B.21) is

    D = { x ∈ R^n : h_k(x) ≤ 0 for all k ∈ L,  g_k(x) = 0 for all k ∈ J }.    (B.22)
Note that convexity of f and h_k, k ∈ L, as well as linearity of g_k imply that D is a convex set. The converse is, however, not true in general. In what follows, we use L(x) = {k ∈ L : h_k(x) = 0} to denote the set of indices in L for which the inequality constraints hold with equality at x ∈ R^n, in which case they are said to be active at x ∈ R^n.

Definition B.39 (Local and Global Optimizers). A point x* ∈ D is said to be locally optimal (also local optimizer, local minimizer or locally optimal point) if there exists δ > 0 such that f(x*) ≤ f(x) for all x ∈ B_δ(x*) ∩ D. The value f(x*) is called local optimum or local minimum. We say that x* ∈ D is globally optimal (also global optimizer, global minimizer or simply optimal point) if f(x*) ≤ f(x) for all x ∈ D. In such a case, the value f(x*) > −∞ is called (global) optimum or (global) minimum. Analogously, in the case of a maximization problem, we use the words “maximizer” and “maximum” instead of “minimizer” and “minimum”, respectively.

Definition B.40 (Globally and Locally Solvable Problems). Let the infimum in (B.21) be bounded. Then, the problem is said to be globally solvable if any local optimizer of (B.21) is also a global one. If there exists a local optimizer of (B.21) that is not global, then the problem is referred to as locally solvable.

B.4.1 Characterization of Numerical Convergence

The rate of convergence of an iteration is usually characterized by the convergence of roots or quotients. The root and quotient convergence is measured by the norm-dependent convergence factor and the norm-independent convergence order. In this section, we restate some definitions and features, mostly from [148], regarding the rate of convergence. The notions of numerical convergence discussed here are used in Sects. 6.5 and 6.8.
Definition B.41. Let an iteration x(n+1) = G(x(n)), n ∈ N, be given with I as the set of all sequences of iterates convergent to a point of attraction x̃. Then, the root convergence factor of p-th order is defined as

    R_p(I, x̃) = sup_{{x(n)}_n ∈ I} lim sup_{n→∞} ||x(n) − x̃||^{1/p^n},    p ≥ 1.

The quotient convergence factor of p-th order is of the form

    Q_p(I, x̃) = sup_{{x(n)}_n ∈ I} lim sup_{n→∞} ||x(n+1) − x̃|| / ||x(n) − x̃||^p,    p ≥ 1    (B.23)

and is defined only if x(n) ≠ x̃ for all but finitely many n ∈ N.

When, for some fixed x̃ and I, the quotient convergence factor Q_q(I, x̃) is considered as a function of a variable q ≥ 1, we have the following dependence: Q_q(I, x̃) = 0 for q ∈ [1, p), Q_q(I, x̃) = c < ∞ for q = p, and Q_q(I, x̃) = ∞ for q ∈ (p, ∞) (and analogously for R_q(I, x̃), q ≥ 1). This discontinuous behavior motivates the definition of the root/quotient convergence order, which is norm-independent.

Definition B.42. Let an iteration x(n+1) = G(x(n)), n ∈ N, be given with I as the set of all sequences of iterates convergent to a point of attraction x̃. Then, the root convergence order is defined as

    O_R(I, x̃) = inf { p ≥ 1 : R_p(I, x̃) = ∞ }    (B.24)

and the quotient convergence order takes the form

    O_Q(I, x̃) = inf { p ≥ 1 : Q_p(I, x̃) = ∞ }.    (B.25)
In general, the description of convergence rate in terms of quotient convergence is better established. Based on the definition of the (norm-independent) quotient convergence order, we say that an iteration with an attraction point x̃ exhibits

(i) linear quotient convergence if O_Q(I, x̃) = 1,
(ii) quadratic quotient convergence if O_Q(I, x̃) = 2.

For instance, gradient-based algorithms usually have linear quotient convergence (see, for instance, Sect. 6.5.2, where we considered a gradient projection power control algorithm). On the other hand, Newton-like algorithms can achieve quadratic convergence under some differentiability conditions. This is shown for instance in Sect. 6.8.5 for the case of a primal-dual Newton-based power control algorithm. There also exist the notions of sublinear, superlinear, subquadratic and superquadratic root/quotient convergence, which represent refinements of the terms (i), (ii) and are based on the definitions of convergence factors. We refer to [148] for further details.
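To make the distinction between (i) and (ii) concrete, the following sketch (an added illustration; the iterations and constants are our own choices, not from the text) estimates the quotient ratios appearing in (B.23) for two iterations converging to x̃ = √2: a contraction with linear quotient convergence and Newton's method with quadratic quotient convergence.

```python
import math

def q_ratios(xs, x_star, p):
    """Quotient ratios |x(n+1) - x~| / |x(n) - x~|^p along one sequence, cf. (B.23)."""
    e = [abs(x - x_star) for x in xs]
    return [e[n + 1] / e[n] ** p for n in range(len(e) - 1) if e[n] > 0]

x_star = math.sqrt(2.0)

# Linear quotient convergence (O_Q = 1): fixed-point map x <- x - 0.1*(x^2 - 2);
# the ratios with p = 1 approach the contraction factor 1 - 0.2*sqrt(2) ~ 0.717.
xs = [2.0]
for _ in range(40):
    xs.append(xs[-1] - 0.1 * (xs[-1] ** 2 - 2.0))
lin = q_ratios(xs, x_star, p=1)

# Quadratic quotient convergence (O_Q = 2): Newton's method x <- (x + 2/x)/2;
# the ratios with p = 2 approach 1/(2*sqrt(2)) ~ 0.354, a finite nonzero limit.
ys = [2.0]
for _ in range(4):
    ys.append(0.5 * (ys[-1] + 2.0 / ys[-1]))
quad = q_ratios(ys, x_star, p=2)
```

For the linear iteration the ratios with p = 2 blow up, while for Newton the ratios with p = 1 tend to zero, matching the discontinuous q-dependence described above.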
B.4.2 Convergence of Gradient Projection Algorithms

Here we present some standard results that are utilized in Chap. 6 to prove global convergence of the gradient projection power control algorithm. For a thorough treatment of convex optimization theory, the reader is referred to [160, 162, 16]. The proofs use standard techniques and are presented only for completeness.

Suppose that f : R^n → R attains its minimum over D ⊂ R^n. Throughout this section, it is assumed that f is continuously differentiable and D ⊂ R^n is a nonempty, closed, and convex set. The first result proves necessary and sufficient conditions for a vector x ∈ D to be optimal.

Theorem B.43. Let a continuously differentiable f : D → R be given.

(i) Suppose that x ∈ D is a local minimizer of f over D. Then,

    ∇f(x)^T (z − x) ≥ 0 for all z ∈ D.    (B.26)
(ii) If f is convex, then (B.26) is also sufficient for x ∈ D to minimize f over D.

Proof. Let x ∈ D be a local minimizer of f over D. Suppose that ∇f(x)^T (z − x) < 0 for some z ∈ D. By the mean value theorem, for every ξ ∈ [0, 1], there exists some s = s(ξ), s ∈ [0, 1], such that f(x + ξ(z − x)) = f(x) + ξ ∇f(x + sξ(z − x))^T (z − x). Since ∇f is continuous and ∇f(x)^T (z − x) < 0 (by assumption), we must have ∇f(x + sξ(z − x))^T (z − x) < 0 for sufficiently small ξ > 0. Therefore, f(x + ξ(z − x)) < f(x), where x + ξ(z − x) ∈ D for all ξ ∈ [0, 1] since D is convex. This, however, contradicts the local optimality of x, and hence proves (i). (ii) Considering Theorem B.23 shows that if f is convex, then f(z) ≥ f(x) + ∇f(x)^T (z − x) for all z ∈ D. Thus, by (B.26), it follows that f(z) ≥ f(x) for all z ∈ D.

Definition B.44 (Stationary Point). Any vector (point) x ∈ D satisfying (B.26) is referred to as a stationary point of f (on D).³

³ The additional information about the set is used when the optimum is attained on a set other than the domain and it is not clear from the context which set is meant.

Note that if x in (B.26) is an interior point of D or D = R^n, then the condition reduces to ∇f(x) = 0. An important component of the gradient projection algorithm is the projection on a closed convex subset of R^n. We prove that the projection is well defined and unique.

Theorem B.45 (Projection Theorem). For all y ∈ R^n, a vector Π_D[y] ∈ D is said to be the projection of y on a closed convex set D ⊂ R^n if

    Π_D[y] = arg min_{x∈D} ||y − x||_2^2.    (B.27)
The minimum in (B.27) exists and is unique. Moreover, given some y ∈ R^n, Π_D[y] is the unique solution to (B.27) if and only if

    (y − Π_D[y])^T (x − Π_D[y]) ≤ 0 for all x ∈ D.    (B.28)

Proof. By assumption, D is closed but not necessarily bounded. However, the problem in (B.27) is equivalent to minimizing the same metric over all x ∈ D such that ||y − x||_2 ≤ ||y − z||_2 for some arbitrary z ∈ D. Since {x ∈ D : ||y − x||_2 − ||y − z||_2 ≤ 0} is a compact set and the norm is a continuous function of the vector elements, it follows that the minimum exists. Moreover, the minimizer is unique since ||y − x||_2^2 is a strictly convex function of x ∈ D. The proof of the last part proceeds along the same lines as the proof of Theorem B.43.

Now let us introduce the notion of Lipschitz continuity.

Definition B.46 (Lipschitz Continuity Condition). Given D ⊂ R^n, a map f : D → R^n is said to satisfy the Lipschitz continuity condition (or is called Lipschitz continuous) if there exists a constant M > 0 such that

    ||f(x) − f(y)||_2 ≤ M ||x − y||_2    (B.29)

for all x, y ∈ D. We point out that this definition can be extended to maps between arbitrary metric spaces. The following lemma is also known as the Descent Lemma [160, 162] and is a key ingredient in proving the convergence of gradient methods to a stationary point.

Lemma B.47. If f : D → R is continuously differentiable and its gradient is Lipschitz continuous with some Lipschitz constant M > 0, then

    f(x + y) ≤ f(x) + y^T ∇f(x) + (M/2) ||y||_2^2    (B.30)
for all x, y ∈ D.

Proof. We have

    f(x + y) − f(x) = ∫_0^1 y^T ∇f(x + μy) dμ
                    = ∫_0^1 [ y^T ∇f(x) + y^T ( ∇f(x + μy) − ∇f(x) ) ] dμ
                    ≤ y^T ∇f(x) + ||y||_2 ∫_0^1 ||∇f(x + μy) − ∇f(x)||_2 dμ
                    ≤ y^T ∇f(x) + ||y||_2^2 M ∫_0^1 μ dμ = y^T ∇f(x) + (M/2) ||y||_2^2

for all x, y ∈ D.
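For a quadratic f(x) = ½ x^T A x with symmetric A, the gradient ∇f(x) = Ax is Lipschitz continuous with constant M = λ_max(A), so (B.30) can be checked directly at random points. The following sketch (added for illustration; the data are our own) does exactly that:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
A = B.T @ B + np.eye(4)                 # symmetric positive definite Hessian
f = lambda x: 0.5 * x @ A @ x           # f(x) = 0.5 x^T A x
grad = lambda x: A @ x                  # grad f is Lipschitz with constant M
M = np.linalg.eigvalsh(A).max()         # largest eigenvalue of A

violations = 0
for _ in range(100):
    x, y = rng.normal(size=4), rng.normal(size=4)
    lhs = f(x + y)
    rhs = f(x) + y @ grad(x) + 0.5 * M * (y @ y)   # right-hand side of (B.30)
    violations += lhs > rhs + 1e-10
```

For this quadratic the bound even holds with equality when y is an eigenvector of A for the eigenvalue M.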
We use this lemma to prove the convergence of the gradient projection algorithm

    x(n+1) = T(x(n)),    x(0) ∈ D,    (B.31)

where T : R^n → D is defined to be

    T(x) := Π_D[ x − δ ∇f(x) ]    (B.32)

and δ is a positive constant (sufficiently small).

Theorem B.48. Let f : D → R be continuously differentiable and bounded below on a nonempty, closed and convex set D ⊂ R^n. Suppose that ∇f : D → R^n is Lipschitz continuous with the Lipschitz constant M > 0. Let 0 < δ < 2/M and x ∈ D be arbitrary. Then,

(i) f(T(x)) ≤ f(x) − (1/δ − M/2) ||T(x) − x||_2^2.
(ii) T(x) = x if and only if x is a stationary point. Moreover, if f is convex, we have T(x) = x if and only if x minimizes f over D.

Proof. It follows from (B.28) that (z − T(x))^T (x − δ∇f(x) − T(x)) ≤ 0 for all z ∈ D. Particularizing this to z = x ∈ D yields (T(x) − x)^T ∇f(x) ≤ −(1/δ) ||T(x) − x||_2^2. On the other hand, considering Lemma B.47 gives (T(x) − x)^T ∇f(x) ≥ f(T(x)) − f(x) − (M/2) ||T(x) − x||_2^2. Therefore, f(T(x)) ≤ f(x) − (1/δ − M/2) ||T(x) − x||_2^2, which proves (i). (ii) By (B.32), T(x) is the projection of x − δ∇f(x) on D. Therefore, if T(x) = x, (B.28) implies that δ ∇f(x)^T (z − x) ≥ 0 for all z ∈ D with δ > 0. Conversely, if δ ∇f(x)^T (z − x) ≥ 0 for all z ∈ D, then (x − δ∇f(x) − x)^T (z − x) ≤ 0, from which we have T(x) = x. The assertion for a convex function follows from (ii) in Theorem B.43.

Now let {x(n)} be the sequence generated by (B.31). Provided that 0 < δ < 2/M, it follows from (i) that {f(x(n))} is nonincreasing, and therefore the sequence {f(x(n))} converges since f is bounded below on D. From this, the left-hand side of

    f(x(n+1)) − f(x(n)) ≤ −(1/δ − M/2) ||T(x(n)) − x(n)||_2^2

tends to zero, whereas the right-hand side is nonpositive if 0 < δ < 2/M, so that ||T(x(n)) − x(n)||_2^2 must tend to zero as well. Hence, if x* is a limit point of {x(n)}, this sequence converges to x*. Moreover, by continuity of T (by (B.32), T is continuous since it is a composition of continuous maps), we must have T(x*) = x*. Finally, if f is convex, then the sequence {x(n)} generated by (B.31) converges to some x* that minimizes f over D. Using Definition B.44, we can summarize this in a theorem.

Theorem B.49.
Suppose that the conditions of Theorem B.48 are satisfied. Then, provided that 0 < δ < 2/M , the sequence {x(n)} generated by (B.31) converges to some x∗ satisfying (z − x∗ )T ∇f (x∗ ) ≥ 0 for all z ∈ D. If f : D → R is a convex function, then x∗ minimizes f over D.
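The iteration (B.31)–(B.32) is straightforward to implement when the projection Π_D has closed form, e.g., for a box D, where it reduces to componentwise clipping. The following sketch (an added illustration with our own toy data) minimizes a convex quadratic over D = [0, 1]² with a step size δ = 1/M < 2/M, as required by Theorems B.48 and B.49:

```python
import numpy as np

# Minimize f(x) = 0.5 x^T A x - b^T x over the box D = [0, 1]^2 via (B.31)-(B.32).
A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, 4.0])
grad = lambda x: A @ x - b
M = np.linalg.eigvalsh(A).max()         # Lipschitz constant of the gradient
delta = 1.0 / M                         # any 0 < delta < 2/M guarantees convergence

x = np.zeros(2)
for _ in range(500):
    x = np.clip(x - delta * grad(x), 0.0, 1.0)   # Pi_D is componentwise clipping

# The limit x* = (0, 1) satisfies (B.26): grad f(x*) = (0, -2), so
# (z - x*)^T grad f(x*) = -2*(z_2 - 1) >= 0 for every z in the box.
```

At the limit point, T(x*) = x*, illustrating part (ii) of Theorem B.48.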
Finally, we point out that if f : D → R is strongly convex (see Sect. B.2.1), then the rate of convergence is geometric (Definition 6.27). The proof can be found in [162].

B.4.3 Basics of Lagrangian Optimization Theory

In this section, we provide some fundamental definitions and results from Lagrangian optimization theory. The material presented here is covered in great detail in [183], [181], [146], [194]. The framework of Lagrangian optimization is used in the book mainly in Sects. 6.7.1, 5.2.7 and 6.8. The Lagrangian function (or, in short, Lagrangian) L : R^n × R^L × R^J → R of the problem (B.21) is defined to be

    L(x, λ, μ) = f(x) + Σ_{k∈L} λ_k h_k(x) + Σ_{k∈J} μ_k g_k(x).    (B.33)

The (vector) variables λ = (λ_1, …, λ_L) ∈ R^L and μ = (μ_1, …, μ_J) ∈ R^J are referred to as dual variables.

Definition B.50 (Kuhn–Tucker conditions). Suppose that f, h_k, k ∈ L, and g_k, k ∈ J, in (B.21) are differentiable functions. Then, the set of the inequalities

    ∇_x L(x*, λ*, μ*) = 0
    h_k(x*) ≤ 0,    k ∈ L
    g_k(x*) = 0,    k ∈ J
    λ*_k h_k(x*) = 0,    k ∈ L    (complementary slackness)
    λ* ≥ 0    (B.34)

with (x*, λ*, μ*) ∈ R^n × R^L × R^J (if it exists) is referred to as the Kuhn–Tucker conditions for the problem (B.21). The vectors λ* and μ* are called Lagrange multipliers for the problem (B.21). Any point (vector) (x*, λ*, μ*) ∈ R^n × R^L × R^J that satisfies the Kuhn–Tucker conditions is called a Kuhn–Tucker point of the problem (B.21).

Note that, according to the definition of the Lagrangian function and its variables, the Lagrange multipliers λ*, μ* correspond to particular values of the dual variables which satisfy the Kuhn–Tucker conditions together with some value x* of the primal variable. In order to formulate optimality conditions for x ∈ R^n based on the Lagrangian function, it is crucial that the functions h_k, k ∈ L, and g_k, k ∈ J, satisfy a constraint qualification (condition) at x. For the general problem formulation (B.21), there are several non-equivalent versions of constraint qualification, such as the Kuhn–Tucker constraint qualification, the weak (or modified) Arrow–Hurwicz–Uzawa constraint qualification or Slater's condition. The last one is probably the best known, but at the same time also the most restrictive one, as it requires the functions h_k, k ∈ L, to be convex and g_k, k ∈ J, to be linear [183].
Definition B.51 (Kuhn–Tucker constraint qualification). Let x ∈ D be arbitrary and suppose that h_k, k ∈ L, and g_k, k ∈ J, are differentiable at x. Define y ∈ R^n to be any vector such that

    (∇h_k(x))^T y ≤ 0 for all k ∈ L(x),    (∇g_k(x))^T y = 0 for all k ∈ J.

Then, the Kuhn–Tucker constraint qualification is said to be satisfied at x if there exists a function h : [0, 1] → R^n, differentiable at t = 0, such that h(t) ∈ D, t ∈ [0, 1], h(0) = x and h'(0) = a y for some a > 0.

There is a well-known sufficient condition for the constraint qualification for differentiable maps h_k, g_k. Sometimes, this condition is even declared to be the definition of constraint qualification [184].

Lemma B.52. Let x ∈ D be arbitrary, and let h_k, k ∈ L, and g_k, k ∈ J, be differentiable. Then, the constraint qualification is satisfied at x if the vectors ∇g_k(x), k ∈ J, and ∇h_k(x), k ∈ L(x), are linearly independent.

Theorem B.53 (Kuhn–Tucker theorem). Suppose that the problem (B.21) is globally solvable for some differentiable functions f, h_k, k ∈ L, and g_k, k ∈ J. Then, if (x*, λ*, μ*) ∈ R^n × R^L × R^J satisfies the Kuhn–Tucker conditions (B.34), then x* is a global optimizer of (B.21). If the problem (B.21) is locally solvable and the constraint qualification is satisfied at a local optimizer x* ∈ R^n of (B.21), then x* satisfies the Kuhn–Tucker conditions for some (λ*, μ*) ∈ R^L × R^J.

Another interesting notion related to the Lagrangian function is the following.

Definition B.54 (Strict Complementarity). Let (x*, λ*, μ*) ∈ R^n × R^L × R^J satisfy the Kuhn–Tucker conditions (B.34). We say that strict complementarity is satisfied at (x*, λ*, μ*) if λ*_k > 0, k ∈ L(x*).

In order to illustrate the meaning of strict complementarity, assume that the problem (B.21) is perturbed in the sense that the constraints h_k(x) ≤ 0, k ∈ L, are loosened according to h_k(x) ≤ δ_k, k ∈ L, with δ = (δ_1, …, δ_L) ∈ R^L_+. Now the Lagrange multiplier λ*_k corresponds to the sensitivity of the optimum value p*(δ) of such a perturbed problem as a function of δ_k ≥ 0; precisely [16],

    λ*_k = − ∂p*(δ)/∂δ_k |_{δ=0},    k ∈ L.

Thus, strict complementarity at (x*, λ*, μ*) ∈ R^n × R^L × R^J means that inequality constraints that are tight at (x*, λ*, μ*) are relevant (or nontrivial) in the sense that the relaxation of these constraints provides an improvement in the achievable optimum value. In addition to the Kuhn–Tucker conditions, there is great interest in Second-Order Sufficiency Conditions (SOSC), which are sufficient conditions for x ∈ R^n to be a local minimizer of the problem (B.21). In this book, we utilize the SOSC only for inequality constrained problems.
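The sensitivity interpretation can be checked on a one-dimensional toy problem (our own example, added for illustration): minimize f(x) = x² subject to h(x) = 1 − x ≤ 0. The Kuhn–Tucker conditions give x* = 1 and λ* = 2, the constraint is active with λ* > 0 (strict complementarity), and λ* equals minus the derivative of the perturbed optimum p*(δ):

```python
# Toy problem: minimize f(x) = x^2 subject to h(x) = 1 - x <= 0.
# Stationarity 2x - lam = 0 and complementary slackness force x* = 1, lam* = 2.
f = lambda x: x * x

def p_star(delta):
    """Optimum of the perturbed problem h(x) <= delta, i.e. x >= 1 - delta."""
    return f(max(0.0, 1.0 - delta))   # project the unconstrained minimizer 0

x_star, lam_star = 1.0, 2.0
assert abs(2.0 * x_star - lam_star) < 1e-12   # stationarity of the Lagrangian
assert lam_star * (1.0 - x_star) == 0.0       # complementary slackness

eps = 1e-6
sens = (p_star(eps) - p_star(0.0)) / eps      # forward difference of p*(delta)
# sens ~ -2 = -lam*, matching the sensitivity formula above
```

Loosening the active constraint by δ lowers the optimum by roughly λ*δ, which is exactly why constraints with λ* > 0 are "relevant".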
Definition B.55. If J = ∅, the Second-Order Sufficiency Conditions (SOSC) are said to be satisfied at a stationary point (x, λ) ∈ R^n × R^L (or simply at x ∈ R^n) of the Lagrangian of the problem (B.21) if and only if

(i) (x, λ) satisfies the Kuhn–Tucker conditions, and
(ii) y^T ∇²_x L(x, λ) y > 0 for every y ≠ 0 satisfying

    (∇h_k(x))^T y = 0,    k ∈ L(x) ∩ {k ∈ L : λ_k > 0}
    (∇h_k(x))^T y ≤ 0,    k ∈ L(x) ∩ {k ∈ L : λ_k = 0}.
The SOSC are of immense importance in the development and analysis of locally convergent iterations for nonconvex optimization problems. Precisely, they distinguish the local minimizers of the problem from other stationary points of the Lagrangian. Special primal-dual algorithms can be designed to converge to points satisfying the SOSC and to avoid convergence to other points, which do not necessarily correspond to local problem solutions [176] (see also Sect. 6.8).

Lagrangian Duality

In what follows, let us restrict ourselves to inequality constraints only (J = ∅) and, given the Lagrangian (B.33), define the function

    g(λ) = inf_{x∈R^n} L(x, λ),    λ ∈ R^L_+,    (B.35)

which is commonly referred to as the dual function (of the problem (B.21) or Lagrangian (B.33)). Note here that the dual function defined by (B.35) exists regardless of the properties of the functions f and h_k, k ∈ L. On the other hand, if the alternative definition g(λ) = min_{x∈R^n} L(x, λ) is used, the function is defined only for those values λ ∈ R^L_+ for which the unconstrained minimum min_{x∈R^n} L(x, λ) exists (see for instance [182]). By the properties of the infimum operator (see, e.g., [186], [11] and App. B.2), it is readily observed that the dual function g is concave on R^L_+ for any functions f and h_k, k ∈ L, incorporated in the Lagrangian.

Definition B.56. Given (B.35), the optimization problem

    sup_{λ∈R^L_+} g(λ)

is referred to as the dual problem of the problem (B.21), and

    δ = inf_{x∈D} f(x) − sup_{λ∈R^L_+} g(λ)

is said to be the duality gap of the problem (B.21).
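For a small convex problem the dual function and the duality gap can be computed in closed form and cross-checked numerically. In the following sketch (our own toy example, added for illustration), we minimize f(x) = (x − 3)² subject to h(x) = x − 1 ≤ 0; the dual function is g(λ) = 2λ − λ²/4, maximized at λ* = 4, and the gap is zero, as expected for a convex problem satisfying a constraint qualification:

```python
import numpy as np

# Primal: minimize f(x) = (x - 3)^2 subject to h(x) = x - 1 <= 0; optimum x* = 1.
# Lagrangian L(x, lam) = (x - 3)^2 + lam*(x - 1); the infimum over x is attained
# at x = 3 - lam/2 and yields the concave dual function g(lam) = 2*lam - lam^2/4.
g = lambda lam: 2.0 * lam - lam ** 2 / 4.0

lams = np.linspace(0.0, 10.0, 100001)
dual_opt = g(lams).max()       # attained at lam* = 4 with g(4) = 4
primal_opt = (1.0 - 3.0) ** 2  # f(x*) = f(1) = 4
gap = primal_opt - dual_opt    # zero duality gap (strong duality)
```

Dropping convexity generally destroys this property, leaving only the weak-duality inequality δ ≥ 0 discussed next.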
In the context of the duality relations from Definition B.56, the original problem (B.21) is frequently referred to as the primal problem. Thus, in simple words, the duality gap is the difference between the values achieved by the solutions of the primal and dual problems. As it is easily observed that

    sup_{λ∈R^L_+} L(x, λ) = f(x) if x ∈ D,  and  sup_{λ∈R^L_+} L(x, λ) = ∞ if x ∉ D,    (B.36)

the duality gap can also be written as

    δ = inf_{x∈R^n} sup_{λ∈R^L_+} L(x, λ) − sup_{λ∈R^L_+} inf_{x∈R^n} L(x, λ).

By (B.37) in the next subsection, it is evident that δ ≥ 0. Such a general property is frequently referred to as weak Lagrangian duality and is satisfied regardless of the functions f and h_k, k ∈ L, incorporated in (B.33). Moreover, it can be shown that δ = 0 (no duality gap, or strong duality) if (B.21) is a convex problem and some constraint qualification is satisfied [186]. Since f and h_k are both convex in such a case, the function (B.36) is a convex function of x ∈ R^n as well, and the dual function λ ↦ inf_{x∈R^n} L(x, λ) is in general a concave function of the dual variable λ ∈ R^L_+. These properties imply the existence of a unique saddle point (x*, λ*) ∈ R^n × R^L_+ such that L(x*, λ*) = min_{x∈R^n} sup_{λ∈R^L_+} L(x, λ) and L(x*, λ*) = max_{λ∈R^L_+} inf_{x∈R^n} L(x, λ). Further issues related to general saddle points are addressed in the next subsection. Finally, it has to be emphasized that one can also construct nonconvex problems with strong duality (no duality gap) (see [177]).

B.4.4 Saddle Points, Saddle Functions, Min-Max Functions

In this section, we consider a continuous function f : D_X × D_Z → R where D_X ⊆ R^n, n ≥ 1, and D_Z ⊆ R^m, m ≥ 1, are some nonempty sets. Suppose that the objective is either to minimize sup_{z∈Z} f(x, z) over X or to maximize inf_{x∈X} f(x, z) over Z, where X and Z are given nonempty subsets of D_X and D_Z, respectively. This sort of min-max and max-min problems is investigated in great detail in [186], [195], [196]. Such problems and the related notions of saddle points/values and functions are used in this book in Sects. 1.2.5, 5.9 and 6.8. It is easy to show that

    sup_{z∈Z} inf_{x∈X} f(x, z) ≤ inf_{x∈X} sup_{z∈Z} f(x, z)    (B.37)

holds in general, that is, for any choice of the function f : D_X × D_Z → R as well as the sets X ⊆ D_X and Z ⊆ D_Z [186].
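Inequality (B.37) is easy to verify exhaustively when X and Z are finite grids, in which case f is just a matrix F with f(x, z) = F[x, z]. A small randomized sketch (added for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(5, 7))      # f(x, z) = F[x, z]; rows index X, columns index Z

max_min = F.min(axis=0).max()    # sup_z inf_x f(x, z)
min_max = F.max(axis=1).min()    # inf_x sup_z f(x, z)
# (B.37): max_min <= min_max holds for any matrix F
```

Rerunning with any seed never produces a violation, reflecting that (B.37) requires no structure on f at all.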
Definition B.57 (Saddle Value). If there is equality in (B.37), then the value of (B.37) is called the saddle value of f with respect to X × Z.

Inequality (B.37) is frequently referred to as the max-min min-max inequality. It was stated originally by Ky Fan for a certain function class, and currently far-reaching generalizations of it are known [196]. For the saddle value to exist, it is sufficient that there exists a saddle point defined as follows.

Definition B.58 (Saddle Point). A pair (x̃, z̃) ∈ X × Z is said to be a saddle point of f (with respect to X × Z) if

    f(x̃, z) ≤ f(x̃, z̃) ≤ f(x, z̃)    (B.38)

for all (x, z) ∈ X × Z. Furthermore, we say that a saddle point (x̃, z̃) is isolated over X (respectively, Z) if the second (respectively, first) inequality in (B.38) is strict.

Theorem B.59. A pair (x̃, z̃) ∈ X × Z is a saddle point of f with respect to X × Z if and only if

    max_{z∈Z} f(x̃, z) = min_{x∈X} max_{z∈Z} f(x, z) = max_{z∈Z} min_{x∈X} f(x, z) = min_{x∈X} f(x, z̃).    (B.39)

It is important to note the subtlety that even if f has a saddle point with respect to X × Z, this does not mean that there exists a saddle point of f with respect to D_X × D_Z. Clearly, X, Z in Definition B.58 and Theorem B.59 are allowed to be vanishingly small and represent an ε-neighborhood X × Z = B_ε((x̃, z̃)). A sufficient condition for (x̃, z̃) ∈ X × Z to be a saddle point (B.38) is that x ↦ f(x, z̃), x ∈ X, is a convex function and z ↦ f(x̃, z), z ∈ Z, is concave. By Theorem B.24, this sufficient condition can be expressed as

    ∇²_x f(x, z̃) ⪰ 0, x ∈ X,    ∇²_z f(x̃, z) ⪯ 0, z ∈ Z.

Definition B.60 (Saddle Function (Convex-Concave Function)). f : D_X × D_Z → R is said to be a saddle function or convex-concave function (in x ∈ D_X, z ∈ D_Z) if f(x, z) is convex with respect to x ∈ D_X and concave in z ∈ D_Z. Analogously, f is a concave-convex function if −f is convex-concave.

Recall from Definition B.22 that the convexity or concavity property already implies that D_X and D_Z, respectively, are convex sets [186]. Strict convex-concavity is an obvious extension of Definition B.60, which requires strict convexity and strict concavity of f as a function of x ∈ D_X and z ∈ D_Z, respectively. By Theorem B.24, a twice Fréchet-differentiable function f : D_X × D_Z → R is convex-concave if and only if

    ∇²_x f(x, z) ⪰ 0,    ∇²_z f(x, z) ⪯ 0,    (x, z) ∈ D_X × D_Z.    (B.40)

Under strict convex-concavity, the inequalities (B.40) are strict and represent only a sufficient condition (recall Sect. B.2). The following central property of a convex-concave function can be concluded from the discussed conditions for a saddle point.
Theorem B.61. If a function f : D_X × D_Z → R is convex-concave (in x ∈ D_X, z ∈ D_Z), then it has either no stationary points or only saddle points (x̄, z̄) ∈ D_X × D_Z such that

    f(x̄, z̄) = max_{z∈D_Z} min_{x∈D_X} f(x, z) = min_{x∈D_X} max_{z∈D_Z} f(x, z),    (B.41)

where (x̄, z̄) is unique if f is strictly convex-concave.

The concept of a min-max function was introduced in [195]. The class of min-max functions is strongly related to the class of convex-concave functions.⁴

Definition B.62. We say that a function f : D_X × D_Z → R is a min-max function of x ∈ D_X, z ∈ D_Z if f is twice differentiable and

    ∇²_x f(x, z) − ∇²_{x,z} f(x, z)^T ( ∇²_z f(x, z) )^{−1} ∇²_{x,z} f(x, z) ⪰ 0,
    ∇²_z f(x, z) ≺ 0,    (x, z) ∈ D_X × D_Z.

A max-min function is defined analogously, by exchanging the roles of the Hessians ∇²_x f(x, z) and ∇²_z f(x, z) in Definition B.62. The definition of a strictly min-max function is a straightforward extension of Definition B.62 which requires strictness of the first inequality. A min-max function has the following key property.

Theorem B.63. If f : D_X × D_Z → R is a min-max function of x ∈ D_X, z ∈ D_Z, then it has either no stationary points or only min-max points (x̄, z̄) ∈ D_X × D_Z satisfying

    f(x̄, z̄) = min_{x∈D_X} max_{z∈D_Z} f(x, z),    (B.42)

where (x̄, z̄) is unique if f is a strictly min-max function.

It is readily seen that the class of min-max functions generalizes/includes the class of twice differentiable convex-concave functions for which the second inequality in (B.40) is strict (according to App. B.2, it does not, however, generalize the class of twice differentiable convex-concave functions which are strictly convex in x ∈ D_X). Consequently, a min-max point (B.42) becomes a saddle point (B.41) if f is also strictly convex-concave.

⁴ It has to be noted that there is a slight difference between the notion introduced in [195] and its version utilized in this book and defined below.
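As a closing numerical illustration (added here; not from the original text), f(x, z) = x² − z² is strictly convex-concave on R × R with the saddle point (x̃, z̃) = (0, 0); on a finite grid, the min-max and max-min values in (B.39)/(B.41) coincide with the saddle value f(0, 0) = 0:

```python
import numpy as np

f = lambda x, z: x ** 2 - z ** 2           # strictly convex in x, concave in z
xs = np.linspace(-1.0, 1.0, 201)
zs = np.linspace(-1.0, 1.0, 201)
F = f(xs[:, None], zs[None, :])            # F[i, j] = f(xs[i], zs[j])

min_max = F.max(axis=1).min()              # min_x max_z f(x, z)
max_min = F.min(axis=0).max()              # max_z min_x f(x, z)
# both equal f(0, 0) = 0, the saddle value of (B.39)/(B.41)
```

One can also check (B.38) directly on the grid: f(0, z) = −z² ≤ 0 ≤ x² = f(x, 0) for all grid points.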
References
1. Tang A, Wang J, Low S. Is Fair Allocation Always Inefficient? In: Proc. 23rd IEEE Conference on Computer Communications (INFOCOM), Hong Kong; 2004.
2. Bertsekas DP, Gallager RG. Data Networks. Prentice-Hall, Englewood Cliffs; 1992.
3. Polyak BT, Juditsky AB. Acceleration of Stochastic Approximation by Averaging. SIAM J Control Optim. 1992;30:838–855.
4. Seneta E. Non-Negative Matrices and Markov Chains. Springer, Berlin; 1981.
5. Horn RA, Johnson CR. Matrix Analysis. Cambridge University Press; 1985.
6. Gantmacher FR. Matrizentheorie. Springer, Berlin; 1986. (German translation of the Russian original).
7. Serre D. Matrices: Theory and Applications. Springer, Berlin; 2001.
8. Arnold L, Gundlach V, Demetrius L. Evolutionary Formalism for Products of Positive Random Matrices. Ann Appl Probab. 1994;4(3):859–901.
9. Horn RA, Johnson CR. Topics in Matrix Analysis. Cambridge University Press; 1991.
10. Rudin W. Real and Complex Analysis. 3rd ed. McGraw-Hill, New York; 1987.
11. Hiriart-Urruty JB, Lemarechal C. Fundamentals of Convex Analysis. Springer, Berlin; 2001.
12. Ahlswede R, Gacs P. Spreading of Sets in Product Spaces and Hypercontraction of the Markov Operator. Ann Prob. 1976;4(6):925–939.
13. Cover TM, Thomas JA. Elements of Information Theory. Wiley Series in Telecommunications. John Wiley & Sons, Inc.; 1991.
14. Lesniewski A, Ruskai MB. Monotone Riemannian Metrics and Relative Entropy on Non-Commutative Probability Spaces. J Math Phys. 1999;40:5702–5724.
15. Dembo A, Zeitouni O. Large Deviations Techniques and Applications. 2nd ed. Applications of Mathematics: Stochastic Modelling and Applied Probability. Springer, Berlin; 1998.
16. Boyd S, Vandenberghe L. Convex Optimization. Cambridge University Press; 2004.
17. van Lint JH, Wilson RM. A Course in Combinatorics. Cambridge University Press; 1994.
18. Cvetkovic DM, Doob M, Sachs H. Spectra of Graphs. Academic Press, New York; 1978.
19. Boche H, Wiczanowski M, Stanczak S. Unifying View on Min-Max Fairness, Max-Min Fairness, and Utility Optimization in Cellular Networks. EURASIP J Wireless Commun and Net. 2007;ID 34869.
20. Boche H, Stanczak S. On Systems of Linear Equations with Nonnegative Coefficients. Appl Alg Eng Comm Comp. 2004 March;14(6):397–414.
21. Boche H, Stanczak S. Convexity of Some Feasible QoS Regions and Asymptotic Behavior of the Minimum Total Power in CDMA Systems. IEEE Trans Commun. 2004 December;52(12):2190–2197.
22. Boche H, Stanczak S. Log-Convexity of the Minimum Total Power in CDMA Systems with Certain Quality-Of-Service Guaranteed. IEEE Trans Inform Theory. 2005 January;51(1):374–381.
23. Stanczak S, Boche H. The Infeasible SIR Region Is Not a Convex Set. IEEE Trans Commun. 2006 November;54(11):1905–1907.
24. Stanczak S, Boche H. Towards a better understanding of the QoS tradeoff in multiuser multiple antenna systems. In: Smart Antennas – State-of-the-Art. EURASIP Book Series on Signal Processing and Communications. Hindawi Publishing Corporation; 2005. p. 521–543.
25. Stanczak S, Boche H, Wiczanowski M. Towards a Better Understanding of Medium Access Control for Multiuser Beamforming Systems. In: Proc. IEEE Wireless Communications and Networking Conference (WCNC). New Orleans, LA, USA; 2005.
26. Imhof L, Mathar R. Capacity Regions and Optimal Power Allocation for CDMA Cellular Radio. IEEE Trans Inform Theory. 2005 June;51(6):2011–2019.
27. Imhof L, Mathar R. The Geometry of the Capacity Region for CDMA Systems with General Power Constraints. IEEE Trans Wireless Commun. 2005 Sept;4(5).
28. Stanczak S, Boche H. Strict Log-Convexity of the Minimum Power Vector. In: Proc. IEEE International Symposium on Information Theory (ISIT). Seattle, WA, USA; 2006.
29. Kirkland SJ, Neumann M, Ormes N, Xu J. On the Elasticity of the Perron Root of a Nonnegative Matrix. SIAM J Matrix Anal Appl.
2002;24(2):454–464.
30. Friedland S, Karlin S. Some Inequalities for the Spectral Radius of Non-Negative Matrices and Applications. Duke Math J. 1975;42(3):459–490.
31. Kingman JFC. A Convexity Property of Positive Matrices. Quart J Math Oxford Ser. 1961;12(2):283–284.
32. Catrein D, Imhof L, Mathar R. Power Control, Capacity, and Duality of Up- and Downlink in Cellular CDMA Systems. IEEE Trans Commun. 2004 Oct;52(10):1777–1785.
33. Cohen JE, Friedland S, Kato T, Kelly FP. Eigenvalue Inequalities for Products of Matrix Exponentials. Linear Algebra Appl. 1982;45:55–95.
34. Friedland S. Convex Spectral Functions. Linear Multilin Alg. 1981;9:293–316.
35. Cohen JE. Random Evolutions and the Spectral Radius of a Non-Negative Matrix. Math Proc Cambridge Philos Soc. 1979;86:345–350.
36. Deutsch E, Neumann M. Derivatives of the Perron Root at an Essentially Nonnegative Matrix and the Group Inverse of an M-Matrix. J Math Anal Appl. 1984;102:1–29.
References
37. Elsner L. On Convexity Properties of the Spectral Radius of Nonnegative Matrices. Linear Algebra Appl. 1984;61:31–35. 38. Elsner L. Über Eigenwerteinschließungen mit Hilfe von Gerschgorin-Kreisen. Z Angew Math Mech. 1970;50:381–384. 39. Rudin W. Principles of Mathematical Analysis. McGraw-Hill, New York; 1976. 40. Goldsmith AJ, Wicker SB. Design Challenges for Energy-Constrained Ad-Hoc Wireless Networks. IEEE Wireless Commun Mag. 2002 August;9:8–27. 41. Kandukuri S, Boyd S. Optimal Power Control in Interference-Limited Fading Wireless Channels with Outage-Probability Specifications. IEEE Trans Wireless Commun. 2002 Jan;1(1):46–55. 42. Papandriopoulos J, Evans J, Dey S. Optimal Power Control for Rayleigh-Faded Multiuser Systems with Outage Constraints. IEEE Trans Wireless Commun. 2005 Nov;4(6):2705–2715. 43. Papandriopoulos J, Evans J, Dey S. Outage-Based Optimal Power Control for Generalized Multiuser Fading Channels. IEEE Trans Commun. 2006 April;54(4):693–703. 44. Borst SC, Whiting PA. Dynamic Channel-Sensitive Scheduling Algorithms for Wireless Data Throughput Optimization. IEEE Trans Veh Technol. 2003 May;52(3):569–586. 45. Lee JW, Mazumdar RR, Shroff NB. Opportunistic Power Scheduling for Dynamic Multi-Server Wireless Systems. IEEE Trans Wireless Commun. 2006 June;5(6):1506–1515. 46. Neely MJ. Dynamic Power Allocation and Routing for Satellite and Wireless Networks with Time Varying Channels. Massachusetts Institute of Technology, LIDS. Cambridge, MA, USA; 2003. 47. Neely MJ, Lee CP, Modiano E. Fairness and Optimal Stochastic Control for Heterogeneous Networks. In: Proc. 24th IEEE Conference on Computer Communications (INFOCOM). Miami, FL, USA; 2005. 48. Stolyar AL. Maximizing Queueing Network Utility Subject to Stability. Queueing Systems. 2005 Aug;50(4):401–457. 49. Aein JM. Power Balancing in Systems Employing Frequency Reuse. COMSAT Tech Rev. 1973;3(2):277–300. 50. Meyerhoff HJ. Method for Computing the Optimum Power Balance in Multibeam Satellites.
COMSAT Tech Rev. 1974;4(1):139–146. 51. Alavi H, Nettleton RW. Downstream Power Control for a Spread Spectrum Cellular Mobile Radio System. In: Proc. IEEE Global Communications Conference (GLOBECOM); 1982. p. 84–88. 52. Zander J. Distributed Cochannel Interference Control in Cellular Radio Systems. IEEE Trans Veh Technol. 1992 August;41:305–311. 53. Zander J. Performance of Optimum Transmitter Power Control in Cellular Radio Systems. IEEE Trans Veh Technol. 1992 February;41(1):57–62. 54. Foschini GJ, Miljanic Z. A Simple Distributed Autonomous Power Control Algorithm and its Convergence. IEEE Trans Veh Technol. 1993 Nov;42(4):641–646. 55. Yates RD. A Framework for Uplink Power Control in Cellular Radio Systems. IEEE J Select Areas Commun. 1995 September;13(7):1341–1347. 56. Yates RD, Huang CY. Integrated Power Control and Base Station Assignment. IEEE Trans Veh Technol. 1995 August;44(3):638–644. 57. Gerlach D, Paulraj A. Base Station Transmitting Antenna Arrays for Multipath Environments. Signal Processing (Elsevier Science). 1996;54:59–73.
58. He B, Wang MZ, Li EC. A new distributed power balancing algorithm for CDMA cellular systems. In: Proc. IEEE International Symposium on Circuits and Systems (ISCAS). vol. 3; 1997. p. 1768–1771. 59. Bambos N. Toward Power-Sensitive Network Architectures in Wireless Communications: Concepts, Issues, and Design Aspects. IEEE Personal Commun Mag. 1998 June;5:50–59. 60. Montalbano G, Ghauri I, Slock DTM. Spatio-Temporal Array Processing for CDMA/SDMA Downlink Transmission. In: Proc. Asilomar Conf. on Signals, Systems and Computers, Monterey, CA, USA; 1998. p. 1337–1341. 61. Hongyu W, Aiging H, Rong H, Weikang G. Balanced distributed power control. In: IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). vol. 2; 2000. p. 1415–1419. 62. Wu Q. Optimum Transmitter Power Control in Cellular Systems with Heterogeneous SIR Thresholds. IEEE Trans Veh Technol. 2000 July;49(4):1424–1429. 63. Bambos N, Chen SC, Pottie GJ. Channel Access Algorithms with Active Link Protection for Wireless Communication Networks with Power Control. IEEE/ACM Trans Networking. 2000 October;8(5):583–597. 64. Zander J, Kim SL. Radio Resource Management for Wireless Networks. Artech House, Boston, London; 2001. 65. ElBatt T, Ephremides A. Joint Scheduling and Power Control for Wireless Ad Hoc Networks. IEEE Trans Wireless Commun. 2004 January;3(1):74–85. 66. Feiten A, Mathar R. Optimal Power Control for Multiuser CDMA Channels. In: Proc. 2005 IEEE International Symposium on Information Theory (ISIT). Adelaide, Australia; 2005. 67. Cruz RL, Santhanam AV. Optimal Routing, Link Scheduling and Power Control in Multi-hop Wireless Networks. In: Proc. 22nd IEEE Conference on Computer Communications (INFOCOM), San Francisco, CA, USA; 2003. 68. Hanly SV. An algorithm for combined cell-site selection and power control to maximize cellular spread spectrum capacity. IEEE Journal on Selected Areas in Communications. 1995;13(7):1332–1340. 69. Kelly FP, Maulloo AK, Tan DKH.
Rate Control for Communication Networks: Shadow Prices, Proportional Fairness and Stability. J Oper Res Soc. 1998 March;49(3):237–252. 70. Mo J, Walrand J. Fair End-to-End Window-Based Congestion Control. IEEE/ACM Trans on Networking. 2000 October;8(5):556–567. 71. Goodman D, Mandayam N. Power Control for Wireless Data. IEEE Personal Commun Mag. 2000 April;7:48–54. 72. Saraydar CU, Mandayam NB, Goodman DJ. Pricing and Power Control in a Multicell Wireless Data Network. IEEE J Select Areas Commun. 2001;19(10):1883–1892. 73. Xiao M, Shroff NB, Chong EKP. A Utility-Based Power Control Scheme in Wireless Cellular Systems. IEEE/ACM Trans Networking. 2003 April;11(2):210–221. 74. Johansson M, Xiao L, Boyd S. Simultaneous routing and resource allocation in CDMA wireless data networks. In: Proc. IEEE International Conference on Communications, Anchorage, Alaska; 2003. 75. O'Neill D, Julian D, Boyd S. Seeking Foschini's Genie: Optimal Rates and Powers in Wireless Networks. IEEE Trans Veh Technol. 2003 (accepted); Available from: http://www.stanford.edu/boyd/.
76. Chiang M. To Layer or Not To Layer: Balancing Transport and Physical Layers in Wireless Multihop Networks. In: Proc. 23rd IEEE Conference on Computer Communications (INFOCOM), Hong Kong; 2004. 77. Feng N, Mau SC, Mandayam NB. Pricing and Power Control for Joint Network-Centric and User-Centric Radio Resource Management. IEEE Trans Commun. 2004;52(9):1547–1557. 78. Chiang M. Balancing Transport and Physical Layers in Wireless Multihop Networks: Jointly Optimal Congestion Control and Power Control. IEEE J Select Areas Commun. 2005 Jan;23(1):104–116. 79. Price J, Javidi T. Decentralized Rate Assignments in a Multi-Sector CDMA Network. IEEE Trans Wireless Commun. 2006 Dec;5(12):3537–3547. 80. Subramanian A, Sayed AH. Joint Rate and Power Control Algorithms for Wireless Networks. IEEE Trans Signal Processing. 2005 Nov;53(11):4204–4214. 81. Palomar DP, Chiang M. A Tutorial on Decomposition Methods for Network Utility Maximization. IEEE J Select Areas Commun. 2006 Aug;24(8):1439–1451. 82. Stanczak S, Wiczanowski M, Boche H. Distributed Utility-Based Power Control: Objectives and Algorithms. IEEE Trans Signal Processing. 2007 Oct;55(10):5058–5068. 83. Hande P, Rangan S, Chiang M, Wu X. Distributed Uplink Power Control for Optimal SIR Assignment in Cellular Data Networks. IEEE/ACM Trans Networking. 2008 Dec;16(6):1420–1433. 84. Wiczanowski M, Stanczak S, Boche H. Providing quadratic convergence of decentralized power control in wireless networks – The method of min-max functions. IEEE Trans Signal Processing. 2008 Aug;56(8):4053–4068. 85. Huang J, Subramanian VG, Agrawal R, Berry R. Joint Scheduling and Resource Allocation in Uplink OFDM Systems for Broadband Wireless Access Networks. IEEE J Select Areas Commun. 2009 Feb;27(2):288–296. 86. Berry R, Liu P, Honig M. Design and Analysis of Downlink Utility-Based Schedulers. In: Proc. 40th Annual Allerton Conference on Communication, Control and Computing; 2002. 87. Chen L, Low S, Chiang M, Doyle J.
Optimal Cross-Layer Congestion Control, Routing and Scheduling Design in Ad Hoc Wireless Networks. In: Proc. 25th IEEE Conference on Computer Communications (INFOCOM), Barcelona; 2006. 88. Georgiadis L, Neely MJ, Tassiulas L. Resource Allocation and Cross-Layer Control in Wireless Networks. Now Publishers Inc.; 2006. 89. Huang J, Berry R, Honig ML. A Game Theoretic Analysis of Distributed Power Control for Spread Spectrum Ad Hoc Networks. In: Proc. IEEE International Symposium on Information Theory (ISIT). Adelaide, Australia; 2005. 90. Huang J, Berry R, Honig ML. Distributed Interference Compensation for Wireless Networks. IEEE J Select Areas Commun. 2006 May;24(5):1074–1084. 91. Huang J, Berry R, Honig ML. Distributed Interference Compensation for Multi-Channel Wireless Networks. In: Proc. 43rd Annual Allerton Conference on Communication, Control and Computing. Monticello, IL, USA; 2005. 92. Alpcan T, Basar T, Srikant R, Altman E. CDMA uplink power control as a noncooperative game. Wireless Networks. 2002 Nov;8(6):659–670. 93. Saraydar CU, Mandayam NB, Goodman DJ. Efficient Power Control via Pricing in Wireless Data Networks. IEEE Trans Commun. 2002;50(2):291–303.
94. Helleseth T, Kumar PV. Sequences with Low Correlation. In: Handbook of Coding Theory. vol. 2. Elsevier Science, Amsterdam; 1998. p. 1765–1853. 95. Simon MK, Alouini MS. Digital Communication over Fading Channels: A Unified Approach to Performance Analysis. John Wiley & Sons, Inc.; 2000. 96. Proakis JG. Digital Communications. 3rd ed. McGraw Hill, New York; 1995. 97. Verdu S. Multiuser Detection. Cambridge University Press; 1998. 98. Tse D, Viswanath P. Fundamentals of Wireless Communication. 1st ed. Cambridge University Press; 2005. 99. IEEE Standard for Local and Metropolitan Area Networks Part 16: Air Interface for Fixed Broadband Wireless Access Systems. IEEE Std 802.16-2004 (Revision of IEEE Std 802.16-2001). 2004;p. 1–857. 100. Franks LE. Signal Theory. Prentice-Hall, Englewood Cliffs; 1969. 101. Tse D, Hanly S. Linear Multiuser Receivers: Effective Interference, Effective Bandwidth and User Capacity. IEEE Trans Inform Theory. 1999 March;45(2):641–657. 102. Viswanath P, Anantharam V, Tse DNC. Optimal Sequences, Power Control, and User Capacity of Synchronous CDMA Systems with Linear MMSE Multiuser Receivers. IEEE Trans Inform Theory. 1999 September;45(6):1968–1983. 103. Viswanath P, Anantharam V. Optimal Sequences and Sum Capacity of Synchronous CDMA Systems. IEEE Trans Inform Theory. 1999 September;45(6):1984–1991. 104. Verdu S. Spectral Efficiency of CDMA with Random Spreading. IEEE Trans Inform Theory. 1999 March;45(2):622–640. 105. Rasmussen LK, Lim TJ, Johansson AL. A Matrix Algebraic Approach to Successive Interference Cancellation in CDMA. IEEE Trans Commun. 2000 Jan;48(1):145–151. 106. Poor HV, Verdu S. Probability of Error in MMSE Multiuser Detection. IEEE Trans Inform Theory. 1997 May;43:858–871. 107. Lancaster P. Theory of Matrices. Academic Press, Inc.; 1969. 108. Han SH, Lee JH. An overview of peak-to-average power ratio reduction techniques for multicarrier transmission. IEEE Wireless Commun Mag. 2005 April;12(2):56–65. 109. Koskie S, Gajic Z.
A Nash Game Algorithm for SIR-Based Power Control for 3G Wireless CDMA Networks. IEEE/ACM Trans Networking. 2005 Oct;13(5). 110. Schubert M, Boche H. Iterative Multiuser Uplink and Downlink Beamforming under SINR Constraints. IEEE Trans Signal Processing. 2005 July;53(7):2324–2334. 111. Boche H, Schubert M. Duality Theory for Uplink and Downlink Multiuser Beamforming. In: Smart Antennas–State-of-the-Art. EURASIP Book Series on Signal Processing and Communications. Hindawi Publishing Corporation; 2005. p. 545–575. 112. Alamouti SM. A Simple Transmit Diversity Technique for Wireless Communications. IEEE Journal on Selected Areas in Communications. 1998 October;SAC-16:1451–1458. 113. Schnurr C, Stanczak S, Sezgin A. The Impact of Different MIMO Strategies on the Network-Outage Performance. In: Proc. ITG/IEEE International Workshop on Smart Antennas (WSA). Wien, Austria; 2007. 114. Massoulie L, Roberts J. Bandwidth Sharing: Objectives and Algorithms. IEEE/ACM Trans on Networking. 2002 June;10(3):320–328.
115. Hahne E. Round-Robin Scheduling for Fair Flow Control in Data Communication Networks. MIT, Department of Electrical Engineering and Computer Science. Cambridge, MA, USA; 1986. 116. Hahne E. Round-robin Scheduling for Max-Min Fairness in Data Networks. IEEE J Select Areas Commun. 1991 Sep;9(7):1024–1039. 117. Charny A, Clark D, Jain R. Congestion Control with Explicit Rate Indication. In: Proc. IEEE International Conference on Communications (ICC); 1995. 118. Ji H, Huang CY. Non-Cooperative Uplink Power Control in Cellular Radio Systems. Wireless Networks. 1998 April;4:233–240. 119. Kozat UC, Koutsopoulos I, Tassiulas L. A Framework for Cross-layer Design of Energy-efficient Communication with QoS Provisioning in Multi-hop Wireless Networks. In: Proc. 23rd IEEE Conference on Computer Communications (INFOCOM), Hong Kong; 2004. 120. Neely MJ, Modiano E, Rohrs CE. Dynamic Power Allocation and Routing for Time Varying Wireless Networks. In: Proc. 22nd IEEE Conference on Computer Communications (INFOCOM), San Francisco, CA, USA; 2003. 121. Low S, Peterson L, Wang L. Understanding TCP Vegas: A Duality Model. Journal of the ACM (JACM). 2002 March;49(2):207–235. 122. Yi Y, Shakkottai S. Hop-by-hop Congestion Control over a Wireless Multihop Network. In: Proc. 23rd IEEE Conference on Computer Communications (INFOCOM), Hong Kong; 2004. 123. Sarkar S, Tassiulas L. End-to-End Bandwidth Guarantees Through Fair Local Spectrum Share in Wireless Ad Hoc Networks. IEEE Trans Automat Contr. 2005 Sept;50(9):1246–1259. 124. Nandagopal T, Kim T, Gao X, Bharghavan V. Achieving MAC Layer Fairness in Wireless Packet Networks. In: Proc. ACM Mobicom, Boston, MA, USA; 2000. p. 87–98. 125. Tassiulas L, Ephremides A. Stability Properties of Constrained Queueing Systems and Scheduling Policies for Maximum Throughput in Multihop Radio Networks. IEEE Trans Automat Contr. 1992 December;37(12):1936–1948. 126. Tassiulas L, Ephremides A.
Jointly optimal routing and scheduling in packet radio networks. IEEE Trans Inform Theory. 1992 January;38(1):165–168. 127. Boche H, Wiczanowski M, Stanczak S. Characterization of Optimal Resource Allocation in Cellular Networks. In: Proc. 5th IEEE Workshop on Signal Processing Advances in Wireless Communications. Lisboa, Portugal; 2004. 128. Stanczak S, Wiczanowski M. Distributed Fair Power Control for Wireless Networks: Objectives and Algorithms. In: Proc. 43rd Annual Allerton Conference on Communications, Control, and Computing; 2005. Invited paper. 129. Mas-Colell A, Whinston MD, Green JR. Microeconomic Theory. Oxford University Press; 1995. 130. Pratt JW. Risk Aversion in the Small and in the Large. Econometrica. 1964;32:300–307. 131. Dianati MD, Shen XS, Naik S. A New Fairness Index for Radio Resource Allocation in Wireless Networks. In: Proc. IEEE Wireless Communications and Networking Conference (WCNC). New Orleans, LA, USA; 2005. 132. Marshall AW, Olkin I. Inequalities: Theory of Majorization and its Applications. New York: Academic; 1979. 133. Mitrinovic DS. Analytic Inequalities. Springer, Berlin; 1970. 134. Verdu S. On Channel Capacity per Unit Cost. IEEE Trans Inform Theory. 1990 Sept;36(5):1019–1030.
135. Verdu S. Recent Results on the Capacity of Wideband Channels in the Low-Power Regime. IEEE Wireless Commun Mag. 2002 August;9:40–45. 136. Schubert M, Boche H. QoS-Based Resource Allocation and Transceiver Optimization. Foundations and Trends in Communications and Information Theory. 2006;2(6). 137. Jorswieck EA, Boche H. Performance Analysis of Capacity of MIMO Systems under Multiuser Interference based on Worst Case Noise Behavior. EURASIP Journal on Wireless Communications and Networking. 2004;2:273–285. 138. Schubert M, Boche H. Advanced Network Calculus–A General Framework for Interference Management and Network Utility Optimization. Lecture Notes in Computer Science. Springer, Berlin; 2009. To appear. 139. Debreu G. Theory of Value. Yale University Press; 1959. 140. Ulukus S, Yates R. Stochastic Power Control for Cellular Radio Systems. IEEE Trans Commun. 1998;46(6):784–798. 141. Luo J, Ulukus S, Ephremides A. Standard and Quasi-Standard Stochastic Power Control Algorithms. IEEE Trans Inform Theory. 2005 July;51(7):2612–2624. 142. Grandhi SA, Zander J, Yates R. Constrained Power Control. Wireless Personal Communications, Kluwer. 1995;2(3):257–270. 143. Kushner HJ, Yin GG. Stochastic Approximation and Recursive Algorithms and Applications. Springer, Berlin; 2003. 144. Robbins H, Monro S. A Stochastic Approximation Method. Ann Math Statist. 1951;22:400–407. 145. Qi L. Convergence Analysis of Some Algorithms for Solving Non-Smooth Equations. Math Oper Res. 1993 Feb;18(1):227–244. 146. Bertsekas DP. Nonlinear Programming. Athena Scientific, Belmont, Massachusetts; 2003. 147. Wunder G, Michel T. On optimization of multiuser systems using interference calculus. In: Proc. 41st Asilomar Conference on Signals, Systems, and Computers. Pacific Grove, CA, USA; 2007. Invited. 148. Ortega JM, Rheinboldt WC. Iterative Solution of Nonlinear Equations in Several Variables. Classics in Applied Mathematics 30, SIAM, Philadelphia; 2000. 149. Boche H, Stanczak S.
Iterative algorithm for finding resource allocation in symbol-asynchronous CDMA channels with different SIR requirements. In: Proc. 36th Asilomar Conference on Signals, Systems, and Computers. Monterey, CA, USA; 2002. 150. Mahdavi-Doost H, Ebrahimi M, Khandani AK. Characterization of Rate Region in Interference Channels with Constrained Power. In: Proc. IEEE International Symposium on Information Theory (ISIT). Nice, France; 2007. 151. Stanczak S, Kaliszan M, Bambos N, Wiczanowski M. A Characterization of Max-Min SIR-Balanced Power Allocation with Applications. In: Proc. IEEE International Symposium on Information Theory (ISIT). Seoul, Korea; 2009. 152. Tan CW, Chiang M, Srikant R. Fast Algorithms and Performance Bounds for Sum Rate Maximization in Wireless Networks. In: Proc. 28th IEEE Conference on Computer Communications (INFOCOM). Rio de Janeiro, Brazil; 2009. 153. Rockafellar RT. Saddle Points and Convex Analysis. In: Kuhn HW, Szego GP, editors. Differential Games and Related Topics. Amsterdam, The Netherlands: North-Holland; 1971. p. 109–127.
154. Neely MJ, Modiano E, Rohrs CE. Power Allocation and Routing in Multibeam Satellites with Time-Varying Channels. IEEE/ACM Trans Networking. 2003 February;11(1):138–152. 155. Stanczak S, Feistel A, Tomecki D. On Utility-Based Power Control and Receive Beamforming. In: Proc. 41st Annual Conference on Information Sciences and Systems (CISS). Baltimore, MD, USA; 2007. 156. Giannakis GB, Hua Y, Stoica P, Tong L, editors. Signal Processing Advances in Wireless and Mobile Communications. Prentice-Hall, Englewood Cliffs; 2000. 157. Fan P, Darnell M. Sequence Design for Communications Applications. Research Studies Press; 1996. 158. Stanczak S, Wunder G, Boche H. On Pilot-based Multipath Channel Estimation for Uplink CDMA Systems: An Overloaded Case. IEEE Trans Signal Processing. 2006 February;54(2):512–519. 159. Boche H, Schubert M. Resource Allocation in Multiantenna Systems - Achieving max-min fairness by optimizing a sum of inverse SIRs. IEEE Trans Signal Processing. 2006 June;54(6). 160. Bertsekas DP, Tsitsiklis JN. Parallel and Distributed Computation. Prentice-Hall, Englewood Cliffs; 1989. 161. Boche H, Stanczak S. Log-Concavity of SIR and Characterization of the Feasible SIR Region for CDMA Channels. In: Proc. 37th Asilomar Conference on Signals, Systems, and Computers. Monterey, CA, USA; 2003. 162. Bertsekas DP. Nonlinear Programming. Athena Scientific, Belmont, Massachusetts; 1995. 163. Wiczanowski M, Stanczak S, Boche H. Distributed Optimization and Duality in QoS Control for Wireless Best-Effort Traffic. In: Proc. 39th Asilomar Conference on Signals, Systems and Computers. Monterey, CA, USA; 2005. 164. Stanczak S, Wiczanowski M, Boche H. Distributed Power Control for Optimizing a Weighted Sum of QoS Parameter Values. In: Proc. IEEE Global Telecommunications Conference (GLOBECOM). St. Louis, MO, USA; 2005. 165. Stanczak S, Wiczanowski M, Boche H. Theory and Algorithms for Resource Allocation in Wireless Networks. Lecture Notes in Computer Science (LNCS 4000).
Springer, Berlin; 2006. 166. Poor HV. An Introduction to Signal Detection and Estimation. Springer, Berlin; 1998. 167. Ljung L. Analysis of Recursive Stochastic Algorithms. IEEE Trans Automat Contr. 1977 August;AC-22(4):551–575. 168. Williams D. Probability with Martingales. Cambridge University Press; 1991. 169. Bertsekas DP. Convex Analysis and Optimization. Athena Scientific; 2003. 170. Stanczak S, Feistel A, Boche H, Wiczanowski M. Towards Efficient and Fair Resource Allocation in Wireless Networks. In: Proc. 6th Intl. Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt). Berlin, Germany; 2008. Keynote speech. 171. Stanczak S, Feistel A, Boche H. QoS Support with Utility-Based Power Control. In: Proc. 2008 IEEE International Symposium on Information Theory (ISIT). Toronto, Canada; 2008. 172. Bazaraa MS, Sherali HD, Shetty CM. Nonlinear Programming. John Wiley & Sons, Inc.; 1993. 173. Wright SJ. Primal-Dual Interior-Point Methods. Philadelphia: Soc. for Industrial and Applied Math. (SIAM); 1997.
174. Hestenes MR. Multiplier and Gradient Methods. Journal of Optimization Theory and Applications. 1969;4:303–320. 175. Bertsekas DP. Combined Primal-Dual and Penalty Methods for Constrained Minimization. SIAM Journal on Control and Optimization. 1975 May;13(3):521–544. 176. Mangasarian OL. Unconstrained Lagrangians in Nonlinear Programming. SIAM Journal on Control and Optimization. 1975 May;13(4):772–791. 177. Rockafellar RT. Augmented Lagrange multiplier functions and duality in nonconvex programming. SIAM Journal on Control and Optimization. 1974;12:268–285. 178. Karmarkar N. A New Polynomial-Time Algorithm for Linear Programming. Combinatorica. 1984;4:373–395. 179. Wiczanowski M, Stanczak S, Boche H. Performance and interference control in wireless ad-hoc and mesh networks – The generalized Lagrangian approach. IEEE Trans Signal Processing. 2008 Aug;56(8):4039–4052. 180. Akyildiz IF, Wang X, Wang W. Wireless mesh networks: a survey. Computer Networks and ISDN Systems. 2005 Mar;47(4):445–487. 181. Fiacco AV, McCormick GP. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. John Wiley & Sons, Inc.; 1968. 182. Li D. Zero Duality Gap for a Class of Nonconvex Optimization Problems. Journal of Optimization Theory and Applications. 1995;85:309–324. 183. Mangasarian OL. Nonlinear Programming. McGraw Hill, New York; 1969. 184. Evtushenko Y. Generalized Lagrange multiplier technique for nonlinear programming. Journal of Optimization Theory and Applications. 1977;21(2):121–135. 185. Debreu G. Definite and semidefinite quadratic forms. Econometrica. 1952;20:295–300. 186. Rockafellar RT. Convex Analysis. New Jersey: Princeton University Press; 1972. 187. Horn RA. Elements of Matrix Analysis. New York: Prentice Hall; 1994. 188. Evtushenko Y. Iterative Methods for Solving Minimax Problems. USSR Computational Mathematics and Mathematical Physics. 1974;14(5):1138–1149. 189. Meyer CD. Matrix Analysis and Applied Linear Algebra. SIAM, Philadelphia; 2000. 190. Rudin W.
Functional Analysis. 2nd ed. McGraw Hill, New York; 1991. 191. Bartle RG, Sherbert DR. Introduction to Real Analysis. John Wiley & Sons, Inc.; 1982. 192. Golub GH, Van Loan CF. Matrix Computations. New York: Johns Hopkins University Press; 1996. 193. Hardy G, Littlewood JE, Polya G. Inequalities. 2nd ed. Cambridge University Press; 1952. 194. Luenberger DG. Optimization by vector space methods. John Wiley & Sons, Inc.; 1969. 195. Evtushenko Y. Some local properties of minimax problems. USSR Computational Mathematics and Mathematical Physics. 1974;14(3):129–138. 196. Ricceri B. Minimax Theory and Applications. Dordrecht: Kluwer Academic Publishers; 1998.
Index
ad hoc (wireless) network, see network, ad hoc (wireless) additive white Gaussian noise, see noise, additive white Gaussian adjoint network, see network, adjoint admissible power region, see region, admissible power admissible power vector, see vector, admissible power ADP, see algorithm, Asynchronous Distributed Pricing Alamouti space-time code/coding, see code, Alamouti space-time algorithm Asynchronous Distributed Pricing, 284 barrier, see method(s), barrier centralized, 188, 189 decentralized, see algorithm, distributed distributed, 104, 124, 204, 226, 283, 284, 297, 326, 340, see also implementation, distributed dual, see method(s), dual fixed-point, 183, 185, 187, 204, 205 gradient, 275, 309, 311, 343, see also algorithm, gradient-based gradient projection, 262, 270, 283, 291, 309, 392, 394 gradient-based, 261, 319, 391 heuristic, 90 power control, 106, 227, 228, 261, 274, 275, 287, 297, 319, 392 max-min SIR, 204, 205 QoS-based, 177, 180 standard, 183 stochastic, 188, 287 utility-based, 261, 285, see also algorithm, power control primal, see method(s), primal primal-dual, see method(s), primal-dual projected gradient, see algorithm, gradient projection stochastic, 188, 227, see also approximation, stochastic almost sure convergence, see convergence, almost sure amplifier, 90, 129 antenna(s) multi-element, 111, see also antenna(s), multiple multiple, 93, 116, 225 omnidirectional, 111 single-element, 111 application data, 81 delay-insensitive, 81 delay-sensitive, 81 elastic (data), 82, 214 real-time, 81 voice, 81 approximation, 107, 123, 140
stochastic, 187, 285, 325 strongly consistent, 341 array gain, see gain, array Asynchronous Distributed Pricing Algorithm, see algorithm, Asynchronous Distributed Pricing averaging (of iterates), 187, 287, 288 AWGN, see noise, additive white Gaussian back-pressure (policy), 135 backlog, 133, 214 band-limited signal, see signal, band-limited bandwidth, 88, 94, 119 effective, 150 barrier algorithm/method, see method(s), barrier base station, 106, 111, 204 beamforming, 111, 114, 116, 225 beamforming vector, see vector, beamforming bijection, see function, bijective bi-Lipschitz, 154 bit, 85, 92, 110 bits per channel use, 110 bits per second, 150 bits per symbol, 108 blind estimation, see estimation, blind block (of a matrix) diagonal, 52, 364 isolated (diagonal), 234, 239, 364 maximal (diagonal), 235, 239, 364 block irreducibility, see matrix, block-irreducible boundary (of a set), 32, 154, 157 broadcast, 285 cardinality, 87 carrier, 81 Cauchy–Schwarz inequality, see inequality, Cauchy–Schwarz CDMA, see multiple access, code division
CDMA-based network, see network, CDMA-based cellular (wireless) network/networking, see network, cellular (wireless) centralized algorithm, see algorithm, centralized centralized implementation, see implementation, centralized channel additive white Gaussian noise, 279 AWGN, see channel, additive white Gaussian noise control, 184, 277 feedback, 277, 324, see also channel, control memoryless (one-shot), 100 noiseless, 229 radio (propagation), 81, 87, 112, see also channel, wireless wireless, 81, 90, 95, 97 channel coefficient, see coefficient, channel channel impulse response, 104, 280 channel signature, 112, 114 channel state, 95, 125 channel state information, see information, channel state channelization, 88 characterization Collatz–Wielandt-type, 22, 57 saddle point, 25 closure (of a set), 16, 152, 377 co-channel interference, see interference, co-channel code Alamouti space-time, 93, 116 error-control, 93 orthogonal space-time, 116 space-time, 116 code division multiple access, see multiple access, code division coefficient channel, 112, 116, 281 relative risk aversion, 141 coherence time, see time, (channel) coherence
Collatz–Wielandt formula, 56, 57, 362 Collatz–Wielandt-type characterization, see characterization, Collatz–Wielandt-type collision avoidance, 96 complement (of a set), 38, 65, 378 complementary slackness (condition), see condition(s), complementary slackness composition (of maps), 62, 394 concave-convexity, see function, concave-convex concavity, see function, concave relative (of utility), 141, 218 condition number (of a matrix), 275 condition(s) complementary slackness, 133, 294, 395 constraint qualification, 320, 321, 395 Kuhn–Tucker, 122, 124, 133, 146–148, 217, 245, 255, 294, 296, 304, 307, 312, 329, 395 ff. Kuhn–Tucker constraint qualification, 147, 149, 217, 396 Second-Order Sufficiency, 307, 316, 396 Slater's, 132, 291, 395 strict complementarity, 314, 320, 321, 396 congestion control, see control, congestion connected graph, see graph, connected connectivity (of an undirected graph), 50, see also graph, connected constraint qualification (condition), see condition(s), constraint qualification constraint(s) average (power), 105 general (power), 199 individual (power) per link, 106, 152, 156, 164, 295, 303 per node, 105, 262 link capacity, 121
peak (power), 129 sum (power), 106, 152, 193, 195, 248 continuity (of a function), see function, continuous control best-effort power, 83, see also control, utility-based power congestion, 135 end-to-end, 132, 133, 135 hop-by-hop, 124, 134, 135 data link, 87, see also layer, DLC end-to-end rate, 120, 132 joint power and receiver, 222, 224 max-min SIR power, 191–208, 300 medium access, 134, 135, see also layer, MAC power, 82, 86, 90, 160, 168, see also control, QoS-based power; control, utility-based power primal-dual power, 304, 308, 320 QoS-based power, 82, 150, 168, 169, 188, 222, 304, 305 receiver, 222 utility-based power, 83, 136, 139, 154, 158, 168, 206, 210, 213, 221, 232, 236, 237, 289, 302, 304, 305 control channel, see channel, control convergence almost sure, 287, 325, 341 global, 185, 190, 270, 336 linear (quotient/root), 274, 391 local, 310, 325, 332 mean-square-sense, 287 quadratic (quotient), 190, 333, 391 quotient, 390 root, 390 superlinear, 189, 190, 391 weak, 287, 288, 325, 341 convergence in distribution, see convergence, weak convergence in the mean square (sense), see convergence, mean-square-sense convergence rate, see rate, convergence convergent subsequence, 55, 363
convex (optimization) problem, see problem, convex (optimization) convex combination, 26, 131, 383 convex hull, 131, 165 convex-concavity, see function, convex-concave convexification, 315, 317 convexity (of a function), see function, convex convexity (of a set), see set, convex crest factor, see factor, crest cross-layer (protocol), see protocol, cross-layer cyclic projection (algorithm), see projection (on a set), cyclic data application, see application, data data link control, see control, data link data rate, see rate, data data stream, 92, 111 data symbol, see symbol, data de-correlating receiver, see receiver, zero-forcing decentralized algorithm, see algorithm, distributed decentralized implementation, see implementation, distributed decode-and-forward relaying, 86 decoder/decoding, 87, 91 single-user, 95 delay, 95 path, 81 queuing, 133 round-trip, 125 total potential (criterion of), 139 delay-insensitive application, see application, delay-insensitive delay-sensitive application, see application, delay-sensitive demodulator/demodulation, 89 derivative, 382 directional, 381 partial, 189, 381 second, 382 second partial, 275 destination node, see node, destination diagonal scaling, 275 differentiability (of a function), see function, Frechet differentiable
directed graph, see graph, directed discrete-time model, 114 distance, 14, 349, 377 Euclidean, 274, 349 distributed algorithm, see algorithm, distributed distributed handshake (protocol), see protocol, distributed handshake distributed implementation, see implementation, distributed distribution Gaussian (channel) input, 108 joint (probability), 14 probability, 28, see also measure, probability stationary (probability), 17 diversity (gain), see gain, diversity DLC, see control, data link DLC layer, see layer, DLC domain (function), 59, 379 QoS, 150, 154, 242 downlink (channel), 111, 113, 114, 195 downward comprehensivity, see set, downward comprehensive dual algorithm/method, see method(s), dual dual variable, see variable, dual duality Lagrangian, 316, 397 strong (Lagrangian), 310, 316, 398 weak (Lagrangian), 317, 398 duality gap, see gap, duality duality theory, see theory, duality (for multiple-antenna channels) duplex frequency division, 111 time division, 111 edge (of a graph), 85 directed, 48, 359, see also link(s) effective bandwidth, see bandwidth, effective effective noise variance, 97, see also vector, noise (effective) effective transmit vector, see vector, effective transmit efficiency–fairness trade-off, 141, 158, 230, 236
eigenmanifold, 53 eigenvalue, 352 simple, 4, 352 eigenvector, 352 left, 352 nonnegative left, 4, 53, 233 nonnegative right, 4, 232 Perron, 4, 361 positive left, 4, 235, 236, 360 positive right, 4, 56, 192, 202, 360 right, 198, 352 elastic (data) application, see application, elastic (data) elastic traffic, see traffic, elastic empirical mean, see mean, empirical encoder/encoding, 87 end-to-end fairness, see fairness, end-to-end energy (per information bit), 164 entirely coupled network, see network, entirely coupled entirely coupled subnetwork, see subnetwork, entirely coupled equiprobability (assumption), 92 ergodicity, see process, ergodic (stochastic) error rate, see rate, bit error error-control code/coding, see code, error-control estimation blind, 227 least square, 187 pilot-based, 227 strongly consistent, 285 extended power allocation/vector, see vector, extended power factor damping, 190, 334 interference, 279 spreading, 108, 114 fading, 91, 116 block, 91, 108 frequency-flat, 91, 108 frequency-selective, 91, 280 multipath, 91 Rayleigh, 343
time-selective, 91 fair queuing (policy), see queuing (policy), fair fairness, 120, 144 end-to-end, 132, 134 MAC layer, 134, 135 max-min, 122, 142, 159, 229, see also policy, max-min fair proportional, 122, 123 weighted proportional, 122 fairness gap, see gap, fairness fairness index, see measure, fairness fairness measure, see measure, fairness fairness–efficiency trade-off, see efficiency–fairness trade-off FDD, see duplex, frequency division FDMA, see multiple access, frequency division feasible power region, see region, feasible power feasible power vector/allocation, see vector, feasible power feasible QoS region, see region, feasible QoS feasible QoS vector, see vector, feasible QoS feasible rate region, see region, feasible rate feasible SIR region, see region, feasible SIR feedback channel, see channel, feedback Fenchel–Legendre transform, 16 FIFO queuing, 125, 135 file transfer, 82 fixed point, see point, fixed fixed-point (power control) iteration/algorithm, see algorithm, fixed-point flooding protocol, see protocol, flooding flow, 85 MAC layer, 86, 135 flow control, see control, congestion flow rate allocation/vector, see vector, rate frame (interval), 87, 90, 92, 127 framing, 88, see also frame (interval) frequency, 90, 121 relative, 128 frequency band, 90
frequency division duplex, see duplex, frequency division frequency division multiple access, see multiple access, frequency division function aggregate utility, 83, 138, 141 barrier, 298, 308 bi-Lipschitz continuous, 267 bijective, 28, 110, 130, 136, 153, 194, 379, see also function, strictly monotonic concave, 265, 384 concave-convex, 336, 399 continuous, 173, 194, 380 continuously differentiable, 20, 121, 382 convex, 37, 167, 384 convex-concave, 334, 336, 399 error, 190, 273 exponential, 45 extended-valued, 327, 387 Frechet differentiable, 382 Gateaux differentiable, 382 generalized Lagrangian, 311, 314, 316 good rate, 16 injective, 20, see also one-to-one (map) interference, 97, 148, 174–180 affine, 97, 147, 174, 177, 210 axiomatic, 174–180 continuously differentiable, 148 minimum, 176, 177, 184 noiseless, 228 nonlinear, 148, 327 standard, 169, 176–179, 210 vector-valued, 149, 183 inverse, 130, 153, 157, 388 Lagrangian, 133, 147, 395 linear, 38, 163 linear rate, 111 Lipschitz continuous, 270, 393 log-concave, 264, 298, 387 log-convex, 34, 131, 158, 386, 388 log-convex matrix-valued, 27 logarithmic, 12, 123 logarithmic moment generating, 16 logarithmic rate, 108, 109 logarithmic utility, 133 lower semicontinuous, 16
matrix-valued, 26 max-min, 400 min-max, 334, 335, 400 modified Lagrangian, 326, 329 modified utility, 140 moment generating, 16 monotonic, 93, 383, 388 monotonically decreasing, 383 monotonically increasing, 110, 383 nonlinear Lagrangian, 311 nonnegative, 17 order-convex, 191 positive, 169 probability mass, 14, see also measure, probability quasiconcave, 43 rate, 16, 107, 108 rate-SIR, 107, 139 Schur-concave, 145, 386 Schur-convex, 145, 386 strictly concave, 23, 138 strictly concave utility, 83, 119 strictly convex, 263, 269, 384 strictly decreasing, 145, 150, 383 strictly increasing, 20, 108, 150, 383 strictly log-convex, 29, 73, 269, 387, 388 strictly min-max, 400 strictly monotonic, 70, 150, 157 strongly concave, 268 strongly convex, 267, 269, 384 surjective, 110 traditional utility, 138, 139 twice continuously differentiable, 28, 97, 137, 382 twice differentiable, 138, 305, 382 utility, 83 gain array, 116 diversity, 116 effective spreading, 150 power, 97, 116
interference, 97 signal, 97 spatial multiplexing, 116 gain matrix, see matrix, gain gap duality, 317, 397 fairness, 238, 239 Gaussian (channel) input distribution, see distribution, Gaussian (channel) input Gaussian noise, see noise, additive white Gaussian global convergence, see convergence, global globally solvable problem, see problem, globally solvable gradient, 138, 381, 382 gradient algorithm/method, see algorithm, gradient gradient projection algorithm/method, see algorithm, gradient projection graph, 50, 85 bipartite, 49 connected, 49, 50 directed, 48, 359 network topology, 85, see also topology, network strongly connected directed, 231, 359 Hadamard product, see product, Hadamard Hölder's inequality, see inequality, Hölder's hard QoS support, see support, QoS, hard Hessian (matrix), 272, 315, 382 heuristic (algorithm), see algorithm, heuristic high SIR regime, see regime, high SIR hyperplane, 154, 159 i.i.d., see independent identically distributed idempotent matrix, see matrix, idempotent implementation centralized, 110, 128
decentralized, see implementation, distributed distributed, 128, 211, 227, 299, 301, 311, 341 independent identically distributed, 16, 93, 95 inequality Cauchy–Schwarz, 47, 94, 349 geometric-arithmetic-mean, 387 Hölder's, 18, 71, 349 triangle, 14 infeasible SIR region, see region, infeasible SIR infimum, 178, 380 information channel state, 91 mutual, 14, 108 information theory, see theory, information injection, see function, injective interference co-channel, 81 intersymbol, 89, 95 multiple access, 96, 167, 228 self-, 96, 167 strong, 96, 135, 136 interference cancellation, 100 interference factor, see factor, interference interference function, see function, interference interference management, see management, interference interference power, see power, interference interference uncertainty, 177 interference-isolated subnetwork, see subnetwork, interference-isolated interference-limited network/scenario, see network, interference-limited interior (of a set), 16, 154, 377 interior point methods, see method(s), interior point intermodulation, 105 Internet, 134 intersection (of sets), 37, 205, 292 intersymbol interference, see interference, intersymbol
interval closed, 378 open, 378 symbol, 100, 128, see also slot (of time) irreducibility (of a matrix), see matrix, irreducible isolated saddle point, see point, isolated saddle iteration conditional Newton, 330, 341 fixed-point, see algorithm, fixed-point Lagrange–Newton (reduced), 330 Newton, 190, 341, see also method(s), Newton-based power control, see algorithm, power control primal-dual, see method(s), primal-dual unconstrained, 320, 322, 329 Jacobian (matrix), 149, 189, 190 Jensen's inequality, 8 joint power and receiver control, see control, joint power and receiver joint power control and beamforming, 114 joint power control and link scheduling, see policy, joint power control and link scheduling JPCLS, see policy, joint power control and link scheduling Karush–Kuhn–Tucker conditions, see condition(s), Kuhn–Tucker KKT conditions, see condition(s), Kuhn–Tucker KLD, see Kullback–Leibler divergence Kronecker delta, 39, 44 Kuhn–Tucker conditions, see condition(s), Kuhn–Tucker Kuhn–Tucker point, see point, Kuhn–Tucker Kuhn–Tucker theorem, see theorem, Kuhn–Tucker Kullback–Leibler divergence, 14 Lagrange multiplier, 125, 147, 395
Lagrange–Newton (reduced) iteration/method, see iteration, Lagrange–Newton (reduced) Lagrangian, see function, Lagrangian Lagrangian duality, see duality, Lagrangian Lagrangian optimization theory, see theory, Lagrangian optimization Landau symbol, see symbol, Landau large deviations, 15 large deviations principle, 16 law of diminishing returns, 120 layer DLC, 87, 96 MAC, 88, 160 network, 135 physical, 90, 91 transport, 132 LDP, see large deviations principle least square estimation/estimate, see estimation, least square level interference, 89, 90 QoS, 81 SIR, 95, 210, see also target, SIR; requirement, SIR threshold (of a rate function), 108, 109 linear (equation) system, see system of linear equations linear (quotient/root) convergence, see convergence, linear (quotient/root) linear receiver, see receiver, linear link capacity, 126 link scheduling, 86, 90, 136, 160, see also policy, link scheduling link scheduling strategy, see policy, link scheduling link(s), 48, 85 active, 92, 107 best-effort, 170, 212 bidirectional, 85 bottleneck, 122 idle, 92, 107 logical, 86, 87 orthogonal, 96, 192 physical, 86
QoS, 170, 212, 218 wireless, 85, 167 Lipschitz constant, 271, 272 Lipschitz continuity, see function, Lipschitz continuous local convergence, see convergence, local locally solvable problem, see problem, locally solvable log-concavity, see function, log-concave log-convexity, see function, log-convex log-SIR fair power allocation/vector, see vector, log-SIR fair power logical queue, see queue, logical logical receiver, see receiver, logical logical transmitter, see transmitter, logical longest queue first (policy), 125 low SIR regime, see regime, low SIR M/M/1 queuing system, 157 MAC, see control, medium access MAC layer, see layer, MAC MAC layer fairness, see fairness, MAC layer MAC layer flow, see flow, MAC layer majorization, 145, 386 management interference, 82, 90 time slot, 128 map, see mapping and function mapping, see also function exponential, 45 linear, 37 SIR-QoS, 150 Markov chain, 16, 17 finite-state, 15 martingale, 286 martingale difference, 286 matched filter, see receiver, matched-filter matched-filter receiver, see receiver, matched-filter matrix traceless, 13 adjacency, 49 block-irreducible, 57, 235, 365
circulant, 10 diagonal, 6, 315, 350, 356 extended gain, 180 gain, 97, 115, 117, 143 Hermitian, 357 idempotent, 354, 373 inverse, 350 invertible, 350 irreducible, 3, 143, 359 irreducible stochastic, 5 lower triangular, 350, 364 M-, 373 nonnegative, 51, 358 orthogonal, 355 permutation, 48, 350, 359 positive, 357, 363 positive definite, 190, 356 positive semidefinite, 36, 356 probability transition, 15 random, 28 receiver, 224, 225 reducible, 51, 248, 358 simple, 353 skew-symmetric, 168 spatial covariance, 112 stochastic, 53, 358 symmetric, 35, 355, 356 transpose, 351 unitary, 103, 357 upper triangular, 101, 350 max-min fair policy, see policy, max-min fair max-min fair rate allocation/vector, see vector, max-min fair max-min fairness, see fairness, max-min max-min SIR (balancing) power control, see control, max-min SIR power max-min SIR (balancing) power control algorithm, see algorithm, power control max-min SIR (balancing) problem, see problem, max-min SIR max-min SIR(-balanced) power allocation/vector, see vector, max-min SIR power maximal subnetwork, see subnetwork, maximal
maximization problem, see problem, maximization maximizer global, 217, 390 local, 227, 390, see also optimizer, local; minimizer, local unique, 156, 232, 379 mean arithmetic, 18, 387 empirical, 15, 17 geometric, 18, 387 measure fairness, 144 performance, 93, 105, 168 probability, 16 medium access control, see control, medium access method(s) barrier, 218, 308 dual, 302 gradient, see algorithm, gradient gradient projection, see algorithm, gradient projection interior point, 303, 308 Lagrange–Newton (reduced), see iteration, Lagrange–Newton (reduced) Newton-based, 183, 189 Newton-like, 189, 273, 391 numerical, 137 primal, 262, 308 primal-dual, 183, 262, 295, 308, 310, 319, 321 projected gradient, see algorithm, gradient projection min-max SIR power allocation/vector, see vector, min-max SIR power minimization problem, see problem, minimization minimizer global, 22, 390 local, 320, 321, 390, see also optimizer, local; maximizer, local unique, 185, 379 minimum valid power allocation/vector, see vector, minimum valid power mixer, 90 mobility, 87, 125 modulator/modulation, 90, 93
monotonicity, see function, monotonic multi-carrier (transmission), 105, see also orthogonal frequency division multiplexing multi-hop routing, see routing, multi-hop multi-user receiver, see receiver, multi-user multipath fading, see fading, multipath multiple access code division, 88, 98, 114 frequency division, 88, 99 space division, 88 time division, 88, 99 multiple access interference, see interference, multiple access multiple antennas, see antenna(s), multiple mutual information, see information, mutual nat, 110 nats per channel use, 127 nats per second, 110 nats per symbol, 110 network ad hoc (wireless), 81, 90 adjoint, 278, 280, 282 CDMA-based, 90, 107, 114 cellular (wireless), 106, 111, 114 communications, 63, see also network, wireless entirely coupled, 200, 231, 234, 248 infrastructure of, 81 interference-limited, 107, 168, 179 mesh, 306 OFDM-based, 225 primal, 278, 282 reversed, 278, 280 saturated (wired), 120 wired, 119, 126, 130 wireless, 81, 104, 114, 115, 119 network controller, 89, 111 network topology, see topology, network network-centric approach, 168 Neumann series, 67, 253, 355 Newton iteration/algorithm, see iteration, Newton Newton-based method, see method(s), Newton-based node (of a graph/network), 48, 85 column, 49 destination, 86 origin, 87, 106, 199 row, 49 source, 85, 134 noise additive white Gaussian, 94, 99, 286, see also noise, background background, 94, 96, 107, 112 exogenous (estimation), 286 Gaussian, see noise, additive white Gaussian quantization, 187, 285 noiseless channel, see channel, noiseless noisy measurement, 287, 325, 340 non-idling property, see property, non-idling nonnegativity (of a matrix), see matrix, nonnegative norm 2-, 271 l1-, 62, 75 induced matrix, 351 matrix, 350 vector, 348 normal form (of a matrix), 52, 234, 364 objective, 168, 181 share, 90, 119 OFDM, see orthogonal frequency division multiplexing OFDM-based network, see network, OFDM-based one-to-one (map), 20, 95, 379 one-to-one correspondence, see function, bijective optimal link scheduling, see policy, optimal scheduling
optimizer global, 274, 390, see also minimizer, global local, 390, see also minimizer, local; maximizer, local orthogonal frequency division multiplexing, 93, 225 orthogonal (signature) sequence, see sequence, orthogonal (signature) orthogonal space-time code/coding, see code, orthogonal space-time orthogonality (of links), see link(s), orthogonal packet (of bits), 85, 96 PAPR, see ratio, peak-to-average power parallelization, 276 Pareto optimal point, see point, Pareto optimal path attenuation/amplitude, 81, 95 peak-to-average power ratio, see ratio, peak-to-average power pentagon, 165 per-flow queuing (policy), see queuing (policy), per-flow performance measure, see measure, performance permutation, 19, 102 Perron root, 3, 4, 25, 361 Perron–Frobenius theory, see theory, Perron–Frobenius physical layer, see layer, physical pilot-based estimation, see estimation, pilot-based point fixed, 190, 250 isolated saddle, 316, 399 Kuhn–Tucker, 285, 294, 296, 312, 313, 329, 395 f. maximal, 156, 248, 378 maximum, 156, 378 minimal, 378 minimum, 172, 181, 378 Pareto optimal, 156 saddle, 25, 209, 236, 316, 399
SOSC, 307, 310, 314, see also condition(s), Second-Order Sufficiency start, 189, 272, 298 stationary, 133, 270, 312, 392 policy joint power control and link scheduling, 126, 127, 131, 135, 160, 161 link scheduling, 90, 126–129, 134, 163 max-min fair, 122 optimal link scheduling, 38, 76, 135, 160 power control, 82, 126, 134, 211, 216, 221 throughput-optimal (MAC), 131, 160, 214, 242 polygon, 164 polyhedron, 105, 146, see also polytope polytope, 105, 130, 166 positive definiteness, see matrix, positive definite positive semidefiniteness, see matrix, positive semidefinite power interference, 95, 96, 148 received, 95, 282–285, 297 total transmit, 105, 106 transmit, 85, 95 power control, see control, power power control algorithm, see algorithm, power control power control iteration, see algorithm, power control power control problem, see problem, power control power control scheme, see algorithm, power control power control strategy, see policy, power control power control theory, see theory, power control
power gain, see gain, power power reciprocity, 279 power spectral density, 305 power vector/allocation, see vector, power pricing, 284 primal (optimization) problem, see problem, primal (optimization) primal algorithm/method, see method(s), primal primal variable, see variable, primal primal-dual algorithm/iteration, see method(s), primal-dual primal-dual power control, see control, primal-dual power probability distribution, see distribution, probability probability measure, see measure, probability problem convex (optimization), 155, 233, 266, 291, 298, 320, 390 generalized power control, 304 globally solvable, 148, 300, 307, 390, 396 locally solvable, 222, 390, 396 max-min SIR, 193, 197, 198, 202, see also control, max-min SIR power maximization, 390 min-max (formulation of) power control, 327 min-max SIR power control, 237, 238 minimization, 389 power control, 82, 136, 146, 150, 154, 169, 182, 206, 213, 216–218, 224, 230, 237, 262, 266, 304 primal (optimization), 292, 309, 390, 398 resource allocation, 81, 119 utility maximization, 120, 139, 230, 235 water-filling, 162 process
ergodic (stochastic), 94, 131 stochastic, 28 wide-sense stationary (stochastic), 94 product Cartesian, 275 Hadamard, 53, 350 inner, 168, 349 outer, 356 projected gradient algorithm/method, see algorithm, gradient projection projection (on a set), 392 cyclic, 292 property correlation (of signature sequences), 89 non-idling, 306 ray, 145, 379 SIR ray, 103, 107 proportional fairness, see fairness, proportional protocol cross-layer, 132 distributed handshake, 282, 325, 338 flooding, 261, 284 handshake, 299 MAC, 88, 135 routing, 86, 111 TCP Vegas, 133, 134 time sharing, 160, 165 pseudo-noise sequence, see sequence, pseudo-noise QoS, 81, 95, 150 link, 83 QoS domain, see domain, QoS QoS level, see level, QoS QoS requirement, see requirement, QoS QoS support, see support, QoS QoS vector, see vector, QoS QoS-based power control, see control, QoS-based power quadratic (quotient) convergence, see convergence, quadratic (quotient) quadrilateral, 165 quality of service, see QoS queue, 214, see also backlog logical, 87 queuing (policy) fair, 124
per-flow, 87, 132 weighted fair, 134 quotient convergence, see convergence, quotient radio (propagation) channel, see channel, radio (propagation) range of a function/map, 379 range of a matrix transformation, 351 rank of a matrix, 351 rate bit error, 108, 211 code, 93 convergence, 185, 273, 391 data, 86, 107, 120, 127 flow, see rate, source link, 133, 135, 150, see also rate, data MAC layer fair, 136 minimum data, 85 source, 86, 120, 124 rate allocation/vector, see vector, rate rate control, see control, rate rate vector/allocation, 120 ratio peak-to-average power, 104 signal-to-interference, see SIR signal-to-interference-plus-noise, 94, see also SIR signal-to-noise, see SNR ray property, see property, ray ray property of SIR, see property, SIR ray Rayleigh fading, see fading, Rayleigh Rayleigh quotient, 103 real-time application, see application, real-time receive strategy, see strategy, receive received power, see power, received received signal, see signal, received receiver de-correlating, see receiver, zero-forcing linear, 91, 98, 102 linear SIC, 100–102 linear successive interference cancellation, see receiver, linear SIC logical, 87
matched-filter, 99, 114 mismatched-filter, 99 multi-user, 98 nonlinear SIC, 101 optimal linear, 102, 188 single-user, 99 zero-forcing, 99 receiver matrix, see matrix, receiver receiver output, 94 reducibility (of a matrix), see matrix, reducible regime high SIR, 157, 159, 163, 242 low SIR, 157, 163 noise-limited, 199 region (information-theoretic) capacity, 129 admissible power, 106, 129, 136, 146, 264, 303 feasible log-SIR, 242, 244 feasible power, 170, 172, 181, 212, 290, 308 feasible QoS, 151, 152 feasible rate, 129, 130 feasible SIR, 153, 164, 165 infeasible SIR, 164 valid power, 131, 170, 181, 212, 290 relative concavity (of utility), see concavity, relative (of utility) relative risk aversion (coefficient), see coefficient, relative risk aversion relay, 86, 111 requirement QoS, 95, 169, 170, 293 SIR, 172, see also SIR, target; level, SIR resource allocation problem, see problem, resource allocation root convergence, see convergence, root round-robin (policy), 124 route, 86 routing, 82 multi-hop, 82 single-path, 120, 125 saddle point, see point, saddle
saddle point characterization, see characterization, saddle point SDMA, see multiple access, space division Second-Order Sufficiency Conditions, see condition(s), Second-Order Sufficiency segment, see interval, open self-interference, see interference, self- sequence non-decreasing, 383 non-increasing, 383 nonorthogonal (signature), 89, 107 orthogonal (signature), 89 pseudo-noise, 227 semi-orthogonal (signature), 89 signature, 89, 114 spreading, see sequence, signature step size, 188, 287, 291 training, 277, 283 set bounded, 378 bounded above, 378 bounded below, 378 closed, 140, 377, 378 compact, 129, 378 connected, 142, 378 convex, 26, 131, 134, 383 downward comprehensive, 106, 129 empty, 30, 171 feasibility, 30, 45, 63 nonconvex, 37, 66 open, 26, 377 s-convex, see set, strictly convex strictly convex, 31, 75, 163 sublevel, 16, 37, 180 superlevel, 180 SIC receiver (linear), see receiver, linear SIC SIC receiver (nonlinear), see receiver, nonlinear SIC signal band-limited, 94 received, 94, 98, 112 transmit/transmission, 91, 93, 113 signal-to-interference ratio, see SIR
signal-to-interference-plus-noise ratio, see ratio, signal-to-interference-plus-noise simplex, 4 standard, 4 singleton, 172, 173 SIR, 82, 94, 95, 97, 102, 103, 107, 112, 113, 115, 117, 127, 174, 223, 277 maximum, 103, 223 SIR requirement, see requirement, SIR size constant step, 270 step, 273, 274 window, 132 Slater's condition, see condition(s), Slater's slot (of time), 93, 110 SNR, 85, 99, 116 soft QoS support, see support, QoS, soft soft-decision variable, see variable, soft-decision solution positive (to a system of linear equations), 63, 67, 371 unique positive (to a system of linear equations), 69 SOSC, see condition(s), Second-Order Sufficiency source, 85 source node, see node, source source rate, see rate, source source rate allocation/vector, see vector, rate space lp, 348 (affine) half-, 105, 171, 348 column, 351, 354 Euclidean, 349 Hilbert, 349 normed vector, 348 row, 351 sample, 15 vector, 347
space division multiple access, see multiple access, space division space-time code/coding, see code, space-time spatial multiplexing (gain), see gain, spatial multiplexing spatial reuse, 89, 90 spectral radius, 4, 151, 354 spectrum (of a matrix), 353 sphere, 175, 223 spreading factor, see factor, spreading spreading sequence, see sequence, signature standard power control iteration/algorithm, see algorithm, power control start point, see point, start stationary point, see point, stationary step size, see size, step stochastic (recursive) algorithm, see algorithm, stochastic stochastic power control iteration/algorithm, see algorithm, power control strategy adaptive receive, 175 receive, 175 strict complementarity (condition), see condition(s), strict complementarity strong (Lagrangian) duality, see duality, strong (Lagrangian) strong interference, see interference, strong strongly connected directed graph, see graph, strongly connected directed strongly consistent estimation/estimator, see estimation, strongly consistent subcarrier, 225 subframe, 127, 128, see also frame (interval) subnetwork, 200 entirely coupled, 234, 248 interference-isolated, 220, 234 maximal, 235 subscriber unit, 81
superlinear convergence, see convergence, superlinear supermodular game, 285 support QoS, 210, 262 hard, 210, 212, 213, 289, 302, 308 soft, 213–215, 221, 289, 300 supremum, 242, 380 surjection, see function, surjective symbol data, 92–94 information-bearing, see symbol, data Landau, 379–380 transmit, 93 symbol epoch, see interval, symbol symbol interval, see interval, symbol synchronization coarse, 185, 284 perfect, 91, 222, 225 system load (effective), 185 system of linear equations, 61, 100 homogeneous, 63 target feasible SIR, 171, 211 SIR, 169, 178, 212, 213, 216, see also level, SIR; requirement, SIR TCP Vegas (protocol), see protocol, TCP Vegas TDD, see duplex, time division TDMA, see multiple access, time division theorem Brouwer's, 181, 182 Cramér's, 16 Debreu's, 314 Gärtner–Ellis, 16 Kingman's, 60 Kuhn–Tucker, 149, 297, 396 Perron, 196, 362 Perron–Frobenius, 4, 359 Sanov's, 17 weak form of the Perron–Frobenius, 51, 363 theorem of Debreu, see theorem, Debreu's theorem of Sanov, see theorem, Sanov's theory
(Lagrangian) duality, 294, see also duality, Lagrangian duality (for multiple-antenna channels), 114 game, 180 information, 14 Lagrangian optimization, 146, 292, 395 large deviations, 15 majorization, 144, 386 Perron–Frobenius, 3, 357 power control, 83, 114 throughput, 119 approximate, 343 maximum, 125, 216 total, 123, 141 throughput-optimal (MAC) policy, see policy, throughput-optimal (MAC) time (channel) coherence, 91, 227 customer, 157 time division duplex, see duplex, time division time division multiple access, see multiple access, time division time sharing (protocol), see protocol, time sharing topology network, 85, 230 star, 111 string, 144 total power constraint, see constraint(s), sum (power) total transmit power, see power, total transmit trace (of a matrix), 351 traffic elastic, 121 traffic class, 323, 340 transmit power, see power, transmit transmit signal, see signal, transmit/transmission transmit symbol, see symbol, transmit transmitter, 87, 98 effective, see vector, effective transmit logical, 86, 87 transpose, see matrix, transpose
union (of sets), 77, 129 uplink (channel), 111, 113, 114 user-centric approach, 122, 169, 180, 305 utility, see function, utility utility maximization problem, see problem, utility maximization utility-based power control, see control, utility-based power utility-based power control with QoS support, see support, QoS valid power region, see region, valid power valid power vector/allocation, see vector, valid power variable discrete random, 14 dual, 133, 304, 312, 395 primal, 289, 292, 395 random, 14, 92, 286 soft-decision, 94, 98, 101, 112, 115, 117 variational principle for pressure, 59 vector (w, α)-proportionally fair rate, 123, 124 admissible power, 105, 170, 303, 328, 337 beamforming, 111, 113, 114, 184, 225 effective transmit, 98–100, 115, 117 extended power, 179, 196 feasible power, 106, 170, 171, 178, 181, 289 feasible QoS, 150, 174, 178, 198 feasible rate, 123 log-SIR fair power, 241, 244 logarithmic power, 264, 290 max-min fair power, 248 f.
max-min fair rate, 122, 141–143, 214 max-min SIR power, 174, 191–208, 214, 230–240 min-max SIR power, 238, 240 minimum valid power, 181, 182, 185, 190 noise, 107 noise (effective), 97, 115 power, 95, 128, 136 proportional fair rate, 122 QoS, 150, 154, 170, 181, 191 random, 16, 287 valid power, 170, 178, 289 weight, 135, 217 vertex (of a graph), 85, see also node (of a graph/network) voice application, see application, voice water-filling (problem), see problem, water-filling weak (Lagrangian) duality, see duality, weak (Lagrangian) weak convergence, see convergence, weak weight vector, see vector, weight wide-sense stationarity, see process, wide-sense stationary (stochastic) window size, see size, window window-based congestion control, see control, congestion wired network/networking, see network, wired wireless communications standard, 81 wireless channel, see channel, wireless wireless network/networking, see network, wireless zero-forcing (receiver), see receiver, zero-forcing