Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board

David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
4362
Jan van Leeuwen Giuseppe F. Italiano Wiebe van der Hoek Christoph Meinel Harald Sack František Plášil (Eds.)
SOFSEM 2007: Theory and Practice of Computer Science
33rd Conference on Current Trends in Theory and Practice of Computer Science
Harrachov, Czech Republic, January 20-26, 2007
Proceedings
Volume Editors

Jan van Leeuwen
Utrecht University, 3584 Utrecht, The Netherlands
E-mail: [email protected]

Giuseppe F. Italiano
Università di Roma "Tor Vergata", 00133 Roma, Italy
E-mail: [email protected]

Wiebe van der Hoek
University of Liverpool, Liverpool, L69 3BX, UK
E-mail: [email protected]

Christoph Meinel
University of Potsdam, D-14440 Potsdam, Germany
E-mail: [email protected]

Harald Sack
Friedrich-Schiller-Universität Jena, Jena, Germany
E-mail: [email protected]

František Plášil
Charles University, 11800 Prague, Czech Republic
E-mail: [email protected]

Library of Congress Control Number: 2006939388
CR Subject Classification (1998): F.2, F.1, D.2, H.3, H.2.8, H.4, F.3-4
LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues
ISSN 0302-9743
ISBN-10 3-540-69506-0 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-69506-6 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com

© Springer-Verlag Berlin Heidelberg 2007
Printed in Germany

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 11970354 06/3142 543210
Preface
This volume contains the invited and the contributed papers selected for presentation at SOFSEM 2007, the 33rd Conference on Current Trends in Theory and Practice of Computer Science, held January 20–26, 2007 in Hotel Sklář, Harrachov, in the Czech Republic.

SOFSEM (originally SOFtware SEMinar) aims to foster cooperation among professionals from academia and industry working in all modern areas of computer science. Developing over the years from a local event to a fully international and well-established conference, contemporary SOFSEM continues to maintain the best of its original Winter School aspects, such as a high number of invited talks and an in-depth coverage of novel research results in selected areas within computer science. SOFSEM 2007 was organized around the following four tracks:

– Foundations of Computer Science (Track Chair: Giuseppe F. Italiano)
– Multi-Agent Systems (Track Chair: Wiebe van der Hoek)
– Emerging Web Technologies (Track Chairs: Christoph Meinel, Harald Sack)
– Dependable Software and Systems (Track Chair: František Plášil)
The SOFSEM 2007 Program Committee consisted of 69 international experts from 21 different countries, representing the respective areas of the SOFSEM 2007 tracks with outstanding expertise and an eye for current developments. An integral part of SOFSEM 2007 was the traditional Student Research Forum (Chair: Mária Bieliková), organized with the aim of presenting student projects in the theory and practice of computer science and of giving students feedback both on the originality of their scientific results and on their work in progress. The papers presented at the Student Research Forum were published in separate local proceedings.

In response to the call for papers, SOFSEM 2007 received a record number of 283 submissions. After a careful reviewing process (with three reviewers per paper), followed by a detailed electronic preselection procedure within each track, a thorough evaluation and final selection was made during the PC meeting held on September 22, 2006, at the Institute of Computer Science of the Academy of Sciences of the Czech Republic in Prague. A total of 71 papers (less than 25%) with authors coming from 47 countries covering Europe, Asia and the Americas were selected for presentation at SOFSEM 2007, following the strictest criteria of quality and originality. Two papers were withdrawn, leading to 69 papers for SOFSEM 2007. In addition, these proceedings contain the full texts or extended abstracts of all invited papers. Furthermore, 15 student papers were selected for the SOFSEM 2007 Student Research Forum, based on the PC members' recommendations and the approval of the PC and Track Chairs.
As editors of these proceedings, we are indebted to all the contributors to the scientific program of the conference, especially to the invited speakers and all authors of contributed papers. We also thank all authors who responded promptly to our requests for minor modifications and corrections in their manuscripts.

SOFSEM 2007 is the result of a considerable effort by a number of people. It is a great pleasure to express our special thanks to:

– The SOFSEM 2007 Program Committees for the four tracks and all additional referees, who did an extraordinary job in reviewing a large number of assigned submissions (on average about 12 submissions per PC member)
– The Executive Editor of Springer's LNCS series, Alfred Hofmann, for his continuing confidence in the SOFSEM conference
– The SOFSEM 2007 Organizing Committee, chaired by Martin Řimnáč of the Institute of Computer Science (ICS), Prague, for the smooth preparation of the conference
– The SOFSEM Steering Committee, chaired by Július Štuller, also of the ICS in Prague, for its excellent guidance and support of all operations of the PC and the reviewing process

Special thanks also go to:

– Hana Bílková from the ICS, who did an excellent job in the editing and completion of these proceedings
– Roman Špánek from the ICS for running the SOFSEM 2007 Submission and Review System, which was invaluable for the work of the PC and which was instrumental in preparing a smooth final PC meeting in Prague

Finally, we highly appreciate the financial support of our sponsors (ERCIM, SOFTEC and others), who assisted with the invited speakers and helped the organizers to offer lower student fees. We thank the ICS for its hospitality during the PC meeting and for providing many additional forms of help and support.

We hope that these proceedings offer the reader a representative and instructive view of the state of the art in research in the beautiful scientific areas selected for SOFSEM 2007.

November 2006
Jan van Leeuwen
Giuseppe F. Italiano
Wiebe van der Hoek
Christoph Meinel
Harald Sack
František Plášil
SOFSEM 2007 Committees
Steering Committee

Július Štuller, Chair (Institute of Computer Science, Prague, Czech Republic)
Mária Bieliková (Slovak University of Technology in Bratislava, Slovak Republic)
Bernadette Charron-Bost (Ecole Polytechnique, France)
Keith G. Jeffery (CLRC RAL, Chilton, Didcot, Oxon, UK)
Antonín Kučera (Masaryk University, Brno, Czech Republic)
Branislav Rovan (Comenius University, Bratislava, Slovak Republic)
Petr Tůma (Charles University in Prague, Czech Republic)
Program Committee

Jan van Leeuwen, Chair (University of Utrecht, The Netherlands)
Giuseppe F. Italiano, Co-chair (University of Rome "Tor Vergata", Italy)
Wiebe van der Hoek, Co-chair (University of Liverpool, UK)
Christoph Meinel, Co-chair (Hasso Plattner Institute, Germany)
Harald Sack, Co-chair (Friedrich Schiller University Jena, Germany)
František Plášil, Co-chair (Charles University, Prague, Czech Republic)
Thomas Agotnes (Bergen University College, Bergen, Norway)
Natascha Alechina (University of Nottingham, UK)
Leila Amgoud (CNRS, Toulouse, France)
Nicolas Beldiceanu (Ecole de Mines de Nantes, France)
Hans Bodlaender (University of Utrecht, The Netherlands)
Veronique Bruyere (University of Mons-Hainaut, Mons, Belgium)
Christiano Castelfranchi (ISTC, Rome, Italy)
Vincent Conitzer (Carnegie Mellon University, Pittsburgh, USA)
Stephen Cranefield (Otago, New Zealand)
Ivana Černá (Masaryk University Brno, Czech Republic)
John Debenham (University of Technology, Sydney, Australia)
Keith Decker (University of Delaware, Newark, USA)
Camil Demetrescu (University of Rome "La Sapienza", Italy)
Yefim Dinitz (Ben-Gurion University, Beer-Sheva, Israel)
Jonathan Feinberg (Universitat Politecnica de Catalunya, Barcelona, Spain)
Marcelo Fiore (University of Cambridge, UK)
Chiara Ghidini (ITC-irst, Trento, Italy)
Joseph Gorman (Sintef, Norway)
Roberto Grossi (University of Pisa, Italy)
Paul Harrenstein (Ludwig-Maximilians-Universität München, Germany)
Andreas Herzig (CNRS, Toulouse, France)
Mika Hirvensalo (University of Turku, Finland)
Petr Hnětynka (University College Dublin, Ireland)
Valerie Issarny (Inria Rocquencourt, France)
Gal Kaminka (Bar Ilan University, Israel)
Luděk Kučera (Charles University, Czech Republic)
Kim Larsen (Aalborg University, Denmark)
Diego R. Lopez (RedIRIS, Spain)
Ulrike Lucke (University of Rostock, Germany)
Yishay Mansour (Tel Aviv University, Israel)
Tiziana Margaria (Universität Potsdam, Germany)
Vladimir Mencl (United Nations University, Macao)
Uli Meyer (Max-Planck-Institut Informatik, Germany)
John-Jules Meyer (Utrecht University, The Netherlands)
Petra Mutzel (University of Dortmund, Germany)
Sotiris Nikoletseas (Patras University, Greece)
Eric Pacuit (Universiteit van Amsterdam, The Netherlands)
Cesare Pautasso (ETH Zurich, Switzerland)
Michal Pěchouček (Czech Technical University, Czech Republic)
Gabriella Pigozzi (King's College London, UK)
Iman Poernomo (King's College London, UK)
Ralf Reussner (Universität Karlsruhe, Germany)
Nicolas Rivierre (France Telecom, France)
Juan Antonio Rodriguez (Artificial Intelligence Research Institute, Barcelona, Spain)
Jose Rolim (University of Geneva, Switzerland)
Partha S. Roop (University of Auckland, New Zealand)
Francesca Rossi (University of Padova, Italy)
Branislav Rovan (Comenius University, Bratislava, Slovak Republic)
Vladimiro Sassone (University of Southampton, UK)
Stefan Schlobach (Vrije Universiteit Amsterdam, The Netherlands)
Nahid Shahmehri (Linköpings Universitet, Sweden)
Albrecht Schmidt (University of Munich, Germany)
Michael Stal (Siemens, Germany)
Rudi Studer (University of Karlsruhe, Germany)
Freek Stulp (TU München, Germany)
Gerd Stumme (University of Kassel, Germany)
Francois Taiani (Lancaster University, UK)
Andrzej Tarlecki (Warsaw University, Poland)
Robert Tolksdorf (Freie University Berlin, Germany)
Dorothea Wagner (University of Karlsruhe, Germany)
Jiří Wiedermann (Institute of Computer Science, Czech Republic)
Steven Willmott (Universitat Politecnica de Catalunya, Barcelona, Spain)
Additional Referees

Jiří Adámek, Sudhir Agarwal, Fabio Aiolli, Ana Karla Alves de Medeiros, Richard Atterer, Mira Balaban, Fabien Baligand, Holger Bast, Michael Baur, Marek A. Bednarczyk, Marc Benkert, Piergiorgio Bertoli, Marc Bezem, Meghyn Bienvenu, Stefano Bistarelli, Stephan Bloehdorn, Sebastian Blohm, Paolo Bouquet, Gregor Broll, Ken Brown, Lubomír Bulej, Paolo Busetta, Gianguglielmo Calvi, Mauro Caporuscio, Jesus Cerquides, Kenneth Chan, Li-Te Cheng, Alessandra Cherubini, Jacek Chrząszcz, Hung Dang Van, Anuj Dawar, Tiago De Lima, Alexander de Luca, Ronald de Wolf, Christian Decker, Olivier Delgrange, Daniel Delling, Sophie Demassey, Da Deng, Pavol Duris, Nurzhan Duzbayev, Jana Dvořáková, Konstantinos Efstathiou, Khaled Elbassioni, Ulle Endriss, Marc Esteva, Elsa Estevez, Dave Eyers, Alfredo Ferro, Pablo Fillotrani, Emmanuel Fleury, Lukáš Foltýn, Noria Foukia, Manel Fredj, Dorian Gaertner, Ombretta Gaggi, Leszek Gasieniec, Benoit Gaudou, Viliam Geffert, Andrea Giovannucci, Christos Gkantsidis, Miranda Grahl, Hristo Gregoriev, Lindsay Groves, Carsten Gutwenger, Robert Görke, Peter Haase, Ofer Hadar, Andreas Haeberlen, Nabil Hameurlain, Tero Harju, Jens Hartmann, Dag Haugland, Nils Hebbinghaus, Paul Holleis, Martin Holzer, François Horn, Zhiyi Huang, Joris Hulstijn, Kenneth Iman, Vaida Jakoniene, Emmanuel Jeandel, Jan Jacob Jessen, Pavel Ježek, Galina Jirásková, Tomáš Kalibera, Lucia Kapová, Jarkko Kari, Bastian Katz, Steffen Kern, Friederike Klan, Karsten Klein, Jan Kofroň, Stelios Kotsios, Ivan Kotuliak, Maciej Koutny, Marcin Kowalczyk, Łukasz Kowalik, Miroslaw Kowaluk, Rastislav Královič, Michal Krátký, Uwe Krüger, Markus Krötzsch, Maciej Kurowski, Morten Kühnrich, Ulrich Küster, Birgitta König-Ries, Willem Labuschagne, Steffen Lamparter, Till Christopher Lech, Ghislain Lemaur, Jin-Jang Leou, Alexei Lisitsa, Zhiming Liu, Brian Logan, Dominique Longin, Thibaut Lust, Grzegorz Marczyński, Steffen Mecke, Frédéric Messine, Marius Mikucionis, David Millen, Sonia Ben Mokhtar, Aurelien Moreau, Ben Moszkowski, Martin Mundhenk, Chet Murthy, Mariusz Nowostawski, Ulrik Nyman, Kazuhiro Ogata, Yoosoo Oh, Alexander Okhotin, Alain Ozanne, Ignazio Palmisano, Dana Pardubská, Wolfgang Paul, Radek Pelánek, Loris Penserini, Heiko Peter, Ion Petre, Ulrich Pferschy, Hans Philippi, Cees Pierik, Giovanni Pighizzini, Tomáš Plachetka, Iman Poernomo, Chan Poernomo, Eleftherios Polychronopoulos, Saulius Pusinskas, Bert Randerath, Pierre-Guillaume Raverdy, Jakob Rehof, Martin Řehák, Partha Roop, Wilhelm Rossak, Olivier Roy, Wojciech Rutkowski, Ignaz Rutter, Amir Sapir, Christian Scheideler, Ralf Schenkel, Elad Schiller, Christoph Schmitz, Thomas Schneider, Ralph Schoenfelder, Ondřej Šerý, Jeffrey Shallit, David Šišlák, Jiří Šimša, Tomáš Skopal, Martin Skutella, Roberto Speicys Cardoso, Samuel Staton, Bernhard Steffen, Volker Stolz, Jan Strejček, Lena Strömbäck, Patrick Stuedi, Alexander Sverdlov, He Tan, Nicolas Tang, Sergio Tessaris, Sophie Tison, Nicolas Troquard, Petr Tůma, Andy Twigg, Timur Umarov, Jurriaan van Diggelen, Kristof Van Laerhoven, M. Birna van Riemsdijk, Frank van Ham, Igor Vaynerman, Martin Vechev, Marinus Veldhorst, Sicco Verwer, Nicolas Villar, Kyriakos Vlachos, Johanna Voelker, Jiří Vokřínek, Denny Vrandecic, Imrich Vrťo, Jörg Waitelonis, Bruce Watson, Martin Wattenberg, Marco Wiering, Alexander Wolff, Feng Wu, Artur Zawlocki, Weihua Zhuang, Barbora Zimmerová, Floriano Zini, Michele Zito
Organization
The 33rd SOFSEM 2007 was organized by:

Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague
Action M Agency, Prague
Organizing Committee

Martin Řimnáč, Chair (Institute of Computer Science, Prague, Czech Republic)
Hana Bílková (Institute of Computer Science, Prague, Czech Republic)
Roman Špánek (Institute of Computer Science, Prague, Czech Republic)
Zdenka Linková (Institute of Computer Science, Prague, Czech Republic)
Milena Zeithamlová (Action M Agency, Prague, Czech Republic)
Július Štuller (Institute of Computer Science, Prague, Czech Republic)
Sponsoring Institutions

ERCIM – European Research Consortium for Informatics and Mathematics
SOFTEC Bratislava
Supporting Projects

The 33rd SOFSEM 2007 was partly supported by the following projects:

Project 1ET100300419 of the Program Information Society (of the Thematic Program II of the National Research Program of the Czech Republic) "Intelligent Models, Algorithms, Methods and Tools for the Semantic Web Realisation".

Institutional Research Plan AV0Z10300504 "Computer Science for the Information Society: Models, Algorithms, Applications".
Table of Contents
Invited Talks

Graphs from Search Engine Queries . . . . . . 1
   Ricardo Baeza-Yates

Model-Checking Large Finite-State Systems and Beyond . . . . . . 9
   Luboš Brim and Mojmír Křetínský

Interaction and Realizability . . . . . . 29
   Manfred Broy

A Short Introduction to Computational Social Choice . . . . . . 51
   Yann Chevaleyre, Ulle Endriss, Jérôme Lang, and Nicolas Maudet

Distributed Models and Algorithms for Mobile Robot Systems . . . . . . 70
   Asaf Efrima and David Peleg

Point-to-Point Shortest Path Algorithms with Preprocessing . . . . . . 88
   Andrew V. Goldberg

Games, Time, and Probability: Graph Models for System Design and Analysis . . . . . . 103
   Thomas A. Henzinger

Agreement Technologies . . . . . . 111
   Nicholas R. Jennings

Automatic Testing of Object-Oriented Software . . . . . . 114
   Bertrand Meyer, Ilinca Ciupa, Andreas Leitner, and Lisa Ling Liu

Architecture-Based Reasoning About Performability in Component-Based Systems . . . . . . 130
   Heinz W. Schmidt

Multimedia Retrieval Algorithmics . . . . . . 138
   Remco C. Veltkamp
Foundations of Computer Science

Size of Quantum Finite State Transducers . . . . . . 155
   Ruben Agadzanyan and Rūsiņš Freivalds

Weighted Nearest Neighbor Algorithms for the Graph Exploration Problem on Cycles . . . . . . 164
   Yuichi Asahiro, Eiji Miyano, Shuichi Miyazaki, and Takuro Yoshimuta

Straightening Drawings of Clustered Hierarchical Graphs . . . . . . 176
   Sergey Bereg, Markus Völker, Alexander Wolff, and Yuanyi Zhang

Improved Upper Bounds for λ-Backbone Colorings Along Matchings and Stars . . . . . . 188
   Hajo Broersma, Bert Marchal, Daniel Paulusma, and A.N.M. Salman

About the Termination Detection in the Asynchronous Message Passing Model . . . . . . 200
   Jérémie Chalopin, Emmanuel Godard, Yves Métivier, and Gerard Tel

Fast Approximate Point Set Matching for Information Retrieval . . . . . . 212
   Raphaël Clifford and Benjamin Sach

A Software Architecture for Shared Resource Management in Mobile Ad Hoc Networks . . . . . . 224
   Orhan Dagdeviren and Kayhan Erciyes

Compressed Prefix Sums . . . . . . 235
   O'Neil Delpratt, Naila Rahman, and Rajeev Raman

On Optimal Solutions for the Bottleneck Tower of Hanoi Problem . . . . . . 248
   Yefim Dinitz and Shay Solomon

Competitive Maintenance of Minimum Spanning Trees in Dynamic Graphs . . . . . . 260
   Miroslaw Dynia, Miroslaw Korzeniowski, and Jaroslaw Kutylowski

Exact Max 2-Sat: Easier and Faster . . . . . . 272
   Martin Fürer and Shiva Prasad Kasiviswanathan

Maximum Finding in the Symmetric Radio Networks with Collision Detection . . . . . . 284
   František Galčík and Gabriel Semanišin

An Approach to Modelling and Verification of Component Based Systems . . . . . . 295
   Gregor Gössler, Susanne Graf, Mila Majster-Cederbaum, M. Martens, and Joseph Sifakis

Improved Undecidability Results on the Emptiness Problem of Probabilistic and Quantum Cut-Point Languages . . . . . . 309
   Mika Hirvensalo

On the (High) Undecidability of Distributed Synthesis Problems . . . . . . 320
   David Janin

Maximum Rigid Components as Means for Direction-Based Localization in Sensor Networks . . . . . . 330
   Bastian Katz, Marco Gaertler, and Dorothea Wagner

Online Service Management Algorithm for Cellular/WLAN Multimedia Networks . . . . . . 342
   Sungwook Kim and Sungchun Kim

A Simple Algorithm for Stable Minimum Storage Merging . . . . . . 347
   Pok-Son Kim and Arne Kutzner

Generating High Dimensional Data and Query Sets . . . . . . 357
   Sang-Wook Kim, Seok-Ho Yoon, Sang-Cheol Lee, Junghoon Lee, and Miyoung Shin

Partial vs. Complete Domination: t-Dominating Set . . . . . . 367
   Joachim Kneis, Daniel Mölle, and Peter Rossmanith

Estimates of Data Complexity in Neural-Network Learning . . . . . . 377
   Věra Kůrková

Concurrent and Located Synchronizations in π-Calculus . . . . . . 388
   Ivan Lanese

Efficient Group Key Agreement for Dynamic TETRA Networks . . . . . . 400
   Su Mi Lee, Su Youn Lee, and Dong Hoon Lee

Algorithmic Aspects of Minimum Energy Edge-Disjoint Paths in Wireless Networks . . . . . . 410
   Markus Maier, Steffen Mecke, and Dorothea Wagner

The Pk Partition Problem and Related Problems in Bipartite Graphs . . . . . . 422
   Jérôme Monnot and Sophie Toulouse

Spatial Selection of Sparse Pivots for Similarity Search in Metric Spaces . . . . . . 434
   Oscar Pedreira and Nieves R. Brisaboa

A Model of an Amorphous Computer and Its Communication Protocol . . . . . . 446
   Lukáš Petrů and Jiří Wiedermann

A Branch-and-Bound Algorithm to Solve Large Scale Integer Quadratic Multi-Knapsack Problems . . . . . . 456
   Dominique Quadri, Eric Soutif, and Pierre Tolla

Indexing Factors with Gaps . . . . . . 465
   M. Sohel Rahman and Costas S. Iliopoulos

Information Efficiency . . . . . . 475
   Joel Ratsaby

Deterministic Simulation of a NFA with k-Symbol Lookahead . . . . . . 488
   Bala Ravikumar and Nicolae Santean

Mobility Management Using Virtual Domain in IPv6-Based Cellular Networks . . . . . . 498
   Jae-Kwon Seo and Kyung-Geun Lee

Restarting Tree Automata . . . . . . 510
   Heiko Stamer and Friedrich Otto

A Polynomial Time Constructible Hitting Set for Restricted 1-Branching Programs of Width 3 . . . . . . 522
   Jiří Šíma and Stanislav Žák

Formal Translation Directed by Parallel LLP Parsing . . . . . . 532
   Ladislav Vagner and Bořivoj Melichar

Self-adaptive Lagrange Relaxation Algorithm for Aggregated Multicast . . . . . . 544
   Hua Wang, Zuquan Ge, and Jun Ma

A Language for Reliable Service Composition . . . . . . 554
   Qingjun Xiao, Ruonan Rao, and Jinyuan You

Operational Semantics of Framed Temporal Logic Programs . . . . . . 566
   Xiaoxiao Yang and Zhenhua Duan

Constraints for Argument Filterings . . . . . . 579
   Harald Zankl, Nao Hirokawa, and Aart Middeldorp
Multi-agent Systems

Performance Analysis of a Multiagent Architecture for Passenger Transportation . . . . . . 591
   Claudio Cubillos, Franco Guidi-Polanco, and Ricardo Soto

Teacher-Directed Learning with Mixture of Experts for View-Independent Face Recognition . . . . . . 601
   Reza Ebrahimpour, Ehsanollah Kabir, and Mohammad Reza Yousefi

FTTH-Enhanced Mini-System mTBCP-Based Overlay Construction and Evaluation . . . . . . 612
   Mi-Young Kang, Omar F. Hamad, Choung-Ung Pom, and Ji-Seung Nam

On Efficient Resource Allocation in Communication Networks . . . . . . 624
   Michal Karpowicz and Krzysztof Malinowski

Protecting Agent from Attack in Grid Computing III . . . . . . 636
   Byungryong Kim

Incremental Learning of Planning Operators in Stochastic Domains . . . . . . 644
   Javad Safaei and Gholamreza Ghassem-Sani

Competitive Contract Net Protocol . . . . . . 656
   Jiří Vokřínek, Jiří Bíba, Jiří Hodík, Jaromír Vybíhal, and Michal Pěchouček

Agent Oriented Methodology Construction and Customization with HDA . . . . . . 669
   Xue Xiao, Zeng Zhifeng, and Cui Ying
Emerging Web Technologies

Building an Ontological Base for Experimental Evaluation of Semantic Web Applications . . . . . . 682
   Peter Bartalos, Michal Barla, György Frivolt, Michal Tvarožek, Anton Andrejko, Mária Bieliková, and Pavol Návrat

Semantic Web Approach in Designing a Collaborative E-Item Bank System . . . . . . 693
   Heung-Nam Kim, Ae-Ttie Ji, Soon-Geun Lee, and Geun-Sik Jo

A Hybrid Region Weighting Approach for Relevance Feedback in Region-Based Image Search on the Web . . . . . . 705
   Deok-Hwan Kim, Jae-Won Song, and Ju-Hong Lee

Rapid Development of Web Interfaces to Heterogeneous Systems . . . . . . 716
   José Paulo Leal and Marcos Aurélio Domingues

Enhancing Security by Embedding Biometric Data in IP Header . . . . . . 726
   Dae Sung Lee, Ki Chang Kim, and Year Back Yoo

Runtime-Efficient Approach for Multiple Continuous Filtering in XML Message Brokers . . . . . . 738
   Hyunho Lee and Wonsuk Lee

A Semantic Peer-to-Peer Overlay for Web Services Discovery . . . . . . 750
   Yong Li, Fangchun Yang, Kai Shuang, and Sen Su

Multi-document Summarization Based on Cluster Using Non-negative Matrix Factorization . . . . . . 761
   Sun Park, Ju-Hong Lee, Deok-Hwan Kim, and Chan-Min Ahn

A Program Slicing Based Method to Filter XML/DTD Documents . . . . . . 771
   Josep Silva

A Hybrid Approach for XML Similarity . . . . . . 783
   Joe Tekli, Richard Chbeir, and Kokou Yetongnon

Personalized Presentation in Web-Based Information Systems . . . . . . 796
   Michal Tvarožek, Michal Barla, and Mária Bieliková

Immune-Inspired Online Method for Service Interactions Detection . . . . . . 808
   Jianyin Zhang, Fangchun Yang, Kai Shuang, and Sen Su
Dependable Software and Systems

Separation of Concerns and Consistent Integration in Requirements Modelling . . . . . . 819
   Xin Chen, Zhiming Liu, and Vladimir Mencl

Checking Interaction Consistency in MARMOT Component Refinements . . . . . . 832
   Yunja Choi

Towards a Versatile Contract Model to Organize Behavioral Specifications . . . . . . 844
   Philippe Collet, Alain Ozanne, and Nicolas Rivierre

Improved Processing of Textual Use Cases: Deriving Behavior Specifications . . . . . . 856
   Jaroslav Drazan and Vladimir Mencl

A Dialogue-Based NLIDB System in a Schedule Management Domain . . . . . . 869
   Harksoo Kim

Experimental Assessment of the Practicality of a Fault-Tolerant System . . . . . . 878
   Jai Wug Kim, Jongpil Lee, and Heon Y. Yeom

A Polynomial-Time Checkable Sufficient Condition for Deadlock-Freedom of Component-Based Systems . . . . . . 888
   Mila Majster-Cederbaum, Moritz Martens, and Christoph Minnameier

Extracting Zing Models from C Source Code . . . . . . 900
   Tomas Matousek and Filip Zavoral

Parameterised Extra-Functional Prediction of Component-Based Control Systems – Industrial Experience . . . . . . 911
   Ian D. Peake and Heinz W. Schmidt

Explicit Connectors in Component Based Software Engineering for Distributed Embedded Systems . . . . . . 923
   Dietmar Schreiner and Karl M. Göschka
Author Index . . . . . . 935
Graphs from Search Engine Queries

Ricardo Baeza-Yates
Yahoo! Research
Barcelona, Spain & Santiago, Chile
Abstract. Server logs of search engines store traces of queries submitted by users, which include queries themselves along with Web pages selected in their answers. Here we describe several graph-based relations among queries and many applications where these graphs could be used.
1 Introduction
Queries submitted to search engines convey implicit knowledge if we assume that most of the time user actions are meaningful. Hence, the challenge is to extract interesting relations from very large query logs. One natural starting point is to infer a graph from the queries. Another possibility, most frequent in previous research, is to define a similarity (or distance) function between queries. This also implies a graph based on this function. One drawback of defining a function is that it is more difficult to understand why two queries are similar, and to some degree we add artificial artifacts that can introduce noise into data that is already noisy.

In this paper we explore relations between queries that are based on different sources of information, such as the words in the query, the clicked URLs in their answers, as well as their links or terms, as shown in Figure 1. For each source we define different sets and conditions on those sets that naturally generate a graph. Depending on each case, these graphs can be directed and/or weighted in the nodes and/or edges. These weights are also natural in the sense that they are related to the number of cases that fulfill a given condition.

Fig. 1. Different relations among queries (figure omitted; it depicts queries q1-q4 connected by "common words", "common clicked URL", and "link" relations)

We start by covering and discussing previous work on query similarity, followed by the conceptual framework that we use. Next we present different induced graphs that capture different sources of information. We conclude by analyzing the proposed graphs and mentioning possible applications of these ideas.
2 Previous Work
Most of the work on query similarity is related to query expansion or query clustering. One early technique, proposed by Raghavan and Sever [7], attempts to measure query similarity by determining differences in the ordering of documents retrieved in the answers. As this technique requires a total ordering in the document collection, the comparison of two rankings would require superlinear time. Considering the current size of the Web, this algorithm is not scalable. Later, Fitzpatrick and Dent [5] measured query similarity using the normalized set intersection of the top 200 documents in the answers for the queries. Again, this is not meaningful in the Web, as the intersection for semantically similar queries that use different synonyms can and will be very small.

Wen et al. [9] proposed to cluster similar queries to recommend URLs for frequently asked queries of a search engine. They used four notions of query distance: (1) based on keywords or phrases of the query; (2) based on string matching of keywords; (3) based on common clicked URLs; and (4) based on the distance of the clicked documents in some pre-defined hierarchy. Beeferman and Berger [4] also proposed a query clustering technique based on distance notion (3). As the average number of words in queries is small (about two) and the number of clicks on the answer pages is also small [1], notions (1)-(3) are difficult to deal with in practice, because the distance matrices between queries generated by them are very sparse. This sparsity can be diminished by using larger query logs, which are not available to most researchers. Notion (4) needs a concept taxonomy and requires the clicked documents to be classified into the taxonomy as well, something that cannot be done on a large scale.

Fonseca et al. [6] present a method to discover related queries based on association rules. Here queries represent items in traditional association rules. The query log is viewed as a set of transactions, where each transaction represents a session in which a single user submits a sequence of related queries in a time interval. The method shows good results; however, two problems arise. First, it is difficult to determine sessions of successive queries that belong to the same search process; and on the other hand, the most interesting related queries, those submitted by different users, cannot be discovered. This is because the support of a rule increases only if its queries appear in the same query session, and thus they must be submitted by the same user.

Zaiane and Strilets [10] present seven different notions of query similarity. Three of them are mild variations of notions (1) and (3). The remaining notions consider the content and title of the URLs in the result of a query. Their
approach is intended for a meta-search engine and thus none of their similarity measures consider user preferences in the form of clicks stored in query logs.

Baeza-Yates et al. [2,3] used the content of clicked Web pages to define a term-weight vector model for a query. They consider terms in the URLs clicked after a query. Each term is weighted according to the number of occurrences of the query and the number of clicks of the documents in which the term appears. That is,

q[i] = \sum_{\mathrm{URL}\ u} \frac{\mathrm{Pop}(u, q) \times \mathrm{Tf}(t_i, u)}{\max_t \mathrm{Tf}(t, u)}
where q is a query, u is a clicked URL, Pop(u, q) is the number of clicks for u in the sessions related to query q, and Tf(t, u) is, as usual, the number of occurrences of term t in the text of the URL u. Notice that Pop plays the role of Idf in the well-known tf-idf weighting scheme for the vector model. The similarity of two queries is then the similarity of their vector representations, computed for example with the cosine function. This notion of query similarity has several advantages. First, it is simple and easy to compute. On the other hand, it allows relating queries that happen to be worded differently but stem from the same topic. Therefore, semantic relationships among queries are captured.

Recently, Sahami and Heilman [8] used a query similarity based on the snippets of the answers to the queries (for example, the first page of results). For that they treat each snippet as a query to the search engine in order to find a certain number of documents that contain the terms in the original snippets. Then, they use these returned documents to create a context vector for the original snippet. However, the main drawback is that this approach does not consider the feedback of the users (i.e. clicked pages).

In this paper we are interested in exploiting the sources of information that come directly from user actions in the search engine or user-generated data such as text content, hyperlinks and their anchor text, metadata, etc.
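As an illustration, here is a minimal Python sketch of this term-weighting scheme and the cosine comparison (all function and variable names are our own; the cited papers define the model mathematically, not as code):

```python
import math
from collections import defaultdict

def query_vector(clicks, tf):
    """clicks: {url: pop}, pop = number of clicks on url for this query;
    tf: {url: {term: raw frequency of the term in the page at url}}."""
    q = defaultdict(float)
    for url, pop in clicks.items():
        max_tf = max(tf[url].values(), default=0) or 1
        for term, f in tf[url].items():
            # q[i] = sum over clicked URLs u of Pop(u,q) * Tf(t_i,u) / max_t Tf(t,u)
            q[term] += pop * f / max_tf
    return dict(q)

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```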
3 Basic Concepts
Figure 2 shows the relationships between the different entities that participate in the process induced by the use of a search engine. In this model we assume that clicks on answers are positive feedback, that is, that the answer was relevant for the query. We now define the main concepts used in the sequel of the paper:

– Query instance: a query (set of words or a sentence) plus zero or more clicks related to that query.¹ Formally,

QI = (q, p, t, c*)   where q = {words} and c = (u, t),

where q is the query, p a user profile, u a clicked URL, and t a time stamp. We will use QI_i to denote the elements of an instance (i ∈ {q, p, t, c(u), c(t)}).

¹ We only consider clicks on the results, but the same ideas can be extended to advertising or navigational clicks.
Fig. 2. Search engine interaction cycle
– Query session: one or more query instances with the same user profile. That is, QS = QI⁺, using regular expression notation. Notice that this implies an ordered sequence of time stamps.
– URL Cover: the set of all URLs clicked by a query instance. That is,

UC_p = \bigcup_{QI_q = p} QI_{c(u)}
We are interested in the aggregation of equal queries (e.g. the same set of words) independently of the user profile (a cookie, an IP and user agent, etc.) and the query time stamp. So in the rest of the paper we may drop these parameters, aggregating a query instance to QI = (q, u*). Similarly for query sessions and URL covers.
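To make the aggregated notions concrete, here is a small sketch of the data model under these definitions (names are ours, for illustration only):

```python
from collections import defaultdict

# A query log aggregated as in QI = (q, u*): a list of (query, clicked_url)
# pairs, with clicked_url set to None when an instance produced no click.
def url_cover(query_log):
    """UC_p: for every query p, the set of all URLs clicked after p."""
    cover = defaultdict(set)
    for query, url in query_log:
        if url is not None:
            cover[query].add(url)
    return cover

log = [("jaguar", "en.wikipedia.org/wiki/Jaguar"),
       ("jaguar", "www.jaguar.com"),
       ("jaguar cars", "www.jaguar.com")]
print(url_cover(log)["jaguar"])   # both URLs: the query is polysemic
```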
4 Induced Graphs
We will now define several weighted graphs based on the previous definitions. The idea is to use more and more related information, starting from the words of the query and finishing with the content of the clicked URLs.

4.1 Word Graph
Each vertex is a QI and its weight is the number of occurrences of QI. There is an undirected edge between v1 and v2 if v1_q ∩ v2_q ≠ ∅. The weight of the edge is the number of cases where this condition is true. Other weight schemes could be defined, for example based on the distribution of clicks.
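A small sketch of the word graph follows, built over aggregated (unique) query strings; as one possible weighting, an assumption on our part, it weights an edge by the number of shared words rather than by the number of occurrences:

```python
from itertools import combinations

def word_graph(queries):
    """queries: unique aggregated query strings; returns {(q1, q2): weight}."""
    words = {q: set(q.split()) for q in queries}
    edges = {}
    for q1, q2 in combinations(queries, 2):
        shared = words[q1] & words[q2]
        if shared:                       # edge iff the word sets intersect
            edges[(q1, q2)] = len(shared)
    return edges

print(word_graph(["jaguar car", "jaguar habitat", "used car prices"]))
```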
4.2 Session Graph
Each vertex is a QI and its weight is the number of sessions that contain this query (most of the time this is the same as the number of occurrences). There
is a directed edge from v1 to v2 if both QIs are in the same session and v1 happened before v2; the weight of the edge is the number of such cases.

4.3 URL Cover Graph
Each vertex is a QI and its weight is the number of occurrences of the query. Now we define three different types of edges. There is an edge between v1 and v2 if:

– Complete cover: UC_{v1_q} ⊂ UC_{v2_q}. This is a directed edge from v1 to v2.
– Identical cover: UC_{v1_q} = UC_{v2_q}. This edge is undirected.
– Partial cover: UC_{v1_q} ∩ UC_{v2_q} ≠ ∅. This edge is undirected but could be directed from v1 to v2 if |UC_{v1_q}| > |UC_{v2_q}|.

In all cases the weight of the edge is the size of the smallest set covered. In Figure 3 we show a small part of a cover graph that considers the three cases above. Notice that edges with larger weights are darker.
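The sketch below classifies a pair of queries by the cover conditions just listed, reading "size of the smallest set covered" as the size of the smaller of the two covers (our interpretation; names are ours):

```python
def cover_edge(cover1, cover2):
    """Classify the edge between two queries given their URL covers (sets)."""
    if not (cover1 & cover2):
        return None                          # disjoint covers: no edge
    if cover1 == cover2:
        kind = "identical"                   # undirected
    elif cover1 < cover2:
        kind = "complete (v1 -> v2)"         # directed, strict subset
    elif cover2 < cover1:
        kind = "complete (v2 -> v1)"         # directed, strict subset
    else:
        kind = "partial"                     # undirected, or directed by |UC|
    return kind, min(len(cover1), len(cover2))   # weight: smaller cover size
```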
4.4 URL Link Graph
Each vertex is a QI and its weight is the number of occurrences of the query. There is a directed edge from v1 to v2 if there is at least one link from a URL in UC_{v1_q} to a URL in UC_{v2_q}. The weight of the edge is the number of links of this kind.

4.5 URL Terms Graph
We can extract a set of terms from every clicked URL to represent it. There are many choices, for example:

– the full text content of the page (after deleting HTML tagging and stopwords);
– the text snippet generated by the search engine for that page and for the corresponding query, or all the sentences or passages that contain the query;
– a subset of the text content of the URL (e.g. title, headings, etc.);
– the anchor text in the links that point to the URL; and
– a combination of the above.

In all of these cases a URL is represented by a set of terms and the frequency of each term. The induced graph in this case is similar to the previous ones. Each vertex is a QI and its weight is the number of occurrences of the query. There is a directed edge from v1 to v2 if there are at least ℓ common terms in the intersection of the representations of at least one URL in UC_{v1_q} and one URL in UC_{v2_q}. The weight of the edge is the sum of the frequencies of the common terms in the URLs that satisfy this condition.
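A small sketch of this edge test follows (our own illustration; terms_of stands for whichever URL representation from the list above is chosen):

```python
def terms_edge(cover1, cover2, terms_of, ell=3):
    """Directed edge test between two queries with URL covers cover1, cover2;
    terms_of[u] is a {term: frequency} representation of URL u."""
    weight = 0
    for u1 in cover1:
        for u2 in cover2:
            common = terms_of[u1].keys() & terms_of[u2].keys()
            if len(common) >= ell:       # at least ell shared terms
                weight += sum(terms_of[u1][t] + terms_of[u2][t] for t in common)
    return weight or None                # None means: no edge
```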
Fig. 3. A graph example considering all possible covers
5 Concluding Remarks
These graphs could be sparse but provide different information. Some imply stronger relations and also have different levels of noise. Table 1 gives a qualitative summary of all the graphs proposed. The word and URL cover graphs are based on previous work, but all the others are new. These graphs are natural graphs, as they do not need a distance function between nodes to be defined.

Table 1. Qualitative analysis of the graphs

Graph       Strength   Sparsity   Noise
Word        medium     high       polysemy
Session     medium     high       low
URL Cover   high       medium     click spam
URL Link    weak       medium     link spam
URL Terms   medium     low        term spam
All these graphs are in general not connected. One possibility would be to have just one graph that aggregates all the graphs, with different labels for each edge type. The weights that we have proposed should be normalized in many cases, either using the overall value of the measure used or the sum of the weights of the outgoing edges in a node. The graphs can also be weighted using other measures, for example the number of cases that satisfy the condition associated with the existence of the edge times some similarity measure on the condition (for example, the number or length of common words in the word graph).

The next level of graphs would use a distance function over the representations of the queries, for example by using a vector model over terms. However, as mentioned before, these would not be natural graphs. In all cases, a query similarity measure implies the possibility of ranking queries.

There are some interesting related open problems. For example, sessions are usually physical sessions and not logical sessions (for example, four queries could mean two different tasks, with two queries each). Another problem is that not all clicks are equal, as they are biased by the ranking function of the search engine as well as by the user interface. So clicks should be unbiased considering these two effects (see [2]).

These graphs can be used for many different applications. The following is a partial list of potential uses:

– Recognition of polysemic words: the edge exists in the word graph and the intersection of words has size 1, but the edge does not exist in the URL cover graph.
– Similar queries: queries that are close using some distance function in the URL cover graph.
– Related queries: queries that are close using some distance function in the URL link graph.
– Clustering queries: connected components in the session and/or URL cover graph. Clusters can be used for many purposes, such as reranking, query suggestions, logical session finding, etc.
– Pseudo-taxonomy of queries: more specific queries are children in the complete URL cover graph, and related queries are linked by a partial URL cover.

Acknowledgement. We thank Vanessa Murdock for her comments and Alessandro Tiberi for generating the example graph.
References

1. Baeza-Yates, R.: Applications of Web Query Mining. In: Losada, D., Fernández-Luna, J. (eds.) European Conference on Information Retrieval (ECIR'05), Springer LNCS 3408 (2005) 7–22
2. Baeza-Yates, R., Hurtado, C., and Mendoza, M.: Query Clustering for Boosting Web Page Ranking. In: Advances in Web Intelligence, AWIC 2004, Springer LNCS 3034 (2004) 164–175
3. Baeza-Yates, R., Hurtado, C., and Mendoza, M.: Query Recommendation Using Query Logs in a Search Engine. In: EDBT Workshops, Springer LNCS 3268 (2004) 588–596
4. Beeferman, D. and Berger, A.: Agglomerative Clustering of a Search Engine Query Log. In: KDD (1999), Boston, MA, USA
5. Fitzpatrick, L. and Dent, M.: Automatic Feedback Using Past Queries: Social Searching? In: 20th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval (1997) 306–313
6. Fonseca, B.M., Golgher, P.B., De Moura, E.S., and Ziviani, N.: Using Association Rules to Discover Search Engines Related Queries. In: First Latin American Web Congress (LA-WEB'03), November 2003, Santiago, Chile
7. Raghavan, V.V. and Sever, H.: On the Reuse of Past Optimal Queries. In: 18th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval (1995) 344–350
8. Sahami, M. and Heilman, T.D.: A Web-Based Kernel Function for Measuring the Similarity of Short Text Snippets. In: World Wide Web Conference (2006) 377–386
9. Wen, J., Nie, J., and Zhang, H.: Clustering User Queries of a Search Engine. In: Proc. 10th International World Wide Web Conference, W3C (2001)
10. Zaiane, O.R. and Strilets, A.: Finding Similar Queries to Satisfy Searches Based on Query Traces. In: Proceedings of the International Workshop on Efficient Web-Based Information Systems (EWIS), Montpellier, France, September 2002
Model-Checking Large Finite-State Systems and Beyond⋆

Luboš Brim and Mojmír Křetínský
Faculty of Informatics, Masaryk University, Brno, Czech Republic
1 Introduction
With the increase in the complexity of computer systems, it becomes even more important to develop formal methods for ensuring their quality. Early detection of errors requires application of advanced analysis, verification and validation techniques for modelling resources, temporal properties, datatype invariants, and security properties. Various techniques for automated and semi-automated analysis and verification of computer systems have been proposed. In particular, model-checking has become a very practical technique due to its push-button character. The basic principle behind model-checking is to build a model of the system under consideration together with a formal description of the verified property in a suitable temporal logic. The model-checking algorithm is a decision procedure which, in addition to the yes/no answer, returns a trace of a faulty behaviour in case the checked property is not satisfied by the model. One of the additional advantages of this approach is that verification can be performed against partial specifications, by considering only a subset of all specification requirements. This allows for increased efficiency by checking correctness with respect to only the most relevant requirements that should be fulfilled. The limiting factor is that the size of the model explodes, i.e. it generally grows exponentially with respect to the size of the system description. To handle state space explosion additional techniques are required. In recent years, research has been conducted in techniques which utilise the combined resources of parallel or distributed computers to further push the borders of still tractable systems.

In the first part we give an introductory survey of achievements related to cluster-based LTL model checking of finite-state systems. In the second part we employ the classes of infinite-state systems defined by term rewrite systems and called Process Rewrite Systems (PRS) as introduced by Mayr. PRS subsume a variety of the formalisms studied in the context of formal verification: Petri nets, pushdown automata, and process algebras like BPA, BPP, or PA all serve to exemplify this. We present some extensions of PRS and discuss their basic properties. Also, we explore the model checking problem over these classes with respect to various linear- and branching-time logics.
This work has been partially supported by the Grant Agency of Czech Republic grant No. 201/06/1338 and the Academy of Sciences grant No. 1ET408050503.
2 Model-Checking Large Finite-State Systems
In this part we focus on finite-state models, where one assumes only a finite number of distinct configurations during any arbitrarily long execution of a computer system. Although surely limited in a mathematical sense, finite-state models necessarily encompass every software system implemented on a digital computer. Model-checking finite-state systems has been applied fairly successfully for verification of quite a few real-life systems. However, its applicability to a wider class of practical systems has been hampered by the so-called state explosion problem (i.e. the enormous increase in the size of the state space). For large industrial models, the state space does not completely fit into the main memory of a single state-of-the-art computer, and hence the model-checking algorithm becomes very slow as soon as the memory is exhausted and the system starts swapping.

Much attention has been focused on the development of approaches to battle the state space explosion problem. Many techniques, such as abstraction, state compression, state space reduction, symbolic state representation, etc., are used to reduce the size of the problem to be handled, allowing thus a single computer to process larger systems. There are also techniques that purely focus on increasing the amount of available computational power. These are, for example, techniques to fight memory limits with efficient utilisation of an external I/O device [1], [30], [43], [64], or techniques that introduce cluster-based algorithms to employ the aggregate power of network-interconnected computers. Cluster-based algorithms perform their computation simultaneously on a number of workstations that are allowed to communicate and synchronise themselves by means of message passing. Cluster-based algorithms can thus be characterised as parallel algorithms performing in a distributed memory environment.

An efficient parallel solution often cannot be achieved by a simple adaptation of a sequential one; in many cases it requires the invention of original, novel approaches radically different from those used to solve the same problems sequentially. Parallel algorithms have been successfully applied to symbolic model checking [36], [37], analysis of stochastic [39] and timed [6] systems, equivalence checking [9] and other related problems [7], [10], [35]. Experimental performance results on clusters of workstations show significant improvements with respect to sequential techniques, both in the extension of the size of the problem and in computational times, along with adequate scalability with the number of processors.

As a demonstration of cluster-based verification we consider parallel LTL model-checking. The LTL model-checking problem can be reformulated as a cycle detection problem in an oriented graph, and the basic principles behind the presented algorithms rely on efficient solutions to detecting cycles in a distributed environment. The best known enumerative sequential algorithms for detection of accepting cycles are the Nested DFS algorithm [27], [41] (implemented, e.g., in the model checker SPIN [40]) and SCC-based algorithms originating in Tarjan's algorithm for the decomposition of the graph into strongly connected components (SCCs) [67]. While Nested DFS is more space efficient, SCC-based algorithms produce shorter counterexamples in general. The linear time complexity of both algorithms relies on the postorder as produced by the depth-first search
traversal over the graph. It is a well-known fact that computing the depth-first search postorder is P-complete [61], hence probably inherently sequential. This means that neither of the two algorithms can be easily adapted to work on a parallel machine. A few fundamentally different cluster-based techniques for accepting cycle detection have appeared, though. They typically perform repeated reachability over the graph. Unlike the postorder problem, reachability is a graph problem which can be parallelised, hence the algorithms might be transformed to cluster-based algorithms that work with a reasonable increase in time and space. The algorithms employ specific structural properties of the underlying graphs (often computable in advance from the given system specification), use additional data structures to divide the problem into independent sub-problems, or translate the model-checking problem to another one which admits an efficient parallel solution. Several of the algorithms are based on the sequentially less efficient but well parallelizable breadth-first exploration of the graph, or on placing bounds limiting the size of the graph to be explored.
2.1 Distributed Algorithms for Accepting Cycle Detection
The algorithms are meant for cluster-based computing. The cluster is formed from a network of workstations; there is no shared memory. We describe the main ideas primarily as sequential, thus leaving out many technical details related to distributed computation.

The problem we consider comes out of the automata-based procedure to decide the LTL model-checking problem as introduced in [68]. The approach exploits the fact that every set of executions expressible by an LTL formula is an ω-regular set and can be described by a Büchi automaton. In particular, the approach suggests to express all system executions by a system automaton and all executions not satisfying the formula by a property or negative claim automaton. These two automata are combined into their synchronous product in order to check for the presence of system executions that violate the property expressed by the formula. The language recognised by the product automaton is empty if and only if no system execution is invalid.

The language emptiness problem for Büchi automata can be expressed as an accepting cycle detection problem in a directed graph. Each Büchi automaton can be naturally identified with an automaton graph, which is a directed graph G = (V, E, s, A) where V is the set of vertexes (n = |V|), E is a set of edges (m = |E|), s is an initial vertex, and A ⊆ V is a set of accepting vertexes (a = |A|). We say that a reachable cycle in G is accepting if it contains an accepting vertex. Let A be a Büchi automaton and G_A the corresponding automaton graph. Then A recognises a nonempty language iff G_A contains an accepting cycle. The LTL model-checking problem is thus reduced to the reachable accepting cycle detection problem in automaton graphs.

We suppose the graph is given implicitly and is generated on demand. This contrasts with the possibility of having an explicit representation (like an adjacency matrix) and gives a better chance to get the solution without actually constructing the entire graph. For this reason our graphs are given by two functions:
one gives the initial vertex and the other gives, for each vertex, the set of its immediate successors. The graph is distributed on the workstations using a partition function placing each vertex on some workstation.

Algorithm 1. Maximal Accepting Predecessors [19], [20]

A vertex u is a predecessor of a vertex v if there is a non-trivial path from u to v. The main idea behind the algorithm is based on the fact that each accepting vertex lying on an accepting cycle is its own predecessor. Instead of expensive computing and storing of all accepting predecessors for each (accepting) vertex, the algorithm computes a single representative accepting predecessor for each vertex. We presuppose a linear ordering ≺ of vertexes (given e.g. by their memory representation) and choose the maximal accepting predecessor. For a vertex u we denote its maximal accepting predecessor in the graph G by map_G(u). Clearly, if an accepting vertex is its own maximal accepting predecessor (map_G(u) = u), it is its own predecessor and it lies on an accepting cycle. Unfortunately, the opposite does not hold in general. It can happen that the maximal accepting predecessor for an accepting vertex on a cycle does not lie on the cycle. Such vertexes can be safely deleted from the set of accepting vertexes (by applying the deleting transformation) and the accepting cycle still remains in the resulting graph. Whenever the deleting transformation is applied to an automaton graph G with map_G(v) ≠ v for all v ∈ V, it shrinks the set of accepting vertexes by those vertexes that do not lie on any cycle. As the set of accepting vertexes can change after the deleting transformation has been applied, maximal accepting predecessors must be recomputed. It can happen that even in the modified graph the maximal accepting predecessor function is still not sufficient for cycle detection. However, after a finite number of applications of the deleting transformation an accepting cycle is certified. For an automaton graph without accepting cycles the repetitive application of the deleting transformation results in an automaton graph with an empty set of accepting vertexes. Time complexity of the algorithm is O(a² · m), where a is the number of accepting vertexes. Here the factor a · m comes from the computation of the map function and the factor a relates to the number of iterations. (A sequential sketch of this iteration is given after the next paragraph.)

Algorithm 2. Eliminating Bad States [24]

The accepting cycle detection problem can be directly reformulated as a question whether the automaton graph contains a nontrivial accepting strongly connected component. The inspiration for the algorithm is taken from symbolic algorithms for cycle detection, namely from SCC hull algorithms. SCC hull algorithms compute the set of vertexes containing all accepting components. The algorithms maintain an approximation of the set and successively remove non-accepting components until they reach a fixpoint. Different strategies to remove non-accepting components lead to different algorithms. An overview, taxonomy, and comparison of symbolic algorithms can be found in the independent reports [34] and [60].
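Returning to Algorithm 1, here is the promised minimal sequential Python sketch of the MAP iteration (our own illustration, not the authors' distributed implementation). It assumes an explicit edge list and vertices identified by comparable integers, all reachable from the initial vertex, and it realizes the deleting transformation as removal of the accepting vertexes that occur as map values, which is safe when no vertex is its own map:

```python
def has_accepting_cycle_map(vertices, edges, accepting):
    acc = set(accepting)
    while acc:
        mp = {v: None for v in vertices}   # mp[v] approximates map(v)
        changed = True
        while changed:                     # propagate values to a fixpoint
            changed = False
            for u, v in edges:
                cand = mp[u]
                if u in acc and (cand is None or u > cand):
                    cand = u               # u itself is an accepting predecessor of v
                if cand is not None and (mp[v] is None or cand > mp[v]):
                    mp[v], changed = cand, True
        if any(mp[v] == v for v in acc):
            return True                    # an accepting vertex is its own predecessor
        delete = acc & {m for m in mp.values() if m is not None}
        if not delete:
            return False                   # nothing deletable: no accepting cycle
        acc -= delete                      # the deleting transformation
    return False
```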
Algorithm 2. Eliminating Bad States [24]
The accepting cycle detection problem can be directly reformulated as the question whether the automaton graph contains a nontrivial accepting strongly connected component (SCC). The inspiration for the algorithm is taken from symbolic algorithms for cycle detection, namely from SCC hull algorithms. SCC hull algorithms compute a set of vertexes containing all accepting components: they maintain an approximation of the set and successively remove non-accepting components until they reach a fixpoint. Different strategies for removing non-accepting components lead to different algorithms. An overview, taxonomy, and comparison of symbolic algorithms can be found in the independent reports [34] and [60].
The enumerative algorithm works on individual vertexes rather than on sets of vertexes, as is the case in the symbolic approach. A component is removed by removing its vertexes. The algorithm employs two rules to remove vertexes of non-accepting components (a sketch of the resulting forward version is given below):
– if a vertex is not reachable from any accepting vertex, then the vertex does not belong to any accepting component, and
– if a vertex has in-degree zero, then the vertex does not belong to any accepting component.
Note that an alternative set of rules can be formulated:
– if no accepting vertex is reachable from a vertex, then the vertex does not belong to any accepting component, and
– if a vertex has out-degree zero, then the vertex does not belong to any accepting component.
This second set of rules results in an algorithm which works in a backward manner; we do not describe it explicitly here. The algorithm in its forward version requires the entire automaton graph to be generated first, and the same is true for the backward version. Moreover, the backward version actually needs to store the edges to be able to perform backward reachability. This cost, however, pays off by removing the need to compute successors, which is in fact a very expensive operation in practice. The time complexity of the algorithm is O(h · m), where h is the height of the SCC tree. A positive aspect of these algorithms is their effectiveness on weak automaton graphs. A graph is weak if each SCC of G is either fully contained in A or disjoint from A. For weak graphs, one iteration of the SCC-based algorithm suffices to decide the existence of accepting cycles. Studies of temporal properties [29], [25] reveal that verification of up to 90% of LTL properties leads to weak automaton graphs.
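The following sketch illustrates the forward version of the two elimination rules on an explicitly given graph (an illustration only, not the cluster-based implementation of [24]); vertices, edges and accepting are assumed to be Python sets.

    def elimination_pass(vertices, edges, accepting):
        # Rule 1: keep only vertexes reachable from some accepting vertex.
        succ = {v: [] for v in vertices}
        for (u, v) in edges:
            succ[u].append(v)
        reached, stack = set(accepting), list(accepting)
        while stack:
            u = stack.pop()
            for v in succ[u]:
                if v not in reached:
                    reached.add(v)
                    stack.append(v)
        vertices = reached
        edges = {(u, v) for (u, v) in edges if u in vertices and v in vertices}
        # Rule 2: repeatedly discard vertexes with in-degree zero.
        while True:
            indeg = {v: 0 for v in vertices}
            for (_, v) in edges:
                indeg[v] += 1
            removable = {v for v in vertices if indeg[v] == 0}
            if not removable:
                break
            vertices -= removable
            edges = {(u, v) for (u, v) in edges if u not in removable}
        return vertices, edges

    def owcty(vertices, edges, accepting):
        while True:
            new_v, new_e = elimination_pass(vertices, edges,
                                            accepting & vertices)
            if new_v == vertices:      # fixpoint reached
                return bool(new_v)     # nonempty hull => accepting cycle
            vertices, edges = new_v, new_e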
Algorithm 3. Maximal Number of Accepting Predecessors [18]
Consider the maximal number of accepting vertexes on a path from the source to a vertex, where the maximum is taken over all paths. For vertexes on an accepting cycle the maximum does not exist, because extending a path along the cycle adds at least one accepting vertex. For computing the maximal number of accepting predecessors, the algorithm maintains for every vertex v its "distance" label d(v) giving the maximal number of accepting predecessors, a parent vertex p(v), and a status S(v) ∈ {unreached, labelled, scanned}. Initially, d(v) = ∞, p(v) = nil, and S(v) = unreached for every vertex v. The method starts by setting d(s) = 0, p(s) = nil and S(s) = labelled, where s is the initial vertex. At every step a labelled vertex is selected and scanned. When scanning a vertex u, all its outgoing edges are relaxed (immediate successors are checked). Relaxation of an edge (u, v) means that if u is an accepting vertex then d(v) is set to d(u) + 1 and p(v) is set to u. The status of u is changed to scanned, while the status of v is changed to labelled. If all vertexes are either scanned or unreached, then d gives the maximal number of accepting predecessors. Moreover, the parent graph Gp is the graph of these "maximal" paths. More precisely, the parent graph is the subgraph Gp of G induced by the edges (p(v), v) for all v such that p(v) ≠ nil. Different strategies for selecting the labelled vertex to be scanned lead to different algorithms. When using a FIFO strategy to select vertexes, the algorithm runs in O(m · n) time in the worst case. For graphs with reachable accepting cycles there is no "maximal" path to the vertexes on an accepting cycle, and the scanning method must be modified to recognise such cycles. The algorithm employs the walk-to-root strategy, which traverses the parent graph. The walk-to-root strategy is based on the fact (see e.g. [26]) that any cycle in the parent graph Gp corresponds to an accepting cycle in the automaton graph. The walk-to-root method tests whether Gp is acyclic. Suppose the parent graph Gp is acyclic and an edge (u, v) is relaxed, i.e. d(v) is updated. This operation creates a cycle in Gp if and only if v is an ancestor of u in the current Gp. Before applying the operation, we follow the parent pointers from u until we reach either v or s. If we stop at v, a cycle is detected; otherwise, the relaxation does not create a cycle (a sketch of this test is given below). However, since the path to the initial vertex can be long, the cost of edge relaxation becomes O(n) instead of O(1). In order to optimise the overall computational complexity, amortisation is used to pay the cost of checking Gp for cycles. More precisely, the parent graph Gp is tested only after the underlying algorithm performs Ω(n) relaxations. The running time is thus increased only by a constant factor. The worst case time complexity of the algorithm is thus O(n · m).
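The walk-to-root test itself is easy to sketch (an illustration only; the amortisation that makes it efficient is omitted). Here parent is a dict encoding the current parent graph Gp and root is the initial vertex; the test is called before a relaxation of the edge (u, v) is applied.

    def creates_cycle(parent, root, u, v):
        """Would setting parent[v] = u close a cycle in the parent graph?"""
        w = u
        while w is not None and w != root:
            if w == v:          # v is an ancestor of u in Gp
                return True
            w = parent.get(w)   # follow parent pointers towards the root
        return False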
Algorithm 4. Back-Level Edges [2]
The algorithm builds on breadth-first search (BFS) exploration of the graph. BFS is typically used in graph algorithms that work with distances, and distances can also be used to characterise cycles in a graph. The distance of a vertex u ∈ V, d(u), is the length of a shortest path from the initial vertex to u. The set of vertexes with the same distance is called a level. An edge (u, v) ∈ E is called a back-level edge if d(u) ≥ d(v). The key observation connecting the cycle detection problem with the back-level edge concept is that every cycle contains at least one back-level edge. Back-level edges are therefore used as triggers which start a cycle detection. However, it is too expensive to test every back-level edge for being part of a cycle, so the algorithm integrates several optimisations and heuristics to decrease the number of tested edges and to speed up the cycle test. The BFS procedure which detects back-level edges runs in time O(m + n); a sketch is given below. Each back-level edge has to be checked for lying on a cycle, which requires linear time O(m + n) as well. In the worst case there can be O(m) back-level edges, hence the overall time complexity of the algorithm is O(m · (m + n)). The algorithm performs well on graphs with a small number of back-level edges. In such cases the performance of the algorithm approaches the performance of reachability analysis, even though the algorithm performs full LTL model checking. On the other hand, a drawback shows up when a graph contains many back-level edges. In such a case, frequent revisiting of vertexes in the second phase of the algorithm makes the computation slow. The level-synchronised BFS approach also makes it possible to involve a BFS-based partial order reduction (POR) technique in the computation. The POR technique prevents some vertexes of the graph from being generated while preserving the result of the verification, and therefore allows the analysis of even larger systems. The standard DFS-based POR technique strongly relies on the DFS stack and as such is inapplicable in a cluster-based environment.
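The first phase, i.e. the level-synchronised BFS that detects back-level edges, can be sketched as follows (illustration only; the expensive second phase, which tests each detected edge for lying on a cycle, is omitted). It again assumes the implicit graph interface init/succs.

    from collections import deque

    def back_level_edges(init, succs):
        d = {init: 0}
        queue = deque([init])
        found = []
        while queue:
            u = queue.popleft()
            for v in succs(u):
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
                if d[v] <= d[u]:      # back-level edge: a cycle-detection trigger
                    found.append((u, v))
        return found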
Algorithm 5. Dependency Graph [3], [5]
Local cycles in a distributed graph can be detected using standard sequential techniques; the real problem in cluster-based detection of accepting cycles is therefore the detection of cycles that are split among workstations. The idea of the last algorithm is to construct a smaller graph by omitting those parts of the original graph that are irrelevant for the detection of split cycles. By a split cycle we mean a cycle that contains at least one cross-edge; an edge (u, v) is a cross-edge if the vertexes u and v are owned by two different workstations. A vertex v is called a transfer vertex if there is a cross-edge (u, v). Let G = (V, E, s, A) be a graph. We call a graph G_dep = (V_dep, E_dep) a dependency graph if V_dep contains the initial vertex, all accepting vertexes, and all transfer vertexes of the product automaton graph, and the reachability relation induced by the reflexive and transitive closure of E_dep is a subset of the reachability relation induced by the reflexive and transitive closure of E. Directly from the definition we have that there is an accepting cycle in G_dep if and only if there is a split accepting cycle in G. The cluster-based algorithm stores the dependency graph explicitly in a distributed manner. In particular, the vertexes of the dependency graph are distributed among the workstations by the same partition function as is used for the original graph. To maintain consistency of the dependency graph in a distributed environment, the graph is implemented using a particular data structure called the dependency structure. The algorithm employing the dependency structure performs its task in two global steps. In the first step it explores the given graph in order to construct the dependency structure and to detect local accepting cycles. If no local accepting cycle is detected, the algorithm continues with the second step: vertexes that have no successors in the dependency structure are recursively removed from it, as they cannot lie on a split cycle. If all vertexes are removed from the structure, there is no split cycle in the original graph; otherwise, the presence of a split cycle is detected. The algorithm was historically the first cluster-based algorithm for the detection of accepting cycles, and hence for full LTL model checking. The original idea of the algorithm builds on backward elimination of vertexes with no successors from the dependency structure; however, any cluster-based algorithm presented in this survey can be combined with the dependency structure in order to detect split accepting cycles.

2.2 A Tool for Cluster-Based Verification
A few sequential tools have been developed to support engineers in their verification needs. However, when verification engineers find themselves needing resources beyond the capabilities of a single computer, the situation is rather poor. Most of the parallel model-checking algorithms have been implemented as research prototypes which are often not publicly available, usually undocumented, without a user interface, unstable (in the sense of "prone to change"), and not optimised. These tools are mainly research vehicles and, as such, not ready for widespread use by third parties. Additionally, the deployment of tools running on parallel computers is more demanding than for sequential tools: reasons include high entrance costs for hardware acquisition, complex software installation procedures, and also consequential maintenance costs. As a consequence, hardly any benchmark results of parallel and/or distributed model-checking algorithms can be compared fairly, since the hardware employed for benchmarks varies from a few workstations also being used for regular tasks to medium-sized dedicated clusters.
DiVinE (Distributed Verification Environment) is a framework for enumerative model checking of LTL properties on a cluster of workstations. It aims to create a distributed state space exploration and analysis tool directed at a significant part of the user base of verification tools, as well as to provide hardware to run on. DiVinE consists of a library of common functions (DiVinE Library), on top of which various distributed verification algorithms can be implemented; a collection of state-of-the-art distributed verification algorithms incorporated into a single software product (DiVinE Tool), which is as easy to install as most sequential tools; and a ready-to-use cluster for users of sequential tools in case they need to run experiments using DiVinE Tool without having access to their own cluster.
DiVinE Tool is thus a parallel, distributed-memory enumerative model-checking tool for the verification of concurrent systems. The tool employs the aggregate power of network-interconnected workstations to verify systems whose verification is beyond the capabilities of sequential tools. The DiVinE modelling language is rich enough to describe systems made of synchronous and asynchronous processes communicating via shared memory and buffered or unbuffered channels. System properties can be specified either directly in Linear Temporal Logic (LTL) or, alternatively, as processes describing the undesired behaviour of the systems under consideration (negative claim automata). Thanks to the DivSPIN project [4], DiVinE Tool is also capable of verifying models written in ProMeLa. From the algorithmic point of view, the tool is quite unique in that it incorporates several LTL model-checking algorithms; in fact, all of the algorithms described above are available. Besides these, DiVinE Tool also includes an algorithm for distributed state space generation and an algorithm that performs sequential Nested DFS in a distributed-memory setting. DiVinE Tool can be deployed either as a complete software package to be installed on a separate Linux cluster or as a small Java application to access
pre-installed clusters. In the first case, basic Linux administrator skills are required to install the tool, but the user is in full control of the environment settings under which the distributed algorithms are executed and can control the tool from a command line. In the second case, the tool can be used on DiVinE pre-installed clusters and accessed remotely via a graphical user interface. The graphical user interface (GUI) requires a properly installed Java Runtime Environment. An important part of the DiVinE project is the maintenance of a public server together with a limited number of DiVinE-dedicated clusters. For security reasons, only registered users are allowed to connect to the DiVinE public server. New users can be registered by following the instructions given on the DiVinE project web pages.
3 Infinite-State Systems
Current software systems often exhibit an evolving structure and/or operate on unbounded data types. Hence, the automatic verification of such systems usually requires modelling them as infinite-state ones. Various modelling formalisms suited to different kinds of applications have been developed, with their respective advantages and limitations; Petri nets, pushdown automata, and process algebras like BPA, BPP, or PA all serve to exemplify this. Here we employ the classes of infinite-state systems defined by term rewrite systems and called Process Rewrite Systems (PRS, [55]). PRS subsume a variety of the formalisms studied in the context of formal verification (e.g. all the models mentioned above). A PRS is a finite set of rules t −a→ t′, where a is an action under which the subterm t can be reduced to the subterm t′. Terms are built up from an empty process ε and a set of process constants using the (associative) sequential operator "." and the (associative and commutative) parallel operator "∥". The semantics of PRS can be defined by labelled transition systems (LTS): labelled directed graphs whose nodes (states of the system) correspond to terms modulo the properties of "." and "∥", and whose edges correspond to individual actions (computational steps) which can be performed in a given state. Mayr [55] has also shown that the reachability problem (i.e. given terms t, t′: is t reducible to t′?) for PRS is decidable. The relevance of various subclasses of PRS for modelling and analysing programs is shown e.g. in [32]; for automatic verification see e.g. the surveys [22], [63].
3.1 PRS and Its Extensions
Most research (with some recent exceptions, e.g. [15], [32], [14]) has been devoted to the PRS classes from the lower part of the PRS hierarchy, especially to pushdown automata (PDA), Petri nets (PN) and their respective subclasses. We mention the successes of PDA in modelling recursive programs (without process creation) and of PN in modelling dynamic creation and synchronisation of concurrent processes (without recursive calls). These two formalisms subsume the notion of a finite-state unit (FSU) keeping some kind of global information which is accessible to the redices (the components ready to be reduced) of a PRS term; hence an FSU can regulate rewriting. On the other hand, using an FSU to
extend the PRS rewriting mechanism is very powerful, since the state-extended version of PA processes (sePA) has full Turing power [11]: the decidability of reachability is lost for sePA, including all its superclasses (see Figure 1). Here we present a unified view of PRS classes and their respective extensions of three types: fcPRS classes ([65], inspired by concurrent constraint programming [62]), wPRS classes ([48], PRS systems equipped with a weak FSU inspired by weak automata [57]), and state-extended PRS classes [46].
Let Const = {X, . . .} be a set of process constants. The set of process terms (ranged over by t, . . .) is defined by the abstract syntax

    t ::= ε | X | t.t | t∥t,

where ε is the empty term, X ∈ Const is a process constant, and '.' and '∥' denote sequential and parallel composition, respectively. We always work with equivalence classes of terms modulo commutativity and associativity of '∥', associativity of '.', and neutrality of ε, i.e. ε.t = t.ε = t∥ε = t. We distinguish four classes of process terms:
1 – terms consisting of a single process constant only, in particular ε ∈ 1,
S – sequential terms, i.e. terms without parallel composition, e.g. X.Y.Z,
P – parallel terms, i.e. terms without sequential composition, e.g. X∥Y∥Z,
G – general terms, without any restrictions, e.g. (X.(Y∥Z))∥W.
Let M be a set of control states and Act a set of actions. Let α, β ∈ {1, S, P, G}, α ⊆ β, be classes of process terms. An (α, β)-sePRS (state-extended process rewrite system) Δ is a tuple (R, p0, t0), where
– R is a finite set of rewrite rules of the form (p, t1) ↪a (q, t2), where t1 ∈ α, t1 ≠ ε, t2 ∈ β, p, q ∈ M, and a ∈ Act,
– a pair (p0, t0) ∈ M × β forms the distinguished initial state of the system.
The sets of control states and process constants occurring in the rewrite rules or in the initial state of Δ are denoted by M(Δ) and Const(Δ), respectively. An (α, β)-sePRS Δ = (R, p0, t0) represents a labelled transition system whose states are pairs (p, t) such that p ∈ M(Δ) is a control state and t ∈ β is a process term over Const(Δ). The transition relation −→ is the least relation satisfying the following inference rules:
    ((p, t1) ↪a (q, t2)) ∈ Δ
    ─────────────────────────
      (p, t1) −a→ (q, t2)

       (p, t1) −a→ (q, t2)
    ─────────────────────────────
    (p, t1.t1′) −a→ (q, t2.t1′)

       (p, t1) −a→ (q, t2)
    ─────────────────────────────
    (p, t1∥t1′) −a→ (q, t2∥t1′)
To shorten our notation we write pt in lieu of (p, t). The transition relation can be extended to finite words over Act in the standard way. A state qt2 is reachable from a state pt1, written pt1 −→∗ qt2, if there is σ ∈ Act∗ such that pt1 −σ→ qt2. We say that a state is reachable if it is reachable from the initial state. An (α, β)-sePRS where M(Δ) is a singleton is called an (α, β)-PRS (process rewrite system); in such systems we omit the single control state from rules and states. An (α, β)-sePRS Δ is called a process rewrite system with a weak finite-state control unit, or a weakly extended process rewrite system, written (α, β)-wPRS, if there exists a partial order ≤ on M(Δ) such that each rule pt1 ↪a qt2 of Δ satisfies p ≤ q.
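To make the rewriting mechanism concrete, the following sketch (an illustration, not taken from the cited papers) enumerates the one-step successors of a state according to the three inference rules above. Terms are nested Python tuples; the structural equivalences are approximated by a naive normalisation, and rules are matched only against whole subterms (prefix matching of longer sequential left-hand sides is omitted).

    # Terms: "eps", a constant string like "X", ("seq", [t1, t2, ...]) or
    # ("par", [t1, t2, ...]).  A rule is ((p, t1), a, (q, t2)).

    def successors(state, rules):
        p, t = state
        out = []
        for (rp, rt), a, (rq, rt2) in rules:
            if rp == p and t == rt:               # rule applied at the top
                out.append((a, (rq, rt2))
        # Sequential composition: only the head may be rewritten.
        if isinstance(t, tuple) and t[0] == "seq":
            head, rest = t[1][0], t[1][1:]
            for a, (q, h2) in successors((p, head), rules):
                out.append((a, (q, norm(("seq", [h2] + list(rest))))))
        # Parallel composition: any component may be rewritten.
        if isinstance(t, tuple) and t[0] == "par":
            for i, comp in enumerate(t[1]):
                others = t[1][:i] + t[1][i + 1:]
                for a, (q, c2) in successors((p, comp), rules):
                    out.append((a, (q, norm(("par", [c2] + list(others))))))
        return out

    def norm(t):
        """Drop eps components and unwrap singleton compositions: a crude
        stand-in for working modulo the structural equivalences."""
        kind, comps = t
        comps = [c for c in comps if c != "eps"]
        if not comps:
            return "eps"
        if len(comps) == 1:
            return comps[0]
        return (kind, comps)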
[Figure 1 (diagram omitted): the hierarchy of classes defined by (extended) rewrite formalisms, ordered by expressive power. It ranges from {se,w,fc}FS = FS = (1,1)-PRS at the bottom, through BPA, BPP, PA, PDA = {se,w,fc}PDA = seBPA, PN = {se,w,fc}PN, seBPP = MSA, PAD, PAN and PRS = (G,G)-PRS, together with their fc- and w-extensions, up to sePAD, sePAN and sePRS at the top.]
Fig. 1. The hierarchy of classes defined by (extended) rewrite formalisms
Some classes of (α, β)-PRS correspond to widely known models: finite-state systems (FS, (1, 1)-PRS), basic process algebras (BPA, (1, S)-PRS), basic parallel processes (BPP, (1, P)-PRS), process algebras (PA, (1, G)-PRS), pushdown processes (PDA, (S, S)-PRS, see [23] for a justification), and Petri nets (PN, (P, P)-PRS). The classes (S, G)-PRS, (P, G)-PRS and (G, G)-PRS were introduced and named PAD, PAN, and PRS by Mayr [55]. Instead of (α, β)-sePRS or (α, β)-wPRS we juxtapose the prefixes 'se-' or 'w-', respectively, with the acronym corresponding to the (α, β)-PRS class; for example, we use wBPP rather than (1, P)-wPRS.
3.2 Expressiveness and Reachability
Figure 1 describes the hierarchy of PRS classes and their extended counterparts with respect to bisimulation equivalence. If any process in class X can also be
defined (up to bisimilarity) in class Y, we write X ⊆ Y. If additionally Y ⊆ X does not hold, we write X ⊊ Y and say that X is less expressive than Y. This is depicted by the line(s) connecting X and Y, with Y placed higher than X in Figure 1. The dotted lines represent the facts X ⊆ Y where we conjecture that X ⊊ Y holds. The strictness ('⊊') of the PRS hierarchy has been proved by Mayr [55]; that of the corresponding classes of PRS and fcPRS has been proved in [65], and the relations among MSA and the classes of fcPRS and wPRS have been studied in [48]. Note that the strict relations wX ⊊ seX hold for all X ∈ {PA, PAD, PAN, PRS} due to our reachability result for wPRS and due to the full Turing power of sePA [11]. These proofs, together with Moller's result establishing MSA ⊊ PN [56], complete the justification of Figure 1, with one exception, namely the relation between the PN and sePA classes. Looking at the two lines leaving sePA down to the left and down to the right, we note the "left-part collapse" of (S, S)-PRS and PDA proved by Caucal [23] (up to isomorphism). The right-part counterpart is slightly different, due to the just mentioned result MSA ⊊ PN and our result that PN ⊊ sePA ([47]).
Let us recall that the reachability problem for PRS is decidable [55]. We note that this problem remains decidable for weakly extended PRS as well:

Theorem 1 ([47]). The reachability problem for wPRS is decidable.

This result deserves some additional remarks. First, it determines the decidability borderline of the reachability problem in the mentioned hierarchy: the problem is decidable for all classes except those with full Turing power. In other words, it can be seen as a contribution to the study of the algorithmic boundaries of reachability for infinite-state systems. Second, in the context of verification one often formulates a property expressing that nothing bad occurs; such properties are called safety properties. The collection of the most often verified properties [29] contains 41% of such properties. Model checking of safety properties can be reduced to the reachability problem. Moreover, many successful verification tools concentrate on reachability only. Therefore, our decidability result can also be seen as a contribution to the automatic verification of infinite-state systems.
Further, given a labelled transition system (S, Act, −→, α0) with a distinguished action τ ∈ Act, we define the weak trace set of a state s ∈ S as

    wtr(s) = {w ∈ (Act \ {τ})∗ | s =w⇒ t for some t ∈ S},

where s =w⇒ t means that there is some w′ ∈ Act∗ such that s −w′→ t and w is equal to w′ with all τ actions removed. Two systems are weak trace equivalent if the weak trace sets of their initial states are the same. So far it has been known that weak trace non-equivalence is semi-decidable for Petri nets (see e.g. [44]), pushdown processes (due to [21]), and PA processes (due to [52]). Using the decidability result, it is easy to show that the weak trace set is recursive for every state of any wPRS. Hence, weak trace non-equivalence is semi-decidable for (all subclasses of) wPRS.
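For a finite explicit LTS the definition can be turned directly into a bounded enumeration (an illustration only; for wPRS the result above is needed, since the state space is infinite). Here trans maps a state to a list of (action, successor) pairs and TAU denotes the silent action.

    TAU = "tau"

    def weak_traces(trans, s, k):
        """All words in wtr(s) of length at most k."""
        traces = set()
        seen = {(s, ())}
        stack = [(s, ())]
        while stack:
            state, word = stack.pop()
            traces.add(word)
            for a, t in trans.get(state, []):
                ext = word if a == TAU else word + (a,)   # erase tau actions
                if len(ext) <= k and (t, ext) not in seen:
                    seen.add((t, ext))
                    stack.append((t, ext))
        return traces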
Finally, our decidability result has recently been applied in the area of cryptographic protocols. Hüttel and Srba [42] define a replicative variant of a calculus for Dolev and Yao's ping-pong protocols [28]. They show that the reachability problem for these protocols is decidable, as it can be reduced to the reachability problem for wPRS.
3.3 Branching-Time Logics and Studied Problems
The reachability property problem, for a given system Δ and a given formula ϕ, is to decide whether EFϕ holds in the initial state of Δ. Hence, these problems are parametrised by the class to which the system Δ belongs and by the type of the formula ϕ. In most practical situations, ϕ specifies error states, and the reachability property problem is a formalisation of the natural verification question whether some error state is reachable in a given system. In this section we work with fragments of the unified system of branching-time logic (UB) [8]. Formulae of UB have the following syntax:

    ϕ ::= tt | ¬ϕ | ϕ1 ∧ ϕ2 | ⟨a⟩ϕ | EFϕ | EGϕ,
where a ∈ Act is an action. Formulae are interpreted over states of sePRS systems. Validity of a formula ϕ in a state pt of a given sePRS system Δ, written (Δ, pt) |= ϕ, is defined by induction on the structure of ϕ: tt is valid for all states; boolean operators have the standard meaning; (Δ, pt) |= ⟨a⟩ϕ iff there is a state qt′ such that pt −a→ qt′ and (Δ, qt′) |= ϕ; (Δ, pt) |= EFϕ iff there is a state qt′ reachable from pt such that (Δ, qt′) |= ϕ; (Δ, pt) |= EGϕ iff there is a maximal (finite or infinite) transition sequence pt = p1t1 −a1→ p2t2 −a2→ p3t3 −a3→ · · · such that all states in the sequence satisfy piti |= ϕ. We write Δ |= ϕ if ϕ is valid in the initial state p0t0 of Δ.
In the following we deal with six problems parametrised by a subclass of sePRS systems. Let Δ be a given system of the subclass considered. The problem to decide whether
– Δ |= ϕ, where ϕ is a given EF formula, is called decidability of EF logic;
– Δ |= EFϕ, where ϕ is a given HM formula, is called the reachability HM property;
– Δ |= EFϕ, where ϕ is a given simple formula, is called the reachability simple property;
– Δ |= ϕ, where ϕ is a given EG formula, is called decidability of EG logic;
– Δ |= EGϕ, where ϕ is a given HM formula, is called the evitability HM property;
– Δ |= EGϕ, where ϕ is a given simple formula, is called the evitability simple property.
We recall that the (full) EF logic is decidable for PAD [54] and undecidable for PN [31]. The reachability HM property problem has been shown to be decidable for the classes PN [45] and PAD [46]. In [49] we have lifted the decidability border for this problem to the wPRS class:

Theorem 2 ([49]). The reachability HM property problem is decidable for wPRS.
A combination of Theorem 2 and Theorem 22 of [46] yields the following corollary.

Theorem 3 ([49]). Strong bisimilarity is decidable between wPRS systems and finite-state ones.

As PRS and its subclasses are proper subclasses of wPRS, we thereby answer positively the question of the reachability HM property problem for the PRS class, and hence the questions of bisimilarity checking of PAN and PRS processes against finite-state ones, which have been open problems; see, for example, [63]. Their relevance to program specification and verification is advocated, for example, in [46], [50]. Further, we mention two extensions of known undecidability results. First, we recall that (full) EF logic is undecidable for PN. An inspection of the proof given in [31] shows that this undecidability result is valid even for the seBPP class (also known as multiset automata, MSA). Second, Esparza and Kiehn have proved that EG logic is undecidable for (deterministic) BPP [33]. In [49] we have described a modification of their proof showing that for (deterministic) BPP even the evitability simple property problem is undecidable.
The following table summarises the current state of (un)decidability results regarding the six problems defined at the beginning of this section for the classes of the PRS hierarchy and their extended counterparts. The results established in this section are marked with an asterisk.

    problem                        | decidable for    | undecidable for
    decidability of EF logic       | PAD [54]         | seBPP*
    reachability HM property       | wPRS*            | sePA
    reachability simple property   | wPRS*            | sePA
    decidability of EG logic       | PDA [58], [23]   | BPP [33]
    evitability HM property        | PDA [58], [23]   | BPP [33]
    evitability simple property    | PDA [58], [23]   | BPP*
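For finite-state systems all six problems are, of course, decidable by exhaustive search. The following naive evaluator (an illustration, not an algorithm from the cited papers) implements the semantics of ⟨a⟩, EF and EG given above on an explicit finite LTS; formulae are encoded as nested tuples such as ("EF", ("diam", "a", ("tt",))).

    def holds(trans, state, phi):
        op = phi[0]
        if op == "tt":
            return True
        if op == "not":
            return not holds(trans, state, phi[1])
        if op == "and":
            return holds(trans, state, phi[1]) and holds(trans, state, phi[2])
        if op == "diam":                      # <a> phi
            _, act, sub = phi
            return any(a == act and holds(trans, t, sub)
                       for a, t in trans.get(state, []))
        if op == "EF":                        # some reachable state satisfies phi
            seen, stack = {state}, [state]
            while stack:
                s = stack.pop()
                if holds(trans, s, phi[1]):
                    return True
                for _, t in trans.get(s, []):
                    if t not in seen:
                        seen.add(t)
                        stack.append(t)
            return False
        if op == "EG":                        # a maximal phi-path exists
            good = {s for s in states_of(trans) if holds(trans, s, phi[1])}
            # Greatest fixpoint: keep phi-states that are deadlocked or have
            # a successor still in the set (maximal finite or infinite runs).
            while True:
                keep = {s for s in good
                        if not trans.get(s, [])
                        or any(t in good for _, t in trans.get(s, []))}
                if keep == good:
                    return state in good
                good = keep

    def states_of(trans):
        return set(trans) | {t for succ in trans.values() for _, t in succ}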
3.4 Linear-Time Logics and Studied Problems
Here we focus exclusively on (future) Linear Temporal Logic (LTL). The syntax of LTL [59] is defined as follows:

    ϕ ::= tt | a | ¬ϕ | ϕ ∧ ϕ | Xϕ | ϕ U ϕ,

where a ranges over Act, X is called next, and U is called until. The logic is interpreted over infinite as well as nonempty finite words of actions. Given a word u = u(0)u(1)u(2) . . . ∈ Act∗ ∪ Actω, |u| denotes the length of the word (we set |u| = ∞ if u is infinite). For all 0 ≤ i < |u|, by ui we denote the ith suffix of u, i.e. ui = u(i)u(i + 1) . . .. The semantics of LTL formulae is defined inductively as follows:

    u |= tt
    u |= a          iff  u(0) = a
    u |= ¬ϕ         iff  u ̸|= ϕ
    u |= ϕ1 ∧ ϕ2    iff  u |= ϕ1 and u |= ϕ2
    u |= Xϕ         iff  |u| > 1 and u1 |= ϕ
    u |= ϕ1 U ϕ2    iff  ∃ 0 ≤ i < |u| . (ui |= ϕ2 and ∀ 0 ≤ j < i . uj |= ϕ1)

[Figure 2 (diagram omitted): the expressiveness hierarchy of the basic LTL fragments, from LTL() at the bottom through LTL(X), LTL(F∞), LTL(F) ≡ LTL(F, G), LTL(F∞, X), LTL(Fs) ≡ LTL(Fs, Gs), LTL(U), LTL(F, X) and LTL(U, Fs) up to LTL(U, X) at the top; the legend distinguishes fragments that are undecidable for PA, decidable for PDA and PN, and decidable for wPRS, thereby drawing the model-checking decidability boundary.]
Fig. 2. The hierarchy of basic fragments with the model checking decidability boundary
Moreover, we define the following modalities: Fϕ (eventually) standing for tt U ϕ, Gϕ (always) standing for ¬F¬ϕ, Fsϕ (strict eventually) standing for XFϕ, Gsϕ (strict always) standing for ¬Fs¬ϕ, F∞ϕ (infinitely often) standing for GFϕ, and G∞ϕ (almost always) standing for ¬F∞¬ϕ. Note that Fϕ is equivalent to ϕ ∨ Fsϕ, but Fsϕ cannot be expressed with F as the only modality; thus Fs is "stronger" than F. The relation between Gs and G is similar. For a set {O1, . . . , On} of modalities, LTL(O1, . . . , On) denotes the LTL fragment containing all formulae with modalities O1, . . . , On only. Such a fragment is called basic. Figure 2 shows an expressiveness hierarchy of all studied basic LTL fragments. Indeed, every basic LTL fragment using standard¹ future modalities is equivalent to one of the fragments in the hierarchy, where equivalence between fragments means that every formula of one fragment can be effectively translated into a semantically equivalent formula of the other fragment and vice versa. For example, LTL(Fs, Gs) ≡ LTL(Fs). Further, the hierarchy is strict. For detailed information about the expressiveness of future LTL modalities and LTL fragments we refer to [66].

¹ By standard modalities we mean the ones defined in this paper and also other commonly used modalities like strict until, release, weak until, etc. However, it is well possible that one can define a new modality such that there is a basic fragment not equivalent to any of the fragments in the hierarchy.
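The semantics and the derived modalities translate directly into a naive evaluator over finite words (an illustration only; formulae are nested tuples, with ("ap", a) for an action a):

    def sat(u, i, phi):
        """Does the suffix u(i)u(i+1)... of the finite word u satisfy phi?"""
        op = phi[0]
        if op == "tt":
            return True
        if op == "ap":                        # an action a
            return u[i] == phi[1]
        if op == "not":
            return not sat(u, i, phi[1])
        if op == "and":
            return sat(u, i, phi[1]) and sat(u, i, phi[2])
        if op == "X":                         # next: requires a strict suffix
            return i + 1 < len(u) and sat(u, i + 1, phi[1])
        if op == "U":                         # until
            return any(sat(u, k, phi[2])
                       and all(sat(u, j, phi[1]) for j in range(i, k))
                       for k in range(i, len(u)))

    # Derived modalities, following the definitions above:
    def F(phi):  return ("U", ("tt",), phi)          # eventually
    def G(phi):  return ("not", F(("not", phi)))     # always
    def Fs(phi): return ("X", F(phi))                # strict eventually
    def Gs(phi): return ("not", Fs(("not", phi)))    # strict always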
It is known that LTL model checking of PDA is EXPTIME-complete [12]. LTL model checking of PN is also decidable, but at least as hard as the reachability problem for PN [31] (the reachability problem is EXPSPACE-hard [53], [51], and no primitive recursive upper bound is known). If we consider only infinite runs, then the problem for PN is EXPSPACE-complete [38], [54]. Conversely, LTL model checking is undecidable for all classes subsuming PA [13], [54]. So far, there are only two positive results for these classes. Bouajjani and Habermehl [13] have identified a fragment called simple PLTL2 for which model checking of infinite runs is decidable for PA (strictly speaking, simple PLTL2 is not a fragment of LTL, as it can also express some non-regular properties, while LTL cannot). Recently, the model checking problem (of infinite runs) has been shown decidable for PRS and the fragment of LTL capturing exactly fairness properties [16]. Note that this fragment and the simple PLTL2 fragment are incomparable, and both are strictly less expressive than LTL(F, G) (also known as Lamport logic), which is again strictly less expressive than LTL(Fs, Gs).

Theorem 4 ([17]). The model checking problem for wPRS and LTL(Fs, Gs) is decidable.

This problem is EXPSPACE-hard due to the EXPSPACE-hardness of the model checking problem for LTL(F, G) over PN [38]. Our decidability proof does not provide any primitive recursive upper bound, as it employs LTL model checking for PN, for which no primitive recursive upper bound is known. We also emphasise that this positive result for LTL(Fs, Gs) covers both finite and infinite runs, and applies to wPRS rather than to PRS or PA only. In [17] we have completely located the decidability boundary of the model checking problem for all subclasses of PRS (and wPRS) and all basic LTL fragments. The boundary is depicted in Figure 2. Obviously, model checking for wPRS and LTL(X) is decidable. Hence, to prove that the decidability boundary of Figure 2 is drawn correctly, it remains to show the following.

Theorem 5 ([17]). Model checking of PA against LTL(U) is undecidable. Model checking of PA against LTL(F∞, X) is undecidable as well.

The PA systems constructed in the proof of this theorem have only infinite runs. This means that model checking of infinite runs remains undecidable for PA and both LTL(F∞, X) and LTL(U).
4 Conclusions
Early detection of programming errors requires the application of advanced program analysis and verification techniques. These techniques range from light-weight simulations through medium-weight static analysis and model checking to heavy-weight theorem proving and axiomatic semantics. In this paper we examined some techniques for handling extremely large finite-state systems as well as infinite-state ones. For huge finite-state systems, cluster-based parallel verification is a natural option. Parallel verification is not by itself the ultimate solution to
the state explosion problem. However, in combination with other techniques, it lets us verify models that are orders of magnitude larger than those we would be able to handle with purely sequential techniques. Many systems, however, have unbounded (i.e. potentially infinite) state spaces. Examples are systems with unbounded data types (e.g. queues, channels, or stacks of activation records), parametric systems (i.e. n concurrently running copies of a process), and systems with a dynamically evolving structure (e.g. dynamic creation of processes). Hence, when modelling some nontrivial reality, we often cannot avoid (at least potentially) infinite-state systems. Here we have employed Process Rewrite Systems. Although it is clear that only a small class of real problems can have automated verification procedures, the algorithmic boundaries of this class have been studied intensively. We have presented some of the recent (un)decidability results on model checking of infinite-state systems specified by the Process Rewrite System mechanism (possibly extended with a weak finite-state control unit).
References
1. Bao, T. and Jones, M.: Time-Efficient Model Checking with Magnetic Disks. In Proc. Tools and Algorithms for the Construction and Analysis of Systems, Springer-Verlag, LNCS 3440 (2005) 526–540
2. Barnat, J., Brim, L., and Chaloupka, J.: Parallel Breadth-First Search LTL Model-Checking. In Proc. 18th IEEE International Conference on Automated Software Engineering, IEEE Computer Society (2003) 106–115
3. Barnat, J., Brim, L., and Stříbrná, J.: Distributed LTL Model-Checking in SPIN. In Proc. SPIN Workshop on Model Checking of Software, Springer-Verlag, LNCS 2057 (2001) 200–216
4. Barnat, J., Forejt, V., Leucker, M., and Weber, M.: DivSPIN – A SPIN Compatible Distributed Model Checker. In Proc. 4th International Workshop on Parallel and Distributed Methods in Verification (2005) 95–100
5. Barnat, J.: Distributed Memory LTL Model Checking. PhD Thesis, Faculty of Informatics, Masaryk University Brno (2004)
6. Behrmann, G., Hune, T.S., and Vaandrager, F.W.: Distributed Timed Model Checking – How the Search Order Matters. In Proc. Computer Aided Verification, Springer, LNCS 1855 (2000) 216–231
7. Bell, A. and Haverkort, B.R.: Sequential and Distributed Model Checking of Petri Net Specifications. Int. J. Softw. Tools Technol. Transfer 7 1 (2005) 43–60
8. Ben-Ari, M., Pnueli, A., and Manna, Z.: The Temporal Logic of Branching Time. Acta Informatica 20 3 (1983) 207–226
9. Blom, S. and Orzan, S.: A Distributed Algorithm for Strong Bisimulation Reduction of State Spaces. Int. J. Softw. Tools Technol. Transfer 7 1 (2005) 74–86
10. Bollig, B., Leucker, M., and Weber, M.: Parallel Model Checking for the Alternation Free µ-Calculus. In Proc. of TACAS, Springer, LNCS 2031 (2001) 543–558
11. Bouajjani, A., Echahed, R., and Habermehl, P.: On the Verification Problem of Nonregular Properties for Nonregular Processes. In Proc. of LICS'95, IEEE (1995) 123–133
12. Bouajjani, A., Esparza, J., and Maler, O.: Reachability Analysis of Pushdown Automata: Application to Model-Checking. In Proc. of CONCUR'97, LNCS 1243 (1997) 135–150
13. Bouajjani, A. and Habermehl, P.: Constrained Properties, Semilinear Systems, and Petri Nets. In Proc. of CONCUR'96, Springer-Verlag, LNCS 1119 (1996) 481–497
14. Bouajjani, A., Strejček, J., and Touili, T.: On Symbolic Verification of Weakly Extended PAD. In EXPRESS 2006, Electronic Notes in Theoretical Computer Science, Elsevier Science (2006) to appear
15. Bouajjani, A. and Touili, T.: Reachability Analysis of Process Rewrite Systems. In Proc. of FSTTCS 2003, Springer-Verlag, LNCS 2914 (2003) 74–87
16. Bozzelli, L.: Model Checking for Process Rewrite Systems and a Class of Action-Based Regular Properties. In Proc. of VMCAI'05, Springer, LNCS 3385 (2005) 282–297
17. Bozzelli, L., Křetínský, M., Řehák, V., and Strejček, J.: On Decidability of LTL Model Checking for Process Rewrite Systems. In Proceedings of FSTTCS 2006, Springer, LNCS (2006) to appear, Dec. 2006
18. Brim, L., Černá, I., Krčál, P., and Pelánek, R.: Distributed LTL Model Checking Based on Negative Cycle Detection. In Proc. of FSTTCS 2001, Springer-Verlag, LNCS 2245 (2001) 96–107
19. Brim, L., Černá, I., Moravec, P., and Šimša, J.: Accepting Predecessors are Better than Back Edges in Distributed LTL Model-Checking. In Formal Methods in Computer-Aided Design (FMCAD 2004), Springer-Verlag, LNCS 3312 (2004) 352–366
20. Brim, L., Černá, I., Moravec, P., and Šimša, J.: How to Order Vertices for Distributed LTL Model-Checking Based on Accepting Predecessors. In 4th International Workshop on Parallel and Distributed Methods in verifiCation (PDMC'05), July 2005
21. Büchi, J.R.: Regular Canonical Systems. Arch. Math. Logik u. Grundlagenforschung 6 (1964) 91–111
22. Burkart, O., Caucal, D., Moller, F., and Steffen, B.: Verification on Infinite Structures. In J. Bergstra, A. Ponse, and S. Smolka (eds), Handbook of Process Algebra, Elsevier (2001) 545–623
23. Caucal, D.: On the Regular Structure of Prefix Rewriting. Theor. Comput. Sci. 106 (1992) 61–86
24. Černá, I. and Pelánek, R.: Distributed Explicit Fair Cycle Detection (Set Based Approach). In Model Checking Software. 10th International SPIN Workshop, Springer-Verlag, LNCS 2648 (2003) 49–73
25. Černá, I. and Pelánek, R.: Relating Hierarchy of Temporal Properties to Model Checking. In Proc. Mathematical Foundations of Computer Science, Springer-Verlag, LNCS 2747 (2003) 318–327
26. Cherkassky, B.V. and Goldberg, A.V.: Negative-Cycle Detection Algorithms. Mathematical Programming 85 (1999) 277–311
27. Courcoubetis, C., Vardi, M.Y., Wolper, P., and Yannakakis, M.: Memory-Efficient Algorithms for the Verification of Temporal Properties. Formal Methods in System Design 1 (1992) 275–288
28. Dolev, D. and Yao, A.: On the Security of Public Key Protocols. IEEE Transactions on Information Theory 29 2 (1983) 198–208
29. Dwyer, M.B., Avrunin, G.S., and Corbett, J.C.: Property Specification Patterns for Finite-State Verification. In Proc. Workshop on Formal Methods in Software Practice, ACM Press (1998) 7–15
30. Edelkamp, S. and Jabbar, S.: Large-Scale Directed Model Checking LTL. In Model Checking Software: 13th International SPIN Workshop, Springer-Verlag, LNCS 3925 (2006) 1–18
31. Esparza, J.: Decidability of Model Checking for Infinite-State Concurrent Systems. Acta Informatica 34 2 (1997) 85–107
32. Esparza, J.: Grammars as Processes. In Formal and Natural Computing, Springer, LNCS 2300 (2002)
33. Esparza, J. and Kiehn, A.: On the Model Checking Problem for Branching Time Logics and Basic Parallel Processes. In CAV, Springer, LNCS 939 (1995) 353–366
34. Fisler, K., Fraer, R., Kamhi, G., Vardi, M.Y., and Yang, Z.: Is There a Best Symbolic Cycle-Detection Algorithm? In Proc. Tools and Algorithms for the Construction and Analysis of Systems, Springer-Verlag, LNCS 2031 (2001) 420–434
35. Garavel, H., Mateescu, R., and Smarandache, I.: Parallel State Space Construction for Model-Checking. In Proc. SPIN Workshop on Model Checking of Software, Springer-Verlag, LNCS 2057 (2001) 216–234
36. Grumberg, O., Heyman, T., Ifergan, N., and Schuster, A.: Achieving Speedups in Distributed Symbolic Reachability Analysis through Asynchronous Computation. In CHARME 2005, Springer, Lecture Notes in Computer Science (2005) 129–145
37. Grumberg, O., Heyman, T., and Schuster, A.: Distributed Model Checking for µ-Calculus. In Proc. Computer Aided Verification, Springer-Verlag, LNCS 2102 (2001) 350–362
38. Habermehl, P.: On the Complexity of the Linear-Time µ-Calculus for Petri Nets. In Proceedings of ICATPN'97, Springer-Verlag, LNCS 1248 (1997) 102–116
39. Haverkort, B.R., Bell, A., and Bohnenkamp, H.C.: On the Efficient Sequential and Distributed Generation of Very Large Markov Chains from Stochastic Petri Nets. In Proc. 8th Int. Workshop on Petri Net and Performance Models, IEEE Computer Society Press (1999) 12–21
40. Holzmann, G.J.: The Spin Model Checker: Primer and Reference Manual. Addison-Wesley (2003)
41. Holzmann, G.J., Peled, D., and Yannakakis, M.: On Nested Depth First Search. In Proc. SPIN Workshop on Model Checking of Software, American Mathematical Society (1996) 23–32
42. Hüttel, H. and Srba, J.: Recursion vs. Replication in Simple Cryptographic Protocols. In Proceedings of SOFSEM 2005: Theory and Practice of Computer Science, Springer, LNCS 3381 (2005) 178–187
43. Jabbar, S. and Edelkamp, S.: Parallel External Directed Model Checking with Linear I/O. In Verification, Model Checking, and Abstract Interpretation: 7th International Conference, VMCAI 2006, Springer-Verlag, LNCS 3855 (2006) 237–251
44. Jančar, P.: High Undecidability of Weak Bisimilarity for Petri Nets. In Proc. of TAPSOFT, Springer, LNCS 915 (1995) 349–363
45. Jančar, P. and Moller, F.: Checking Regular Properties of Petri Nets. In CONCUR, Springer, LNCS 962 (1995) 348–362
46. Jančar, P., Kučera, A., and Mayr, R.: Deciding Bisimulation-Like Equivalences with Finite-State Processes. Theor. Comput. Sci. 258 (2001) 409–433
47. Křetínský, M., Řehák, V., and Strejček, J.: Extended Process Rewrite Systems: Expressiveness and Reachability. In Proceedings of CONCUR'04, Springer, LNCS 3170 (2004) 355–370
48. Křetínský, M., Řehák, V., and Strejček, J.: On Extensions of Process Rewrite Systems: Rewrite Systems with Weak Finite-State Unit. In Proceedings of INFINITY'03, Elsevier, ENTCS 98 (2004) 75–88
49. Křetínský, M., Řehák, V., and Strejček, J.: Reachability of Hennessy-Milner Properties for Weakly Extended PRS. In Proceedings of FSTTCS 2005, Springer, LNCS 3821 (2005) 213–224
50. Kučera, A. and Schnoebelen, Ph.: A General Approach to Comparing Infinite-State Systems with Their Finite-State Specifications. In CONCUR, Springer, LNCS 3170 (2004) 371–386
51. Lipton, R.: The Reachability Problem is Exponential-Space Hard. Technical Report 62, Department of Computer Science, Yale University (1976)
52. Lugiez, D. and Schnoebelen, Ph.: The Regular Viewpoint on PA-Processes. In Proc. of CONCUR'98, Springer, LNCS 1466 (1998) 50–66
53. Mayr, E.W.: An Algorithm for the General Petri Net Reachability Problem. SIAM Journal on Computing 13 3 (1984) 441–460
54. Mayr, R.: Decidability and Complexity of Model Checking Problems for Infinite-State Systems. PhD thesis, Technische Universität München (1998)
55. Mayr, R.: Process Rewrite Systems. Information and Computation 156 1 (2000) 264–286
56. Moller, F.: Pushdown Automata, Multiset Automata and Petri Nets. In MFCS Workshop on Concurrency, ENTCS 18 (1998)
57. Muller, D., Saoudi, A., and Schupp, P.: Alternating Automata, the Weak Monadic Theory of Trees and Its Complexity. Theor. Comput. Sci. 97 1–2 (1992) 233–244
58. Muller, D. and Schupp, P.: The Theory of Ends, Pushdown Automata, and Second-Order Logic. Theor. Comput. Sci. 37 (1985) 51–75
59. Pnueli, A.: The Temporal Logic of Programs. In Proc. 18th IEEE Symposium on the Foundations of Computer Science (1977) 46–57
60. Ravi, K., Bloem, R., and Somenzi, F.: A Comparative Study of Symbolic Algorithms for the Computation of Fair Cycles. In Proc. Formal Methods in Computer-Aided Design, Springer-Verlag, LNCS 1954 (2000) 143–160
61. Reif, J.: Depth-First Search is Inherently Sequential. Information Processing Letters 20 5 (1985) 229–234
62. Saraswat, V.A. and Rinard, M.: Concurrent Constraint Programming. In Proc. of 17th POPL, ACM Press (1990) 232–245
63. Srba, J.: Roadmap of Infinite Results. EATCS Bulletin 78 (2002) 163–175, http://www.brics.dk/~srba/roadmap/
64. Stern, U. and Dill, D.L.: Using Magnetic Disk Instead of Main Memory in the Murφ Verifier. In Proc. of Computer Aided Verification, Springer-Verlag, LNCS 1427 (1998) 172–183
65. Strejček, J.: Rewrite Systems with Constraints. In Proc. of EXPRESS'01, ENTCS 52 (2002)
66. Strejček, J.: Linear Temporal Logic: Expressiveness and Model Checking. PhD thesis, Faculty of Informatics, Masaryk University in Brno (2004)
67. Tarjan, R.: Depth First Search and Linear Graph Algorithms. SIAM Journal on Computing 1 2 (1972) 146–160
68. Vardi, M.Y. and Wolper, P.: An Automata-Theoretic Approach to Automatic Program Verification. In Proc. IEEE Symposium on Logic in Computer Science, Computer Society Press (1986) 322–331
Interaction and Realizability Manfred Broy Institut für Informatik, Technische Universität München D-80290 München Germany
[email protected] http://wwwbroy.informatik.tu-muenchen.de
Abstract. We deal with the issue of realizability and computability of interactive interface behaviors as described in [1]. We treat the following aspects of interactive behaviors that are represented by relations between communication streams:
− Causality between input and output streams
− Realizability of single output histories for given input histories
− The role of non-realizable output in specific system contexts and for composition
− Relating non-realizable behaviors to state machines
− The concept of interactive computation and computability
Finally, we relate our results to classical notions of computability. The main goal of this paper is the characterization of a general concept of interactive interface behavior as a basis for the extension and generalization of the notion of computability to interactive behaviors.
1 Motivation

Interactive distributed computations are typical for software systems today. Classical computability theory deals with so-called sequential algorithms and sequential computations. Church's thesis that µ-recursive functions (and equivalently Turing machines) capture the intuitive notion of computability is broadly accepted by now. This acceptance is supported by the fact that µ-recursive computability is equivalent to other notions of computability such as Turing computability or register-machine computability. For interactive, concurrent, distributed, real-time computations the situation is less well understood. Concepts like infinite non-terminating computations and fairness bring in new questions. It has been claimed several times that interactive, concurrent, distributed, real-time systems lead to computations that go beyond Church's thesis (see, for instance, [10]). In computability theory, a key notion is the concept of a computable function. The set of all partial functions is seen as a good representation of the set of all problems; a problem is algorithmically solvable if the corresponding partial function is computable. For interactive computation a more sophisticated concept than partial functions is needed. The FOCUS theory (see [1]) has been created to provide an intuitive abstract functional theory for the modular, component-oriented modeling and specification of
distributed interactive systems. Its key goal is interface abstraction, which introduces a functional view of interactive systems. The theory is based on the notion of causality, a quite intuitive idea capturing the causal relationship between input and output in an interactive computation within a time frame, as well as realizability, at first glance a less intuitive concept. Both causality and realizability characterize the proper flow of messages in a time frame. The concept of a causal, realizable I/O-behavior was initially introduced as a representation of the interface behavior of interactive systems. Notions such as causality and realizability were defined to characterize properties that are useful for modeling and engineering systems. In this paper we apply these notions in a more theoretical setting, targeting the issue of computability in terms of interactive computation. We give a more concise analysis of the idea of realizability and relate this notion to classical concepts of computability. In the following we deal in detail with a theory of realizability and interactive computability. We start by characterizing the idea of interactive computation and treat its fundamental concepts, causality and realizability. We relate the concept to state machines with input and output. Finally, we study the structure of non-realizable system specifications and relate our approach to realizability to classical theories of computability such as Turing computability.
2 Sequential, Interactive, and Real-Time Computations

The essential difference between a non-interactive and an interactive computation lies in the way input is provided to the computing device during the computation and in the way output is provided by the computing device to its environment. In a non-interactive computation, all input is provided before the computation is actually started and, similarly, the output becomes available to the environment only after the computation has finished completely.

2.1 Sequential Atomic Computability and Input/Output Behaviour

A sequential computation (say, of a Turing Machine or a recursive function) starts with a given input, which is a word (a string, i.e. a finite sequence of characters) stored on the tape of the Turing Machine, or a tuple of natural numbers in the case of a recursive function. The difference between these forms of input is not deep, since we can encode strings by tuples of natural numbers and vice versa. The computation is carried out by steps of the Turing Machine or, in the case of recursive functions, by steps of term rewriting. The steps are done in a sequential order (this is why we speak of a sequential computation) until a final state is reached, or the computation may go on forever. Since we do not consider the intermediate steps of the computation, we also speak of an atomic computation. To keep the discussion simple we do not consider "run time" errors such as "exceptional results", which we treat simply as special instances of output, and we distinguish only terminating from nonterminating computations. Let M be the set of input and output values (for simplicity we consider identical sets for the input and the output: in the case of Turing Machines strings, in the case of recursive
functions, tuples of natural numbers). Thus what a Turing Machine or a recursive function computes is a partial function

    f: M → M

By Dom(f) ⊆ M we denote the set of arguments x for which the value of f applied to x is defined. The partial function f is used to represent the behavior of algorithms or computational machines. We say that a Turing Machine computes f if, whenever we start the Turing Machine for an argument x ∈ M, the Turing Machine does not terminate if x ∉ Dom(f), and otherwise it terminates with the result f(x).

Definition. Computable Function
A partial function f: M → M is called computable if there is a Turing Machine (recursive function, register machine) that computes f. ❑

In denotational semantics a special symbol ⊥ is introduced to represent the result of a nonterminating computation. The symbol is added to the set M:

    M⊥ = M ∪ {⊥}

To deal with the composition of computations, the symbol ⊥ is also allowed as an argument. Thus functions of the form

    f: M⊥ → M⊥

are used to model the results of algorithms or Turing machines. Technically, to deal with recursion, monotonic functions are considered. Monotonicity reflects the fact that nontermination is not decidable. As a result, every computable function is monotonic, but not vice versa. Another concept used in denotational semantics is continuity, which basically expresses that the language constructs allow for recursive definitions that can be approximated inductively. Partial functions are a key to the notion of computability, since they are the artifacts that are called computable.

2.2 Interactive Computations

In interactive computations, in contrast to sequential atomic computations, input is provided step after step, and output is likewise produced step after step while the computation is still going on, perhaps forever. More precisely, the computation is carried out by a sequence of pairs of successive complementary steps; it is no longer atomic. In the first step a piece of input is provided and in the second step a piece of output is returned. Since the number of steps is not bounded, in the course of the computation an infinite stream of input may be consumed and an infinite stream of output may be generated. It is a key issue that in each step of the computation the input can be freely chosen by the environment, so the future input is not known to the system in advance. In such a setting infinite interactive computations make perfect sense. This is in contrast to non-interactive computation, for which infinite computations make no sense, since in that case the computation never provides any output.
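The step-wise exchange described here can be pictured with a small sketch (an illustration only, not part of the formal model): a Python generator that consumes one piece of input per step and emits one piece of output per step, in contrast to an atomic function that maps a complete input to a complete output.

    def atomic(xs):
        """Non-interactive: the whole input is available before any result."""
        return [x * x for x in xs]

    def interactive():
        """Interactive: in each step one input is consumed and one output is
        produced; the future input is unknown when an output is emitted."""
        out = None
        while True:
            x = yield out        # step 1: receive the next piece of input
            out = x * x          # step 2: emit the corresponding piece of output

    # The environment drives the computation and may choose inputs freely:
    proc = interactive()
    next(proc)                   # initialise the generator
    for x in [1, 2, 3]:          # potentially an infinite stream
        print(proc.send(x))      # prints 1, 4, 9 step by step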
Interactive computations can be nicely related to two-person games. One player (the "user", or the system's environment) provides the input (representing its "move") to the system, the other player, which provides some output (representing its "counter move"). There are two essential assumptions that characterize interactive computations: neither player knows in advance the sequence of moves the other player will make in the future, and both moves are made simultaneously in each step. In the following we study two models of interactive computations, namely stream processing and state transitions, show how they are related, and discuss their inherent properties in terms of two-person games.
3 Foundations

In this section we introduce a number of foundational notions. In particular, we define the interactive interface behavior as the notion of what is called computable for concurrent interactive computations.

3.1 Types, Streams, Channels and Histories

A type is a name for a set of data elements. Let TYPE be the set of all types. With each type T ∈ TYPE we associate a set of data elements, the carrier set for T. We use the following notation:

M∗ denotes the set of finite sequences over M, including the empty sequence 〈〉,
M∞ denotes the set of infinite sequences over M (represented by the total mappings IN+ → M, where IN+ = IN \ {0}).

By Mω = M∗ ∪ M∞ we denote the set of streams of elements taken from the set M: a stream over M is a finite or infinite sequence of elements of M. By 〈〉 we also denote the empty stream. The set of streams has a rich algebraic and topological structure. We introduce concatenation ˆ as an operator

_ˆ_ : Mω × Mω → Mω

On finite streams, concatenation is defined as usual on finite sequences. For an infinite stream s and arbitrary streams x and r we define

sˆx = s
〈x1 … xn〉ˆ〈s1 s2 … 〉 = 〈x1 … xn s1 s2 … 〉

that is, concatenating anything after an infinite stream leaves it unchanged, while a finite stream is prepended elementwise. We may see finite streams as partial functions IN+ → M and infinite streams as total functions IN+ → M.
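As a small illustration (a sketch under the assumption that finite streams are modeled as tuples and infinite streams as lazy generators; none of this is part of the formal development), concatenation behaves exactly as defined above:

```python
from itertools import chain, islice, count

def concat(s, r):
    """Concatenation on streams: tuples are finite, generators infinite."""
    if isinstance(s, tuple):            # finite first operand: prepend it
        return s + r if isinstance(r, tuple) else chain(iter(s), r)
    return s                            # infinite first operand: s ^ r = s

x = concat((0, 0), count(1))            # <0 0>^<1 2 3 ...> = <0 0 1 2 3 ...>
print(list(islice(x, 6)))               # [0, 0, 1, 2, 3, 4]

y = concat(count(1), (9, 9))            # s^x = s when s is infinite
print(list(islice(y, 4)))               # [1, 2, 3, 4]
```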
A stream represents the sequence of messages sent over a channel during the lifetime of a system. Of course, in concrete systems this communication takes place in a time frame. Hence, it is often convenient to be able to refer to this time. Moreover, the theory of feedback loops in networks of communicating components gets much simpler. Therefore we work with timed streams. Streams are used to represent histories of communications of data messages transmitted within a time frame. Given a message set M of type T we define a timed stream by a function

s: IN+ → M∗

For each time t the sequence s(t) denotes the sequence of messages communicated at time t in the stream s. Throughout this paper we work with a couple of simple basic operators and notations for streams and timed streams, respectively, summarized below:

〈〉    empty sequence or empty stream,
〈m〉   one-element sequence containing m as its only element,
x.t   t-th element of the stream x,
x↓t   prefix of length t of the stream x,
#x    number of elements in x,
x̄     finite or infinite stream that is the result of concatenating all sequences in the timed stream x; note that x̄ is finite if x carries only a finite number of nonempty sequences.
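A possible rendering of these operators in code (a sketch only, restricted to finite timed streams, which are modeled here as lists of lists):

```python
# A timed stream is modeled as a list of lists: position t-1 holds the
# sequence x.t of messages sent in time interval t.

def at(x, t):              # x.t : t-th element (sequence) of the stream x
    return x[t - 1]

def down(x, t):            # x↓t : prefix of length t of x
    return x[:t]

def length(x):             # #x : number of elements in x
    return len(x)

def time_abstraction(x):   # x̄ : concatenation of all sequences in x
    return [m for seq in x for m in seq]

x = [[1, 2], [], [3]]      # messages 1,2 at time 1, none at time 2, 3 at time 3
print(at(x, 3))            # [3]
print(down(x, 2))          # [[1, 2], []]
print(time_abstraction(x)) # [1, 2, 3]
```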
In a timed stream x ∈ (M∗)∞ we express which messages are transmitted in which time intervals. Throughout this paper we use streams exclusively to model the communication histories of sequential communication media called channels. In general, several communication streams occur in a system. Therefore we work with channels to refer to the individual communication streams. Accordingly, in FOCUS, a channel is simply an identifier in a system that evaluates to a stream in every execution of the system.

Definition. Channel history
Let C be a set of channels; a channel history is a mapping (let IU be the universe of all data elements)

x : C → (IN+ → IU∗)

such that x.c is a timed stream of type Type(c) for each c ∈ C. By C we denote the set of channel histories for the channel set C. ❑

All operations and notation introduced for streams generalize in a straightforward way to histories by applying them elementwise.

3.2 Interface Behaviors

In this section we introduce a model for I/O behavior. We consider interface behaviors of the form:
F : I → ℘( O ) that model interactive computations. We assume for these functions the law of strong causality. It reads as follows (let x, z ∈ I , y ∈ O , t ∈ IN ):
x↓t = z↓t ⇒ {y↓t+1: y ∈ F(x)} = {y↓t+1: y ∈ F(z)}

Causality characterizes a proper time flow. It captures the principle that a reaction to input can happen only after the input has been received. Since the output in step t is produced while the input in step t is provided, the output in step t may depend at most on input provided before step t. A behavior F is called deterministic (and total) if F(x) is a one-element set for each input history x. Such a behavior is equivalent to a function
f: I → O
where F(x) = {f(x)}
f represents a deterministic interface behavior if the strong causality property holds. Then we have

x↓t = z↓t ⇒ f(x)↓t+1 = f(z)↓t+1

Again this property guarantees proper time flow.

Definition. Realizability
An I/O-behavior F is called realizable if there exists a strongly causal total function f: I → O such that we have: ∀ x ∈ I : f(x) ∈ F(x).
f is called a realization of F. By ⟦F⟧ we denote the set of all realizations of F. An output history y ∈ F(x) is called realizable for an I/O-behavior F with input x if there exists a realization f ∈ ⟦F⟧ with y = f(x). ❑

A strongly causal function f: I → O provides a deterministic strategy that calculates for every input history x a particular output history y = f(x). The strategy is called correct for input x and output y with respect to F if y = f(x) ∈ F(x). According to strong causality the output y can be computed in an interactive computation: only the input x↓t received till time t determines the output till time t+1, in particular the output at time t+1. As we will demonstrate, f essentially defines a deterministic "abstract" automaton with input and output, which, due to strong causality, is actually a Moore machine. Obviously, partial I/O-behaviors F (these are behaviors with F(x) = ∅ for some input history x) are not realizable. If F(x) = ∅ for some input x, then by strong causality we get F(x) = ∅ for all input histories x. However, there are also more sophisticated examples of behaviors that are not realizable. Consider, for instance, the following behavior F: I → ℘( I ) that is not realizable (the proof of this fact is left to the reader; a proof is given in [1]):
F(x) = {x′ ∈ I : x ≠ x′}
Note that F is strongly causal but nevertheless ⟦F⟧ = ∅, and thus no output is realizable.

Definition. Full Realizability
An I/O-behavior F is called fully realizable if it is realizable and if for all input histories x ∈ I

F(x) = {f(x): f ∈ ⟦F⟧}

holds. Then every output is also realizable.
❑
Full realizability of F guarantees that for every output history y ∈ F(x) for some input x there is a strategy (a deterministic implementation) that computes this output history. In other words, for each input history x, each output history y ∈ F(x) is realizable.
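The definitions of causality can be explored mechanically on a finite horizon. The following sketch is an assumption-laden illustration (finite alphabet, bounded horizon T, behaviors given as explicit tables; none of these names come from the paper): it checks the strong causality condition by brute force. A delayed echo passes, while an undelayed echo reacts to input in the same step and fails:

```python
from itertools import product

def strongly_causal(F, alphabet, T):
    """Check: equal input prefixes of length t yield equal sets of
    output prefixes of length t+1, for all t < T."""
    words = list(product(alphabet, repeat=T))
    for t in range(T):
        for x in words:
            for z in words:
                if x[:t] == z[:t]:
                    if {y[:t+1] for y in F[x]} != {y[:t+1] for y in F[z]}:
                        return False
    return True

A, T = (0, 1), 3
words = list(product(A, repeat=T))
delayed_echo = {x: {(0,) + x[:T-1]} for x in words}  # output = input, delayed one step
instant_echo = {x: {x} for x in words}               # output = input, no delay

print(strongly_causal(delayed_echo, A, T))  # True: a proper Moore-style delay
print(strongly_causal(instant_echo, A, T))  # False: reacts before input arrives
```

The delayed echo is deterministic and hence trivially fully realizable; the instant echo is not even a behavior in the sense defined above.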
4 State Machines

The concept of an interface behavior does not express the idea of an interactive computation very explicitly. State machines with input and output express the concept of interaction more explicitly.

4.1 State Machines with Input and Output

In this section we introduce the concept of a state machine with input and output. A state machine (Δ, Λ) with input and output according to the set I of input channels and the set O of output channels is given by a state space Σ, which represents a set of states, a set Λ ⊆ Σ of initial states, as well as a state transition function

Δ: (Σ × (I → M∗)) → ℘(Σ × (O → M∗))

For each state σ ∈ Σ and each valuation α: I → M∗ of the input channels in I by sequences of messages, every pair (σ′, β) ∈ Δ(σ, α) provides a successor state σ′ and a valuation β: O → M∗ of the output channels with the sequences produced by the state transition. Such state machines are also called Mealy machines (more precisely, Mealy machines generalized to infinite state spaces and infinite input/output alphabets). A state machine (Δ, Λ) is called

− deterministic, if for all states σ ∈ Σ and all inputs α the sets Δ(σ, α) and Λ have at most one element,
− total, if for all states σ ∈ Σ and all inputs α the sets Δ(σ, α) and Λ are not empty; otherwise the machine (Δ, Λ) is called partial,
− a (generalized) Moore machine, if its output depends only on the state and not on the actual input of the machine; then the following equation holds for all input sequences α, α′, all output sequences β, and all states σ:

∃ σ′ ∈ Σ: (σ′, β) ∈ Δ(σ, α) ⇔ ∃ σ′ ∈ Σ: (σ′, β) ∈ Δ(σ, α′)
36
M. Broy
A more explicit way to characterize a Moore machine is to require functions

out: Σ → ℘(O → M∗)
next: Σ × (I → M∗) × (O → M∗) → ℘(Σ)

such that

Δ(σ, α) = {(σ′, β): β ∈ out(σ) ∧ σ′ ∈ next(σ, α, β)}

Note a subtle point here: the choice of the output β does not depend on the input α, but the choice of the successor state σ′ may depend both on the input α and on the choice of the output β. We therefore require that for each β ∈ out(σ) there actually exists a successor state:

∀ β ∈ out(σ): ∃ σ′ ∈ Σ: σ′ ∈ next(σ, α, β)

This characterization of the Moore property of a Mealy machine is equivalent to the one given above. By SM[I ▸ O] we denote the set of all total Moore machines with input channels I and output channels O. By DSM[I ▸ O] we denote the set of deterministic total Moore machines.

4.2 Computations of State Machines

In this section we define the idea of computations for state machines with input and output.
Fig. 1. Computation of an I/O-machine: a sequence of transitions σ0 →(x1/y1) σ1 →(x2/y2) σ2 → …
Fig. 1 shows a typical computation of a state machine with input and output. In this way we obtain three streams:

− a stream x of input: x1, x2, …
− a stream y of output: y1, y2, …
− a stream s of states: σ0, σ1, …

Note that the computation can be generated, given the input stream x and the initial state σ0, by choosing step by step (σi+1, yi+1) ∈ Δ(σi, xi+1). A computation for a state machine (Δ, Λ) and an input history x ∈ I is given by a sequence of states {σt: t ∈ IN} and an output history y ∈ O such that σ0 ∈ Λ and for all times t ∈ IN we have:

(σt+1, y.t+1) ∈ Δ(σt, x.t+1)
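As a small executable illustration of this definition (a sketch with invented names, not the paper's formalism), a computation can be generated step by step by repeatedly choosing a transition from Δ:

```python
import random

def run(delta, sigma0, xs, rng=random.Random(0)):
    """Generate one computation: choose (σ_{i+1}, y_{i+1}) ∈ Δ(σ_i, x_{i+1})."""
    states, ys = [sigma0], []
    for x in xs:
        succ = sorted(delta[(states[-1], x)])  # the set Δ(σ, x)
        sigma, y = rng.choice(succ)            # nondeterministic choice
        states.append(sigma)
        ys.append(y)
    return states, ys

# A two-state machine: in state 'even' it echoes the input, in state 'odd'
# it outputs 0; every input flips the state.
delta = {(s, x): {(('odd' if s == 'even' else 'even'),
                   x if s == 'even' else 0)}
         for s in ('even', 'odd') for x in (0, 1)}

print(run(delta, 'even', [1, 1, 0, 1]))
# (['even', 'odd', 'even', 'odd', 'even'], [1, 0, 0, 0])
```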
The history y is called an output of the computation of the state machine (Δ, Λ) for input x and initial state σ0. We also say that the machine computes the output history y for the input history x and initial state σ0.

4.3 Refinement and Equivalence of State Machines with Input and Output

Two state machines are called (observably) equivalent if for each input history their sets of output histories coincide. A state machine is called equivalent to a behavior F if for each input history x the state machine computes exactly the output histories in the set F(x). A state machine (Δ2, Λ2) with transition function

Δ2: (Σ2 × (I → M∗)) → ℘(Σ2 × (O → M∗))

is called a transition refinement or a simulation of a state machine (Δ1, Λ1) with transition function

Δ1: (Σ1 × (I → M∗)) → ℘(Σ1 × (O → M∗))

if there is a mapping ρ: Σ2 → Σ1 such that for all states σ ∈ Σ2 and all inputs α ∈ I → M∗ we have:

{(ρ(σ′), β): (σ′, β) ∈ Δ2(σ, α)} ⊆ Δ1(ρ(σ), α),
{ρ(σ): σ ∈ Λ2} ⊆ Λ1
A special case is given if ρ is the identity; then the conditions simplify to:

Δ2(σ, α) ⊆ Δ1(σ, α) ∧ Λ2 ⊆ Λ1

Based on these definitions we show that all computations of nondeterministic machines can also be carried out by deterministic machines that are in their set of refinements.

Theorem
Every computation of a total nondeterministic Moore machine is also a computation of a total deterministic Moore machine.

Proof
Given a state machine (Δ1, Λ1) with transition function

Δ1: (Σ1 × (I → M∗)) → ℘(Σ1 × (O → M∗))

we construct a deterministic state machine (Δ2, Λ2) that is a refinement of (Δ1, Λ1), with transition function

Δ2: (Σ2 × (I → M∗)) → ℘(Σ2 × (O → M∗))

where Σ2 = Σ1 × IN and every transition is of the form Δ2((σ, k), α) = {((σ′, k+1), β)} with (σ′, β) ∈ Δ1(σ, α). Given a computation {σt: t ∈ IN} with input x and output y for the state machine (Δ1, Λ1), we define Δ2((σt, t), x.t+1) = {((σt+1, t+1), y.t+1)}. ❑

As seen in the construction of the deterministic state machine, we need a more involved construction of the state space than just deterministic
refinements of a nondeterministic state machine over the same state space, in order to capture, by a deterministic state machine, all output histories produced by the nondeterministic one. To show the reason for this construction very explicitly, we consider an extreme case: a state machine with state space Σ = {σ}, with only one state, that may generate arbitrary output:

Δ(σ, α) = {(σ, β): true}

This means that every output history can be generated for each given input history. Since there is only one state in the state space, a deterministic Moore machine with the same state space (since the output depends only on the state) will produce the same output in every state transition for any input. Obviously, this way the required output cannot, in general, be generated. Therefore we need the more general state space Σ2 = Σ1 × IN, as shown in the proof above, where for the Moore machine (Δ2, Λ) we only require:

Δ2((σ, t), α) = {((σ′, t+1), β)}

where

(σ′, β) ∈ Δ1(σ, α)
Each such state machine (Δ2, {(σ0, 0)}) with σ0 ∈ Λ1 is called a deterministic enhanced refinement of the state machine (Δ1, Λ1). There is a subtle point here, since a refinement of a Moore machine need not be a Moore machine again. A simple example demonstrates the problem. Consider again two state machines

Δ1, Δ2: Σ × (I → M∗) → ℘(Σ × (O → M∗))

where Δ1 produces arbitrary output and arbitrary successor states. Δ1 is trivially a Moore machine. Clearly every machine Δ2 in DSM[I ▸ O] is a refinement of Δ1. In fact, every Mealy machine Δ2 is a refinement of Δ1, too. This shows that there are refinements of Moore machines that are not Moore machines, since there are Mealy machines that are not Moore machines. To make sure that we obtain Moore machines in the construction above, we have to strengthen the formula slightly as follows:

∀ α: Δ2((σ, t), α) = {((σ′, t+1), β)}

where (σ′, β) ∈ Δ1(σ, α)

Since β does not depend on α in the original machine, this formula can be fulfilled for each output β. Let us have a slightly more careful look at the idea of a Moore machine. A machine

Δ: Σ × (I → M∗) → ℘(Σ × (O → M∗))

is a Moore machine if for all states σ ∈ Σ and all α, α′ ∈ (I → M∗) we have the equation

output(Δ(σ, α)) = output(Δ(σ, α′))

where the function

output: ℘(Σ × (O → M∗)) → ℘(O → M∗)

is defined by
output(P) = {β: ∃ σ: (σ, β) ∈ P}

With this definition in mind we define a deterministic enhanced refinement as follows:

Δ′: (Σ × IN) × (I → M∗) → ℘((Σ × IN) × (O → M∗))

where we define

Δ′((σ, k), α) = {((σ′, k+1), β)} for some (σ′, β) ∈ Δ(σ, α)

and require for all inputs α, α′ ∈ (I → M∗)

output(Δ′((σ, k), α)) = output(Δ′((σ, k), α′))

The second condition can actually be achieved, since for all inputs α and α′ we have output(Δ(σ, α)) = output(Δ(σ, α′)) (since Δ is a Moore machine), and therefore we can choose the output β of Δ′((σ, k), α) to be the same for all inputs α. Note that we can make these choices, for any computation of Δ with given input x and output y, so that the resulting state machine Δ′ carries out exactly the chosen computation.

4.4 Combination of State Machines

We can also combine sets of state machines into one state machine. Let a set of state machines be given (where K is an arbitrary set of names for state machines)

{(Δk, Λk): k ∈ K}

with

Δk: (Σk × (I → M∗)) → ℘(Σk × (O → M∗))

We define the composed state machine

(Δ, Λ) = ⊕k∈K (Δk, Λk)

as follows (let, w.l.o.g., the state spaces Σk of the machines (Δk, Λk), k ∈ K, be pairwise disjoint):

Λ = ∪k∈K Λk
Δ(σ, α) = Δk(σ, α)   for σ ∈ Σk
Obviously, the computations of (Δ, Λ) are exactly the union of the computations of the individual machines (Δk, Λk). Note that the resulting machine (Δ, Λ) is a Moore machine again, if all the state machines (Δk, Λk) combined that way are Moore machines. We immediately get the following theorem about the equivalence of nondeterministic Moore machines with sets of deterministic Moore machines:
Theorem
Every total Moore machine is equivalent to (a state machine composed of) a set of deterministic Moore machines.

Proof
Consider the union of the set of deterministic enhanced refinements that are constructed as shown in the proof above. ❑

This shows that, in the sense defined above, nondeterministic state machines are not more powerful than deterministic ones.

4.5 Interface Abstractions for State Machines with Input and Output

In this section we study the transition from state machines to behaviors.

Theorem
Every total deterministic Moore machine (Δ, Λ) with transition function Δ: (Σ × (I → M∗)) → ℘(Σ × (O → M∗)) defines a deterministic behavior

FσΔ : I → ℘( O )

for every state σ ∈ Σ, where for each input x the output of the state machine (Δ, Λ) is the history y with FσΔ(x) = {y}. In particular, the function FσΔ is strongly causal.

Proof
Given a total deterministic Moore machine (Δ, {σ0}) with state transition function Δ: (Σ × (I → M∗)) → ℘(Σ × (O → M∗)), we construct for every state σ ∈ Σ a deterministic behavior FσΔ : I → ℘( O ) as follows:

FσΔ(x) = 〈β〉ˆFσ′Δ(x↑1)

where

x = 〈x.1〉ˆ(x↑1) and Δ(σ, x.1) = {(σ′, β)}

Here, for a history x ∈ C, we denote by x↑t, with t ∈ IN, the history where for every channel the first t sequences are dropped from its stream. This way the behavior FσΔ is uniquely defined. For the initial state σ0, Fσ0Δ denotes the "functional" behavior of (Δ, Λ). By induction we easily prove (the proof is left to the reader) that for each input history x the output of the machine (Δ, {σ0}) is the history y with Fσ0Δ(x) = {y}, and that FσΔ is strongly causal. ❑

Along the lines of the proof of the theorem above, we define an operator

Ψ: DSM[I ▸ O] → ( I → ℘( O ))

that maps every total deterministic Moore machine onto its interface abstraction
Ψ((Δ, {σ0})) = Fσ0Δ

where Fσ0Δ is constructed as described in the proof above.

Corollary
Every total Moore machine can be represented by a fully realizable behavior.

Proof
Given a total Moore machine, take its set of total deterministic enhanced refinements, construct a behavior for each of them, and take their union to get the behavior that represents the Moore machine. Note that the union of strongly causal behavior functions yields a strongly causal behavior function. ❑

This shows that there is a function that maps total Moore machines onto equivalent realizable behaviors.

4.6 State Machines with Input and Output for Interactive Behaviors

In this section we study the transition from behaviors to state machines. In particular, we show that deterministic behaviors define Moore machines.

Theorem
Every deterministic behavior F: I → ℘( O ) defines a total deterministic Moore machine (Δ, Λ) with a transition function Δ: (Σ × (I → M∗)) → ℘(Σ × (O → M∗)).

Proof
Given a deterministic behavior F: I → ℘( O ) we define a total deterministic Moore machine (ΔF, ΛF) with

ΔF: (ΣF × (I → M∗)) → ℘(ΣF × (O → M∗))

as follows. We choose for ΣF the subset of I → ℘( O ) of all deterministic strongly causal behaviors. Then we define ΛF = {F} and

ΔF(F′, α) = {(F″, β)}
where
∀ x: F′(〈α〉ˆx) = 〈β〉ˆF″(x)
Note that F″ is uniquely defined by this equation, and that the equation defining F″ is consistent due to the strong causality of F′. We obtain

Ψ((ΔF, {F})) = F

This shows that the interface abstraction is the inverse of the construction of the state machine for an interface behavior. ❑
Corollary
Every fully realizable interactive behavior can be represented by a total Moore machine.

Proof
Given a fully realizable interactive behavior, take the set of total deterministic refinements of the behavior, construct for each of them the corresponding Moore machine, and take the union to get the Moore machine that represents the behavior. ❑

This shows that there is a function that maps fully realizable behaviors onto "equivalent" Moore machines.
5 Interactive Computations Revisited

In this section we introduce the idea of an interactive computation independently of the idea of a state machine.

5.1 Interactive Computations

We assume that in an interactive computation the input is provided step by step by the environment and the output is produced step by step by the system. In particular, each step is carried out without knowing anything about the future input.¹ More precisely, the computation is carried out for each time interval t ∈ IN in two steps:

(1) the input x.t is provided to the system,
(2) the output y.t+1 is selected; it must and can depend only on the initial state, the input till time interval t, and the output produced so far, i.e. till time interval t.

More precisely, to model interactive computations we assume for each initial state of the considered system a function:
g: {z: I ∪ O → (M∗)t : t ∈ IN} → ℘(O → M∗)

such that for a given input history x ∈ I the output history y ∈ O and the state of the computation z are defined inductively, where z: IN → (I ∪ O → (M∗)t):

y.t+1 ∈ g(z.t)   where   (z.t)⏐I = x↓t   and   (z.t)⏐O = y↓t

The function g is called an interactive computation strategy. By Out(g)(x) we denote the set of output histories y that can be constructed by the computation strategy g in this way. The function g is called an interactive computation strategy for behaviors from I → ℘( O ). The computation strategy g is called correct for the interactive behavior F if Out(g)(x) ⊆ F(x); it is called deterministic if Out(g)(x) is always a one-element set.
¹ Note that if we drop this assumption and consider a situation where we know all the input in advance, then neither the concept of causality nor the concept of realizability nor the concept of interaction is needed.
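The step-by-step discipline is easy to mimic in code. In the sketch below (all names illustrative, not part of the paper), the strategy sees only the partial computation z, never future input:

```python
def run_strategy(g, xs):
    """Drive the deterministic strategy g over the finite input stream xs."""
    ys = []
    for t in range(len(xs)):
        z_in, z_out = xs[:t + 1], tuple(ys)  # the partial computation z at time t
        ys.append(g(z_in, z_out))            # select y.(t+1) from z alone
    return ys

def echo(z_in, z_out):
    """y.(t+1) = x.t : echo the latest input, with the inherent one-step delay."""
    return z_in[-1]

print(run_strategy(echo, [5, 7, 9]))  # [5, 7, 9], each delivered one interval later
```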
For this idea of an interactive computation the following observations about strategies g hold:

• As long as g(z) is never empty, Out(g)(x) is never empty.
• Each strategy g can be refined into a set of deterministic strategies G, where
  − for each g′ ∈ G we require that g′(z) ⊆ g(z) holds and that g′(z) contains exactly one element,
  − each deterministic strategy is equivalent to a deterministic behavior,
  − we get Out(g)(x) = ∪{Out(g′)(x): g′ ∈ G}.
The construction shows that each strategy can be replaced by a set of deterministic strategies. Note that the idea of a strategy captures exactly the concept of an interactive computation, where in each step the output has to be determined from the initial state and the history formed by the previous input and output.

5.2 Real Time Computation and Divergence

A significant issue in computability theory is nontermination (also called divergence). Nontermination addresses the situation where a computation does not terminate but goes on forever. In models of computation such as deterministic Turing machines, a computation either stops and delivers a result or it does not stop and instead computes forever. Therefore we can model the effects of deterministic algorithms or Turing machines by partial functions. These are functions that deliver results only for a subset of their set of arguments, while for arguments outside this set the results are not defined, since for these arguments the computations do not terminate. Termination is a fundamental aspect of computability. As is well known, it is not computable (more precisely, not decidable) in general whether a computation terminates for a given argument.

For real machines that compute in real time the situation is quite different. A digital machine is pulse-driven; that means that in a certain time interval it performs a finite number of steps. If we exclude the possibility that the machine may break, then after each time interval it reaches an intermediate state. This is the same for Turing machines. Therefore, when considering computations within a time frame, nontermination is no longer modeled by a partial function being undefined for an argument, but by an infinite sequence of states.

Note that there is also a physical aspect of real time computation. Given a certain function and an argument, we can ask for the smallest time duration in which its result can be computed. This question is very difficult to answer, since it asks for the fastest concept of computation.
6 The Essence of Realizability In this section we analyze what it means in terms of interactive computations that for a behavior some output is not realizable. To do that we refer to the idea of a computation strategy as introduced in the previous section. We show that an output y ∈ F(x) is not realizable if and only if there does not exist a correct deterministic
strategy g′ with y ∈ g′(x). According to what we have said above, this implies that there does not exist a nondeterministic strategy g with y ∈ Out(g)(x) either.

A partial computation z: I ∪ O → (M∗)t, for some t ∈ IN, is called a dead end for an interactive behavior F, input history x, and output history y with y ∈ F(x), z⏐I = x↓t and z⏐O = y↓t, if there does not exist a computation strategy g that extends z and is correct for every input history x′ ∈ I with z⏐I = x′↓t. In other words, either for some input history x′ ∈ I with z⏐I = x′↓t there does not exist an output history y′ ∈ F(x′) such that z⏐O = y′↓t (which contradicts the property of strong causality), or for each deterministic strategy g which leads to z in a partial computation there exists some x′ ∈ I such that Out(g)(x′) ⊈ F(x′). In fact, in the case of a dead end z: I ∪ O → (M∗)t we may find, for each given input x ∈ I with z⏐I = x↓t, some output history y ∈ F(x) such that z⏐O = y↓t, but nevertheless there does not exist a strategy that calculates for each such input x an output y ∈ F(x) in this stepwise fashion of interaction.

This situation can be reflected in our game-theoretic view. We say that there is a winning strategy for the partial computation z: I ∪ O → (M∗)t, for some time t ∈ IN (which represents a partially played game), if there is a strategy g such that for every x ∈ I with z⏐I = x↓t it finds some y ∈ F(x) with z⏐O = y↓t, where g(x) = {y}. If for a partial computation z every y ∈ F(x) with z⏐O = y↓t is not realizable, then there does not exist a winning strategy.

Now we study the situation where a behavior

F : I → ℘( O )

is not fully realizable. This means that there is some input x ∈ I and some output y ∈ F(x) such that there does not exist a strongly causal total function f: I → O with ∀ x ∈ I : f(x) ∈ F(x) and y = f(x). We show in the next section that then there is a time t ∈ IN such that the partial computation z with z⏐I = x↓t and z⏐O = y↓t is a dead end. We discuss the issue in more detail in the following section.
7 On the Structure of Non-Realizable Output Histories

In this section we analyze situations of behaviors F where there exists an output history y ∈ F(x) for an input history x such that y is not realizable for F. Due to strong causality we know that for each input history x′ ∈ I and each time t ∈ IN with

x′↓t = x↓t

there exists some output history y′ ∈ F(x′) with

y′↓t+1 = y↓t+1

The question we are interested in is whether we can find some output history y′ with this property that is realizable for F, although y is not, or whether all output histories with this property are also not realizable.
Let us collect some key propositions about a behavior F where, for a given input history x ∈ I, the output history y ∈ F(x) is not realizable. This means that there is no correct strategy that produces the output history y on input history x. According to strong causality, if y ∈ F(x) and for every input history x′ and time t with x′↓t = x↓t there is a realizable output history y′ ∈ F(x) with y′↓t+1 = y↓t+1, i.e. a winning strategy for y′ at all times t, then there is a winning strategy producing the history y for input history x; in other words, y is realizable.

To analyze the situation and the structure of non-realizable output in more detail, we again employ concepts from game theory. We characterize partial computations for output y ∈ F(x) by pairs (for t ∈ IN)

(a, b) ∈ (I → (M∗)t, O → (M∗)t)

where

x↓t = a ∧ y↓t = b

We say that a partial computation (a, b) is in a winning state (and not a dead end) if there is a strategy g such that for all input histories x′ ∈ I :

bˆg(x′) ∈ F(aˆx′)

In other words, there exists a strategy such that, after the partial computation (a, b), for every further input x′ it delivers a correct output bˆg(x′). If a computation is not in a winning state, there is no such strategy; in other words, for each strategy there exists an input history that leads, for the selected strategy, to some output that is not correct w.r.t. the behavior F. We call winning states (states for which a winning strategy exists) "white" states and losing states (states for which no winning strategy exists) "black" states. Each state is characterized by a pair of channel valuations (a, b) ∈ (I → (M∗)t, O → (M∗)t). An interactive computation step is the transition from a state (a, b) to a new state (a′, b′) for which there exist an input α: I → M∗ and an output β: O → M∗ with a′ = aˆ〈α〉 and b′ = bˆ〈β〉.
Fig. 2. Infinite tree of partial computations, with arcs labeled (α1/β1), (α2/β2), …; black nodes denote losing states
A step is called correct if (a′, b′) is again a partial computation, i.e. if there exist histories x ∈ I and y ∈ F(x) with

x↓(t+1) = a′ ∧ y↓(t+1) = b′

For each behavior F we obtain a tree of partial computations as shown in Fig. 2. A node in the tree is white if and only if for every input α: I → M∗ there exists some output β: O → M∗ such that there is an arc labeled by α/β that leads to a white node. A node is black if and only if there exists some input α: I → M∗ such that for each feasible output β: O → M∗ the arcs labeled by α/β lead to black nodes. A behavior is realizable if the root of its computation tree is white. It is fully realizable if its computation tree contains only white nodes. Each path in the tree corresponds to a computation. For each input history x ∈ I and each output history y ∈ F(x) we obtain a path in the computation tree. We get:

(1) The history y ∈ F(x) is realizable for the input x if and only if its corresponding computation path is colored by white nodes only.
(2) The history y ∈ F(x) is not realizable if and only if there is at least one node on its path in the computation tree that is black.
(3) For a non-realizable history y ∈ F(x) there is a least partial computation (a, b) with a = x↓t and b = y↓t such that its node is black and all nodes (x↓t′, y↓t′) with t′ < t are white. This means that all output histories y′ ∈ F(x′) with

y′↓t = y↓t ∧ x′↓t = x↓t

are not realizable, since there is a black node on their computation paths.
(4) Due to strong causality, if y ∈ F(x) and y is not realizable, there exists a time t such that all input histories x′ with x↓t = x′↓t have non-realizable output histories in F(x′).

The statements (1)–(4) are, in fact, theorems. We sketch their proofs in the following. A winning strategy corresponds to a tree in which all nodes are white and in which there is a path for every input history x ∈ I .

(1) If the path for y ∈ F(x) is all white, it is part of a winning strategy. Note that a node can only be colored white if for every input history there exists a computation path that is all white. This proves the "if"; the "only if" is proved in (2).
(2) Assume that on the computation path for y ∈ F(x) there is a black node. Then there exists a first black node on the path, appearing after t steps. Since for the black node there exists no winning strategy, there cannot be a winning strategy for y: if g were a winning strategy for y ∈ F(x), then a winning strategy g′ for the t-th node on the path of (x, y) would be given by

g′(a, b) = g((x↓t)ˆa, (y↓t)ˆb)↑t
(3) All output histories y′ ∈ F(x) with y′↓t = y↓t share, in their computation paths for (x, y′), the black node at position t.
(4) For all input histories x′ with x′↓t = x↓t, where t is larger than the position of the first black node on the computation path for (x, y), there exists, due to causality, an output history y′ ∈ F(x′) with y↓t = y′↓t; thus on the computation path for (x′, y′) there is a black node, and therefore y′ is not realizable for x′ in F.

This gives clear ideas about the structure of behaviors with non-realizable output histories. The key result is that for every non-realizable history y ∈ F(x) there is a finite time t ∈ IN such that all histories y′ with y′↓t = y↓t are not realizable for input x.
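The white/black coloring can be computed by backward induction when the alphabets are finite and the horizon is bounded, which is a simplifying assumption not made in the paper. The sketch below (invented names) marks a node white iff for every next input some next output leads again to a white node:

```python
from itertools import product

def is_white(F, in_alpha, out_alpha, T, a=(), b=()):
    """Color the node (a, b) of the computation tree (horizon T)."""
    t = len(a)
    if t == T:   # leaf: the pair must be an actual computation of F
        return b in F[a]
    return all(any(is_white(F, in_alpha, out_alpha, T, a + (x,), b + (y,))
                   for y in out_alpha)
               for x in in_alpha)

A, T = (0, 1), 2
# Delayed echo: F(x) = {<0>^x↓(T-1)} -- realizable, so the root is white.
F = {x: {(0,) + x[:T - 1]} for x in product(A, repeat=T)}
print(is_white(F, A, A, T))          # True

# A partial behavior (F(x) = ∅ for inputs starting with 1): its root is black.
F_partial = {x: ({x} if x[0] == 0 else set()) for x in product(A, repeat=T)}
print(is_white(F_partial, A, A, T))  # False: input 1 leads into a dead end
```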
8 Interactive Computability

Note the difference between computability in the sense of Church's thesis and computability theory on the one side, and the notions of causality and realizability on the other side. We talked about computations in the previous sections, but only in the sense of interactive vs. non-interactive computation, not actually discussing computability in Church's sense. Now we show how to extend the concept of Turing computability to interactive computations.

8.1 Computability and Nondeterminism

Nondeterminism plays an important role in the theory of computation. Nondeterminism deals with computations, and the results of computations, that are not uniquely specified. This means that nondeterministic choices are possible in the computations and that there is a set of potential results. There are a number of different ideas of how to understand nondeterminism in terms of computations. One idea is to calculate the set of all results (which is only effectively possible for finite sets) or to enumerate the set of results. Another idea is that choices occur in the computations. A further issue then is what to do with nonterminating branches: angelic nondeterminism avoids nonterminating branches (which leads to complications due to the undecidability of the halting problem), erratic nondeterminism makes arbitrary choices without any care whether the choices may lead to nontermination, and demonic nondeterminism leads into nonterminating computations, if there are any.

We take a different view of nondeterminism. We consider a nondeterministic behavior or a nondeterministic state machine as the description (a "specification") of a set of deterministic behaviors or a set of deterministic state machines. One way to think about this is in terms of hidden parameters or states ("oracles") that determine the choices. We thus understand nondeterminism as underspecification. This idea guides our notion of computability for nondeterministic behaviors or state machines: we call a nondeterministic behavior or state machine computable if there is a deterministic refinement that is computable.
In the following we consider only deterministic behaviors and state machines. For simplicity, we admit only the natural numbers as state space and also use only natural numbers as input and output.

8.2 Computability of State Machines

We call a state machine computable if its state transition function is Turing computable. We call a deterministic behavior computable if its state machine representation is computable.

Theorem
If a behavior is computable, it is realizable.
❑
The theorem immediately follows from our basic definitions, since every state machine defines only realizable behaviors. Computability for nondeterministic behaviors is a more subtle notion. A nondeterministic behavior is called computable if it is realizable and the corresponding state machine (which always exists, as we have proved) is computable. A nondeterministic state machine (Δ, Λ) is called computable if its set of initial states is finite and there is a nondeterministic Turing machine that calculates for each state σ and each input α the (finite) set Δ(σ, α).

8.3 Computability of Interactive Behaviors

We consider timed and untimed behaviors, and only messages that are natural numbers. The idea of computability is well understood for partial functions over natural numbers and also for finite sequences of natural numbers.

Computability of Untimed Interactive Behaviors. For simplicity we consider only functions over untimed streams. The generalization to tuples of streams is quite straightforward. We consider functions on streams

f: INω → INω
We call the stream function f computable iff there exists a computable function:
f∗: IN∗ × IN → IN

such that for all finite sequences x ∈ IN∗ and all t ∈ IN:

f∗(x, t) = f(x).t if #f(x) ≥ t, and f∗(x, t) is undefined otherwise,

and for all x ∈ IN∞ and t ∈ IN there exists a finite prefix x′ ∈ IN∗ of x such that

f(x).t = f∗(x′, t)

Note that the second condition corresponds to what is called continuity in fixpoint theory.
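As a concrete instance (a sketch; the running-sum function and all names are invented for illustration), here is a computable stream function presented through its finite approximation f∗:

```python
def f_star(x, t):
    """Approximation on finite sequences: t-th element (1-indexed) of f(x),
    where f maps a stream of naturals to the stream of its partial sums.

    Returns None (undefined) when the prefix x is too short to determine it."""
    return sum(x[:t]) if t <= len(x) else None

# Continuity flavor: longer prefixes never change outputs that are settled.
print(f_star((1, 2, 3), 2))      # 3 = 1 + 2
print(f_star((1, 2, 3, 9), 2))   # 3 : extending the input does not revoke it
print(f_star((1,), 2))           # None: not yet determined
```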
Computability of Timed Interactive Behaviors. For simplicity we consider only functions over timed streams; the generalization to tuples of streams is again straightforward. We consider functions on timed streams

f: (IN+ → IN∗) → (IN+ → IN∗)

We call f computable iff there exists a computable function

f∗: {(x, y) ∈ (IN∗)t × (IN∗)t: t ∈ IN} → IN∗

such that for all t ∈ IN and all timed streams x, where y = f(x):

f∗((x↓t, y↓t)) = f(x).(t+1)

Note that this condition guarantees strong causality. Note also that here we no longer have to deal with partial functions.
9 Concluding Remarks

Realizability is a notion that arises only in the context of specifications of interactive computations. It is a fundamental issue when asking whether a behavior corresponds to a computation. We chose to work out the theory of realizability in terms of Moore machines because they are a more intuitive model of interactive computation. As we show in the appendix, the problem of realizability is not a problem of Moore machines only, but applies to Mealy machines as well. The bottom line of our investigation is that state machines with input and output, in particular generalized Moore machines, are an appropriate concept to model interactive computations. Moore machines, in particular, take care of a delay between input and output. Realizable functions are the abstractions of state machines, just as partial functions are the abstractions of Turing machines. They extend the idea of computability, as developed for non-interactive computations, to interactive computations.

Acknowledgment. It is a pleasure to thank Leonid Kof and Birgit Penzenstadler for help and useful remarks.
References 1. Broy, M., Stølen, K.: Specification and Development of Interactive Systems: FOCUS on Streams, Interfaces, and Refinement, Springer (2001) 2. Broy, M.: A Theory of System Interaction: Components, Interfaces, and Services. In: D. Goldin, S. Smolka and P. Wegner (eds): The New Paradigm. Springer Verlag, Berlin (2006) 41-96
3. Darondeau, P.: Concurrency and Computability. In: Proceedings of the LITP Spring School on Theoretical Computer Science on Semantics of Systems of Concurrent Processes, La Roche Posay, France, November 1990, 223-238
4. Eberbach, E., Goldin, D., Wegner, P.: Turing's Ideas and Models of Computation. In: Alan Turing: Life and Legacy of a Great Thinker. Springer (2004)
5. Goldin, D., Smolka, S., Attie, P., Sonderegger, E.: Turing Machines, Transition Systems, and Interaction. Information & Computation Journal (2004)
6. Goldin, D., Wegner, P.: The Church-Turing Thesis: Breaking the Myth. Presented at CiE 2005, Amsterdam, June 2005, to be published in LNCS
7. Japaridze, G.: Computability Logic: a Formal Theory of Interaction. In: Goldin, D., Smolka, S., Wegner, P. (eds.): The New Paradigm. Springer Verlag, Berlin (2006) 183-226
8. Milner, R.: Elements of Interaction: Turing Award Lecture. Communications of the ACM 36(1) (1993) 78-89
9. Prasse, M., Rittgen, P.: Why Church's Thesis still Holds: Some Notes on Peter Wegner's Tracts on Interaction and Computability. Comput. J. 41(6) (1998) 357-362
10. Wegner, P.: Why Interaction is More Powerful than Algorithms. Communications of the ACM, May 1997
11. van Leeuwen, J., Wiedermann, J.: A Theory of Interactive Computation. In: Goldin, D., Smolka, S., Wegner, P. (eds.): The New Paradigm. Springer Verlag, Berlin (2006) 119-142
Appendix: Extension to Mealy Machines

One may think that the problem of realizability disappears when considering Mealy machines and weak causality instead of Moore machines and strong causality, but this is not correct. Consider the following weakly causal function
F(x) = {y: x↑1 ≠ y}

F is in fact weakly causal. Assume there is a weakly causal deterministic function f (a deterministic Mealy machine, which is like a strategy) with f(x) ∈ F(x) for all input histories x. Then f′(x) = 〈β〉ˆf(x), with an arbitrary output β, is strongly causal and has a fixpoint z (all strongly causal deterministic functions have fixpoints):

z = f′(z)

We conclude z = 〈β〉ˆf(z) and z↑1 ≠ f(z). We get

z↑1 = (〈β〉ˆf(z))↑1 = f(z)

which is a contradiction. This proves that our results apply and generalize to Mealy machines as well.
A Short Introduction to Computational Social Choice⋆

Yann Chevaleyre¹, Ulle Endriss², Jérôme Lang³, and Nicolas Maudet¹

¹ LAMSADE, Univ. Paris-Dauphine, France
{chevaley,maudet}@etud.dauphine.fr
² ILLC, University of Amsterdam, The Netherlands
[email protected]
³ IRIT, Univ. Paul Sabatier and CNRS, France
[email protected]

⋆ Some parts of this paper appeared in the proceedings of ECSQARU-2005 [62].
Abstract. Computational social choice is an interdisciplinary field of study at the interface of social choice theory and computer science, promoting an exchange of ideas in both directions. On the one hand, it is concerned with the application of techniques developed in computer science, such as complexity analysis or algorithm design, to the study of social choice mechanisms, such as voting procedures or fair division algorithms. On the other hand, computational social choice is concerned with importing concepts from social choice theory into computing. For instance, the study of preference aggregation mechanisms is also very relevant to multiagent systems. In this short paper we give a general introduction to computational social choice, by proposing a taxonomy of the issues addressed by this discipline, together with some illustrative examples and an (incomplete) bibliography.
1 Introduction: What Is Computational Social Choice?
Social choice theory is concerned with the design and analysis of methods for collective decision making. For a few years now, computer science and artificial intelligence (AI) have been taking more and more of an interest in social choice. There are two main reasons for that, leading to two different lines of research. The first of these is concerned with importing notions and methods from AI for solving questions originally stemming from social choice. The point of departure for this line of research is the fact that most of the work in social choice theory has concentrated on establishing abstract results regarding the existence (or otherwise) of procedures meeting certain requirements, but computational issues have rarely been considered. For instance, while it may not be possible to design a voting protocol that makes it impossible for a voter to cheat in one way or another, it may well be the case that cheating successfully turns out to be a computationally intractable problem, which may therefore be deemed an acceptable risk. This is where AI (and operations research, and more generally computer science) comes into play. Besides the complexity-theoretic analysis of
voting protocols, other typical examples of work in computational social choice include the formal specification and verification of social procedures (such as fair division algorithms) using mathematical logic, and the application of techniques developed in AI and logic to the compact representation of preferences in combinatorial domains (such as negotiation over indivisible resources or voting for committees).

The second line of research within computational social choice goes the other way round. It is concerned with importing concepts and procedures from social choice theory for solving questions that arise in computer science and AI application domains. This is, for instance, the case for managing societies of autonomous software agents, which calls for negotiation and voting procedures. Another example is the application of techniques from social choice to developing page ranking systems for Internet search engines.

All of these are examples of a wider trend towards interdisciplinary research involving all of decision theory, game theory, social choice, and welfare economics on the one hand, and computer science, artificial intelligence, multiagent systems, operations research, and computational logic on the other. In particular, the mutually beneficial impact of research in game theory and computer science is already widely recognised and has led to significant advances in areas such as combinatorial auctions, mechanism design, negotiation in multiagent systems, and applications in electronic commerce. The purpose of this paper is to highlight some further areas of successful interdisciplinary research, focussing on the interplay of social choice theory with computer science, and to propose a taxonomy of the issues tackled by this new discipline of computational social choice.

There are two distinct lines along which we could classify the topics addressed by computational social choice: (a) the nature of the social choice problem dealt with; and (b) the type of formal or computational technique studied. These two dimensions are independent to some extent. We first give a (nonexhaustive) list of topics falling under (a):

Preference Aggregation — Aggregating preferences means mapping a collection P = P1, . . . , Pn of preference relations (or profiles) of individual agents into a collective preference relation P∗ (which implies circumventing Arrow's impossibility theorem [6] by relaxing one of its applicability conditions). Sometimes we are only concerned with determining a socially preferred alternative, or a subset of socially preferred alternatives, rather than a full collective preference relation: a social choice function maps a collective profile P into a single alternative, while a social choice correspondence maps a collective profile P into a nonempty subset of alternatives. This first topic is less specific than the following ones, which mostly also deal with some sort of preference aggregation, but each in a much more specific context.
Voting Theory — Voting is one of the most popular ways of reaching common decisions. Researchers in social choice theory have studied extensively the properties of various families of voting rules, but have typically neglected computational issues. A whole panorama of voting rules has been proposed in the literature [15]. We shall only mention a few examples here. A positional scoring rule computes a score (a number) for each candidate from each individual preference profile and selects the candidate with the maximum sum of scores. The plurality rule, for instance, gives score 1 to the most preferred candidate of each voter and 0 to all others. The Borda rule assigns scores from m (the number of candidates) down to 1 to the candidates according to the preference profile of each voter. Another important concept is that of a Condorcet winner, i.e. a candidate preferred to any other candidate by a strict majority of voters. It is well known that there are profiles for which no Condorcet winner exists; obviously, when a Condorcet winner exists, it is unique. A Condorcet-consistent rule is a voting rule electing the Condorcet winner whenever there is one.

Resource Allocation and Fair Division — Resource allocation of indivisible goods aims at assigning items from a finite set R to the members of a set of agents N, given their preferences over all possible bundles of goods. In centralised allocation problems the assignment is determined by a central authority to which the agents have given their preferences beforehand. In distributed allocation problems agents negotiate, communicate their interests, and exchange or trade goods in several rounds, possibly in a multilateral manner. An overview of issues in resource allocation may be found in [20]. We can distinguish two types of criteria when assessing the quality of a resource allocation, namely efficiency and fairness. The most fundamental efficiency criterion is Pareto efficiency: an allocation should be such that there is no alternative allocation that would be better for some agents without being worse for any of the others. An example of a fairness condition is envy-freeness: an allocation is envy-free iff no agent would rather obtain the bundle held by one of the others.

Coalition Formation — On many occasions, agents do not compete but instead cooperate, for instance to fulfil a given task more efficiently. Suppose for instance that agent x is rewarded 10 when he performs a given task alone, while agent y gets 20. Now if they form a team, the gain is up to 50 (think for instance of two musicians, playing either solo or in a duet). Coalition formation typically studies two questions: which coalitions will form for a given problem and how, and how the surplus should then be divided among the members of the coalition (after they have solved their optimisation problem). Central here is the notion of stability: an agent should have no incentive to leave the coalition. These questions are studied in the field of cooperative game theory [72], and different solution concepts have been introduced. For instance, the strongest of these, known as the core, requires that no other coalition could make its members better off.
Judgement Aggregation and Belief Merging — The field of judgement aggregation studies how a group of individuals should aggregate its members' individual judgements on some interconnected propositions into corresponding collective judgements on these propositions. Such aggregation problems occur in many different collective decision-making bodies (especially committees and expert panels).¹ Belief merging is a closely related problem that is concerned with investigating ways to aggregate a number of individual belief bases into a collective one (connections between both problems are discussed by Eckert and Pigozzi [42,78]).

¹ An introduction to judgement aggregation, together with a bibliography, may be found on the website http://personal.lse.ac.uk/LIST/doctrinalparadox.htm.

Ranking Systems — The so-called "ranking systems" setting is a variation of classical social choice theory where the set of agents and the set of alternatives coincide. The most well-known family of such systems are page ranking systems in the context of search engines (and, more generally, reputation systems) [5,92].

As concerns the second dimension of our proposed taxonomy of topics in computational social choice, namely the classification according to the technical issues addressed rather than the nature of the social choice problem itself, here is now an (equally incomplete) list of issues:

– computationally hard aggregation rules;
– social choice in combinatorial domains;
– computational aspects of strategy-proofness and manipulation;
– distributed resource allocation and negotiation;
– communication requirements in social choice;
– logic-based analysis of social procedures.
The rest of the paper is organised according to this second dimension. For each of the items above we give some description of typical problems considered in the literature, together with some pointers to the bibliography.
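Before turning to the computationally hard rules of the next section, here is a small sketch (illustrative code, not from the paper) of the basic rules recalled under Voting Theory above; note that on the sample profile both plurality and the Condorcet criterion elect a, while Borda elects b:

```python
def plurality(profile):
    """Profiles are rankings (most preferred candidate first)."""
    scores = {}
    for ranking in profile:
        scores[ranking[0]] = scores.get(ranking[0], 0) + 1
    return max(scores, key=scores.get)

def borda(profile):
    m = len(profile[0])
    scores = {c: 0 for c in profile[0]}
    for ranking in profile:
        for pos, c in enumerate(ranking):
            scores[c] += m - pos          # scores m down to 1
    return max(scores, key=scores.get)

def condorcet_winner(profile):
    candidates = profile[0]
    for c in candidates:
        if all(sum(r.index(c) < r.index(d) for r in profile) > len(profile) / 2
               for d in candidates if d != c):
            return c
    return None                            # no Condorcet winner exists

profile = [('a', 'b', 'c')] * 3 + [('b', 'c', 'a')] * 2
print(plurality(profile), borda(profile), condorcet_winner(profile))  # a b a
```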
2 Computationally Hard Aggregation Rules
Many of the aggregation and voting rules that are used in practice are computable in linear or quadratic time in the number of candidates (and almost always in linear time in the number of voters). Therefore, when the number of candidates is small (which is typically the case for political elections where a single person has to be elected), computing the outcome of a voting rule does not require any sophisticated algorithms. However, there are also a few voting rules that are computationally complex. The following ones have been considered from the computational point of view.

Kemeny — Kemeny's aggregation rule consists of aggregating n individual profiles into a collective profile (called a Kemeny consensus) that is closest to
the n profiles, with respect to a distance which, roughly speaking, is the sum, over all agents, of the numbers of pairs of alternatives on which the aggregated profile disagrees with the agent's profile. This aggregation rule can be turned into a voting rule: a Kemeny winner is a candidate ranked first in one of the Kemeny consensus. Computing a Kemeny consensus is NP-hard [10], and deciding whether a given candidate is a Kemeny winner is Δp2(O(log n))-complete [52]. Its practical computation has also been addressed [36,24], while other work has focussed on approximating Kemeny's rule in polynomial time [3].

Slater — Slater's rule aggregates n individual profiles P1, . . . , Pn into a collective profile (called a Slater ranking) minimising the distance to the majority graph MP induced by P (MP is the graph whose vertices are the candidates and that contains the edge x → y if and only if a strict majority of voters prefers x to y). Slater's rule is NP-hard, even under the restriction that pairwise ties cannot occur [3,4,23]. The computation of Slater rankings has been addressed by Charon and Hudry [19,56] as well as Conitzer [23], who gives an efficient preprocessing technique for computing Slater rankings by partitioning the set of candidates into sets of "similar" candidates.

Dodgson — In this voting rule, proposed in 1876 by Dodgson (better known as Lewis Carroll), the election is won by the candidate(s) who is (are) "closest" to being a Condorcet winner: each candidate is given a score that is the smallest number of exchanges of adjacent preferences in the voters' preference orders needed to make the candidate a Condorcet winner with respect to the resulting preference orders. Whatever candidate (or candidates, in the case of a tie) has the lowest score is the winner. This problem was shown to be NP-hard by Bartholdi et al. [10], and Δp2(O(log n))-complete by Hemaspaandra et al. [50].

Young — The principle of Young's voting rule is similar to Dodgson's, but here the score of a candidate x is the smallest number of voters whose removal makes x a Condorcet winner. Deciding whether x is a winner according to this rule is Δp2(O(log n))-complete as well [84].

Banks — A Banks winner for a collection of profiles P is the top vertex of any maximal (with respect to inclusion) transitive subtournament of the majority graph MP. The problem of deciding whether some fixed vertex v is a Banks winner for P is NP-complete [93,55].

See also [54] for a partial overview of complexity results for preference aggregation problems.
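A brute-force implementation makes the source of the hardness tangible: the sketch below (illustrative only) searches all m! rankings for a Kemeny consensus, which is feasible only for a handful of candidates:

```python
from itertools import permutations, combinations

def kemeny_consensus(profile):
    """Return a ranking minimizing the total number of pairwise
    disagreements with the voters' rankings (exponential search)."""
    candidates = profile[0]

    def disagreements(ranking):
        return sum(sum((r.index(a) < r.index(b)) != (ranking.index(a) < ranking.index(b))
                       for a, b in combinations(candidates, 2))
                   for r in profile)

    return min(permutations(candidates), key=disagreements)

profile = [('a', 'b', 'c'), ('b', 'c', 'a'), ('c', 'a', 'b')]  # a Condorcet cycle
print(kemeny_consensus(profile))  # ('a', 'b', 'c'), one of three optimal rankings
```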
3 Social Choice in Combinatorial Domains
As long as the set of alternatives is small in size, preferences can be represented explicitly. That is, we can simply list all alternatives together with their utility
or their rank in the preference order. Unfortunately, in many problem domains the set of alternatives has a combinatorial structure. A combinatorial domain is a Cartesian product of finite value domains, one for each of a set of variables: an alternative in such a domain is a tuple of values. Clearly, the size of such domains grows exponentially with the set of variables and quickly becomes very large, which makes explicit representation and straightforward elicitation and optimisation no longer reasonable. Logical or graphical compact representation languages aim at representing preference structures, the size of which would be prohibitive if represented explicitly, in as little space as possible. The literature on preference elicitation and representation for combinatorial domains has been growing fast, and due to the lack of space we omit giving references here. See for instance [34] for an (incomplete) overview of logic-based preference representation languages, together with results about expressivity and spatial efficiency.

When the set of alternatives has a combinatorial structure, aggregation is a computationally hard problem. Moreover, since in that case preferences are often described in a compact representation language, aggregation should ideally operate directly on this language, without generating either the individual or the aggregated preferences explicitly. In what follows, we give some examples of the issues at stake for different types of problem in social choice.

Voting — When the set of candidates has a combinatorial structure, even simple voting rules such as plurality and Borda become hard. The computational complexity of some voting procedures when applied to compactly represented preferences has been investigated in [61], although that paper does not address the question of how the outcome can be computed within a reasonable amount of time. One approach would be to decompose the vote into local votes on individual variables (or small sets of variables), and then to gather the results. However, "multiple election paradoxes" [16] show that this can lead to suboptimal choices. Suppose, for instance, that 100 voters have to decide whether or not to build a swimming pool (S), and whether or not to build a tennis court (T). 49 voters prefer a swimming pool and no tennis court (ST̄), 49 voters prefer a tennis court and no swimming pool (S̄T), and 2 voters prefer to have both (ST). Voting separately on each of the issues gives the outcome ST, although it received only 2 votes out of 100 (see the sketch below). The problem is that there is a preferential dependence between S and T. A simple idea then would be to exploit preferential independencies between variables. The question is to what extent we may use these independencies to decompose the computation of the outcome into smaller problems. Unfortunately, several well-known voting rules (such as plurality or Borda) cannot be decomposed, even when the preferential structure is common to all voters. Most of them fail to be decomposable even when all variables are mutually independent for all voters [63].
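The paradox can be checked directly; the sketch below (illustrative code) tallies the 100 ballots issue by issue and on the full combination:

```python
from collections import Counter

ballots = Counter({('S', 'not T'): 49, ('not S', 'T'): 49, ('S', 'T'): 2})

# Plurality on each issue separately: both issues pass, so the outcome is ST.
pool   = sum(n for (s, _), n in ballots.items() if s == 'S')   # 51 votes for S
tennis = sum(n for (_, t), n in ballots.items() if t == 'T')   # 51 votes for T
print(pool, tennis)                       # 51 51

# Plurality on the combined ballots: ST got only 2 of 100 votes.
print(ballots.most_common(1))             # one of the two 49-vote bundles, not ST
```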
arises from the following dilemma, formulated by several social choice theorists: either (a) allow agents to express any possible preference relation on the set of all subsets of items, and end up with an exponentially large representation (such as in [53]); or (b) severely restrict the set of expressible preferences, typically by assuming additive separability between items, and then design procedures where agents express preferences between single items, thus giving up the possibility of expressing, say, complementarities and substitutabilities. The latter approach is the path followed by Brams et al. [14] and Demko and Hill [37], for instance. Compact representation and complexity issues for fair division have received little attention until now, apart from recent work by Lipton et al. [65], who study approximation schemes for envy-freeness, and Bouveret et al. [12,13], who study the complexity of fair division problems with compactly represented preferences.

Judgement Aggregation and Belief Merging — Here the set of alternatives is the set of all possible truth assignments to a given set of propositional variables (in belief merging) or to a given set of propositional formulae (in judgement aggregation). The common point of logic-based merging approaches is that the set of alternatives corresponds to a set of propositional worlds; the logic-based representation of an agent's preferences (or beliefs) then induces a cardinal function (using ranks or distances) on worlds, and these cardinal preferences are aggregated. Relevant references that explicitly mention some social choice-theoretic issues include [59,67,22,66]. Konieczny et al. [58] specifically address complexity issues for distance-based belief merging operators. As for judgement aggregation, computational issues seem to have been neglected so far. However, some authors [70,38] give necessary and sufficient conditions for collective rationality, expressed in terms of minimal inconsistent subsets, which can be seen as a first step towards addressing computational issues of judgement aggregation.
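As a concrete illustration of the multiple election paradox discussed under Voting above, the following sketch (ours, in Python, and not part of the literature cited; the ballot encoding and helper names are our own) tallies the swimming pool / tennis court example and confirms that deciding each issue separately by majority elects the combination supported by only 2 of the 100 voters.

from collections import Counter

# 49 voters want a pool but no tennis court, 49 the reverse, 2 want both.
ballots = Counter({("S", "no-T"): 49, ("no-S", "T"): 49, ("S", "T"): 2})

def issue_majority(ballots, index, yes, no):
    # Decide a single issue by simple majority over all ballots.
    yes_votes = sum(n for combo, n in ballots.items() if combo[index] == yes)
    return yes if yes_votes > sum(ballots.values()) - yes_votes else no

outcome = (issue_majority(ballots, 0, "S", "no-S"),
           issue_majority(ballots, 1, "T", "no-T"))
print(outcome)            # ('S', 'T'): both facilities get built,
print(ballots[outcome])   # yet only 2 voters ranked this combination first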
4 Computational Aspects of Strategy-Proofness
Manipulating a voting rule consists, for a given voter or coalition of voters, in expressing an insincere preference so as to improve the chances of a preferred candidate being elected. Gibbard and Satterthwaite's theorem [48,88] states that if the number of candidates is at least 3, then any nondictatorial voting procedure is manipulable for some profiles. However, the theorem no longer holds when suitable restrictions are imposed on the class of allowed preferences [68]. More formally, manipulation by a voter is defined as follows: given a profile of n voters P = (P1, . . . , Pn), let c be the candidate elected by a given voting rule applied to P. We say that a voter j can manipulate the voting rule if there exists a preference P′j such that the voting rule applied to (P1, . . . , Pj−1, P′j, Pj+1, . . . , Pn) elects a candidate c′ ≠ c such that j ranks c′ higher than c. Note that other manipulation schemes have also been studied,
in particular manipulation by the chairman [11], and manipulation by coalitions of voters [75]. Let us show an example of manipulation by a voter. Consider three candidates c1, c2, c3 and 5 voters, among which 2 voters have the preference c1 ≻ c2 ≻ c3, 2 other voters have the preference c2 ≻ c1 ≻ c3, and the last voter has the preference c3 ≻ c1 ≻ c2. If the plurality rule is used here, the last voter has an incentive to report an insincere preference with c1 on top, as his truly preferred candidate c3 has no chance of winning.

In the general case, since it is theoretically impossible to make manipulation impossible, one can try to make it less efficient or more difficult. Making it less efficient can consist of making as little as possible of the others' votes known to the would-be manipulating voter, which may be difficult in some contexts. This situation arises in real-world elections, as opinion polls often fail to accurately reflect voters' real intentions. Making manipulation more difficult to compute is an approach recently followed by several authors [9,8,26,25,28], who address the computational complexity of manipulation for several voting rules. For instance, Single Transferable Vote is NP-hard to manipulate by single agents [8]. The line of argument is that if finding a successful manipulation is extremely hard computationally, then the voters will give up trying to manipulate and express sincere preferences. Note that, for once, the higher the complexity, the better. Moreover, Conitzer and Sandholm [28] have shown that adding a pre-round to the voting process, consisting in eliminating half of the candidates by applying a binary cup rule, considerably increases the hardness of manipulation. Unfortunately, applying a binary cup as a pre-round may eliminate highly ranked candidates, thus dropping interesting properties of the voting rule used afterwards. As an attempt to overcome this drawback, Elkind and Lipmaa [43] introduced a principle called hybridization, generalizing the method of [28]. A hybridized voting rule Hyb(Xk, Y) consists of k steps of rule X, followed by rule Y. They study the impact of hybridization on the complexity of manipulation in various cases (including hybridizing a voting rule with itself).

As recently noted by Conitzer and Sandholm [32], computational hardness concepts such as NP-hardness or PSPACE-hardness are worst-case notions. Thus, they only ensure that there exist instances on which manipulation is hard to compute. In fact, these authors showed that, under some mild assumptions, there are no voting rules that are hard to manipulate on average. To obtain this result, the authors first exhibit an algorithm which can be used by individual voters to compute an insincere preference, and then show that this algorithm succeeds in manipulating the vote on a large fraction of the instances. We end this section by briefly mentioning the existence of complexity results for manipulation by the chairman [11,51] and bribery in elections [47].
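To make the five-voter example above concrete, here is a small sketch (ours; the ballot tuples and helper names are illustrative, not taken from the papers cited) showing the plurality tallies before and after the last voter misreports.

from collections import Counter

def plurality_tally(profile):
    # Count first-place votes for each candidate.
    return Counter(ballot[0] for ballot in profile)

sincere = [("c1", "c2", "c3"), ("c1", "c2", "c3"),
           ("c2", "c1", "c3"), ("c2", "c1", "c3"),
           ("c3", "c1", "c2")]            # the last voter's true ranking
print(plurality_tally(sincere))           # c1: 2, c2: 2, c3: 1; a tie

# c3 cannot win, and the last voter prefers c1 to c2. By insincerely
# putting c1 on top, the voter makes c1 the unique plurality winner:
insincere = sincere[:4] + [("c1", "c3", "c2")]
print(plurality_tally(insincere))         # c1: 3, c2: 2; c1 wins outright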
5 Distributed Resource Allocation and Negotiation
In recent years, concepts from social choice theory have become more and more salient in computer science research, in particular on topics such as distributed
systems, multiagent systems, grid computing, and electronic commerce. Many of the issues addressed in these areas can be modelled in terms of negotiation between autonomous agents. In the case of grid computing, for instance, access to scarce computing resources may be allocated dynamically and in response to specific needs. Naturally, game theory provides the foundations for investigating the strategic aspects of such scenarios, while preference aggregation mechanisms originating in social choice theory may be used to identify socially desirable outcomes of negotiation.

As discussed already in the introduction, we can distinguish two types of criteria when assessing an allocation of resources: criteria pertaining to the efficiency of an allocation and those relating to fairness considerations. Both of these can often be described in terms of a social welfare ordering or a collective utility function [69]. In what follows, we give a few examples of efficiency and fairness criteria (a small sketch computing some of them follows at the end of this section):

Pareto Efficiency — An allocation Pareto dominates another allocation if no agent is worse off and some agents are better off in the former. A Pareto efficient allocation is an allocation that is not Pareto dominated by any other allocation. This is the weakest possible efficiency requirement.

Utilitarianism — The utilitarian social welfare of an allocation is the sum of the individual utilities experienced by the members of society. Asking for maximal utilitarian social welfare is a very strong efficiency requirement; it licenses any reallocation that increases average utility.

Egalitarianism — The egalitarian social welfare of an allocation is given by the individual utility of the poorest agent in the system. Aiming at maximising this value is an example of a basic fairness requirement. A refinement of this idea is the leximin ordering, which, informally, first compares the utilities of the least satisfied agents and, when these coincide, compares the utilities of the next least satisfied agents, and so on.

Envy-Freeness — An agent is said to be envious when it would rather get the bundle of resources allocated to one of the other agents. An allocation is envy-free when no agent is envious. If an envy-free allocation is not attainable, it may also be of interest to reduce envy as much as possible (which may, for instance, be measured in terms of the number of envious agents).

Efficiency and fairness criteria are often not compatible. For instance, for a given profile of agent preferences, there may be no allocation that is both Pareto efficient and envy-free. Some work in computational social choice has addressed the computational complexity of checking whether allocations meeting a certain combination of the above criteria exist for a given resource allocation scenario [13]. Complexity results pertaining to efficiency criteria alone have been known for somewhat longer already. For instance, checking whether there exists an allocation whose utilitarian social welfare exceeds a given limit is known to be NP-complete [85].

Another line of work has been concerned with procedures for finding good allocations. At one end of the spectrum, combinatorial auctions are mechanisms
for finding an allocation that maximises the revenue of the seller, where this revenue is the sum of the prices the other agents are willing to pay for the bundles allocated to them. Combinatorial auctions have received a lot of attention in recent years [35]; they are a very specific, purely utilitarian class of allocation procedures, in which considerations such as equity and fairness are not relevant. In this context, preference structures are valuation functions (positive and monotonic utility functions). Combinatorial auctions are also centralised allocation mechanisms. In distributed approaches to resource allocation, on the other hand, allocations emerge as a consequence of individual agents locally agreeing on a sequence of deals to exchange some of the items they currently have in their possession [87,45]. In the context of distributed resource allocation, an interesting question is under what circumstances convergence to a socially optimal allocation can be guaranteed, given certain known facts regarding the criteria used by individual agents to decide whether or not to implement a particular deal. Notions of social optimality considered in this field range from utilitarianism [87], via Pareto optimality and egalitarianism [45], to envy-freeness [21]. As another example of issues in distributed resource allocation and negotiation, we mention some work on establishing the complexity inherent in various allocation procedures. Dunne et al. [41] have analysed the computational complexity of decision problems arising in the context of distributed negotiation. For instance, checking whether a given allocation with superior utilitarian social welfare can be reached by means of a sequence of deals over single resources that are rational (in the sense of it being possible to arrange side payments such that both trading partners benefit) is NP-hard (in fact, this result has since been strengthened to a PSPACE-completeness result [40]). A related line of work has been concerned with the communication complexity of distributed negotiation mechanisms, analysing upper and lower bounds on the number of deals implemented until an optimal allocation is reached [39,44]. For a much more thorough survey of research in multiagent resource allocation, the reader is referred to [20].
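The welfare notions listed above are straightforward to compute for a given allocation. The sketch below is our own illustration (the agent names, item names, and additive utilities are invented for the example; the criteria themselves do not require additivity).

# Each agent has an additive utility over items (a simplifying assumption).
utility = {
    "ann": {"apple": 5, "bread": 1, "corn": 3},
    "bob": {"apple": 2, "bread": 4, "corn": 4},
}
allocation = {"ann": {"apple"}, "bob": {"bread", "corn"}}

def u(agent, bundle):
    return sum(utility[agent][item] for item in bundle)

def utilitarian_sw(alloc):
    # Sum of individual utilities.
    return sum(u(a, b) for a, b in alloc.items())

def egalitarian_sw(alloc):
    # Utility of the worst-off agent.
    return min(u(a, b) for a, b in alloc.items())

def envious(alloc):
    # Agents who prefer another agent's bundle to their own.
    return [a for a in alloc for other in alloc
            if other != a and u(a, alloc[other]) > u(a, alloc[a])]

print(utilitarian_sw(allocation))   # 5 + 8 = 13
print(egalitarian_sw(allocation))   # min(5, 8) = 5
print(envious(allocation))          # []: the allocation is envy-free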
6 Communication Requirements in Social Choice
One area where the interplay between social choice and (theoretical) computer science has been striking in recent years is the analysis of social choice problems in terms of their communication complexity. In most (if not all) social choice problems, there are some (potentially heavy) communication requirements. Even if the procedure is centralised, the center needs at some point to elicit the preferences of the agents involved in the process in order to compute the outcome. Although it is sometimes possible to carefully design protocols that make this task easier, general results (lower bounds) suggest that it is very often not realistic to rely on that. This, in turn, is a main motivation to study the problem of social choice under incomplete knowledge. We now briefly present a non-exhaustive overview of recent research on these aspects.
The design of protocols that elicit the agents' preferences is a key problem. Take the case of a combinatorial auction involving |R| items: fully revealing an agent's preferences would require 2^|R| − 1 bundles to be valued, and this for each of the bidding agents. Now put yourself in the shoes of the auctioneer: you would of course wonder whether you are really obliged to ask that many "value queries". Maybe a sequential approach would ease the process by avoiding unnecessary queries? The key point is to find the relevant preferences to elicit from the agents: whose preferences are to be elicited, and about which outcomes? As an example from voting theory, assume that we have 4 candidates A, B, C, D and 9 voters, 4 of which vote C ≻ D ≻ A ≻ B, 2 of which vote A ≻ B ≻ D ≻ C and 2 of which vote B ≻ A ≻ C ≻ D, the last vote being still unknown. If the plurality rule is chosen then the outcome is already known (the winner is C) and there is no need to elicit the last voter's preference. If the Borda rule is used then the partial scores are A: 14, B: 10, C: 14, D: 10; therefore the outcome is not determined. However, we do not need to know the totality of the last vote; we only need to know whether the last voter prefers A to C or C to A. Can one always design such a clever protocol? Communication complexity may be helpful in answering that question.

Communication complexity [60] is concerned with determining the amount of information that needs to be exchanged between agents in order to compute a given function f, when the input of that function is distributed among those agents. The computational resources needed to do so are irrelevant here. More technically, the communication complexity is defined as the worst-case cost of the best protocol that computes the function. For unstructured problems, it is unlikely that one can do better than the naive upper bound, in which each agent reveals its entire input. In some cases, however, the combinatorial structure of the problem can be exploited so that the communication burden is alleviated. Communication complexity offers a bag of techniques that can be used to derive lower bounds on communication requirements. Perhaps the most popular of these techniques is the fooling set. A fooling set consists of a set of input vectors that each give the same result for the function, but such that any pair of vectors can be mixed so as to yield a different value. A central result says that exhibiting a fooling set of size m guarantees a lower bound of log m on the communication complexity.

Voting — As a first example, we present rather informally the argument advanced by Conitzer and Sandholm [30] that allows one to conclude that the communication complexity of the Condorcet voting rule is Ω(nm), where n is the number of voters and m the number of candidates. In this case, the function f that the players have to compute is interpreted as the voting rule that returns the winning candidate, given the vote vector of all the voters. Assume C is the set of candidates. The idea is to construct a set of vote vectors such that the first voter prefers any candidate of some set Si ⊆ C to a, and a to any other candidate (Si ≻ a ≻ S̄i, where S̄i = C − Si − {a}), while the following voter prefers the reverse (S̄i ≻ a ≻ Si), and so on. Finally, the last voter would prefer a against any
other candidate. As one can easily see, a is indeed preferred to any other candidate in that set (by a single vote). There is an exponential number (in nm) of such vectors to be constructed. Now this set is indeed "fooling" iff, for any pair of such vectors, it is possible to mix the votes of the two vectors and obtain a different Condorcet winner. Consider any pair of vote vectors. By construction, there must be a candidate, say b, that is ranked below a by a given voter in one vector of the pair, while being ranked above a in the other vector. By substituting the latter vote into the first vector, b would become preferred to a by a single vote. This set is indeed a fooling set, whose size allows one to derive the lower bound on communication complexity stated above. Conitzer and Sandholm [30] have analysed the communication complexity of several other voting rules, and Segal [90] studies a particular subclass of social choice rules.

Coalition Formation — As a further example of the use of the fooling set technique, we mention the work of Procaccia and Rosenschein [82], who analyse the communication complexity of coalition formation. More precisely, they analyse the communication complexity of computing the expected payoff of an arbitrary player (not of all the players) before joining a coalition: here again, maybe only limited communication could be sufficient for that player to compute its payoff. This is done in the context of the coalition model proposed by Shehory and Kraus [91], where each agent only knows the resources it initially holds and its own utility function. Procaccia and Rosenschein prove communication results regarding various solution concepts (core, equal excess, Shapley value, etc.). Most of these results show that when the number of agents (n) is not too large, this problem does not involve prohibitive communication costs (Ω(n)).

Resource Allocation — Let us return to the canonical example of combinatorial auctions discussed before. Here the distributed inputs are the agents' valuations over possible bundles, and the function returns the optimal allocation. Can we do better than those 2^|R| − 1 queries then? In general, the answer is no, in the sense that at least one agent has to reveal its full valuation. Nisan and Segal [71] have shown this, and the communication requirement remains exponential even when all valuations are submodular. Only when the valuations of the agents exhibit very specific structures does it become possible to improve on that bound. We refer the reader to the review chapter by Segal [89] for further details on that topic.

In many situations, then, the communication complexity will be too heavy a burden to be supported by the agents. For combinatorial auctions, Segal even claims that "the communication bottleneck appears to be more severe than the computational one" [89]. One consequence is that the central authority who has to compute the function will often have to deal with incomplete preferences (note, however, that this is not the only reason: it may simply be the case that the agents' preferences are intrinsically incomplete, for instance). Technically, incomplete knowledge about an agent's preferences comes down to partial preferences (i.e.
partial preorders on the set of alternatives). This interpretation of incomplete preferences is epistemic: it has nothing to do with intrinsic or ethical incompleteness, where it does not make sense to compare some alternatives to others, or where it is unethical to do so. This in turn raises further interesting questions as to how difficult it is to compute an outcome given incomplete preferences. For instance, the computational complexity of vote elicitation has been investigated by Conitzer and Sandholm [27]. A second way of coping with incomplete preferences consists of "living" with the incompleteness and considering all complete extensions of the initial incomplete preference profile. More formally, if R = (R1, . . . , Rn) is an n-tuple of incomplete preference relations, then define Ext(R) = Ext(R1) × . . . × Ext(Rn), where Ext(Ri) is the set of all complete extensions of Ri. For a given social choice function f, one can then define f(R) = {f(R′1, . . . , R′n) | (R′1, . . . , R′n) ∈ Ext(R)}. In particular, if f is a voting rule, an element of f(R) is a "possible winner", whereas a candidate c with f(R) = {c} is a "necessary winner". For instance, in the voting example presented at the beginning of this section, for the incomplete profile R consisting of the first 8 votes (with no information on the 9th vote), if f is the plurality rule then C is a necessary winner (and there is no other possible winner); if f is the Borda rule then A and C are the two possible winners (and there is no necessary winner). Because the cardinality of Ext(R) grows exponentially with the number of alternatives, computing possible and necessary winners is generally hard. Some recent work has addressed the computation of possible and necessary winners for several families of voting rules [57,64,80]. The problem of strategy-proofness (see also Sect. 4) has been investigated in [79].

Diminishing the amount of information to be transmitted is also of the utmost importance when one considers privacy issues in social choice. The work of Brandt and colleagues (see e.g. [17,18]), in particular, is very representative of this line of research. One example of a significant result is the fact that social choice functions that are non-dictatorial, Pareto-optimal, and monotonic cannot be implemented by distributed protocols guaranteeing unconditional full privacy (that is, privacy which relies neither on trusted third parties nor on computational intractability to protect the agents' preferences).
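Since only a single ballot is missing in the 9-voter example, Ext(R) is small enough to enumerate outright. The following sketch (ours; brute-force enumeration is used purely for illustration, as the text notes that the general problem is hard) recovers the possible and necessary Borda winners.

from itertools import permutations

candidates = ["A", "B", "C", "D"]
known = (4 * [("C", "D", "A", "B")] + 2 * [("A", "B", "D", "C")]
         + 2 * [("B", "A", "C", "D")])

def borda_winners(profile):
    # Candidates with maximal Borda score (m-1 points for a top rank).
    score = {c: 0 for c in candidates}
    for ballot in profile:
        for points, c in enumerate(reversed(ballot)):
            score[c] += points
    best = max(score.values())
    return {c for c, s in score.items() if s == best}

# Possible winners win under some extension of the incomplete profile;
# necessary winners win under every extension.
extensions = [known + [ballot] for ballot in permutations(candidates)]
possible = set().union(*(borda_winners(e) for e in extensions))
necessary = set.intersection(*(borda_winners(e) for e in extensions))
print(possible)    # {'A', 'C'}
print(necessary)   # set(): no necessary winner under Borda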
7 Logic-Based Analysis of Social Procedures
A final area of application of tools familiar from computer science to problems in social choice theory is the use of mathematical logic for the specification and verification, or more generally the analysis, of social procedures. In the same way as computer scientists have long been using logic to formally specify the behaviour of computer systems, so as to allow for the automatic verification of certain desirable properties of such systems, suitable logics may be used to specify social procedures such as voting protocols or fair division algorithms. Rohit Parikh [74] has coined the term social software for this line of research and argued that (extensions of) dynamic logic [49] may be particularly suited for formalising such social procedures.
In what follows, we briefly discuss three lines of work that are being pursued under the broad heading of social software. This is not an exhaustive list, but it does give a good sense of the kinds of questions that are being investigated.

Logics for Social Software — Modal logic is typically the overall framework in which this kind of research is carried out. The most important kind of modal logic for social software is dynamic logic (the logic of programs). Parikh [73] and Pauly [76], amongst others, have proposed various extensions of dynamic logic to account for concepts such as strategies (as in game theory). Another important family of modal logics are epistemic logics, which are relevant to social software as they allow us to model the knowledge of the different agents participating in a social mechanism. Dynamic epistemic logic [7] is being applied to study updates of the states of knowledge of these agents. Pauly and Wooldridge [77] also explore the use of logic in the context of economic mechanism design. Finally, Ågotnes et al. [2] have recently proposed a logic for modelling social welfare functions.

Specification and Verification of Social Procedures — Once suitable logics have been developed, the central aim of social software is to put these logics to use for the analysis of social procedures. Probably the first such example is Parikh's specification of a cake-cutting algorithm using his game logic, which is based on dynamic logic [73]. Recently, a variant of propositional dynamic logic has also been used to model some of the results on convergence to a socially optimal allocation by means of distributed negotiation mentioned in Section 5 [46].

Coalition Formation — Pauly [76] introduces a modal logic (coalition logic) specifically to allow reasoning about actions that are undertaken by coalitions of agents (typically more than two agents, in contrast to the game logic of Parikh [73], which justifies this new modal logic). The logic includes a new modality (effectivity), which represents the fact that a group of agents can bring about a given action. The satisfiability problem of the logic lies in PSPACE, which confirms that considering actions brought about by groups of agents increases the complexity of the related reasoning problems.
8 Conclusion
In this paper we have given a short (and hence incomplete) survey of some research issues where social choice and computer science can interact. Due to space considerations, many interesting lines of research have only been mentioned in passing or even been omitted altogether. Two such cases are the large body of work on computational aspects of coalition formation [29,31,86,1], and the method of automated mechanism design [33]. In conclusion, computational social choice has by now become a very active area of research, with many important new results being published every year. So while this short survey can only offer a glimpse at current research and is bound to become out of date rather soon, we
nevertheless hope to have been able to convey a sense of the types of questions that are being investigated in this exciting new field.
References

1. Ågotnes, T., van der Hoek, W., and Wooldridge, M.: On the Logic of Coalitional Games. In Proceedings of AAMAS-2006 (2006)
2. Ågotnes, T., van der Hoek, W., and Wooldridge, M.: Towards a Logic of Social Welfare. In Proceedings of LOFT-2006 (2006)
3. Ailon, N., Charikar, M., and Newman, A.: Aggregating Inconsistent Information: Ranking and Clustering. In Proceedings of STOC-2005 (2005)
4. Alon, N.: Ranking Tournaments. SIAM Journal on Discrete Mathematics 20 1–2 (2006) 137–142
5. Altman, A. and Tennenholtz, M.: Ranking Systems: The PageRank Axioms. In Proceedings of EC-2005 (2005)
6. Arrow, K.: Social Choice and Individual Values. John Wiley and Sons (1951); 2nd edition (1963)
7. Baltag, A., Moss, L., and Solecki, S.: The Logic of Public Announcements, Common Knowledge, and Private Suspicion. In Proceedings of TARK-1998 (1998)
8. Bartholdi, J. and Orlin, J.: Single Transferable Vote Resists Strategic Voting. Social Choice and Welfare 8 4 (1991) 341–354
9. Bartholdi, J., Tovey, C., and Trick, M.: The Computational Difficulty of Manipulating an Election. Social Choice and Welfare 6 3 (1989) 227–241
10. Bartholdi, J., Tovey, C., and Trick, M.: Voting Schemes for Which It Can Be Difficult to Tell Who Won the Election. Social Choice and Welfare 6 3 (1989) 157–165
11. Bartholdi, J., Tovey, C., and Trick, M.: How Hard Is It to Control an Election? Mathematical and Computer Modelling 16 8/9 (1992) 27–40
12. Bouveret, S., Fargier, H., Lang, J., and Lemaître, M.: Allocation of Indivisible Goods: A General Model and Some Complexity Results. In Proceedings of AAMAS-2005 (2005)
13. Bouveret, S. and Lang, J.: Efficiency and Envy-Freeness in Fair Division of Indivisible Goods: Logical Representation and Complexity. In Proceedings of IJCAI-2005 (2005)
14. Brams, S., Edelman, P., and Fishburn, P.: Fair Division of Indivisible Items. Technical Report RR 2000-15, C.V. Starr Center for Applied Economics, New York University (2000)
15. Brams, S. and Fishburn, P.: Voting Procedures. In K. Arrow, A. Sen, and K. Suzumura (eds), Handbook of Social Choice and Welfare, chapter 4, Elsevier (2004)
16. Brams, S., Kilgour, D.M., and Zwicker, W.: The Paradox of Multiple Elections. Social Choice and Welfare 15 (1998) 211–236
17. Brandt, F.: Social Choice and Preference Protection – Towards Fully Private Mechanism Design. In Proceedings of EC-2003 (2003)
18. Brandt, F. and Sandholm, T.: Unconditional Privacy in Social Choice. In Proceedings of TARK-2005 (2005)
19. Charon, I. and Hudry, O.: Slater Orders and Hamiltonian Paths of Tournaments. Electronic Notes in Discrete Mathematics 5 (2000) 60–63
20. Chevaleyre, Y., Dunne, P.E., Endriss, U., Lang, J., Lemaître, M., Maudet, N., Padget, J., Phelps, S., Rodríguez-Aguilar, J.A., and Sousa, P.: Issues in Multiagent Resource Allocation. Informatica 30 (2006) 3–31
21. Chevaleyre, Y., Endriss, U., Estivie, S., and Maudet, N.: Reaching Envy-Free States in Distributed Negotiation Settings. In Proceedings of IJCAI-2007 (2007)
22. Chopra, S., Ghose, A., and Meyer, T.: Social Choice Theory, Belief Merging, and Strategy-Proofness. International Journal on Information Fusion 7 1 (2006) 61–79
23. Conitzer, V.: Computing Slater Rankings Using Similarities among Candidates. In Proceedings of AAAI-2006 (2006)
24. Conitzer, V., Davenport, A., and Kalagnanam, J.: Improved Bounds for Computing Kemeny Rankings. In Proceedings of AAAI-2006 (2006)
25. Conitzer, V., Lang, J., and Sandholm, T.: How Many Candidates Are Required to Make an Election Hard to Manipulate? In Proceedings of TARK-2003 (2003)
26. Conitzer, V. and Sandholm, T.: Complexity of Manipulating Elections with Few Candidates. In Proceedings of AAAI-2002 (2002)
27. Conitzer, V. and Sandholm, T.: Vote Elicitation: Complexity and Strategy-Proofness. In Proceedings of AAAI-2002 (2002)
28. Conitzer, V. and Sandholm, T.: Universal Voting Protocols to Make Manipulation Hard. In Proceedings of IJCAI-2003 (2003)
29. Conitzer, V. and Sandholm, T.: Computing Shapley Values, Manipulating Value Division Schemes, and Checking Core Membership in Multi-Issue Domains. In Proceedings of AAAI-2004 (2004) 219–225
30. Conitzer, V. and Sandholm, T.: Communication Complexity of Common Voting Rules. In Proceedings of EC-2005 (2005)
31. Conitzer, V. and Sandholm, T.: Complexity of Constructing Solutions in the Core Based on Synergies among Coalitions. Artificial Intelligence 170 6–7 (2006) 607–619
32. Conitzer, V. and Sandholm, T.: Nonexistence of Voting Rules that Are Usually Hard to Manipulate. In Proceedings of AAAI-2006 (2006)
33. Conitzer, V. and Sandholm, T.W.: Complexity of Mechanism Design. In Proceedings of UAI-2002 (2002)
34. Coste-Marquis, S., Lang, J., Liberatore, P., and Marquis, P.: Expressive Power and Succinctness of Propositional Languages for Preference Representation. In Proceedings of KR-2004 (2004)
35. Cramton, P., Shoham, Y., and Steinberg, R. (eds): Combinatorial Auctions. MIT Press (2006)
36. Davenport, A. and Kalagnanam, J.: A Computational Study of the Kemeny Rule for Preference Aggregation. In Proceedings of AAAI-2004 (2004)
37. Demko, S. and Hill, T.P.: Equitable Distribution of Indivisible Items. Mathematical Social Sciences 16 (1988) 145–158
38. Dietrich, F. and List, C.: Judgment Aggregation by Quota Rules. Journal of Theoretical Politics (2006) Forthcoming
39. Dunne, P.E.: Extremal Behaviour in Multiagent Contract Negotiation. Journal of Artificial Intelligence Research 23 (2005) 41–78
40. Dunne, P.E. and Chevaleyre, Y.: Negotiation Can Be as Hard as Planning: Deciding Reachability Properties of Distributed Negotiation Schemes. Technical Report ULCS-05-009, Department of Computer Science, University of Liverpool (2005)
41. Dunne, P.E., Wooldridge, M., and Laurence, M.: The Complexity of Contract Negotiation. Artificial Intelligence 164 1–2 (2005) 23–46
42. Eckert, D. and Pigozzi, G.: Belief Merging, Judgment Aggregation, and Some Links with Social Choice Theory. In Belief Change in Rational Agents: Perspectives from Artificial Intelligence, Philosophy, and Economics, Dagstuhl Seminar Proceedings 05321 (2005)
43. Elkind, E. and Lipmaa, H.: Hybrid Voting Protocols and Hardness of Manipulation. In Proceedings of ISAAC-2005 (2005)
44. Endriss, U. and Maudet, N.: On the Communication Complexity of Multilateral Trading: Extended Report. Journal of Autonomous Agents and Multiagent Systems 11 1 (2005) 91–107
45. Endriss, U., Maudet, N., Sadri, F., and Toni, F.: Negotiating Socially Optimal Allocations of Resources. Journal of Artificial Intelligence Research 25 (2006) 315–348
46. Endriss, U. and Pacuit, E.: Modal Logics of Negotiation and Preference. In Proceedings of JELIA-2006 (2006)
47. Faliszewski, P., Hemaspaandra, E., and Hemaspaandra, L.A.: The Complexity of Bribery in Elections. In Proceedings of AAAI-2006 (2006)
48. Gibbard, A.: Manipulation of Voting Schemes. Econometrica 41 (1973) 587–602
49. Harel, D., Kozen, D., and Tiuryn, J.: Dynamic Logic. MIT Press (2000)
50. Hemaspaandra, E., Hemaspaandra, L.A., and Rothe, J.: Exact Analysis of Dodgson Elections: Lewis Carroll's 1876 System Is Complete for Parallel Access to NP. JACM 44 6 (1997) 806–825
51. Hemaspaandra, E., Hemaspaandra, L.A., and Rothe, J.: Anyone but Him: The Complexity of Precluding an Alternative. In Proceedings of AAAI-2005 (2005)
52. Hemaspaandra, E., Spakowski, H., and Vogel, J.: The Complexity of Kemeny Elections. Jenaer Schriften zur Mathematik und Informatik (2003)
53. Herreiner, D. and Puppe, C.: A Simple Procedure for Finding Equitable Allocations of Indivisible Goods. Social Choice and Welfare 19 (2002) 415–430
54. Hudry, O.: Computation of Median Orders: Complexity Results. In Proceedings of the DIMACS-LAMSADE Workshop on Computer Science and Decision Theory, Annales du LAMSADE 3 (2004)
55. Hudry, O.: A Note on "Banks Winners in Tournaments Are Difficult to Recognize" by G.J. Woeginger. Social Choice and Welfare 23 1 (2004) 113–114
56. Hudry, O.: Improvements of a Branch and Bound Method to Compute the Slater Orders of Tournaments. Technical Report, ENST (2006)
57. Konczak, K. and Lang, J.: Voting Procedures with Incomplete Preferences. In Proceedings of the Multidisciplinary Workshop on Advances in Preference Handling (2005)
58. Konieczny, S., Lang, J., and Marquis, P.: DA2 Merging Operators. Artificial Intelligence 157 1–2 (2004) 49–79
59. Konieczny, S. and Pérez, R.P.: Propositional Belief Base Merging or How to Merge Beliefs/Goals Coming from Several Sources and Some Links with Social Choice Theory. European Journal of Operational Research 160 3 (2005) 785–802
60. Kushilevitz, E. and Nisan, N.: Communication Complexity. Cambridge University Press (1997)
61. Lang, J.: Logical Preference Representation and Combinatorial Vote. Annals of Mathematics and Artificial Intelligence 42 1 (2004) 37–71
62. Lang, J.: Some Representation and Computational Issues in Social Choice. In Proceedings of ECSQARU-2005 (2005)
63. Lang, J.: Vote and Aggregation in Combinatorial Domains with Structured Preferences. In Proceedings of IJCAI-2007 (2007)
64. Lang, J., Pini, M., Rossi, F., Venable, K., and Walsh, T.: Winner Determination in Sequential Majority Voting with Incomplete Preferences. In Proceedings of the Multidisciplinary ECAI-06 Workshop on Advances in Preference Handling (2006)
65. Lipton, R., Markakis, E., Mossel, E., and Saberi, A.: On Approximately Fair Allocations of Indivisible Goods. In Proceedings of EC-2004 (2004)
66. Maynard-Zhang, P. and Lehmann, D.: Representing and Aggregating Conflicting Beliefs. Journal of Artificial Intelligence Research 19 (2003) 155–203
67. Meyer, T., Ghose, A., and Chopra, S.: Social Choice, Merging, and Elections. In Proceedings of ECSQARU-2001 (2001)
68. Moulin, H.: On Strategy-Proofness and Single Peakedness. Public Choice 35 (1980) 437–455
69. Moulin, H.: Axioms of Cooperative Decision Making. Cambridge University Press (1988)
70. Nehring, K. and Puppe, C.: Consistent Judgement Aggregation: A Characterization. Technical Report, Univ. Karlsruhe (2005)
71. Nisan, N. and Segal, I.: The Communication Requirements of Efficient Allocations and Supporting Prices. Journal of Economic Theory (2006) to appear
72. Osborne, M.J. and Rubinstein, A.: A Course in Game Theory. MIT Press (1994)
73. Parikh, R.: The Logic of Games and Its Applications. Annals of Discrete Mathematics 24 (1985) 111–140
74. Parikh, R.: Social Software. Synthese 132 3 (2002) 187–211
75. Pattanaik, P.K.: On the Stability of Sincere Voting Situations. Journal of Economic Theory 6 (1973)
76. Pauly, M.: Logic for Social Software. PhD Thesis, ILLC, University of Amsterdam (2001)
77. Pauly, M. and Wooldridge, M.: Logic for Mechanism Design: A Manifesto. In Proceedings of the 5th Workshop on Game-Theoretic and Decision-Theoretic Agents (2003)
78. Pigozzi, G.: Belief Merging and the Discursive Dilemma: An Argument-Based Account to Paradoxes of Judgment Aggregation. Synthese (2007) to appear
79. Pini, M., Rossi, F., Venable, K., and Walsh, T.: Strategic Voting when Aggregating Partially Ordered Preferences. In Proceedings of AAMAS-2006 (2006)
80. Pini, M., Rossi, F., Venable, K., and Walsh, T.: Winner Determination in Sequential Majority Voting with Incomplete Preferences. In Proceedings of the Multidisciplinary ECAI-06 Workshop on Advances in Preference Handling (2006)
81. Pini, M.S., Rossi, F., Venable, K., and Walsh, T.: Aggregating Partially Ordered Preferences: Possibility and Impossibility Results. In Proceedings of TARK-2005 (2005)
82. Procaccia, A. and Rosenschein, J.S.: The Communication Complexity of Coalition Formation among Autonomous Agents. In Proceedings of AAMAS-2006 (2006)
83. Rossi, F., Venable, K., and Walsh, T.: mCP Nets: Representing and Reasoning with Preferences of Multiple Agents. In Proceedings of AAAI-2004 (2004) 729–734
84. Rothe, J., Spakowski, H., and Vogel, J.: Exact Complexity of the Winner Problem for Young Elections. Theory of Computing Systems 36 4 (2003) 375–386
85. Rothkopf, M., Pekeč, A., and Harstad, R.: Computationally Manageable Combinational Auctions. Management Science 44 8 (1998) 1131–1147
86. Rusinowska, A., de Swart, H., and van der Rijt, J.-W.: A New Model of Coalition Formation. Social Choice and Welfare 24 1 (2005) 129–154
87. Sandholm, T.: Contract Types for Satisficing Task Allocation: I. Theoretical Results. In Proceedings of the AAAI Spring Symposium on Satisficing Models (1998)
88. Satterthwaite, M.: Strategyproofness and Arrow's Conditions. Journal of Economic Theory 10 (1975) 187–217
89. Segal, I.: The Communication Requirements of Combinatorial Allocation Problems. In Cramton et al. [35] (2006)
90. Segal, I.: The Communication Requirements of Social Choice Rules and Supporting Budget Sets. Journal of Economic Theory (2006) to appear
91. Shehory, O. and Kraus, S.: Coalition Formation among Autonomous Agents. Springer-Verlag (1995)
92. Tennenholtz, M.: Transitive Voting. In Proceedings of EC-2004 (2004)
93. Woeginger, G.J.: Banks Winners in Tournaments Are Difficult to Recognize. Social Choice and Welfare 20 3 (2003) 523–528
Distributed Models and Algorithms for Mobile Robot Systems Asaf Efrima and David Peleg⋆ Department of Computer Science and Applied Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel
[email protected]
Abstract. Systems consisting of a collection of independently operating mobile robots (a.k.a. robot swarms) have recently been studied from a distributed computing point of view. The paper reviews the basic model developed for such systems and some recent algorithmic results on a number of coordination and control tasks for robot swarms. The paper then discusses various possibilities for modifications in the basic model, and examines their effects via the example of the partitioning problem.
1 Introduction
Systems of multiple autonomous mobile robots engaged in collective behavior (also known as robot swarms) have been extensively studied throughout the past two decades. This subject is of interest for a variety of reasons. The main advantage of using multiple robot systems is the ability to accomplish tasks that are infeasible for a single robot, powerful as it may be. Moreover, the use of simpler, expendable individual robots results in decreased costs. These systems have immediate applicability in a wide variety of tasks, such as military operations, search and rescue, fire fighting, and space missions. Most studies of multiple robot systems (cf. [13], [12], [2], [9]) concentrate on engineering, experimental and empirical aspects, and result in the design of algorithms based on heuristics. Moreover, the control of such systems is usually managed by a central controller. During the last few years, multiple robot systems have been studied from a distributed computing point of view [20], [11], focusing on modeling robot swarms as distributed systems and studying a variety of common cooperative coordination tasks. A number of computation models were proposed in the literature, and some studies attempted to characterize the influence of the model on the ability of a robot swarm to perform its task under different constraints. In particular, a primary motivation of the studies presented in [20], [23], [15], [16], [22] is to identify the minimal capabilities a collection of distributed robots must have in order to accomplish certain basic tasks. Consequently, the models adopted in these studies assume the robots to be relatively weak and simple. Specifically, these robots are generally assumed to be dimensionless, oblivious, anonymous and with
Supported in part by a grant from the Israel Science Foundation.
no common coordinate system, orientation or scale, and no explicit communication. Each robot operates in simple "look-compute-move" cycles, basing its movement decisions on viewing its surroundings and analyzing the configuration of robot locations. A robot is capable of locating all robots within its visibility range and placing them in its private coordinate system, thereby calculating their positions with respect to itself. Hence, from the "distributed computing" angle, such problems give rise to a different type of communication model, based on "positional" or "geometric" information exchange. The tasks that have been studied so far include the formation of geometric patterns, i.e., organizing the robots in a geometric form (such as a circle, a simple polygon or a line), gathering and convergence, flocking (i.e., following a pre-designated leader), even distribution of robots within simple geometric patterns, searching for a target within a bounded area, and the wake-up task, where one initially awake robot must wake up all others.

In this paper we consider another important task, which has been studied so far to a lesser extent, namely, partitioning. In this task, the robots must divide themselves into (size-balanced) groups. This problem was recently studied by us in [10], and its difficulty was related to the level of common orientation among the robots in the system. In particular, if one assumes a full compass, then the problem admits a simple partitioning algorithm, named Algorithm Part, which works for all timing models. (This algorithm is reviewed in Section 5.) A more elaborate algorithm is given in [10] for the half-compass model in the fully- and semi-synchronous timing models. In the no-compass model, it is shown that the partitioning problem is deterministically unsolvable; instead, the paper presents a randomized algorithm that works in the semi-synchronous timing model. More generally, it is shown in [10] that in the fully- and semi-synchronous timing models, having common axis directions is a necessary and sufficient condition for the feasibility of partitioning, whereas in the asynchronous timing model, this is a necessary condition, and having in addition one common axis orientation is a sufficient condition.

The main purpose of the current paper is to examine the potential effects of simple modifications to the common distributed model of robot swarms. We argue that the common model is too extreme, and makes it difficult to perform and analyze even the most basic tasks. In contrast, some simple and natural modifications, consistent with readily available technology, render many useful tasks easier, and thus free algorithm designers to focus their efforts on more involved and potentially more practical tasks. Since the partitioning problem is deterministically unsolvable in the no-compass model, it becomes a natural candidate for this type of study. In what follows we examine a number of such modifications. First, we prove that if the initial configuration is not symmetric, then partitioning is achievable even in the no-compass asynchronous model. We then observe that if the robots are identifiable, then the problem has an easy solution. In fact, the problem has a deterministic solution even in a setting where only one robot is identifiable, and the rest are identical. Finally, we show that adding a signalling device, in
effect amounting to one bit of memory and communication, makes the problem solvable by a randomized algorithm against a non-adaptive adversary. The probability of success depends on the number of random bits the robots are allowed to use in their computations; adding two such memory and communication bits increases this probability to 1.
2 The Model
The basic model studied in previous papers, e.g., [22], [23], [15], [7], can be summarized as follows. Each of the robots executes the same algorithm in cycles, with each cycle consisting of three steps:
1. "Look": Determine the current configuration by identifying the location of all visible robots and marking them in the robot's private coordinate system.
2. "Compute": Execute the algorithm, resulting in a goal point p̃.
3. "Move": Travel towards the point p̃. The robot might stop before reaching p̃, but is guaranteed to traverse at least a minimal distance unit s (unless it reaches the goal first). The value of s is not known to the robots and they cannot use it in their computations.

In most papers in the area (cf. [20], [22], [21], [11], [5]), the robots are assumed to be dimensionless, namely, treated as points that do not obstruct each other's visibility or movement, and oblivious or memoryless, namely, they do not remember their previous actions or the previous positions of the other robots, and therefore cannot rely on information from previous cycles or have alternating states. Also, the robots are indistinguishable and cannot identify any of their peers. Moreover, the robots have no means of explicit communication. On the other hand, the robots are assumed to possess unlimited visibility, and sensors, computations and movements are assumed to be accurate.

With respect to timing, the following models were considered in the literature. In the fully-synchronous model, the robots are driven by an identical clock and hence operate according to the same cycles, and are active in every cycle. In the semi-synchronous model, the robots operate according to the same cycles, but need not be active in every cycle. A fairness constraint guarantees that each robot will eventually be active (infinitely many times) in any infinite execution. In the asynchronous model, the robots operate on independent cycles of variable length. Formally, this can be modeled by each cycle starting with an additional "Wait" step.

In this paper we focus on the attribute of orientation, referring to the local views of the robots in terms of their x-y coordinates. Elaborating on [15], the following sub-models of common orientation levels were considered in [10]:
– Full-compass: Directions and orientations of both axes are common to all robots.
– Half-compass: Directions of both axes are common to all robots, but the positive orientation of only one axis is common (i.e., in the other axis, different robots may have different views of the positive orientation).
– Direction-only: Directions of both axes are common to all robots, but the positive orientations of the axes are not common.
– Axes-only: Directions of both axes are common to all robots, but the positive orientations of the axes are not common. In addition, the robots do not agree on which of the two axes is the x axis and which is the y axis.
– No-compass: There are no common axes.

In the no-compass and half-compass sub-models, the robots do not share the notion of "clockwise" or "right hand side". Note that the robots do not share a common unit distance or a common origin point even in the full-compass model.

For randomized algorithms, one may consider two possible adversary types. An adaptive adversary is allowed to make its decisions after learning the (possibly randomized) choices made by the algorithm. This means that in each cycle, first the robot computes its goal position, and then the adversary chooses the maximal distance the robot will reach in the direction of its goal point. In contrast, a non-adaptive adversary must make its decisions independently of the random choices of the algorithm. Namely, in each cycle, the adversary chooses the maximal distance the robot will reach before the robot computes its goal point (i.e., before knowing the direction in which the robot will move, which may be chosen randomly by the algorithm). Note that, despite its name, there is some adaptiveness even in the non-adaptive adversary, since it still has control over the timing of the robots.
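The timing models above are straightforward to emulate in simulation. The following sketch is our own (not code from the papers surveyed): it runs oblivious robots through look-compute-move cycles under a semi-synchronous scheduler, with an adversary that may stop each active robot early but must let it travel at least the minimal distance s.

import random

def run_semi_synchronous(positions, compute_goal, s=0.01, rounds=100):
    # positions: list of (x, y) points; compute_goal(snapshot, i) returns
    # the goal point of robot i given the observed configuration.
    for _ in range(rounds):
        # The scheduler activates an arbitrary subset of the robots
        # (random activation is almost surely fair, as the model requires).
        active = [i for i in range(len(positions)) if random.random() < 0.5]
        snapshot = list(positions)                 # "Look"
        for i in active:
            gx, gy = compute_goal(snapshot, i)     # "Compute"
            px, py = positions[i]
            d = ((gx - px) ** 2 + (gy - py) ** 2) ** 0.5
            if d == 0:
                continue
            # "Move": the adversary stops the robot anywhere between
            # distance s and the goal (or at the goal itself if d < s).
            travelled = min(d, random.uniform(s, d))
            positions[i] = (px + travelled / d * (gx - px),
                            py + travelled / d * (gy - py))
    return positions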
3 Previous Work
Much of the literature on distributed control algorithms for autonomous mobile robots has concentrated on the two basic tasks of gathering and convergence. Gathering requires the robots to occupy a single point within finite time, regardless of their initial configuration. Convergence is the closely related task in which the robots are required to converge to a single point, rather than reach it. More precisely, for every ε > 0 there must be a time tε by which all robots are within a distance of at most ε of each other. The problem of gathering autonomous mobile robots has been studied extensively in two computational models. The first is the semi-synchronous model, introduced by Suzuki et al. [20], [23], and the second is the closely related CORDA model described by Prencipe et al. [15], [16], which is equivalent to our asynchronous model. The gathering problem was first discussed in [22], [23] in the semi-synchronous model. It was proven there that it is impossible to gather two oblivious autonomous mobile robots that have no common sense of orientation under the semi-synchronous model. The algorithms presented therein for n ≥ 3 robots rely on the assumption that a robot can identify a point p∗ occupied by two or more robots (a.k.a. a multiplicity point). This assumption was later proven to be essential for achieving gathering in all asynchronous and semi-synchronous models [17], [18]. Under this assumption, an algorithm is developed in [23] for
gathering n ≥ 3 robots in the semi-synchronous model. In the asynchronous model, an algorithm for gathering n = 3, 4 robots is presented in [17], and an algorithm for gathering n ≥ 5 robots has been described in [4]. We use a similar assumption, stating that a robot can tell the number of robots in a multiplicity point. In [3] a gathering algorithm was given in a model in which the above assumption has been replaced by equipping the robots with an unlimited amount of memory.

Some studies try to characterize the class of geometric patterns that the robots can form in various models. The effect of common orientation on the class of achievable geometric patterns (in the asynchronous model) is summarized in [15]. In the full-compass model, the robots can form an arbitrary given pattern. In the half-compass model the robots can form an arbitrary pattern only when n is odd (this is shown in [17] to hold also in a model in which the robots share axis directions only). In the no-compass model, with no common orientation, the robots cannot form an arbitrary given pattern. The class of patterns achievable by an even number of robots in the half-compass model is characterized in [14]. Non-oblivious robots in the semi-synchronous model are examined in [21], [23]. The problem of agreement on a common x-y coordinate system is shown to be reducible to that of forming certain geometric patterns. The robots are always capable of agreeing on both the origin and a unit distance in this model, thus the difficulty lies in agreement on direction.

Considering the different timing models of the system, it is known that problems solvable in the asynchronous timing model are solvable in the semi-synchronous model, and that problems solvable in the semi-synchronous model are solvable in the fully-synchronous model (cf. Theorem 3.1 in [15]). Moreover, an algorithm for the asynchronous model works also in the semi-synchronous model, and an algorithm for the semi-synchronous model works also in the fully-synchronous model.

The convergence properties of Algorithm Go to COG are explored in [7], [6]. In this simple algorithm, a robot sets its goal point to be Cog(P), i.e., the center of gravity of all observed robot positions. Algorithm Go to COG is used extensively in the current paper, and our proofs use some of the following results (a small sketch of the algorithm appears at the end of this section). In [7] it is proven that the algorithm converges in the fully- and semi-synchronous models. In [6] it is proven to converge in the asynchronous model as well. In addition, the convergence rate is established in the fully-synchronous model: the number of cycles it takes to achieve gathering in the fully-synchronous model (in two dimensions) is O(h/s), where h is the maximal width of the convex hull at the beginning of the execution, and s is the minimal movement distance unit. Convergence and gathering with inaccurate sensors and movements are examined in [8]. Gathering is shown to be impossible for robots with inexact measurements, while a variant of Algorithm Go to COG is shown to converge for sufficiently small errors in measurements.

An algorithm for partitioning is given in [20]. That algorithm uses a previous algorithm presented for flocking. It does not comply with the models presented above, mainly because it requires outside intervention (i.e., it requires an
outside supervisor to move a few robots which the others will follow). Moreover, the robots are not indistinguishable, and the algorithm operates in two stages, thus requiring some memory. The partitioning problem was recently revisited in [10], which focused on understanding the effects of common orientation on the solvability of the problem. In particular, that paper presents partitioning algorithms for various levels of common orientation and different timing models, as discussed in Section 1.
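Since Algorithm Go to COG recurs throughout the rest of the paper, a minimal self-contained sketch may help. This is our own illustration of a fully-synchronous execution, with a fixed fraction modelling how far the adversary lets each robot travel toward the center of gravity (the initial positions are arbitrary example values).

def cog(points):
    # Center of gravity of a set of positions.
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def go_to_cog_round(positions, reach=0.6):
    # One fully-synchronous round of Algorithm Go to COG: every robot
    # moves toward Cog(P); 'reach' is the fraction of the way the
    # adversary lets it travel (1.0 would mean reaching the goal).
    cx, cy = cog(positions)
    return [(x + reach * (cx - x), y + reach * (cy - y))
            for x, y in positions]

positions = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 6.0)]
for _ in range(20):
    positions = go_to_cog_round(positions)
max_spread = max(abs(a - c) + abs(b - d)
                 for a, b in positions for c, d in positions)
print(max_spread < 1e-4)   # True: the robots have (numerically) converged

Note that with a common reach fraction the center of gravity itself is invariant from round to round, so each robot's distance to it shrinks geometrically.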
4 The Partitioning Problem
In this paper we consider the problem Partition(n, k), in which n robots must divide themselves into k size-balanced subsets. The robots in each subset must converge, while some minimal distance is kept between robots of different subsets. We use the following basic definitions. Let dist(a, b) denote the Euclidean distance between points a and b. For sets of points X and Y, denote dist(X, Y) = min{dist(x, y) | x ∈ X, y ∈ Y}. Denote the position of robot ri at time t as pi[t] = (xi[t], yi[t]). (We sometimes omit the parameter t when no confusion arises.) Denote the set of all robot positions at time t as P[t]. For a set of n points P = {(xi, yi) | 1 ≤ i ≤ n}, define the center of gravity of P as Cog(P) = (Σi xi/n, Σi yi/n). Formally, the partitioning problem Partition(n, k) is defined as follows.
Input: A set of n robots R = {r1, . . . , rn}, positioned in a 2-dimensional space, with initial positions PI = P[t0] = {p1[t0], . . . , pn[t0]}, and an integer k. We assume that n is divisible by k and define m = n/k.

Goal: For some fixed η > 0, for every η ≥ ε > 0, there is a time tε, such that for every time t > tε, R can be partitioned into k disjoint subsets S1, . . . , Sk satisfying the following:
– Partition: R = S1 ∪ · · · ∪ Sk and Si ∩ Sj = ∅ for every i ≠ j.
– Size-balance: The subsets are balanced, i.e., |Si| = m for every i.
– Proximity: Robots in the same subset are within ε of each other, i.e., dist(rw, rl) < ε for every i and for every rw, rl ∈ Si.
– Separation: Robots in different subsets are farther than 2η apart, i.e., dist(Si, Sj) > 2η for every i ≠ j.
Robots are treated as dimensionless points, yet we make the following assumption. Non-overlap: No two robots have the same initial position, i.e., pi[t0] ≠ pj[t0] for every i ≠ j. In the general case, in which n is not divisible by k, define m = ⌊n/k⌋ and require that the subsets are nearly-balanced, i.e., m ≤ |Si| ≤ m + 1 for every i. The choice of the separation distance as 2η is arbitrary, and any clear separation between the subsets will do. In practice, we may set η = dmin/2, where dmin is the minimal distance between any two robots at time t0. Note that requiring the conditions to hold at every time t > tε implies that the subsets Si
do not change after time tε. Note also that the convergence problem discussed earlier may be considered as the special case Partition(n, 1) of the partitioning problem. The partitioning problem also partly relates to the problem of forming geometric patterns.
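For concreteness, the following sketch checks whether a candidate grouping of robot positions satisfies the four requirements at some fixed time. The helper name and the brute-force pairwise tests are ours, and we use the nearly-balanced size bound of the general case.

```python
from itertools import combinations
from math import dist

def satisfies_partition(subsets, n, k, eps, eta):
    """subsets: k lists of (x, y) robot positions at some time t."""
    m = n // k
    if len(subsets) != k or sum(len(s) for s in subsets) != n:
        return False                                    # Partition
    if any(not m <= len(s) <= m + 1 for s in subsets):
        return False                                    # (near-)Size-balance
    if any(dist(p, q) >= eps                            # Proximity
           for s in subsets for p, q in combinations(s, 2)):
        return False
    return all(dist(p, q) > 2 * eta                     # Separation
               for s1, s2 in combinations(subsets, 2)
               for p in s1 for q in s2)
```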
5 Basic Algorithm for the Full Compass Model
In this section we review the basic deterministic algorithm Part of [10] for solving the Partition(n, k) problem in the full-compass model. In this model the robots share common x and y axes (directions and orientations). Algorithm Part works in the asynchronous model, hence it also applies in the semi- and fully-synchronous models. The availability of a full compass permits a solution based on an ordering of the robots. Define the order relation

9. (Deformation robustness) For each attainable ε > 0, there is an open set F of homeomorphisms sufficiently close to the identity, such that d(f(A), A) < ε for all f ∈ F.
10. (Noise robustness) For shapes in R², noise is an extra region anywhere in the plane, and robustness can be defined as: for each x ∈ (R² − A), and each attainable ε > 0, an open neighborhood U of x exists such that for all B, B − U = A − U implies d(A, B) < ε. When we consider contours, we interpret noise as an extra region attached to any location on the contour, and define robustness similarly.

3.2 Multiple Polyline to Polygon Matching
There is evidence that, for the task of object recognition, the human visual system uses a part-based representation. Biederman [4], for example, suggested that objects are segmented at regions of deep concavity into an arrangement of simple geometric components. For the retrieval of polygonal shapes, we have therefore developed an algorithm to search for the best matching polygon, given one or more query parts. This dissimilarity measure models partial matching, is translation and rotation invariant, and is deformation robust.
Let P1 be a polyline, and let P1(s) be the point on P1 at distance s along the polyline from its beginning. The turning-angle function Θ1 of a polyline P1 measures the angle of the counterclockwise tangent at P1(s) with respect to a reference orientation as a function of s. It is a piecewise constant function, with jumps corresponding to the vertices of P1. The domain of the function is [0, ℓ1], where ℓ1 is the length of P1. Rotating P1 by an angle θ corresponds to shifting Θ1 over a distance θ in the vertical direction. The turning-angle function ΘP of a polygon P is defined in the same way, except that the distance s is measured by going counterclockwise around the polygon from an arbitrarily chosen reference point. Since P is a closed polyline, we can keep going around the polygon, and the domain of ΘP can thus be extended to the entire real line, where ΘP(s + ℓP) = ΘP(s) + 2π. Moving the location of the reference point over a distance s along the boundary of P corresponds to shifting ΘP horizontally over a distance s. To measure the mismatch between P1 and the part of P starting at P(t), we align P1(0) with P(t) by shifting the turning-angle function of P over a distance t and computing the L2-distance between the two turning-angle functions, minimized over all possible rotations θ (that is: vertical shiftings of the turning functions). The squared mismatch between P1 and P, as a function of t, is thus given by:

$$d_1(t) := \min_{\theta \in \mathbb{R}} \int_0^{\ell_1} \left(\Theta_P(s+t) - \Theta_1(s) + \theta\right)^2 ds. \qquad (1)$$
An ordered set of k polylines {P1, P2, . . . , Pk} can be represented by concatenating the turning-angle functions of the individual polylines. Thus we get a function ΘPL : [0, ℓk] → R, where ℓj is the cumulative length of polylines P1 through Pj. For 1 ≤ j ≤ k and ℓj−1 ≤ s ≤ ℓj we have ΘPL(s) := Θj(s − ℓj−1), so that each polyline Pj is represented by the section of ΘPL on the domain [ℓj−1, ℓj]. The squared mismatch between Pj and P (shifted by t) is now given by:

$$d_j(t) := \min_{\theta \in \mathbb{R}} \int_{\ell_{j-1}}^{\ell_j} \left(\Theta_P(s+t) - \Theta_{PL}(s) + \theta\right)^2 ds. \qquad (2)$$
We now express the mismatch between the set of polylines {P1, P2, . . . , Pk} and P as the square root of the sum of squared mismatches between each polyline and P, minimized over all valid shiftings:

$$d(P_1, \ldots, P_k; P) := \min_{\text{valid shiftings } t_1, \ldots, t_k} \left( \sum_{j=1}^{k} d_j(t_j) \right)^{1/2}. \qquad (3)$$
It remains to define what the valid shiftings are. To keep the polylines disjoint (except possibly at their endpoints) and in counterclockwise order around the polygon, each polyline has to be shifted at least as far as the previous one, that is: tj−1 ≤ tj for all 1 < j ≤ k. Furthermore, to make sure that Pk does not wrap around the polygon beyond the starting point of P1 , we have to require that ℓk + tk ≤ t1 + ℓP (see figure 4).
Fig. 4. To match polylines P1 , . . . , P3 to polygon P , we shift the turning functions of the polylines over the turning function of the polygon. To maintain the order of the polylines around the polygon, we need to guarantee t1 ≤ t2 ≤ t3 and ℓ3 + t3 ≤ t1 + ℓP .
In [5] we show that the optimal placement and the distance value can be computed in O(km²n²) time with a straightforward dynamic programming algorithm, and in O(kmn log(mn)) time and space with a novel fast algorithm.
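A direct numerical rendering of equations (1)–(3) is straightforward. The sketch below builds the piecewise-constant turning function and approximates d1(t) by a Riemann sum, using the fact that the minimizing θ is the negative mean of ΘP(s + t) − Θ1(s). Function names and the sampling resolution are our own choices, not the algorithm of [5].

```python
import math

def turning_segments(points, closed=False):
    """Turning-angle function of a polyline (or polygon, closed=True) as a
    list of (segment_length, tangent_angle); angles are unwrapped so that
    consecutive jumps equal the actual turning angles."""
    pts = points + [points[0]] if closed else points
    segs, prev = [], None
    for p, q in zip(pts, pts[1:]):
        ang = math.atan2(q[1] - p[1], q[0] - p[0])
        if prev is not None:
            while ang - prev > math.pi:
                ang -= 2 * math.pi
            while ang - prev < -math.pi:
                ang += 2 * math.pi
        segs.append((math.hypot(q[0] - p[0], q[1] - p[1]), ang))
        prev = ang
    return segs

def theta(segs, s, period=None):
    """Evaluate the turning function at arc length s; for a polygon pass
    period=perimeter so that theta(s + period) = theta(s) + 2*pi."""
    turns = 0.0
    if period is not None:
        turns, s = 2 * math.pi * (s // period), s % period
    acc = 0.0
    for length, ang in segs:
        acc += length
        if s < acc:
            return ang + turns
    return segs[-1][1] + turns

def mismatch_d1(poly_segs, perimeter, line_segs, ell1, t, steps=2000):
    """Riemann-sum approximation of d1(t) from Eq. (1); the optimal vertical
    shift has the closed form theta* = -mean(Theta_P(s + t) - Theta_1(s))."""
    h = ell1 / steps
    diffs = [theta(poly_segs, i * h + t, perimeter) - theta(line_segs, i * h)
             for i in range(steps)]
    theta_star = -sum(diffs) / steps
    return sum((d + theta_star) ** 2 for d in diffs) * h
```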
4 Experimental Evaluation
In order to compare different dissimilarity measures, we can look at the formal properties they have, such as those listed in section 3.1. Another way is to evaluate how well they perform in practice on a specific task. One way to make such comparisons is on the basis of a chosen ground truth. The Moving Picture Experts Group (MPEG), a working group of ISO/IEC (see http://www.chiariglione.org/mpeg/), has defined the MPEG-7 standard for description and search of audio and visual content. The data set created by the MPEG-7 committee for the evaluation of shape similarity measures [6,7] offers an excellent possibility for objective experimental comparison of the existing approaches, evaluated on the basis of the retrieval rate. The shapes were restricted to simple pre-segmented shapes defined by their outer closed contours. The goal of the MPEG-7 Core Experiment CE-Shape-1 was to evaluate the performance of 2D shape descriptors under change of a view point with respect to objects, non-rigid object motion, and noise. In addition, the descriptors should be scale and rotation invariant. The test set consists of 70 different classes of shapes, each class containing 20 similar objects, usually (heavily) distorted versions of a single base shape. The whole data set therefore consists of 1400 shapes. For example, each row in figure 5 shows four shapes from the same class.
Fig. 5. Example images from the MPEG-7 Core Experiment CE-Shape-1 part B
Fig. 6. Images with the same name prefix belong to the same class
We focus our attention on the performance evaluation of shape descriptors in the experiments established in Part B of the MPEG-7 CE-Shape-1 data set [6]. Each image was used as a query, and the retrieval rate is expressed by the so-called Bull's Eye score: the fraction of images that belong to the same class in the top 40 matches. Strong shape variations within the same classes mean that no shape similarity measure achieves a 100% retrieval rate. See, e.g., the third row in figure 5 and the first and second rows in figure 6: the third row shows spoons that are more similar to shapes in different classes than to each other. A region-based and a contour-based shape similarity method are part of the MPEG-7 standard. The contour-based method is the Curvature Scale Space (CSS) method [8]. This technique matches two shapes based on their CSS-image, which is constructed by iteratively convolving the contour with a Gaussian smoothing kernel, until the shape is completely convex. When at a certain iteration a curvature zero-crossing disappears due to the convolution process, a peak is created in the CSS-image. Two shapes are now matched by comparing the peaks in their CSS-images. The multiple polyline to polygon matching algorithm of section 3.2 has been implemented in C++ and is evaluated in a part-based shape retrieval application (see http://give-lab.cs.uu.nl/Matching/Mtam/) with the Core Experiment CE-Shape-1 part B test set. We compared our matching to the CSS method, as well as to matching the global contours with turning angle functions (GTA), with respect to the Bull's Eye score. These experimental results indicate that for those classes with a low performance of the CSS matching, our approach consistently performs better. See figure 7 for two examples. The interactive selection of the part to query with makes a comparison on all images from the test set infeasible, but a rigorous experimental evaluation is given in [9]. The running time for a single query on the MPEG-7 test set of 1400 images is typically about one second on a 2 GHz PC.
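The Bull's Eye score itself is easy to state in code. The sketch below assumes that every class has exactly 20 members, as in CE-Shape-1, and that a ranked result list is available per query; the data structures and names are ours.

```python
def bulls_eye(ranked_results, class_of, top=40, class_size=20):
    """ranked_results: {query_id: [ids ranked by similarity]};
    class_of: {id: class_label}. Returns the average fraction of the
    query's class found among the top matches."""
    scores = []
    for query, ranking in ranked_results.items():
        hits = sum(1 for x in ranking[:top] if class_of[x] == class_of[query])
        scores.append(hits / class_size)
    return sum(scores) / len(scores)
```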
Fig. 7. A comparison of the Curvature Scale Space (CSS), the Global Turning Angle function (GTA), and our Multiple Polyline to Polygon (MPP) matching
In order to compare the performance of various similarity measures, we built the framework SIDESTEP – Shape-based Image Delivery Statistics Evaluation Project, http://give-lab.cs.uu.nl/sidestep/. Performance measures such as the number of true/false positives, true/false negatives, specificity, precision, recall, negative predicted value, relative error, k-th tier, total performance, and Bull's Eye score can be evaluated for a single query, over a whole class, or over a whole collection, see figure 8. In [10] we have compared many dissimilarity measures on the basis of their formal properties, as well as on their performance in terms of the Bull's Eye score on the MPEG-7 test collection. The difference between the Bull's Eye scores of these dissimilarity measures as reported in the literature and the performances of the reimplemented methods in SIDESTEP is significant. Our conjecture is that this is caused by the following. Firstly, several methods are not trivial to implement, and are inherently complex. Secondly, the description in the literature is often not sufficiently detailed to allow a straightforward implementation. Thirdly, fine tuning and engineering have a large impact on the performance for a specific data set. It would be good for the scientific community if the reported test results were made reproducible and verifiable by publishing data sets and software along with the articles. The MPEG-7 test set provides a strict classification, which is not always available. The ground truth developed in [11] was used at the "1st Annual Music Information Retrieval Evaluation eXchange" (MIREX) 2005 for comparing various methods for measuring melodic similarity for notated music. This ground truth does not give one single correct order of matches for every query. One reason is that limited numbers of experts do not allow statistically significant differences in ranks for every single item. Also, for some alternative ways of altering a melody, human experts simply do not agree on which one changes the melody more. See figure 9 for an example. In cases like this, even increasing the number of experts might not always avoid situations where the ground truth
Fig. 8. SIDESTEP interface
contains only groups of matches whose correct order is reliably known, while the correct order of matches within the groups is not known. Here, the 31 experts we asked do not agree on whether the second or the third piece is more similar to the query. The third piece is shorter, but otherwise identical to the query, while the second one contains more musical material from the query, but two ties are missing. In [11] we proposed a measure (called "average dynamic recall") that measures, at any point in the result list, the recall among the documents that the user should have seen so far. Unlike Kekäläinen's and Järvelin's measures [12], this measure only requires a partially ordered result list as ground truth, but no similarity scores, and it works without a binary relevance scale. It does not have any parameters that can be chosen arbitrarily, and it is easy to interpret. Consider a result list R1, R2, . . . and a ground truth of g groups of items (G11, G12, . . . , G1m1), (G21, . . . , G2m2), . . . , (Gg1, . . . , Ggmg) (with mi denoting the number of members of group i), where we know that rank(Gij) < rank(Gkl) if and only if i < k, but we do not know whether rank(Gij) < rank(Gip) for any i (unless j = p). We propose to calculate the result quality as follows. Let n = m1 + · · · + mg be the number of matches in the
Query: Peter von Winter (1754-1825): Domus Israel speravit, RISM A/II signature: 600.054.278
1. Peter von Winter: Domus Israel speravit, 600.054.278
2. Peter von Winter: Domus Israel speravit, 600.055.822
3. Anonymus: Offertories, 450.040.980
Fig. 9. Ground truth for Winter: "Domus Israel speravit"
ground truth, and let c be the number of the group that contains the i-th item in the ground truth (i.e., $\sum_{v=1}^{c} m_v \geq i$ and $\sum_{v=1}^{c-1} m_v < i$). Then we can define $r_i$, the recall after the item $R_i$, as:

$$r_i = \frac{\#\{R_w \mid w \leq i \wedge \exists j, k : j \leq c \wedge R_w = G_{jk}\}}{i}.$$

The result quality is then defined as:

$$ADR = \frac{1}{n} \sum_{i=1}^{n} r_i.$$
This measure was used at the MIREX 2005 and 2006 competitions for symbolic melodic similarity, and at the 3D Shape Retrieval Contest (SHREC) 2006.
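Average dynamic recall translates directly into code. In the sketch below, groups is the partially ordered ground truth (a list of lists; items in earlier groups must be retrieved before items in later groups, while order within a group is unconstrained); the function name and data layout are ours.

```python
def average_dynamic_recall(result_list, groups):
    """ADR = (1/n) * sum_i r_i, with r_i the fraction of the first i results
    that belong to groups 1..c, where c is the group containing the i-th
    ground-truth item."""
    n = sum(len(g) for g in groups)
    limits, acc = [], 0
    for g in groups:                 # cumulative group sizes m_1 + ... + m_c
        acc += len(g)
        limits.append(acc)
    adr = 0.0
    for i in range(1, n + 1):
        c = next(k for k, lim in enumerate(limits) if lim >= i)
        allowed = {x for g in groups[:c + 1] for x in g}
        adr += sum(1 for x in result_list[:i] if x in allowed) / i
    return adr / n
```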
5 Indexing
Proximity searching in multimedia databases has gained more and more interest over the years. In particular, searching in dissimilarity spaces (rather than extracting a feature vector for each database object) is a growing area of research. With growing multimedia databases, indexing has become a necessity. Vantage indexing works as follows: given a multimedia database A and a distance measure d : A × A → R, select from the database a set of m objects A∗ = {A∗1, . . . , A∗m}, the so-called vantage objects. Compute the distance from each database object Ai to each vantage object, thus creating a point pi = (x1, . . . , xm), such that xj = d(Ai, A∗j). Each database object corresponds to a point in the m-dimensional vantage space. A query on the database now translates to a range search or a nearest-neighbor search in this m-dimensional vantage space: compute the distance from
the query object q to each vantage object (i.e., position q in the vantage space) and retrieve all objects within a certain range around q (in the case of a range query), or retrieve the k nearest neighbors to q (in the case of a nearest neighbor query). The distance measure used on the points in vantage space is L∞. Vleugels and Veltkamp show [13] that as long as the triangle inequality holds for the distance measure d defined on the database objects, recall (the ratio of the number of relevant retrieved objects to the total number of relevant objects in the whole database) is 100%, meaning that there are no false negatives. However, false positives are not excluded from the querying results, so precision (the ratio of the number of relevant retrieved objects to the total number of retrieved objects) is not necessarily 100%. We claim that the retrieval performance of a vantage index can improve significantly with a proper choice of vantage objects. This improvement is measured in terms of false positives, as defined below. Let δ be the distance measure in vantage space.

Definition 1 (Return set). Given ε > 0 and query Aq, object Ai is included in the return set of Aq if and only if δ(Aq, Ai) ≤ ε.

Definition 2 (False positive). Ap is a false positive for query Aq if δ(Aq, Ap) ≤ ε and d(Aq, Ap) > ε.

We present a new technique for selecting vantage objects that is based on two criteria which address the number of false positives in the retrieval results directly. The first criterion (spacing) concerns the relevance of a single vantage object; the second criterion (correlation) deals with the redundancy of a vantage object with respect to the other vantage objects. We call this method Spacing-based Selection. The main idea is to keep the number of objects that are returned for a query Aq and range ε low. Since false negatives are not possible under the condition that the triangle inequality holds for d, minimization of the number of false positives is achieved by spreading out the database along the vantage space as much as possible. False positives are, intuitively speaking, pushed out of the returned sets.
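In code, the vantage embedding and an L∞ range query look as follows. This is a minimal sketch; the guarantee of no false negatives holds only when d satisfies the triangle inequality, and the names are ours.

```python
def vantage_index(objects, vantage, d):
    """Map each database object to its vector of distances to the m
    vantage objects."""
    return {obj: tuple(d(obj, v) for v in vantage) for obj in objects}

def range_query(index, vantage, d, query, eps):
    """Return every object within L-infinity distance eps of the query's
    image in vantage space; may contain false positives w.r.t. d."""
    q = tuple(d(query, v) for v in vantage)
    linf = lambda p: max(abs(a - b) for a, b in zip(p, q))
    return [obj for obj, p in index.items() if linf(p) <= eps]
```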
5.1 Spacing
In this section we will define a criterion for the relevance of a single vantage object Vj. A priori the query object Aq is unknown, so the distance d(Aq, Vj) between a certain query Aq and vantage object Vj is unknown. The size of the range query (ε) is not known beforehand either. Optimal performance (achieved by small return sets given a query Aq and range ε) should therefore be scored over all possible queries and all possible ranges ε. This is achieved by avoiding clusters on the vantage axis belonging to Vj. Our first criterion therefore concerns the spacing between objects on a single vantage axis, which is defined as follows:

Definition 3. The spacing between two consecutive objects Ai and Ai+1 on the vantage axis of Vj is d(Ai+1, Vj) − d(Ai, Vj).
Let μ be the average spacing. Then the variance of spacing is given by

$$\frac{1}{n-1} \sum_{i=1}^{n-1} \left( (d(A_{i+1}, V_j) - d(A_i, V_j)) - \mu \right)^2.$$

To ensure that the database objects are evenly spread in vantage space, the variance of spacing has to be as small as possible. A vantage object with a small variance of spacing has a high discriminative power over the database, and is said to be a relevant vantage object.
5.2 Correlation
It is not sufficient to just select relevant vantage objects; they should also be non-redundant. A low variance of spacing does not guarantee that the database is well spread out in vantage space, since the vantage axes might be strongly correlated. Therefore, we compute the linear correlation coefficients for all pairs of vantage objects and make sure these coefficients do not exceed a certain threshold. Experiments show that on the MPEG-7 shape image set pairwise correlation is sufficient and that higher-order correlations are not an issue.
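Both criteria are cheap to evaluate once the distances to a vantage object are known. The two helpers below (names ours) compute the variance of spacing on one vantage axis and the linear correlation between two axes.

```python
import math

def variance_of_spacing(axis):
    """axis: sorted distances of all indexed objects to one vantage object;
    returns (1/(n-1)) * sum((spacing_i - mu)^2) over the n-1 spacings."""
    spacings = [b - a for a, b in zip(axis, axis[1:])]
    mu = sum(spacings) / len(spacings)
    return sum((s - mu) ** 2 for s in spacings) / len(spacings)

def correlation(axis1, axis2):
    """Pearson correlation of two vantage axes, paired by database object."""
    m1, m2 = sum(axis1) / len(axis1), sum(axis2) / len(axis2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(axis1, axis2))
    v1 = sum((a - m1) ** 2 for a in axis1)
    v2 = sum((b - m2) ** 2 for b in axis2)
    return cov / math.sqrt(v1 * v2) if v1 and v2 else 0.0
```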
5.3 Algorithm
Spacing-based Selection selects a set of vantage objects according to the criteria defined above with a randomized incremental algorithm. The key idea is to add the database objects one by one to the index while inspecting the variance of spacing and correlation properties of the vantage objects after each object has been added. As soon as either the variance of spacing of one object or the correlation of a pair of objects exceeds a certain threshold, a vantage object is replaced by a randomly chosen new vantage object. These repair steps are typically necessary only at early stages of the execution of the algorithm, thus keeping the amount of work that has to be redone small. For details, see the algorithm in figure 10. The complexity of our algorithm is expressed in terms of distance calculations, since these are by far the most expensive part of the process. The running time complexity is then O(Σ_{i=0}^{n} (Pi · i + (1 − Pi) · k)), where k is the (in our case constant) number of vantage objects and Pi is the chance that, at iteration i, a vantage object has to be replaced by a new one. This chance depends on the choice of εspac and εcorr. There is a clear trade-off here: the stricter these threshold values are, the better the selected vantage objects will perform, but also the higher the chance that a vantage object has to be replaced, resulting in a longer running time. If we only look at spacing and set εspac such that, for instance, Pi is (log n)/i, the running time would be O(n log n) since k is a small constant (8 in our experiments).
5.4 Experimental Evaluation
We implemented our algorithm and tested it on the MPEG-7 test set CE-Shape-1 part B; the distance measure used to calculate the distance between two of these shape images is the Curvature Scale Space (CSS) distance, discussed in section 4. To justify our criteria, we manually selected four sets of eight vantage objects that
Input: Database A with objects A1, . . . , An, distance measure d : A × A → R, thresholds εcorr and εspac
Output: Vantage index with vantage objects V1, V2, . . . , Vm
1: select initial V1, V2, . . . , Vm randomly
2: for all objects Ai, in random order, do
3:   for all vantage objects Vj do
4:     compute d(Ai, Vj)
5:   add Ai to index
6:   if var(Spacing)(Vj) > εspac for some Vj then
7:     remove Vj
8:     select a new vantage object randomly
9:   if Corr(Vk, Vl) > εcorr for some pair p = (Vk, Vl) then
10:    remove p's worst spaced object
11:    select a new vantage object randomly

Fig. 10. Spacing-based Selection
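A runnable rendering of Fig. 10, using the helpers above, might look as follows. The repair loop and the random re-sampling of offenders follow the pseudocode, while the insertion order, tie-breaking, and the guard against tiny indices are our own assumptions; note that overly strict thresholds can make the repair loop spin for a long time.

```python
import random

def spacing_based_selection(objects, d, m, eps_spac, eps_corr, rng=None):
    rng = rng or random.Random(0)
    vantage = rng.sample(objects, m)
    indexed = []
    for obj in rng.sample(objects, len(objects)):   # random insertion order
        indexed.append(obj)
        if len(indexed) < 3:
            continue
        repaired = True
        while repaired:
            repaired = False
            axes = [[d(o, v) for o in indexed] for v in vantage]
            # spacing criterion (lines 6-8 of Fig. 10)
            for j in range(m):
                if variance_of_spacing(sorted(axes[j])) > eps_spac:
                    vantage[j] = rng.choice(objects)
                    repaired = True
                    break
            if repaired:
                continue
            # correlation criterion (lines 9-11 of Fig. 10)
            for j in range(m):
                for k in range(j + 1, m):
                    if abs(correlation(axes[j], axes[k])) > eps_corr:
                        worse = j if variance_of_spacing(sorted(axes[j])) >= \
                                     variance_of_spacing(sorted(axes[k])) else k
                        vantage[worse] = rng.choice(objects)
                        repaired = True
                        break
                if repaired:
                    break
    return vantage
```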
either satisfy both criteria (weakest correlation and lowest variance of spacing: weak-low), neither (strongest correlation and highest variance of spacing: strong-high), or a strong-low or weak-high combination. The performance of these four sets of vantage objects was evaluated by querying with all 1400 objects. The number of nearest neighbors retrieved for each query object varied from 1 to 200. The distance of the furthest nearest neighbor functioned as ε, which was used to calculate the number of false positives among these nearest neighbors, see Definition 2. For each vantage index, and all k-NN queries, k = 1, . . . , 200, an average ratio of false positives in the result was calculated over all 1400 queries. The results are displayed in figure 11, together with some typical runs of our algorithm, the "MaxMin" approach [13] and the "loss-based" approach [14]. These results show that both criteria need to be satisfied in order to achieve good performance (only the set called weak-low scores less than 50% false positives for all sizes of nearest neighbor query). Furthermore, they show that our algorithm can actually select a set of vantage objects in which these criteria are satisfied, since false positive ratios are low for these sets. For more details, see [15].
6 Concluding Remarks
Motivated by the need for perceptually relevant multimedia algorithmics, we looked at properties of shape dissimilarity measures, showed a framework for the experimental performance evaluation of dissimilarities (SIDESTEP), and introduced a new performance measure (Average Dynamic Recall). Because in human perception the parts of objects play an important role, we developed a dissimilarity measure for multiple polyline to polygon matching, and designed an efficient algorithm to compute it. We then introduced a way to decrease the number of false positive retrievals by selecting vantage objects for indexing on the basis of an objective function that has a direct relation with the number of false positives, rather than by a heuristic.
Fig. 11. MPEG-7: false positive ratios
This paper primarily shows examples in the domain of image retrieval, but we have taken a similar approach to music retrieval. As a dissimilarity measure we have designed the Proportional Transportation Distance [16], a normalized version of the Earth Mover's Distance [17]. It satisfies the triangle inequality, which makes it suitable for indexing with the vantage method. Indeed, we have used it in combination with the vantage indexing method in our music retrieval systems Muugle (http://give-lab.cs.uu.nl/muugle) [18] and Orpheus (http://give-lab.cs.uu.nl/orpheus/). The vantage indexing made it possible to identify anonymous incipits (beginnings of pieces, for example twenty notes long) from the RISM A/II collection [19], consisting of about 480,000 incipits [20]. All 80,000 anonymous incipits were compared to the remaining 400,000 ones, giving a total of 32,000,000,000 comparisons. Should a single comparison take 1 ms, this would have taken about 370 days. The vantage indexing made it possible to do this within a day on a 1 GHz PC. A total of 17,895 incipits were identified.

Acknowledgment. I want to thank all persons I have worked with on multimedia retrieval, and with whom the results reported here were obtained. In particular I thank Martijn Bosma, Panos Giannopoulos, Herman Haverkort, Reinier van Leuken, Mirela Tanase, Rainer Typke, and Frans Wiering. This research was supported by the FP6 IST projects 511572-2 PROFI and 506766 AIM@SHAPE.
References
1. Smeulders, A.W., Worring, M., Santini, S., Gupta, A., and Jain, R.: Content-Based Image Retrieval at the End of the Early Years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 12 (2000) 1349–1380
2. Veltkamp, R.C. and Tanase, M.: A Survey of Content-Based Image Retrieval Systems. In Marques, O., Furht, B. (eds): Content-Based Image and Video Retrieval, Kluwer (2002) 47–101
3. Moret, B.: Towards a Discipline of Experimental Algorithmics. In Goldwasser, M., Johnson, D., McGeoch, C. (eds): Data Structures, Near Neighbor Searches, and Methodology: Fifth and Sixth DIMACS Implementation Challenges. DIMACS Monographs 59, American Mathematical Society (2002) 197–213
4. Biederman, I.: Recognition-by-Components: A Theory of Human Image Understanding. Psychological Review 94 2 (1987) 115–147
5. Tanase, M., Veltkamp, R.C., and Haverkort, H.: Multiple Polyline to Polygon Matching. In: Proceedings 16th Annual Symposium on Algorithms and Computation (ISAAC), LNCS 3827 (2005) 60–70
6. Bober, M., Kim, J.D., Kim, H.K., Kim, Y.S., Kim, W.Y., and Muller, K.: Summary of the Results in Shape Descriptor Core Experiment, ISO/IEC JTC1/SC29/WG11/MPEG99/M4869 (1999)
7. Latecki, L.J., Lakaemper, R., and Eckhardt, U.: Shape Descriptors for Non-Rigid Shapes with a Single Closed Contour. In: Proc. Conference on Computer Vision and Pattern Recognition (CVPR) (2000) 424–429
8. Mokhtarian, F., Abbasi, S., and Kittler, J.: Efficient and Robust Retrieval by Shape Content through Curvature Scale Space. In: Proceedings of IDB-MMS'96 (1996) 35–42
9. Tanase, M.: Shape Decomposition and Retrieval. PhD Thesis, Utrecht University, Department of Computer Science (2005)
10. Veltkamp, R.C. and Latecki, L.J.: Properties and Performances of Shape Similarity Measures. In: Batagelj et al. (eds.), Data Science and Classification, Proceedings of the IFCS06 Conference, Springer (2006) 47–56
11. Typke, R., Veltkamp, R.C., and Wiering, F.: A Measure for Evaluating Retrieval Techniques Based on Partially Ordered Ground Truth Lists. In: Proceedings International Conference on Multimedia & Expo (ICME) (2006)
12. Järvelin, K. and Kekäläinen, J.: Cumulated Gain-Based Evaluation of IR Techniques. ACM Transactions on Information Systems 20 4 (2002) 422–446
13. Vleugels, J. and Veltkamp, R.C.: Efficient Image Retrieval through Vantage Objects. Pattern Recognition (2002) 69–80
14. Hennig, C. and Latecki, L.J.: The Choice of Vantage Objects for Image Retrieval. Pattern Recognition (2003) 2187–2196
15. van Leuken, R.H., Veltkamp, R.C., and Typke, R.: Selecting Vantage Objects for Similarity Indexing. In: Proceedings of the 18th International Conference on Pattern Recognition (ICPR) (2006)
16. Giannopoulos, P. and Veltkamp, R.C.: A Pseudo-Metric for Weighted Point Sets. In: Proceedings European Conference on Computer Vision (ECCV 2002), Springer, LNCS 2352 (2002) 715–730
17. Rubner, Y.: Perceptual Metrics for Image Database Navigation. PhD Thesis, Stanford University, Department of Computer Science (1999)
18. Bosma, M., Veltkamp, R.C., and Wiering, F.: Muugle: A Music Retrieval Experimentation Framework. In: Proceedings of the 9th International Conference on Music Perception and Cognition (2006) 1297–1303
19. Répertoire International des Sources Musicales (RISM): Serie A/II, manuscrits musicaux après 1600. K. G. Saur Verlag, München, Germany (2002)
20. Typke, R., Giannopoulos, P., Veltkamp, R.C., Wiering, F., and van Oostrum, R.: Using Transportation Distances for Measuring Melodic Similarity. In: Proceedings of the Fourth International Conference on Music Information Retrieval (ISMIR 2003) (2003) 107–114
Size of Quantum Finite State Transducers

Ruben Agadzanyan and Rūsiņš Freivalds

Institute of Mathematics and Computer Science, University of Latvia
Raiņa bulv. 29, Rīga, Latvia
[email protected],
[email protected]
Abstract. Sizes of quantum and deterministic finite state transducers are compared in the case when both quantum and deterministic finite state transducers exist. The difference in size may be exponential.
1 Introduction
We start by reviewing the concept of a probabilistic finite state transducer. For a finite set X we denote by X∗ the set of all finite strings formed from X; the empty string is denoted ε.

Definition 1. A probabilistic finite state transducer (pfst) is a tuple T = (Q, Σ1, Σ2, V, f, q0, Qacc, Qrej), where Q is a finite set of states, Σ1, Σ2 are the input/output alphabets, q0 ∈ Q is the initial state, and Qacc, Qrej ⊂ Q are (disjoint) sets of accepting and rejecting states, respectively. (The other states, forming the set Qnon, are called non-halting.) The transition function V : Σ1 × Q → Q is such that for all a ∈ Σ1 the matrix (Va)qp is stochastic, and fa : Q → Σ2∗ is the output function. If all matrix entries are either 0 or 1 the machine is called a deterministic finite state transducer (dfst).

The meaning of this definition is that, being in state q, and reading input symbol a, the transducer prints fa(q) on the output tape, and changes to state p with probability (Va)qp, moving input and output head to the right. After each such step, if the machine is found in a halting state, the computation stops, accepting or rejecting the input, respectively. To capture this formally, we introduce the total state of the machine, which is an element (PNON, PACC, prej) ∈ ℓ1(Q × Σ2∗) ⊕ ℓ1(Σ2∗) ⊕ ℓ1({REJ}), with the natural norm ∥(PNON, PACC, prej)∥ = ∥PNON∥₁ + ∥PACC∥₁ + |prej|. Each input symbol a induces a total state transformation Ta; in particular, p′rej = prej + Σ_{p∈Qrej} (Va)qp. For a string x1 . . . xn the map Tx is just the concatenation of the Txi. Observe that all the Ta conserve the probability.
Implicitly, we add initial and end marker symbols (‡, $) to the input, with additional stochastic matrices V‡ and V$, executed only at the very beginning and at the very end. We assume that V$ puts no probability outside Qacc ∪ Qrej. By virtue of the computation, to each input string v ∈ Σ1∗ there corresponds a probability distribution T(·|v) on the set Σ2∗ ∪ {REJ}: T(REJ|v) := T‡v$((q0, ε), 0, 0)[REJ] is the probability to reject the input v, whereas T(w|v) := T‡v$((q0, ε), 0, 0)[w] is the probability to accept, after having produced the output w.

Definition 2. Let R ⊂ Σ1∗ × Σ2∗. For α > 1/2 we say that T computes the relation R with probability α if for all v, whenever (v, w) ∈ R, then T(w|v) ≥ α, and whenever (v, w) ∉ R, then T(w|v) ≤ 1 − α. For 0 < α < 1 we say that T computes the relation R with isolated cutpoint α if there exists ε > 0 such that for all v, whenever (v, w) ∈ R, then T(w|v) ≥ α + ε, but whenever (v, w) ∉ R, then T(w|v) ≤ α − ε.

The definition of quantum finite state transducers was introduced by R. Freivalds and A. Winter [5]. This definition is modelled after the one for pfst and after that for quantum finite state automata [8]:

Definition 3. A quantum finite state transducer (qfst) is a tuple T = (Q, Σ1, Σ2, V, f, q0, Qacc, Qrej), where Q is a finite set of states, Σ1, Σ2 are the input/output alphabets, q0 ∈ Q is the initial state, and Qacc, Qrej ⊂ Q are (disjoint) sets of accepting and rejecting states, respectively. The transition function V : Σ1 × Q → Q is such that for all a ∈ Σ1 the matrix (Va)qp is unitary, and fa : Q → Σ2∗ is the output function.

Like before, matrices V‡ and V$ are implicitly assumed, V$ carrying no amplitude from Qnon to outside Qacc ∪ Qrej. The computation proceeds as follows: being in state q, and reading a, the machine prints fa(q) on the output tape, and moves to the superposition Va|q⟩ = Σp (Va)qp |p⟩ of internal states. Then a measurement of the orthogonal decomposition Enon ⊕ Eacc ⊕ Erej (with the subspaces Ei = span Qi ⊂ ℓ2(Q), which we identify with their respective projections) is performed, stopping the computation with accepting the input on the second outcome (while observing the output), and with rejecting it on the third. Here, too, we define total states: these are elements (|ψNON⟩, PACC, prej) ∈ ℓ2(Q × Σ2∗) ⊕ ℓ1(Σ2∗) ⊕ ℓ1({REJ}), with norm ∥(|ψNON⟩, PACC, prej)∥ = ∥|ψNON⟩∥₂² + ∥PACC∥₁ + |prej|.
At the beginning the total state is (|q0⟩ ⊗ |ε⟩, 0, 0). The total state transformations, for |ψ⟩ = Σ_{q∈Q} |q⟩ ⊗ |ωq⟩, with |ωq⟩ = Σ_{w∈Σ2∗} αqw |w⟩, are (for a ∈ Σ1):

Ta : (|ψ⟩, PACC, prej) ↦ ( Enon Σq Va|q⟩ ⊗ |ωq fa(q)⟩, P′ACC, p′rej ),

where |ωq fa(q)⟩ = Σw αqw |w fa(q)⟩, and

P′ACC(x) = PACC(x) + ∥ Eacc Σ_{q,w s.t. x=w fa(q)} αqw Va|q⟩ ∥₂²,

p′rej = prej + ∥ Erej Σq Va|q⟩ ⊗ |ωq fa(q)⟩ ∥₂².
Observe that the Ta do not exactly preserve the norm, but there is a constant γ such that ∥Ta(X)∥ ≤ γ∥X∥ for any total state X. Quite straightforwardly, the distributions T(·|v) are defined, and so are the concepts of computation with probability α or with isolated cutpoint α. Notice the physical benefit of having the output tape: whereas for finite automata a superposition of states means that the amplitudes of the various transitions are to be added, this is no longer true for transducers if we face a superposition of states with different output tape contents. I.e., the entanglement of the internal state with the output may prohibit certain interferences. This will be a crucial feature in some of our later constructions.
2 Freivalds/Winter Results
Unlike the situation for finite automata, pfst are strictly more powerful than their deterministic counterparts:

Theorem 1 ([5]). For arbitrary ε > 0 the relation R1 = {(0^m 1^m, 2^m) : m ≥ 0} can be computed by a pfst with probability 1 − ε. It cannot be computed by a dfst.

Proof. For a natural number k choose initially an alternative j ∈ {0, . . . , k − 1}, uniformly. Then do the following: repeatedly read k 0's, and output j 2's, until the 1's start (remember the remainder modulo k); then repeatedly read k 1's, and output k − j 2's. Compare the remainder modulo k with what you remembered: if the two are equal, output this number of 2's and accept, otherwise reject.
It is immediate that on input 0^m 1^m this machine outputs 2^m with certainty. However, on input 0^m 1^{m′} with m ≠ m′, each 2^n receives probability at most 1/k. That this cannot be done deterministically is straightforward: assume that a dfst has produced f(m) 2's after having read m 0's. Because of finiteness there are k, l such that after reading k 1's (while n0 2's were output) the internal state is the same as after reading l further 1's (while n 2's are output). So, the output for input 0^m 1^{k+rl} is 2^{f(m)+n0+rn}, and these pairs are either all accepted or all rejected. Hence they are all rejected, contradicting acceptance for m = k + rl. □

By observing that the random choice at the beginning can be mimicked quantumly, and that all intermediate computations are in fact reversible, we immediately get

Theorem 2 ([5]). For arbitrary ε > 0 the relation R1 can be computed by a qfst with probability 1 − ε. □

Note that this puts qfst in contrast to quantum finite automata: in [2] it was shown that if a language is recognized with probability strictly exceeding 7/9 then it is possible to accept it with probability 1, i.e., reversibly deterministically.

Theorem 3 ([5]). The relation R2 = {(w2w, w) : w ∈ {0, 1}∗} can be computed by a pfst and by a qfst with probability 2/3.

Proof. We do this only for the qfst (the pfst is obtained by replacing the unitaries involved by the stochastic matrices obtained by computing the squared moduli of the entries): let the input be x2y (other forms are rejected). With amplitude 1/√3 each, go to one of three 'subprograms': either copy x to the output, or copy y (and accept), or reject without output. □
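The pfst of Theorem 1 is simple enough to simulate directly. The sketch below performs one probabilistic run on an input of the form 0^m 1^n and returns the produced output, or None on rejection; the function name and the input parsing are ours.

```python
import random

def pfst_r1(word, k, rng=random):
    """One run of the Theorem 1 transducer: pick j uniformly, emit j 2's per
    block of k 0's and k - j 2's per block of k 1's, then compare remainders."""
    j = rng.randrange(k)
    zeros = len(word) - len(word.lstrip("0"))
    ones = len(word) - zeros
    if word != "0" * zeros + "1" * ones:
        return None                      # not of the form 0^m 1^n: reject
    out = (zeros // k) * j + (ones // k) * (k - j)
    if zeros % k != ones % k:
        return None                      # remainders differ: reject
    return "2" * (out + zeros % k)       # append the remembered remainder
```

Averaging over many runs, each pair (0^m 1^{m′}, 2^n) with m ≠ m′ is accepted with frequency at most 1/k, matching the proof.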
3 When Deterministic Transducers Are Possible
We are interested in the following problem. Theorem 1 shows that for some relations a probabilistic fst may exist while a deterministic fst does not. Now assume that a deterministic fst exists. How are the sizes (the numbers of states) of the pfst and the dfst related? It seems natural to conjecture that the difference can be unlimitedly large. However, we have failed to prove this. Moreover, we now conjecture that this is not the case.

Theorem 4. 1) For arbitrary ε > 0 and for arbitrary k the relation Rk = {(0^m 1^m, 2^m) : 0 ≤ m ≤ k} can be computed by a pfst of size 2k + const with probability 1 − ε. 2) For an arbitrary deterministic fst computing Rk the number of states is not less than k.
Proof. The pfst performs the action of the pfst in the proof of Theorem 1 and, in parallel, checks whether or not the lengths of the strings of zeros, ones and twos exceed k. Hence the number of states is 2k + const. □

Unfortunately, we have not got any size advantage of pfst vs. dfst here. Luckily, we have another example of a relation to be followed, namely, Theorem 3.

Theorem 5. 1) The relation Rk′ = {(w2w, w) : ∃m ≤ k, w ∈ {0, 1}^m} can be computed by a pfst of size 2k + const with probability 2/3. 2) For an arbitrary deterministic fst computing Rk′ the number of states is not less than a^k, where a is the cardinality of the alphabet for w.

This theorem has another disadvantage: the probability of the correct result of the pfst cannot be improved over 2/3. Well, one more idea is possible. We introduce the following auxiliary coding of binary words:

code(w1 w2 · · · wm) = w1 3 2^1 w2 3 2^2 w3 3 2^3 · · · wm 3 2^m.
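The coding is trivial to generate; a one-line sketch (the function name is ours), where 2^i denotes a run of i 2's:

```python
def aux_code(w):
    """code(w1 ... wm) = w1 3 2^1 w2 3 2^2 ... wm 3 2^m."""
    return "".join(bit + "3" + "2" * (i + 1) for i, bit in enumerate(w))
```

For example, aux_code("10") returns "1320322".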
Theorem 6. 1) For arbitrary ε > 0 and for arbitrary k the relation Rk′′ = {(code(w) 2 code(w), w) : ∃m ≤ k, w ∈ {0, 1}^m} can be computed by a pfst of size 2k + const with probability 1 − ε. 2) For an arbitrary deterministic fst computing Rk′′ the number of states is not less than a^k, where a is the cardinality of the alphabet for w.
4 Quantum vs. Probabilistic Transducers
After seeing a few examples one might wonder if everything that can be done by a qfst can be done by a pfst. That this is not so is shown as follows:

Theorem 7. 1) The relation Rs′′ = {(0^m 1^n 2^k, 3^m) : n ≠ k ∧ (m = k ∨ m = n) ∧ m ≤ s ∧ n ≤ s ∧ k ≤ s} can be computed by a qfst of size const · s with probability 4/7 − ε, for arbitrary ε > 0. 2) For an arbitrary probabilistic fst computing Rs′′ with probability bounded away from 1/2 the number of states is not less than a^k, where a is the cardinality of the alphabet for w.

Proof (of 1)). For a natural number l construct the following transducer: from q0 go to one of the states q1, qj,b (j ∈ {0, . . . , l − 1}, b ∈ {1, 2}), with amplitude √(3/7) for q1 and with amplitude √(2/(7l)) each for the others. Then proceed as follows (we assume the form of the input to be 0^m 1^n 2^k; others are rejected): for q1 output one 3 for each 0, and finally accept. For qj,b repeatedly read l 0's
and output j 3's (remembering the remainder m (mod l)). Then repeatedly read l b's and output l − j 3's (output nothing on the (3 − b)'s). Compare the remainder with the one remembered, and reject if they are unequal; otherwise output this number of 3's. Reading $, perform the following unitary on the subspace spanned by the qj,b and duplicate states qj′,b:

$$(j \leftrightarrow j') \otimes \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$
Accepting are all qj′,2, rejecting are all qj′,1. Now assume that the input does not occur as the left member in the relation: this means either m ≠ k and m ≠ n, or m = n = k. In the first case all the outputs in each of the b-branches of the program are of different lengths, so each gets amplitude √(2/(7l)). The final step combines at most two of them, so any output is accepted with probability at most 4/(7l). The second case is more interesting: in all branches the amplitude is concentrated on the output 3^m. The rotation V$ however is made such that the amplitude on qj′,2 cancels out, so we end up in a rejecting state qj′,1. In total, any output is accepted with probability at most 3/7 + ε. On the other hand, if the input occurs as the left member in the relation, exactly one of the two b-branches of the program concentrates all amplitude on output 3^m, whereas the other spreads it over l different lengths. This means that the output 3^m is accepted with probability at least (l − 1) · 1/(7l), and others are accepted with probability at most 1/(7l) each. In total, the output 3^m is accepted with probability at least 4/7 − ε, all others are accepted with probability at most 3/7 + ε. □

Proof (of 2)). By contradiction. Suppose Rs′′ is computed by a pfst T with isolated cutpoint α. The following construction computes it with probability bounded away from 1/2: assuming α ≤ 1/2 (the other case is similar), let p = (1/2 − α)/(1 − α). Run one of the following subprograms probabilistically: with probability p output one 3 for each 0, and ignore the other symbols (we may assume that the input has the form 0^m 1^n 2^k); with probability 1 − p run T on the input. It is easily seen that this new pfst computes the same relation with probability bounded away from 1/2. Hence, we may assume that T computes Rs′′ with probability ϕ > 1/2; from this we shall derive a contradiction. The state set Q together with any of the stochastic matrices V0, V1, V2 is a Markov chain. We shall use the classification of states for finite Markov chains (see [7]): for Vi, Q is partitioned into the set Ri of transient states (i.e., the probability to find the process in Ri tends to 0) and a number of sets Sij of ergodic states (i.e., once in Sij the process does not leave this set, and all states inside can be reached from each other, though maybe only by a number of steps). Each Sij is divided further into its cyclic classes C^ν_ij (ν ∈ Z_{dij}), Vi mapping C^ν_ij into C^{ν+1}_ij. By considering sufficiently high powers Vi^d (with d, e.g., the product of all the periods dij) as transition matrices, all these cyclic sets become ergodic; in fact, Vi^d restricted to each is regular.
Using only these powers amounts to concentrating on inputs of the form 0^m 1^n 2^k with m, n, k multiples of d, which we will do from now on. Relabelling, the ergodic sets of Vi := Vi^d will be denoted Sij. Each has its unique equilibrium distribution, to which every initial one converges: denote it by πij. Furthermore, there are limit probabilities a(j0) to find the process V0 in S0j0 after long time, starting from q0. Likewise, there are limit probabilities b(j1|j0) to find the process V1 in S1j1 after long time, starting from π0j0, and similarly c(j2|j1). So, by the law of large numbers, for large enough m, n, k the probability that V0 has passed into S0j0 after √m steps, after which V1 has passed into S1j1 after √n steps, after which V2 has passed into S2j2 after √k steps, is arbitrarily close to P(j0, j1, j2) = a(j0) b(j1|j0) c(j2|j1). (Note that these probabilities sum to one.) As a consequence of the ergodic theorem (or law of large numbers), see [7], ch. 4.2, in each of these events J = (j0, j1, j2) the probable number of 3's written after the final $ is linear in m, n, k:

T(3^{[(1−δ)λJ(m,n,k), (1+δ)λJ(m,n,k)]} | 0^m 1^n 2^k, J) → 1, as m, n, k → ∞,

with λJ(m, n, k) = αJ m + βJ n + γJ k, and non-negative constants αJ, βJ, γJ. Since we require that T(3^{dm} | 0^m 1^m 2^k) ≥ ϕ for k ≠ m, it is necessary that for a set A of events J = (j0, j1, j2) with P(A) ≥ ϕ we have αJ + βJ = d and γJ = 0. In fact, as for J ∉ A

T(3^{dm} | 0^m 1^m 2^k, J) → 0

for certain sequences m, k → ∞, we even have

Σ_{J∈A} P(J) T(3^{dm} | 0^m 1^m 2^k, J) ≥ ϕ − o(1).

For J ∈ A it is obvious that the transducer outputs no more 3's once in S2j2. But this implies that for m, k large enough, T(3^{dm} | 0^m 1^m 2^k, J) is arbitrarily close to T(3^{dm} | 0^m 1^m 2^m, J), hence T(3^{dm} | 0^m 1^m 2^m) ≥ ϕ − o(1), which implies that T(3^{dm} | 0^m 1^m 2^m) ≥ ϕ, contradicting (0^{dm} 1^{dm} 2^{dm}, 3^{dm}) ∉ Rs′′. □

In general however, computing with an isolated cutpoint is strictly weaker than computing with probability bounded away from 1/2 (observe that for finite automata, probabilistic and quantum, recognizability with an isolated cutpoint is equivalent to recognizability with probability bounded away from 1/2).
Theorem 8. The relation Rs′′′ = {(0^m 1^n a, 4^l) : m ≤ s ∧ n ≤ s ∧ (a = 2 → l = m) ∧ (a = 3 → l = n)} can be computed by a pfst and by a qfst of size s + const with an isolated cutpoint (in fact, one arbitrarily close to 1/2), but not with a probability bounded away from 1/2.

Proof. First the construction (again, only for the qfst): initially branch into two possibilities c0, c1, each with amplitude 1/√2. Assume that the input is of the correct form (otherwise reject), and in state ci output one 4 for each i, ignoring the (1 − i)'s. Then, if a = 2 + i, accept; if a = 3 − i, reject. It is easily seen that 4^l is accepted with probability 1/2 if (0^m 1^n a, 4^l) ∈ Rs′′′, and with probability 0 otherwise.

That this cannot be done with probability above 1/2 is clear intuitively: the machine has to produce some output (because of memory limitations), but whether to output 4^m or 4^n it cannot decide until seeing the last symbol. Formally, assume that |m − n| > 4t, with t = max_{a,q} |fa(q)|. If

T‡0^m 1^n 2$((q0, ε), 0, 0)[4^m] = T(4^m | 0^m 1^n 2) ≥ 1/2 + δ,

then necessarily

T‡0^m 1^n((q0, ε), 0, 0)[4^m] + T‡0^m 1^n((q0, ε), 0, 0)[Qnon × 4^{[m−2t, m+2t]}] ≥ 1/2 + δ.

But this implies

T‡0^m 1^n((q0, ε), 0, 0)[4^n] + T‡0^m 1^n((q0, ε), 0, 0)[Qnon × 4^{[n−2t, n+2t]}] ≤ 1/2 − δ,

hence T‡0^m 1^n 3$((q0, ε), 0, 0)[4^n] = T(4^n | 0^m 1^n 3) ≤ 1/2 − δ, contradicting (0^m 1^n 3, 4^n) ∈ Rs′′′. □
References
1. Amano, M. and Iwama, K.: Undecidability on Quantum Finite Automata. In Proc. 31st STOC (1999) 368–375
2. Ambainis, A. and Freivalds, R.: 1-Way Quantum Finite Automata: Strengths, Weaknesses, and Generalizations. In Proc. 39th FOCS (1998) 332–341
3. Bonner, R., Freivalds, R., and Gailis, R.: Undecidability of 2-Tape Quantum Finite Automata. In Proceedings of Quantum Computation and Learning, Sundbyholms Slott, Sweden, 27–29 May 2000, R. Bonner and R. Freivalds (eds.), Mälardalen University (2000) 93–100
4. Freivalds, R.: Language Recognition Using Finite Probabilistic Multitape and Multihead Automata. Problems Inform. Transmission 15 3 (1979) 235–241
5. Freivalds, R. and Winter, A.: Quantum Finite State Transducers. Springer, Lecture Notes in Computer Science 2234 (2001) 233–242
6. Gurari, E.: Introduction to the Theory of Computation. Computer Science Press (1989)
7. Kemeny, J.G. and Snell, J.L.: Finite Markov Chains. Van Nostrand, Princeton (1960)
8. Kondacs, A. and Watrous, J.: On the Power of Quantum Finite State Automata. In Proc. 38th FOCS (1997) 66–75
9. Rabin, M.O.: Probabilistic Automata. Information and Control 6 (1963) 230–245
Weighted Nearest Neighbor Algorithms for the Graph Exploration Problem on Cycles⋆

Yuichi Asahiro¹, Eiji Miyano², Shuichi Miyazaki³, and Takuro Yoshimuta²

¹ Department of Social Information Systems, Kyushu Sangyo University, Fukuoka 813-8503, Japan
[email protected]
² Department of Systems Innovation and Informatics, Kyushu Institute of Technology, Fukuoka 820-8502, Japan
[email protected], [email protected]
³ Academic Center for Computing and Media Studies, Kyoto University, Kyoto 606-8501, Japan
[email protected]
Abstract. In the graph exploration problem, a searcher explores the whole set of nodes of an unknown graph. The searcher is not aware of the existence of an edge until he/she visits one of its endpoints. The searcher's task is to visit all the nodes and go back to the starting node by traveling as short a tour as possible. One of the simplest strategies is the nearest neighbor algorithm (NN), which always chooses the unvisited node nearest to the searcher's current position. The weighted NN (WNN) is an extension of NN, which chooses the next node to visit by using the weighted distance. It is known that WNN with weight 3 is 16-competitive for planar graphs. In this paper we prove that NN achieves the competitive ratio of 1.5 for cycles. In addition, we show that the analysis of the competitive ratio of NN is tight by providing an instance for which the bound of 1.5 is attained, and that NN is the best for cycles among WNN with all possible weights. Furthermore, we prove that no online algorithm is better than 1.25-competitive.
1 Introduction
⋆ Supported in part by the Grant-in-Aid for Scientific Research on Priority Areas 16092215 and 16092223, and for Scientific Research for Young Scientists 17700015, 17700022 and 18700015 from the Japanese Ministry of Education, Science, Sports and Culture.

The traveling salesperson problem (TSP) is one of the most popular problems in the fields of operations research and combinatorial optimization. In TSP, the complete environment of an instance, such as the number of nodes, the length of edges, and the topology, is available for the salesperson to determine his/her tour. The goal is to minimize the total distance traveled. However, sometimes
this offline model does not reflect the real-world situation. In many routing and scheduling applications, the whole information about the environment in which routing takes place is not available in advance; only partial information is given, in an online fashion. Such online routing problems to be solved in an unknown environment are known as exploration or map construction problems [3]. In this paper we consider the following variant of the graph exploration problems, which was originally formulated by Kalyanasundaram and Pruhs in [11]: Suppose that a searcher (called salesperson, robot, or agent in some applications) has to construct a complete map of an unknown environment by traveling as short a tour as possible. In many situations, the unknown environment can be modeled by an edge-weighted undirected graph G. The searcher begins at a special node called the origin. When the searcher visits a node v, it learns its neighbor objects, that is, the nodes and edges adjacent to v. To explore the whole environment, the searcher must eventually visit all nodes, because they may have unknown neighbor objects. We assume that the searcher has a sufficiently large memory and thus can identify nodes already visited when it observes them again at a later point in time. After visiting all the nodes, the searcher finally has to go back to the origin. The quality of an online algorithm is measured by the competitive ratio [8], which is the worst-case ratio of the total distance of the tour traveled by the online algorithm to the distance of the shortest tour that can visit all the nodes of G and return to the origin.

At the beginning of an online algorithm, all nodes except for the origin are unvisited or new. In the next step, one of the neighbor nodes of the origin is selected and visited, and as the algorithm progresses the explored area thus becomes larger. Roughly speaking, the main task of the algorithm is to determine which new node neighboring the explored area should be visited in a particular step by using the obtained knowledge. One of the simplest strategies is the nearest neighbor algorithm (NN), which always chooses the unvisited node nearest to the current position. However, its competitive ratio rises to Ω(log n) even for planar graphs [15], where n is the number of nodes. The depth-first algorithm is also popular, but it is not competitive even for cycles. Kalyanasundaram and Pruhs [11] were the first to provide a competitive algorithm. They proposed the weighted nearest neighbor algorithm (WNN), called ShortCut, and showed that it achieves the competitive ratio of 16 for planar graphs. WNN is parameterized by a fixed positive constant Δ. It chooses the next new node to visit by using the weighted distance: see Fig. 1. Suppose that we are currently at a node u and there is a new node v directly connected to u by an edge (u, v) of length ℓ. Also suppose that to visit another new node w we have to traverse the already explored path of length ℓ′ and the edge of length ℓ′′ connecting to w. In the weighted nearest neighbor scenario we visit v if Δℓ ≤ ℓ′ + ℓ′′; otherwise, we visit w. The above competitive ratio of 16 is obtained by setting Δ = 3, which implies that, somewhat surprisingly, the so-called backward-first algorithm provides a good competitive ratio.
Fig. 1. Weighted nearest neighbor algorithm
1.1 Our Contributions
One can easily see that if the graph G is a tree, then we can obtain an optimal tour by the depth-first algorithm. However, if the graph includes cycles, the problem becomes non-trivial. In this paper we apply WNN to cycles and investigate its ability more intensively. Our results are summarized as follows:

– NN (i.e., WNN with Δ = 1) achieves the competitive ratio of 1.5.
– Our analysis of the 1.5-competitive ratio is tight, since we can provide an instance for which the bound of 1.5 is attained.
– Setting Δ = 1 for WNN is the best, i.e., if Δ > 1, or even 0 < Δ < 1, then the competitive ratio of WNN is at least 1.5.
– No deterministic online algorithm has a competitive ratio less than 1.25.

1.2 Related Work
Exploration problems in an unknown environment have been studied extensively in the past. The seminal work of [14] established theoretical studies of exploration problems in unknown geometric and graph environments. Papadimitriou and Yannakakis [14] considered the problem of finding a shortest path between specified two nodes in an unknown environment, and provided several variants of this problem. Deng and Papadimitriou [5], Albers and Henzinger [1], and Fleischer and Trippen [7] considered the problem of exploring an unknown graph. It should be noted that their goal is to minimize the number of edge traversals, not to minimize the distance of the searcher’s tour. They reported the relationship between the competitive ratio and the deficiency of the graph, which is the minimum number of edges that must be added to make the graph Eulerian. Deng, Kameda, and Papadimitriou [6], Kleinberg [12], and Hoffmann, Icking, Klein, and Kriegel [9] presented online strategies that enable a mobile robot with vision to explore an unknown polygon (with or without holes). Subsequent to the work in [9], Hoffmann et al. [10] proved that the online searcher’s tour is less than 26.5 times as long as the shortest watchman tour. Recently, several papers have also dealt with other online formulations of the traveling salesperson problem and online routing problems (e.g., [2,4,13]).
2 Models and Algorithms
We assume that an (unknown) environment is an edge-weighted undirected graph G = (V, E, ℓ), where V, E and ℓ denote a set of nodes, a set of edges, and a positive edge-weight function ℓ : E → R+, respectively. We denote an undirected edge between nodes u and v by (u, v), with ℓ(u, v) being the length of (u, v). Let ℓmax be the maximum length of an edge, and let L be the total length Σ_{(u,v)∈E} ℓ(u, v) of all edges. Let a simple path of length k be denoted by a sequence v0, v1, v2, · · · , vk of k + 1 nodes. Similarly, let a cycle be denoted by v0, v1, · · · , vk−1, v0. For a path P = v0, v1, . . . , vk, P̄ denotes the path in reverse order of P, i.e., vk, . . . , v1, v0. If a searcher starts from a node v1 and moves to a node v3 through an intermediate node v2, then the searcher's tour is denoted by v1, · · · , v2, · · · , v3. Now our problem is formulated as follows:

Online Graph Exploration
Initial information: Only partial information about the graph G is given to the searcher: the origin o, its neighbor nodes, and the length of the edge (o, u) for each neighbor node u.
Online information: When the searcher visits a node u, its neighbor nodes v, and the lengths of the edges (u, v), are obtained.
Goal: Find a tour T of minimum total length, beginning at the origin o, visiting all the nodes of G, and finally returning to the origin o.

We assume that the searcher has a sufficiently large memory and can memorize a map of the subgraph induced by the nodes already visited. Since the searcher lacks complete information about the environment, it is generally not possible to find an optimal tour. Thus, the searcher's goal is to construct a tour that is as close to the optimal tour as possible. The performance of online algorithms is commonly analyzed by competitive analysis: Let ALG(G) denote the total length of the exploring tour of G taken by algorithm ALG, and let OPT(G) denote the total length of the shortest exploring tour taken by an optimal offline algorithm OPT. We say that ALG is α-competitive for a class of graphs G if ALG(G)/OPT(G) ≤ α for all graphs G ∈ G. The ratio α is called the competitive ratio. We say that ALG is competitive if ALG is c-competitive for some constant c. To explain the algorithm, we follow the terms on statuses of nodes and edges introduced in [11]:

Definition 1 (Node status). Each node is classified into one of three statuses: (i) Visited: a visited node is one that has been visited by the searcher. (ii) Boundary: a boundary node is an unvisited node adjacent to a visited one. (iii) Unknown: an unknown node is one that the searcher has not yet seen.

Definition 2 (Edge status). Each edge is classified into one of three statuses: (i) Explored: an edge is explored if both endpoints are visited. (ii) Boundary: a boundary edge is one for which exactly one endpoint is visited. (iii) Unknown: an unknown edge is one for which neither endpoint is visited.
Whenever we refer to a boundary edge, say (u, v), the first node u is the visited one. Let d(x, y) denote the length of the shortest path between nodes x and y that uses only explored and boundary edges. The following definition is also taken from [11].

Definition 3 (Block condition). When the searcher is at node u, a boundary edge (x, y) blocks a boundary edge (u, v) if both of the following inequalities hold for some positive constant Δ (which is determined in the algorithm):

ℓ(x, y) < ℓ(u, v),  (1)
d(u, x) + ℓ(x, y) < Δ · ℓ(u, v).  (2)
Such a boundary edge (x, y) is called a block edge for (u, v). If there are no block edges for (u, v), then (u, v) is block-free. Note that for Δ ≤ 1 the first inequality holds whenever the second one is satisfied.

Now we present the weighted nearest neighbor algorithm (WNN) formally. At the beginning, the searcher knows only the origin o and the boundary edges adjacent to o. The searcher selects one of these boundary edges and traverses it. Similarly, at any time, the searcher has to decide which boundary edge to traverse next, according to its current knowledge and some policy. In the following algorithm the searcher always selects either the minimum block-free edge or the minimum block edge (a simulation sketch is given at the end of this section):

Algorithm WNNΔ
WNNΔ consists of n steps. For i = 1, ..., n − 1, the i-th step is as follows:
Step i: Suppose that the searcher is currently staying at a node u. Then either (i) or (ii) is executed: (i) If there are block-free edges, then the searcher selects the minimum block-free edge (u, v) and visits the boundary node v. (ii) If there are no block-free edges, then the searcher finds a boundary edge (x, y) such that d(u, x) + ℓ(x, y) is minimum among all boundary edges, and visits the boundary node y.
Step n: The searcher finally returns to the origin o along the shortest path, and the algorithm terminates.

Note that WNN1 is exactly the nearest neighbor algorithm. Recall that WNNΔ(G) denotes the total length of the tour taken by WNNΔ. If the graph environment contains no cycles, then exploration is easy; the depth-first algorithm is optimal:

Proposition 1. The depth-first algorithm is optimal, i.e., 1-competitive, for trees.

Unfortunately, one can easily see that the depth-first algorithm is not competitive even for cycles. The nearest neighbor algorithm WNN1 has the following lower bound:

Proposition 2 ([15]). There is a planar graph G for which WNN1(G) = Ω(log n) · OPT(G).

However, as mentioned in Sect. 1, WNN3 is competitive for planar graphs:

Proposition 3 ([11]). WNN3 is 16-competitive for planar graphs.
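To make the selection rule concrete, the following is a minimal Python sketch of WNNΔ (our own code, not the authors' implementation). The unknown environment is a plain adjacency dictionary, and the online restriction is simulated by running Dijkstra only over explored and boundary edges. We read Definition 3 as applying to the boundary edges at the searcher's current node, which is how the proofs in Sect. 3 trace the algorithm; all function names are ours, and the graph is assumed connected with positive edge lengths.

```python
import heapq

def shortest_dists(adj, source):
    # Dijkstra over the currently known (explored + boundary) edges.
    dist, pq = {source: 0.0}, [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def known_subgraph(graph, visited):
    # Edges with at least one visited endpoint are known to the searcher.
    adj = {}
    for u in visited:
        for v, w in graph[u].items():
            adj.setdefault(u, {})[v] = w
            adj.setdefault(v, {})[u] = w
    return adj

def wnn(graph, origin, delta):
    """Simulate WNN_delta on `graph` (node -> {neighbor: length}).
    Returns the total length of the searcher's tour."""
    visited, pos, total = {origin}, origin, 0.0
    while len(visited) < len(graph):
        adj = known_subgraph(graph, visited)
        dist = shortest_dists(adj, pos)     # d(pos, .) over known edges
        boundary = [(u, v) for u in visited
                    for v in graph[u] if v not in visited]

        def blocked(v):
            # Block condition (Def. 3) for the boundary edge (pos, v).
            return any(graph[x][y] < graph[pos][v]
                       and dist[x] + graph[x][y] < delta * graph[pos][v]
                       for x, y in boundary if (x, y) != (pos, v))

        free = [v for u, v in boundary if u == pos and not blocked(v)]
        if free:   # rule (i): traverse the minimum block-free edge at pos
            v = min(free, key=lambda v: graph[pos][v])
            total += graph[pos][v]
        else:      # rule (ii): minimize d(pos, x) + l(x, y) over boundary edges
            x, v = min(boundary, key=lambda e: dist[e[0]] + graph[e[0]][e[1]])
            total += dist[x] + graph[x][v]
        visited.add(v)
        pos = v
    # Step n: return to the origin along the shortest known path.
    total += shortest_dists(known_subgraph(graph, visited), pos)[origin]
    return total
```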
3 Competitive Ratio of WNN for Cycles
Let Cn = (V, E, ℓ) be a cycle with n nodes and n edges. Recall that L is the total length of the edges in E and ℓmax is the maximum edge length. An optimal tour has the following simple characterization:

Proposition 4. For a cycle Cn and an optimal offline algorithm OPT, OPT(Cn) = L if ℓmax ≤ L/2; otherwise, i.e., if ℓmax > L/2, OPT(Cn) = 2(L − ℓmax).

Proof. Every tour is either a simple cycle traversing all n edges, or a U-turn tour using only n − 1 different edges. If ℓmax ≤ L/2, an optimal offline algorithm OPT traverses every edge clockwise (or counterclockwise), and the total length is L, since L = 2(L − L/2) ≤ 2(L − ℓmax). If ℓmax > L/2, each edge of the minimum spanning path is included exactly twice in an optimal tour, and the total length is 2(L − ℓmax) < L. ⊓⊔

Theorem 1. WNN1 is 1.5-competitive for cycles.

Proof. We consider two cases: (Case 1) ℓmax ≤ L/2, and (Case 2) ℓmax > L/2.

(Case 1) Recall that WNN1 has n steps, denoted Step 1, ..., Step n. For each i (1 ≤ i ≤ n), let di denote the distance moved by the searcher at Step i, so that the total distance is WNN1(Cn) = Σ_{i=1}^{n} di. Observe that at each of Steps 1 through n − 1 there are exactly two boundary edges. For each i (2 ≤ i ≤ n − 1), define the forward boundary edge at Step i, denoted fi, to be the boundary edge (u, v), where u is the node the searcher currently occupies; f1 is defined below.

Let us first consider Step 1. The searcher is at the origin o, and there are two boundary nodes, say v1 and v2, connected to o by the edges (o, v1) and (o, v2), respectively. Suppose that v1 is the nearest neighbor of o, so the searcher selects (o, v1). Then define f1 = (o, v1); hence d1 = ℓ(f1).

Next, consider Step i (2 ≤ i ≤ n − 1). Suppose again that u is the node the searcher currently occupies. Let v and y be the boundary nodes, such that (u, v) and (x, y) are the boundary edges and (u, v) = fi. There are two possibilities for selecting the next node to visit, but it is easy to see from Block condition (2) that in either case di ≤ ℓ(fi).

Finally, consider Step n. At this moment all edges are known to the searcher, and the searcher returns to the origin o along the shorter of the clockwise and counterclockwise routes. Thus dn is at most L/2.

It is easy to see that each edge can be fi for at most one i, i.e., fi ≠ fj for i ≠ j. Hence

d1 + d2 + ··· + dn−1 + dn ≤ ℓ(f1) + ℓ(f2) + ··· + ℓ(fn−1) + dn ≤ L + dn ≤ (3/2)L,

and the cost of the optimal tour is L by Proposition 4. This completes Case 1.

(Case 2) The analysis is almost the same as in Case 1, but the maximum-length edge must be treated carefully. Let (x, y) be this edge, i.e., ℓ(x, y) = ℓmax,
Fig. 2. Case 2 in the proof of Theorem 1
and suppose that (x, y) becomes the forward boundary edge at Step k, i.e., (x, y) = fk. For Steps 1 through k − 1 we use the same arguments as in Case 1, so di ≤ ℓ(fi) for 1 ≤ i ≤ k − 1. At the beginning of Step k, the searcher is at x, there are two boundary edges (x, y) and (u, v), and fk = (x, y) (see Fig. 2). By Block condition (2), the searcher next visits the node v; hence dk = d(x, u) + ℓ(u, v). At Step i (k + 1 ≤ i ≤ n − 1), the searcher explores fi, again by Block condition (2); hence di = ℓ(fi) for k + 1 ≤ i ≤ n − 1. It follows that

d1 + d2 + ··· + dn−1 ≤ (ℓ(f1) + ℓ(f2) + ··· + ℓ(fk−1)) + (d(x, u) + ℓ(u, v)) + (ℓ(fk+1) + ℓ(fk+2) + ··· + ℓ(fn−1)) ≤ 2(L − ℓmax),

since ℓ(f1) + ··· + ℓ(fk−1) + ℓ(fk+1) + ··· + ℓ(fn−1) ≤ L − ℓmax and d(x, u) + ℓ(u, v) ≤ L − ℓmax. Finally, at Step n, the searcher returns to the origin o over a distance of at most L − ℓmax. As a result, the total distance is at most 3(L − ℓmax), while the optimal one is 2(L − ℓmax). ⊓⊔

Theorem 2. For any Δ, the competitive ratio of WNNΔ is at least 1.5.

Theorem 2 follows from Lemmas 1 and 2 below.

Lemma 1. For 0 < Δ < 1, the competitive ratio of WNNΔ exceeds 1.5.

Proof. For each 0 < Δ < 1 we construct a cycle G for which WNNΔ attains a competitive ratio of more than 1.5. As illustrated in Fig. 3, the cycle G consists of the three edges (o, x), (x, y), (y, z) and a path P = o, q, p1, p2, ..., pk = z, where k is a positive integer satisfying k > 2Δ(3Δ − 1)/(1 − Δ)², and o is the origin. Let L1 = (3k + (6 − k)Δ)/Δ², L2 = (k + 2Δ − Δ²)/Δ², and L3 = k/Δ. The edge lengths are ℓ(o, x) = 1, ℓ(x, y) = L1, ℓ(y, z) = L2, ℓ(o, q) = ε, where 0 < ε < 1, ℓ(q, p1) = 1/Δ − ε, and ℓ(pi, pi+1) = 1/Δ for 1 ≤ i ≤ k − 1. Therefore the length of P is L3, the maximum edge length is ℓmax = L1, and the total edge length is L = L1 + L2 + L3 + 1 = 4(k + 2Δ)/Δ². Note that ℓmax = L1 > L/2.

First we consider the tour obtained by WNNΔ. Initially, the two edges (o, x) and (o, q) are boundary edges. Since (o, q) is block-free, the searcher explores (o, q)
Fig. 3. Lower bound for WNNΔ
to visit q. When the searcher arrives at q, the boundary edges are (q, p1) and (o, x). Since d(q, o) + ℓ(o, x) = ε + 1 > Δℓ(q, p1) = 1 − Δε, the edge (q, p1) is block-free, and therefore the searcher explores (q, p1) in the next step. Repeating this argument, the searcher explores all edges of P exactly once and arrives at the node z. One can see that ℓ(o, x) = 1 < L2 = ℓ(z, y), and since Δ < 1,

d(z, o) + ℓ(o, x) = L3 + 1 = k/Δ + 1 < k/Δ + 2 − Δ = ΔL2.

Thus conditions (1) and (2) of Definition 3 hold, and the edge (o, x) blocks the edge (z, y). Therefore the searcher moves to the node x through the reverse path P̄ of P. We show that the edge (x, y) is block-free when the searcher arrives at x, that is, Δℓ(x, y) < d(x, z) + ℓ(z, y). Recall that ℓ(x, y) = L1 and d(x, z) + ℓ(z, y) = 1 + L3 + L2 = L − L1. Then

d(x, z) + ℓ(z, y) − Δℓ(x, y) = L − (1 + Δ)L1 = (4k + 8Δ)/Δ² − (1 + Δ) · (3k + (6 − k)Δ)/Δ² = ((1 − Δ)²k − 2Δ(3Δ − 1))/Δ² > 0,

where the last inequality holds since k > 2Δ(3Δ − 1)/(1 − Δ)². Hence the searcher moves to the node y by exploring the edge (x, y) in the next step. At this moment all nodes are visited and the searcher has to go back to the origin o. Since L1 > L/2, the returning route must be y, z, pk−1, ..., p1, q, o.

In summary, under WNNΔ the searcher starts from o and moves as follows: (i) first it moves to z through the path P, (ii) it moves to x through the reverse path P̄ and the edge (o, x), (iii) it moves to y through the edge (x, y), and finally (iv) it goes back to o through the edge (y, z) and the path P̄. The lengths of the sub-tours (i),
Fig. 4. Lower bound for WNNΔ with Δ ≥ 1
(ii), (iii), and (iv) are L3, L3 + 1, L1, and L2 + L3, respectively, and hence the total length of the tour for G is

WNNΔ(G) = 3L3 + L1 + L2 + 1 = ((2 + Δ)/2)L − 4.

Let us proceed to the competitive analysis. From Proposition 4, the length of the optimal tour is 2(L − L1) = ((1 + Δ)/2)L − 4, since ℓmax = L1 > L/2. Therefore,

WNNΔ(G)/OPT(G) = ((2 + Δ)L − 8)/((1 + Δ)L − 8) > (2 + Δ)/(1 + Δ) > 1.5,

where the last inequality holds since Δ < 1. This completes the proof. ⊓⊔
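As a sanity check, the construction in this proof can be instantiated and fed to the wnn sketch from Sect. 2. The helper below is our own code (the helper name and the concrete ε are free choices); it builds the cycle of Fig. 3 and compares the simulated tour against the optimum from Proposition 4.

```python
def lemma1_instance(delta, eps=0.01):
    # Cycle from the proof of Lemma 1; requires 0 < delta < 1.
    k = max(1, int(2 * delta * (3 * delta - 1) / (1 - delta) ** 2) + 1)
    L1 = (3 * k + (6 - k) * delta) / delta ** 2
    L2 = (k + 2 * delta - delta ** 2) / delta ** 2
    cycle = ['o', 'q'] + [f'p{i}' for i in range(1, k)] + ['z', 'y', 'x']
    lengths = [eps, 1 / delta - eps] + [1 / delta] * (k - 1) + [L2, L1, 1]
    graph = {v: {} for v in cycle}
    for i, w in enumerate(lengths):
        u, v = cycle[i], cycle[(i + 1) % len(cycle)]
        graph[u][v] = graph[v][u] = w
    return graph, L1   # L1 is the maximum edge length

graph, lmax = lemma1_instance(delta=0.5)
L = sum(w for nbrs in graph.values() for w in nbrs.values()) / 2
opt = 2 * (L - lmax)            # Proposition 4, since lmax > L/2
print(wnn(graph, 'o', 0.5) / opt)   # about 1.73 here; exceeds 1.5 for all 0 < delta < 1
```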
Lemma 2. For Δ ≥ 1, the competitive ratio of WNNΔ is at least 1.5.

Proof. For each Δ ≥ 1 we construct a cycle G for which WNNΔ attains a competitive ratio of at least 1.5. As illustrated in Fig. 4, the cycle G is o, a1, a2, ..., am, bm, bm−1, ..., b1, o, with 2m + 1 nodes and 2m + 1 edges, for some large m. The edge lengths are as follows: ℓ(o, a1) = 1 − ε, where ε is a small positive constant, ℓ(ai, ai+1) = 2^{2i−1} for 1 ≤ i ≤ m − 1, ℓ(o, b1) = 1, ℓ(bi, bi+1) = 2^{2i} for 1 ≤ i ≤ m − 1, and ℓ(am, bm) = (2^{2m−1} − 2)/3 + ε. One can verify that the total length is L = (2^{2m+1} − 2)/3 and the maximum edge length is ℓmax = ℓ(bm−1, bm) ≤ L/2.

At the first step of WNNΔ, the edge (o, a1) is block-free, and so the searcher visits a1. Now the two edges (o, b1) and (a1, a2) are the boundary edges. Since ℓ(o, b1) < ℓ(a1, a2) and d(a1, o) + ℓ(o, b1) < Δℓ(a1, a2), the searcher visits b1 at the next step. In general, if the searcher is currently on ai, then it visits bi at the next step, since ℓ(bi−1, bi) < ℓ(ai, ai+1) holds for these two boundary edges and the following inequality holds because Δ ≥ 1:

d(ai, bi−1) + ℓ(bi−1, bi) = (1 − ε) + 1 + 2¹ + 2² + ··· + 2^{2i−2} = 2^{2i−1} − ε < Δ · 2^{2i−1} = Δℓ(ai, ai+1).

Similarly, if the searcher's current node is bi, then the next one is ai+1, by the two inequalities ℓ(ai, ai+1) < ℓ(bi, bi+1) and d(bi, ai) + ℓ(ai, ai+1) < Δℓ(bi, bi+1) on
the boundary edges (ai, ai+1) and (bi, bi+1). Therefore, the length of the "snakelike" tour o, a1, o, b1, o, a1, a2, a1, o, b1, b2, b1, o, ..., bm−1, ..., o, ..., am from the origin o to the leftmost node am is

(1 − ε) + (2 − ε) + (4 − ε) + ··· + (2^{2m−2} − ε) = Σ_{i=0}^{2m−2} 2^i − (2m − 1)ε = 2^{2m−1} − 1 − (2m − 1)ε.

The final node to be visited is bm, and the boundary edges at this moment are (am, bm) and (bm−1, bm). Since ℓ(am, bm) ≤ ℓ(bm−1, bm), condition (1) of Definition 3 is not satisfied, so the edge (am, bm) is block-free. Hence the searcher visits bm by exploring the edge (am, bm) at the next step. The final part of the tour is the return from bm to the origin o. Since

d(bm, o) = ℓ(o, b1) + ℓ(b1, b2) + ··· + ℓ(bm−1, bm) = Σ_{i=0}^{m−1} 2^{2i} = (2^{2m} − 1)/3 = L/2,

the length of the returning route is (2^{2m} − 1)/3. Therefore the total length of the tour obtained by WNNΔ is

WNNΔ(G) = (2^{2m−1} − 1 − (2m − 1)ε) + ((2^{2m−1} − 2)/3 + ε) + (2^{2m} − 1)/3 = 2^{2m} − 2 − 2(m − 1)ε.

On the other hand, OPT(G) = L = (2^{2m+1} − 2)/3 by Proposition 4, since ℓmax = ℓ(bm−1, bm) ≤ L/2. Therefore,

WNNΔ(G)/OPT(G) = 3(2^{2m} − 2 − 2(m − 1)ε)/(2(2^{2m} − 1)) = 1.5 − o(1).

This completes the proof. ⊓⊔
4 Lower Bound for Deterministic Algorithms
In this section we show a lower bound of 1.25 on the competitive ratio of deterministic graph exploration algorithms, by constructing a pair of cycles as an adversary.

Theorem 3. No online graph exploration algorithm has a competitive ratio less than 1.25.
Fig. 5. Lower bound for deterministic algorithms: (a) C4a and (b) C4b
Proof. Consider the two cycles with four nodes, (a) C4a and (b) C4b, illustrated in Fig. 5; ℓ(v2, v3) = 3 in C4a, but ℓ(v2, v3) = ε0 in C4b, where ε0 is a small positive constant. An optimal algorithm OPT for C4a starts at the origin o, explores v1, v2, and v3 in this order, and finally returns to o; the total length is OPT(C4a) = 8. For C4b, an optimal tour is o, v1, o, v3, v2, v3, o, and hence OPT(C4b) = 4 + 2ε0.

Let ALG be an arbitrary deterministic online algorithm. When C4a or C4b is given to ALG, its searcher is at the origin o and knows the two boundary edges (o, v1) and (o, v3), both of length 1. Without loss of generality, we may assume that ALG visits v1 in the first step. At this moment ALG is at v1 but cannot distinguish C4a from C4b; there are two boundary nodes, v2 and v3. We prepare two scenarios depending on ALG's next action. If ALG chooses v2 next, then the adversary reveals C4b. The shortest possible tour of ALG is then o, v1, v2, v3, o, so ALG(C4b) = 5 + ε0, and the competitive ratio of ALG is at least (5 + ε0)/(4 + 2ε0) = 1.25 − ε, where ε = 3ε0/(8 + 4ε0). On the other hand, if ALG chooses v3, the adversary reveals C4a. The shortest possible tour of ALG is then o, v1, o, v3, v2, v3, o, so ALG(C4a) = 10, and the competitive ratio of ALG is at least 10/8 = 1.25. ⊓⊔
References

1. Albers, S. and Henzinger, M.R.: Exploring Unknown Environments. SIAM J. Computing 29 4 (2000) 1164–1188
2. Ausiello, G., Feuerstein, E., Leonardi, S., Stougie, L., and Talamo, M.: Algorithms for the On-Line Traveling Salesman. Algorithmica 29 4 (2001) 560–581
3. Berman, P.: On-Line Searching and Navigation. In Online Algorithms: The State of the Art, Fiat and Woeginger (eds), Springer (1998) 232–241
4. Bose, P., Brodnik, A., Carlsson, S., Demaine, E.D., Fleischer, R., López-Ortiz, A., Morin, P., and Munro, J.I.: Online Routing in Convex Subdivisions. International J. Computational Geometry and Applications 12 4 (2002) 283–296
5. Deng, X. and Papadimitriou, C.H.: Exploring an Unknown Graph. In Proc. 31st Annual Symposium on Foundations of Computer Science (1990) 355–361
6. Deng, X., Kameda, T., and Papadimitriou, C.H.: How to Learn an Unknown Environment. In Proc. 32nd Annual Symposium on Foundations of Computer Science (1991) 298–303
7. Fleischer, R. and Trippen, G.: Exploring an Unknown Graph Efficiently. In Proc. 13th Annual European Symposium on Algorithms (2005) 11–22
8. Fiat, A. and Woeginger, G.J.: Competitive Analysis of Algorithms. In Online Algorithms: The State of the Art, Fiat and Woeginger (eds), Springer (1998) 1–12
9. Hoffmann, F., Icking, C., Klein, R., and Kriegel, K.: A Competitive Strategy for Learning a Polygon. In Proc. 8th Annual ACM-SIAM Symposium on Discrete Algorithms (1997) 166–174
10. Hoffmann, F., Icking, C., Klein, R., and Kriegel, K.: The Polygon Exploration Problem. SIAM J. Computing 31 2 (2001) 577–600
11. Kalyanasundaram, B. and Pruhs, K.R.: Constructing Competitive Tours from Local Information. Theoretical Computer Science 130 (1994) 125–138
12. Kleinberg, J.M.: On-Line Search in a Simple Polygon. In Proc. 5th Annual ACM-SIAM Symposium on Discrete Algorithms (1994) 8–15
13. Kranakis, E., Singh, H., and Urrutia, J.: Compass Routing on Geometric Networks. In Proc. 11th Canadian Conference on Computational Geometry (1999) 51–54
14. Papadimitriou, C.H. and Yannakakis, M.: Shortest Paths without a Map. Theoretical Computer Science 84 (1991) 127–150
15. Rosenkrantz, D.J., Stearns, R.E., and Lewis, P.M.: An Analysis of Several Heuristics for the Traveling Salesman Problem. SIAM J. Computing 6 3 (1977) 563–581
Straightening Drawings of Clustered Hierarchical Graphs⋆

Sergey Bereg1, Markus Völker2,⋆⋆, Alexander Wolff2,⋆, and Yuanyi Zhang1

1 Dept. of Computer Science, University of Texas at Dallas, U.S.A. {besp,yzhang}@utdallas.edu
2 Fakultät für Informatik, Universität Karlsruhe, Germany [email protected] http://i11www.ira.uka.de/people/awolff
Abstract. In this paper we deal with making drawings of clustered hierarchical graphs nicer. Given a planar graph G = (V, E) with an assignment of the vertices to horizontal layers, a plane drawing of G (with y-monotone edges) can be specified by stating for each layer the order of the vertices lying on and the edges intersecting that layer. Given these orders and a recursive partition of the vertices into clusters, we want to draw G such that (i) edges are straight-line segments, (ii) clusters lie in disjoint convex regions, (iii) no edge intersects a cluster boundary twice. First we investigate fast algorithms that produce drawings of the above type if the clustering fulfills certain conditions. We give two fast algorithms with different preconditions. Second we give a linear programming (LP) formulation that always yields a drawing that fulfills the above three requirements—if such a drawing exists. The size of our LP formulation is linear in the size of the graph.
1 Introduction
A graph is often associated with structural information that needs to be made explicit when drawing the graph. There are many ways in which structure can be given, but usually it comes in one of two ways: clusters or hierarchies. A clustering of a graph is a (possibly recursive) partition of the vertex set into so-called clusters. The vertices in the same cluster are interpreted as being similar or close, those in different clusters as different or far from each other in some sense. It is common to visualize disjoint clusters by placing their vertices in disjoint convex regions. For example in the Ptolemy II project (heterogeneous modeling, simulation, and design of concurrent systems), clustered graphs are used to represent (possibly nested) parts of embedded systems, see Fig. 1. Hierarchies also partition the vertex set, but not according to proximity, but according to rank. The rank of a vertex reflects its importance or status in relationship to vertices of lower or higher rank. Usually vertices of equal rank are placed on horizontal
⋆ A preliminary version of this work was presented as a poster at SOFSEM'06.
⋆⋆ Supported by grant WO 758/4-2 of the German Research Foundation (DFG).
Fig. 1. A Ptolemy-II model: vertices represent actors, edges communication
Fig. 2. The organigram of Hogeschool Limburg: vertices represent administrative entities, edges interaction
lines, to which we refer as layers. Examples of hierarchical graphs are so-called organigrams that are used to represent the structure of organizations, see Fig. 2. For both clustered and hierarchical graphs there is an abundance of literature. Brockenauer and Cornelsen give an overview [1]. In this paper we deal with graphs that have both clusters and a hierarchy. This makes it possible to visualize two different graph structures at the same time. The challenging question is how to overlay these different and independent structures. We model the problem as follows. Given a planar graph G = (V, E) with an assignment of the vertices to horizontal layers, a plane drawing of G (with polygonal or y-monotone edges) can be specified by stating for each layer the order of the vertices lying on and the edges intersecting that layer. Given these orders and a recursive partition of the vertices into clusters, our aim is to draw G such that (i) edges are straight-line segments, (ii) clusters lie in disjoint convex regions, and (iii) no edge intersects a cluster boundary twice. Our first contribution consists of two fast algorithms that draw clustered hierarchical graphs if certain preconditions are met. Both algorithms require that the left-to-right ordering of the clusters is consistent, i.e., the precedence relationship of the clusters is the same over all layers. The first algorithm runs in O(n2 ) time and additionally relies on the cluster adjacency graph (to be defined later) being acyclic, see Section 3. The second algorithm runs in linear time and requires that clusters can be separated by y-monotone paths, see Section 4. The preconditions for both algorithms can be tested in linear time. Our second contribution is a linear programming (LP) formulation that always yields a drawing if a drawing with straight-line edges and non-intersecting convex cluster regions exists, see Section 5. The number of variables and constraints in our LP formulation is linear in the size of the graph. If either of the above-mentioned constraints is satisfied, the existence of the corresponding algorithm shows that the LP formulation also yields a drawing. The LP is obviously less efficient than the above algorithms, but it is more general, more flexible, and yields nicer results due to global optimization. The LP allows the user to incorporate esthetic criteria. For example, one can use additional constraints to enforce minimum vertex-vertex distances. We also suggest two different objective
functions; one minimizes the width of the drawing, the other tries to maximize the angular resolution. The LP can also draw non-planar graphs; it keeps exactly the crossings of the input graph. We extend the basic LP to be able to process rectangular vertices as in Fig. 2. We have implemented the LP and applied it to a number of planar and non-planar graphs, see Fig. 10. Our implementation can be tested via a Java applet under the URL http://i11www.ira.uka.de/clusteredgraph/.

Our work builds on the seminal work of Eades et al. [2]. They define a clustered graph to be compound planar (c-planar) if it admits a drawing with no edge crossings or edge-region crossings, where the regions are the convex hulls of the clusters. They present an algorithm that draws clustered c-plane graphs, i.e., c-planar graphs given a c-planar embedding. (An embedding is defined by the counter-clockwise order of the edges incident to a vertex and by specifying the outer face.) From the embedding they compute a special st-numbering, the so-called c-st numbering, which maps each vertex v of G to a unique layer λ(v), i.e., an integer y-coordinate from the set {1, . . . , n}. The layer assignment is such that the vertices that belong to the same cluster occupy consecutive layers. The assignment is then used to draw the graph as a hierarchical graph with straight-line edges. Since each cluster occupies a range of consecutive layers, the convex hulls of different clusters do not intersect. Moreover, since each cluster is assumed to be connected and the algorithm for drawing hierarchical graphs does not produce edge crossings, no edge intersects a cluster hull more than once.

A drawback of the algorithm of Eades et al. for drawing clustered graphs is that it produces a drawing of height n for any n-vertex graph. For example, it draws the graph of a k × k square grid on k² horizontal lines, although this graph can easily be drawn on k lines. Eades et al. list vertical compaction among the important problems for further research concerning the drawing of clustered graphs. Vertical compaction can be divided into two steps: (a) assign vertices to layers and (b) draw the hierarchical graph. This paper deals with step (b). Concerning the first step, Bachmaier and Forster [3] have shown how to check in O(kn) time whether a graph has a planar k-level embedding. If an embedding exists, it is computed within the same time bound. However, they restrict the input to proper layer-connected single-source graphs. A hierarchical graph is proper if no edge crosses a layer, i.e., if |λ(u) − λ(v)| = 1 for every edge uv. A clustered hierarchical graph is layer-connected if in each cluster each pair of consecutive layers is spanned by an edge of the cluster. A source is a vertex that is only connected to vertices of higher levels.

Rectangular cluster regions and rectilinear edges were considered by Sugiyama and Misue [5] and by Sander [4]. They give algorithms for drawing compound graphs, which generalize clustered graphs in that edges between clusters or between clusters and vertices are also allowed. Both algorithms extend the classical algorithm of Sugiyama et al. [6] for drawing hierarchical graphs. Like Eades et al. [2], Sugiyama and Misue [5] place each vertex on a separate horizontal level, while Sander [4] tries to produce more compact drawings.
2 Preliminaries
A clustered graph C = (G, T) consists of an undirected graph G = (V, E) and a rooted tree T = (VT, ET) such that the leaves of T are in one-to-one correspondence with the vertices of G. A subset C of V is called a cluster if C is the set of leaves of the subtree rooted at a vertex of VT. A drawing of a graph G = (V, E) assigns positions π: V → R² to the vertices of V and a simple Jordan curve joining π(u) and π(v) to each edge (u, v) ∈ E. A drawing is planar if the curves of different edges do not cross. We say that a drawing is weakly monotone if all curves are weakly monotone in y-direction, i.e., for each curve its intersection with a horizontal line is empty or connected; for strictly monotone curves the intersection must be empty or a single point. In other words, we allow horizontal edges between neighboring vertices on the same layer. A special case of monotone drawings are straight-line drawings, where all curves are straight-line segments.

A layered or hierarchical graph L = (G, λ) is given by a graph G = (V, E) and an assignment λ: V → {1, ..., k} of the vertices to horizontal layers y = 1, ..., y = k. For a hierarchical graph we define Vi to be the set of vertices on level i, i.e., Vi = {v ∈ V | λ(v) = i}, and Ei to be the set of edges crossing level i, i.e., Ei = {{u, v} ∈ E | (λ(u) − i)(λ(v) − i) < 0}. A monotone drawing D of G induces the x-order of Vi ∪ Ei, i.e., a bijection λi: Vi ∪ Ei → {1, 2, ..., ni}, where ni = |Vi ∪ Ei|. The layer assignment λ and the x-orders λ1, ..., λk induced by D yield another monotone drawing D′ of G, where each edge e = (u, v) is represented by a polygonal chain, namely the chain given by the point sequence (λi(e), i), (λi+1(e), i + 1), ..., (λj(e), j), where i = min{λ(u), λ(v)} and j = max{λ(u), λ(v)}. Note that D′ is plane if and only if D is plane.

In this paper we assume that we are given a clustered hierarchical c-plane graph (G, T, λ) including the x-orders λ1, ..., λk of a monotone planar drawing of G. Our aim is to investigate conditions under which we can efficiently determine a straight-line drawing of G that respects the x-orders and has convex cluster regions. Eades et al. [2] have given a linear-time algorithm that draws clustered c-plane graphs such that edges are drawn straight and cluster regions are convex. The main disadvantage of that algorithm is that it places each vertex on a unique layer. Eades et al. require that the curves in the given drawing are strictly y-monotone and that the subgraph induced by each cluster is connected; we only require weak monotonicity. The layer assignment that Eades et al. compute has the property that the vertices of each cluster are assigned to a set of consecutive layers. We do not require this. However, we require that the x-orders λi are consistent, i.e., for any pair of clusters C and C′ and any pair of layers i and j it holds that if λi(v) < λi(v′) then λj(w) < λj(w′) for all v, w ∈ C and v′, w′ ∈ C′.
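For concreteness, the level sets Vi and Ei from this definition can be computed directly; the following small Python sketch (our own helper, not from the paper) does so for a layered graph given as an edge list and a layer map λ.

```python
def levels(edges, lam, k):
    """V[i]: vertices on level i; E[i]: edges crossing level i,
    following the definitions of V_i and E_i above."""
    V = {i: sorted(v for v in lam if lam[v] == i) for i in range(1, k + 1)}
    E = {i: [(u, v) for u, v in edges
             if (lam[u] - i) * (lam[v] - i) < 0] for i in range(1, k + 1)}
    return V, E

# Example: a path 1-2-3 with lam = {1: 1, 2: 3, 3: 1};
# both edges (1, 2) and (2, 3) cross level 2.
print(levels([(1, 2), (2, 3)], {1: 1, 2: 3, 3: 1}, 3))
```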
3 Recursive Algorithm
In this section we make a stronger assumption on the x-orders of the vertices on each layer. Let F be the directed graph whose vertices correspond to clusters
and, for two clusters C and C′, there is an edge (C, C′) if there is a level i with λi(t) < λi(t′), where t is either a vertex of C or an edge incident to C, and t′ is either a vertex of C′ or an edge incident to C′. If F, the cluster adjacency graph, is acyclic, we say that the layer assignment λ is strongly consistent. Note that F is planar since G is c-planar. A c-plane clustered graph with a strongly consistent layer assignment can be triangulated in linear time such that the same layer assignment is strongly consistent in the resulting graph. We show that every triangulated hierarchical plane graph with a strongly consistent layer assignment admits a straight-line drawing with a prescribed external face that is the complement of a convex polygon P, i.e., R² \ P.

We borrow some terminology from Eades et al. [2]. Like Eades et al., we allow slightly more general polygons than convex polygons. Strictly speaking, in a convex polygon each vertex has an interior angle of less than 180°; we call such a vertex an apex. We also allow flat vertices, where the two incident edges form an angle of 180°. When we map vertices of the given c-plane graph G to those of the polygon P, we must be careful with these flat vertices. We say that a polygon P is feasible for G if (i) P is a convex polygon, and (ii) if abc is a face of G and the vertices a, b, and c are vertices of P, then they are not collinear. It is easy to construct a feasible polygon for the graph G; for example, all the vertices of the outer face can be made apexes of the polygon.

We present a recursive procedure to draw the graph G. Consider the cluster adjacency graph F defined above. Since F is acyclic, F has a topological ordering, i.e., an ordering C1, C2, ... of the clusters such that i < j for any edge (Ci, Cj) of F (a small sketch of this acyclicity test is given below, after Fig. 3). Note that F has linear size and can thus be sorted topologically in time linear in the number of vertices, i.e., clusters. The first cluster C1 has in-degree 0 in F. We split G into the subgraph G1 induced by the vertex set V1 of C1 and the subgraph G2 induced by V2 = V \ V1. Color the vertices in V1 black and those in V2 white, and call edges with a white and a black endpoint gray. By our choice of C1 there are exactly two gray edges e and e′ on the outer face. These are connected by a path of inner faces, each of which has exactly two gray edges. Now we split the polygon P into two polygons by a line ab in order to treat G1 and G2 recursively; we choose the points a and b anywhere on the edges e and e′, respectively, see Fig. 3. Our recursion is as follows:

1. Construct feasible polygons P1 and P2 for G1 and G2 that are separated by the line ab and are consistent with P.
2. For i = 1, 2 draw Gi in Pi recursively.
3. Draw the gray edges as straight-line segments.

Unfortunately, this may produce a drawing with crossing edges, see Fig. 5. We now show how this problem can be fixed by introducing dummy vertices.
Fig. 3. Split along ab
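To illustrate the strong-consistency test, here is a small Python sketch (our own code) that builds a simplified cluster adjacency graph from the given x-orders and topologically sorts it with Kahn's algorithm; it returns None when F has a cycle, i.e., when λ is not strongly consistent. For simplicity each item is mapped to a single cluster, and only consecutive pairs on a layer are used — the remaining edges of F follow by transitivity, so acyclicity is unaffected.

```python
from collections import defaultdict, deque

def cluster_order(orders, cluster_of):
    """orders[i]: items (vertices and crossing edges) of layer i, left to right.
    cluster_of: item -> cluster id. Returns a topological order of the
    clusters, or None if the adjacency graph F contains a cycle."""
    succ, indeg = defaultdict(set), defaultdict(int)
    clusters = set(cluster_of.values())
    for items in orders.values():
        for a, b in zip(items, items[1:]):
            ca, cb = cluster_of[a], cluster_of[b]
            if ca != cb and cb not in succ[ca]:
                succ[ca].add(cb)
                indeg[cb] += 1
    queue = deque(c for c in clusters if indeg[c] == 0)
    topo = []
    while queue:
        c = queue.popleft()
        topo.append(c)
        for d in succ[c]:
            indeg[d] -= 1
            if indeg[d] == 0:
                queue.append(d)
    return topo if len(topo) == len(clusters) else None
```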
Fig. 4. The three types of faces
We analyze the faces split by the line ab. Each of these faces contains at least one vertex in C1 and at least one vertex not in C1. Let cde be a face crossed by ab such that c is the only vertex in C1; the case where C1 contains two vertices of cde is symmetric. Without loss of generality we assume that cde is the clockwise order of the vertices. In general, there are three types of faces, depending on the layer assignment of the vertices c, d, and e, see Fig. 4: (A) λ(d) ≤ λ(c) ≤ λ(e); (B) λ(d) ≤ λ(e) and (λ(c) ≤ λ(d) or λ(c) ≥ λ(e)); and (C) λ(e) ≤ λ(d) ≤ λ(c).

Faces of type A can be handled by the above approach. We show that there are no faces of type C. Indeed, if C1 contains only one vertex c of a face as in Fig. 4 (c), then the edge ce crosses the layer of vertex d and the order λi(d) < λi(ce) is not consistent with the x-order of layer i = λ(d). Faces of type B, however, cause a problem, see the face uvw in Fig. 5. For each type-B face cde we do two things. First, we introduce a dummy vertex c′ at the intersection of the separating line ab and the layer of c, see Fig. 4 (b). Second, we add the triangle c′de to the graph G2. Then we connect each pair of consecutive dummy vertices by an edge and triangulate the new faces if they contain more than three vertices; the triangulation can be done arbitrarily. We construct a feasible polygon for G2 that contains the dummy points on ab and is consistent with P. Similarly we add vertices and faces to the graph G1 if there are faces of type B with two vertices in C1. Let G′1 and G′2 be the graphs constructed as above and let P′1 and P′2 be the corresponding convex polygons. Then it is not hard to see the following.

Lemma 1 (Recursion). The polygons P′1 and P′2 are feasible for the graphs G′1 and G′2, respectively.

The positions of the vertices of V in the polygons P′1 and P′2 can be used for the straight-line drawing of G. At the bottom level of the recursion the graph G contains only one cluster, and the linear-time drawing algorithm for hierarchical graphs by Eades et al. [2] can be used. In the full version of this paper we show the following theorem.
Fig. 5. After splitting along ab simple recursion does not work: edge uv and vertex w are not independent

Fig. 6. A clustered hierarchical graph without a monotone separating path
Theorem 1 (Algorithm). Let (G, T, λ) be a clustered hierarchical c-plane graph whose n vertices are assigned to k layers. If the layer assignment is strongly consistent, then a straight-line drawing with convex cluster regions can be computed in O(n²) time.
4 Separating-Path Algorithm
The algorithm of Section 3 is recursive and guarantees a c-plane drawing if the layer assignment is strongly consistent. However, the layer assignment of a clustered graph may fail to be strongly consistent even for two clusters. Therefore we now discuss an algorithm with a different requirement: we explore the possibility of splitting the graph along a path. A monotone separating path in a clustered hierarchical c-plane graph G = (V, E) is a path Π between two vertices on the boundary of G such that (i) the path is y-monotone, and (ii) the graph G − Π has two connected components G1 and G2 whose vertices are in different clusters, i.e., for any cluster Ci, Ci ∩ G1 = ∅ or Ci ∩ G2 = ∅. For example, the graph shown in Fig. 5 admits the monotone separating path tuvwx. Although there are clustered c-plane graphs without separating paths, see Fig. 6, the requirement is intuitive and does not seem too restrictive for practical applications.

Finding a monotone separating path. Suppose that G has only two clusters. We show how to detect whether they can be separated by a monotone separating path. An edge (u, v) with λ(u) ≤ λ(v) is separating if it separates the clusters in the slab λ(u) ≤ y ≤ λ(v). The boundary of G contains exactly two edges g1 and g2, called gates, whose endpoints are in different clusters. We want to find a y-monotone path Π between two vertices u1 and u2 such that ui (i = 1, 2) is an endpoint of gi and every edge of Π is separating.

We sweep the plane with a horizontal line l from top to bottom and maintain a list L of edges intersecting l. An edge e ∈ L is good if its part above l satisfies the definition of a separating edge; otherwise e is called bad. The good and bad edges satisfy the property that the list L consists of three sublists L1, L2, and L3 such that all good edges are in L2, see Fig. 7. We just store
the first and last good edge of L2. Suppose that l approaches layer i. In the list of vertices of layer i, we find two consecutive vertices a and b from different clusters, see Fig. 7. We proceed as follows:

1. Delete the edges that end at layer i. If a good edge ending at a or b is deleted, then it is a separating edge.
2. Reduce the list L2 using the positions of a and b in L.
3. Insert new edges into L. A new edge falls into L2 if it starts at a or b.

Fig. 7. Traversing layer i. Vertices a and b are consecutive vertices from different clusters. The separating edge is bold.

The sweep can be done in linear time since we maintain only the first and last edges of L2, and the rest is just a traversal of a planar graph. We create a directed graph G′ from the separating edges by orienting them from top to bottom. Any monotone separating path in G connects two vertices that belong to the different gates g1 and g2. A path connecting the gates in G′ can be found in linear time. Note that a separating path may not exist, see Fig. 6.

Shortcuts. We compute shortcuts in the separating path. There are two types of shortcuts—left and right. The shortest path using left (right) shortcuts is called the left path (resp. right path). We find two graphs G1 and G2 using these paths, see Fig. 8 (a). We draw the left and right paths using parallel line segments and compute drawings of G1 and G2 using the algorithm of Eades et al. [2].

Final drawing. Let ξ, δ > 0 be two parameters. We place the drawings of G1 and G2 at distance ξ from each other. The remaining vertices are placed on two arcs a1, a2 using distance δ, as shown in Fig. 8 (c). The values of ξ and δ are subject to the following restrictions. Consider a face abc. If two of its vertices b and c are in G2 (or G1), then the restriction is ξ < ξ0, see Fig. 9 (a). If exactly one vertex is in G2 (or G1), then the restriction is ξ < ξ1, see Fig. 9 (b). If a, b, and c lie on the arcs a1, a2, then the face abc is drawn correctly if δ is chosen small enough, see Fig. 9 (c). This procedure yields the following theorem.

Fig. 8. (a) Shortcuts in the separating path—the left path is ace, the right path abde; (b) recursive drawing of G1 and G2; (c) the two parameters ξ and δ
Fig. 9. Restrictions for the correct drawing of the face abc that belongs to G1 and G2: (a) ξ < ξ0, where ξ0 is the distance of x0 from G2; (b) ξ < ξ1, where ξ1 is derived from the condition that the slope of b′c is less than the slope of a′b′; (c) δ > 0 such that a is above bc
Theorem 2. Given a clustered hierarchical c-plane graph G with two clusters and a monotone separating path, a straight-line drawing of G with convex cluster regions can be computed in linear time.
5 Linear Programming Formulation
In this section we describe how a clustered hierarchical graph (G, T, λ) can be drawn "nicely", i.e., with straight-line edges and disjoint convex cluster regions. We give an LP formulation that decides whether the input graph has a nice drawing; note that this is always the case if the layer assignment is strongly consistent. If a nice drawing exists, then the objective function of our LP formulation yields an especially nice drawing. A great advantage of our LP in comparison with other algorithms is that it can handle unconnected and non-planar graphs: the edge crossings of the input are preserved and no new ones are produced. In the description of our LP formulation we only consider clusters on the top level of the cluster hierarchy, i.e., children of the root of T; clusters at lower levels can be treated analogously. We have implemented the LP; for results, see Fig. 10. Our LP can easily be adapted to more complex drawings, e.g., for graphs with labeled vertices and edges or for graphs with icons as vertices (see the full version).

For three points p = (px, py), q = (qx, qy), and r = (rx, ry) ≠ q in the plane, let their relative position RelPos(p, q, r) be defined by the following determinant:

RelPos(p, q, r) = | px py 1 |
                  | qx qy 1 |
                  | rx ry 1 |

Observe that RelPos(p, q, r) > 0 iff p lies to the left of the line from q to r. Note that the resulting constraints are linear if the y-coordinates of the points are known.
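Expanded along the first row, this determinant gives the standard orientation test; a one-line Python sketch (the helper name is ours):

```python
def rel_pos(p, q, r):
    # 3x3 determinant |px py 1; qx qy 1; rx ry 1| expanded along the top row:
    # positive iff p lies to the left of the directed line from q to r.
    (px, py), (qx, qy), (rx, ry) = p, q, r
    return px * (qy - ry) - py * (qx - rx) + (qx * ry - qy * rx)

assert rel_pos((0, 1), (0, 0), (1, 0)) > 0   # p lies left of the line q -> r
```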
5.1 Constraints
Our LP formulation has to fulfill three requirements. First, the x-orders λ1 , . . . , λk must be preserved. Second, the edges must be straight-line segments. Third, the convex hulls of the clusters must be disjoint.
For the first requirement we do the following. For each vertex v ∈ V we introduce a variable xv that will express the x-coordinate of v. Similarly, for each edge e ∈ E and each level y ∈ {1, ..., k} spanned by e we introduce the variable xe,y if the immediate predecessor or successor of e on level y is a vertex (and not another edge). Since on each level y the x-order of the vertices and the edges spanning the level is part of the input, we preserve this order by the constraints

xa < xb,  (1)
where a and b are either vertices or edge-level pairs and a is the immediate predecessor of b in the x-order λy. We can also use these constraints to enforce a certain minimum horizontal distance dmin between a and b:

xa + dmin ≤ xb.  (2)
Since each vertex is the immediate neighbor of at most two edges, the first requirement needs O(n) variables and constraints. For the second requirement we proceed as follows. For each pair of an edge e = {u, w} ∈ E and a level y ∈ {1, ..., k} for which we have introduced the variable xe,y above, we now add the constraint

RelPos((xe,y, y), u, w) = 0.  (3)
This makes sure that the intersection point of edge e and level y lies on the straight line through u and w. Since there are O(n) variables of type xe,y, the second requirement needs O(n) new constraints.

For the third requirement it is simple to come up with a solution that needs Θ(n³) constraints; we only need O(n) constraints, basically by observing that the cluster adjacency graph is planar. We introduce two new variables xij and Xij for each pair (Ci, Cj) of adjacent clusters, i.e., clusters with vertices v ∈ Ci and w ∈ Cj where v is the immediate predecessor of w in the x-order on level λ(v) = λ(w). Let {yij, ..., Yij} = λ(Ci) ∩ λ(Cj). The idea is to define two points pij = (xij, yij) and Pij = (Xij, Yij) such that the line segment from pij to Pij separates the two clusters Ci and Cj. To ensure this separation we introduce the following constraint for each vertex u with yij ≤ λ(u) ≤ Yij that is rightmost in Ci, i.e., xu > xu′ for all u′ ∈ Ci with λ(u) = λ(u′):

RelPos(pij, Pij, u) < 0.  (4)
The constraint for the leftmost vertices is symmetric. Since each vertex v ∈ V is leftmost or rightmost relative to at most two clusters, the number of constraints of this type is also linear. By construction, the system of Equations (1), (3), and (4) has a solution if and only if the clustered graph can be drawn nicely.
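To make the construction concrete, here is a sketch that assembles the variables and constraints (1)–(3) with the PuLP modeling library (our choice — the paper does not name a solver). The strict order constraints (1) are realized via the dmin margin of (2), the cluster-separation constraints (4) are omitted for brevity, and for simplicity a variable xe,y is introduced for every level an edge spans. All function and parameter names are ours.

```python
import pulp

def build_lp(vertices, lam, edges, order, dmin=1.0):
    """Variables and constraints (1)-(3) for a straight-line layered drawing.
    `order[y]` lists the items on layer y left to right; an item is either a
    vertex name or a pair ((u, w), y) for an edge (u, w) crossing layer y."""
    prob = pulp.LpProblem("straighten", pulp.LpMinimize)
    x = {v: pulp.LpVariable(f"x_{v}") for v in vertices}
    xe = {}
    for (u, w) in edges:
        lo, hi = sorted((lam[u], lam[w]))
        for y in range(lo + 1, hi):          # levels strictly inside the edge
            xe[((u, w), y)] = pulp.LpVariable(f"x_{u}{w}_{y}")
    var = lambda item: x[item] if item in x else xe[item]
    for y, items in order.items():           # constraints (1)/(2): keep x-order
        for a, b in zip(items, items[1:]):
            prob += var(a) + dmin <= var(b)
    for ((u, w), y), xv in xe.items():       # constraint (3): (x_{e,y}, y) lies
        yu, yw = lam[u], lam[w]              # on the line through u and w
        prob += (xv * (yu - yw) - y * x[u] + y * x[w]
                 + yw * x[u] - yu * x[w]) == 0
    prob += pulp.lpSum(x.values())           # placeholder objective
    return prob, x
```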
5.2 Objective Functions
If a nice drawing exists, then we would like to choose a particularly nice one. Therefore we try to produce balanced drawings, in which the angular space of
180° above and below each vertex is distributed uniformly among the respective vertices. We treat the vertices one by one. Let v be the current vertex. For each vertex u adjacent to v, an optimal position relative to v can easily be computed: for this purpose the adjacent vertices above and below v are distributed uniformly. As the vertical distances are fixed, we can calculate δ*uv, the optimal x-offset of u relative to v, using trigonometric functions. The actual horizontal offset δuv between u and v is given by δuv = xu − xv. The absolute difference µuv of δuv and δ*uv can now be expressed as follows:

µuv ≥ +δ*uv − δuv  and  µuv ≥ −δ*uv + δuv.  (5)

The variable µuv indicates how much the actual position of u relative to v differs from the ideal one. We normalize µuv:

µ̄uv = µuv / |yv − yu|.  (6)

Summing up µ̄uv over all edges {u, v} ∈ E yields the following objective function:

minimize Σ_{{u,v}∈E} (µ̄uv + µ̄vu).  (7)

Note that in general µ̄uv and µ̄vu differ. Instead of optimizing angles, it is also possible to optimize the width of the drawing. This is achieved by

µuv ≥ −δuv  and  µuv ≥ +δuv.  (8)
Recall that constraint (2) makes sure that the minimum distance between vertices is kept. Equation (6) and objective function (7) remain as before. For example drawings see Fig. 10. Note that graph G2 is not plane. Also note that H3 is not clustered; the drawing shows that our LP nicely keeps the symmetry.
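Continuing the PuLP sketch from Sect. 5.1, the absolute values in (5) linearize in the usual way with one non-negative auxiliary variable per directed edge. Again the helper and the δ* table are our own names, and we assume there are no horizontal edges (so |yu − yv| ≥ 1); the call should come last, since it sets the objective.

```python
def add_angle_objective(prob, x, edges, lam, delta_star):
    # delta_star[(u, v)]: precomputed optimal x-offset of u relative to v.
    terms = []
    for (u, v) in edges:
        for a, b in ((u, v), (v, u)):
            mu = pulp.LpVariable(f"mu_{a}_{b}", lowBound=0)
            d = x[a] - x[b]                        # actual offset delta_ab
            prob += mu >= delta_star[(a, b)] - d   # the two inequalities of (5)
            prob += mu >= d - delta_star[(a, b)]
            terms.append((1.0 / abs(lam[a] - lam[b])) * mu)   # normalization (6)
    prob += pulp.lpSum(terms)                      # objective (7)
```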
References

1. Brockenauer, R. and Cornelsen, S.: Drawing Clusters and Hierarchies. In M. Kaufmann and D. Wagner (eds), Drawing Graphs: Methods and Models, Springer-Verlag, Lecture Notes in Computer Science 2025 (2001) 193–227
2. Eades, P., Feng, Q., Lin, X., and Nagamochi, H.: Straight-Line Drawing Algorithms for Hierarchical Graphs and Clustered Graphs. Algorithmica 44 1 (2005) 1–32
3. Forster, M. and Bachmaier, C.: Clustered Level Planarity. In P. van Emde Boas, J. Pokorny, M. Bielikova, and J. Stuller (eds), Proc. 30th Int. Conf. Current Trends in Theory and Practice of Computer Science (SOFSEM'04), Springer-Verlag, Lecture Notes in Computer Science 2932 (2004) 218–228
4. Sander, G.: Graph Layout for Applications in Compiler Construction. Theoretical Computer Science 217 (1999) 175–214
5. Sugiyama, K. and Misue, K.: Visualization of Structural Information: Automatic Drawing of Compound Digraphs. IEEE Transactions on Systems, Man, and Cybernetics 21 4 (1991) 876–891
6. Sugiyama, K., Tagawa, S., and Toda, M.: Methods for Visual Understanding of Hierarchical System Structures. IEEE Transactions on Systems, Man, and Cybernetics 11 2 (1981) 109–125
Fig. 10. Graph drawings produced by our LP formulation: (a) graph H3 from [2], width optimized; (b) graph G1, input; (c) graph G1, width optimized; (d) graph G1, angle optimized; (e) graph G2, width optimized; (f) graph G2, angle optimized. Note that G2 is not plane.
Improved Upper Bounds for λ-Backbone Colorings Along Matchings and Stars

Hajo Broersma1, Bert Marchal2, Daniel Paulusma1, and A.N.M. Salman3

1 Department of Computer Science, Durham University, DH1 3LE Durham, United Kingdom {hajo.broersma,daniel.paulusma}@durham.ac.uk
2 Faculty of Economics and Business Administration, Department of Quantitative Economics, University of Maastricht, PO Box 616, 6200 MD Maastricht, The Netherlands [email protected]
3 Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung, Jalan Ganesa 10, Bandung 40132, Indonesia [email protected]
Abstract. We continue the study on backbone colorings, a variation on classical vertex colorings that was introduced at WG2003. Given a graph G = (V, E) and a spanning subgraph H of G (the backbone of G), a λ-backbone coloring for G and H is a proper vertex coloring V → {1, 2, ...} of G in which the colors assigned to adjacent vertices in H differ by at least λ. The main outcome of earlier studies is that the minimum number ℓ of colors for which such colorings V → {1, 2, ..., ℓ} exist is, in the worst case, a multiplicative factor times the chromatic number (for all studied types of backbones). We show here that for split graphs and matching or star backbones, ℓ is at most a small additive constant (depending on λ) higher than the chromatic number. Despite the fact that split graphs have a nice structure, these results are difficult to prove. Our proofs combine algorithmic and combinatorial arguments. We also indicate other graph classes for which our results imply better upper bounds on ℓ than the previously known bounds.
1 Introduction and Related Research
Coloring has been a central area in Graph Theory for more than 150 years. Some reasons for this are its appealingly simple definition, its large variety of open problems, and its many application areas. Whenever conflicting situations between pairs of objects can be modeled by graphs, and one is looking for a partition of the set of objects in subsets of mutually non-conflicting objects, this can be viewed as a graph coloring problem. This holds for classical settings such as neighboring countries (map coloring) or interfering jobs on machines (job scheduling), as well as for more recent settings like colliding data streams in optical networks (wavelength assignment) or interfering transmitters and receivers for broadcasting, mobile phones and sensors (frequency assignment), to name just a few. Except perhaps for the notorious map coloring problem, all of the
above settings play an important role in Computer Science as well, e.g., in areas like parallel and distributed computing, embedded systems, optical networks, sensor networks and mobile networks. Apart from these applications areas, graph coloring has been a central theme within Theoretical Computer Science, especially within Complexity Theory and the currently very popular area of Exact Algorithms. In [7] backbone colorings are introduced, motivated and put into a general framework of coloring problems related to frequency assignment. Graphs are used to model the topology and interference between transmitters (receivers, base stations): the vertices represent the transmitters; two vertices are adjacent if the corresponding transmitters are so close (or so strong) that they are likely to interfere if they broadcast on the same or ‘similar’ frequency channels. The problem is to assign the frequency channels in an economical way to the transmitters in such a way that interference is kept at an ‘acceptable level’. This has led to various different types of coloring problems in graphs, depending on different ways to model the level of interference, the notion of similar frequency channels, and the definition of acceptable level of interference (See, e.g., [15],[20]). We refer to [6] and [7] for an overview of related research, but we repeat the general framework and some of the related research here for convenience and background. Given two graphs G1 and G2 with the property that G1 is a spanning subgraph of G2 , one considers the following type of coloring problems: Determine a coloring of (G1 and) G2 that satisfies certain restrictions of type 1 in G1 , and restrictions of type 2 in G2 . Many known coloring problems fit into this general framework. We mention some of them here explicitly, without giving details. The first variant is known as the distance-2 coloring problem. Much of the research has been concentrated on the case that G1 is a planar graph. We refer to [1], [4], [5], [18], and [21] for more details. A closely related variant is known as the radio coloring problem and has been studied (under various names) in [2], [8], [9], [10], [11], [12], and [19]. A third variant is known as the radio labeling problem. We refer to [14] and [17] for more particulars. In the WG2003 paper [7], a situation is modeled in which the transmitters form a network in which a certain substructure of adjacent transmitters (called the backbone) is more crucial for the communication than the rest of the network. This means more restrictions are put on the assignment of frequency channels along the backbone than on the assignment of frequency channels to other adjacent transmitters. Postponing the relevant definitions to the next subsections, we consider the problem of coloring the graph G2 (that models the whole network) with a proper vertex coloring such that the colors on adjacent vertices in G1 (that models the backbone) differ by at least λ ≥ 2. This is a continuation of the study in [7] and [22]. Throughout the paper we consider two types of backbones: matchings and disjoint unions of stars. We give many details on the matching case (for
which the proofs are the most involved), but due to page limits we refrain from giving details for the other case (which is simpler). Matching backbones reflect the necessity of assigning considerably different frequencies to pairwise very close (or most likely interfering) transmitters. This occurs in real-world applications such as military scenarios, where soldiers or military vehicles carry two (or sometimes more) radios for reliable communication. For star backbones one could think of applications in sensor networks. If sensors have low battery capacities, the tasks of transmitting data are often assigned to specific sensors, called cluster heads, which represent pairwise disjoint clusters of sensors. Within a cluster there should be a considerable difference between the frequencies assigned to the cluster head and to the other sensors of the same cluster, whereas the differences between the frequencies assigned to the other sensors, within the cluster or between different clusters, are of secondary importance. This situation is well reflected by a backbone consisting of disjoint stars.

We concentrate on the case that G2 is a split graph, but we will indicate how our results can be used when G2 is a general graph, and for which types of graphs this is useful. The motivation for looking at split graphs is twofold. First, split graphs have nice structural properties, which lead to substantially better upper bounds on the number of colors in this context of backbone colorings. Second, every graph can be turned into a split graph by taking any (e.g., a maximum or maximal) independent set and turning the remaining vertices into a clique; the number of colors needed to color the resulting split graph is an upper bound on the number of colors needed for the original graph. We will indicate classes of non-split graphs for which our results also imply better upper bounds. Although split graphs have a very special structure, they are not completely artificial in the context of, e.g., sensor networks. As an example, consider a sensor network within a restricted area (like a lab) with two distinct types of nodes: weak sensors with a very low battery capacity, like heat sensors, smoke sensors, body tags, etc., and PCs, laptops, etc., with much stronger power properties. The weak sensors are very unlikely to interfere with one another (especially if they are put on fixed locations for a specific purpose), while the other equipment is likely to interfere (within this restricted area). Weak sensors interfere with pieces of the other equipment within their vicinity. In such cases, the situation can be modeled as a split graph.
1.1 Terminology and Previous Results
For undefined terminology we refer to [3]. Let G = (V, E) be a graph, where V = VG is a finite set of vertices and E = EG is a set of unordered pairs of two different vertices, called edges. A function f : V → {1, 2, 3, . . .} is a vertex coloring of V if |f (u) − f (v)| ≥ 1 holds for all edges uv ∈ E. A vertex coloring f : V → {1, . . . , k} is called a k-coloring, and the chromatic number χ(G) is the smallest integer k for which there exists a k-coloring. A set V ′ ⊆ V is independent if its vertices are mutually nonadjacent; it is a clique
if its vertices are mutually adjacent. By definition, a k-coloring partitions V into k independent sets V1, ..., Vk. Let H be a spanning subgraph of G, i.e., H = (VG, EH) with EH ⊆ EG. Given an integer λ ≥ 2, a vertex coloring f of G is a λ-backbone coloring of (G, H) if |f(u) − f(v)| ≥ λ holds for all edges uv ∈ EH. The λ-backbone coloring number bbcλ(G, H) of (G, H) is the smallest integer ℓ for which there exists a λ-backbone coloring f: V → {1, ..., ℓ}.

A star Sq is a complete 2-partite graph with independent sets V1 = {r} and V2 with |V2| = q; the vertex r is called the root and the vertices in V2 are called the leaves of the star Sq. In our context a matching M is a collection of pairwise disjoint stars that are all copies of S1. We call a spanning subgraph H of a graph G a star backbone of G if H is a collection of pairwise disjoint stars, and a matching backbone if H is a (perfect) matching.

Obviously, bbcλ(G, H) ≥ χ(G) holds for any backbone H of a graph G. We are interested in tight upper bounds for bbcλ(G, H) in terms of χ(G). In [22] it has been shown that the upper bounds in the case of star and matching backbones roughly grow like (2 − 1/λ)χ(G) and (2 − 2/(λ+1))χ(G), respectively. In all worst cases the backbone coloring numbers grow proportionally to a multiplicative factor times the chromatic number. Although these upper bounds in [22] are tight, they are probably only reached for very special graphs. To analyze this further, we turned to the special case of split graphs. This was motivated by the observation in [7] that for split graphs and tree backbones the 2-backbone coloring number differs by at most 2 from the chromatic number. We show a similar behavior for the general case with λ ≥ 2 and matching and star backbones in split graphs. This can have nice implications for upper bounds on the λ-backbone coloring numbers for matching and star backbones in other graphs, if they satisfy certain conditions.
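For very small instances these definitions can be checked exhaustively. The following Python sketch (our own illustration, exponential in the number of vertices) computes bbcλ(G, H) by brute force; for the single edge K2 with a matching backbone it returns λ + 1, matching case (i) of Theorem 1 below.

```python
from itertools import product

def bbc(graph_edges, backbone_edges, lam):
    """Smallest l admitting a lambda-backbone coloring f: V -> {1,...,l}."""
    vertices = sorted({v for e in graph_edges for v in e})
    l = 1
    while True:   # l = lam*(|V|-1)+1 always suffices, so this terminates
        for f in product(range(1, l + 1), repeat=len(vertices)):
            c = dict(zip(vertices, f))
            if all(c[u] != c[v] for u, v in graph_edges) and \
               all(abs(c[u] - c[v]) >= lam for u, v in backbone_edges):
                return l
        l += 1

print(bbc([(0, 1)], [(0, 1)], lam=2))   # K2, matching backbone: prints 3 = lam + 1
```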
1.2 New Results
A split graph is a graph whose vertex set can be partitioned into a clique and an independent set, with possibly edges between them. The size of a largest clique in G is denoted by ω(G). Split graphs were introduced by Hammer and Földes [16]; see also the book [13] by Golumbic. They form an interesting subclass of the class of perfect graphs. Hence, split graphs satisfy χ(G) = ω(G), and many NP-hard problems are polynomially solvable when restricted to split graphs. In Section 2 we present sharp upper bounds for the λ-backbone coloring numbers of split graphs with matching or star backbones. We apply them to certain other graphs, too. All upper bounds are only a small additive constant (depending on λ and, for non-split graphs, also on α(G)) higher than χ(G), in contrast to earlier results, which show a multiplicative factor times χ(G).
2 Matching and Star Backbones
In this section we present sharp upper bounds on the λ-backbone coloring numbers of split graphs along matching and star backbones. Our result on matching backbones is summarized in the next theorem which will be proved in Section 3.
Theorem 1. Let λ ≥ 2 and let G = (V, E) be a split graph with χ(G) = k ≥ 2. For every matching backbone M = (V, E_M) of G,

bbc_λ(G, M) ≤
  λ + 1           if k = 2,                                         (i)
  k + 1           if k ≥ 4 and λ ≤ min{k/2, (k+5)/3},               (ii)
  k + 2           if (k = 9 or k ≥ 11) and (k+6)/3 ≤ λ ≤ ⌈k/2⌉,     (iii)
  ⌈k/2⌉ + λ       if k = 3, 5, 7 and λ ≥ ⌈k/2⌉,                     (iv)
  ⌈k/2⌉ + λ + 1   if (k = 4, 6 or k ≥ 8) and λ ≥ ⌈k/2⌉ + 1.         (v)

All the bounds are tight.

We will now show how these results can yield upper bounds for non-split graphs. For this purpose we first implicitly define a function f by the upper bounds bbc_λ(G, M) ≤ f(λ, χ(G)) from the above theorem. Note that f is a nondecreasing function in λ and χ(G). Let G = (V, E) be a graph, let V1 ⊆ V be an independent set with |V1| = α(G), and let V2 = V \ V1. Let W be the subset of V1 consisting of vertices that are adjacent to all vertices in V2. If W is nonempty, then we choose one v ∈ W and move it to V2, i.e., V2 := V2 ∪ {v}. The meaning of this choice will become clear after the next sentence. Let S(G) be the split graph with clique V2 and independent set V1, where the edges between V1 and V2 are defined according to E. Since we moved one vertex from W to V2 in case W ≠ ∅, we guarantee that no vertex of V1 is adjacent to all vertices of V2. So χ(S(G)) = ω(S(G)) = |V(G)| − α(G) or χ(S(G)) = |V(G)| − α(G) + 1. Then we obtain:

bbc_λ(G, M) ≤ bbc_λ(S(G), M) ≤ f(λ, χ(S(G))) ≤ f(λ, |V(G)| − α(G) + 1).

These upper bounds are almost sharp in the sense that we have examples of sharpness for most values of λ and α(G), but we (still) have a discrepancy of 1 in some cases. We will present the tedious details in a full journal version of this paper. When can these bounds be useful for other (non-split) graphs? To answer this question, we should compare the new bound f(λ, |V(G)| − α(G) + 1) with the bound (2 − 2/(λ+1))χ(G) from [22]. To get some insight into situations for which this gives an improvement, we apply a very rough calculation in which we use that the first bound is roughly of order |V(G)| − α(G) (disregarding some additive constant depending on λ), and the second one is roughly of order 2χ(G) (disregarding the factor 2/(λ+1)). Adopting these rough estimates, the first bound is better than the second one whenever |V(G)| − α(G) ≤ 2χ(G). This is, of course, the case when G is a split graph, since then |V(G)| − α(G) ≤ ω(G) = χ(G). Now suppose we have a graph G with the following structure: an independent set I of G with cardinality α(G) shares at most one vertex with a clique C of G with cardinality ω(G), and r = |V(G) \ (I ∪ C)| ≤ ω(G). Then clearly |V(G)| − α(G) ≤ 2ω(G) ≤ 2χ(G). This gives large classes of non-split graphs for which the new bounds are better than the old bounds. A more careful analysis also gives an improvement if r is small compared to (1 − 2/(λ+1))ω(G) + λ. We omit the details.
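The construction of S(G) above is entirely mechanical, and the sketch below follows it step by step. Since computing α(G) is itself NP-hard, the independent set V1 is assumed to be supplied by the caller; the function name and the data representation are our own illustrative choices.

```python
def split_graph_S(vertices, edges, V1):
    """Build the split graph S(G) described above.

    vertices, edges: the graph G (edges as pairs of comparable vertex
    names); V1: an independent set of G with |V1| = alpha(G), supplied
    by the caller.  Returns the parts (V1, V2) and the edge set of
    S(G), in which V2 is turned into a clique and the edges between
    V1 and V2 are inherited from G.
    """
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    V1, V2 = set(V1), set(vertices) - set(V1)
    # W: vertices of V1 adjacent to all of V2.  Moving one of them to
    # V2 guarantees that afterwards no vertex of V1 is adjacent to all
    # vertices of V2.
    W = {v for v in V1 if V2 <= adj[v]}
    if W:
        w = min(W)
        V1.remove(w)
        V2.add(w)
    clique_edges = {(u, v) for u in V2 for v in V2 if u < v}
    cross_edges = {(u, v) for u in V1 for v in adj[u] & V2}
    return V1, V2, clique_edges | cross_edges
```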
For split graphs with star backbones we obtained the following result.

Theorem 2. Let λ ≥ 2 and let G = (V, E) be a split graph with χ(G) = k ≥ 2. For every star backbone S = (V, E_S) of G,

bbc_λ(G, S) ≤
  k + λ       if either k = 3 and λ ≥ 2, or k ≥ 4 and λ = 2,
  k + λ − 1   in the other cases.

The bounds are tight.

The proof of Theorem 2 has been postponed to the journal version of our paper. We can apply the results to obtain upper bounds for certain non-split graphs that improve the bounds in [22], in a similar way as we did in the case of matching backbones, using a function g(λ, χ(G)) which is implicitly defined by the upper bounds from Theorem 2. We omit the details.
3 Proof of Theorem 1
Given a graph G = (V, E) with a matching backbone M = (V, E_M), u ∈ V is called a matching neighbor of v ∈ V if uv ∈ E_M, denoted by u = mn(v). Throughout this section, G = (V, E) denotes a split graph, and V is assumed to be partitioned into a largest clique C and an independent set I. Moreover, |V| is assumed to be even to allow for a perfect matching in the graph G. The set of nonneighbors of a vertex u will be denoted by NN(u). Note that in G, every vertex of I has at least one nonneighbor in C (otherwise C would not be a largest clique). However, for a vertex u ∈ C, the set NN(u) may be empty. For some p ≤ α(G), a splitting set of cardinality p, named an s-set for short, is a subset {v1, …, vp} ⊆ I such that

( ⋃_{i=1,…,p} NN(v_i) ) ∩ ( ⋃_{i=1,…,p} {mn(v_i)} ) = ∅.
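The defining property of an s-set can be tested directly; the following sketch is a straightforward rendering, with the nonneighbor sets and matching neighbors supplied as assumed inputs.

```python
def is_s_set(S, nn, mn):
    """Check the defining property of a splitting set (s-set).

    S: candidate subset of the independent set I; nn[v]: the set of
    nonneighbors NN(v); mn[v]: the matching neighbor mn(v).  The union
    of the nonneighbor sets of S must be disjoint from the set of
    matching neighbors of S.
    """
    nonneighbors = set().union(*(nn[v] for v in S))
    matching_neighbors = {mn[v] for v in S}
    return nonneighbors.isdisjoint(matching_neighbors)
```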
Note that if (G, M) has an s-set of cardinality p, then it also has an s-set of cardinality q, for all q ≤ p. We need the following technical lemmas on the existence of certain s-sets for our proof. The proof of the second lemma is postponed to the journal version of our paper.

Lemma 1. Given (G, M), let k′ = |C′| for a clique C′ in G and let i′ = |I′| for an independent set I′ in G. If i′ = k′, every vertex in I′ has at most one nonneighbor in C′, every vertex in I′ has exactly one matching neighbor in C′, and ⌈k′/3⌉ ≥ p, then (G, M) has an s-set of cardinality p.

Proof. Below we partition the disjoint sets C′ and I′ into the sets C′1, C′2, I′1 and I′2 with cardinalities c′1, c′2, i′1 and i′2, respectively. Then we show that one can pick at least ⌈i′1/3⌉ vertices from I′1 and at least ⌈i′2/3⌉ vertices from I′2 to form an s-set with cardinality q ≥ ⌈i′1/3⌉ + ⌈i′2/3⌉ ≥ ⌈k′/3⌉, which will prove the lemma.
C′ and I′ are split up in the following way. C′1 consists of all the vertices in C′ that either have zero nonneighbors in I′, or have at least two nonneighbors in I′, or have exactly one nonneighbor in I′ whose matching neighbor in C′ has no nonneighbors in I′. C′2 consists of all other vertices in C′; obviously, they all have exactly one nonneighbor in I′. I′1 consists of the matching neighbors of the vertices in C′1, and I′2 consists of the matching neighbors of the vertices in C′2. Clearly, i′1 = c′1 and i′2 = c′2. Now assume that there are ℓ1 vertices in C′1 that have no nonneighbors in I′, and put them in L1. Also assume that there are ℓ2 vertices in C′1 that have at least two nonneighbors in I′, and put them in L2. Finally, assume that there are ℓ3 vertices in C′1 that have exactly one nonneighbor in I′ whose matching neighbor has no nonneighbors in I′, and put them in L3. Then ℓ1 ≥ ℓ2, ℓ1 ≥ ℓ3 and c′1 = ℓ1 + ℓ2 + ℓ3, so c′1 ≤ 3ℓ1. Let L′1, L′2 and L′3 be the sets of matching neighbors of the vertices in L1, L2 and L3, respectively. Now we pick from I′1 the ℓ1 vertices in L′1 and put them in the s-set. Notice that these vertices do not violate the definition of an s-set, because the set of their nonneighbors and the set of their matching neighbors are two disjoint sets. The matching neighbors of the nonneighbors of the ℓ1 vertices in the s-set are either in L′2 or in L′3, so we exclude the vertices in these two sets from use in the s-set. On the other hand, the matching neighbors of the ℓ1 vertices in the s-set do not have nonneighbors, so we do not have to worry about them. From the observations above it is clear that we can pick ℓ1 ≥ ⌈c′1/3⌉ = ⌈i′1/3⌉ vertices from I′1 that can be used in the s-set. Moreover, any vertices from I′2 that we put in the s-set do not conflict with the vertices from L′1 that are in the s-set already. So the only thing we have to do now is to pick at least ⌈i′2/3⌉ vertices from I′2 that can be used in the s-set. Simply pick an arbitrary vertex from I′2 and put it in the s-set. Now delete from I′2 the matching neighbor of its nonneighbor and the unique nonneighbor of its matching neighbor, if they happen to be in I′2. Continuing this way, we 'throw away' at most two vertices of I′2 for every vertex of I′2 that we put in the s-set. It is easy to see that we can pick at least ⌈i′2/3⌉ vertices from I′2 that we can put in the s-set. Therefore, the cardinality of the s-set will be at least ⌈i′1/3⌉ + ⌈i′2/3⌉ ≥ ⌈i′/3⌉ = ⌈k′/3⌉, which proves the lemma. ⊓⊔

Lemma 2. Given (G, M), let k = ω(G) = |C| and let i = |I|. If i ≤ k, every vertex in I has exactly one nonneighbor in C, and ⌈k/3⌉ ≥ p, then (G, M) has an s-set S with |S| = p − (k−i)/2 such that there are no matching edges between elements of the set of nonneighbors of vertices of S.

Proof of the bounds in Theorem 1. First of all, note that for technical reasons we split up the proof into more and different subcases than appear in the formulation of the theorem. The exact relation between the subcases in the theorem and those in the following proof is as follows. Subcase (i) of the theorem is proven in a. The proof of subcase (ii) can be found in b. For even k the proof of subcase (iii) is given in c, for odd k in d. The three cases k = 3 and λ = 2, k = 5 and λ = 3, and k = 7 and λ = 4 from subcase (iv) are treated
in b, the others in e. Finally, subcase (v) is proven in f for even k and in g for odd k.

Subcase a. If k = 2 then G is bipartite, and we use colors 1 and λ + 1. For k ≥ 3, let G = (V, E) be a split graph with χ(G) = k and with a perfect matching backbone M = (V, E_M). Let C and I be a partition of V such that C with |C| = k is a clique of maximum size, and such that I with |I| = i is an independent set. Without loss of generality, we assume that every vertex in I has exactly one nonneighbor in C.

Subcase b. Here we consider the cases with k ≥ 4 and λ ≤ min{k/2, (k+5)/3}, together with the three separate cases k = 3 and λ = 2, k = 5 and λ = 3, and k = 7 and λ = 4. The reason for this is that these are exactly the cases for which we obtain k ≥ 2λ − 1 and ⌈k/3⌉ ≥ λ − 1, and for which we need to show the existence of a λ-backbone coloring using at most k + 1 colors. By Lemma 2, we find that (G, M) has an s-set of cardinality y = λ − 1 − (k−i)/2 such that there are no matching edges between the nonneighbors of vertices in the s-set. We make a partition of C into six disjoint sets C1, …, C6, with cardinalities c1, …, c6, respectively, as follows: C1 consists of those vertices in C that have a matching neighbor in C and a nonneighbor in the s-set (notice that by the definition of the s-set, there are no matching edges between vertices in C1); C2 consists of those vertices in C that have a matching neighbor in I and a nonneighbor in the s-set; C3 contains one end vertex of each matching edge in C that has no end vertex in C1; C4 consists of those vertices in C whose matching neighbor is in I and that are neither matching neighbor nor nonneighbor of any vertex in the s-set; C5 consists of those vertices in C that have a matching neighbor in the s-set; C6 consists of those vertices in C that have a matching neighbor in C and that are not already in C1 or C3. It is easily verified that

c1 + c2 ≤ y,   c3 = (k−i)/2 − c1,   c4 = i − y − c2,
c5 = y,        c6 = (k−i)/2,        c1 + c2 + c3 + c4 + c5 + c6 = k.
An algorithm that constructs a feasible λ-backbone coloring of (G, M) with at most k + 1 colors is given below. In this algorithm I′′ denotes the set of vertices of I that are not in the s-set.

Coloring Algorithm 1
1. Color the vertices in C1 with colors from the set {1, …, c1}.
2. Color the vertices in C2 with colors from the set {c1 + 1, …, c1 + c2}.
3. Color the vertices in the s-set by assigning to them the same colors as their nonneighbors in C1 or C2. Note that different vertices in the s-set can have the same nonneighbor in C1 or C2, so a color may occur more than once in the s-set.
4. Color the vertices in C3 with colors from the set {c1 + c2 + 1, …, c1 + c2 + c3}.
5. Color the vertices in C4 with colors from the set {c1 + c2 + c3 + 1, …, c1 + c2 + c3 + c4}.
6. Color the vertices in C5 with colors from the set {c1 + c2 + c3 + c4 + 1, …, c1 + c2 + c3 + c4 + c5}; start by assigning the lowest color from this set to the matching neighbor of the vertex in the s-set with the lowest color, and continue this way.
7. Color the vertices in C6 with colors from the set {c1 + c2 + c3 + c4 + c5 + 1, …, c1 + c2 + c3 + c4 + c5 + c6}; start by assigning the lowest color from this set to the matching neighbor with the lowest color in C1 ∪ C3, and continue this way.
8. Finally, color the vertices of I′′ with color k + 1.

We postpone the correctness proof of this algorithm to the journal version.

Subcase c. Here we consider the case k = 2m, m ≥ 6 and (k+6)/3 ≤ λ ≤ k/2. We obtain k ≥ 2λ. We color the k vertices in C with colors from the sets {2, …, k/2 + 1} and {k/2 + 2, …, k + 1}. If there are matching edges in C, then we color them such that the first colors from both sets are assigned to the end vertices of one matching edge, the second colors from both sets are assigned to the end vertices of another matching edge, and so on. For later reference we call this a greedy coloring (a sketch is given after subcase d). We can color the end vertices of up to k/2 matching edges in C this way, which suffices. Vertices in I get color k + 2 if their matching neighbor in C is colored by a color from the first set, and vertices in I get color 1 if their matching neighbor in C is colored by a color from the second set. This yields a λ-backbone coloring of (G, M) with at most k + 2 colors.

Subcase d. We now consider the case k = 2m + 1, m ≥ 4 and (k+6)/3 ≤ λ ≤ (k+1)/2. We obtain k ≥ 2λ − 1. For this case i is odd, otherwise there is no perfect matching in G. If i = 1, then there are (k−1)/2 matching edges in C. We can color their end vertices with colors from the two sets {1, …, (k−1)/2} and {(k−1)/2 + 3, …, k + 1} by a greedy coloring. The distance between the colors of the end vertices of a matching edge in C is then (k−1)/2 + 2 ≥ (2λ−2)/2 + 2 = λ + 1. For the other vertex in C we use color (k−1)/2 + 1, and its matching neighbor in I gets color k + 2. Note that k + 2 − (k−1)/2 − 1 = (k+3)/2 ≥ (2λ+2)/2 = λ + 1. If 3 ≤ i ≤ k, there are (k−i)/2 matching edges in C. We color their end vertices with colors from the two sets {2, …, (k−i)/2 + 1} and {(k+i)/2 + 2, …, k + 1} by a greedy coloring. The distance between the colors of the end vertices in a matching edge in C is then (k+i)/2 ≥ (2λ−1+i)/2 ≥ (2λ+2)/2 = λ + 1. The other i vertices in C are colored with colors from the sets {(k−i)/2 + 2, …, (k+3)/2} and {(k+3)/2 + 1, …, (k+i)/2 + 1}. The cardinality of the first set is (i+1)/2 and of the second set (i−1)/2, adding up to exactly i. Vertices in I get color k + 2 if their matching neighbor in C is colored by a color from the first set, or get color 1 if their matching neighbor in C is colored by a color from the second set. Notice that k + 2 − (k+3)/2 = (2k+4−k−3)/2 = (k+1)/2 ≥ 2λ/2 = λ and (k+3)/2 + 1 − 1 = (k+3)/2 ≥ (2λ+2)/2 = λ + 1, so this yields a λ-backbone coloring of (G, M) with at most k + 2 colors.
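The greedy colorings of subcases c and d follow a common pattern; the sketch below renders the even case of subcase c. The argument names and the data layout are our own assumptions.

```python
def greedy_matching_coloring(pairs_in_C, rest_of_C, partner_in_C, k):
    """Sketch of the greedy coloring of subcase c (k = |C| even): the
    clique is colored from the sets {2,...,k/2+1} and {k/2+2,...,k+1},
    the ends of each matching edge inside C taking the j-th color of
    each set, and vertices of I taking color k+2 or 1 according to
    their matching neighbor's set.
    """
    low = list(range(2, k // 2 + 2))          # {2, ..., k/2 + 1}
    high = list(range(k // 2 + 2, k + 2))     # {k/2 + 2, ..., k + 1}
    color = {}
    for j, (u, v) in enumerate(pairs_in_C):   # matching edges inside C
        color[u], color[v] = low[j], high[j]  # distance k/2 >= lambda
    free = [c for c in low + high if c not in color.values()]
    for u, c in zip(rest_of_C, free):         # clique vertices matched into I
        color[u] = c
    for w, u in partner_in_C.items():         # vertices of I
        color[w] = k + 2 if color[u] <= k // 2 + 1 else 1
    return color
```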
Subcase e. Next, we consider the cases k = 3, 5, 7 and λ ≥ (k+6)/3. We obtain λ > (k+1)/2 and ⌈k/3⌉ = (k−1)/2. By Lemma 2, we find that (G, M) has an s-set of
cardinality z = (k−1)/2 − (k−i)/2 = (i−1)/2 such that there are no matching edges between the nonneighbors of vertices in the s-set. We have to construct a λ-backbone coloring of (G, M) using at most (k+1)/2 + λ colors. Obviously, colors from the set {(k+1)/2 + 1, …, λ} cannot be used at all, so we must find a λ-backbone coloring with colors from the sets {1, …, (k+1)/2} and {λ + 1, …, (k+1)/2 + λ}. We partition C into six disjoint sets exactly as we did in (b). For the cardinalities of the sets, we now find the following relations:

c1 + c2 ≤ (i−1)/2,   c3 = (k−i)/2 − c1,   c4 = i − z − c2,
c5 = z,              c6 = (k−i)/2,        c1 + c2 + c3 + c4 + c5 + c6 = k.
The following variation on Coloring Algorithm 1 constructs a feasible λ-backbone coloring of (G, M).

Coloring Algorithm 2
1–5. The same as in Coloring Algorithm 1.
6. Color the vertices in C5 with colors from the set {λ + 1, …, λ + c5}; start by assigning the lowest color from this set to the matching neighbor of the vertex in the s-set with the lowest color, and continue this way.
7. Color the vertices in C6 with colors from the set {λ + c5 + 1, …, λ + c5 + c6}; start by assigning the lowest color from this set to the matching neighbor with the lowest color in C1 ∪ C3, and continue this way.
8. Finally, color the vertices in I′′ with color (k+1)/2 + λ.

We postpone the correctness proof of this algorithm to the journal version.

Subcase f. We consider the case k = 2m, m ≥ 2 and λ ≥ k/2 + 1. For this case we find that i is even, otherwise there is no perfect matching of G. If i = 0, then there are k/2 matching edges in C. We can use the color pairs {1, λ + 1}, {2, λ + 2}, …, {k/2, k/2 + λ} for their end vertices, because λ + 1 > k/2. If i ≥ 2, then there are (k−i)/2 matching edges in C. We can color their end vertices with colors from the two sets {2, …, (k−i)/2 + 1} and {i/2 + λ + 1, …, k/2 + λ}, using a greedy coloring. The distance between the two colors on every matching edge in C is then i/2 + λ − 1 ≥ λ. The other i vertices in C are colored with colors from the sets {(k−i)/2 + 2, …, k/2 + 1} and {λ + 1, …, i/2 + λ}, which are exactly i colors. The colors in the first set have distance at least λ to color k/2 + λ + 1, so we color the matching neighbors in I of the vertices in C that are colored with colors from this set with color k/2 + λ + 1. The colors in the second set have distance at least λ to color 1, so we color the matching neighbors in I of the vertices in C that are colored with colors from this set with color 1. This yields a feasible λ-backbone coloring of (G, M) with at most k/2 + λ + 1 colors.

Subcase g. Finally, we consider the case k = 2m + 1, m ≥ 4 and λ ≥ (k+1)/2 + 1. For this case we find that i is odd, otherwise there is no perfect matching of G. There are (k−i)/2 matching edges in C. We can color their end vertices with colors from the two sets {2, …, (k−i)/2 + 1} and {(i+3)/2 + λ, …, (k+1)/2 + λ} by a greedy coloring.
Notice that (i+3)/2 + λ − (k−i)/2 − 1 = (i+3+2λ−k+i−2)/2 = (2i+1−k+2λ)/2 ≥ (2i+1−k+k+2)/2 > 0, so that these sets are disjoint. The distance between the two colors on every matching edge in C is (i−1)/2 + λ ≥ λ. The other i vertices in C are colored with colors from the sets {(k−i)/2 + 2, …, (k+1)/2} and {λ + 1, …, (i+1)/2 + λ}, which are exactly i colors that have not been used so far. Vertices in I get color (k+1)/2 + λ + 1 if their matching neighbor in C is colored by a color from the first set, and get color 1 otherwise. This yields a λ-backbone coloring of (G, M) with at most (k+1)/2 + λ + 1 colors. ⊓⊔
Proof of the tightness of the bounds in Theorem 1. We postpone the proof to the journal version.
References

1. Agnarsson, G. and Halldórsson, M.M.: Coloring Powers of Planar Graphs. SIAM J. Discrete Math. 16 (2003) 651–662
2. Bodlaender, H.L., Kloks, T., Tan, R.B., and van Leeuwen, J.: Approximations for λ-Colorings of Graphs. The Computer Journal 47 (2004) 193–204
3. Bondy, J.A. and Murty, U.S.R.: Graph Theory with Applications. Macmillan, London and Elsevier, New York (1976)
4. Borodin, O.V., Broersma, H.J., Glebov, A., and van den Heuvel, J.: Stars and Bunches in Planar Graphs. Part I: Triangulations (Russian). Diskretn. Anal. Issled. Oper. Ser. 1 8(2) (2001) 15–39
5. Borodin, O.V., Broersma, H.J., Glebov, A., and van den Heuvel, J.: Stars and Bunches in Planar Graphs. Part II: General Planar Graphs and Colourings (Russian). Diskretn. Anal. Issled. Oper. Ser. 1 8(4) (2001) 9–33
6. Broersma, H.J.: A General Framework for Coloring Problems: Old Results, New Results and Open Problems. In: Proceedings of IJCCGGT 2003, LNCS 3330 (2005) 65–79
7. Broersma, H.J., Fomin, F.V., Golovach, P.A., and Woeginger, G.J.: Backbone Colorings for Networks. In: Proceedings of the 29th International Workshop on Graph-Theoretic Concepts in Computer Science WG 2003, LNCS 2880 (2003) 131–142
8. Chang, G.J. and Kuo, D.: The L(2,1)-Labeling Problem on Graphs. SIAM J. Discrete Math. 9 (1996) 309–316
9. Fiala, J., Fishkin, A.V., and Fomin, F.V.: Off-Line and On-Line Distance Constrained Labeling of Graphs. Theoret. Comput. Sci. 326 (2004) 261–292
10. Fiala, J., Kloks, T., and Kratochvíl, J.: Fixed-Parameter Complexity of λ-Labelings. Discrete Appl. Math. 113 (2001) 59–72
11. Fiala, J., Kratochvíl, J., and Proskurowski, A.: Systems of Distant Representatives. Discrete Appl. Math. 145 (2005) 306–316
12. Fotakis, D.A., Nikoletseas, S.E., Papadopoulou, V.G., and Spirakis, P.G.: Radiocoloring in Planar Graphs: Complexity and Approximations. Theoret. Comput. Sci. 340 (2005) 514–538
13. Golumbic, M.C.: Algorithmic Graph Theory and Perfect Graphs. Academic Press, New York (1980)
14. Griggs, J.R. and Yeh, R.K.: Labelling Graphs with a Condition at Distance 2. SIAM J. Discrete Math. 5 (1992) 586–595
15. Hale, W.K.: Frequency Assignment: Theory and Applications. Proceedings of the IEEE 68 (1980) 1497–1514
16. Hammer, P.L. and Földes, S.: Split Graphs. Congressus Numerantium 19 (1977) 311–315
17. van den Heuvel, J., Leese, R.A., and Shepherd, M.A.: Graph Labeling and Radio Channel Assignment. J. Graph Theory 29 (1998) 263–283
18. van den Heuvel, J. and McGuinness, S.: Colouring the Square of a Planar Graph. J. Graph Theory 42 (2003) 110–124
19. Jonas, T.K.: Graph Coloring Analogues with a Condition at Distance Two: L(2,1)-Labellings and List λ-Labellings. Ph.D. Thesis, University of South Carolina (1993)
20. Leese, R.A.: Radio Spectrum: A Raw Material for the Telecommunications Industry. In: Progress in Industrial Mathematics at ECMI 98, Teubner, Stuttgart (1999) 382–396
21. Molloy, M. and Salavatipour, M.R.: A Bound on the Chromatic Number of the Square of a Planar Graph. J. Combin. Theory Ser. B 94 (2005) 189–213
22. Salman, A.N.M., Broersma, H.J., Fujisawa, J., Marchal, L., Paulusma, D., and Yoshimoto, K.: λ-Backbone Colorings along Pairwise Disjoint Stars and Matchings. Preprint (2004). www.durham.ac.uk/daniel.paulusma/Publications/Papers/Submitted/backbone.pdf
About the Termination Detection in the Asynchronous Message Passing Model

Jérémie Chalopin¹, Emmanuel Godard², Yves Métivier¹, and Gerard Tel³

¹ LaBRI UMR 5800, Université Bordeaux 1, ENSEIRB, 351 cours de la Libération, 33405 Talence, France, {chalopin,metivier}@labri.fr
² LIF UMR 6166, Université de Provence, 39 rue Joliot-Curie, 13453 Marseille, France, [email protected]
³ Department of Computer Science, University of Utrecht, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands, [email protected]

1 Introduction
Starting with the works by Angluin [1] and Itai and Rodeh [11], many papers have discussed the question of what functions can be computed by distributed algorithms in networks where knowledge about the network topology is limited. Two important factors limiting the computational power of distributed systems are symmetry and explicit termination, and both have been found to be connected with the graph-theoretic concept of coverings. Impossibility proofs for distributed computations quite often use the replay technique. Starting from a (supposedly correct) execution of an algorithm, an execution is constructed in which the same steps are taken by nodes in a different network. The mechanics of distributed execution dictate that this can happen if the nodes are locally in the same situation, and this is precisely what is expressed by the existence of coverings. Some functions can be computed by an algorithm that terminates implicitly but not by an explicitly terminating algorithm. In an implicitly terminating algorithm, each execution is finite and in the last state of the execution each node has the correct result. However, the nodes are not aware that their state is the last one in the execution. The impossibility result implies that such awareness can never be obtained in a finite computation. During the nineteen-eighties there were many proposals for termination detection algorithms: such algorithms transform implicitly terminating algorithms into explicitly terminating ones. As explained in [12], they superimpose on a given so-called basic computation a control computation which enables one or more of the processes to detect when the termination condition holds for the basic computation. It is not easy to detect whether a distributed algorithm has reached a state where no process is active and no message is in transit. Several conditions were found to allow such algorithms, and for each of these conditions a specific algorithm was given (see [12] and [17], Chap. 8). These conditions include: a unique leader exists
in the network [1], the network is known to be a tree [1], a bound on the diameter of the network is known [16], or the nodes have different identification numbers.

The Main Result. In this paper we show that these four conditions are just special cases of one common criterion, namely that the local knowledge of nodes prohibits the existence of quasi-coverings of unbounded depth. Moreover, we generalise the algorithm by Szymanski et al. [16] to a common algorithm that works in all graph families without quasi-coverings of unbounded depth. We also prove, by generalising the existing impossibility proofs to the limit, that in families with quasi-coverings of unbounded depth, termination detection is impossible. Thus, the generalised algorithm can be considered as a universal termination detection algorithm that can be applied in all cases where detection is possible at all. This is precisely what is stated in Theorem 3. From this theorem and [8] we deduce a characterisation of the families of labelled graphs which admit an election algorithm: Theorem 4. The key points of this work used to build the main result (Theorem 5 in Section 6) are: (1) coverings and quasi-coverings (Section 2), (2) a coding of the network (Section 3), (3) an algorithm to build the common knowledge of the nodes (Section 4), and (4) an algorithm to detect stable properties (Section 5).

Related Works. In [5,6], Boldi and Vigna study a model where a network is represented by a directed graph. In one computation step, a process can modify its state according to the states of its in-neighbours. In [6], they use fibrations to characterise the tasks that can be computed in an anonymous network, provided a bound on the network is known. In [5], they give a characterisation of what can be computed with arbitrary knowledge; their results are based on the notion of view that is adapted from the work of Yamashita and Kameda [18]. From our results, if a task can be computed on a network provided a bound on the size is known, then we can also detect the termination of the algorithm: in some sense, we generalise the results presented in [6]. On the other hand, when a bound on the size is not available, there exist some tasks that are computable in the sense of [5] but for which there does not exist any algorithm that can detect that the computation is globally over. In [15,10] a characterisation of the networks which allow explicit termination is given for the local computation model, where in a step a vertex can read and write its state and the states of adjacent vertices; for the same model, [9] characterises the families of graphs which admit an election algorithm. [10] defines and studies several kinds of termination (local termination, observed termination, global termination).
2 Preliminaries
The notations used here are essentially standard. Definitions and main properties are presented in [3,4,7]. Undirected Graphs, Directed Graphs and Labelled (Di)Graphs. We consider finite, undirected, connected graphs having possibly self-loops and multiple edges, G = (V (G), E(G), Ends), where V (G) denotes the set of vertices,
E(G) denotes the set of edges and Ends is a map assigning to every edge two vertices: its ends. A simple graph G = (V(G), E(G)) is a graph without self-loops or multiple edges. For an edge e, if the vertex v belongs to Ends(e) then we say that e is incident to v. A homomorphism between G and H is a mapping γ : V(G) ∪ E(G) → V(H) ∪ E(H) such that if e is an edge of G and Ends(e) = {u, v}, then Ends(γ(e)) = {γ(u), γ(v)}. We say that γ is an isomorphism if γ is bijective and γ⁻¹ is a homomorphism, too. In some applications we need a direction on each edge of a graph; a graph augmented in this way is called a directed graph or a digraph. More formally, a digraph D = (V(D), A(D), s_D, t_D) is defined by a set V(D) of nodes, a set A(D) of arcs and by two maps s_D and t_D that assign to each arc two elements of V(D): a source and a target. A self-loop is an arc with the same source and target. Throughout the paper we will consider graphs where vertices and edges are labelled with labels from a recursive label set L. A graph G labelled over L will be denoted by (G, λ), where λ : V(G) ∪ E(G) → L is the labelling function. A homomorphism from (G, λ) to (G′, λ′) is a graph homomorphism from G to G′ which preserves the labelling. Labelled graphs will be designated by bold letters like G, H, … If G is a labelled graph, then G denotes the underlying graph. The same definitions apply to digraphs.

Fibration, Covering and Quasi-Covering. A fibration between the digraphs D and D′ is a homomorphism ϕ from D to D′ such that for each arc a′ of A(D′) and for each vertex v of V(D) such that ϕ(v) = v′ = t(a′) there exists a unique arc a in A(D) such that t(a) = v and ϕ(a) = a′. The arc a is called the lifting of a′ at v, D is called the total digraph and D′ the base of ϕ. We shall also say that D is fibred (over D′). The fibre over a vertex v of D′ is the set ϕ⁻¹(v) of vertices of D. A fibre over v is trivial if it is a singleton, i.e., |ϕ⁻¹(v)| = 1. A fibration is nontrivial if at least one fibre is nontrivial, and trivial otherwise; it is proper if all fibres are nontrivial. A digraph D is fibration prime if it cannot be fibred nontrivially, that is, every surjective fibration is an isomorphism. In the sequel, directed graphs are always strongly connected and total digraphs nonempty; thus fibrations will always be surjective. An opfibration between the digraphs D and D′ is a homomorphism ϕ from D to D′ such that for each arc a′ of A(D′) and for each vertex v of V(D) such that ϕ(v) = v′ = s(a′) there exists a unique arc a in A(D) such that s(a) = v and ϕ(a) = a′. A covering projection is a fibration that is also an opfibration. If a covering projection ϕ : D → D′ exists, D is said to be a covering of D′ via ϕ. Covering projections satisfy:

Proposition 1. A covering projection ϕ : D → D′ with a connected base and a nonempty covering is surjective; moreover, all the fibres have the same cardinality.

This cardinality is called the number of sheets of the covering. As for fibrations, a digraph D is covering prime if there is no digraph D′ not isomorphic to D such that D is a covering of D′ (i.e., D is a covering of D′ implies that D ≃ D′). Let D and D′ be two digraphs such that D is a surjective covering of D′ via ϕ. If D′ has no self-loop, then for each arc a ∈ A(D): ϕ(s(a)) ≠ ϕ(t(a)).
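For small digraphs, the lifting condition of a fibration can be checked exhaustively; the following sketch does so for a candidate homomorphism given by its vertex and arc components (the dictionary-based digraph representation is our own choice). A covering projection would additionally require the symmetric check on sources (the opfibration condition).

```python
def is_fibration(D, Dp, phi_v, phi_a):
    """Exhaustively check the lifting condition of a fibration from
    digraph D to digraph Dp.  Both digraphs are (vertices, arcs) with
    arcs as dicts arc_id -> (source, target); phi_v and phi_a are the
    vertex and arc components of a map assumed to already be a
    homomorphism.
    """
    V, A = D
    _, Ap = Dp
    for ap, (_, tp) in Ap.items():        # every arc a' of D'
        for v in V:
            if phi_v[v] != tp:            # only vertices over t(a')
                continue
            liftings = [a for a, (_, t) in A.items()
                        if t == v and phi_a[a] == ap]
            if len(liftings) != 1:        # the lifting of a' at v must
                return False              # exist and be unique
    return True
```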
Finally, the following property is a direct consequence of the definitions and is fundamental in the sequel of this paper:

Proposition 2. Let D and D′ be two digraphs such that D′ has no self-loop and D is a surjective covering of D′ via ϕ. If a1 ≠ a2 and ϕ(a1) = ϕ(a2), then Ends(a1) ∩ Ends(a2) = ∅.

The notions of fibrations and of coverings extend to labelled digraphs in an obvious way: the homomorphisms must preserve the labelling. The last notion we will use is a generalisation of coverings called quasi-coverings; it was introduced in [14]. Let D, D′ be two labelled digraphs and let γ be a partial function on V(D) that assigns to each element of a subset of V(D) exactly one element of V(D′). Then D is a quasi-covering of D′ via γ of radius r if there exist a finite or infinite covering D0 of D′ via δ and vertices z0 ∈ V(D0), z ∈ V(D) such that:
1. B_D(z, r) is isomorphic via ϕ to B_{D0}(z0, r),
2. the domain of definition of γ contains B_D(z, r), and
3. γ = δ ∘ ϕ when restricted to V(B_D(z, r)).
card(V(B_D(z, r))) is called the size of the quasi-covering, and z the center. The digraph D0 is called the associated covering of the quasi-covering.

Local Computations on Arcs. In this paper we consider labelled digraphs and we assume that local computations modify only the labels of vertices. Digraph relabelling systems on arcs, and more generally local computations on arcs, satisfy the following constraints, which arise naturally when describing distributed computations with decentralized control: (1) they do not change the underlying digraph but only the labelling of vertices, the final labelling being the result of the computation (relabelling relations); (2) they are local, that is, each relabelling step changes only the label of the source and the label of the target of an arc; (3) they are locally generated, that is, the applicability of a relabelling rule on an arc only depends on the label of the arc and the labels of the source and of the target (locally generated relabelling relation).
3 From Asynchronous Message Passing to Local Computations on Arcs
The Model. Our model follows standard models for distributed systems given in [2,17]. The communication model is a point-to-point communication network which is represented as a simple connected undirected graph where vertices represent processes and two vertices are linked by an edge if the corresponding processes have a direct communication link. Processes communicate by message passing, and each process knows from which channel it receives a message or sends a message. An edge between two vertices v1 and v2 represents a channel connecting a port i of v1 to a port j of v2. Let ν be the port numbering function; we assume that for each vertex u and each adjacent vertex v, ν_u(v) is a unique integer belonging to [1, deg(u)]. We consider the asynchronous message passing model: processes cannot access a global clock, and a message sent from a process to a neighbour arrives within some finite but unpredictable time.
Fig. 1. We adopt the following notation conventions for the vertices of ($\overleftrightarrow{G}$, κ_G, ν_G): a black-circle vertex corresponds to the label process, a square vertex corresponds to the label send, a diamond vertex corresponds to the label transmission, and a double-square vertex corresponds to the label receive.
From Undirected Labelled Graphs to Labelled Digraphs. The construction presented in this section may appear technical; nevertheless, the intuition is very natural and simple, and it is illustrated in Figure 1. A first approximation of a network, with knowledge about the structure of the underlying graph, is a simple labelled graph G = (V(G), E(G)). We associate to this undirected labelled graph a labelled digraph $\overleftrightarrow{G}$ = (V($\overleftrightarrow{G}$), A($\overleftrightarrow{G}$)) presented in Figure 1 (see [8] for more details). We need to memorize the meaning (semantics) of vertices, thus we label the vertices of $\overleftrightarrow{G}$ with a labelling function κ; the set of labels is {process, send, receive, transmission}:
– if a vertex x of V($\overleftrightarrow{G}$) corresponds to a vertex u of V(G), then κ(x) = process,
– if a vertex x of V($\overleftrightarrow{G}$) corresponds to a vertex of the form outbuf(u, v), then κ(x) = send,
– if a vertex x of V($\overleftrightarrow{G}$) corresponds to a vertex of the form inbuf(u, v), then κ(x) = receive,
– if a vertex x of V($\overleftrightarrow{G}$) corresponds to a vertex of the form t(u, v), then κ(x) = transmission.
Using the label neutral, κ is extended to (V($\overleftrightarrow{G}$), A($\overleftrightarrow{G}$)). Two adjacent vertices of ($\overleftrightarrow{G}$, κ) have different labels, thus if the digraph ($\overleftrightarrow{G}$, κ) is a covering of a digraph D then D has no self-loop. We consider the labelling ν of the arcs of ($\overleftrightarrow{G}$, κ) coming into or going out of vertices labelled process such that for each vertex x labelled process the restriction of ν assigns to each outgoing arc a unique integer of [1, deg⁺(x)] and assigns to each incoming arc a unique integer of [1, deg⁻(x)]; such a labelling is a local enumeration of the arcs incident to process vertices (it corresponds to the port numbering). This
enumeration is symmetric, i.e., ν verifies for each arc of the form out(u, v): ν(out(u, v)) = ν(in(v, u)). In the sequel, ($\overleftrightarrow{G}$, κ, ν) is denoted by G.
Basic Instructions. As in [18] (see also [17], pp. 45–46), we assume that each process, depending on its state, either changes its state, or receives a message via a port, or sends a message via a port. Let Inst be this set of instructions. This model is equivalent to the model of local computations on arcs with respect to the initial labelling. From now on, we will speak indistinctly of a distributed algorithm encoded in the asynchronous message passing model on the labelled graph G equipped with a port numbering ν, or of a distributed algorithm encoded using local computations on arcs on the labelled digraph G.
4 A Mazurkiewicz-Like Algorithm
In this section, we recall the algorithm M inspired by [13] and described in [8]. We can interpret the mailbox of a vertex v at a step i of the computation as a graph H_i such that G is a quasi-covering of H_i. Furthermore, when the algorithm has reached the final labelling, all the vertices compute the same graph H and G is a covering of H.

Presentation of M. We first give a general description of the algorithm M applied to a labelled graph G equipped with a port numbering ν. We assume that G is connected. Let G = (G, λ), and consider a vertex v0 of G and the set {v1, …, vd} of neighbours of v0. During the computation, each vertex v0 is labelled by a pair of the form (λ(v0), c(v0)), where c(v0) is a triple (n(v0), N(v0), M(v0)) representing the following information obtained during the computation:
– n(v0) ∈ N is the number of the vertex v0 computed by the algorithm,
– N(v0) is the local view of v0; this view can be either empty or a set of the form {(n(vi), ps,i, pr,i) | 1 ≤ i ≤ d} (where ps,i and pr,i are port numbers),
– M(v0) is the mailbox of v0 containing the whole information received by v0 at previous computation steps; each element of M(v0) has the form (λ(v), n(v), N(v)), where v is a vertex.

An Order on Local Views. The fundamental property of the algorithm is based on a total order on local views such that the local view of any vertex cannot decrease during the computation. We assume for the rest of this paper that the set of labels L is totally ordered by <. Listing the elements of a local view in decreasing order yields the ordered tuple N>(v0) of the local view of v0. Let N> be the set of such ordered tuples. We define a total order ≺ on N> ∪ {∅} using the alphabetical order; it induces naturally a total order on N>, with, by definition, ∅ ≺ N for every N ∈ N>. This order can also be defined on N> as follows: N1 ≺ N2 if the maximal element for the lexical order of the symmetric difference N1 △ N2 belongs to N2. On waking up, a vertex v0 with n(v0) = 0 takes number 1, inserts (λ(v0), 1, ∅) in its mailbox, and sends <n(v0), M(v0), i> via each port i. A received message is then processed by the following rule:

R0: {A message <na, Ma, p> has arrived at v0 from port q}
begin
  M := M(v0); M(v0) := M(v0) ∪ Ma;
  if (N(v0) does not contain an element of the form (x, p, q)) then N(v0) := N(v0) ∪ {(na, p, q)};
  if ((x, p, q) ∈ N(v0) for some x < na) then N(v0) := (N(v0) \ {(x, p, q)}) ∪ {(na, p, q)};
  if (n(v0) = 0) or (n(v0) > 0 and there exists (l, n(v0), N) ∈ M(v0) such that (λ(v0) < l or (λ(v0) = l and N(v0) ≺ N))) then n(v0) := 1 + max{x | (l, x, N) ∈ M(v0)};
  M(v0) := M(v0) ∪ {(λ(v0), n(v0), N(v0))};
  if (M ≠ M(v0)) then for i := 1 to deg(v0) do send <n(v0), M(v0), i> via port i;
end
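For concreteness, here is an executable rendering of rule R0; the field layout, the parameter prec (the order ≺ on local views), and the completion of the renaming test are our own choices, not an authoritative transcription.

```python
def rule_R0(v, na, Ma, p, q, prec):
    """A message <na, Ma, p> has arrived at vertex v from port q.
    v carries fields label, n, N (local view, a set of triples) and
    M (mailbox); prec(N1, N2) is the order on local views."""
    v.M = v.M | Ma                                   # merge mailboxes
    old = next(((x, pp, qq) for (x, pp, qq) in v.N
                if (pp, qq) == (p, q)), None)
    if old is None:
        v.N = v.N | {(na, p, q)}                     # first news via (p, q)
    elif old[0] < na:
        v.N = (v.N - {old}) | {(na, p, q)}           # a larger number appeared
    # renaming: v has no number yet, or is dominated by an entry with
    # the same number (larger label, or equal label and stronger view)
    if v.n == 0 or any(n == v.n and
                       (v.label < l or (v.label == l and prec(v.N, N)))
                       for (l, n, N) in v.M):
        v.n = 1 + max((n for (_, n, _) in v.M), default=0)
    v.M = v.M | {(v.label, v.n, frozenset(v.N))}
    return v.n, v.M                                  # to broadcast on every port
```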
Interpretation of the Mailboxes at Step i. For a mailbox M, we define the graph of the "strongest" vertices as follows. First, for l ∈ L, n ∈ N, N ∈ N, M ⊆ L × N × N, we define the predicate Strong(l, n, N, M), which is true if there is no (l′, n, N′) ∈ M verifying l < l′ or (l = l′ and N ≺ N′). Let G be a labelled graph equipped with a port numbering. Let ρ be a run of the Mazurkiewicz algorithm and let (G_i)_{0≤i} be a chain associated to ρ (with G_0 = G). If v is a vertex of G, then the label of v at step i is denoted by (λ(v), c_i(v)) = (λ(v), (n_i(v), N_i(v), M_i(v))). We define

  H_i(v) = H_{M_i(v)}   if it is defined and (n_i(v), λ(v), N_i(v)) ∈ Strong(M_i(v)),
  H_i(v) = ⊥            otherwise.                                             (1)
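The predicate Strong can be evaluated directly from a mailbox; a minimal rendering, with the order ≺ passed in as a parameter:

```python
def strong(l, n, N, M, prec):
    """The predicate Strong(l, n, N, M) defined above: true iff no
    mailbox entry (l2, n2, N2) with the same number n2 == n dominates
    (l, N), i.e. satisfies l < l2 or (l == l2 and N prec N2)."""
    return not any(n2 == n and (l < l2 or (l == l2 and prec(N, N2)))
                   for (l2, n2, N2) in M)
```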
Let r_agree^(i)(v) be the maximal integer, bounded by the diameter of G, such that any vertex w of B(v, r_agree^(i)(v)) verifies H_i(v) = H_i(w). We have:

Theorem 1. Let (G_i)_{0≤i} be a relabelling chain obtained with the Mazurkiewicz algorithm and let v be a vertex. The graph G_i is a quasi-covering of H_i(v) centered on v of radius r_agree^(i)(v).
5 An Algorithm to Detect Stable Properties
In this section we describe a generalisation of the algorithm by Szymanski, Shy and Prywes (the SSP algorithm for short) [16]. We consider a distributed algorithm which terminates when all processes reach their local termination conditions. Each process is able to determine only its own termination condition. The SSP algorithm detects an instant in which the entire computation is achieved. We present here a generalisation of the hypotheses under which the SSP rules are run. For every vertex v, the value of P(v) is no longer a boolean and can have any value which depends on the label (state) of v, denoted by state(v). Hence, we do not require each process to determine when it reaches its own termination condition. Moreover, the function P must verify the following property: for any α, if P(state(v)) has the value α (α ≠ ⊥) and changes to α′ ≠ α, then it cannot be equal to α at another time. In other words, under this hypothesis, the function is constant between two moments where it has the same value (different from ⊥). We say that the function P is value-convex. We extend the SSP rules and we shall denote this generalisation by GSSP. In GSSP, the counter of v is incremented only if P is constant on the neighbours of v. As previously, every underlying rule that computes in particular P(state(v)) has to be modified in order to eventually reinitialise the counter. Initially a(v) = −1 for all vertices. The GSSP rule modifies the counter a.

Mazurkiewicz Algorithm + GSSP Algorithm = Maximal Common Knowledge. The main idea in this section is to use the GSSP algorithm in order to compute, in each node, the radius of stability of M. In other words, each node u will know how far other nodes agree with its reconstructed graph H_{M(u)}. Let G = (G, λ) be a labelled graph equipped with a port numbering.
Algorithm 2. Algorithm GSSP

Var: a(v0): integer init −1;
     t_{v0}[i]: integer init −1 for each port i of v0;
     val_{v0}[i]: value init ⊥ for each port i of v0;
     i, j, x, temp: integer;

C0: {A new value P(state(v0)) = ⊥ is computed}
begin
  a(v0) := −1;
  for i := 1 to deg(v0) do send <⊥, −1> via port i;
end

C1: {A new value P(state(v0)) different from ⊥ is computed}
begin
  a(v0) := 0;
  if (P(state(v0)) is equal to val_{v0}[i] for each port i) then a(v0) := 1;
  for i := 1 to deg(v0) do send <P(state(v0)), a(v0)> via port i;
end

C2: {A message <α, x> has arrived at v0 from port j}
begin
  val_{v0}[j] := α; t_{v0}[j] := x;
  temp := a(v0);
  if (P(state(v0)) ≠ ⊥ and P(state(v0)) is equal to val_{v0}[i] for each port i) then
    a(v0) := 1 + Min{t_{v0}[i] | i is a port of v0};
  if (temp ≠ a(v0)) then
    for i := 1 to deg(v0) do send <P(state(v0)), a(v0)> via port i;
end
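Rule C2 carries the whole GSSP mechanism: the counter advances only when P is defined and constant on the neighbourhood, and a message is re-sent only when the counter changes. A minimal executable rendering (the state layout and the use of None for ⊥ are our own choices):

```python
def rule_C2(v, alpha, x, j, P):
    """A message <alpha, x> has arrived at v from port j.  v.val and
    v.t are dicts indexed by the ports of v, v.a is the GSSP counter
    and None stands for the undefined value."""
    v.val[j], v.t[j] = alpha, x
    temp = v.a
    p = P(v.state)
    # the counter advances only if P's value is defined and every
    # neighbour reports that same value
    if p is not None and all(v.val[i] == p for i in v.ports):
        v.a = 1 + min(v.t[i] for i in v.ports)
    # re-broadcast <P(state(v)), a(v)> on every port iff a(v) changed
    return (p, v.a) if temp != v.a else None
```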
Let (G_i)_{0≤i} be a relabelling chain associated to a run of Mazurkiewicz' algorithm on the graph G. The vertex v of G_i is associated to the label (λ(v), (n_i(v), N_i(v), M_i(v))). Let us consider the algorithm obtained by adding to each rule of the Mazurkiewicz algorithm the computation of H_i(v) on each node v, together with the modifications for the GSSP rule. We denote by AS the merging of the two algorithms. The output of AS on the node v is <H_i(v), a_i(v)>. The main property of the computation of AS is:

Theorem 2 (quasi-covering progression). At every step j, for every vertex v, the output of AS on v is a couple <H_j(v), a_j(v)> such that if H_j(v) ≠ ⊥, then there exists a previous step i < j such that G_i is a quasi-covering of H_i(v) of center v and of radius ⌊a_j(v)/3⌋.
And as the underlying Mazurkiewicz algorithm always terminates, the value of H will stabilise, with a going to infinity.
6 Termination Detection
Irreducibility with respect to a relabelling relation yields a notion of implicit termination: the computation has ended – no more relabelling rule can be applied – but no node is aware of the termination. On the other hand, one may ask a node to be aware of the termination of the algorithm (see [10] for more details). We consider two kinds of termination: (1) termination of the algorithm but without detection: implicit termination; (2) the nodes know when all other nodes have computed their final output value. Due to the asynchronous aspect of distributed computations, there are still some observational computations going on. This is the observed termination detection – when termination is detected, some observation computations are not necessarily terminated; it is usually called explicit termination. A normalized labelled digraph D is a labelled digraph whose labelling is of the form (mem, out, term). A normalized relabelling system R is a digraph relabelling system on normalized digraphs where: mem can be used in preconditions and relabelled; out is only relabelled; term is only relabelled and has a value in {⊥, Term}. We also use the following convention: if the initial labelled digraph is D = (D, in), then it is implicitly extended to the normalized labelling (D, (in, ⊥, ⊥)). The initial value of mem is therefore given by in. All digraphs are labelled digraphs and are now all considered to be normalized. All relabelling relations are relabelling relations of normalized labelled digraphs. We also use the following notations. Let D and D′ be some given normalized digraphs; then, for any vertex u ∈ D (resp. ∈ D′) and for any x ∈ {mem, out, term}, x(u) (resp. x′(u)) is the x component of u in D (resp. D′). This presentation will find its justification in the following definitions. For the implicit termination, there is no detection mechanism; hence term is not used. If the underlying distributed algorithm is aimed at the computation of a special value, we will, in order to distinguish this value from the intermediate computed values, only look at the special-purpose component out. As there is no detection of termination, this label is written all along the computation. It becomes significant only when the digraph is irreducible, but no node knows when this happens. Let F be a family of labelled digraphs. A digraph relabelling relation R has an observed termination detection (OTD) on F if: (1) R is noetherian on F; (2) the term component of R-irreducible digraphs is equal to Term; (3) for all digraphs D, D′ ∈ F such that D R∗ D′, if there exists a vertex u such that term(u) = Term, then (a) term′(u) = Term, and (b) for every vertex v ∈ D, out′(v) = out(v). In this definition, we ask the network to detect the termination of the computation (in the sense of the out value that is computed), but not to detect the termination of that detection. We have at least one vertex that detects that the out values are final, and then it can perform a broadcast of Term. This broadcast is performed by an "observer algorithm" whose termination we do not consider. Let F be a digraph family. We denote by F↓ the family of digraphs that are covered by a digraph of F: F↓ = {D′ | ∃D ∈ F, D is a covering of D′}. Note that F is a subset of F↓. Let R be a relabelling system. If R is noetherian on F, it is also noetherian on F↓. Let G be a recursive family of labelled graphs equipped with a port numbering. Let F be the family of labelled digraphs
obtained from G and defined by F = {D | ∃G ∈ G and D = $\overleftrightarrow{G}$}. Now we can state the characterisation for the existence of an equivalent relation with observed termination detection.

Theorem 3. For a family F, there exists a transformation that maps any noetherian digraph relabelling relation on arcs R to a noetherian digraph relabelling relation on arcs with observed termination detection if and only if there exists a recursive function r : F↓ → N such that for any D′ ∈ F↓, there is no strict quasi-covering of D′ of radius r(D′) in F.

The necessary part of the proof of this theorem is a corollary of the following quasi-lifting lemma: let R be a locally generated relabelling relation and let D be a quasi-covering of D′ of radius r via γ; moreover, let D′ R D′1; then there exists D1 such that D R∗ D1 and D1 is a quasi-covering of radius r − 2 of D′1. For the sufficient part, the main idea is to compose R with the algorithm AS. On each vertex v of D and for each port i of v we define two counters c_out(i) and c_in(i): c_out(i) stores the number of basic messages sent by v via i for R, and c_in(i) stores the number of messages received by v via i for R. Now we consider the following termination detection condition: each channel is empty, D′ is irreducible for R, there exists D ∈ F such that D is a covering of D′, and r(D′) < r_t(v), with r_t(v) = ⌊a_j(v)/3⌋. To test if there exists D ∈ F such that D is a covering of D′, we enumerate, always in the same order, all the graphs of F by order of increasing diameter. We denote this algorithm by AS_R. If R is noetherian then AS_R is noetherian: as R is noetherian, the number of input values for computing D′ is bounded and the result follows.

Known Results as Corollaries. We deduce immediately that in the asynchronous message passing model a distributed algorithm having an implicit termination may be transformed into a distributed algorithm having an observed (explicit) termination detection for the following families of graphs: graphs having a distinguished vertex, graphs such that each node is identified by a unique name, graphs having known bounds on the size or the diameter, the family of connected subgraphs of grids with a sense of direction, and trees. We deduce that there is no observed (explicit) termination detection for: the family of rings, the family of connected subgraphs of grids without a sense of direction, and the family of rings having a prime size.

New Corollaries. New corollaries are obtained from this theorem; in the asynchronous message passing model a distributed algorithm having an implicit termination may be transformed into a distributed algorithm having an observed (explicit) termination detection for the following families of graphs: graphs having exactly k leaders (distinguished vertices), and graphs having at least one and at most k leaders (distinguished vertices). For the election problem, this theorem and the results of [8] imply:

Theorem 4. For a family F, there exists an election algorithm if and only if the graphs of F are minimal for the covering relation and there exists a recursive function r : F → N such that for any D ∈ F, there is no quasi-covering of D of radius r(D) in F, except D itself.
References

1. Angluin, D.: Local and Global Properties in Networks of Processors. In: Proceedings of the 12th Symposium on Theory of Computing (1980) 82–93
2. Attiya, H. and Welch, J.: Distributed Computing: Fundamentals, Simulations, and Advanced Topics. John Wiley & Sons (2004)
3. Bodlaender, H.L. and van Leeuwen, J.: Simulation of Large Networks on Smaller Networks. Information and Control 71 (1986) 143–180
4. Bodlaender, H.L.: The Classification of Coverings of Processor Networks. Journal of Parallel and Distributed Computing 6 (1989) 166–182
5. Boldi, P. and Vigna, S.: Computing Anonymously with Arbitrary Knowledge. In: Proceedings of the 18th ACM Symposium on Principles of Distributed Computing, ACM Press (1999) 181–188
6. Boldi, P. and Vigna, S.: An Effective Characterization of Computability in Anonymous Networks. In: Welch, J.L. (ed.), Distributed Computing, 15th International Conference, DISC 2001, Springer-Verlag, Lecture Notes in Computer Science 2180 (2001) 33–47
7. Boldi, P. and Vigna, S.: Fibrations of Graphs. Discrete Math. 243 (2002) 21–66
8. Chalopin, J. and Métivier, Y.: A Bridge between the Asynchronous Message Passing Model and Local Computations in Graphs (extended abstract). In: Proc. of Mathematical Foundations of Computer Science, MFCS'05, LNCS 3618 (2005) 212–223
9. Godard, E. and Métivier, Y.: A Characterization of Families of Graphs in which Election Is Possible (extended abstract). In: Nielsen, M. and Engberg, U. (eds.), Proc. of Foundations of Software Science and Computation Structures, FOSSACS'02, Springer-Verlag, LNCS 2303 (2002) 159–171
10. Godard, E., Métivier, Y., and Tel, G.: Detection of the Termination of Distributed Algorithms. Submitted.
11. Itai, A. and Rodeh, M.: Symmetry Breaking in Distributive Networks. In: Proceedings of the 13th Symposium on Theory of Computing (1981) 150–158
12. Mattern, F.: Algorithms for Distributed Termination Detection. Distributed Computing 2 (1987) 161–175
13. Mazurkiewicz, A.: Distributed Enumeration. Inf. Processing Letters 61 (1997) 233–239
14. Métivier, Y., Muscholl, A., and Wacrenier, P.-A.: About the Local Detection of Termination of Local Computations in Graphs. In: Krizanc, D. and Widmayer, P. (eds.), SIROCCO'97 – 4th International Colloquium on Structural Information & Communication Complexity, Proceedings in Informatics, Carleton Scientific (1997) 188–200
15. Métivier, Y. and Tel, G.: Termination Detection and Universal Graph Reconstruction. In: SIROCCO'00 – 7th International Colloquium on Structural Information & Communication Complexity (2000) 237–251
16. Szymanski, B., Shy, Y., and Prywes, N.: Synchronized Distributed Termination. IEEE Transactions on Software Engineering SE-11(10) (1985) 1136–1140
17. Tel, G.: Introduction to Distributed Algorithms. Cambridge University Press (2000)
18. Yamashita, M. and Kameda, T.: Computing on Anonymous Networks: Part I – Characterizing the Solvable Cases. IEEE Transactions on Parallel and Distributed Systems 7(1) (1996) 69–89
Fast Approximate Point Set Matching for Information Retrieval

Raphaël Clifford and Benjamin Sach

University of Bristol, Department of Computer Science, Woodland Road, Bristol, BS8 1UB, UK
[email protected], [email protected]
Abstract. We investigate randomised algorithms for subset matching with spatial point sets: given two sets of d-dimensional points, a data set T consisting of n points and a pattern P consisting of m points, find the largest match for a subset of the pattern in the data set. This problem is known to be 3SUM-hard and so is unlikely to be solvable exactly in subquadratic time. We present an efficient bit-parallel O(nm) time algorithm and an O(n log m) time solution based on correlation calculations using fast Fourier transforms. Both methods are shown experimentally to give answers within a few percent of the exact solution and provide a considerable practical speedup over existing deterministic algorithms.
1 Introduction
We consider a pattern matching problem where the data (or 'text') T and the pattern P are represented by sets of d-dimensional points. We wish to determine whether there is a transformation that will carry a subset of the pattern onto the data set. Specifically, we would like to find the largest subset of P for which every point is carried exactly onto a point in T. This spatial point set matching, or "constellation"¹ problem, has a number of applications including pharmacophore identification, protein structure alignment, image registration and model-based object recognition. Within this formulation, points can be said to match exactly or approximately, and a variety of rigid motion transformations such as rotation and translation have previously been considered (see e.g. [1]). Our core motivation, however, comes from musical information retrieval (MIR), where large collections of musical documents must be searched quickly to ascertain similarity to a query. A natural measure of distance is to count the number of notes that are in common between the query and some target musical piece. In this context we must allow for one or both pieces to be transposed musically (a constant is added or subtracted from each pitch) and for the pattern to occur at any point in time during the target piece. The task is not to find pieces that contain the query exactly, but rather ones that have passages similar to parts of the query. A musical score can be represented as a set of 2-dimensional points, for example representing the pitch and onset time of individual notes. See Figure 1 for an example.
1 Given a constellation of stars, locate the constellation in the night sky or in the star chart, as termed by B. Chazelle.
[Figure: notes C (60), A (69), C# (73), A (69) plotted as points with onset time on the horizontal axis and pitch on the vertical axis]
Fig. 1. Mapping of sheet music into a two dimensional point set
points, for example, representing the pitch and onset time of individual notes. See Figure 1 for an example.
In [1] a number of algorithms were given for pattern matching in spatial point sets under different approximation metrics. For exact matching, where the whole pattern must be found, an O(bn log n + log^{O(1)} θ) time algorithm is shown which, with probability at least 1 − n^{−b}, gives the correct answer, where θ is the maximum distance between any two points in the input. In [3] a Las Vegas algorithm is given that essentially adds an O(d log θ) multiplicative factor to the running time. The problem of finding the largest subset of P which occurs in the data set under translation was recently shown to be 3SUM-hard and is therefore unlikely to have a subquadratic time solution [2]. A naive O(nm log n) algorithm can be found by taking the differences between every point in the pattern and every point in the data set and sorting them. This approach has been used in practice by [6], for example. The time complexity can be reduced to O(nm) by simply keeping a counter array, but the bottleneck remains that of space usage. By observing that the differences from any given point in the pattern can be considered in sorted order, the working space overhead was reduced to O(m) in [7] (algorithm P3). Although the time complexity of this approach is O(nm log m), it is the only existing practical method for solving large problems that we are aware of. In [1] the related problem of matching with r “point failures” is considered. Matches can be found with up to r points from the pattern missing from the alignment in O(bn(r + 1) log^2 n + log^{O(1)} θ) time with an error probability of n^{−b}. When r is a fixed percentage of n, for example, the algorithm is worse than the naive approach.
Let P and T be two sets of d-dimensional points. We define m and n to be the cardinalities of P and T respectively. We say that two points x = (x_1, x_2, . . . , x_d) and y = (y_1, y_2, . . . , y_d) match with shift v = (v_1, v_2, . . . , v_d) if and only if x_i + v_i = y_i for 1 ≤ i ≤ d. Furthermore, a subset P′ of P is a subset match in T if and only if there exists a vector v such that every point in P′ matches a point in T with the shift v. The central problem we study can now be defined as follows.
Problem 1. Given sets P and T, the subset matching problem is to find the largest subset of P which is a subset match in T.
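To make the naive baseline concrete, it can be sketched as follows (our illustration in C++, not the authors' code; the largest subset match is simply the most frequent pattern-to-text difference vector):

#include <algorithm>
#include <map>
#include <vector>

// A d-dimensional point is a small vector of integer coordinates.
using Point = std::vector<long long>;

// Size of the largest subset of P occurring in T under one common
// translation: O(nm log(nm)) time and O(nm) space.
int largestSubsetMatchNaive(const std::vector<Point>& P,
                            const std::vector<Point>& T) {
    std::map<Point, int> shiftCount;  // difference vector -> multiplicity
    for (const Point& p : P)
        for (const Point& t : T) {
            Point shift(p.size());
            for (std::size_t k = 0; k < p.size(); ++k) shift[k] = t[k] - p[k];
            ++shiftCount[shift];      // one pattern point matches at this shift
        }
    int best = 0;
    for (const auto& kv : shiftCount) best = std::max(best, kv.second);
    return best;
}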
We present two efficient and practical algorithms, called MSMBP and MSMFT, for the subset matching problem. MSMBP is a bit-parallel implementation which runs in O(nm) time and O(n) space with low constant overheads. MSMFT solves the matching problem by performing correlation calculations using fast Fourier transforms. Its running time is therefore O(n log m) with O(n) space requirements, making it especially suitable for larger pattern sizes.2 Both are randomised and approximate, and are shown experimentally in Section 5 to give near perfect answers on a wide range of inputs. In information retrieval, approximate similarity ranking is often sufficient as the user will typically browse through “good” matches rather than simply take the first one. We then compare our methods experimentally to the space efficient approach of [7], called P3, and show that for all cases except very short patterns, where the time complexity is effectively linear in the size of the data set, both MSMBP and MSMFT give a considerable practical speedup. Where the pattern size grows as a proportion of the data set size, MSMFT is shown to be orders of magnitude faster than both MSMBP and P3 for even moderately sized inputs. For very large data sets and small queries, the lower constant factor space overheads of MSMBP make it the most practical solution. MSMFT has also been shown recently to give near perfect precision/recall results for a database of artificially corrupted musical pieces [2]. Both MSMBP and MSMFT start with the project and reduce length approach, which we describe briefly first. Whereas in [3], for example, the result of the length reduction is used to find occurrences of the whole pattern, we explain in Section 3 how first to find an unbiased estimate for the subset match problem and then how to improve this estimate to give a more accurate, though biased, approximation. In Section 4, we then present MSMBP and MSMFT and discuss some implementation details. In Section 5, experimental results are shown using random input data comparing both speed and accuracy. Finally, we conclude and discuss open problems.
2
Randomised Projection and Length Reduction
In this Section we describe the project and reduce length steps which we use as the first stages of both MSMFT and MSMBP. The approach is similar to that taken in [1] and [3]. See Figure 2 for an illustration.
1. We first reduce the problem to one dimension by performing a randomised projection of P and T. In order to project P and T to one dimension, we first pick d integers b_i uniformly at random from a large space. For every point x in P ∪ T we calculate Σ_i b_i x_i, where x_i is the i-th coordinate of x. Call the resulting sets of points in one dimension P′ and T′.
2 Although the time complexity of the FFT is O(n log n), when performing pattern matching this can be reduced to O(n log m) by employing a standard trick. The text is partitioned into n/m overlapping substrings of length 2m. The matching algorithm is then performed separately on each substring, giving an overall running time of O((n/m) m log m) = O(n log m).
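A minimal C++ sketch of the projection step; the coefficient range and random number generator are our illustrative assumptions:

#include <random>
#include <vector>

using Point = std::vector<long long>;

// Draw the d coefficients b_i once, uniformly from a large space; the same
// coefficients must be used for both P and T so that matches are preserved.
std::vector<long long> drawCoefficients(std::size_t d, std::mt19937_64& rng) {
    std::uniform_int_distribution<long long> dist(1, 1LL << 40);  // assumption
    std::vector<long long> b(d);
    for (auto& v : b) v = dist(rng);
    return b;
}

// Map each d-dimensional point x to the single value sum_i b_i * x_i.
std::vector<long long> randomProject(const std::vector<Point>& pts,
                                     const std::vector<long long>& b) {
    std::vector<long long> out;
    out.reserve(pts.size());
    for (const Point& x : pts) {
        long long s = 0;
        for (std::size_t i = 0; i < b.size(); ++i) s += b[i] * x[i];
        out.push_back(s);
    }
    return out;
}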
[Figure: (a) two-dimensional points projected onto one dimension; (b) the projected values hashed into a short binary array]
Fig. 2. a) Randomised projection followed by b) Length reduction
2. The next stage is to reduce the “sparsity” of the projected data by reducing its length. The goal is to reduce the maximum distance between any pair of points so that the whole length-reduced point set can be represented as a binary array of size Θ(n). The method used is universal hashing. Specifically, define the hash functions g(x) = ax mod q, h(x) = g(x) mod s and h2(x) = (g(x) + q) mod s, for some a, q and s. Choose q to be a random prime in the range [2N, . . . , 4N], where N is the maximum of the projected values of P′ and T′, a uniformly from [1, . . . , q − 1], and s to be rn, where r > 1 is a constant. Each non-zero location in P′ is mapped to a 1 at position h(P′_i) of a binary array p. Each non-zero location in T′ is mapped to 1s at two positions, h(T′_j) and h2(T′_j), in a binary array t. All positions in p and t which have not been set to 1 are set to 0. Both p and t are of the same length s.
The length-reduced arrays p and t will form the starting points for the subset matching algorithms we present. The following Lemma shows why we can use them to perform matching.
Lemma 1.
(h(x) + h(y)) mod s = h(x + y) if g(x) + g(y) < q, and h2(x + y) otherwise.   (1)
Proof. (h(x) + h(y)) mod s = (g(x) mod s + g(y) mod s) mod s = (g(x) + g(y)) mod s. If g(x) + g(y) < q, then g(x + y) = g(x) + g(y). However, if g(x) + g(y) ≥ q, then g(x + y) = g(x) + g(y) − q. ⊓⊔
The significance of Lemma 1 is that if some subset of the positions in p matches in the text, so that p_i + c = t_j for some c, then we know that (h(p_i) + h(c)) mod s matches either h(t_j) or h2(t_j). Therefore, by counting the number of 1s that p and t have in common at every wraparound alignment we can attempt to estimate the size of the true subset matches in P and T. A wraparound alignment at shift i is performed by comparing p_j with t_{(i+j−1) mod s} for all j, 1 ≤ j ≤ m.
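A sketch of the length reduction under the parameter choices above (choosing the random prime q is omitted; the 128-bit cast is a GCC extension used to avoid overflow):

#include <vector>

struct LengthReducer {
    long long a, q, s;  // as defined in the text: g(x) = ax mod q, s = rn
    long long g(long long x)  const { return (long long)((__int128)a * x % q); }
    long long h(long long x)  const { return g(x) % s; }
    long long h2(long long x) const { return (g(x) + q) % s; }
};

// Pattern points set one bit each; text points set two bits (h and h2), so
// that by Lemma 1 a shifted pattern bit always lands on one of its partners.
void buildBitArrays(const std::vector<long long>& Pproj,
                    const std::vector<long long>& Tproj,
                    const LengthReducer& lr,
                    std::vector<char>& p, std::vector<char>& t) {
    p.assign(lr.s, 0);
    t.assign(lr.s, 0);
    for (long long v : Pproj) p[lr.h(v)] = 1;
    for (long long v : Tproj) { t[lr.h(v)] = 1; t[lr.h2(v)] = 1; }
}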
3
Estimating the Size of the Largest Subset Match
At every alignment of p in t we estimate the number of true matches in the original data from the number of matches in the projected and length-reduced version. For any point P_i in the original data, consider an integer c such that P_i + c ∉ T. We determine the probability that h(P′_i + c) = h(T′_j) for some T′_j; this is equivalently the probability of a false positive match. For simplicity, we omit the analysis of the randomised projection stage and concentrate on the effect of length reduction. For an analysis of the probability of false positives being introduced by randomised projection see [3].
Lemma 2. For any point P′_i ∈ P′ and integer c such that P′_i + c ∉ T′,
Pr(h(P′_i + c) ∈ (h(T′) ∪ h2(T′))) ≈ 1 − e^{−2/r}.
Proof. Each point in T′ is transformed into two copies over the range 0 to s − 1 by the hash functions h and h2. Therefore, for each pair of points h(T′_j) and h2(T′_j), the probability that at least one of them lies at any given point h(P′_i + c) is 2/s. Therefore the probability that none of them is mapped to h(P′_i + c) is ((s − 2)/s)^n, and so the probability that at least one is mapped to h(P′_i + c) is 1 − ((s − 2)/s)^n ≈ 1 − e^{−2/r}, as s = rn.
We can now calculate the expected number of matches at a given alignment. We assume for the moment that m ≪ n; if m and n are of similar size then a further correction factor is required. The value we observe is the true number of matches plus the false matches. Let C be the true number of matches between p and t at shift c and let X be a random variable representing the number of false positives. Clearly, C + X is the total number of observed matches at shift c, and we wish to estimate C as accurately as possible.
Lemma 3. Let O_c be the observed number of matches between p and t at shift c. Then (O_c − m(1 − e^{−2/r}))/e^{−2/r} is an unbiased estimator for C_c, the true number of matches at shift c. The variance of the estimator is approximately (m − C_c)(1 − e^{−2/r}).
Proof. E(O_c) = E(C_c) + E(X_c) = E(C_c) + (m − C_c)(1 − e^{−2/r}). Therefore E(C_c) = (E(O_c) − m(1 − e^{−2/r}))/e^{−2/r}. By making a Poisson approximation to X_c we have that the variance is equal to the mean of the number of false positives. Therefore the variance of the estimator is approximately (m − C_c)(1 − e^{−2/r}).
We can now see that the variance of the estimate grows linearly as the true number of matches decreases. Although we have an unbiased estimator under our simplifying assumptions, the variance may be impractically high for all but very close matches. One option could be to repeat the whole process and take the mean of the estimates. However, we now show how to derive a much improved, although biased, estimator.
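Lemma 3 translates directly into code; a one-line sketch with r = s/n as above:

#include <cmath>

// Unbiased estimate of the true match count C_c from the observed count O_c
// at one shift: (O_c - m(1 - e^{-2/r})) / e^{-2/r}.
double estimateTrueMatches(double observed, double m, double r) {
    const double fp = 1.0 - std::exp(-2.0 / r);  // false-positive probability
    return (observed - m * fp) / std::exp(-2.0 / r);
}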
An Improved Estimate
We show here how to improve our estimate of the true size of the largest subset match. The technique is to find the shift in the length-reduced data that gives the best match and then perform a reverse lookup to see what shift in the original data this corresponds to. Then we can check in the original data how many points align at this shift, giving us a much improved estimate of the largest subset match. For ease of description we assume the final checks will be performed in the data after randomised projection, but the method can easily be extended to use the original d-dimensional data instead if need be. The main steps are as follows.
1. Find the best match of p in t and call it h(c). Determine in O(m) time which points in p match t at that shift.
2. Look up, by use of a precalculated hash table, where each of the matching points p_i at shift h(c) were mapped from in P′. Then look up where the corresponding points in t were mapped from in T′. Note that h(T′_j) ≠ h2(T′_j), but two points in P′ or T′ can still map to one point in p or t due to collisions. Therefore one lookup may return more than one point from P′ or T′.
3. Now we have a shift for each pair of points in P′ and T′ that we have looked up. However, this list may have rare inconsistencies due to collisions. We therefore take all such putative shifts and count which one occurs most frequently. We return the value found as the shift in the original data which will give us the largest subset match.
We consider the case where this algorithm reports the wrong answer, that is, the shift of the pattern in the original data set that we find does not give us the largest subset match. Our algorithm therefore reports that there is a match of size C_1 + X_1 at shift c_1. We know that C_1 is the number of true matches while there are X_1 false positives. But, in fact, there is another match of size C_2 + X_2 at shift c_2 where C_1 < C_2 but C_1 + X_1 > C_2 + X_2. We would like to estimate the conditional probability P(C_1 + X_1 > C_2 + X_2 | C_1 < C_2) and show that it is small. Since the main application of our algorithm is information retrieval, we are not interested in the cases where C_1 ≈ C_2 or where both X_1 and X_2 are quite small. We only consider the case where C_1 and C_2 differ significantly and there are many false positives. As before, we make a Poisson approximation to both X_1 and X_2. Letting Z = X_1 − X_2, our algorithm reports a wrong answer when Z is greater than C_2 − C_1. The random variable Z follows the normal distribution N(E(X_1) − E(X_2), E(X_1) + E(X_2)), where E(X_1) − E(X_2) = (C_2 − C_1)(1 − e^{−2/r}) and E(X_1) + E(X_2) = (2m − (C_1 + C_2))(1 − e^{−2/r}). Then Z′ = (Z − (E(X_1) − E(X_2)))/√(E(X_1) + E(X_2)) follows the standard normal distribution and we get
P(Z ≥ C_2 − C_1) = P( Z′ ≥ (C_2 − C_1) e^{−2/r} / √((2m − (C_1 + C_2))(1 − e^{−2/r})) ).   (2)
Of course we do not know the exact values of C_1 and C_2. But if C_1 and C_2 are not too small (there are enough true matches) and C_2 − C_1 is reasonably large, P(Z ≥ C_2 − C_1) quickly becomes vanishingly small. As a consequence, this method is suitable for information retrieval, where the ranks of the documents retrieved are more important than absolute measures of similarity.
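As a numeric illustration, the tail probability in Equation (2) can be evaluated with the complementary error function (a sketch; C1, C2, m and r are the quantities defined above):

#include <cmath>

// P(Z >= C2 - C1) for Z ~ N(E(X1) - E(X2), E(X1) + E(X2)), as in Equation (2).
double misreportProbability(double C1, double C2, double m, double r) {
    const double fp    = 1.0 - std::exp(-2.0 / r);
    const double num   = (C2 - C1) * std::exp(-2.0 / r);
    const double denom = std::sqrt((2.0 * m - (C1 + C2)) * fp);
    const double z     = num / denom;             // standardised threshold
    return 0.5 * std::erfc(z / std::sqrt(2.0));   // upper tail of N(0,1)
}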
4
Algorithms and Implementation
In this Section we present MSMBP and MSMFT and discuss some implementation details. The overall structure of both algorithms is described in Algorithm 1. The difference between the two algorithms is in the design and implementation of the cross-correlation step.
Input: Point sets P and T
Output: Estimate of the size of the largest subset match of P and T
{P′, T′} ← {randomproject(P), randomproject(T)};
{p, t} ← {lengthreduce(P′), lengthreduce(T′)};
A ← crosscorrel(p, t);
c′ ← shift that gives largest value in A;
c ← shift in T, inferred from c′ using the improved estimate technique;
return |(P + c) ∩ T|;
Algorithm 1. Overall structure of MSMBP and MSMFT
A Bit-Parallel Algorithm for Largest Subset Matching (MSMBP)
A naive implementation of the wraparound cross-correlation calculation on two arrays of length s will take O(s^2) time. As s ∈ Θ(n), this implies an O(n^2) time algorithm. To reduce this time to O(nm) we take advantage of the sparsity of the length-reduced pattern, and we further improve the constant time overheads by implementing a bit-parallel scheme for performing the matching. A simple bit-parallel implementation must perform s/w bitwise ANDs and shifts per word-sized block, w, in the pattern. By storing only the at most m words in the pattern that contain non-zero entries, we are required to perform at most m ANDs at each shift. We also reduce the number of shifts to a constant number by noting that shifts of size k bytes can be accessed by pointer arithmetic in constant time, where k is any positive integer. Thus we gain access to all shifts in constant time after performing a total of b − 1 shifts, where b is the number of bits in a byte. The total running time is therefore O(n) for each of the constant number of shifts, plus O(m) for each of the O(n) alignments of the pattern in the text. The sparse bit-parallel approach is therefore efficient both when m ≪ n and as m gets closer to n, when the advantage of parallelism takes effect.
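A simplified sketch of the sparse bit-parallel counting idea, restricted to shifts that are whole multiples of the word size (the byte-offset pointer trick in the text handles the remaining shifts):

#include <cstdint>
#include <vector>

struct SparseWord { std::size_t idx; std::uint64_t bits; };  // non-zero words of p

// Count common 1s between the sparse pattern and the text at a wraparound
// alignment given in words; __builtin_popcountll is a GCC builtin
// (std::popcount in C++20 is the portable equivalent).
int matchesAtWordShift(const std::vector<SparseWord>& pat,
                       const std::vector<std::uint64_t>& text,
                       std::size_t wordShift) {
    int count = 0;
    for (const SparseWord& w : pat) {
        std::size_t j = (w.idx + wordShift) % text.size();
        count += __builtin_popcountll(w.bits & text[j]);
    }
    return count;
}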
Largest Subset Matching Using FFTs (MSMFT)
MSMFT carries out the same steps as MSMBP except that the cross-correlation step is implemented using FFTs. The property of the FFT that we use is that, for numerical strings, all the inner-products
p · t^{(i)} = Σ_{j=1}^{m} p_j t_{(i+j−1)},   1 ≤ i ≤ n,   (3)
can be calculated accurately and efficiently in O(n log m) time (see e.g. [4], Chapter 32). The same method can be used without modification to compute wraparound cross-correlations. As both reduced-length arrays p and t contain only the values zero and one, p · t^{(i)} counts the number of ones that p and t have in common at shift i. This gives us the estimate of the largest subset match in the reduced-length data as required.
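A sketch of the core wraparound cross-correlation using FFTW, the library used in the experiments below; the substring-partitioning trick of footnote 2 is omitted, so this version runs in O(s log s) time on the full arrays:

#include <fftw3.h>
#include <complex>
#include <vector>

// result[i] = sum_j p[j] * t[(j + i) mod s] for 0/1 arrays of equal length s.
std::vector<double> crossCorrelate(const std::vector<double>& p,
                                   const std::vector<double>& t) {
    const int s = static_cast<int>(p.size());
    std::vector<double> a(p), b(t), out(s);
    std::vector<std::complex<double>> A(s / 2 + 1), B(s / 2 + 1);

    fftw_plan fa = fftw_plan_dft_r2c_1d(s, a.data(),
        reinterpret_cast<fftw_complex*>(A.data()), FFTW_ESTIMATE);
    fftw_plan fb = fftw_plan_dft_r2c_1d(s, b.data(),
        reinterpret_cast<fftw_complex*>(B.data()), FFTW_ESTIMATE);
    fftw_execute(fa);
    fftw_execute(fb);

    // Correlation theorem: conj(FFT(p)) * FFT(t), then inverse transform.
    for (int k = 0; k <= s / 2; ++k) A[k] = std::conj(A[k]) * B[k];

    fftw_plan fc = fftw_plan_dft_c2r_1d(s,
        reinterpret_cast<fftw_complex*>(A.data()), out.data(), FFTW_ESTIMATE);
    fftw_execute(fc);
    for (double& v : out) v /= s;  // FFTW transforms are unnormalised

    fftw_destroy_plan(fa); fftw_destroy_plan(fb); fftw_destroy_plan(fc);
    return out;
}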
5
Experimental Results
The algorithms were implemented in C++ and compiled with mingw g++ version 3.4.4 using the libstdc++ standard template library with -O3 optimization, and FFTW version 3.1.2 [5]. The tests were run on a 1.73 GHz Pentium M processor with 1 GB of RAM under Windows XP SP2. The test data consisted of randomly generated two-dimensional points with integer coordinates. Non-unique points were discarded and replaced until the desired number of unique points was obtained. The point range had a fixed height and sparsity, with a width proportional to the number of points. The running times given are averages over three trials. For a given test the variation in running times was minimal. In the accuracy tests of Table 1, we inserted a partial match of a specific size into the text. The pattern contained 200 points in total, with the text containing 4000 points after the insertion of the partial match. When inserting the match we ensured that no other match between the pattern and the text was greater than 25% of the desired partial match size. The points to be inserted were selected randomly from the pattern and inserted at a random offset into the text. We also ensured that this insertion did not duplicate any text points or create a larger match than we had intended. The accuracy tests of Table 2 were created by inserting two subsets of the pattern of different sizes into the text. Checks were carried out to ensure no other significant matches were accidentally created. All accuracy tests were run only with MSMBP, which gives the same output as MSMFT. In the following Figures, P3 refers to the queue-based method of [7], MSMFT refers to the FFT-based algorithm presented in Section 4 and MSMBP refers to the bit-parallel algorithm also presented in the same section. In the length reduction step of MSMFT and MSMBP, the constant r = s/n was set to 8 as a compromise between speed and accuracy.
5.1
Speed Comparisons
Figure 3 shows timings for P3 and MSMBP with increasing text size. For each algorithm the pattern size is fixed to 25% (P3-25, MSMBP-25) and 75% (P3-75, MSMBP-75) of the text size. Comparing the effect of the increase in pattern size on both algorithms, we can see that the improved time complexity of MSMBP, coupled with its bit-parallel advantages, is sufficient to provide a significant speedup over P3. In the 75% test, MSMBP ran up to 23 times faster than P3. We also tested MSMBP with patterns of size 25% and 75% of the text to see what advantage the bit-parallelism specifically gave. The results showed speedups of 3.6 and 7.1 times respectively over the equivalent algorithm without bit-parallelism.
[Figure: running time in seconds against data set size for MSMBP-25, MSMBP-75, P3-25 and P3-75]
Fig. 3. Running times for MSMBP and P3
Figure 4 shows the same information as Figure 3 but for MSMBP and MSMFT. The advantage of MSMFT's greatly improved time complexity with large pattern sizes is apparent in this figure. Unlike P3 and MSMBP, MSMFT is negligibly affected by the increase in pattern size from 25% to 75%. For a text size of 20000 with the pattern 75% of the size of the text, MSMFT is approximately 500 times faster than P3. Figure 5 shows the time taken by P3, MSMBP and MSMFT with an increasing text size and a pattern size fixed at 40 points. All three algorithms show near linear increases in running time. In this case, P3 is faster than the other two as a result of its simplicity and associated low constant factors. Figure 6 shows a constant text size of 960000 with an increasing pattern size. This figure highlights the crossover between the three methods. P3 is initially faster when the pattern is very small. The advantage disappears rapidly as the pattern size increases.
[Figure: running time in seconds against data set size for MSMBP-25, MSMBP-75, MSMFT-25 and MSMFT-75]
Fig. 4. Running times for MSMBP and MSMFT
[Figure: running time in seconds against data set size for MSMBP, MSMFT and P3]
Fig. 5. Running times with pattern size of 40
[Figure: running time in seconds against pattern size for MSMBP, MSMFT and P3]
Fig. 6. Running times with fixed text size and increasing pattern size
5.2
Accuracy Tests
Table 1 shows accuracy figures for MSMBP and MSMFT in the single-match conditions explained above. The Match % column refers to the percentage of the pattern inserted into the text and Actual refers to the actual match size. The columns Run 1...Run 3 give the size of the largest match found by the algorithm and Avr. Diff gives the average percentage over the three runs between the found match and the real largest match. The inserted match is successfully found in all cases except where the largest match is only 10% of the pattern size. When the best match is very small the algorithm is less accurate, as expected. However, in information retrieval applications the system would correctly return that there were no “good” matches in the data if this situation were to arise.

Table 1. Accuracy of MSM algorithms with one inserted match

Match %   Actual   Run 1   Run 2   Run 3   Avr. Diff
90 %      180      180     180     180     100 %
75 %      150      150     150     150     100 %
25 %      50       50      50      50      100 %
10 %      20       4       5       5       23 %
Table 2. Accuracy of MSM algorithms with two inserted matches

Match % (1st, 2nd)   Actual (1st, 2nd)   Run 1   Run 2   Run 3   Avr. Diff
100%, 10%            200, 20             200     200     200     100 %
100%, 50%            200, 100            200     200     200     100 %
100%, 90%            200, 180            200     200     200     100 %
100%, 99%            200, 198            200     200     200     100 %
75%, 10%             150, 20             150     150     150     100 %
75%, 65%             150, 130            150     150     150     100 %
75%, 70%             150, 140            150     150     140     98 %
75%, 73%             150, 146            150     150     150     100 %
50%, 10%             100, 20             100     100     100     100 %
50%, 40%             100, 80             100     100     100     100 %
50%, 45%             100, 90             100     100     90      97 %
25%, 5%              50, 10              50      50      50      100 %
25%, 15%             50, 30              50      50      50      100 %
25%, 20%             50, 40              40      50      50      93 %
Table 2 gives more detail on how close the first and second best match can be before our algorithms return a sub-optimal result. The columns are defined as in Table 1, except that two matches have now been inserted. The sizes of these inserted matches are given in the first two columns. The results confirm that when one match is much better than the others, our algorithms are consistently correct. When matches are very close in size to each other, however, small
inaccuracies are introduced. Even when the matches are extremely close, for example matches of size 100% and 99% or 50% and 45%, the algorithm was in our tests either correct every time or correct in the majority of cases. If more accuracy is required, then either the constant r can be increased, thereby increasing the sparsity of p and t and reducing the probability of false positives, or the whole matching algorithm can be repeated and the mode of the estimates chosen.
6
Conclusion
We have presented two algorithms, MSMBP and MSMFT, which enable us to solve the largest subset match problem efficiently on large point sets. Speed is crucial in information retrieval, where many users may be searching the stored data simultaneously and expect a fast response. We have shown experimentally that it is possible to achieve speedups of several orders of magnitude in some cases without a significant decrease in accuracy. However, to reach near-instant response times for anything other than small inputs, it seems likely that a completely new index-based approach will be required. It is an open question how this will be achieved.
Acknowledgments. The authors would like to thank Manolis Christodoulakis for the original implementation of the MSMFT algorithm and the EPSRC for the funding of the second author.
References
1. Cardoze, D.E. and Schulman, L.J.: Pattern Matching for Spatial Point Sets. In IEEE Symposium on Foundations of Computer Science (1998) 156–165
2. Clifford, R., Christodoulakis, M., Crawford, T., Meredith, D., and Wiggins, G.: A Fast, Randomised, Maximal Subset Matching Algorithm for Document-Level Music Retrieval. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR '06) (2006) to appear
3. Cole, R. and Hariharan, R.: Verifying Candidate Matches in Sparse and Wildcard Matching. In Proceedings of the Annual ACM Symposium on Theory of Computing (2002) 592–601
4. Cormen, T.H., Leiserson, C.E., and Rivest, R.L.: Introduction to Algorithms. MIT Press (1990)
5. Frigo, M. and Johnson, S.G.: The Design and Implementation of FFTW3. Proceedings of the IEEE, Special issue on “Program Generation, Optimization, and Platform Adaptation” 93 2 (2005) 216–231
6. Meredith, D., Lemström, K., and Wiggins, G.A.: Algorithms for Discovering Repeated Patterns in Multidimensional Representations of Polyphonic Music. Journal of New Music Research 31 4, December 2002, 321–345
7. Ukkonen, E., Lemström, K., and Mäkinen, V.: Geometric Algorithms for Transposition Invariant Content-Based Music Retrieval. In Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR '03), Johns Hopkins University (2003) 193–199
A Software Architecture for Shared Resource Management in Mobile Ad Hoc Networks
Orhan Dagdeviren and Kayhan Erciyes
Izmir Institute of Technology, Computer Engineering Department, Urla, Izmir 35340, Turkey
{orhandagdeviren,kayhanerciyes}@iyte.edu.tr
Abstract. We propose a three-layer software architecture for shared resource management in mobile ad hoc networks (MANETs). At the lowest layer, the Merging Clustering Algorithm (MCA) [11] partitions the MANET into a number of balanced clusters periodically. At the second layer, the Backbone Formation Algorithm (BFA) provides a virtual ring using the clusterheads found by MCA. Finally, an example resource management protocol, a modified and scaled version of the Ricart-Agrawala algorithm implemented using the virtual ring structure, is presented together with performance results.
1
Introduction
Mobile ad hoc networks do not have a fixed topology and the nodes of a MANET communicate using temporary connections with their neighbors. A MANET can be partitioned into a number of clusters to solve various problems such as routing and mutual exclusion in such networks. Mutual exclusion algorithms provide an efficient way of resource sharing in MANETs and also in distributed systems. Distributed mutual exclusion algorithms are either permission based or token based. A node needs permission from all of the related nodes to enter a critical section in a permission-based algorithm. In token-based algorithms, however, a node requires the possession of a system-wide unique token to enter a critical section. Suzuki-Kasami's algorithm [8] (N messages) and Raymond's tree-based algorithm [5] (log(N) messages) are examples of token-based mutual exclusion algorithms. Examples of non-token-based distributed mutual exclusion algorithms are Lamport's algorithm [3] (3(N−1) messages), the Ricart-Agrawala (RA) algorithm (2(N−1) messages) [6] and Maekawa's algorithm [4]. Safety, liveness and fairness are the main requirements for any mutual exclusion algorithm. Lamport's algorithm and the RA algorithm are among the few fair distributed mutual exclusion algorithms in the literature. A distributed mutual exclusion algorithm using tokens is shown in [9], and a k-way mutual exclusion algorithm for ad hoc wireless networks, where there may be at most k nodes executing a critical section at one time, is described in [10]. In this study, we propose a three-layer architecture for resource management in MANETs. At the lowest layer, a clustering algorithm provides dynamic clusters of the MANET, using the previously designed MCA [11]. The Backbone
Formation Algorithm at the second layer provides a virtual ring architecture over the coordinators of the clusters formed by MCA [11]. Finally, we show the implementation of the distributed mutual exclusion algorithm described in [1,2] as the third-layer application which uses the virtual ring structure. We first partition the MANET into a number of clusters periodically using the Merging Clustering Algorithm (MCA). The nodes in a cluster that have direct connections to other clusters, that is, are in the communication ranges of nodes of other clusters, are called neighbor nodes. MCA also provides the leader for every cluster, which we will call the coordinator here. Secondly, we construct a directed ring architecture across the coordinators. To achieve this goal, we propose the backbone formation algorithm. After formation of the ring, the coordinators perform the required critical section entry and exit procedures for the nodes they represent. Using this architecture, we improve and extend the RA algorithm described in [2] to MANETs and show that these algorithms may achieve an order of magnitude reduction in the number of messages required to execute a critical section, at the expense of increased response times and synchronization delays. This may also be useful in wireless sensor network environments, where energy efficiency, and therefore message complexity, is of paramount importance. The rest of the paper is organized as follows. Section 2 provides the background. Section 3 reviews the extended RA algorithm on the proposed model, called Mobile RA. The implementation results are explained in Section 4 and the discussions and the conclusions are outlined in Section 5.
2
Background
2.1
Clustering Using Merging Clustering Algorithm
An undirected graph is defined as G = (V, E), where V is a finite nonempty set of nodes and E ⊆ V × V is a set of edges. A graph G_S = (V_S, E_S) is a spanning subgraph of G = (V, E) if V_S = V. A spanning tree of a graph is an undirected connected acyclic spanning subgraph. Intuitively, a minimum spanning tree (MST) for a graph is a subgraph that has the minimum number of edges for maintaining connectivity [16]. The Merging Clustering Algorithm (MCA) [11] finds clusters in a MANET by merging clusters to form higher-level clusters, as in Gallagher, Humblet and Spira's algorithm [17]. However, we focus on the clustering operation and discard the minimum spanning tree, which reduces the message complexity as explained in [11]. The second contribution is to use upper (2K) and lower (K) bound heuristics for the clustering operation, which results in a balanced number of nodes in the clusters formed. The cluster leader is the node with the greatest node id in a cluster. The cluster id is equal to the node id of the cluster leader.
2.2
Backbone Formation Algorithm
The Backbone Formation Algorithm constructs a backbone architecture on a clustered MANET [12]. Unlike other algorithms, the backbone is constructed
as a directed ring architecture, to gain the advantages of this topology and to give better services to other middleware protocols [18,19,20,2]. The second contribution is to connect the clusterheads of a balanced clustering scheme, which meets two essential needs of clustering: balanced clusters and minimized routing delay. Besides these, the backbone formation algorithm is fault tolerant, as the third contribution. Our main idea is to maintain a directed ring architecture by constructing a minimum spanning tree between clusterheads and classifying clusterheads into BACKBONE or LEAF nodes, periodically. To maintain these structures, each clusterhead broadcasts a Leader_Info message by flooding. In this phase, cluster member nodes act as routers to transmit Leader_Info messages. The algorithm has two modes of operation: a hop-based backbone formation scheme and a position-based backbone formation scheme. In the hop-based scheme, the minimum number of hops between clusterheads is taken into consideration in the minimum spanning tree construction. Minimum hop counts can be obtained during the flooding scheme. For highly mobile scenarios, an agreement between clusterheads must be maintained to guarantee consistent hop information. In the position-based scheme, the positions of the clusterheads are used to construct the minimum spanning tree. If each node knows its velocity and the direction of its velocity, this information can be appended, with a timestamp, to the Leader_Info message to construct a better minimum spanning tree. In this mode, however, nodes must be equipped with a position tracker such as a GPS receiver. Every node in the network performs the same local algorithm, as shown in [12].
2.3
Performance Metrics
The performance of a distributed mutual exclusion algorithm depends on whether the system is lightly or heavily loaded. If no other process is in the critical section when a process makes a request to enter it, the system is lightly loaded. Otherwise, when there is a high demand for the critical section, which results in queueing up of the requests, the system is said to be heavily loaded. The important metrics used to evaluate the performance of a mutual exclusion algorithm are the number of messages per request (M), the response time (R) and the synchronization delay (S). M can be specified for high load or light load in the system. The response time R is measured as the interval between the request of a node to enter the critical section and the time it finishes executing the critical section. The synchronization delay S is the time required for a node to enter the critical section after another node finishes executing it. The minimum value of S is one message transfer time T, since one message suffices to transfer the access rights to another node [7].
2.4
The Proposed Architecture
We propose a three-layer architecture for MANETs, as shown in Fig. 1. Implementations of other higher-level functions on top of the lower two layers are possible. The lowest layer is where the clustering takes place, at the end of which
[Figure: layered architecture; layer 3: Mobile Ricart Agrawala Algorithm, layer 2: Backbone Formation Algorithm, layer 1: Merging Clustering Algorithm]
Fig. 1. Proposed Architecture
balanced clusters are formed. The second layer takes these clusters as input and forms a virtual ring of the coordinators of these clusters. Finally, the third layer shows the implementation of the Mobile RA Algorithm on top of these two layers.
3
Mobile Ricart Agrawala Algorithm
For distributed mutual exclusion in MANETs, we proposed a hierarchical architecture where nodes form clusters and each cluster is represented by a coordinator in the ring [1]. The relation between the cluster coordinator and an ordinary node is similar to that in a central coordinator based mutual exclusion algorithm.
[Figure: state diagram with states IDLE, WAITRP and WAITND; edges are labelled event/action, e.g. Node_Req/Coord_Req, Coord_Rep/Node_Rep, Node_Rel/Coord_Rep, Coord_Req/Coord_Rep]
Fig. 2. FSM of the Mobile RA Coordinator
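A skeletal C++ rendering of the transition logic in Fig. 2 (our sketch; message payloads, timestamp comparison and the wait queue are elided):

enum class State { IDLE, WAITRP, WAITND };
enum class Msg   { NODE_REQ, COORD_REQ, COORD_REP, NODE_REL };

// State transitions of the coordinator; the actions (forwarding Coord_Req,
// replying with Coord_Rep/Node_Rep, serving the queue) are noted as comments.
State onMessage(State s, Msg m) {
    switch (s) {
    case State::IDLE:
        if (m == Msg::NODE_REQ) return State::WAITRP;  // send Coord_Req to ring
        return State::IDLE;                            // answer Coord_Req directly
    case State::WAITRP:
        if (m == Msg::COORD_REP) return State::WAITND; // own request granted
        return State::WAITRP;   // queue or answer other requests by timestamp
    case State::WAITND:
        if (m == Msg::NODE_REL) return State::IDLE;    // release, serve the queue
        return State::WAITND;
    }
    return s;
}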
The types of messages exchanged are Request, Reply and Release: a node first requests a critical section and, upon the reply from the coordinator, it enters its critical section and then releases it. The finite state machine representation of the Mobile RA coordinator is shown in Fig. 2 [1,2]. The coordinator sends a critical section request (Coord_Req) to the ring for each node request (Node_Req) it receives. When it receives an external request (Coord_Req), it performs the operation of a normal RA node by checking the
timestamp of the incoming request against the pending requests in its cluster and sends a reply (Coord_Rep) only if all of the pending requests have greater timestamps than the incoming request. When a node sends a Node_Rel message, the coordinator sends Coord_Rel messages for all of the requests in the wait queue that have smaller timestamps than the local pending ones.
3.1
Illustration of the Mobile RA Algorithm
Fig. 3 shows an example scenario for the Mobile RA Algorithm in a network of 20 nodes partitioned into clusters 19, 14, 17 and 18 using MCA. The K parameter is selected as 4. Nodes 19, 14, 17 and 18 are the cluster leaders, and thus the coordinators, of clusters 1, 2, 3 and 4. They form a ring together with nodes 0, 3 and 10. Node 6, node 4 and node 16 make requests for the critical section at 3.75s, 3.85s and 3.90s respectively. The execution time of the critical section is taken as 350 ms. The following describes the events that occur:
[Figure: four panels (a)-(d) of the 20-node network, showing the Node_Req, Coord_Req, Coord_Rep and Node_Rel messages exchanged in steps 1-4 below]
Fig. 3. Operation of the Mobile RA Algorithm
1. Node 6 in cluster 19 makes a critical section request at 3.75s by sending a Node_Req(6,19,3.75) message to node 19, which is the cluster coordinator. Node 19 receives the message at 3.76s and changes its state to WAITRP. Node 19 sends a Coord_Req(6,19,3.75) message to the next coordinator (node 14) on the ring. Node 14, which is in the IDLE state and has no pending requests in its cluster, receives the Coord_Req(6,19,3.75) message
at 3.78s and forwards the message to the next coordinator (node 17) on the ring. The message traverses the ring and is received by node 19, which is in the WAITRP state, at 3.82s, meaning all of the coordinators have confirmed that either they have no pending requests or their pending requests all have higher timestamps. Node 19 sends a Coord_Rep message to node 6 and changes its state to WAITND. Node 6 receives the Coord_Rep message at 3.83s and enters the critical section. Step 1 is depicted in Fig. 3(a).
2. Node 4 in cluster 18 makes a critical section request by sending a Node_Req(4,18,3.85) message at 3.85s. Node 18 receives the Node_Req(4,18,3.85) message at 3.86s and sends a Coord_Req(4,18,3.85) message to its next coordinator (node 19) on the ring. Node 19, which is in the WAITND state, receives the message and enqueues the Coord_Req(4,18,3.85) at 3.87s. Node 16 makes a critical section request at 3.90s. Node 18, which is in the WAITRP state, receives the Coord_Req(16,17,3.90) message and enqueues it at 3.93s. Step 2 is depicted in Fig. 3(b).
3. Node 6 exits the critical section at 4.18s and sends a Node_Rel message to node 19. Node 19, which is in the WAITND state, receives the message at 4.19s and makes a transition to IDLE. Node 19 dequeues and forwards the Coord_Req(4,18,3.85) message to the next coordinator (node 14). The Coord_Req(4,18,3.85) message is forwarded by node 17, since its own request has a higher timestamp. Node 18 receives its original request back at 4.25s and sends a Coord_Rep message to node 4. Node 4 enters the critical section at 4.26s. Step 3 is depicted in Fig. 3(c).
4. Node 4 finishes executing the critical section at 4.61s. Node 18 receives the Node_Rel message at 4.62s. Node 18 dequeues and forwards the Coord_Req(16,17,3.90) message to its next coordinator (node 19) on the ring. Operation continues as explained before. Node 17 receives the Node_Rel message from node 16 at 5.03s. Step 4 is depicted in Fig. 3(d).
If there are multiple requests within the same cluster, timestamps are checked similarly for local requests. The order of execution in this example is nodes 6 → 4 → 16, in the order of the timestamps of the requests. We briefly state the following properties of the Mobile RA Algorithm, which were described and proven in [2]:
– The total number of messages per critical section using the Mobile RA Algorithm is k + 3d, where k is an upper bound on the number of neighbor nodes in the ring including the cluster coordinators, and d is an upper bound on the diameter of a cluster.
– The synchronization delay S in the Mobile RA Algorithm varies from 2dT to (k + 2d − 1)T.
– In the Mobile RA Algorithm, the response times are R_light = (k + 3d)T + E and R_heavy varies from w(2dT + E) to w((k + 2d − 1)T + E), where k is the number of clusters and w is the number of pending requests.
Since the sending and receiving ends of the algorithm are the same as those of the RA algorithm, the safety, liveness and fairness attributes are the same. The performance metrics for the Mobile RA Algorithm are summarized in Tab. 1.
Table 1. Performance of the Mobile RA Algorithm

M_light   M_heavy   R_light         R_heavy (min)   S_min   S_max
k + 3d    k + 3d    (k + 3d)T + E   w(2dT + E)      2dT     (k + 2d − 1)T
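A quick numeric check of the bounds in Tab. 1 for illustrative parameter values (the sample values of k, d, T, E and w are ours):

#include <cstdio>

int main() {
    const double k = 4, d = 3, T = 0.01, E = 0.35, w = 5;  // sample values
    std::printf("messages per CS : %.0f\n", k + 3 * d);
    std::printf("R_light         : %.3f s\n", (k + 3 * d) * T + E);
    std::printf("R_heavy (min)   : %.3f s\n", w * (2 * d * T + E));
    std::printf("S range         : [%.3f, %.3f] s\n",
                2 * d * T, (k + 2 * d - 1) * T);
    return 0;
}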
4
Results
We implemented the protocol stack in the ns2 simulator. A random load generator was implemented to generate high, medium and low loads for different numbers of nodes. Different sizes of flat surfaces were chosen for each simulation to create small, medium and large distances between nodes. Very small, small and medium surfaces vary between 310m × 310m and 400m × 400m, 410m × 410m and 500m × 500m, and 515m × 515m and 650m × 650m respectively. Random movements were generated for each simulation. Low, medium and high mobility scenarios were generated, with node speeds limited to 1.0–5.0 m/s, 5.0–10.0 m/s and 10.0–20.0 m/s respectively. The K parameter of the merging clustering algorithm was changed to obtain different sizes of clusters. Response times and synchronization delays measured with respect to load, mobility, distance and K were recorded. The execution time of the critical section was set to 100 ms. The response time behaves as expected in low-load scenarios, as shown in Fig. 4. Synchronization delay values are smaller under medium load, as shown in Fig. 5. The synchronization delay is 0 in low-load scenarios since there are no waiting requests in the queues. When the load is increased, the response time increases due to the waiting times of requests in the queue. The response time and the synchronization delay also increase due to collisions and routing delays caused by high network traffic, as shown in Fig. 4 and Fig. 5. Response time and synchronization delay values are scalable against mobility, as shown in Fig. 6 and Fig. 7. Fig. 8 and Fig. 9 show the effects of the distance between nodes on the response time and synchronization delay. As the distance between nodes increases, the connectivity decreases. This situation causes greater delays. The K parameter is
Fig. 4. Response Time against Load for Mobile RA
Fig. 5. Synchronization Delay against Load for Mobile RA
Fig. 6. Response Time against Mobility for Mobile RA
Fig. 7. Synchronization Delay against Mobility for Mobile RA
selected between 3 and 8 in a MANET with 60 nodes. For a fixed number of nodes, as the cluster size increases, the total number of clusters in the network decreases. This also reduces the number of cluster leaders forming the ring and the routing delay, which decreases the response time and the synchronization delay, as shown in Fig. 10 and Fig. 11.
Consequently, our results conform with the analysis: the response time under low and medium loads increases linearly with a small gradient. Synchronization delay values under medium and high load also increase linearly. The response time under high load increases sharply due to high network traffic. Response time and synchronization delay values are stable under different mobility and
Fig. 8. Response Time against Surface Area for Mobile RA
Fig. 9. Synchronization Delay against Surface Area for Mobile RA
Fig. 10. Response Time against K for Mobile RA
Fig. 11. Synchronization Delay against K for Mobile RA

Table 2. Comparison of the mobile mutual exclusion algorithms with others

                       Regular    Mobile Algs.   Mobile (k = m = d)
Ricart-Agrawala Alg.   2(N − 1)   k + 3d         Θ(4√N)
Token Passing Alg.     N          O(k + 3d)      O(4√N)
surface area conditions. Response time and synchronization delay values decrease linearly against the number of clusters in the MANET.
5
Conclusions
We proposed a three-layer architecture for resource management in a MANET, along with the implementation results of the Mobile RA Algorithm for MANETs. The MANET is partitioned into clusters at regular intervals by MCA, which also provides connected clusterheads. The ring architecture is constructed by the Backbone Formation Algorithm. The Mobile RA Algorithm, together with the architecture it is executed on, provides an improvement over the message complexities of Ricart-Agrawala and other distributed mutual exclusion algorithms. A comparison of the two algorithms with their regular counterparts in terms of their message complexities is shown in Tab. 2. If we assume k = m = d for simplicity, the message complexities of the mobile algorithms are in the order of √N, where N is the total number of nodes in the network [2]. From the test results, we observe that the response time R is scalable with respect to the number of mobile nodes for all load states in the MANET, whether high, medium or low. R is also scalable with respect to node mobility and the distance between the mobile nodes. The coordinators have an important role and they may fail. New coordinators may be elected, and any failed member node can be excluded from the clusters, using the Backbone Formation Algorithm. Our work is ongoing and we are looking into implementing this algorithm in wireless sensor network architectures where preserving energy is important and hence low message complexities are required. We are also considering k-way distributed mutual exclusion algorithms in MANETs.
References
1. Erciyes, K.: Distributed Mutual Exclusion Algorithms on a Ring of Clusters. ICCSA 2004, Springer-Verlag, LNCS 3045 (2004) 518–527
2. Erciyes, K.: Cluster-Based Distributed Mutual Exclusion Algorithms for Mobile Networks. EUROPAR 2004, Springer-Verlag, LNCS 3149 (2004) 933–940
3. Lamport, L.: Time, Clocks and the Ordering of Events in a Distributed System. CACM 21 (1978) 558–565
4. Maekawa, M.: A √N Algorithm for Mutual Exclusion in Decentralized Systems. ACM Transactions on Computer Systems 3 2 (1985) 145–159
5. Raymond, K.: A Tree-Based Algorithm for Distributed Mutual Exclusion. ACM Trans. Comput. Systems 7 1 (1989) 61–77
6. Ricart, G. and Agrawala, A.: An Optimal Algorithm for Mutual Exclusion in Computer Networks. CACM 24 1 (1981) 9–17
7. Shu, Wu: An Efficient Distributed Token-Based Mutual Exclusion Algorithm with a Central Coordinator. Journal of Parallel and Distributed Processing 62 10 (2002) 1602–1613
8. Suzuki, I. and Kasami, T.: A Distributed Mutual Exclusion Algorithm. ACM Trans. Computer Systems 3 4 (1985) 344–349
9. Walter, J.E., Welch, J.L., and Vaidya, N.H.: A Mutual Exclusion Algorithm for Ad Hoc Mobile Networks. Wireless Networks 7 6 (2001) 585–600
10. Walter, J.E., Cao, G., and Mohanty, M.: A K-way Mutual Exclusion Algorithm for Ad Hoc Wireless Networks. Proc. of the First Annual Workshop on Principles of Mobile Computing (2001)
11. Dagdeviren, O., Erciyes, K., and Cokuslu, D.: Merging Clustering Algorithms. ICCSA, LNCS 3981 (2006) 681–690
12. Dagdeviren, O. and Erciyes, K.: A Distributed Backbone Formation Algorithm for Mobile Ad Hoc Networks. To be published in the Proc. of ISPA06 (2006)
13. West, D.: Introduction to Graph Theory. Second edition, Prentice Hall, Upper Saddle River, N.J. (2001)
14. Chen, Y.P. and Liestman, A.L.: Approximating Minimum Size Weakly-Connected Dominating Sets for Clustering Mobile Ad Hoc Networks. Proc. 3rd ACM Int. Symp. Mobile Ad Hoc Net. and Comp. (2002) 165–172
15. Haynes, T.W., Hedetniemi, S.T., and Slater, P.J.: Domination in Graphs, Advanced Topics. Marcel Dekker Inc. (1998)
16. Grimaldi, R.P.: Discrete and Combinatorial Mathematics. An Applied Introduction. Addison Wesley Longman, Inc. (1999)
17. Gallagher, R.G., Humblet, P.A., and Spira, P.M.: A Distributed Algorithm for Minimum-Weight Spanning Trees. ACM Transactions on Programming Languages and Systems 5 (1983) 66–77
18. Baldoni, R., Virgillito, A., and Petrassi, R.: A Distributed Mutual Exclusion Algorithm for Mobile Ad-Hoc Networks. Computers and Communications (2002) 539–544
19. Delmastro, F.: From Pastry to CrossROAD: CROSS-Layer Ring Overlay for Ad Hoc Networks. Third IEEE International Conference on Pervasive Computing and Communications Workshops (2005) 60–64
20. Yang, C.Z.: A Token-Based h-out of-k Distributed Mutual Exclusion Algorithm for Mobile Ad Hoc Networks. 3rd International Conference on Information Technology (2005) 73–77
Compressed Prefix Sums⋆
O'Neil Delpratt, Naila Rahman, and Rajeev Raman
Department of Computer Science, University of Leicester, Leicester LE1 7RH, UK
{ond1,naila,r.raman}@mcs.le.ac.uk
Abstract. We consider the prefix sums problem: given a (static) sequence of positive integers x = (x_1, . . . , x_n) such that Σ_{i=1}^{n} x_i = m, we wish to support the operation sum(x, j), which returns Σ_{i=1}^{j} x_i. Our interest is in minimising the space required for storing x, where ‘minimal space’ is defined according to some compressibility criteria, while supporting sum as rapidly as possible. There are two main compressibility criteria: (a) the succinct space bound, B(m, n) = ⌈log_2 C(m−1, n−1)⌉ bits, where C(·,·) denotes the binomial coefficient, applies to any sequence x whose elements add up to m; (b) data-aware measures, which depend on the values in x, and can be lower than the succinct bound for some sequences. Appropriate data-aware measures have been studied extensively in the information retrieval (IR) community [17]. We demonstrate a close connection between the data-aware measure that is the best in practice for an important IR application and the succinct bound. We give theoretical solutions that use space close to other data-aware compressibility measures (often within o(n) bits), and support sum in doubly-logarithmic (or better) time, together with experimental evaluations of practical variants thereof. A bit-vector is a data structure that supports ‘rank/select’ on a bit-string, and is fundamental to succinct and compressed data structures. We describe a new bit-vector that is robust and efficient.
1
Introduction
The prefix sum problem is fundamental in a number of applications. An inverted list is a sequence of integers 0 < y_1 < . . . < y_n representing (typically) the locations where a keyword appears in a text corpus. Compressing this inverted list, called index compression, is commonly done by storing the difference sequence x, where x_i = y_i − y_{i−1} (taking y_0 = 0), in compressed form [17]. sum(x, i) then provides direct access to y_i; such direct access is important for answering queries that have conjunctions of keywords [17, Chapter 4]. The application that we are interested in involves storing a collection of strings. We concatenate all strings into an array, and let x_i denote the length of the i-th string. sum(x, i − 1) then
⋆ Delpratt is supported by PPARC e-Science Studentship PPA/S/E/2003/03749.
gives the offset in the string array where the i-th string begins.1 A plethora of other applications can be found in the literature [7,12,14].
Measures. Let x be a sequence of n positive integers that add up to m. There are l = C(m−1, n−1) such sequences, so no representation can store all such sequences in fewer than B(m, n) = ⌈lg l⌉ ≤ n lg(m/n) + n lg e bits.2 B(m, n) is never more than the cost of writing down all prefix sums explicitly, i.e., n⌈lg m⌉ bits. So-called data-aware measures are based on self-delimiting encodings of the individual values x_i, and have been studied extensively in the context of IR applications [17]. There are two main families; the first is best represented by the Golomb and Rice codes, and the second by the δ and γ codes. Given an integer parameter b > 1, the Golomb code of an integer x > 0, denoted G(b, x), is obtained by writing the number q = ⌊(x − 1)/b⌋ in unary (i.e. as 1^q 0), followed by r = x − qb − 1 in binary using either ⌊lg b⌋ or ⌈lg b⌉ bits. A Rice code is a Golomb code where b is a power of 2. This gives a first data-aware measure: GOLOMB(b, x) = Σ_{i=1}^{n} |G(b, x_i)|, where |σ| denotes the length (in bits) of a string σ. In other words, GOLOMB measures how well x compresses by coding each x_i using a Golomb code. The γ-code of an integer x > 0, γ(x), is obtained by writing ⌊lg x⌋ in unary, followed by the value x − 2^{⌊lg x⌋} in a field of ⌊lg x⌋ bits, e.g., γ(6) = 110 10. Clearly |γ(x)| = 2⌊lg x⌋ + 1. The δ-code of an integer x > 0, δ(x), writes ⌊lg x⌋ + 1 using the γ-code, followed by x − 2^{⌊lg x⌋} in a field of ⌊lg x⌋ bits; e.g., δ(33) = 110 10 00001. We thus get two more measures of the compressibility of x:
Γ(x) = Σ_{i=1}^{n} |γ(x_i)|   and   ∆(x) = Σ_{i=1}^{n} |δ(x_i)|
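The code lengths defined above are straightforward to compute; a C++ sketch (for the Golomb length we charge the shorter, ⌊lg b⌋-bit, binary part):

#include <cstdint>
#include <vector>

int floorLg(std::uint64_t x) { int k = 0; while (x >>= 1) ++k; return k; }

// |G(b,x)|: q = floor((x-1)/b) in unary (q ones and a zero), then the binary
// part, which uses floor(lg b) or ceil(lg b) bits; we take the former here.
int golombLen(std::uint64_t b, std::uint64_t x) {
    std::uint64_t q = (x - 1) / b;
    return static_cast<int>(q) + 1 + floorLg(b);
}
int gammaLen(std::uint64_t x) { return 2 * floorLg(x) + 1; }
int deltaLen(std::uint64_t x) { return floorLg(x) + gammaLen(floorLg(x) + 1); }

// A data-aware measure is just the sum of code lengths over the sequence,
// e.g. measure(xs, gammaLen) computes Gamma(x).
long long measure(const std::vector<std::uint64_t>& xs,
                  int (*len)(std::uint64_t)) {
    long long total = 0;
    for (std::uint64_t v : xs) total += len(v);
    return total;
}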
By the concavity of the lg function, it follows that the Γ and ∆ measures are maximised when all the x_i's are equal. This gives the following observation:
Γ(x) = ∆(x) = O(n log(m/n))   (1)
A careful estimate, using the fact that |δ(x)| = lg x + 2 lg lg x + O(1) bits, shows that the worst case of the ∆ measure is not much worse than the succinct bound. Conversely, if the values in x are unevenly distributed, then the Γ and ∆ measures are reduced, and may be much less than the succinct bound. This, together with the simple observation that ∆(x) can never exceed Γ(x) by more than Θ(n) bits, makes the ∆ measure asymptotically attractive. However, extensive experiments show [17] that the ∆, Γ and GOLOMB measures of a sequence are broadly similar, and Γ is often less than ∆; GOLOMB with the choice b = ⌈(m ln 2)/n⌉ has generally been observed to be the smallest.
1 In our application the strings tend to be 10-12 characters long on average; the string array may be stored in compressed form, taking perhaps 3-4 bytes per string on average. Thus, a 32-bit pointer for each string is a large overhead in this context.
2 We use lg x to denote log_2 x.
Our Contributions. We study the prefix sum problem in the word RAM model [11] with a word size of O(log m) bits. Our contributions are as follows:
1. We observe that GOLOMB is closely related to the succinct bound when the Golomb parameter b is chosen to be Θ(m/n). As noted above, Golomb coding with a parameter chosen in this range offers consistently good practical compression performance for a range of applications.
2. We argue that, due to the not-so-large differences between the various compressibility measures in practice, any data structure that attempts to achieve the data-aware bounds above must have a space usage very close to the bound. We show several data structures that are fast yet highly space-efficient, and a number of trade-offs are possible by tuning parameters. For example, we show how to achieve ∆(x) + O(n) bits and sum in O(log log(m/n)) time, and we show how to achieve ∆(x) + o(n) bits and sum in O(log log m) time.
3. Item (1) motivates the engineering of a data structure that approaches the succinct bound. For one particular prefix sum representation, due to [3,7], the main component is a data structure that stores a (static) bit-string of size N and supports the operations select(i), which returns the position of the i-th 1, and rank(x), which returns the number of 1 bits to the left of position x (inclusive). Such a data structure is called a bit-vector and is of fundamental importance in succinct data structures. There are N + o(N)-bit bit-vector data structures that support both operations in O(1) time (see e.g. [1]), but there does not yet appear to be a suitably satisfactory “fast” data structure that uses reliably “little” space in practice, despite some work [5,13]. Combining ideas from [5,13], we give a new N + o(N)-bit data structure that supports rank and select in O(1) time, whose worst-case space usage is superior to that of [5,13], and whose space usage and running time in practice, particularly for select, are competitive with the best of the existing data structures.
4. We implement and experimentally analyze data measures and running times. Although some results are preliminary, our conclusions are that the new bit-vector is probably, for our applications, superior to other practical bit-vectors [5,13], and that the Golomb measure is indeed very close to the succinct measure.
Related Work. There is a large body of related work: data structures achieving within O(n) bits of the succinct bound were given by many authors (e.g. [3,7]); the optimal bound was achieved in [14]. In recent work [9], a new data-aware measure, gap, was proposed, where gap(x) = Σ_{i=1}^{n} ⌈lg x_i⌉. The authors considered, in addition to sum, a variety of operations including predecessor operations on the set represented by the prefix sums of x. Unfortunately, gap is not an achievable measure, in that there exist sequences that provably cannot be compressed to gap, and the best space bounds of [9] tend to be of the form gap + o(gap).
Given the relatively little difference that exists in practice between the succinct and data-aware bounds, one must pay special attention to the lower-order terms when considering such data structures. The advantages of our data structures are that we are able to prove more careful space bounds, while achieving the same time bounds. For example, it appears that (c is any constant > 0):

    Time (sum)       [9,10]                    This paper
    O(lg lg(m/n))    ∆(x) + O(n(lg(m/n))^c)    ∆(x) + O(n)
    O(lg lg m)       ∆(x) + O(n lg lg(m/n))    ∆(x) + o(n)

Our methods are similar at a high level to those developed independently [8] by [10], but we use the building blocks more carefully. In [10], an experimental evaluation is performed on data-aware data structures. Their focus is on rank queries, while ours is on select, and our data sets are different. Contrary to [10], we uphold the conclusions of [17] that Golomb coding (and hence the succinct bound) is superior to the other gap-aware measures. Although it would be meaningless to directly compare running times between our work and theirs, in our implementations only the trivial gap-aware data structures came even close to the succinct data structure. Other work [6] implies that O(1)-time select is possible if gap(x) + o(m) bits of space are used, but the second term can be much larger than gap.
2 Preliminaries
We use the following notation. A sequence refers hereafter to a sequence of positive integers. Given a sequence x, its length is denoted by |x| and, if |x| = n, its components are denoted by x_1, . . . , x_n. By W(x) we denote Σ_{i=1}^{|x|} x_i.

2.1 Succinct Representations and Golomb Codes
A simple representation of a sequence that approaches the succinct space bound is [3,7]:

Theorem 1. A sequence x with W(x) = m and |x| = n can be represented in n lg(m/n) + O(n) bits so that sum(x, i) can be computed in O(1) time.

Proof. Let y_i = sum(x, i) for i = 1, . . . , n. Let u be an integer, 1 ≤ u < lg m. We store the lower-order lg m − u bits of each y_i in an array, using n(lg m − u) bits. The multi-set of values formed by the top-order u bits is represented by coding the multiplicity of each of the values 0, . . . , 2^u − 1 in unary, as a bit-string s with n 1s and 2^u 0s. We choose u = ⌊lg n⌋, so |s| = O(n). A select operation on s lets us compute y_i (and hence sum(x, i)) in O(1) time, and the data structures needed to support select on s in O(1) time require only o(n) additional bits. ⊓⊔
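The following C++ sketch (ours, with hypothetical names, not the authors' code) illustrates this representation: the low parts of the prefix sums are stored verbatim, and the high parts are unary-coded in s. For brevity, select on s is a plain linear scan here, standing in for the o(n)-bit O(1)-time select structure of the theorem.

    #include <cstdint>
    #include <vector>

    struct SuccinctPrefixSums {
        int low_bits = 0;            // lg m - u low-order bits per prefix sum
        std::vector<uint64_t> low;   // low parts of y_1 <= ... <= y_n (packed in practice)
        std::vector<bool> s;         // multiplicities of the high parts in unary: n 1s, 2^u 0s

        // y holds the prefix sums y_i = sum(x, i); lgm = ceil(lg m), u = floor(lg n).
        static SuccinctPrefixSums build(const std::vector<uint64_t>& y, int u, int lgm) {
            SuccinctPrefixSums r;
            r.low_bits = lgm - u;
            std::vector<uint64_t> cnt(uint64_t(1) << u, 0);
            for (uint64_t v : y) {
                r.low.push_back(v & ((uint64_t(1) << r.low_bits) - 1));
                ++cnt[v >> r.low_bits];
            }
            for (uint64_t c : cnt) {                  // c_v 1s followed by a 0, v = 0..2^u - 1
                for (uint64_t j = 0; j < c; ++j) r.s.push_back(true);
                r.s.push_back(false);
            }
            return r;
        }

        // sum(x, i): the number of 0s before the i-th 1 in s is the high part of y_i.
        uint64_t sum(int i) const {
            int ones = 0; uint64_t zeros = 0;
            for (bool bit : s) {
                if (bit) { if (++ones == i) break; }
                else ++zeros;
            }
            return (zeros << low_bits) | low[i - 1];
        }
    };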
We now show the connection between the succinct and Golomb bounds:

Proposition 1. Let c > 0 be any constant, and let x be a sequence with W(x) = m and |x| = n. Then, taking b = ⌈cm/n⌉, |GOLOMB(b, x) − B(m, n)| = O(n).

Proof. We note that B(m, n) = n lg(m/n) + Θ(n), and:

    GOLOMB(b, x) = Σ_{i=1}^{n} ( ⌊(x_i − 1)/b⌋ + 1 + ⌈lg b⌉ )
                 ≤ Σ_{i=1}^{n} x_i/b + n(⌈lg b⌉ + 1)
                 = m/b + n(⌈lg b⌉ + 1) = n lg(m/n) + O(n).

Similarly, we show that GOLOMB(b, x) ≥ n lg(m/n). ⊓⊔
2.2 A New Bit-Vector Data Structure
We now discuss a new data structure to support select on a bit-string of length N. Let t = ⌈√(lg N)⌉ and l = ⌈(lg N)/2⌉. We divide the given bit-string A into blocks of size B = tl, and sub-divide each block into t sub-blocks of size l. We obtain the extracted string A′ (cf. [13]) by removing from A all blocks with no 1s. We let N′ denote the length of A′. The data structure comprises the following:

– For each block in A′, we store the number of 0s up to the start of the block in A (the original bit-string) in an array R. Since each entry in R is log N bits long, and R has N′/B entries, the size of R is O(N/√(lg N)) bits.
– For each sub-block we store the number of 1s in that sub-block in an array SBC; counts of 1s for each block are stored in BC. Since each entry in SBC occupies O(lg lg N) bits, SBC takes O(N lg lg N/lg N) = o(N) bits of storage; BC takes even less space.
– Finally, we store the index (in A′) of the location of the (it + 1)-st 1, for i = 0, 1, . . . , ⌊N₁/t⌋, in an array S, where N₁ is the number of 1s in the bit-string. As each block in A′ contains at least one 1, adjacent entries in S differ by at most tB = O((lg N)²). We store every (log N)-th value in S explicitly, and all remaining values relative to the previous explicit value. This requires O(|S| lg lg N) = o(N) bits.

The data structure thus takes N′ + o(N) bits. We note that we can perform table lookup on a block in O(1) time, as well as on t consecutive values in both BC and SBC, as these occupy O(t lg lg N) = o(log N) bits.

A select(i) works as follows: from S we find the position in A′ of the ⌊i/t⌋t-th 1. Let this lie in a block z. Using (one) table lookup on z, we determine the number of 1s that precede the ⌊i/t⌋t-th 1 in z, and hence the number of 1s up to the start of z. Since the i-th 1 lies within t − 1 blocks of z, we apply table lookup (once) to t consecutive values in BC to determine the block y in which the i-th 1 lies, as well as the number of 1s before y. One more table lookup (on SBC) suffices to determine the sub-block y′ containing the i-th 1, as well as the number of 1s in y before y′. A final table lookup on y′ then locates the i-th 1, giving its position within the extracted string A′.
From R, we obtain the number of 0s in A that precede y, from which we can calculate the position of the i-th 1 in A. To support rank, we need to store the contracted string (cf. [13]), which stores one bit for each block in A, indicating whether or not it is a block with all 0s, and some auxiliary data structures (details omitted). We have thus shown:

Theorem 2. There is a data structure that occupies N + O(N lg lg N/√(lg N)) bits, and supports rank and select in O(1) time.

Remark 1. A practical version of this data structure (which occupies (1 + ε)N bits) is described in Section 4, and its performance for select is discussed there as well. However, it is slightly slower than [13,5] for rank. An important advantage of this data structure is that its space usage is predictably low. Even if parameters are chosen so that for "most" inputs the space usage of [13,5] is moderate, there are some bit-strings for which these data structures may take a lot of space.
3 γ and δ Codes
We now consider compression criteria based on the γ and δ codes. A continuing assumption will be that, given γ(x) or δ(x), we can decode x in O(1) time, provided the code fits in O(1) machine words. With the appropriate low-level representation, this is easy to do in our model. For an integer x, γ(x) is assumed to be represented in a word with the unary representation of ⌊lg x⌋ stored reversed in the lower-order bits, and the 'binary' part stored in the next higher-order bits. For example, γ(11) = 1110 011 is stored in a word z as . . . 011 0111, where the lower-order bits are shown on the right. Standard tricks, such as computing z AND (z XOR (z + 1)), leave only the 'unary part' of γ(x) in the lower-order bits. Completing the decoding requires computing lg z, which can be done in O(1) time in our model [4]. Decoding a δ-code is similar.

Define the operation access(x, i) as returning x_i. We now show:

Proposition 2. A sequence x with |x| = n and W(x) = m can be stored in Γ(x) + O(n log log(m/n)) bits and support access in O(1) time.

Proof. We form the bit-string σ by concatenating γ(x_1), . . . , γ(x_n) (the low-level representation is modified as above). We create the sequence o, where o_i = |γ(x_i)|, and store it in the data structure of Theorem 1. Evaluating sum(o, i − 1) and sum(o, i) gives the start and end points of γ(x_i) in O(1) time, and x_i is decoded in O(1) further time. Since W(o) = Γ(x) = O(n log(m/n)), the space used to represent o is O(n log log(m/n)) bits. ⊓⊔

Remark 2. An obvious optimisation is to remove the unary parts altogether from σ, since they are encoded in o, and this is what we do in practice.
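As an illustration of the low-level decoding just described, the following C++ sketch (ours, not code from the paper) decodes a single γ-code with the above word layout; it counts the 1s of the isolated unary part with a popcount (a GCC/Clang intrinsic), which plays the role of the lg computation in our model.

    #include <cstdint>

    // Decode one gamma code from the low-order bits of word z: the reversed
    // unary part (l = floor(lg x) ones, then a zero) sits lowest, and the
    // l-bit 'binary' part sits just above it.
    uint64_t gamma_decode(uint64_t z) {
        uint64_t unary = z & (z ^ (z + 1));      // isolates the trailing run of 1s
        int l = __builtin_popcountll(unary);     // l = floor(lg x)
        uint64_t binary = (z >> (l + 1)) & ((uint64_t(1) << l) - 1);
        return (uint64_t(1) << l) | binary;
    }
    // Example: for x = 11, z = ...011 0111 = 55; then l = 3, binary = 011,
    // and gamma_decode(55) returns (1 << 3) | 3 = 11.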
Fig. 1. Formation of tree(x); shaded nodes (the larger child of each pair) are removed from the output. For x = (3, 4, 6, 2, 6, 5, 3, 3) the tree is:

                    32
              15          17
            7    8      11    6
           3 4  6 2    6 5   3 3

A simple prefix-sum data structure is obtained as follows (Lemma 1 is quite similar to one in [10]):
Lemma 1. Given a sequence x, |x| = n and W(x) = m, we can store it using Γ(x) + O(n log log(m/n)) bits and support sum in O(log n) time.

Proof. For convenience of description, assume that n is a power of 2. Consider a complete binary tree T with n leaves, with the values x_i stored in left-to-right order at the leaves. At each internal node we store the sum of its two children. We then list the values at the nodes of the tree in level-order (starting from the root), except that for every internal node, we only enumerate its smaller child. This produces a new sequence of length n, which we denote as tree(x). For example, in the tree of Fig. 1, x = (3, 4, 6, 2, 6, 5, 3, 3) and tree(x) = (32, 15, 7, 6, 3, 2, 5, 3). Given tree(x), and an additional n − 1 bits that specify for each internal node which of the two children was enumerated, we can easily reconstruct all values in nodes on, or adjacent to, any root-to-leaf path, which suffices to answer sum queries. The key observation is:

    Γ(tree(x)) ≤ Γ(x) + 2n − 2.    (2)
To prove this, consider a procedure to fill in the values in T bottom-up. First, it stores in each node at level 1 the sum of its two children. Let the values stored at level 1 be y_1, . . . , y_{n/2}, and note that y_i = x_{2i−1} + x_{2i} ≤ 2 max{x_{2i−1}, x_{2i}}, so |γ(y_i)| ≤ |γ(max{x_{2i−1}, x_{2i}})| + 2. If we now delete max{x_{2i−1}, x_{2i}} for all i, the total length of the γ-codes of the y_i s, together with the remaining n/2 values at the leaves, is at most n bits more than Γ(x). Since the construction of tree(x) now essentially recurses on y_1, . . . , y_{n/2}, Equation (2) follows.

If we store tree(x) in the data structure of Prop. 2, we have O(1)-time access to each of the values in tree(x), and decoding all the values on a root-to-leaf path, and hence computing sum, takes O(log n) time. ⊓⊔

We now obtain the next result:

Lemma 2. Given an integer λ > 0, such that λ is a power of 2, and a sequence x with |x| = n and W(x) = m, there is a data structure that stores x using

    Γ(x) + O( n ( (log λ + log log(m/n))/λ + (λ + log(m/n))/2^λ ) )

bits and supports sum in O(λ) time.
Before we prove this lemma, we note some consequences:

Corollary 1. Given a sequence x with |x| = n and W(x) = m, there is a data structure that stores x using:
(a) Γ(x) + O(n log(m/n)/(log n)^c) bits, for any c > 0, and supporting sum in O(log log n) time.
(b) Γ(x) + O(n) bits, and supporting sum in O(log log(m/n)) time.

Proof. Follows by choosing λ = c log log n and λ = Θ(log log(m/n)), respectively. ⊓⊔

Proof (of Lemma 2). We use mostly standard ideas: we store a regularly-spaced subset of prefix sums in the O(1)-time data structure of Theorem 1, and apply the slower data structure of Lemma 1 only to the short subsequences that lie in between. We also replace the lower levels of the trees of Lemma 1 with slow but optimally space-efficient bit-strings comprising concatenated γ-codes.

We begin by partitioning x into ⌈n/λ⌉ contiguous subsequences s_1, s_2, . . .. Let r = (r_1, . . . , r_{⌈n/λ⌉}), where r_i = W(s_i). We first discuss the representation of the subsequences s_i. From each such subsequence, we delete the largest value, giving a new subsequence s′_i, and indicate, using a lg λ-bit value, the position of the deleted element. All numbers in the subsequences s′_i are γ-encoded and concatenated into a single bit-string σ. The sequence o, where o_i = Γ(s′_i), is stored using the data structure of Theorem 1, and sum(o, i − 1) gives the start of the representation of s′_i in σ. Since W(o) ≤ Γ(x) = O(n log(m/n)), the space used by the representation of o is O((n/λ) log(λ log(m/n))) bits. Within this space bound, we can also include the O((n log λ)/λ) bits needed to specify which elements were deleted from the subsequences s_i.

We claim that Γ(r) + Σ_{i=1}^{⌈n/λ⌉} Γ(s′_i) is bounded by Γ(x) + O((n/λ) log λ). The reasoning is similar to that of Equation (2): the γ-code of any value r_i is O(log λ) bits longer than the γ-code of the value deleted from s_i. Note that this additional space is also absorbed into the space bound for representing o.

Now we consider the representation of r. The sequence r is partitioned into ⌈n/2^λ⌉ subsequences r^1, r^2, . . . of length 2^λ/λ each. We create a top-level sequence t where t_i = W(r^i); |t| = ⌈n/2^λ⌉. We represent t using Theorem 1, which requires O((n/2^λ)(λ + log(m/n))) bits, and allows sum queries on t to be answered in O(1) time. Finally, let z be the sequence obtained by concatenating tree(r^1), tree(r^2), . . .; z is stored in the structure of Proposition 2, and it should be clear that supporting O(1)-time access operations on z suffices to traverse the trees representing the sequences r^i in O(λ) time. Noting that W(z) = O(2^λ m), the space overhead of this representation is easily seen to be negligible. ⊓⊔

An analogue of Lemma 2 for δ-codes can be proved similarly (proof omitted):

Lemma 3. Given an integer λ > 0, such that λ is a power of 2, and a sequence x with |x| = n and W(x) = m, there is a data structure that stores x using

    ∆(x) + O( n ( (log λ + log log(m/n))/λ + (λ + log(m/n))/2^λ ) )

bits and supports sum in O(λ) time.
The final result requires an additional idea. We begin as in Lemma 2. For some parameter ν, we begin by partitioning x into ⌈n/ν⌉ contiguous subsequences s_1, s_2, . . .. Let r = (r_1, . . . , r_{⌈n/ν⌉}), where r_i = W(s_i). We represent r using Lemma 3, and delete the largest value from each of s_1, s_2, . . ., giving s′_1, s′_2, . . ., as before, where |s′_i| = ν − 1. Access to the s′_i is handled differently. Note that a δ-code can be thought of as a 'binary' part and a γ-code containing the length of the binary part. We let l be the sequence such that l_i is the length of the binary part of x_i. Grouping the l_i s into contiguous sequences t_i, we create a sequence p such that p_i = W(t_i). The sequence p is stored in the data structure of Corollary 1(b), which, since W(p) = O(n log(m/n)), supports sum(p, i) in O(log log log(m/n)) time. Modulo some details, this suffices to access s′_i in O(ν + log log log(m/n)) time; we can choose e.g. ν = Θ(log log m) to obtain the following (a full trade-off is omitted in the interests of brevity):

Theorem 3. Given a sequence x with |x| = n and W(x) = m, there is a data structure that stores x using ∆(x) + O(n log log log m/log log m) bits and supports sum in O(log log m) time.
4 Implementation and Experimental Evaluation
We implemented three data structures to support the sum operation: the succinct data structure (Theorem 1) and two that store γ-codes. Our test data are derived from XML files. We used 14 real-world XML files [15,16] with different characteristics that come from applications including genomics and astronomy. For each file, the input sequence x is such that x_i is the length of the string stored in the i-th text node in the XML file, numbered in document order (pre-order). Section 1 explains the rationale for this. In this section, we first describe the implementations of our data structures. We then evaluate the compressibility of the test data under various measures. Finally, we evaluate the space usage and (running time) performance of our implementations.

Implementation of Data Structures. We implemented the data structures in C++ and tested them on a dual-processor Pentium 4 machine and a Sun UltraSparc-III machine. The Pentium 4 has 512MB RAM, 2.8GHz CPUs and a 512KB L2 cache, running Debian Linux. The compiler was g++ 3.3.5 with optimisation level 2. The UltraSparc-III has 8GB RAM, a 1.2GHz CPU and an 8MB cache, running SunOS 5.9. The compiler was g++ 3.3.2 with optimisation level 2. We now describe the implementations of the new bit-vector data structure and the prefix sums data structures.

Bit-vector data structure. The algorithm of Section 2.2 is implemented as follows. We use a block size of B = 64 bits, and no sub-blocks. We use 32-bit integers to store values in R. We store the offset of every s = 32nd 1 bit in the array S, which is compressed as follows. Every 8th value in S is stored explicitly as a 32-bit value; all other values are represented relative to the previous explicit value using 16 bits. With each block we also store an 8-bit value for the count of 0s from the start of the block until the last offset from the S array into that block.
We compared with our optimised Clark-Jacobson bit-vector [5] (CJ-BV) and our implementation [2] of Kim et al.'s bit-vector [13] (KNKP-BV). For the important case where half the bits are 1, the table below gives the typical and worst-case space usage for our new bit-vector, for CJ-BV using parameters B = 64, S = 32 and L = 256, and for KNKP-BV using 256-bit superblocks and 64-bit blocks (ε varies with file but is typically less than 0.2). The typical space used by the new bit-vector to store a sequence of N bits is just under 2N bits, which compares well with the typical usage of KNKP-BV and CJ-BV; the worst case is a lot better, however³.

                      Typical                                   Worst-case
                      New           CJ-BV         KNKP-BV       New     CJ-BV   KNKP-BV
    Input bit-string  (1 − ε)N      N             N             N       N       N
    select            (1 − ε)0.94N  (1 + ε)0.52N  (1 + ε)0.63N  0.94N   2.77N   1.17N
    rank              0.03N        0.5N          0.25N         0.02N   0.5N    0.25N
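The following C++ sketch (ours, and deliberately simplified) illustrates the sampled-select idea behind this implementation: every 32nd answer is sampled, and the remaining positions are found by scanning 64-bit words with popcount. The actual implementation additionally compresses the samples as described above and removes all-0 blocks; those details are omitted here.

    #include <cstdint>
    #include <vector>

    struct SelectBV {
        std::vector<uint64_t> words;   // the bit-string, 64 bits per word
        std::vector<uint32_t> sample;  // position of every 32nd 1 bit (uncompressed here)

        void build() {
            uint32_t ones = 0;
            for (uint32_t p = 0; p < 64 * words.size(); ++p)
                if ((words[p / 64] >> (p % 64)) & 1)
                    if (ones++ % 32 == 0) sample.push_back(p);
        }

        // Position of the i-th 1, i >= 1 (assumes i is at most the number of 1s).
        uint32_t select(uint32_t i) const {
            uint32_t p = sample[(i - 1) / 32];     // a sampled 1, at most 31 1s short
            uint32_t need = (i - 1) % 32 + 1;      // 1s still to be seen from p onward
            uint32_t w = p / 64;
            uint64_t cur = words[w] >> (p % 64);   // remaining bits of the current word
            for (;;) {
                uint32_t c = __builtin_popcountll(cur);
                if (c >= need) {                   // the answer lies in this word
                    for (;; cur >>= 1, ++p)
                        if ((cur & 1) && --need == 0) return p;
                }
                need -= c;
                ++w;
                p = 64 * w;
                cur = words[w];
            }
        }
    };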
Succinct prefix sums data structure. For the implementation of the succinct prefix sums data structure described in Theorem 1 we used u = ⌊lg n⌋ top-order bits. The low-order lg m − u bits are packed tightly, so, for example, if lg m − u = 5 then 64 values are stored using ten 32-bit integers.

γ-code data structures. We have implemented two data structures for storing γ-codes, which we refer to as explicit-γ and succinct-γ. For a sequence x = (x_1, . . . , x_n) we form the bit-string σ by concatenating γ(x_1), . . . , γ(x_n). In the explicit-γ data structure we store every G-th prefix sum, as well as the offset into σ of the start of the corresponding γ-code, explicitly (using 32 bits); in the succinct-γ data structure, these prefix sums and offsets are stored using the succinct data structure. To compute sum(x, i − 1), we access the appropriate G-th prefix sum, and the corresponding offset, and sequentially scan σ from this offset.

Compressibility, Space Usage and Performance. Table 1 summarises the measures of compressibility, in terms of bits per prefix sum value, using the various encoding schemes and using a succinct representation. In the Golomb codes we use b = ⌈0.69m/n⌉. Although gap gives the best measure of compressibility, it does not give decodable data. We see that in practice Γ and ∆ are greater than GOLOMB in 10 of our test XML files, and for half our files GOLOMB is at least 29% less than either Γ or ∆; this is in line with many results on compressing inverted lists [17] (however, [10] give examples where Γ and ∆ are smallest). GOLOMB and the succinct bound were even closer than Prop. 1 suggested: for 13 of our XML files they were within 10% of each other.

Recall that Γ(tree(x)) ≤ Γ(x) + 2|x| − 2 (Eq. 2 in Lemma 1). Let tree*(x) be the sequence obtained by always deleting the right child. In the worst case, Γ(tree*(x)) ≥ 2Γ(x), and in the best case, Γ(tree*(x)) = Γ(x) = Γ(tree(x)) (e.g. take x = (8, 1, 4, 1)). Table 1 shows (Γ(tree*(x)) − Γ(x))/|x| for our sequences. It is interesting to note that this does not go below 1.96, which gives some insight into the distribution of values.
³ As noted in [2], bit-vectors used to represent XML documents can have certain regular patterns that lead to worst-case space usage in CJ-BV and KNKP-BV.
Table 1. Test file and number of text nodes. Compressibility measures: gap(x), ∆(x), Γ(x), GOLOMB(b, x) (gol), B(m, n) (suc), all divided by n = |x|; m = W(x). Tree ovhd: (Γ(tree*(x)) − Γ(x))/|x|. Space usage: total space in bits (spac) and wasted space in bits (wast) per prefix value using the succinct prefix sum data structure and using the explicit-γ and succinct-γ data structures. Data structure parameters are selected such that the wasted space is roughly equal.

    File      text    gap  ∆    Γ    gol  suc  tree  Succinct   explicit-γ  succinct-γ
              nodes                             ovhd  spac wast  spac wast   spac  wast
    elts      3896    2.90 5.53 5.36 4.15 4.04 1.97  7.10 3.07  7.36 2.00   7.89  2.53
    w3c1      7102    2.22 4.73 4.70 5.86 5.46 2.72  8.19 2.73  6.70 2.00   7.38  2.67
    w3c2      7689    1.85 3.98 3.96 5.05 5.26 2.37  8.12 2.85  5.96 2.00   6.49  2.53
    mondial   34.9K   3.55 6.87 6.56 4.94 4.90 2.11  7.77 2.88  8.56 2.00   9.13  2.57
    unspsc    39.3K   3.83 7.16 6.71 4.75 4.89 2.05  7.61 2.71  8.71 2.00   9.36  2.65
    partsupp  48.0K   2.53 5.24 5.23 6.27 5.95 2.77  9.36 3.41  7.23 2.00   7.94  2.71
    orders    150.0K  2.56 5.31 4.99 4.87 4.71 3.04  7.67 2.96  6.99 2.00   7.53  2.54
    xcrl      155.6K  3.84 7.75 6.96 4.96 4.98 2.03  7.62 2.64  8.96 2.00   9.62  2.65
    votable2  841.0K  2.56 5.67 5.28 3.97 4.03 1.96  7.26 3.23  7.28 2.00   7.85  2.57
    nasa      948.9K  3.04 5.58 5.45 5.53 5.39 2.38  8.15 2.76  7.45 2.00   8.11  2.66
    lineitem  1.0M    2.16 4.94 4.55 3.96 3.94 2.10  7.08 3.14  6.55 2.00   7.08  2.52
    xpath     1.7M    3.26 6.41 5.81 4.42 4.37 2.21  7.26 2.89  7.81 2.00   8.38  2.57
    treebank  2.4M    4.00 7.67 7.28 5.24 6.04 2.32  8.65 2.61  9.28 2.00   10.07 2.79
    xcdna     16.8M   3.33 6.62 6.18 5.61 5.39 2.29  7.87 2.48  8.18 2.00   8.77  2.59
Neither does it go above 3.04 (and it is typically much smaller), showing that always deleting the right child (which is simpler and faster) does not waste space in practice⁴.

We now consider the space usage of our data structures. We calculate the space used, in bits per input sequence value, and also the difference between the space used by the data structures and the corresponding compressibility measure (we refer to this as wasted space). Table 1 summarises the space usage of the various data structures, where parameters have been selected such that the wasted space is roughly the same. For the explicit-γ and succinct-γ data structures we used G = 32 and G = 8, respectively. For these values the space usage of the γ-code data structures is comparable to the succinct data structure.

The performance measure we report is the time in µs for determining a random prefix sum value. Each data point reported is the median of 10 runs in which we perform 8 million random sum operations. We have again selected parameters such that the wasted space in each data structure is about the same. Table 2 summarises the performance of the data structures. The fastest runtime for each file on the Pentium 4 and on the UltraSparc-III platforms is marked. The table shows the performance of the succinct data structure using the three different bit-vectors. We see that the performance of the new bit-vector is similar to CJ-BV and better than KNKP-BV.

⁴ Recall that Γ(tree(x)) does not include the n − 1 bits needed for decoding x.
Table 2. Speed evaluation on Intel Pentium 4 and Sun UltraSparc-III. Test file, number of text nodes, and time in µs to determine a prefix sum value for the succinct data structures using CJ-BV, KNKP-BV and the new bit-vector, and for the explicit-γ (Exp) and succinct-γ (Succ) data structures, both of which are based on the new bit-vector. The best runtime for each file and platform is marked with *.

                      Intel Pentium 4                      Sun UltraSparc-III
    File      text    CJ     KNKP   New    Exp    Succ     CJ     KNKP   New    Exp    Succ
              nodes
    elts      3896    0.070  0.143  0.066* 0.233  0.293    0.151  0.222  0.138* 0.284  0.389
    w3c1      7102    0.084  0.156  0.081* 0.241  0.298    0.158  0.230  0.138* 0.279  0.389
    w3c2      7689    0.086  0.156  0.081* 0.239  0.305    0.158  0.229  0.140* 0.279  0.390
    mondial   34.9K   0.086  0.159  0.083* 0.249  0.305    0.176  0.240  0.146* 0.293  0.399
    unspsc    39.3K   0.083  0.158  0.081* 0.241  0.293    0.176  0.244  0.149* 0.290  0.401
    partsupp  48.0K   0.085  0.161  0.081* 0.239  0.303    0.168  0.240  0.150* 0.284  0.396
    orders    150.0K  0.105  0.178  0.101* 0.235  0.306    0.199  0.270  0.176* 0.298  0.408
    xcrl      155.6K  0.088  0.163  0.085* 0.244  0.313    0.196  0.270  0.170* 0.313  0.418
    votable2  841.0K  0.215  0.316  0.213* 0.361  0.434    0.208  0.298  0.198* 0.316  0.470
    nasa      948.9K  0.305  0.423  0.294* 0.391  0.545    0.223  0.321  0.212* 0.324  0.519
    lineitem  1.0M    0.283  0.401  0.274* 0.378  0.443    0.215  0.310  0.207* 0.316  0.481
    xpath     1.7M    0.326  0.459  0.306* 0.453  0.564    0.218  0.308  0.203* 0.328  0.510
    treebank  2.4M    0.410  0.556  0.409* 0.506  0.686    0.241* 0.341  0.244  0.345  0.545
    xcdna     16.8M   0.464* 0.759  0.471  0.551  1.175    0.742  0.951  0.733  0.646* 0.989
The table also shows the performance of the explicit-γ and succinct-γ data structures using the new bit-vector. We see that the explicit-γ data structure out-performs the succinct-γ data structure when the space usage is roughly the same. Our performance results are preliminary, but we note that the succinct prefix sums data structure almost always outperforms both γ-code data structures. We observed that a single γ-decode is about twenty times faster than a select operation, so improvements in the bit-vector would make succinct-γ more competitive.

We also performed some limited experiments on the relative performance of the data structure of Lemma 1. We compared the time for sum(x, i) when x is stored as in Lemma 1 (but always deleting the right child), versus in a simple bit-string. At |x| = 64, 128, 256, 512 and 1024, the times in µs for the tree were 0.767, 0.91, 1.12, 1.28 and 1.5, and for the bit-string were 0.411, 0.81, 1.57, 3.08 and 6.03. We are not comparing like for like, as the tree uses more space; even so, we find that the (logarithmic) tree data structure does not outperform the (linear) bit-string until |x| > 128. The tree requires two select operations at each node visited, so an approach to speeding up the tree data structure would be to increase the arity and thereby reduce the height of the tree.

Summary. On our data sets, Golomb encoding and the succinct bound are usually very similar, and they generally use less space than γ and δ encoding. The succinct prefix sums data structure is faster than the γ-code data structures when the space usage is comparable. The new bit-vector has similar or better speed than existing bit-vectors but uses less space in the worst case.
5 Conclusions
We have presented new, highly space-efficient data structures for data-aware storage of a sequence. An immediate question is whether there is a data structure that supports sum in O(1) time using space close to Γ(x) or ∆(x); there is no obvious lower bound that rules this out. We have presented a new bit-vector data structure, and shown it to be competitive in terms of speed with existing bit-vectors, but with a robust space bound. Our experimental results show that storing prefix sums succinctly, rather than in a data-aware manner, is appropriate in some applications.
References
1. Clark, D. and Munro, J.I.: Efficient Suffix Trees on Secondary Storage. In Proc. 7th ACM-SIAM SODA, ACM Press (1996) 383–391
2. Delpratt, O., Rahman, N., and Raman, R.: Engineering the LOUDS Succinct Tree Representation. In Proc. WEA 2006, Springer, LNCS 4007 (2006) 134–145
3. Elias, P.: Efficient Storage and Retrieval by Content and Address of Static Files. J. ACM 21 (1974) 246–260
4. Fredman, M.L. and Willard, D.E.: Trans-Dichotomous Algorithms for Minimum Spanning Trees and Shortest Paths. J. Comput. Sys. Sci. 48 (1994) 533–551
5. Geary, R.F., Rahman, N., Raman, R., and Raman, V.: A Simple Optimal Representation for Balanced Parentheses. In Proc. 15th CPM, Springer, LNCS 3109 (2004) 159–172
6. Grossi, R. and Sadakane, K.: Squeezing Succinct Data Structures into Entropy Bounds. In Proc. 17th ACM-SIAM SODA, ACM Press (2006) 1230–1239
7. Grossi, R. and Vitter, J.S.: Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. Manuscript (2002); prel. vers. in Proc. ACM STOC, ACM Press (2000) 397–406
8. Grossi, R. and Vitter, J.S.: Private communication (2004)
9. Gupta, A., Hon, W.-K., Shah, R., and Vitter, J.S.: Compressed Data Structures: Dictionaries and Data-Aware Measures. In Proc. DCC '06, IEEE (2006) 213–222
10. Gupta, A., Hon, W.-K., Shah, R., and Vitter, J.S.: Compressed Dictionaries: Space Measures, Data Sets, and Experiments. In Proc. WEA '06, Springer, LNCS 4007 (2006) 158–169
11. Hagerup, T.: Sorting and Searching on the Word RAM. In Proc. 15th STACS, Springer, LNCS 1373 (1998) 366–398
12. Hagerup, T. and Tholey, T.: Efficient Minimal Perfect Hashing in Nearly Minimal Space. In Proc. 18th STACS, Springer, LNCS 2010 (2001) 317–326
13. Kim, D.K., Na, J.C., Kim, J.E., and Park, K.: Efficient Implementation of Rank and Select Functions for Succinct Representation. In Proc. WEA 2005, Springer, LNCS 3503 (2005) 315–327
14. Raman, R., Raman, V., and Rao, S.S.: Succinct Indexable Dictionaries, with Applications to Representing k-Ary Trees and Multisets. In Proc. 13th ACM-SIAM SODA, ACM Press (2002) 233–242
15. UW XML Repository. http://www.cs.washington.edu/research/xmldatasets/
16. VOTable Documentation. http://www.us-vo.org/VOTable/
17. Witten, I., Moffat, A., and Bell, T.: Managing Gigabytes, 2e. Morgan Kaufmann (1999)
On Optimal Solutions for the Bottleneck Tower of Hanoi Problem⋆

Yefim Dinitz and Shay Solomon

Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
{dinitz,shayso}@cs.bgu.ac.il
Abstract. We study two aspects of a generalization of the Tower of Hanoi puzzle. In 1981, D. Wood suggested its variant, where a bigger disk may be placed higher than a smaller one if their size difference is less than k. In 1992, D. Poole suggested a natural disk-moving strategy for this problem, but only in 2005 did the authors prove it to be optimal in the general case. We describe the family of all optimal solutions to this problem and present a closed formula for their number, as a function of the number of disks and k. In addition, we prove a tight bound for the diameter of the configuration graph of the problem suggested by Wood. Finally, we prove that the average length of the shortest sequence of moves, over all pairs of initial and final configurations, is the same as the above diameter, up to a constant factor.
1 Introduction
The classic Tower of Hanoi (ToH) puzzle is well known. It consists of three pegs and disks of sizes 1, 2, . . . , n arranged on one of the pegs as a "tower": in decreasing, bottom-to-top size. The goal of the puzzle is to transfer all disks to another peg, placed in the same order. The rules are to move a single disk from (the top of) one peg to (the top of) another one, at each step, subject to the divine rule: to never have a larger disk above a smaller one. The goal of the corresponding mathematical problem, which we denote by HT = HT_n, is to find a sequence of moves ("algorithm") of minimal length ("optimal"), solving the puzzle. We denote the pegs naturally as source, target, and auxiliary, while the size of a disk is referred to as its name. The following algorithm γ_n is taught in introductory CS courses as a nice example of a recursive algorithm. It is known and easy to prove that it solves HT_n in 2^n − 1 disk moves, and is the unique optimal algorithm for it.

– If n is 1, move disk n from source to target.
– Otherwise:
  • recursively perform γ_{n−1}(source, auxiliary);
  • move disk n from source to target;
  • recursively perform γ_{n−1}(auxiliary, target).
⋆ Partially supported by the Lynn and William Frankel Center for Computer Science.
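For concreteness, a direct C++ transcription of γ_n (a sketch of ours, not from the paper) is:

    #include <cstdio>

    // gamma_n: move disks [1..n] from 'source' to 'target' under the divine rule.
    void gamma(int n, char source, char target, char auxiliary) {
        if (n == 0) return;                              // empty tower: nothing to do
        gamma(n - 1, source, auxiliary, target);         // clear the n-1 smaller disks
        std::printf("(%d,%c,%c)\n", n, source, target);  // move disk n
        gamma(n - 1, auxiliary, target, source);         // put the smaller disks back on top
    }

Calling gamma(n, 'A', 'C', 'B') prints the unique optimal solution, of length 2^n − 1.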
In the recent two decades, various ToH-type problems have been considered in the mathematical literature. Many algorithms were suggested, and extensive related analysis was performed. As usual, the most difficult, and far from always achievable, task is showing that a certain algorithm is optimal, by providing the matching lower bound. A distinguished example is the Frame-Stewart algorithm (of 1941), solving the generalization of the ToH problem to four or more pegs. It is simple, and extensive research has been conducted on its behavior since then. However, its optimality still remains an open problem; the proof of its approximate optimality [5] was considered a breakthrough, in 1999. This paper contributes to the difficult sub-area of ToH research: optimality proofs.

In 1981, D. Wood [6] suggested a generalization of HT, characterized by the k-relaxed placement rule, k ≥ 1: if disk j is placed higher than disk i on the same peg (not necessarily neighboring it), then their size difference j − i is less than k. In this paper, we refer to it as the Bottleneck Tower of Hanoi problem (following D. Poole [4]), and denote it BTH_n = BTH_{n,k}. Now, there is more than one legal way to place a given set of disks on the same peg, in general; we refer to the decreasing bottom-to-top placement of all disks on the same peg as the perfect disk configuration. If k is 1, we arrive at the classic ToH problem.

In 1992, D. Poole [4] suggested a natural algorithm for BTH_n and declared its optimality. However, his (straightforward) proof is done under the fundamental assumption that before the last move of disk n to the (empty) target peg, all other n − 1 disks are gathered on the spare peg. This situation is far from general, since before the last move of disk n, from some peg X to the target peg, any set of the disks n − 1, n − 2, . . . , n − k + 1 may be placed below disk n on peg X. In 1998, S. Beneditkis, D. Berend, and I. Safro [1] gave a (far from trivial) proof of optimality of Poole's algorithm for the first non-trivial case k = 2 only. In 2005, the authors proved it for the general case, by different techniques (see [3]). X. Chen et al. [2] independently considered a few ToH problems, including the bottleneck ToH problem. They also suggested a proof of optimality of Poole's algorithm, based on another technical approach.

Poole's algorithm is based on an optimal algorithm for another related problem of "moving somehow", under the k-relaxed placement rule: to move m disks [1..m], placed entirely on one peg, to another peg, in any order. This algorithm is denoted by β_m = β_m(source, target), and is as follows:

– If m is at most k, move all disks from source to target one by one.
– Otherwise:
  • recursively perform β_{m−k}(source, auxiliary);
  • move disks [(m − k + 1)..m] from source to target one by one;
  • recursively perform β_{m−k}(auxiliary, target).

Poole's algorithm, denoted by α_n = α_n(source, target), is as follows (see the sketch after this list):

– perform β_{n−1}(source, auxiliary);
– move disk n from source to target;
– perform β_{n−1}(auxiliary, target).
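A minimal C++ sketch of β_m and α_n (ours; moves are printed as triplets (disk, from, to)):

    #include <cstdio>

    // Move disks [lo..hi] one by one; they leave 'from' smallest-first, which is
    // legal within one block since all size differences there are less than k.
    static void move_run(int lo, int hi, char from, char to) {
        for (int d = lo; d <= hi; ++d) std::printf("(%d,%c,%c)\n", d, from, to);
    }

    // beta_m: transfer disks [1..m] from 'from' to 'to' in any legal order.
    static void beta(int m, int k, char from, char to, char aux) {
        if (m <= 0) return;
        if (m <= k) { move_run(1, m, from, to); return; }
        beta(m - k, k, from, aux, to);       // recurse on the m-k smaller disks
        move_run(m - k + 1, m, from, to);    // the top block of k disks, one by one
        beta(m - k, k, aux, to, from);
    }

    // alpha_n: optimal perfect-to-perfect transfer of disks [1..n].
    static void alpha(int n, int k, char from, char to, char aux) {
        beta(n - 1, k, from, aux, to);
        std::printf("(%d,%c,%c)\n", n, from, to);
        beta(n - 1, k, aux, to, from);
    }

For example, alpha(7, 2, 'A', 'C', 'B') prints an optimal solution of length 2b₆ + 1 to the BTH₇ instance discussed below.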
In [4], it was erroneously stated that β_m and α_n are the unique optimal solutions for the corresponding problems. Let us show an example of an optimal solution to BTH₇, for k = 2, distinct from α₇. It has the same template as α₇, but uses other optimal "somehow" algorithms instead of β₆. In the following description, a configuration of BTH₇ on the three pegs is depicted by three parentheses, containing the disk numbers for pegs A, B, and C, from bottom to top; each configuration is obtained from the previous one by one or more moves. The difference from α₇ begins from the eighth configuration, marked by !!.

(7654321)()() → (76543)(12)() → (765)(12)(34) → (765)()(3421) → (7)(56)(3421) → (712)(56)(34) → (712)(564)(3) → (71)(5642)(3) !! → (71)(56423)() → (7)(564231)() → ()(564231)(7) → ()(56423)(71) → (3)(5642)(71) → (3)(564)(712) → (34)(56)(712) → (3421)(56)(7) → (3421)()(765) → (34)(12)(765) → ()(12)(76543) → ()()(7654321)

In this paper, we find the family of all optimal solutions to the Bottleneck Tower of Hanoi problem, and present a closed formula for their number. Consider now a generalization of BTH_n, where the prescribed initial order of disks on peg A and their final order on peg C are not by decreasing size, but arbitrary. It is easy to see that α_n is far from always legal w.r.t. the k-relaxed placement rule. A naturally arising question is what is the length of the longest one among all shortest sequences of moves, over all pairs of initial and final configurations; that is, what is the diameter of the configuration graph of BTH_n. We prove a tight bound for the diameter, up to a constant factor. We also prove a stronger result: that the average length of the shortest sequence of moves, over all pairs of initial and final configurations, is the same as the above diameter, up to a constant factor (for the cases n ≤ k and n > 3k). We believe that finding exact bounds for these problems is difficult even for the degenerate case n ≤ k.
2 Definitions and Notation
A configuration of a disk set D is called gathered if all disks in D are on the same peg. Such a configuration is called perfect if D is an initial interval of naturals, and the order of disks (on a single peg) is decreasing. For any configuration C of D and any D′ ⊆ D, the restriction C|_{D′} is C with all disks not in D′ removed. A move of disk m from peg X to peg Y is denoted by the triplet (m, X, Y); the third peg, Z ≠ X, Y, is referred to as the spare peg of (m, X, Y). For a disk set D, the configuration of D \ {m} is the same before and after such a move; we refer to it as the configuration of D \ {m} during (m, X, Y).

A packet-move, P, of D is a sequence of moves transferring the entire set of disks D from one peg to another. W.r.t. P, the former peg is called source, the latter target, and the third peg auxiliary. The length |P| of P is the number of moves in it. If both the initial and final configurations of P are perfect, we call P a perfect-to-perfect (or p.t.p., for short) packet-move of D.

For better mnemonics, the entire set of disks [1..m] is divided into ⌈m/k⌉ blocks B_i = B_i(m): B_1 = [(m − k + 1)..m], B_2 = [(m − 2k + 1)..(m − k)], . . ., B_{⌈m/k⌉} = [1..(m − (⌈m/k⌉ − 1)·k)].
Note that the set of disks in any block is allowed to be placed on the same peg in an arbitrary order. For any m ≥ 1, let D_m denote [1..m], and Small(m) denote D_m \ B_1(D_m). In the above notation, BTH_n concerns finding the shortest perfect-to-perfect packet-move of D_n.

A configuration is called well-separated if it satisfies the condition that at each peg, the disks in any block are placed contiguously. Notice that β_m applied to a gathered well-separated configuration of m disks is legal, and results in the configuration where the first block of disks is at the bottom of target in the reverse order w.r.t. its initial order at the bottom of source, while the rest of the disks are above it in their original order. As well, β_m applied to the latter configuration is legal and results in the original disk ordering.

We say that a move-sequence S contains a move-sequence S′ if S′ is a subsequence of S. Several move-sequences S_i, 1 ≤ i ≤ r, contained in S, are called disjoint if the last move in S_i precedes the first move in S_{i+1}, for each 1 ≤ i ≤ r − 1; the case when also |S| = Σ_{i=1}^{r} |S_i| holds is denoted by S = S_1 ◦ S_2 ◦ . . . ◦ S_r. For any sequence of moves S of D and any D′ ⊆ D, the restriction of S to D′, denoted by S|_{D′}, is the result of omitting from S all moves of disks not in D′. Note that any restriction of a legal sequence of moves is legal as well, and if D′ ⊆ D, a restriction of a packet-move of D to D′ is a packet-move of D′. Clearly, if D is partitioned into D′ and D′′, then |P| = |P|_{D′}| + |P|_{D′′}|.

Let us denote the length of β_m by b_m. It is known from [4,3] that if m = sk + r, where 0 ≤ r < k, then b_m = (k + r) · 2^s − k. We will use the following result.

Theorem 1 ([3]). Under the rules of BTH, the length of any packet-move of D_m is at least b_m.
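The closed formula for b_m translates directly into code; a small C++ sketch (ours):

    #include <cstdint>

    // b_m = (k + r) * 2^s - k, where m = s*k + r and 0 <= r < k.
    // (Overflow for very large s is ignored in this illustration.)
    uint64_t b(uint64_t m, uint64_t k) {
        uint64_t s = m / k, r = m % k;
        return (k + r) * (uint64_t(1) << s) - k;
    }

For instance, b(k, k) = k (the k disks move one by one), and b(k + 1, k) = k + 2, matching the recursive structure of β_m.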
3 Exploration of the Configuration Graph

3.1 Main Results
The Diameter. We define the Configuration Graph for BTH_{n,k} as the directed graph G^{conf} = G^{conf}_{n,k} = (V, E), where V is the set of all possible configurations of D_n on the three pegs, under the k-relaxed placement rule, and an edge e = (u, v) is in E if u, v ∈ V and u and v are reached one from the other by a single move. Let us denote the diameter of G^{conf}_{n,k} by Diam(n, k). Our first result provides tight bounds for the diameter, up to a constant factor.

Theorem 2 (proof is omitted).
    Diam(n, k) = Θ(n · log n)               if n ≤ k,
                 Θ(k · log k + (n − k)²)    if k < n ≤ 2k,
                 Θ(k² · 2^{n/k})            if n > 2k.
The Average. Let us denote by Avg(n, k) the average number of moves required to get from one configuration to another, taken over all pairs of configurations. The following theorem strengthens the first and the asymptotic cases of Theorem 2, asserting that Avg(n, k) is the same as Diam(n, k), up to a constant factor.
Theorem 3.
    Avg(n, k) = Θ(n · log n)       if n ≤ k,
                Θ(k² · 2^{n/k})    if n > 3k.
The following remains open.

Conjecture 1. For k < n ≤ 2k, Avg(n, k) = Θ(n log n + (n − k)²). For 2k < n ≤ 3k, Avg(n, k) = Θ(k²).

3.2 Proof of Theorem 3
We first consider the degenerate case n ≤ k. By the corresponding case of Theorem 2, any pair of disk configurations is reachable one from the other by a sequence of O(n log n) moves. Thus, the following lemma suffices.

Lemma 1. Let n ≤ k. The average number of moves required to get from one configuration to another, taken over all pairs of configurations, is Ω(n · log n).

Proof. It suffices to prove that, for any configuration C, the average number of moves required to get from C to any other configuration is Ω(n · log n). We construct a BFS tree (the tree of shortest paths) of G^{conf} rooted at C, T_C, and note that the maximum degree of such a tree is six. A tree is called 6-ary if its maximal degree is six. We call a 6-ary tree T full if the number of vertices in each layer i, except for, maybe, the last layer, is 6^{i−1}; in this case, the depth of T is h = ⌈log₆(5 · |V| + 1)⌉. In order to bound the average distance from the root to a vertex in the tree T_C, we prove that, among all 6-ary trees, this average is minimized by a full 6-ary tree, and show that for such a tree this value is Ω(n · log n) (details are omitted). ⊓⊔

Now we turn to the case m > k. A move M in a move-sequence S is called switched either if it is the first move in S, or if the spare peg of M is different from the spare peg of the move preceding it in S. A disk is called switched w.r.t. S if it participates in at least one switched move. We define the number of switched disks required to get from one configuration C to another C′ as the minimal number of switched disks in a move sequence with initial configuration C and final configuration C′.

Lemma 2 (joint with N. Solomon). Let m > k. The average number of switched disks in B_1(m) ∪ B_2(m) required to get from one configuration of D_m to another, taken over all pairs of configurations of D_m, is Ω(k).

Proof. We may assume that k > 10, since otherwise the proof of the lemma is immediate. Let C_init be some configuration of D_m. We will show that the average number of switched disks in B_1(m) ∪ B_2(m) required to get from C_init to another configuration, taken over all configurations of D_m, is Ω(k), which suffices.
Consider some configuration C′ of D_m. For each peg X, denote by d_X the highest disk from B_1(m) on peg X, if any, and by D_X(C′) the set of all disks which reside on X lower than d_X; note that all of them belong to B_1(m) ∪ B_2(m). We define B_{1,2}(C′) := D_A(C′) ∪ D_B(C′) ∪ D_C(C′), and note that |B_{1,2}(C′)| ≥ k − 3, since (B_1(m) \ {d_A, d_B, d_C}) ⊆ B_{1,2}(C′).

Let us divide the entire set of disks B_{1,2}(C′) into triads and a residue of size at most six, according to their placement at C′, as follows. Each triad consists of three consecutive disks placed on the same peg, X, from below d_X downwards, whereas there may remain a residue of size at most two, close to the bottom of each peg. Let l_{C′} denote the number of such triads; note that l_{C′} = Ω(k). We say that a triad is switched w.r.t. a move-sequence S from C_init to C′ if at least one disk in that triad is switched w.r.t. S. A triad is called cheap if the disks in it at C_init are consecutive on some peg, and either preserve their order at C′, or reverse it; otherwise, it is called expensive.

Claim. Any expensive triad is switched w.r.t. any move-sequence from C_init to C′.

Proof. Let S be a move-sequence from C_init to C′ and let S⁻¹ be the symmetric move-sequence of S, from C′ to C_init. Let τ be an expensive triad in C′ w.r.t. C_init. We claim that during S, at least one disk in τ performs a switched move. Assume for contradiction that τ is not switched w.r.t. S. It follows that in S, for each disk d in τ, any move of it from some peg X to another peg Y is preceded by the move from peg X to peg Y of the disk sitting on d. It follows that in S⁻¹, for each disk d in τ, any move of it from peg Y to peg X is followed by the move from peg Y to peg X of the disk on which d was sitting. Recall that at the initial configuration of S⁻¹, C′, the three disks sit on each other. This property is preserved during S⁻¹, since whenever some disk in τ moves from some peg Y to another peg X, the two other disks in τ must move from peg Y to peg X immediately afterwards. Since each such triple of moves inverses the order of τ, at the final configuration C_init of S⁻¹, the three disks sit on each other in either their initial order at C′ or in the reversed order, yielding a contradiction to the choice of τ as expensive w.r.t. C_init. ⊓⊔

Denote the set of all configurations of D_m by C and define l := min(l_{C′} | C′ ∈ C). We show that for at least half of the configurations, C′′, at least ⌊l/2⌋ = Ω(k) switched disks in B_{1,2}(C′′) are required to get from C_init to C′′, which suffices. For any configuration C′, let e(C′) denote the number of expensive triads in C′ w.r.t. C_init. By the above claim, in any move sequence with initial configuration C_init and final configuration C′, there are at least e(C′) switched triads. The following claim completes the proof of Lemma 2.

Lemma 3. |{C′ | C′ ∈ C and e(C′) ≥ ⌊l/2⌋}| ≥ ⌈|C|/2⌉.
Proof. Denote {C′ | C′ ∈ C and e(C′) < ⌊l/2⌋} by S1 and {C′ | C′ ∈ C and e(C′) ≥ ⌊l/2⌋} by S2. Clearly, S1 ∪ S2 = C. Therefore, showing that |S1| ≤ |S2| provides the required result. For this, we now construct an injection h : S1 → S2, which will suffice. Let Ĉ be a configuration in S1, s.t. e(Ĉ) < ⌊l/2⌋.

Before describing h(Ĉ) in detail, let us outline the basic tool. In each triad as defined above, we change the disk order, but not by just swapping the top-most and the bottom-most disks in it. Note that since each triad consists of three consecutive disks, if such a transformation does not violate the k-relaxed rule inside the triad, then it does not cause the entire new configuration to contradict this rule. Besides, since each such transformation rearranges disks inside a triad only, the configuration resulting from any sequence of such transformations defines the same set of unordered triads.

It is easy to see that any transformation as above converts any cheap triad in Ĉ to an expensive one in h(Ĉ). Therefore, e(h(Ĉ)) ≥ l(Ĉ) − e(Ĉ) ≥ ⌊l(Ĉ)/2⌋ ≥ ⌊l/2⌋, that is, h(Ĉ) ∈ S2. Then, it would remain to show that h is an injection. For this, it would suffice to show that h restricted to each triad is an injection.

Now, we define the disk rearrangement, as above, of each triad τ. The only possible cases, allowed by the k-relaxed rule, are as follows:

– The disks in τ are allowed to be in an arbitrary order. Then, we swap the two top-most disks in τ.
– The two bigger disks should be below the smallest one, in an arbitrary order. Then, we swap the two bigger disks.
– The two smaller disks should be above the biggest one, in an arbitrary order. Then, we swap the two smaller disks.
– The biggest disk should be below the smallest one, while the intermediate disk d is allowed to be at any place. If d is above all or in the middle, we swap it with the disk that it sits on; otherwise (i.e. when it is bottom-most), we move it above the two other disks.

Note that the case where only the decreasing order of disks in τ is allowed is impossible, since τ ⊆ B_1(m) ∪ B_2(m). It is easy to show that in any one of the above cases, the resulting ordered triad τ′ allows one to restore the unique triad τ whose transformation as above results in τ′. The required result follows. ⊓⊔ ⊓⊔

Proposition 1. For n > 3k, Avg(n, k) = Θ(k² · 2^{n/k}).
Proof. By Theorem 2, it suffices to prove that Avg(n, k) = Ω(k² · 2^{n/k}). By Lemma 2, the average number of switched disks in B_1(n) ∪ B_2(n) required to get from one configuration of D_n to another, taken over all pairs of configurations, is Ω(k). Clearly, the number of switched disks in B_1(n) ∪ B_2(n) required to get from one configuration of D_n to another is at most 2k. It follows that there exist constants c_1 > 1 and c_2 < 2, s.t. a fraction 1/c_1 of all the pairs of configurations of D_n require at least c_2 · k switched disks in B_1(n) ∪ B_2(n) in order to get from one to the other. We denote c_2/2 by c_3.
Note that if there are at least c_2 · k = 2c_3 · k switched disks, then at least ⌈c_3 · k⌉ of them belong to the set B_1(n) ∪ B_2(n) \ {n − 2k + 1, . . . , n − 2k + ⌊c_3 · k⌋}. It follows that at least a fraction 1/c_1 of all the pairs of configurations of D_n require at least ⌈c_3 · k⌉ switched disks in B_1(n) ∪ B_2(n) \ {n − 2k + 1, . . . , n − 2k + ⌊c_3 · k⌋} in order to get from one to the other. We note that any move-sequence of D_n that contains N switched moves of disks in B_1(n) ∪ B_2(n) \ {n − 2k + 1, . . . , n − 2k + ⌊c_3 · k⌋} requires N packet-moves of Small(n − 2k + ⌊c_3 · k⌋). By Theorem 1, it follows that at least ⌈c_3 · k⌉ · b_{n−3k+⌊c_3·k⌋} moves are made. Recall that b_m ≥ m, for any m ≥ 1, and if m = qk + r, where 0 ≤ r < k, then b_m = (k + r) · 2^q − k. We distinguish two cases. If 3k ≤ n ≤ 4k, then ⌈c_3 · k⌉ · b_{n−3k+⌊c_3·k⌋} ≥ ⌈c_3 · k⌉ · b_{⌊c_3·k⌋} ≥ ⌈c_3 · k⌉ · ⌊c_3 · k⌋ = Ω(k²). If n ≥ 4k, then ⌈c_3 · k⌉ · b_{n−3k+⌊c_3·k⌋} = Ω(k · k · 2^{(n−3k)/k}) = Ω(k² · 2^{n/k}). The Proposition follows. ⊓⊔
4 Family of All Optimal Solutions

4.1 Local Definitions and Problem Domain
In this sub-section we describe the family of all optimal solutions to BTH_n, and present a closed formula for their number. We use the following result, based on the description of the optimal algorithm α_n (see Section 1 for the definitions and the description).

Corollary 1 ([3]). The only difference of an arbitrary optimal algorithm for BTH_n from α_n could be in choosing other optimal algorithms, instead of β_{n−1}, for the two included optimal "somehow" packet-moves of D_{n−1}.

Denote by Somehow-pt_{X→Y}(m, R) the family of all optimal "somehow" packet-moves of D_m with the initial perfect configuration on peg X, s.t. the final configuration of D_m on another peg Y is R; recall that they are of length b_m each. Let us denote by F_m the set of all such possible final configurations. We also introduce the family Somehow-pt⁻¹_{Y→X}(m, R), consisting of the move-sequences symmetric to those in Somehow-pt_{X→Y}(m, R). Obviously, it consists of the move-sequences optimal among those with the gathered initial configuration R on peg Y and the perfect final configuration on another peg X. Theorem 4 is the main result of this section. Note that its item 1 follows from Corollary 1, by the definitions of Somehow-pt_{X→Y}(m, R), Somehow-pt⁻¹_{Y→X}(m, R) and F_m.

Theorem 4.
1. The family of all optimal perfect-to-perfect packet-moves of D_n from peg A to peg C is
   Opt_{A→C}(n) = {S_1 ◦ (n, A, C) ◦ S_2 | ∃R ∈ F_{n−1}: S_1 ∈ Somehow-pt_{A→B}(n − 1, R), S_2 ∈ Somehow-pt⁻¹_{B→C}(n − 1, R)}.
2. |Opt_{A→C}(n)| = (C(k+r, k) − C(k+r, k+1))^{⌈2^{⌈(n−1)/k⌉−2}⌉ − 1}, where r = (n − 1) mod k and C(·,·) denotes a binomial coefficient.
The description provided by the first item of this theorem becomes explicit using the description of the family Somehow-pt_{X→Y}(m, R) given in Proposition 2 below.

When studying Somehow-pt_{X→Y}(m, R), we assume m > k ≥ 2. The case k = 1 is disregarded, since it has been proved long ago that there exists a unique optimal solution to the classical Tower of Hanoi problem. We assume that m > k since otherwise it is obvious that the packet-move that moves all disks, one by one, from one peg to another, is the unique optimal solution to the problem.

In the sequel, we will see that the last two blocks of D_m (see Section 2 to recall the division into blocks), B_{⌈m/k⌉} and B_{⌈m/k⌉−1}, behave differently from the other blocks in the following sense: for any packet-move in Somehow-pt_{X→Y}(m, R), no disk-moves of two different blocks are interleaved, except for, maybe, interleaving disk moves of these two special blocks. We use the following definitions to distinguish the last two blocks from the other blocks. A configuration is called almost-well-separated if it satisfies the condition that at each peg, the disks in any block, except for, maybe, the last two blocks, are placed contiguously. An almost-well-separated configuration is called almost-perfect if the two following conditions hold: 1. On each peg, the disks in each block are in either the increasing or the decreasing order. 2. If B_{⌈m/k⌉} and B_{⌈m/k⌉−1} are gathered on the same peg, then the disks in B_{⌈m/k⌉} are in the decreasing order.

Let m = sk + r_m, where 0 ≤ r_m ≤ k − 1, and let q = k + r_m. Clearly, B_{⌈m/k⌉} ∪ B_{⌈m/k⌉−1} = D_q. Denote by R_q the unique almost-perfect configuration of D_q on some peg where the k bigger disks are in the increasing order and the r_m smaller disks are in the decreasing order. A gathered configuration of D_q is called perfect-mixed if the k bigger disks, as well as the r_m smaller disks, are in the decreasing order.

In order to investigate Somehow-pt_{X→Y}(m, R), for each R in F_m, we extend our discussion to a more coarse-grained family of packet-moves. Denote by S_{X→Y}(m) the family of all optimal packet-moves of D_m whose initial configuration is almost-perfect and gathered on peg X and whose final configuration is gathered on another peg Y.

Proposition 2.
1. An arbitrary packet-move in S_{source→target}(m) with an initial almost-perfect configuration Init can be described as follows.
– If m ≤ 2k (and hence q = m):
  • If Init is perfect-mixed, perform /* named From-PM(Init, source, target) */:
    ∗ move the disks in Init from source one by one, so that disks in B_1(m) go to target and disks in B_2(m) to auxiliary;
    ∗ move all disks in B_2(m) from auxiliary to target one by one.
  • Otherwise /* Init is R_m */, perform /* named To-PM(source, target, R′) */, for an arbitrary perfect-mixed configuration R′ of D_m:
    ∗ move all disks in B_2(m) from source to auxiliary one by one;
    ∗ move to target the disks, in the bottom-to-top order of R′, one by one, each from the peg on which it resides.
– Otherwise /* m > 2k */:
  • perform an arbitrary packet-move in S_{source→auxiliary}(m − k) with the initial configuration Init|_{D_{m−k}}; let Temp denote its final configuration;
  • move disks [(m − k + 1)..m] from source to target one by one;
  • perform an arbitrary packet-move in S_{auxiliary→target}(m − k) with the initial configuration Temp.
2. For the case m ≤ 2k, the unique packet-move in Somehow-pt_{X→Y}(m, R) is From-PM(Init, source, target).
3. For the case m > 2k, in any packet-move as in item 1, the contained packet-moves of D_q alternate between the From-PM and To-PM types.
4. For the case m > 2k, an arbitrary packet-move P in Somehow-pt_{source→target}(m, R) can be described as in item 1, except that for the last packet-move of D_q (that finishing P), the perfect-mixed configuration R′ is not arbitrary, but R|_{D_q}.
4.2 Proof of Proposition 2
Fact 5 ([3]).
1. During a move (m, X, Y), all disks in Small(m) are on the spare peg Z ≠ X, Y.
2. If some sequence of moves S begins from a configuration where disk m and Small(m) are gathered on X, and finishes at a configuration where disk m and Small(m) are gathered on Y, X ≠ Y, then it contains two disjoint packet-moves of Small(m): one (from X) before the first move of disk m and another (to Y) after its last move.

Lemma 4 (proof is omitted). For any P in S_{X→Y}(m), P contains 2^{i−1} disjoint packet-moves of ∪_{j≥i} B_j(m), for each 1 ≤ i ≤ ⌈m/k⌉.
The optimal length b_m of the packet-moves in S_{X→Y}(m), together with Lemma 4, yields:

Corollary 2. For any P in S_{X→Y}(m) and any 1 ≤ i ≤ ⌈m/k⌉, each disk in B_i(m) moves exactly 2^{i−1} times during P.
The following lemma is the central statement for proving Proposition 2.

Lemma 5 (proof is omitted). For any packet-move in S_{X→Y}(m), every configuration reached during its execution is almost-perfect.

An easy consequence is that, in S_{X→Y}(m), the final configuration of any packet-move of ∪_{j≥i} B_j(m), for each 1 ≤ i ≤ ⌈m/k⌉, is almost-perfect. Therefore, the third item for the case m > 2k in the description of S_{X→Y}(m) is well-defined.

Corollary 3 (proof is omitted). Item 1 of Proposition 2 is valid for the case m > 2k.
258
Y. Dinitz and S. Solomon
By Lemma 5 and Corollary 3, it follows that packet-moves in SX→Y (m) with the same initial almost-perfect configuration are not very different one from another. Such a difference is reflected only in interleaving moves of disks of the two last and B⌈ mk ⌉−1 , in packet-moves of their union. In the sequel, blocks of Dm , B⌈ m k ⌉ we investigate the possibilities of such interleaving of moves. Recall that B⌈ mk ⌉ ∪ B⌈ m is Dq , where m = sk + rm , 0 ≤ rm ≤ k − 1, and k ⌉−1 q = k + rm . By Lemma 4 and the optimality of packet-moves in SX→Y (m), any packetm move P in SX→Y (m) contains 2⌈ k ⌉−2 disjoint packet-moves of Dq , of length bq each. (Note that if all of these packet-moves are fixed as βq , the resulting packetmove is βm .) Hence, the study of SX→Y (m) is reduced to the study of the family of all optimal solutions to the following problem, and its cardinality. Problem 1. Let q = k + rm , s.t. 1 ≤ rm ≤ k , l ≥ 0. Describe an optimal packetmove P of Dq with an initial almost-perfect configuration, which is a composition of 2l disjoint packet-moves of Dq , of length bq each. Lemma 6 (proof is omitted). For any perfect-mixed configuration R of Dq , holds: 1. F rom-P M (R, source, target) is the unique optimal packet-move of Dq with the initial configuration R. Its final configuration is Rq . 2. T o-P M (source, target, R) is the unique optimal packet-move of Dq with the initial configuration Rq and the final configuration R. Validity of items 1 and 2 of Proposition 2 follows. Denote the family of all optimal solutions to Problem 1 by OP T (q, 2l ). Lemma 7 (proof is omitted). For any member in OP T (q, 2l ), if its initial configuration is perfect-mixed (resp., Rq ), then: 1. The final configuration move contained in it is 2. The final configuration move contained in it is
of any odd-numbered (resp., even-numbered) packetRq . of any even-numbered (resp., odd-numbered) packeta perfect-mixed configuration of Dq .
Validity of items 3 and 4 of Proposition 2 follows. 4.3
Counting the Optimal Solutions to BT Hn
In this sub-section, we prove Item 2 of Theorem 4. By Item 1 of Theorem 4 together with Item 2 of Proposition 2, it follows that in the case n ≤ 2k + 1 holds |OptA→C (n)| = 1, which corresponds to Item 2 of Theorem 4. Thus, we may henceforth assume that n > 2k + 1. By the results of the previous sub-section, the members of OP T (q, 2l ) correspond bijectively to the sequences of 2l−1 perfect-mixed configurations of Dq . Hence, the members of Somehow-ptA→B (n−1, R) and those of Somehow-pt−1 B→C n−1 (n−1, R) correspond bijectively to the sequences of ⌈2⌈ k ⌉−3 ⌉−1 perfect-mixed
On Optimal Solutions for the Bottleneck Tower of Hanoi Problem
259
configurations of Dq . Therefore, by Theorem 4(1), the members of OptA→C (n) n−1 n−1 correspond bijectively to the sequences of 2(⌈2⌈ k ⌉−3 ⌉ − 1) + 1 = ⌈2⌈ k ⌉−2 ⌉ − 1 perfect-mixed configurations of Dq . Our next goal is to describe the family of all perfect-mixed configurations k+r − k+1 . of Dq and to prove that their number, denoted by f (k, r), equals k+r k This equality and the above correspondence will yield item 2 of Theorem 4. Consider some perfect-mixed configuration, denoted by M . We denote by Mi the set of disks in Small(q), higher than q − i + 1 at M but lower than q − i, for each 1 ≤ i ≤ k − 1, and denote the set of disks in Small(q), higher than q − k + 1 at M by Mk . The fact that Small(q − i) is higher than q − i at any perfect-mixed configuration of Dq , M , together with the fact that the disks in each one of Small(q) and B1 (q) are in decreasing order implies that Mi ≤ i, for each 1 ≤ i ≤ k, and k that i=1 Mi = Small(q). The equality in question holds by the case c = 0 of the following proposition. Proposition 3. Let t, n and c be three non-negative integers s.t. n ≤ t + c. Denote the number of non-negative integer solutions that satisfy the two following conditions, as a function of n, t and c by φ (t, n, c). t 1. i=0 xi = n. j 2. For each 0 ≤ j ≤ n− i=0 xi ≤ j + c. c : n+t n+t Then, φ(t, n, c) = t − t+c+1 .
Proposition 3 is proved by a complete induction on t, based onthe fact that for c all natural values of n, t and c s.t. n ≤ t + c, holds φ(n, t, c) = i=0 φ(t − 1, n − c i+x i, c − i + 1) and using the Pascal Triangle equality i=0 x = c+x+1 x+1 .
References 1. Beneditkis, S. and Safro, I.: Generalizations of the Tower of Hanoi Problem. Final Project Report, supervised by D. Berend, Dept. of Mathematics and Computer Science, Ben-Gurion University (1998) 2. Chen, X., Tian, B., and Wang, L.: Santa Claus’ Towers of Hanoi. Manuscript (2005) 3. Dinitz, Y. and Solomon, S.: Optimal Algorithms for Tower of Hanoi Problems with Relaxed Placement Rules. To appear in Proc. of the 17th International Symp. on Algorithms and Computation (ISAAC’06), Kolkata, India (December 2006) 4. Poole, D.: The Bottleneck Towers of Hanoi Problem. J. of Recreational Math. 24 3 (1992) 203–207. 5. Szegedy, M.: In How Many Steps the k Peg Version of the Towers of Hanoi Game Can Be Solved? Symposium on Theoretical Aspects of Computer Science 1563 (1999) 356 6. Wood, D.: The Towers of Brahma and Hanoi Revisited. J. of Recreational Math. 14 1 (1981-1982) 17-24
Competitive Maintenance of Minimum Spanning Trees in Dynamic Graphs⋆ Miroslaw Dynia1 , Miroslaw Korzeniowski2,⋆⋆ , and Jaroslaw Kutylowski3 1
DFG Graduate College “Automatic Configuration in Open Systems”, Heinz Nixdorf Institute, University of Paderborn, Germany 2 Institute of Computer Science, University of Wroclaw, Poland and LaBRI – INRIA Futurs, Bordeaux, France 3 International Graduate School of Dynamic Intelligent Systems, Heinz Nixdorf Institute, University of Paderborn, Germany
Abstract. We consider the problem of maintaining a minimum spanning tree within a graph with dynamically changing edge weights. An online algorithm is confronted with an input sequence of edge weight changes and has to choose a minimum spanning tree after each such change in the graph. The task of the algorithm is to perform as few changes in its minimum spanning tree as possible. We compare the number of changes in the minimum spanning tree produced by an online algorithm and that produced by an optimal offline algorithm. The number of changes is counted in the number of edges changed between spanning trees in consecutive rounds. For any graph with n vertices we provide a deterministic algorithm achieving a competitive ratio of O(n2 ). We show that this result is optimal up to a constant. Furthermore we give a lower bound for randomized algorithms of Ω(log n). We show a randomized algorithm achieving a competitive ratio of O(n log n) for general graphs and O(log n) for planar graphs.
1
Introduction
We consider the problem of maintaining a minimum spanning tree by an online algorithm with an adversary changing weights of edges of the underlying graph. Every time the weight of an edge is changed, the algorithm must output a minimum spanning tree for the new graph. If possible, this spanning tree should be the same spanning tree as computed in the previous round or at least both ⋆
⋆⋆
Partially supported by the EU within the 6th Framework Programme under contract 001907 (DELIS) and by the DFG-Sonderforschungsbereich SPP 1183: “Organic Computing. Smart Teams: Local, Distributed Strategies for Self-Organizing Robotic Exploration Teams”. This work was done while the author was in the International Graduate School of Dynamic Intelligent Systems, Heinz Nixdorf Institute, University of Paderborn, Germany. The author is partially supported by MNiSW grant number N206 001 31/0436, 2006-2008.
Jan van Leeuwen et al. (Eds.): SOFSEM 2007, LNCS 4362, pp. 260–271, 2007. c Springer-Verlag Berlin Heidelberg 2007
Competitive Maintenance of Minimum Spanning Trees in Dynamic Graphs
261
trees should be similar. For every edge changed in the minimum spanning tree between consecutive rounds, the algorithm is charged unit cost. The problem of mainting a minimum spanning tree after the underlying graph is changed has been widely studied in literature (see e.g. [9,10,1,5,8]). Typically only the computational effort needed to maintain the minimum spanning tree, i.e. to choose a proper minimum spanning tree after an edge weight has been changed, has been considered. This research has resulted in the development of very efficient data structures which maintain information about the graph and allow to calculate new minimum spanning trees fast. On the other hand, in many applications the computational complexity needed for computing a new minimum spanning tree is not the only important factor. Another complexity parameter is the number of changes in the minimum spanning tree between rounds. Here, the main problem lies in choosing a minimum spanning tree from the set of possible minimum spanning trees. The chosen MST should retain its minimality property for a long time. In our model changing the minimum spanning tree by an algorithm is considered the most costly factor. The question on how to compute new minimum spanning trees has been already well studied, so that we do not consider it in our paper. We want to motivate this setting by giving an example coming from our research in the area of mobile networks. We also show further application areas of the results presented in this paper. We consider mobile networks which are wide-spread over large terrain. The network as a whole has its tasks to perform in the terrain and needs a communication structure so that individual participants can communicate with each other and coordinate. Due to the large size of the terrain and heavy environmental conditions, the transmission power of the network participants may not suffice to form a connected multihop network. This may happen for example in a mountainous area during rescue actions, when satellite communication is not available and mountains strongly hinder radio wave propagation. To ensure a communication framework connecting network parts, we propose to use mobile robots, serving as relay stations which form multihop communication paths between network parts. These small robots have no other function as to keep their position on the communication path and to forward messages along the established relay path. This is a new approach, and the authors are not aware of a similar solution presented in the literature. We can model network parts as nodes of a graph and paths in the terrain between them as edges of this graph. Obviously, the paths have different lengths and these are mapped to edge weights of the graph. Our goal is to create a communication structure between network parts by using mobile relay stations on some of the paths. These relay stations must be located on the path in some limited distance to be able to communicate with each other – consequently the number of required relay stations per path is proportional to its length. We want to select the path to be populated by relay stations so that the network is connected and simultaneously to minimize the number of used relay stations. Minimum spanning trees in this graph are the optimal solutions regarding this
262
M. Dynia, M. Korzeniowski, and J. Kutylowski
measure. The minimum spanning tree must be maintained while the graph dynamically changes – the weights of edges increase or decrease while nodes move. The goal of an algorithm should be not only to maintain a minimum spanning tree all the time but also to minimize the number of changes in the spanning tree. The rationale for this is that every change in the spanning tree forces mobile relay stations to move from one path to another, incurring large energy and time cost. This cost is surely larger than the cost of computing a new minimum spanning tree and thus the primary goal should be to minimize the number of changes in the minimum spanning tree. Apart from the scenario from mobile network research, the described cost model is reasonable in any application where changes in the minimum spanning tree are costly. This occurs e.g. in networks where trees are used as a routing or overlay structure and changing the MST means that routing or configuration tables have to be broadcasted along the network and updated. Minimum spanning trees have been used in such scenarios for a long time, some recent examples may be found in [15,16,12]. Our model does not explicitely require that the graph contains many different minimum spanning trees, but the application of our algorithms is only reasonable when this occurs. In such graphs, both bad and good choices for the minimum spanning tree can be made. Graphs contain many minimum spanning trees when there are many edges with equal weight. This is quite improbable to happen if the edge weights are given by sensor readings, as described in the previous paragraph. On the other hand, small fluctuations of these sensor readings can cause the only minimum spanning tree to change very frequently. Thus, we recommend to round the sensor readings appropriately, so that some stabiblity is brought into the edge weights (without sacrificing accuracy) and fluctuations can be reduced. Then, presented competitive algorithms can show their power when it comes to choosing the proper minimum spanning tree from those available at a moment. 1.1
Our Contribution
We compare the performance of our algorithms regarding the described cost model to the performance of an optimal offline algorithm by competitive analysis. We present a detailed model for the mentioned problem and give lower and upper bounds for deterministic online algorithms (see Sections 2 and 3). Our deterministic algorithm achieves a competitive ratio of n2 /2 and this is optimal up to a constant factor. We improve the competitive ratio by introducing a randomized algorithm with an expected competitive ratio of O(n log n) and O(log n) for planar graphs. The discussion of the planar case can be found in the full version of this paper, together with a lower bound of Ω(log n) for the expected competitive ratio of any randomized algorithm. The mentioned randomized algorithm works only for a restricted scenario, where the weights of edges can only grow. In this context, it is worth noting that the lower bound presented in Section 2 does not need to decrease edge weights. This gives some indication that the real hardness of the problem does
Competitive Maintenance of Minimum Spanning Trees in Dynamic Graphs
263
not lie within decreasing edges, but can be also expressed by only increasing the weights of edges. 1.2
Related Work
Research on minimum spanning trees dates back to textbook algorithms by Kruskal [11] and Prim [14]. In the static setting improved solutions have been considered e.g in [4]. All this work assumes that the graph remains static and considers the classical runtime complexity of algorithms. Research in this area is still vivid, see e.g. recent results by Chazelle [3] and Pettie [13]. As already noted, large effort has been put into constructing data structures which also allow minimum spanning trees to be computed efficiently when changes in the structure of the graph occur. These changes can either concern edge weights as assumed in our work (see e.g. [9]) or might encompass adding and deleting vertices ([5,8,10]). Furthermore kinetic spanning trees have been considered in [1] to model the changes of edge lengths in a more predictable way. For an in-depth survey of different types of subproblems in the area of minimum spanning trees, for applications and results we refer the interested reader to [7]. 1.3
Our Model
We model a mobile ad-hoc network as a connected graph G = (V, E) with edges weighted initially by the function w0 : E → N+ . Time is divided into discrete time steps called rounds. In round i the value of σ(i) ∈ E × {−1, +1} defines the change of weight of an edge. Only one edge changes its weight in one round and the change is bounded to either +1 or −1. The sequence σ is the input sequence. Basing on the original input sequence we denote for convenience of notation by δ(i, e) ∈ {−1, 0, 1} the change of weight of edge e ∈ E in the i-th round. Formally we have ⎧ ⎨ −1 , if σ(i) = (e, −1) δ(i, e) = 1 , if σ(i) = (e, +1) ⎩ 0 , otherwise .
Furthermore, we introduce the function w : N+ × E → N+ which maps a round number r and edge e to the edge weight at the beginning of round r. This gives wr (e) = w0 (e) +
r−1
δ(i, e) .
i=1
An algorithm alg solving the Online Dynamic Minimum Spanning Tree (ODMST) problem reads the input sequence σ and after obtaining σ(r) outputs a minimum spanning tree, denoted by Mralg . Since the algorithm does not know the future, it has no access to the values of σ(i) for i > r in round r. The cost of an algorithm in round r is defined as the number of edges in which r−1 and Mralg differ, formally Malg r r Calg := {e ∈ E|e ∈ / Mr−1 alg ∧ e ∈ Malg } .
264
M. Dynia, M. Korzeniowski, and J. Kutylowski
Additionally Calg (σ) is the cost of the algorithm on the whole sequence, thus Calg (σ) =
|σ|
i Calg .
i=1
To measure the performance of algorithms solving the ODMST problem we employ competitive analysis (see e.g. [2]). The definition of the ODMST problem fulfills the typical definition of online problems. An optimal algorithm opt computes a sequence of minimum spanning trees Miopt for i = 1, . . . , |σ| minimizing |σ| i the value of Copt (σ), with Copt (σ) = i=1 Copt , where r r / Mr−1 Copt := {e ∈ E|e ∈ opt ∧ e ∈ Mopt } .
The optimal algorithm has access to the whole sequence σ in advance. A deterministic algorithm alg has a competitive ratio of Ralg if for all input sequences σ we have Calg (σ) ≤ Ralg · Copt (σ) + c, where c does not depend on σ. For a randomized algorithm alg we have to introduce the notion of an oblivious adversary. The input sequence σ is constructed by an adversary having access to the algorithm alg and the probability distributions used by alg to perform its task. The oblivious adversary does not know the random bits used by alg. With the given notion a randomized algorithm alg has a competitive ratio of Ralg if E[Calg (σ)] ≤ Ralg · Copt (σ) + c for every input sequence σ. The expected value of Calg is taken with respect to the random choices of alg. 1.4
Notation
The following notation will be used throughout the whole paper. The set of alternative edges A(e, r) is defined for a graph G = (V, E), a round r, an algorithm alg and an edge e ∈ Mralg . Removing e from Mralg splits the tree into two parts. Consider the vertex sets V1 and V2 of both parts. Then the set of edges on the cut between V1 and V2 is denoted by A(e, r) = {(u, v) ∈ E|u ∈ V1 ∧ v ∈ V2 } . We also define a set of alternatives which have a certain weight Aw (e, r, w) = {e′ ∈ A(e, Mralg )|wr (e′ ) = w} . Suppose we extend Mralg by adding an edge e and thus creating a cycle in Mralg . Then all edges on this cycle except for e are denoted by C(e, r). Analogously to the set Aw (·), we define a set of all edges from C(e, r) with a certain weight Cw (e, r, w) = {e′ ∈ C(e, r)|wr (e′ ) = w} . Note that A(e, r) includes e, whereas C(e, r) does not.
Competitive Maintenance of Minimum Spanning Trees in Dynamic Graphs
2
265
Deterministic Lower Bound
We construct an input sequence for the ODMST problem which causes every online deterministic algorithm to have a competitive ratio Ralg ∈ Ω(n2 ). We assume that the input sequence is given by an adversary who examines the moves of alg. To construct a deterministic lower bound we have to be able to construct an input sequence σ such that for an arbitrary large k we have Copt (σ) ≥ k and Calg (σ) ≤ Ralg ·Copt (σ)+c with an appropriate constant c only dependent on the input graph for any deterministic algorithm alg. This is analogous to the formulation found in [6] given for randomized algorithms, rewritten for deterministic algorithms here. Input graph. We first describe the graph for the lower bound construction. We take a complete graph G = (V, E) with |V | even and partition V into two sets V1 and V2 with |V1 | = |V2 |. Call EC the set of edges lying on the cut between V1 and V2 . To each edge e ∈ EC we assign a weight w0 (e) = n = |V |, all other edges are assigned a weight of 1. Obviously at least one edge from EC must be used in every spanning tree and, since we consider minimum spanning trees, it will be the only one. Input sequence. We construct an input sequence consisting of phases of length 2|EC |. Within each phase alg has a cost of at least |EC | and opt has a cost of 1 or 2. For each k we can concatenate k phases and obtain an input sequence σ for which k ≤ Copt (σ) ≤ 2k and Calg (σ) ≥ |EC2−1| Copt (σ). From this fact we can conclude that every deterministic algorithm has a competitive ratio greater or equal |EC2−1| . In the first part of a phase, the adversary’s goal is to increase the weight of all edges in EC to n + 1. The adversary watches the moves of alg and always increases the weight of an edge from EC used by alg. Obviously, alg can only use edges from EC with weight n while such edges exist – if it was using one with weight n + 1, its spanning tree would not be minimal. Thus the adversary is able to increase the weight of an edge used by alg until all edges have weight n + 1. Every such change except the last one incurs at least a cost of 1 to the algorithm. Since there are |EC | requests, the algorithm has a cost of at least |EC | − 1. In the second part of a phase the weight of all edges is decreased to n in the same order as they were increased. We neglect alg’s cost during these operations. For such an input sequence it is easy to construct a strategy for opt which has a cost of 1 or 2 in each phase. Then we can construct an input sequence σ such that Copt (σ) ≥ k for every k and Calg (σ) ≥ k(|EC2 |−1) . It follows that the competitive ratio is at least |EC2|−1 for every phase. Concluding, we have shown that for every online deterministic algorithm alg for the ODMST problem we have Ralg ∈ Ω(n2 ).
266
3
M. Dynia, M. Korzeniowski, and J. Kutylowski
Deterministic Algorithm MSTMark
In this section we present the deterministic algorithm MSTMark which achieves an optimal, up to a constant factor, competitive ratio for the ODMST problem. Notation. The MSTMark algorithm (Algorithm 1) works on a graph G = (V, E) computing a minimum spanning tree Mralg in each round r. Where clear from the context we will write Malg instead of Mralg omitting the current round number. The minimum spanning tree maintained by the optimal offline algorithm is described as Mropt and, wherever possible by Mopt . We say that an algorithm substitutes edge e with e′ in round r if we have Mr+1 alg = (Mralg \ {e}) ∪ {e′ }. MSTMark algorithm. The algorithm has to respond to two different kinds of events – increases and decreases of weights of edges in G. If the weight of an edge r−1 e ∈ Malg is increased in round r, MSTMark tries to find a suitable alternative ′ e ∈ Aw (e, r − 1, wr−1 (e)). If a not marked edge e′ can be found, MSTMark replaces e with e′ in Mralg . By the construction of the set Aw (·) any such edge causes Malg to remain a minimum spanning tree. If an appropriate edge r−1 cannot be found, MSTMark sets Mralg = Malg . r−1 If the weight of an edge e ∈ / Malg is decreased in round r, MSTMark checks whether there is a not marked edge e′ ∈ C(e, r − 1) with a higher weight than wr (e) . If yes, it substitutes e′ with e within Malg . If no, MSTMark sets Mralg = Mr−1 alg . In all other cases MSTMark does not perform any changes in its minimum spanning tree. The greedy approach changing only one edge of the MST on updates of edge weight has been already successfully applied in algorithms for updating minimum spanning trees, e.g. in [9], thus we won’t argue its correctness. Mentioned results also allow to perform the described changes in the minimum spanning tree efficiently by employing appropriate data structures. Marking with flags. In addition to the described behavior, MSTMark marks edges of G with two kinds of flags: presence and absence. The idea is that a flag is put on an edge e, where MSTMark is sure that, respectively, e ∈ Mopt or e ∈ / Mopt . This information is only guaranteed for the very round when the mark has been set – for future rounds it may not hold anymore. For the analysis of the competitive ratio of MSTMark one has to introduce the notion of epochs. The presence and absence flags are the key to this analysis. An epoch starts when all flags are removed from the graph (at lines 6, 11, 20 or 25 of MSTMark) and lasts until the next removal of all flags. We can show that in each epoch opt performs at least one change in its minimum spanning tree and that MSTMark performs at most n2 /2 changes. Then, the competetive ratio RMSTMark ≤ n2 /2. The analysis together with technical lemmas can be found in the full version of the paper.
Competitive Maintenance of Minimum Spanning Trees in Dynamic Graphs
267
Algorithm 1. MSTMark(round r) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28:
4
if weight of edge e ∈ Mr−1 and Aw (e, r − 1, wr−1 (e)) = ∅ then alg increased ′ r−1 ANM ← {e ∈ Aw (e, r − 1, w (e))|e′ isn’t marked with absence} if ANM = ∅ then remove e from Mralg and substitute it with any e′ ∈ ANM if e is marked with presence then remove all marks end if mark e with absence else remove e from Mralg and substitute it with e′ ∈ Aw (e, r − 1, wr−1 (e)) remove all marks mark e with absence end if end if r−1 (e)) = ∅ then if weight of edge e ∈ / Mr−1 alg decreased and Cw (e, r − 1, w CNM ← {e′ ∈ Cw (e, r − 1, wr−1 (e))|e′ isn’t marked with presence} if CNM = ∅ then remove e from Mralg and substitute it with any e′ ∈ CNM if e is marked with absence then remove all marks end if mark e′ with presence else remove e from Mralg and substitute it with e′ ∈ Aw (e, r − 1, wr−1 (e)) remove all marks mark e′ with presence end if end if
Randomized Algorithm RandMST
The randomized algorithm RandMST presented in this section achieves an expected competitive ratio of O(n log n). This algorithm works only for a limited scenario, where weights of edges are only increased. It is a cut down version of the MSTMark algorithm, with the handling of flags removed. In the considered scenario edge weights can only grow and we will see that flags are not necessary any more. Consider a round r in which the weight of an edge e ∈ Malg is increased. If there exist alternative edges for e with weight wr−1 (e), then RandMST selects one of these edges uniformly at random and uses it instead of e in Mralg . In other cases RandMST ignores the edge weight increase. We can show that the RandMST algorithm has an expected competitive ratio of O(n log n) for general graphs. For planar graphs the ratio drops to O(log n). This improvement is mainly (but not fully) due to the smaller number of edges in a planar graph, comparing to a general graph. The analysis of the planar case can be found in the full version of the paper.
268
M. Dynia, M. Korzeniowski, and J. Kutylowski
Algorithm 2. RandMST r−1 1: if weight of edge e ∈ Mr−1 (e)) = ∅ alg increased in round r and Aw (e, r − 1, w then 2: e′ ← choose uniformly at random an edge out of Aw (e, r − 1, wr−1 (e)) 3: remove e from Mralg and substitute it with e′ 4: end if
The Analysis. The idea of the analysis is to consider the behavior of RandMST in layers separately. Intuitively, a layer w consists only of edges which have weight w. In every layer we will divide the graph G into parts, called fixed components. The idea of fixed components is that the cost of opt on the whole input sequence is at least as high as the number of fixed components created in all layers. We will also be able to bound the expected cost of RandMST to O(n log n) times the number of fixed components created. From this we will be able to conclude that the expected competitive ratio of RandMST is at most O(n log n). Certain lemmas from this section have technically involved proofs – these can be found in the full version of this paper. We start the analysis with introducing the notions of fixed components, edge sets and realms. The fixed components. As already mentioned, we consider a separate layer of fixed components for each weight w. Let V (G′ ) denote the set of vertices of graph G′ , and E(G′ ) the set of edges of G′ . A fixed component is a subgraph of G and in every layer there is exactly one fixed component prior to round 1. A fixed component F in layer w splits if the weight of an edge e = (u, v) ∈ E(F ) with w(e) = w is increased, e ∈ Malg , and the size of minimum spanning trees does not increase (i.e. RandMST must perform a change in Malg ). Then the fixed component F splits into two fixed components F1 and F2 , such that V (F1 ) ⊂ V (F ) and V (F2 ) ⊂ V (F ) and V (F1 ) ∪ V (F2 ) = V (F ). Furthermore, a vertex x ∈ V (F ) is in V (F 1) if it can be reached from u when using only edges from Malg \{e}. Analogously the fixed component F2 consists of vertices which can be reached from v. We say that a fixed component splits on edge e if the split occurs due to an increase of weight of e. Note, that fixed components form a partition of the vertex set of G, and thus there are at most n fixed components in any layer. It is necessary for the splitting technique to work properly that the part of Malg contained in a fixed component is connected. This property will be established by Lemma 2. Besides fixed components, we also have edge sets in each layer w. Before round 1 there are no edge sets in any layer. If a fixed component F splits into F1 and F2 on an edge e with weight w, an edge set between F1 and F2 is created, denoted by ES (F1 , F2 ). It consists of all edges between vertices of F1 and F2 having weight w. If a fixed component F splits into F1 and F2 , edge sets connected to F also split. Consider such an edge set ES (F, F ′ ). Then every edge e ∈ ES (F, F ′ ) is put either into ES (F1 , F ′ ) or ES (F2 , F ′ ) depending on whether this edge connects F ′ to F1 or F2 . Note, that since there are at most n − 1 fixed components in a layer, the number of created edge sets is upper bounded by 2n.
Competitive Maintenance of Minimum Spanning Trees in Dynamic Graphs
269
The Realms. Up to now we have organized the graph G in fixed components and edge sets in each layer. We still need to introduce some further structure into the layers. We arrange the vertices of G in each layer into realms. The crucial property of realms is that in layer w there may be only edges with weight w + 1 or larger between vertices which are in separate realms. To implement the division into realms, we introduce the separating operation, which is applied in every round to every layer and realm separately. In layer w and realm R it looks for a maximum set of vertices V ′ , which has only edges with weight w + 1 or larger to the rest of vertices in R. Then a new realm is created and the vertices in V ′ are moved to it. So, the separating operation finds all vertex sets which can be moved to a separate realm. Fixed components are preserved when vertices are moved to a new realm, i.e. if two vertices v1 and v2 have been in a fixed component in realm R then after they are placed in another realm they remain in one fixed component. This requires creating new fixed components in new realms. Analogously, if two vertices have been in separate fixed components in R then they are placed in separate fixed components in the new realm. The following lemma states a crucial property of the separating operation. Lemma 1. Assume that the separating operation is applied in layer w. If a vertex set is put in a new realm and this causes an edge set ES (F1 , F2 ) to be split into two parts ES1 and ES2 contained in two realms, then only one of ES1 and ES2 has edges with weight w. Interaction between layers. We will now examine the interactions between distinct layers, the fixed components, edge sets and realms on them. We want to show that these interactions follow certain rules and that a certain property (as expressed by Corollary 1) is always fulfilled in a layer. Lemma 2. Between fixed components in layer w contained in the same realm there cannot be any edges with weight smaller than w. Each fixed component contains exactly one connected part of Malg . Corollary 1. The RandMST algorithm uses at most one edge of one edge set in Malg . The corollary states the most important fact for the analysis of the competitive ratio of RandMST. Splits and opt’s cost. We want to establish a bound between the number of operations opt performs on some layer w and the number of fixed component splits which have occurred on this layer. Lemma 3. Let sw be the number of fixed component splits in layer w during the whole execution of an input sequence. Let #Ew (G) be the number of edges having weight w in the graph G before round 1. Then opt has a cost of at least sw − #Ew (G) in layer w.
270
M. Dynia, M. Korzeniowski, and J. Kutylowski
By the last lemma, nearly every fixed component split (except for n splits for the whole execution) can be mapped to a round which causes a cost of 1 to opt. This mapping is obviously injective. If we can map RandMST’s costs to fixed component splits so that each split receives at most O(n log n) cost in expectation, then we can easily conclude that the expected competitive ratio of RandMST is O(n log n). We will call this mapping an assignment of costs, and introduce a cost assignment scheme, which assigns RandMST’s costs to fixed component splits. The cost assignment scheme. Every time a split of a fixed component F into F1 and F2 occurs, we assign all created edge sets to this split. This also includes edge sets which are divided in two by the split. This means that an edge set, which has previously been assigned to some split s be assigned to the split of F now. This operation can only decrease the number of edge sets assigned to any fixed component s. Since a fixed component split can create at most 2n edge sets, at most 2n edge sets are assigned to a split. We still have to bound the cost of RandMST on one edge set, i.e. bound the number of edge increases in an edge set which causes RandMST to change an edge in Malg . Consider the way RandMST chooses a new edge as an alternative for an edge e used before in layer w. This new edge is chosen from the whole alternative set uniformly at random. This alternative set is at least as large as the number of edges with weight w in the current edge set. What is important, is that each of the edges in the current edge set is chosen with the same probability. Thus, even if the adversary knows the code of RandMST and the probability distribution used by it, it can have no knowledge which particular edge is used within an edge set in Malg . On the other hand, by Corollary 1 we know that at most one edge out of an edge set is used in Malg . Let pES describe the probability that an edge out of the edge set ES is used in Malg . Let #ESw describe the number of edges with weight w in ES . Assume that we are now increasing the weight of an edge in edge set ES . Then, the probability of increasing the weight of an edge which is in Malg is exactly pES · 1/#ESw . We can upper bound pES ≤ 1. Furthermore, we know that the probability of increasing the weight of an edge in Malg is equal to the expected cost of RandMST, since RandMST’s cost is either 0 or 1. To bound the expected cost of RandMST on an edge set ES situated in layer w, we only look at requests in the input sequence which increase the weight of an edge set. Each of these requests decreases the number of edges with weight w in ES by one. What is important and follows from previous considerations, is that the number of edges with weight w in an edge set in layer w never increases after it has been created. So, the expected cost of RandMST on ES is then at 1 + . . . + 1, where x denotes the number of edges with weight w at most x1 + x−1 the moment of the creation of ES . This value is equal to Θ(log n), since we can upper bound x ≤ n2 . This cost assignment scheme assures that every change of edges in Malg producing cost is assigned to one edge set and, on the other hand, this edge set is assigned to a fixed component split. From the fact, that each fixed component
Competitive Maintenance of Minimum Spanning Trees in Dynamic Graphs
271
split is assigned at most O(n) edge sets and that each of these edge sets receives an expected cost of O(log n) we can easily conclude that the expected competitive ratio of RandMST is O(n log n).
References 1. Agarwal, P.K., Eppstein, D., Guibas, L.J., and Henzinger, M.R.: Parametric and Kinetic Minimum Spanning Trees. In FOCS’98: Proceedings of the 39th Annual Symposium on Foundations of Computer Science, Washington, DC, USA, IEEE Computer Society (1998) 596 2. Borodin, A. and El-Yaniv, R.: Online Computation and Competitive Analysis. Cambridge University Press (1998) 3. Chazelle, B.: A Minimum Spanning Tree Algorithm with Inverse-Ackermann Type Complexity. J. ACM 47 6 (2000) 1028–1047 4. Cheriton, D. and Tarjan, R.E.: Finding Minimum Spanning Trees. In SIAM Journal of Computing 5 (1976) 5. Chin, F. and Houck, D.: Algorithms for Updating Minimal Spanning Trees. In Journal of Computer and System Sciences 16 (1978) 333–344 6. Chrobak, M., Larmore, L.L., Lund, C., and Reingold, N.: A Better Lower Bound on the Competitive Ratio of the Randomized 2-Server Problem. Information Processing Letters 63 2 (1997) 79–83 7. Eppstein, D.: Spanning Trees and Spanners. Technical Report ICS-TR-96-16 (1996) 8. Eppstein, D., Galil, Z., Italiano, G.F., and Nissenzweig, A.: Sparsification a Technique for Speeding up Dynamic Graph Algorithms. J. ACM 44 5 (1997) 669–696 9. Frederickson, G.N.: Data Structures for On-Line Updating of Minimum Spanning Trees. In STOC’83: Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, New York, NY, USA, ACM Press (1983) 252–257 10. Henzinger, M.R. and King, V.: Maintaining Minimum Spanning Trees in Dynamic Graphs. In ICALP’97: Proceedings of the 24th International Colloquium on Automata, Languages and Programming, Springer-Verlag, London, UK (1997) 594–604 11. Kruskal, J.B.: On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. In Proceedings of the American Mathematics Society 7 (1956) 48–50 12. Pendarakis, D.E., Shi, S., Verma, D.C., and Waldvogel, M.: Almi: An Application Level Multicast Infrastructure. In 3rd USENIX Symposium on Internet Technologies and Systems (2001) 49–60 13. Pettie, S. and Ramachandran, V.: An Optimal Minimum Spanning Tree Algorithm. J. ACM 49 1 (2002) 16–34 14. Prim, R.C.: Shortest Connection Networks and Some Generalizations. In Bell System Technical Journal 36 (1957) 1389–1401 15. Wan, P.-J., Calinescu, G., Li, X.-Y., and Frieder, O.: Minimum-Energy Broadcasting in Static Ad Hoc Wireless Networks. In Wireless Networks 8 6 (2002) 607–617 16. Young, A., Chen, J., Ma, Z., Krishnamurthy, A., Peterson, L., and Wang, R.Y.: Overlay Mesh Construction Using Interleaved Spanning Trees. In IEEE INFOCOM (2004) 396–407
Exact Max 2-Sat: Easier and Faster⋆ Martin Fürer and Shiva Prasad Kasiviswanathan Computer Science and Engineering, Pennsylvania State University {furer,kasivisw}@cse.psu.edu
Abstract. Prior algorithms known for exactly solving Max 2-Sat improve upon the trivial upper bound only for very sparse instances. We present new algorithms for exactly solving (in fact, counting) weighted Max 2-Sat instances. One of them has a good performance if the underlying constraint graph has a small separator decomposition, another has a slightly improved worst case performance. For a 2-Sat instance F with ˜ (1−1/ (d ˜(F )−1))n ), where n variables, the worst case running time is O(2 ˜ d(F ) is the average degree in the constraint graph defined by F . We use strict α-gadgets introduced by Trevisan, Sorkin, Sudan, and Williamson to get the same upper bounds for problems like Max 3-Sat and Max Cut. We also introduce a notion of strict (α, β)-gadget to provide a framework that allows composition of gadgets. This framework allows us to obtain the same upper bounds for Max k-Sat and Max k-Lin-2.
1
Introduction
The Max 2-Sat problem is: Given a Boolean formula F in 2-Cnf (conjunctive normal form with 2 literals per clause), find a truth assignment that satisfies the maximum possible number of clauses. In this paper, we consider the more general weighted Max 2-Sat problem. Numerous results regarding worst-case bounds for exact solutions of Max 2-Sat have been published. The currently ˜ m/5.5 ) [15] best worst case bounds in terms of the number of clauses m is O(2 n (so for m/n > 5.5 it is no better than the trivial 2 ). Improvements in the exponential bounds are critical, for even a slight improvement from O(cn ) to O((c−ǫ)n ) can significantly change the range of the problem being tractable. For Max 2-Sat, improvements in terms of the number of variables has been surprisingly hard to achieve. Consequently, several researchers [1, 22] have explicitly proposed a 2cn , c < 1 algorithm for Max 2-Sat (or Max Cut) as an open problem. In a recent paper Williams [21] gave an exponential space algorithm for the ˜ ωn/3 )1 , where Max 2-Sat and Max Cut problems with a running time of O(2 ω is the matrix multiplication exponent over a ring. The space requirement of ⋆
1
This material is based upon work supported by the National Science Foundation under Grant CCR-0209099. ˜ Throughout the paper, O(g(n)) ≡ nO(1) g(n).
Jan van Leeuwen et al. (Eds.): SOFSEM 2007, LNCS 4362, pp. 272–283, 2007. c Springer-Verlag Berlin Heidelberg 2007
Exact Max 2-Sat: Easier and Faster
273
the algorithm is of order 22n/3 . Unfortunately, it is well known that exponential space algorithms are useless for real applications [23]. Both Williams [21] and Woeginger [23] state the problem of improving the bounds using only polynomial space as open. For Max 2-Sat a bound of 2n△(F )/(△(F )+1) is simple to achieve [11], where △(F ) is the maximum degree in the constraint graph of F . We present two algorithms for Max 2-Sat, both of which always improve upon this simple bound. The algorithms always operate on the underlying constraint graph. The first algorithm Local-2-Sat, continuously branches on the neighborhood of a lowest ˜ )−1))n ˜ (1−1/(d(F ). The second algodegree vertex and has a running time of O(2 rithm Global-2-Sat, searches for a small vertex cut, removal of which would divide the graph into disconnected components. It has an excellent performance if the constraint graph has a small separator decomposition and we expect it to perform well in practice when combined with a graph partitioning heuristic. Furthermore, we show that the worst case performance of Global-2-Sat is almost comparable to the performance of Local-2-Sat. Another advantage of both our algorithms is that the analysis is much simpler and avoids tedious enumerations present in previous results. Loosely speaking, the idea behind our algorithms is recursive decomposition based on a popular approach that has originated in papers by Davis, Putnam, Logemann and Loveland [7, 8]. The recurrent idea behind these algorithms is to choose a variable v and to recursively count the number of satisfying assignments where v is true as well as those where v is false, i.e., we branch on v. Instead of choosing a single variable to branch, in every step of our algorithm we branch on some chosen set of variables. The algorithms start by performing a parsimonious reduction from clause weighted Max 2-Sat to variable weighted 3-Sat. During this process, we introduce dummy variables which are used purely for bookkeeping purposes and don’t contribute to the running times. We then use vertex separators to branch on fewer variables. Separators have been used in the past to get improved worst case bounds for NP-hard problems especially in the context of planar graphs [16, 17]. Recently Dahllöf et al. [6] used separators to improve worst case bounds of weighted 2-Sat for the case of separable constraint graphs. Max Cut is closely related to Max 2-Sat and we obtain a worst case bound ˜ ˜ ˜ (1−1/(d(G)−1))n ), where d(G) is the average degree of the graph G. To of O(2 achieve this, we use various strict α-gadgets (introduced in [19]) with Max 2Sat as our target problem. We also extend the definition of such gadgets to provide a framework that allows composition of gadgets. We use such compositions to obtain better worst case bounds for Max k-Sat, Max k-Lin-2 (see Section 4). Even though we describe the algorithm in relation to Max 2-Sat, it is applicable with the same time bounds for any weighted binary constraint satisfaction problem (Max 2-Csp), whose clauses are over pairs of binary variables. We omit the discussions of Max 2-Csp and concentrate only on Max 2-Sat.
274
2
M. Fürer and S.P. Kasiviswanathan
Preliminaries
We employ notation similar to that proposed in [6]. In the Max 2-Sat problem, with each clause C, a weight ω(C) ∈ N is associated. We seek an assignment which provides a maximum sum of weights of satisfied clauses. #Max 2-Sat is the problem of counting the number of such assignments. 3-Sat is the problem of computing the maximum weight satisfying assignments (called models) for a 3-Cnf formula. With each literal l, a weight w(l) ∈ N and a count c(l) ≥ 1 is associated. #3-Sat is the corresponding counting version. For a 3-Sat instance F , we define the weight and cardinality of a model M respectively as w(l) and C(M ) = c(l) W(M ) = {l∈L(F ) | l is true in M} {l∈L(F ) | l is true in M} where L(F ) is the set of literals in F . For M being the set of all maximum weight models for F , and M ′ being any arbitrary maximum weight model in M define ′ C(M ), W(M ) . #3-Sat(F, C, W ) = M∈M
V ar(F ) denotes the set of variables in F . For a set of variables A ∈ V ar(F ) and an assignment κ to them, let F [A = κ] be the problem resulting from assigning κ to the variables in A. For any set of variables A, F (A) denotes the sub-formula of F formed by collecting the clauses involving at least one variable from A. We transform our Max 2-Sat problem into a 3-Sat instance F ′ by adding dummy variables. Let V arx (F ′ ) denote the variables of the 2-Sat instance F present in F ′ and V ard (F ′ ) denote the dummy variables added during transformation. The sets V arx (F ′ ) and V ard (F ′ ) form a partition of V ar(F ′ ). In Section 4, we will also introduce auxiliary variables in the context of gadget reductions. Given a Boolean formula F , we define the constraint graph G(F ) = (V ar(F ), E), as the undirected graph where the vertex set is the set of variables and the edge set E is {(u, v) | u, v appear in the same clause of F }. For a graph G = (V, E), the neighborhood of a vertex v ∈ V denoted by NG (v), is the set {u | (u, v) ∈ E}. A subset of vertices S of a graph G with n vertices is an f (n)-separator that ρ-splits if |S| ≤ f (n) and the vertices of G−S can be partitioned into two sets V1 and V2 such that there are no edges from V1 to V2 , max{|V1 |, |V2 |} ≤ ρn, where parameter ρ is less than 1 and f is a function. An f (K)-separator decomposition of G is a recursive decomposition of G using separators, where subgraphs of size K have separators of size O(f (K)). We call a graph to be separable if it has a small separator decomposition.
Exact Max 2-Sat: Easier and Faster
2.1
275
Helper Functions
We use similar functions and structures as in [6, 9], some of which have been reproduced for completeness. The first function called Propagate simplifies the formula by removing dead variables. The four steps of the algorithm are performed until not applicable. It returns the updated formula, the weight of the variables removed, and count for the eliminated variables. Function Propagate(F, C, W ) (Initialize w ← 0, c ← 1) 1) If there is a clause (1 ∨ . . .) then it is removed, any variable a which gets removed is handled according to cases a) If w(a) = w(¬a) then c = c · (c(a) + c(¬a)); w = w + w(a). b) If w(a) < w(¬a) then c = c · (c(¬a)); w = w + w(¬a). c) If w(a) > w(¬a) then c = c · c(a); w = w + w(a). 2) If there is a clause of the form (0 ∨ . . .) remove 0 from it. 3) If there is a clause of the form (a) then remove it and c = c · c(a); w = w + w(a), and, if a still appears in F then F = F [a = 1]. 4) Return(F, c, w).
Another function called Reduce reduces the input formula. It takes advantage of the fact that if a formula F can be partitioned into sub-formulas F0 and F1 such that each clause belongs to either of them, and |V ar(F0 ) ∩ V ar(F1 )| = 1, then we can remove F0 or F1 while appropriately updating count and weight associated with the common variable. Among F0 , F1 we always remove the one with the smaller number of variables. In all our invocations of Reduce at least one of sub-formulas will be of constant size, thus each invocation takes O(1) time. Function Reduce(F, v) (Assume F = F0 ∧ F1 with V ar(F0 ) ∩ V ar(F1 ) = {v}) 1) Let |V ar(Fi )| ≤ |V ar(F1−i )|, i ∈ {0, 1}. 2) Set (ct , wt ) to #3-Sat(Fi [v = 1], C, W ). 3) Set (cf , wf ) to #3-Sat(Fi [v = 0], C, W ). 4) c(v) ← ct · c(v), c(¬v) ← cf · c(¬v), w(v) ← wt + w(v), w(¬v) ← wf + w(¬v). 5) Return #3-Sat(F1−i , C, W ).
The following lemma (stated without proof) shows that the value of #3-Sat(F, C, W ) is preserved under both these routines. The proof idea is similar to that used by Dahllöf et al. [6] for similar claim in the context of #2-Sat. Lemma 1. Applying Reduce and Propagate does not change the return value of #3-Sat(F, C, W ). The algorithms operate on all connected components of the constraint graph. In our algorithms (because of the bookkeeping involved) the process of branching on a variable has a lengthy description. Since this is not relevant for our result, we will be informal and hide the technicalities behind the phrase branch on.
3
Algorithms for Max 2-Sat
In this section we give algorithms for the problem of Max 2-Sat that improves the simple bound for all instances. The function Transform converts the Max
276
M. Fürer and S.P. Kasiviswanathan
2-Sat instance F into a 3-Sat instance F ′ by adding a distinct dummy variable to each clause. Dummy variables are used for bookkeeping of weights. Since the number of clauses can be as big as O(n2 ), so could be the number of dummy variables. We only branch on variables of V arx (F ′ )(= V ar(F )). As soon as we supply a variable xi ∈ V arx (F ′ ) with some value in a branch all the clauses containing xi in F ′ disappear due to the Reduce and Propagate routines. Function Transform(F ) 1) For each clause C = (xi ∨ xj ), C ∈ F ; add a clause C ′ = (xi ∨ xj ∨ dC ) to F ′ . 2) We create a weighted instance F ′ by the following rules: a) Assign weight 0 to any literal of type xi ∈ V arx (F ′ ) or ¬xi ∈ V arx (F ′ ). b) Assign weight ω (C) to the literal ¬d C for all d C ∈ V ard (F ′ ). c) Assign weight 0 to the literal dC for all dC ∈ V ard (F ′ ). 3) Return F ′ .
Let F be a Max 2-Sat instance with Ξ(F ) being the set of all assignments to V ar(F ). Also define R3SAT (F ′ ) as R3SAT (F ′ ) = {Assignments to V ar(F ′ ) | for any C ′ , dC is set true iff it is required to satisfy C ′ }. Define a function T : Ξ(F ) → R3SAT (F ′ ) where F ′ = Transform(F ). The function T takes some assignment for V ar(F ) and produces an assignment for V ar(F ′ ) by carrying over all assignments to V ar(F ) to V arx (F ′ ) and assigning dC true iff it is required to satisfy clause C ′ . The following theorem implies that the number of optimal solutions and the weight of these solutions are preserved under Transform. Theorem 1. T is a value preserving bijective function. Proof. To prove the bijection we start off by observing the T is one-to-one because any two distinct elements in Ξ(F ) have different images in R3SAT (F ′). Also every assignment in R3SAT (F ′) has a pre-image in Ξ(F ) which is just the restoration of the assignment to V arx (F ′ ). So the function is bijective. Also in F we collect the weight of satisfied clauses. In F ′ we set dC true iff required to satisfy clause C ′ . So if the corresponding clause in F is true we set dC false in F ′ and collect the weight of the clause and if the corresponding clause in F is false we set dC true and collect no weights. Hence, the function is also value preserving. ❑ Corollary 1. The number of optimal solutions and the weight of these solutions are preserved under Transform. 3.1
Algorithm Local-2-Sat
In this subsection we present an algorithm for Max 2-Sat that has an good worst case performance. At every step the algorithm Local-2-Sat chooses the lowest degree node in G(F ) and branches on all but one of its neighbors.
Exact Max 2-Sat: Easier and Faster
277
Algorithm Local-2-Sat(F ′ , C, W ) (Initialize w ← 0, c ← 1) Let F ′ = Transform(F ). 1) If V arx (F ′ ) = ∅ a) Pick a vertex y from V ar(F ) with minimum degree in G(F ). b) Pick any vertex z from NG(F ) (y). c) For each assignment κ to the variables in N G(F ) (y ) \ {z }: c1) Let (F 1 , c1 , w1 ) = Propagate(F ′ [NG(F ) (y) \ {z} = κ], C, W ). c2) Let (F2 , C1 , W1 ) = Reduce(F1 , z). c3) Let (c2 , w2 ) = Local-2-Sat(F 2 , C1 , W1 ). ⎧ if w1 + w2 < w, ⎨ (c, w) c4) Compute (c, w) = (c + (c1 · c2 ), w) if w1 + w2 = w, ⎩ (c1 · c2 , w1 + w2 ) if w1 + w2 > w. 3) Return (c, w).
The correctness of the algorithm Local-2-Sat is omitted in this extended abstract. We now show that the running time of the algorithm depends on the ˜ ) = 2m ) of the graph G(F ). This is especially powerful when average degree (d(F n the degrees are not uniform. Theorem 2. Let F be the input Max 2-Sat instance on n variables, and let ˜ )−2)/(d(F ˜ )−1))n ˜ ((d(F F ′ = Transform(F ). Then Local-2-Sat runs in O(2 ) time ˜ on F , where d(F ) is the average degree in G(F ) = (V ar(F ), E). Proof. Let m = |E| and n = |V ar(F )|. Let δ denote the degree of the vertex y. As soon as we assign some truth values to δ − 1 neighbors of y, all the clauses involving y but not involving z gets removed in polynomial time by the Propagate routine. Also all the clauses involving both y and z gets removed by Reduce. Therefore, the variable y get removed from G(F ). The decrease in n is at least δ(= |(NG(F ) (y) \ {z}) ∪ {y}|.
The decrease in m is at least δ+1 2 . This can be split up as: (a) δ edges incident on y, (b) vertices in NG(F ) (y) \ {z} can in worst case form a clique
edges, (c) since all vertices have degree among themselves, accounting for δ−1 2 at least δ, therefore each vertex in NG(F ) (y) \ {z} has at least an edge either to z or to an vertex not in NG(F ) (y) ∪ {y}, which accounts for δ − 1 more edges. Therefore, δ+1 δ−1 T (m, n) ≤ 2 T (m − , n − δ) + O(δ 2 ), 2 2
˜ 2mn−2n 2m−n which we can show solves to T (m, n) = O(2 ) by induction over m and n. The base case is straightforward for m = δ = 0, n = 1. To complete the induction we show that 2mn−2n2 δ+1 δ−1 2 T (m − , n − δ) ≤ 2 2m−n . 2 On applying inductive hypothesis we get: 2δ−1+
2m(n−δ)−δ(δ+1)(n−δ)−2(n−δ)2 2m−δ(δ+1)−(n−δ)
≤2
2mn−2n2 2m−n
⇔ 4mnδ + 4mn − 4m2 − 2n2 δ − n2 ≤ n2 δ 2 .
278
M. Fürer and S.P. Kasiviswanathan
This holds if the function f (δ) = n2 δ 2 − 4mnδ − 4mn + 4m2 + 2n2 δ + n2 ≥ 0 for δ ∈ [0, ⌊ 2m n ⌋]. Now in the interval of δ, f (δ) is monotonically decreasing till δ = 2m − 1 and monotonically increasing from there on, with f ( 2m n n − 1) = 0. Therefore, ˜ )−2 2 d(F ˜ )−1 n ˜ d(F ˜ 2mn−2n 2m−n ❑ ) = O(2 ). T (m, n) = O(2 3.2
Algorithm Global-2-Sat
In this subsection we present an algorithm Max 2-Sat with good performance on families of graphs where small separators exist and can be found efficiently. We also show that worst case performance of the algorithm is comparable with that of Local-2-Sat. The algorithm is closely related to the algorithm of Dahllöf et al. [6] for solving weighted 2-Sat instances on separable constraint graphs. The algorithm recursively breaks down the input until a constant size b is reached. Then it performs an exhaustive search. The routine Sep takes a graph G(F ) and returns a tuple (A, B, S) such that, (i) A ∪ B ∪ S = V ar(F ), (ii) there exists no edge (u, v) in G(F ) with u ∈ A, v ∈ B. The correctness of the algorithm Global-2-Sat is omitted in this extended abstract. Algorithm Global-2-Sat(F ′ , C, W ) (Initialize w ← 0, c ← 1) Let F ′ = Transform(F ), and (A, B, S)=Sep(G(F )). 1) If |V arx (F ′ )| ≤ b, do exhaustive search. 2) Otherwise, for each assignment κ to the variables in S: a) Let (F1 , c1 , w1 ) = Propagate(F ′ [S = κ], C, W ). b) Let (c2 , w2 ) = Global-2-Sat(F1 (A), C, W ). c) Let (c3 , w3 ) = Global-2-Sat(F 1 (B), C, W ). ⎧ if w1 + w2 + w3 < w, ⎨ (c, w) if w1 + w2 + w3 = w, d) Compute (c, w) = (c + (c1 · c2 · c3 ), w) ⎩ (c1 · c2 · c3 , w1 + w2 + w3 ) if w1 + w2 + w3 > w. 3) Return (c, w).
Polynomial time algorithms are known for finding Max Cut on planar graphs [12] and graphs not contractible to K5 [3]. However, counting the number of Max 2-Sat or Max Cut solutions are #P-complete even when restricted to planar graphs (results not explicitly stated but follow readily from results and reductions in [13, 20]). The following theorem proves the upper bound of Global-2-Sat on separable graphs. In addition to the most widely known planar graphs, other graph families like bounded genus graphs, graphs with excluded minor, bounded treewidth graphs are known to separable (for some of these results see [2, 5, 10, 16]). Theorem 3. Let F be the input Max 2-Sat instance on n variables, and let F ′ = Transform(F ). Assume that an ηK μ -separator decomposition of G(F ) with parameter ρ < 1 can be found in polynomial time. Then Global-2-Sat runs in ˜ ηnµ /(1−ρ µ ) ) time on F . O(2
Proof. If S = ∅, we need just two recursive calls. Otherwise we branch on the variables in S. Since max(|A|, |B|) ≤ ρn and |S| ≤ ηn^μ, we get the following recursive equation for the running time:

T(n) ≤ 2^{ηn^μ} (T(ρn) + T((1 − ρ)n)) + p(n),

where p(n) is some polynomial function in n. This results in an overall running time of Õ(2^{ηn^μ/(1−ρ^μ)}) for separable graphs. ❑

Worst Case Bounds for Global-2-Sat: For many classes of graphs we know that no small separators exist. For deriving the worst case bounds we use the following routine for the function Sep.

Function Bfs-Sep: Perform a BFS search on G(F) starting from any vertex. We maintain a partition of Var(F) into three sets: A (fully discovered), B (not visited), S (currently working). We stop the BFS search a step before |A| > |B|.

We start by proving a general lemma about the lower bound on the number of internal nodes in a degree-bounded tree with a fixed number of leaves.

Lemma 2. Let △ ≥ 3 be an upper bound on the degree of a graph G which is a tree. If G has l leaves, then it has at least ⌈(l−2)/(△−2)⌉ internal nodes.

Proof. Proof by induction. Start with any tree T of size n and degree △ (as a graph). Let v be a deepest internal node. Let d ≤ △ − 1 be the number of children of v. Let T′ be the tree of size n′ obtained by deleting all the children of v. Let i and i′ denote the number of internal nodes in T and T′ respectively, i.e., i′ = i − 1. Let l′ be the number of leaves of T′, i.e., l′ = l − d + 1. We invoke the inductive hypothesis on T′, resulting in
i ≥ (l′ − 2)/(△ − 2) + 1 = (l′ + △ − 4)/(△ − 2) = (l − d + △ − 3)/(△ − 2) ≥ (l − 2)/(△ − 2).
The last inequality follows because d ≤ △ − 1. ❑
Lemma 3. Let G be an n-vertex graph with maximum degree △ ≥ 3. Then Bfs-Sep always finds an f(n)-separator in polynomial time with f(n) ≤ (n(△−2) + 4)/△.

Proof. The leaves of the BFS tree form the cut S, the internal nodes of the tree correspond to the set A, and the undiscovered nodes correspond to the set B. Let l (= f(n)) denote the size of this cut. We know from Lemma 2 that |A| ≥ (l − 2)/(△ − 2). Since we stop one step before |A| becomes greater than |B|, we also have |B| ≥ (l − 2)/(△ − 2). Since the sizes of A, B and S sum to n, we have the following inequality giving an upper bound on l:

n ≥ (l − 2)/(△ − 2) + l + (l − 2)/(△ − 2).
Solving for l we get the claimed result. ❑
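The Bfs-Sep routine itself is short. The following Python sketch is our reading of it; the exact bookkeeping of when a vertex counts as "fully discovered" may differ from the authors' implementation:

    # Sketch (ours) of Bfs-Sep: grow a BFS tree and stop one step before
    # the fully discovered part A outgrows the unvisited part B.
    from collections import deque

    def bfs_sep(adj, root):
        """adj: dict vertex -> set of neighbours. Returns (A, S, B)."""
        A, S = set(), {root}              # A: fully discovered, S: BFS frontier
        B = set(adj) - {root}             # B: not yet visited
        queue = deque([root])
        while queue:
            v = queue.popleft()
            fresh = adj[v] & B            # neighbours first discovered via v
            if len(A) + 1 > len(B) - len(fresh):
                break                     # next step would make |A| > |B|
            A.add(v); S.discard(v)
            S |= fresh; B -= fresh
            queue.extend(fresh)
        return A, S, B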
Theorem 4. Let F be the input Max 2-Sat instance on n variables, and let F′ = Transform(F). Let △ = △(F) be the maximum degree in G(F). Then Global-2-Sat using Bfs-Sep runs in Õ(2^{((△−2)/(△−1))n + (△+1) log n}) time on F.
Proof. If the maximum degree △ ≤ 2, then Max 2-Sat can be solved in polynomial time, so we assume △ ≥ 3. Let l denote the size of the cut. Using Lemma 3 and the fact that △ ≥ 3 we get l ≤ n(△ − 2)/△ + 4/△ < n(△ − 2)/△ + 2. Also, in every step of the BFS tree construction |B| decreases by at most △ − 2. So when we stop, |B| − |A| < △ − 2 < △, implying that max(|A|, |B|) = |B| < (n − l)/2 + △. The recurrence for Global-2-Sat using Bfs-Sep can be written as:

T(n) ≤ 2^{l+1} T(n′) + cl△²,

where n′ = min(n − l, (n − l)/2 + △) and c is some constant. The inductive step, assuming the bound holds for T(n′), becomes

T(n) ≤ 2^{l+1} T(n′) + cl△² ≤ 2^{l+1} cn′△² · 2^{((△−2)/(△−1))n′ + (△+1)I(n′)} + cl△².

Using the facts that n′ ≤ n − l, n′ ≤ (n − l)/2 + △, and I(n′) = I(n) − 1, we get

T(n) ≤ 2^{l+1} cn△² · 2^{((△−2)/(△−1))((n−l)/2 + △) + (△+1)(I(n)−1)}.
We complete the inductive proof by showing that:

2^{l+1+((△−2)/(△−1))((n−l)/2+△)+(△+1)(I(n)−1)} ≤ 2^{((△−2)/(△−1))n+(△+1)I(n)}

⇔ l△/2 + (△ − 1)△ − 1 ≤ (△ − 2)n/2 + (△² − 1).

This holds as l ≤ n(△−2)/△ + 2. And finally we evaluate I(n) by solving the recurrence

I(n) = 1 + I(n′) ≤ 1 + I((n − l)/2 + △) ≤ 1 + I(n/2 + △),

which solves to I(n) ≤ log(n − 2△) < log n. Therefore, using all the above arguments,

T(n) = Õ(2^{((△−2)/(△−1))n+(△+1) log n}). ❑
4 Gadgets and Implications of Improved Max 2-Sat
Throughout this section we follow the notation introduced in [19]. We characterize a gadget by two parameters (α, β) to provide a framework that allows composition of gadgets. The definition of a strict (α, β)-gadget reducing a constraint function f to a constraint family F is: For α, β ∈ R⁺, a constraint function f : {0, 1}^n → {0, 1}, and a constraint family F, a strict (α, β)-gadget reducing f to F is a finite collection of constraints {C_1, . . . , C_{β′}} from F over primary variables x_1, . . . , x_n and auxiliary variables
y_1, . . . , y_m and associated real weights {w_1, . . . , w_{β′}}, w_i > 0, with the following properties: Σ_{i=1}^{β′} w_i = β, and for the Boolean assignments a to x_1, . . . , x_n and b to y_1, . . . , y_m the following conditions are satisfied:

(∀a : f(a) = 1)  max_b Σ_{1≤i≤β′} w_i C_i(a, b) = α,
(∀a : f(a) = 0)  max_b Σ_{1≤i≤β′} w_i C_i(a, b) = α − 1.
Gadgets can be used for our purposes in the following manner. Assume we have an instance of our optimization problem with constraints of total weight W and there is a strict (α, β)-gadget reducing each constraint to Max 2-Sat. Then we can build a Max 2-Sat instance F whose optimum is αW and such that any solution value S for F corresponds to a value of exactly S − (α − 1)W for the original instance. We use the parameter β to help us in the composition of gadgets, as shown in Lemma 4. Note that optimality of α is not necessarily preserved under composition. In the rest of the discussion we assume the gadget parameters to be small constants.
Fig. 1. Illustration of the (non-)effect of auxiliary variables. We convert the 3-Sat clause (a ∨ b ∨ c) using a (3.5, 4)-gadget to (b ∨ ¬y) ∧ (¬b ∨ y) ∧ (c ∨ ¬y) ∧ (¬c ∨ y) ∧ (b ∨ c) ∧ (¬b ∨ ¬c) ∧ (a ∨ y) [19]. Clause (a ∨ y) has weight 1; all other clauses have weight 1/2. The optimal assignment for auxiliary and dummy variables can be fixed in polynomial time as the algorithms proceed.
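The defining property of this gadget can be checked mechanically. The following brute-force check (our illustration, not from the paper) confirms that for every assignment to a, b, c the best weight over y is 3.5 when (a ∨ b ∨ c) is satisfied and 2.5 otherwise:

    # Brute-force check (ours) of the strict (3.5, 4)-gadget of Fig. 1.
    from itertools import product

    # Each clause is a pair of literals (variable, wanted_value) with a weight.
    GADGET = [((('b', 1), ('y', 0)), 0.5), ((('b', 0), ('y', 1)), 0.5),
              ((('c', 1), ('y', 0)), 0.5), ((('c', 0), ('y', 1)), 0.5),
              ((('b', 1), ('c', 1)), 0.5), ((('b', 0), ('c', 0)), 0.5),
              ((('a', 1), ('y', 1)), 1.0)]

    def best_over_y(a, b, c):
        return max(sum(w for cl, w in GADGET
                       if any({'a': a, 'b': b, 'c': c, 'y': y}[v] == s
                              for v, s in cl))
                   for y in (0, 1))

    for a, b, c in product((0, 1), repeat=3):
        expect = 3.5 if (a or b or c) else 2.5   # alpha vs. alpha - 1
        assert best_over_y(a, b, c) == expect
    print("gadget verified")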
Lemma 4. Let the strict (α_1, β_1)-gadget define a reduction from a constraint function f_1 ∈ F_1 to a constraint family F_I. Let the strict (α_2, β_2)-gadget define a reduction from a constraint f_2 ∈ F_I to a constraint family F_2. Then there exists a strict (α, β)-gadget defining a reduction from the constraint function f_1 ∈ F_1 to the constraint family F_2. Furthermore, α = β_1(α_2 − 1) + α_1 and β = β_1β_2.

Proof. The proof follows from the definition of these gadgets and is omitted in this extended abstract. ❑

Table 1 summarizes (α, β) values for reducing some interesting problems to Max 2-Sat. Definitions of these problems can be found in [19]. Note that many more interesting reductions to Max 2-Sat are known (see [4, 19] and references therein). Consider an instance I of any of the above problems and let I′ be the Max 2-Sat instance obtained after performing the gadget reduction on every constraint function in I.
Table 1. Some strict (α, β)-gadget reductions to Max 2-Sat. In both Max k-Sat and Max k-Lin-2, k is a fixed constant.

Source Problem | (α, β) | Notes
Max 3-Sat | (3.5, 4) | See Fig. 1 for the reduction from [19].
Max k-Sat | (3.5(k − 2), 4(k − 2)) | Strict (k − 2, k − 2)-gadget to Max 3-Sat.
Max Cut | (2, 2) | Add (x ∨ y) ∧ (¬x ∨ ¬y) for an edge (x, y).
Max k-Lin-2 | (3.5k(k − 2), 4k(k − 2)) | Strict (k, k)-gadget to Max 3-Sat.
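As a worked instance of Lemma 4 (added here for illustration), the Max k-Sat row of Table 1 is obtained by composing its strict (k − 2, k − 2)-gadget to Max 3-Sat with the (3.5, 4)-gadget of Fig. 1:

α = β_1(α_2 − 1) + α_1 = (k − 2)(3.5 − 1) + (k − 2) = 3.5(k − 2),
β = β_1β_2 = (k − 2) · 4 = 4(k − 2).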
There are no edges in G(I′) between auxiliary variables added for two different constraint functions. Also, for any constraint function f ∈ I, no auxiliary variable added for f is adjacent in G(I′) to a variable that is in I but not in f. This implies that the auxiliary variables added for f during the gadget reduction get separated from G(I′) as soon as we provide assignments to the variables of f; see also Fig. 1. Additionally, for every constraint function in I the number of auxiliary variables added during the gadget reduction to Max 2-Sat is O(1). Therefore, as the algorithms (Local-2-Sat and Global-2-Sat) proceed, the optimal assignment for the auxiliary variables can be computed in polynomial time. This ensures that the bounds derived in Section 3 apply to the above mentioned problems as well. In the following table we summarize the worst case bounds obtained by using Local-2-Sat.

Source Problem | Time Complexity
Max k-Sat, Max k-Lin-2 (k fixed) | 2^{(1−1/(d̃(F)−1))n}
Max Cut | 2^{(1−1/(d̃(G)−1))n}
5 Concluding Remarks
We present algorithms with improved exponential bounds for solving and counting solutions of Max 2-Sat instances, together with applications. In practice one would expect Global-2-Sat to perform better when combined with a good graph partitioning heuristic like METIS (based on [14]). An interesting question would be to investigate the expected polynomial running time of the Max Cut algorithm by Scott and Sorkin [18] for sparse instances in the light of these better bounds.
References
1. Alber, J., Gramm, J., and Niedermeier, R.: Faster Exact Algorithms for Hard Problems: A Parameterized Point of View. Discrete Mathematics 229 1 (2001) 3–27
2. Alon, N., Seymour, P., and Thomas, R.: A Separator Theorem for Graphs with an Excluded Minor and Its Applications. STOC'90, ACM (1990) 293–299
3. Barahona, F.: The MAX-CUT Problem on Graphs not Contractible to K5. Operations Research Letters 2 3 (1983) 107–111
4. Bellare, M., Goldreich, O., and Sudan, M.: Free Bits, PCPs, and Nonapproximability - Towards Tight Results. SIAM Journal on Computing 27 3 (1998) 804–915
5. Bodlaender, H.L., Gilbert, J.R., Hafsteinsson, H., and Kloks, T.: Approximating Treewidth, Pathwidth, Frontsize, and Shortest Elimination Tree. Journal of Algorithms 18 2 (1995) 238–255
6. Dahllöf, V., Jonsson, P., and Wahlström, M.: Counting Models for 2SAT and 3SAT Formulae. Theoretical Computer Science 332 1-3 (2005) 265–291
7. Davis, M., Logemann, G., and Loveland, D.: A Machine Program for Theorem-Proving. Communications of the ACM 5 7 (1962) 394–397
8. Davis, M. and Putnam, H.: A Computing Procedure for Quantification Theory. Journal of the Association for Computing Machinery 7 (1960) 201–215
9. Fürer, M. and Kasiviswanathan, S.P.: Algorithms for Counting 2-SAT Solutions and Colorings with Applications. Technical Report TR05-033, Electronic Colloquium on Computational Complexity (2005)
10. Gilbert, J.R., Hutchinson, J.P., and Tarjan, R.E.: A Separator Theorem for Graphs of Bounded Genus. Journal of Algorithms 5 3 (1984) 391–407
11. Gramm, J., Hirsch, E.A., Niedermeier, R., and Rossmanith, P.: Worst-Case Upper Bounds for MAX-2-SAT with an Application to MAX-CUT. Discrete Applied Mathematics 130 2 (2003) 139–155
12. Hadlock, F.: Finding a Maximum Cut of a Planar Graph in Polynomial Time. SIAM Journal on Computing 4 3 (1975) 221–225
13. Hunt III, H.B., Marathe, M.V., Radhakrishnan, V., and Stearns, R.E.: The Complexity of Planar Counting Problems. SIAM Journal on Computing 27 4 (1998) 1142–1167
14. Karypis, G. and Kumar, V.: A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM Journal on Scientific Computing 20 1 (1998) 359–392
15. Kojevnikov, A. and Kulikov, A.S.: A New Approach to Proving Upper Bounds for MAX-2-SAT. SODA'06, SIAM (2006) 11–17
16. Lipton, R. and Tarjan, R.E.: A Separator Theorem for Planar Graphs. SIAM Journal on Applied Mathematics 36 (1979) 177–189
17. Ravi, S.S. and Hunt III, H.B.: An Application of the Planar Separator Theorem to Counting Problems. Information Processing Letters 25 5 (1987) 317–321
18. Scott, A.D. and Sorkin, G.B.: Faster Algorithms for MAX CUT and MAX CSP, with Polynomial Expected Time for Sparse Instances. RANDOM'03, Springer, LNCS 2764 (2003) 382–395
19. Trevisan, L., Sorkin, G.B., Sudan, M., and Williamson, D.P.: Gadgets, Approximation, and Linear Programming. SIAM Journal on Computing 29 6 (2000) 2074–2097
20. Vadhan, S.P.: The Complexity of Counting in Sparse, Regular, and Planar Graphs. SIAM Journal on Computing 31 2 (2002) 398–427
21. Williams, R.: A New Algorithm for Optimal Constraint Satisfaction and Its Implications. ICALP'04, Springer, LNCS 3142 (2004) 1227–1237
22. Woeginger, G.: Exact Algorithms for NP-Hard Problems: A Survey. Combinatorial Optimization - Eureka! You Shrink!, Springer, LNCS 2570 (2003) 185–207
23. Woeginger, G.: Space and Time Complexity of Exact Algorithms: Some Open Problems. IWPEC'04, Springer, LNCS 3162 (2004) 281–290
Maximum Finding in the Symmetric Radio Networks with Collision Detection

František Galčík⋆ and Gabriel Semanišin⋆⋆

Institute of Computer Science, Faculty of Science, P.J. Šafárik University, Jesenná 5, 041 54 Košice, Slovak Republic
[email protected],
[email protected]
Abstract. We consider the problem of computing the maximal value associated to the nodes of a network in the model of unknown symmetric radio networks with availability of collision detection. We assume that the nodes have no initial knowledge about the network topology or the number of nodes, and that they do not even have identifiers. The network contains one distinguished node, called the initiator, that starts the process of computing. We design a series of algorithms that result in an asymptotically optimal deterministic algorithm completing the task in Θ(ecc + log Max) rounds, where ecc is the eccentricity of the initiator and Max is the maximal value among the integer values associated to the nodes. Some other utilisations of the developed algorithm are presented as well.
1 Introduction
A radio network is a collection of autonomous stations that are referred to as nodes. The nodes communicate via sending messages. Each node is able to receive and transmit messages, but it can transmit messages only to nodes which are located within its transmission range. The network can be modeled by a directed graph called the reachability graph G = (V, E). The vertex set of G consists of the nodes of the network, and two vertices u, v ∈ V are connected by an edge e = (u, v) if and only if the transmission of the node u can reach the node v. In such a case the node u is called a neighbour of the node v. If the transmission power of all nodes is the same, then the reachability graph is symmetric, i.e. a symmetric radio network can be modeled by an undirected graph. Nodes of the network work in synchronised steps (time slots) called rounds. In every round, a node can act either as a receiver or as a transmitter. A node u acting as a transmitter sends a message, which can potentially be received by every node v such that u is a neighbour of v. In a given round, a node acting as a receiver receives a message only if it has exactly one transmitting neighbour. The received message is the same as the message transmitted by the transmitting neighbour. If in a given round a node u has at least two transmitting
⋆ Research of the author is supported in part by Slovak VEGA grant number 1/3129/06 and UPJŠ VVGS grant number 38/2006.
⋆⋆ Research supported in part by Slovak APVT grant number 20-004104 and Slovak VEGA grant number 1/3129/06.
neighbours, we say that a collision occurs at node u. In the case when the nodes can distinguish collision (interference noise) from silence (background noise), we say that they have availability of collision detection. It is also assumed that a node can determine its behavior in the following round within the actual round. According to the different features of the stations forming a radio network, several models of radio networks have been developed and studied. They differ in the communication scenarios used and in the initial knowledge assumed for the nodes. An overview of the models of radio networks can be found e.g. in [7]. Many communication primitives have been studied, such as broadcasting, gossiping, leader election, synchronization, etc. Usually it is supposed that each node knows at least its identifier, denoted by ID, such that ID ∈ O(n), where n is the number of nodes in the network. The effectiveness of the designed algorithms is mostly measured by the time required to complete the prescribed goal, i.e. by the number of required rounds. Note that throughout this paper unique identifiers of nodes are not necessary. The goal of broadcasting is to distribute a message from one distinguished node, called a source, to all other nodes. Remote nodes of the network are informed via intermediate nodes. Similarly, the goal of acknowledged broadcasting is to realise broadcasting and to inform the source about the completion of the broadcast. In this paper we deal with another problem related to communication in a radio network, namely the problem of computing the maximal value over the values associated to the nodes of the network. In our setting we are given a distinguished node called the initiator (throughout the paper we denote it by s) and we assume that each node of the network possesses a positive integer value. In some round the initiator starts the algorithm that computes the maximum; the remaining nodes do not know the starting round. Our problem is motivated by the following real-world situation: Consider a multihop radio network with a distinguished central node. Every node is able to perform a measurement of a physical quantity. Sometimes, in order to perform a specific operation, the central node must find out the maximal (or minimal) value in the network. One can collect all values in the central node (e.g. by performing a gossiping algorithm), but up to now no efficient suitable algorithm is known. We provide an algorithm that works in a pipelined manner and, due to an appropriate arrangement of transmissions, reduces the time necessary for completing the task.
1.1 Related Work
A similar problem of finding the maximum over real values associated to the nodes of a multiple access broadcast network was studied in [5]. The randomized algorithm designed in that paper was used for solving the selection problem. The problem of finding the maximum over the integer values associated to the nodes of a radio network was treated in [2], too. Our algorithm utilises the ideas and principles of the algorithm ENCODED-BROADCAST that was developed in [1] and of its pipelined version, called RBEM,
that was presented in [6]. The algorithm RBEM is used several times as a fast subroutine in order to broadcast computed information.
1.2 Model and Terminology
In this paper we consider radio networks with symmetric reachability graphs that are equipped with availability of collision detection. No initial knowledge of the nodes is assumed, i.e. the nodes have no information about the topology of the network or the number of nodes in the network, and the nodes do not even need identifiers. We also suppose that the reachability graph of a network is connected. A node v, acting as a receiver in a given round, hears a µ-signal if at least one of its neighbours acts as a transmitter and sends a message in this round. Otherwise we say that node v hears the λ-signal (i.e. none of its neighbours transmits). We shall encode information into the µ- and λ-signals using transmissions of contact messages. A contact message is a message that can be distinguished from all other messages transmitted during the work of the algorithms. The distance of two nodes u, v (denoted by dist(u, v)) is the length of a shortest u−v-path in the underlying reachability graph. The eccentricity of a node v is defined as follows: ecc(v) = max{dist(v, u) : u ∈ V(G) \ {v}}. We briefly denote the eccentricity ecc(s) of the initiator by ecc. It is not difficult to see that all nodes of a reachability graph G can be partitioned into layers according to their distances from the initiator s. Hence, we can define the sets L_i = {v ∈ V(G) : dist(s, v) = i}, i = 0, 1, . . . , ecc. Let v be a neighbour of w, v ∈ L_{i+1} and w ∈ L_i for some i. Then we say that v is a (+1)-neighbour of w and w is a (−1)-neighbour of v.
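The layers L_i (and with them ecc) are simply BFS levels from s. The following sketch is our centralized reference computation, added for intuition only; the paper's algorithms of course compute this information in a distributed way:

    # Centralized reference computation (ours) of the layers L_i and ecc(s).
    from collections import deque

    def layers(adj, s):
        """adj: dict node -> iterable of neighbours; returns [L_0, L_1, ...]."""
        dist, L = {s: 0}, [{s}]
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    if dist[u] == len(L):
                        L.append(set())
                    L[dist[u]].add(u)
                    queue.append(u)
        return L          # ecc(s) == len(L) - 1 for a connected graph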
2 Preprocessing Algorithms
In this section we present algorithms that compute and broadcast some parameters of the network. These parameters will be used by the main algorithm during its work. In Section 2.1 we design an algorithm EFC, working in O(ecc) rounds, that computes the eccentricity ecc of the initiator. After finishing this step, the value of ecc will be known only to the initiator. Next, in Section 2.2, we show that in O(ecc) rounds we can also broadcast the computed eccentricity of the initiator to all nodes of the network. Finally, in Section 2.3, we briefly present an algorithm called DDC which computes for each node its distance from the initiator. This information will be known only to the given node. This algorithm works in O(ecc) rounds too.
2.1 Computing the Eccentricity of the Initiator
In this section we design an algorithm called EFC (Eccentricity Fast Counting). Let us start with a rough description of the algorithm. We recall that communication in the network is arranged into synchronised rounds. According to our algorithm, in each round a node can be either active
or inactive. Initially all nodes except the initiator are inactive. During the work of the algorithm, every active node knows its distance from the initiator modulo 3. The work of the algorithm is split into phases. Only the initiator is informed about the number of the current phase, but every active node has information about the number of the current round within the actual phase (it has its own counter of rounds that is initiated at an appropriate moment). Each phase consists of two parts: the first part has 4 rounds and the second one takes 6 rounds. In the first part, the active nodes that have been activated in the previous phase attempt to activate their (+1)-neighbours. An inactive node becomes active whenever it receives the µ-signal in two consecutive rounds. In the second part, active nodes detect whether they have some active (+1)-neighbour. If an active node does not detect any active (+1)-neighbour, it changes its state and becomes inactive. In order to avoid simultaneous transmissions in consecutive layers, the transmissions in the second part are scheduled in such a way that only nodes at distance 3 from each other transmit simultaneously.

Now we describe the phases of the algorithm EFC more precisely. The algorithm is initiated by the initiator s. At that moment only the initiator is active, and we consider it to be a node activated in the previous phase. The initiator starts with the tasks prescribed for the first round of the first phase of EFC.

Part 1 (4 rounds). In this part, only the active nodes which were activated in the previous phase transmit. In the first two rounds they transmit contact messages. If an inactive node receives the µ-signal in two consecutive rounds, it becomes active and sets its counter of the current round within the phase to 2. As shown below, an inactive node can become active if and only if it has an active (−1)-neighbour which transmits the contact messages during the first two rounds of a phase. Let 0 ≤ i ≤ ecc and let v ∈ L_i be an active node transmitting in this part of the current phase. Since v is active, it knows its distance d = i mod 3 from the initiator. If d = 2 then the node v transmits the contact message in round 3 of the actual phase; if d = 1 then v transmits the contact message in round 4. During rounds 3 and 4, a node that has been activated in the current phase acts as a receiver in order to learn its distance from the initiator. If the µ-signal is received in round 3 or 4, it knows that its distance from the initiator is 0 or 2, respectively. Otherwise, it knows that its distance is 1. (The values are considered modulo 3.)

Part 2 (6 rounds). The second part of the phase is divided into 3 couples of rounds. Since Part 1 consists of 4 rounds, we number the rounds of this part 5, 6, . . . , 10. If an active node v belongs to a layer L_i for some i, then it transmits the contact message in the round with number 5 + (i mod 3) · 2. This means that within Part 2, transmissions occur only in the rounds with numbers 5, 7 and 9 of the current phase. If an active node v belonging to L_i receives the µ-signal in the round 5 + ((i + 1) mod 3) · 2, then it remains active in the next phase. If a node was activated during Part 1 of the current phase, then it will also be active at the beginning of the next phase. All other nodes are inactivated.
Now we are going to show that if C stands for the number of the first phase in which the initiator is inactive, then C = 2·ecc + 2 (the first phase of the work of the algorithm is numbered 1, and we recall that the initiator knows the numbers of the phases). In order to simplify the forthcoming considerations, we introduce two new concepts.

Definition 1. A path (v_0, v_1, . . . , v_k) is called an active path whenever v_0 is the initiator, v_j ∈ L_j and v_j is active for all j, 0 ≤ j ≤ k.

Definition 2. An active path (v_0, v_1, . . . , v_k) is called an active path in phase i whenever it is active at the beginning of the first round of phase i.

The following lemma provides information about the structure of active paths in a network during the work of EFC.

Lemma 1. Let d be the length of a longest active path in phase i. Then for every positive integer i, 1 ≤ i ≤ ecc + 1, and for every node v ∈ L_{i−1} there is an active path in phase i of length d ending in the node v. Moreover, in the first round of phase i, each active node belongs to an active path.

It is not very difficult to see that in Part 1 of phase i, 1 ≤ i ≤ ecc, exactly the nodes of layer L_i are activated. The following lemma describes the active paths for the phases with number at least ecc + 1.

Lemma 2. Every longest active path in phase ecc + i, where 1 ≤ i ≤ ecc + 1, has length ecc − i + 1. Moreover, no node from a layer L_j, j > ecc − i + 1, is active in the first round of phase ecc + i.

An application of Lemma 2 for i = ecc + 1 yields that the initiator has no active (+1)-neighbour in phase 2·ecc + 1. Since the initiator is also active in all the previous phases, the initiator is inactive for the first time at the beginning of phase 2·ecc + 2. Using these facts we can formulate the following result.

Theorem 1. The algorithm EFC computes the eccentricity ecc of the initiator in O(ecc) rounds.
2.2 Acknowledged Broadcasting of the Eccentricity of the Initiator
We recall that after finishing the algorithm EFC only the initiator knows its eccentricity. Now we need to distribute this information to the remaining nodes. In order to broadcast the computed eccentricity of the initiator, we can use the algorithm RBEM designed by Okuwa et al. in [6]. This algorithm broadcasts a message of binary length r in O(r + ecc) rounds. In our case, the eccentricity of the initiator can be binary encoded into a message of length log ecc. Then the algorithm RBEM completes broadcasting of this message in O(ecc) rounds. In general, the algorithm RBEM is not acknowledged, but the initiator knows the value of the parameter ecc, and therefore it has implicit information about when this task is completed. The algorithm EFC equipped with the previously described broadcasting ability will be referred to as ExEFC (extended EFC).
2.3 Distributed Computing of the Distance from the Initiator
The main goal of this stage is to compute the distance from the initiator for each node of the network. After the completion of this stage, every node v knows its exact distance from the initiator, which uniquely determines the layer L_i containing v. The basic idea of the suggested algorithm is that the nodes belonging to L_i concurrently transmit the binary encoded number i + 1 (using µ- and λ-signals) to their (+1)-neighbours, i.e. to the nodes belonging to L_{i+1}. In order to decrease the time complexity of this task, we realise it in a pipelined fashion. We use the fact that if we know the k lowest bits of the number i (i.e. the suffix of the binary code of i), then we also know the k lowest bits of the number i + 1. In order to realise this goal, we modify the algorithm RBEM; particularly, we dynamically change the broadcast message. In the following, we refer to this modified algorithm as DDC (Distributed Distance Counting). Moreover, DDC has one useful property: if the eccentricity ecc of the initiator is known to all nodes of the network, it allows us to use this algorithm for a "synchronisation of the nodes". This means that the nodes can make an agreement about the round in which they will simultaneously start some task.

Theorem 2. The algorithm DDC computes the distance from the initiator to each node of the network in O(ecc) rounds. Moreover, by an application of the algorithms ExEFC and DDC we can "synchronize" the network in O(ecc) rounds.
3 Algorithm for Computing the Maximal Value
In this section we design an algorithm CMV that computes the maximum of the considered values over all nodes in the network. The algorithm consists of three logical parts. In the first step the initiator estimates the maximal value by determining the minimum number of bits necessary for its binary encoding. In the second step the initiator broadcasts the estimate to the other nodes and initiates the computation of the exact value, which forms the third logical part of the algorithm. At the end of the computation, the initiator knows the desired value. The first step is described in Section 3.1; the second and third steps are discussed in Section 3.2.
3.1 Estimating the Maximal Value
As we have already mentioned, we suppose that every node of the network possesses a positive integer value. In what follows we show how to compute an estimate of the maximal value among them. More precisely, for the unknown value Max, the searched maximum, we want to compute the value Bmax such that 2^{Bmax−1} ≤ Max < 2^{Bmax}. Obviously, the value Bmax specifies how many bits we need to store an arbitrary value associated to a node of the network. We assume that the algorithms ExEFC and DDC have already been performed, so every node knows the eccentricity ecc of the initiator and its own distance from the initiator, and the nodes are synchronized (they know the starting round of the algorithm computing the estimate).
Our algorithm, called EMV, works as follows. Every node performs in a loop 3 segments: receive, transmission and sleep. Every segment consists of only one round. (We use the concept of segments only in order to use a uniform terminology in the description of the algorithms.) For any node v belonging to the layer L_i, 0 ≤ i ≤ ecc, let us denote by V_v the value associated to v and let B_v be the positive integer satisfying 2^{B_v−1} ≤ V_v < 2^{B_v}. In the first round of the algorithm, the nodes perform an activity that depends on their layer. The nodes belonging to the layers L_i with i = ecc − 3k for some integer k ≥ 0 realise the transmission segment; the nodes from the layers L_i with i = ecc − 3k − 1, k ≥ 0, realise the activities prescribed for the receive segment; and the remaining nodes realise the sleep segment. Note that during the work of the algorithm the nodes realising the same segment are in layers with mutual distance at least 3. The node v ∈ L_i transmits the contact messages according to two rules:

1. The contact message is transmitted in all rounds r, where r = ecc − i + 1, ecc − i + 4, . . . , ecc − i + 3(B_v − 1) + 1. (Note that these rounds are rounds of the transmission segment.)
2. The contact message is transmitted in every transmission segment following a receive segment during which the node received the µ-signal.

Let R be the round of the first receive segment with number at least ecc in which the µ-signal is not received by the initiator. It is possible to prove that R is well defined and, moreover, that (R − ecc)/3 is the maximum over all values B_v except the value B_s. This results in the following theorem.

Theorem 3. Let Max be the maximal value over the values associated to the nodes. Then the algorithm EMV computes the value Bmax such that 2^{Bmax−1} ≤ Max < 2^{Bmax} in O(ecc + log Max) rounds.
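For intuition (our illustration, not part of the algorithm): B_v is just the bit length of V_v, which is what the initiator recovers from the first silent receive segment:

    # Ours: B_v is the number of bits of V_v, i.e. 2^(B_v - 1) <= V_v < 2^(B_v).
    def B(v_value: int) -> int:
        assert v_value >= 1
        return v_value.bit_length()

    assert B(1) == 1 and B(7) == 3 and B(8) == 4
    # The initiator derives max over the B_v as (R - ecc) / 3, where R is the
    # first silent receive segment with number at least ecc (Theorem 3).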
3.2 Computing the Maximal Value
After finishing EMV, the value Bmax is known only to the initiator s. Before performing the computation of Max we have to distribute its estimate Bmax to the remaining nodes of the network. We can again utilise the algorithm RBEM, which takes O(ecc + log Max) rounds. After finishing RBEM, all nodes of the network are informed how many bits are needed to store Max. Therefore the nodes can unify the representation of their values as binary sequences of length ⌊log Max⌋ + 1 = Bmax. In CMV each node v computes a value E_v which is its estimate of Max. According to CMV, the nodes that recognize that they cannot improve the computed value of Max eliminate themselves from the process of computation. Similarly to EMV, the algorithm CMV works in a loop and repeatedly performs 3 segments: receive, transmission and sleep. The difference is that every segment consists of two rounds. During the algorithm the nodes can be in one of two states: active or inactive. Now we are going to describe CMV in more detail. In the beginning all nodes are active and the value E_v of every node v ∉ L_ecc is set to 0. We shall work with the
binary representation of E_v, and therefore E_v can be viewed as a finite sequence of 0's and 1's of fixed length Bmax. During the work of the algorithm we improve the value of E_v by modifying particular bits. This means that in the beginning E_v = (0, 0, . . . , 0). Moreover, during the initialization phase, if v ∈ L_ecc then E_v is set to V_v. In the first round of the algorithm, the nodes perform an activity that depends on their layer. The nodes belonging to the layers L_i with i = ecc − 3k for some integer k ≥ 0 realise the transmission segment; the nodes from the layers L_i with i = ecc − 3k − 1, k ≥ 0, realise the activities prescribed for the receive segment; and the remaining nodes realise the sleep segment. The inactive nodes cannot transmit. The transmission of an active node v ∈ L_i, 1 ≤ i ≤ ecc, is prescribed by the following rules:

1. v transmits the contact message in all rounds r, where r = 2(ecc − i − 2 + 3j) − 1 for some j satisfying j ∈ {1, . . . , Bmax}, whenever the j-th highest bit of the binary encoding of E_v is 1 (i.e. the bit corresponding to 2^{Bmax−j}).
2. If the µ-signal is received by the node v in a round r, where r = 2(ecc − i − 2 + 3j) for some j such that j ∈ {1, . . . , Bmax}, and the node v has not transmitted in the round r − 1, then the node v becomes inactive.
3. If in a round r, where r = 2(ecc − i − 3 + 3j) − 1 for some j satisfying j ∈ {1, . . . , Bmax}, the node v receives the µ-signal, or the j-th highest bit of the binary encoding of V_v is 1 and V_v ≥ E_v, then the node sends the contact message in the following round r + 1 and sets the j-th highest bit of the binary encoded value E_v to 1.

Note that the first and the second rule are related to rounds that belong to a transmission segment of the node v; the third rule concerns rounds of the receive segment of the node v. The rules can also be interpreted as follows. In the first round of the transmission segment, an active node v transmits according to the j-th highest bit of the value E_v, where j is determined by the rules for the given round. Furthermore, all active nodes belonging to the same layer work with the j-th highest bit. Simultaneously, a (−1)-neighbour receives the µ-signal during the first round of its receive segment if and only if it has at least one active (+1)-neighbour whose j-th highest bit equals 1. In the following round these (−1)-neighbours announce to their (+1)-neighbours how they set their j-th highest bits. After this round, every node knows whether its activity in the previous round has influenced some of its (−1)-neighbours, i.e. whether its active (−1)-neighbours set their j-th highest bits according to its j-th highest bit. If a node detects that no (−1)-neighbour set its E_v according to its information, it becomes inactive. This is because no value E_v that could potentially be computed by this node can be larger than the value which would be computed in one of its active (−1)-neighbours. The following proposition provides relatively straightforward properties of binary sequences utilised for encoding integer numbers. It says that by comparing the highest k bits of two numbers with binary codes of the same length we obtain important information about the relative size of these numbers.
Proposition 1. Let l, A, B be positive integers and let A, B have binary representations A = (a_1a_2 . . . a_l)_2 and B = (b_1b_2 . . . b_l)_2, respectively. If A ≤ B then for any k, 1 ≤ k ≤ l, the inequality (a_1a_2 . . . a_k)_2 ≤ (b_1b_2 . . . b_k)_2 holds.

The next lemma shows that the computed value E_s matches our expectations.

Lemma 3. After the round r = 2(ecc − 3 + 3Bmax), the value E_s of the initiator s is equal to the maximal value Max over the values associated to the nodes.

Proof. Note that we can encode the values of the nodes by binary sequences of the same length, because we have already applied the algorithm EMV that computes the value Bmax. Using the previous proposition one can easily check that the following two invariants hold during the work of the algorithm CMV.

1. At the end of any round r = 2(ecc − i − 3 + 3j), where 1 ≤ j ≤ Bmax, the following holds for each node v ∈ L_i: E_v ≥ V_v, the highest j bits of the binary encoded value E_v remain the same during the rest of the algorithm, and they are equal to the highest j bits of the value E_w, where w is an arbitrary active (+1)-neighbour of the node v.
2. If an inactive node v ∈ L_i has an active (−1)-neighbour, then there is an active (−1)-neighbour w ∈ L_{i−1} of v satisfying E_w > E_v.

Since the initiator has no (−1)-neighbours, it is active in every round. After the round r = 2(ecc − 3 + 3Bmax), any active path consists only of nodes with estimates of Max equal to E_s, because by the first invariant all bits of E_v remain unchanged. Consider now a node w such that V_w = Max. It is easy to see that in every round, every prefix of the binary encoded value E_w is equal to the corresponding prefix of the binary encoded value V_w. From the rules of the algorithm it follows that in every round there is an active path ending in the node w. Indeed, this is true because the prefix of E_w is successively distributed and computed bit by bit in all nodes belonging to an active path ending in w. Therefore all nodes belonging to the considered active path remain active in every following round (there is no greater prefix during the work of the algorithm). Thus, in the considered round r = 2(ecc − 3 + 3Bmax), the value E_s of the initiator s is equal to the value E_w = V_w = Max. ⊓⊔

As a consequence of the previous results we immediately obtain our main result.

Theorem 4. Algorithm CMV computes the value Max (in the initiator) in O(ecc + log Max) rounds.
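Proposition 1 is easy to sanity-check exhaustively for small bit lengths (our illustration):

    # Ours: exhaustive check of Proposition 1 for l-bit numbers, small l.
    def prefix(x: int, l: int, k: int) -> int:
        return x >> (l - k)          # value of the k highest of l bits

    l = 6
    for A in range(1, 2 ** l):
        for B in range(A, 2 ** l):   # every pair with A <= B
            assert all(prefix(A, l, k) <= prefix(B, l, k)
                       for k in range(1, l + 1))
    print("Proposition 1 verified for l =", l)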
4 Lower Bound
In this section we show that the algorithm CMV is asymptotically optimal. In particular, we reduce the problem of broadcasting in symmetric geometric radio
networks (shortly GRN) with collision detection to our problem of maximum finding. The model of GRN differs from our model of radio networks in two respects: nodes have additional initial information about their positions, and the reachability graph of a GRN must satisfy restrictions resulting from the geometric locations of the nodes. Note that broadcasting algorithms for GRN utilise unique identifiers for every node of the network (the existence of such identifiers follows, for example, from their geometric locations).

Theorem 5. For any maximum finding algorithm with collision detection there exist a symmetric radio network of diameter 2 and an assignment of values associated to the nodes such that the algorithm requires Ω(log Max) rounds.

Proof. Dessmark and Pelc showed in [3] that for every broadcasting algorithm with collision detection there exists a class of symmetric geometric radio networks with diameter 2 for which this algorithm requires Ω(log n) rounds, where n is the number of nodes. More precisely, for a given n this network has the following form: the nodes of the network are labeled 1, . . . , n; node 1 is the source and node n is the sink. The set {2, . . . , n−1} is partitioned into sets X and Y, where |Y| = 2. Nodes 1, . . . , n − 1 form a complete graph. The nodes from Y are connected to the sink n. We shall refer to such a network as a network of class H. In what follows we show how to utilise a maximum finding algorithm in a broadcasting algorithm for networks of class H.

Now, let A be an algorithm for maximum finding in symmetric radio networks with collision detection and let G be an n-node network of class H. An associated broadcasting algorithm (to the algorithm A) for the network G works as follows. In the first round the source (node 1) transmits a source message; the nodes at distance 1 from the source become informed. Next, we perform the algorithms ExEFC and DDC with node 1 as the initiator. After O(1) rounds, we can distinguish the sink n (the node at distance 2 from the initiator 1). In order to distinguish the nodes of the set Y, the sink n transmits the contact message in the following round. All nodes except the nodes of Y set their values to 1; the two nodes of the set Y set their associated values to their identifiers. Performing the algorithm A for the maximum finding problem, we compute in the initiator 1 a label of one node from the set Y (the label of the node in Y with the larger label). After that the initiator transmits a message containing the computed label. In the following round, only the node with this label transmits the source message, and the sink n becomes informed. Obviously, the asymptotic time complexity of the associated broadcasting algorithm is the same as the complexity of algorithm A. Since Max ∈ O(n), the previously mentioned result from [3] implies that the time complexity of A is Ω(log Max). ⊓⊔

Combining the previous result with the trivial lower bound Ω(ecc), we obtain that the algorithm CMV is asymptotically optimal with respect to the parameters ecc and Max.
5 Conclusion
We have designed the algorithm CMV that computes the maximal value over the values associated to the nodes of the network. The designed algorithm is asymptotically optimal and works in time Θ(ecc + log Max), where ecc is the eccentricity of the initiator and Max is the searched maximum. Besides the studied problem of finding the maximum over the values of a measured physical quantity, the algorithm can be successfully utilised in some other situations. For example, we can use it to compute some parameters of the network, e.g. the maximal identifier among the identifiers of the nodes (potentially with a specified property; see Theorem 5), or to compute the logical sum and logical product of one-bit values associated to the nodes, etc. Moreover, the algorithm can serve for the design of a broadcasting algorithm of time complexity O(D log(n/D) log³ n) for unknown symmetric radio networks with collision detection whose underlying reachability graph is planar (see [4]).
References
1. Chlebus, B.S., Gasieniec, L., Gibbons, A., Pelc, A., and Rytter, W.: Deterministic Broadcasting in Unknown Radio Networks. In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'00) (2000) 861–870
2. Chrobak, M., Gasieniec, L., and Rytter, W.: Fast Broadcasting and Gossiping in Radio Networks. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS'00) (2000) 575–581
3. Dessmark, A. and Pelc, A.: Broadcasting in Geometric Radio Networks. Journal of Discrete Algorithms (to appear)
4. Galčík, F.: Broadcasting in Radio Networks with Specific Topology. Manuscript (2006)
5. Martel, C.U.: Maximum Finding on a Multiple Access Broadcast Network. Information Processing Letters 52 (1994) 7–13
6. Okuwa, T., Chen, W., and Wada, K.: An Optimal Algorithm of Acknowledged Broadcasting in Ad Hoc Radio Networks. In Second International Symposium on Parallel and Distributed Computing (2003) 178–184
7. Pelc, A.: Broadcasting in Radio Networks. In Handbook of Wireless Networks and Mobile Computing, I. Stojmenovic (ed.), John Wiley and Sons, Inc., New York (2002) 509–528
An Approach to Modelling and Verification of Component Based Systems

Gregor Gössler¹, Susanne Graf², Mila Majster-Cederbaum³, M. Martens³, and Joseph Sifakis²
¹ INRIA Rhône-Alpes, Montbonnot, France
[email protected]
² VERIMAG, Grenoble, France
{graf,sifakis}@imag.fr
³ University of Mannheim, Mannheim, Germany
[email protected]
Abstract. We build on a framework for modelling and investigating component-based systems that strictly separates the description of the behavior of components from the way they interact. We discuss various properties of system behavior such as liveness, local progress, local and global deadlock, and robustness. We present a criterion that ensures liveness and can be tested in polynomial time.
1 Introduction
Component-based design techniques are an important paradigm for mastering design complexity and enhancing reusability. In the abstract data type view or object-oriented approach, subsystems interact by invoking operations or methods of other subsystems in their code and hence rely on the availability and understanding of the functionality of the invoked operations. In contrast to this, components are designed independently from their context of use. Components may be glued together via some kind of gluing mechanism. This view has led some authors, e.g. [3,8,20,9], to consider a component as a black box and to concentrate on the combination of components using a syntactic interface description of the components. Nevertheless, for these techniques to be useful, it is essential that they guarantee more than syntax-based interface compatibilities. No matter whether a certain functionality has to be established or certain temporal properties must be ensured, knowledge about the components has to be provided. Methods based on the assume-guarantee paradigm [22] or, similarly, on the more recent interface automata [10] are useful e.g. for the verification of safety properties, provided that these can be easily decomposed into a conjunction of component properties. Other approaches rely on some process algebra such as CSP or the π-calculus [21,16,1] and consider congruences and reductions to discuss properties of component systems. We build here on a framework for component-based modelling, called interaction systems, that was proposed in [13,14,12,23], which clearly separates
interaction from (local) behavior of components. In [14] a notion of global deadlock-freedom, called interaction safety there, was introduced and investigated for interaction systems. Here, we explain how the framework can be used to discuss properties of systems including liveness, progress of subsystems, robustness and fairness. In most cases, direct testing of the properties relies on an exploration of the global state space and hence cannot be performed efficiently. We have shown that deciding local and global deadlock-freedom as well as deciding liveness is NP-hard for component-based systems [19,18]. Alternatively, one may establish conditions that entail a desired property and that can be tested more efficiently. In [12] a first condition was given that entails global deadlock-freedom of interaction systems. In [17] we established a condition that entails local deadlock-freedom of interaction systems and that can be tested in polynomial time. Here we present a condition that can be tested in polynomial time and guarantees liveness of a component, of a set of components or of an interaction in an interaction system. We present here a simple version of the framework without variables. In Section 2 we introduce the framework and model a version of the dining philosophers as an interaction system. In Section 3 we consider properties of interaction systems and illustrate them by examples. In Section 4 we present and analyze a condition for liveness that can be tested in polynomial time. Section 5 discusses related and future work.
2 Components, Connectors and Interaction Systems
We build on a framework of [12,14] where components i ∈ K together with their port sets A_i are the basic building blocks. Each component offers an interface which is here given as a set of ports. Each component i has a local behavior that is here given by a local transition system T_i. The local transition system regulates the way in which ports are available for cooperation with the environment. Components can be glued together. The gluing is achieved by a set of connectors. A connector is a set of ports where no two ports belong to the same component. An interaction is a subset of a connector. Certain interactions may be declared as complete interactions. By this we allow that they are performed independently of the environment. If an interaction is complete, so should be all its supersets. Please note that we identify singleton sets with their element. Given the above ingredients, we define the notion of an interaction system with a global behavior that is obtained from the local transition systems by taking the connectors into account. More formally:

Definition 1. Let K be a set of components and let, for every i ∈ K, A_i be a port set that is disjoint from the port set of every other component. Ports a_i, b_i, . . . ∈ A_i are also referred to as actions. The union A = ∪_{i∈K} A_i of all port sets is the port set of K. A finite
nonempty subset c of A is called a connector if it contains at most one port of each component i ∈ K. A connector set is a set C of connectors where: a) every port occurs in at least one connector of C, b) no connector contains any other connector. For a connector set C we denote by I(C) the set of all nonempty subsets (called interactions) of the connectors in C, i.e. I(C) = {β ≠ ∅ | ∃c ∈ C : β ⊆ c}. We abbreviate I(C) to IC for ease of notation. The elements c ∈ C are maximal in IC with respect to set inclusion and are hence called maximal interactions. A set U ⊆ IC of interactions is said to be closed w.r.t. IC if whenever u ∈ U and u ⊂ v ∈ IC then v ∈ U. Let Comp be a closed set of interactions; it represents the complete interactions. Let for each component i ∈ K a transition system T_i = (Q_i, A_i, →_i) be given, where →_i ⊆ Q_i × A_i × Q_i. We write q_i →_i^{a_i} q_i′ for (q_i, a_i, q_i′) ∈ →_i. We suppose that Q_i ∩ Q_j = ∅ for i ≠ j. In the induced interaction system, the components cooperate via interactions in IC. For the notion of a run and the properties studied in this paper only interactions in C ∪ Comp will be relevant, but for the composition of systems as in [11] we need all interactions, hence they are admitted (as labels) in the following definition. The induced interaction system is given by Sys = (K, C, Comp, T), where the global behavior T = (Q, IC, →) is obtained from the behaviors of the individual components, given by the transition systems T_i, in a straightforward manner:

– Q = Π_{i∈K} Q_i, the Cartesian product of the Q_i, which we consider to be order independent. We denote states by tuples (q_1, . . . , q_j, . . .) and call them global states.
– the relation → ⊆ Q × IC × Q, defined by

∀α ∈ IC ∀q, q′ ∈ Q: q = (q_1, . . . , q_j, . . .) →^α q′ = (q_1′, . . . , q_j′, . . .) iff
∀i ∈ K (q_i →_i^{i(α)} q_i′ if i participates in α, and q_i′ = q_i otherwise), where for a component i and an interaction α we put i(α) = A_i ∩ α and say that component i participates in α if i(α) ≠ ∅. The set of states in which a_i ∈ A_i is enabled is denoted by en(a_i) = {q_i ∈ Q_i | q_i →_i^{a_i} q_i′ for some q_i′}.

The following example shows how the model of interaction systems can be used to model a solution of the dining philosophers problem.

Example 1. There are n philosophers, n forks and one control component, i.e. K_phil = {philosopher_i | i = 0, . . . , n−1} ∪ {fork_i | i = 0, . . . , n−1} ∪ {control}. The alphabet of philosopher_i is {activate_i, enter_i, get_i^i, get_i^{(i+1) mod n}, eat_i, put_i^i, put_i^{(i+1) mod n}, leave_i}, the alphabet of fork_i is {get_i, put_i}, and the alphabet of the component control is {enter, leave}. The transition systems are given by
[Transition diagrams:
fork_i: states f_{i,0}, f_{i,1}, with get_i : f_{i,0} → f_{i,1} and put_i : f_{i,1} → f_{i,0}.
control: states c_0, c_1, . . . , c_{n−1}, with enter : c_j → c_{j+1} and leave : c_{j+1} → c_j.
philosopher_i: a cycle p_{i,0} →(activate_i) p_{i,1} →(enter_i) p_{i,2} →(get_i^i) p_{i,3} →(get_i^{(i+1) mod n}) p_{i,4} →(eat_i) p_{i,5} →(put_i^{(i+1) mod n}) p_{i,6} →(put_i^i) p_{i,7} →(leave_i) p_{i,0}.]
We introduce the following connector set C_phil containing: {eat_0, eat_1, . . . , eat_{n−1}}, act = {activate_0, activate_1, . . . , activate_{n−1}}, {enter, enter_i}, {leave, leave_i}, {get_i^i, get_i}, {get_i^{(i+1) mod n}, get_{(i+1) mod n}}, {put_i^i, put_i}, {put_i^{(i+1) mod n}, put_{(i+1) mod n}}, for i = 0, . . . , n − 1. Comp_phil consists of all nonempty subsets of {eat_0, eat_1, . . . , eat_{n−1}}. The induced interaction system is phil = (K_phil, C_phil, Comp_phil, T_phil). act is the only interaction in C_phil ∪ Comp_phil that may take place in the global state q_0 = (p_{1,0}, . . . , f_{1,0}, . . . , c_0). Then {enter, enter_i} can take place for a philosopher i. More discussion of this example is found in Remark 3 and Section 4.
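To make Definition 1 and Example 1 concrete, here is a small executable sketch (ours, not from the paper; the dict-based encoding and the names `local_trans` and `owner` are our modeling choices): an interaction is enabled in a global state iff every participating component can locally perform its port i(α), and a global step moves exactly the participants:

    # Ours: minimal encoding of an interaction system and its global steps.
    def enabled(alpha, state, local_trans, owner):
        """alpha: set of ports; state: dict component -> local state;
        local_trans: dict (component, local state, port) -> next local state;
        owner: dict port -> component owning it."""
        return all((owner[a], state[owner[a]], a) in local_trans for a in alpha)

    def step(alpha, state, local_trans, owner):
        """The global transition q --alpha--> q': participants move, others stay."""
        assert enabled(alpha, state, local_trans, owner)
        q = dict(state)
        for a in alpha:
            i = owner[a]
            q[i] = local_trans[(i, state[i], a)]
        return q

For the philosophers system, e.g., owner maps get_i^i to philosopher_i and get_i to fork_i, so the connector {get_i^i, get_i} moves the philosopher and the fork simultaneously.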
3 Properties of Interaction Systems
We consider in the following several essential properties of component-based systems and show how they can be clearly defined in our setting. In what follows, we consider an interaction system Sys = (K, C, Comp, T) where T = (Q, IC, →) is constructed from given transition systems T_i, i ∈ K, as described in Definition 1. Let P be a predicate on the global state space. We assume here that P is an inductive invariant, i.e. ∀q, q′ ∈ Q ∀α ∈ C ∪ Comp (P(q) ∧ q →^α q′ ⇒ P(q′)). As an example we consider the predicate P_reach(q_0) describing all global states that are reachable (via interactions in C ∪ Comp) from some designated starting state q_0. The first property under consideration is P-deadlock-freedom, which is a generalization of the concept of deadlock-freedom of [12,14]. An interaction system is considered to be P-deadlock-free if in every global state that satisfies P it may perform a maximal or complete interaction. This definition is justified by
the fact that both for complete and for maximal interactions there is no need to wait for other components to participate. Deadlock-freedom is an important property of a system, but it does not provide any information about local deadlocks in which some set K′ of components, ∅ ≠ K′ ⊆ K, might be involved; hence we consider this situation as well.

Definition 2. Let Sys be an interaction system. Sys is called P-deadlock-free if for every state q ∈ Q satisfying P there is a transition q →^α q′ with α ∈ C ∪ Comp. Let K′ ⊆ K, K′ ≠ ∅. K′ is involved in a local P-deadlock in a state q satisfying P if for any i ∈ K′ and for any a_i ∈ A_i with q_i ∈ en(a_i), for any α ∈ C ∪ Comp with a_i ∈ α there is some j ∈ K′ and some a_j ∈ A_j ∩ α such that q_j ∉ en(a_j). Sys is called locally P-deadlock-free if in no P-state q there is an ∅ ≠ K′ ⊆ K that is involved in a local P-deadlock in q. For P = true we speak of (local) deadlock-freedom.

Remark 1. If Sys is locally P-deadlock-free then it is P-deadlock-free. The converse does not hold.

In addition to deadlock properties it is interesting to consider the property of P-progress of K′, i.e. the property that at any point of any P-run of the system there is an option to proceed in such a way that some component of K′ will eventually participate in some interaction. A subset K′ of components is said to be P-live if K′ participates infinitely often in every P-run. Please note that we admit only transitions labelled by elements of C ∪ Comp in the definition of a P-run.

Definition 3. Let Sys be a P-deadlock-free interaction system. A P-run of Sys is an infinite sequence σ = q_0 →^{α_0} q_1 →^{α_1} q_2 . . . where q_l ∈ Q, P(q_l) = true and α_l ∈ C ∪ Comp for any l. For n ∈ N, σ_n denotes the prefix q_0 →^{α_0} q_1 →^{α_1} q_2 . . . →^{α_{n−1}} q_n. Let ∅ ≠ K′ ⊆ K. K′ may P-progress in Sys if for any P-run σ of Sys and for any n ∈ N there exists σ′ such that σ_n σ′ is a P-run of Sys and some i ∈ K′ participates in some interaction α of σ′. K′ ⊆ K is P-live in Sys if every P-run of Sys encompasses an infinite number of transitions in which some i ∈ K′ participates, i.e. for every P-run σ and for all n ∈ N there is an m ≥ n and an i ∈ K′ with i(α_m) ≠ ∅. An interaction α ∈ IC is P-live if every P-run encompasses an infinite number of transitions q →^β q′ with α ⊆ β. Sys is called P-fair if every component i ∈ K is P-live in Sys. If P = true, we speak of liveness, and similarly of runs, fairness, etc.

Remark 2. If Sys is P-deadlock-free and at least one state satisfies P, then P-runs exist, as P is an inductive invariant.
Lemma 1. Let Sys be P-deadlock-free, and ∅ ≠ K′ ⊆ K. If K′ may P-progress then K′ is not involved in a local P-deadlock in any P-state.

If Sys is locally P-deadlock-free this does not imply that every component may P-progress.

If we consider a setting where a component may involve a technical device, as e.g. in an embedded system, that may break down, we might be interested to know how the properties behave on the failure of that component. As an example we treat here deadlock-freedom and progress.

Definition 4. Let Sys be a deadlock-free interaction system. In Sys, deadlock-freedom is called robust with respect to failure of port ai ∈ Ai, if in every state q ∈ Q there is a transition q →α q′ with α ∈ C ∪ Comp and ai ∉ α. In Sys, deadlock-freedom is called robust with respect to failure of component i, if in every state q ∈ Q there is a transition q →α q′ with α ∈ C ∪ Comp and i(α) = ∅.
Let deadlock-freedom in Sys be robust with respect to failure of port ai ∈ Ai. Suppose that j ∈ K, i ≠ j, may progress in Sys. The progress property of j is robust with respect to failure of port ai ∈ Ai, if for any run σ of Sys and for any n ∈ N there exists σ′ such that σnσ′ is a run of Sys, there is some interaction α of σ′ with j(α) ≠ ∅, and no interaction of σnσ′ contains ai.

Remark 3. The philosopher system in Example 1 is Preach(q0)-deadlock-free where q0 = (p1,0, . . . , f1,0, . . . , c0). This is due to the control component that admits at most n − 1 philosophers to the table. By a pigeonhole argument at least one philosopher can get both forks and continue. When he leaves, another philosopher may be admitted to the table. As we will see in Section 4, each philosopher is Preach(q0)-live in the system and will eat at some time. Hence the system is Preach(q0)-fair.

In the following we show how we can model a system of n identical tasks that have to be scheduled as they all need the same resource in mutual exclusion. Here no explicit representation of a scheduler or a controller is used. For this example we introduce a rule of maximal progress. The maximal progress rule restricts the transition relation for Sys to maximal transitions, i.e. to those transitions q →α q′ for which there is no β and q′′ with α ⊊ β and q →β q′′.

Example 2. We consider a set of tasks Ti (i ∈ K = {1, . . . , n}) that compete for some resource in mutual exclusion. The basic behavior of each task is given in Figure 1 and needs no further explanation. Let the set of ports of each component i be Ai = {activatei, starti, resumei, preempti, finishi}. We want to guarantee mutual exclusion with respect to the exec state, i.e. no two tasks should be in this state at the same time, in the sense that this is an inductive invariant.
Fig. 1. Basic behavior of each task Ti (states inaci, waiti, execi, suspi; transitions activatei, starti, preempti, resumei, finishi)
Mutual exclusion, in this sense, can be achieved using the rule of maximal progress and the following connectors: for i, j ∈ K, i ≠ j, conni1 = {activatei}, connij2 = {preempti, startj}, connij3 = {resumei, finishj}. {startj} and {finishj} are defined to be complete. Let Systasks be the system defined this way.

Observation 1: On every run starting in a state where at most one component is in its exec state, Systasks guarantees mutual exclusion with respect to the exec state due to the rule of maximal progress. For detailed explanations see the appendix.

Observation 2: Let P = (∃i qi ≠ suspi). P is an inductive invariant, Systasks is P-deadlock-free, and each component i may P-progress. The property of P-deadlock-freedom is robust with respect to failure of the operation resumei.

Observation 3: Let P′ = true. Then Systasks is not P′-deadlock-free, as in the state q = (susp1, . . . , suspn) no interaction in C ∪ Comp is available. We may modify the system by introducing a new action reseti for each i and by enriching the transition system Ti with an edge labelled reseti from the state suspi to inaci. In addition we introduce a connector conng = {reset1, . . . , resetn}. The resulting system Sys′tasks is P′-deadlock-free. P′-deadlock-freedom is not robust with respect to failure of port reseti and hence not robust with respect to failure of component i, i ∈ K. Alternatively we might consider the state q0 = (inac1, . . . , inacn). The state q = (susp1, . . . , suspn) is not reachable from q0 and Systasks is Preach(q0)-deadlock-free.

Observation 4: When a finite interaction system is P-deadlock-free for some inductive invariant P, then it is also P-deadlock-free when we apply the rule of maximal progress.

Observation 5: In Sys′tasks every component may P′-progress under the rule of maximal progress. For a detailed explanation see the appendix.
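The rule of maximal progress can likewise be made concrete. The following sketch (using the same hypothetical encoding as the earlier sketch; all names are ours) filters the interactions enabled in a state down to the inclusion-maximal ones.

def enabled(alpha, q, en):
    return all(q[i] in en[(i, a)] for (i, a) in alpha)

def maximal_enabled(q, interactions, en):
    # Under maximal progress, only inclusion-maximal enabled
    # interactions may be taken (interactions are frozensets,
    # so a < b tests strict set inclusion).
    live = [a for a in interactions if enabled(a, q, en)]
    return [a for a in live if not any(a < b for b in live)]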
4 Testing Liveness of Components in Interaction Systems
In the previous examples we have verified some properties of the systems directly. As the properties are conditions on the global state space, they cannot in general be established directly in an efficient way. E.g., we have shown that deciding local and global deadlock-freedom as well as deciding liveness is NP-hard for component based systems [19,18]. However, one may define (stronger) conditions that are easier to test and entail the desired properties. In [14], a condition for deadlock-freedom of an interaction system (called interaction safety there) is presented that uses a directed graph. The absence of certain cycles in some finite subgraph ensures deadlock-freedom of the system. This criterion can be extended to P-deadlock-freedom, it can be modified to ensure local progress as well as robustness of P-deadlock-freedom with respect to the failure of a port or a whole component, and it can be extended to apply to a broader class of systems including various solutions for the dining philosophers. In [17] we present a condition that entails local deadlock-freedom and can be tested in polynomial time.

Here we focus on liveness. We present a condition that can be tested in polynomial time and entails liveness for a component i ∈ K. In what follows, we assume for simplicity that the local transition systems Ti have the property that they offer at least one action in every state. The general case can be reduced to this case by introducing idle actions or by adapting the definitions and results below to include this situation.

To test for liveness we construct a graph whose nodes are the components, and where, intuitively, an edge i → j means "j needs i", in the sense that i eventually has to participate in an interaction involving j when j does progress. In a transition system Tj we call a set A ⊆ Aj inevitable, if every infinite path in Tj encompasses an infinite number of transitions labelled with some action in A.

Theorem 1. Let Sys be a P-deadlock-free interaction system for some finite set K of components and finite alphabets Ai, i ∈ K. The graph Glive is given by (K, →) where i → j if Aj \ excl(i)[j] is inevitable in Tj. Here excl(i) = {α ∈ C ∪ Comp : i(α) = ∅} and excl(i)[j] is the projection of excl(i) to j, i.e. the actions of j with which j participates in elements of excl(i).
Let k ∈ K. We put R0(k) = {j : there is a path from k to j in Glive} and
Ri+1(k) = {l ∈ K \ Ri(k) : ∀α ∈ C ∪ Comp (l(α) ≠ ∅ ⇒ ∃j ∈ Ri(k) j(α) ≠ ∅)} ∪ Ri(k).
If ⋃i≥0 Ri(k) = K then k is P-live in Sys.
Proof: Appendix.
Lemma 2. Testing the condition of Theorem 1 can be done in polynomial time in the sum of the sizes of the Ti and the size of C ∪ Comp.

Proof: For the construction of the graph Glive = (K, →), we inspect each local transition system separately. To check if there is an arrow i → j, we remove in the transition system Tj all edges labelled by elements in Aj \ excl(i)[j] and determine if there are directed cycles in the resulting transition system. If this is not the case then we include the arrow; otherwise there are infinite paths in Tj that do not contain an element in Aj \ excl(i)[j], hence Aj \ excl(i)[j] is not inevitable in Tj. Clearly the graph can be constructed in O(|K| Σi|Ti| + |K|²|C ∪ Comp|). Its size is O(|K|²). Once the graph Glive is constructed, it remains to perform a simple reachability analysis to determine R0(k), which can be achieved in O(|K|²). The iteration is performed at most |K| times and each Ri(k) has at most |K| elements. In each iteration we consider all α ∈ C ∪ Comp. Hence we may calculate the Ri(k) in O(|K|³|C ∪ Comp|), where |C ∪ Comp| is the number of elements in C ∪ Comp. So testing the condition can be done in time polynomial in the size of the input.

Remark 4. The condition given in the above theorem can easily be adapted to establish the P-liveness of a set K′ ⊆ K of components as well as the P-liveness of an action ak ∈ Ak in Sys.

As an application we consider our model for the dining philosophers where we designate q0 as the starting state and choose the predicate Preach(q0).

Example 1 continued: Glive for the problem of the dining philosophers is shown in Figure 2, where the abbreviations are self-explanatory and we set n = 3 for better readability. The criterion now yields that philosopheri is Preach(q0)-live: take a component that is not in R0(philosopheri), e.g. control1. Then for any α ∈ C ∪ Comp, control1(α) ≠ ∅ ⇒ ∃j ∈ R0(philosopheri) j(α) ≠ ∅.
Fig. 2. Glive for three philosophers (nodes f0, f1, f2, p0, p1, p2 and c)
Fig. 3. The transition systems T0, T1, T2 of Example 3
This is because control1 only participates in interactions in which some philosopher participates as well, and all philosophers are connected in the graph. The same holds true for any fork, as any interaction involving some fork also involves some philosopher.

Example 3. Here we present an example where component 1 is live but our criterion does not apply. We consider the following interaction system where K = {0, 1, 2}. The transition systems are given in Figure 3. The connector set is given by C := {{a0, a1}, {a0, a2}, {b0, b1}, {b0, b2}, {c1}} and Comp is empty. The system is deadlock-free because c1 can always be performed. Component 1 is live in the composed system. This can be seen from the fact that whenever a connector not involving component 1 is performed, after finitely many steps another connector involving 1 must be performed. The liveness graph contains only one edge, from 0 to 2. Hence no node is reachable from 1 in the graph. Then we get ⋃i≥0 Ri(1) = {1}. Therefore the criterion is not adequate to show that 1 is live in the system.
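For illustration, here is a sketch of the test of Theorem 1 along the lines of the proof of Lemma 2. The data encoding (local systems T[j] as sets of labelled edges (q, a, q′), interactions as sets of (component, action) pairs) and all function names are our own assumptions, not the paper's.

def has_cycle(succ, nodes):
    # Standard DFS detection of a directed cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in nodes}
    def dfs(v):
        colour[v] = GREY
        for w in succ.get(v, ()):
            if colour[w] == GREY or (colour[w] == WHITE and dfs(w)):
                return True
        colour[v] = BLACK
        return False
    return any(colour[v] == WHITE and dfs(v) for v in nodes)

def glive_edges(K, T, interactions):
    # Edge i -> j iff A_j \ excl(i)[j] is inevitable in T_j, tested by
    # keeping in T_j only edges labelled in excl(i)[j] and searching for
    # a cycle. (The trivial self-loop i -> i always appears; it is harmless.)
    edges = {i: set() for i in K}
    for i in K:
        for j in K:
            allowed = {a for alpha in interactions
                       if all(c != i for (c, _) in alpha)
                       for (c, a) in alpha if c == j}     # excl(i)[j]
            succ, nodes = {}, {q for (q, _, _) in T[j]} | {q2 for (_, _, q2) in T[j]}
            for (q, a, q2) in T[j]:
                if a in allowed:
                    succ.setdefault(q, set()).add(q2)
            if not has_cycle(succ, nodes):
                edges[i].add(j)
    return edges

def is_live(k, K, T, interactions):
    edges = glive_edges(K, T, interactions)
    R, stack = {k}, [k]                  # R_0(k): reachable from k in Glive
    while stack:
        for j in edges[stack.pop()] - R:
            R.add(j); stack.append(j)
    changed = True                       # iterate R_i -> R_{i+1}
    while changed:
        changed = False
        for l in set(K) - R:
            if all(any(c in R for (c, _) in alpha)
                   for alpha in interactions if any(c == l for (c, _) in alpha)):
                R.add(l); changed = True
    return R == set(K)

On Example 3 this iteration stops with R = {1}, so the test correctly reports that the criterion does not apply to component 1, matching the discussion above.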
5 Discussion and Related Work
An important motivation for introducing a clean theoretical framework for component based systems is the hope that this will provide means for proving properties such as deadlock-freedom, progress and liveness by exploiting local information and compositionality as much as possible. We showed how the model of interaction systems can be used to deal with important properties of component based systems. Testing the definition of these properties directly is not feasible as it usually involves global state space analysis. An alternative is to find conditions that ensure a given property and are easier to test. First conditions for global, resp. local deadlock-freedom have been treated in [12], resp. [17]. More refined conditions and the treatment of progress can be found in [11]. Here we focussed on liveness. In particular we gave a sufficient condition for the liveness of a component in a component based system that can be tested in polynomial time.
If a condition entailing a desired property is not satisfied, we may try to exploit compositionality in the following way: in [11] we define an (associative) composition operator that combines component systems by gluing two component systems with some new connectors. Then we derive conditions under which a property of one or both component systems can be lifted to the new combined and more complex system. Thus incremental construction and verification of systems can be achieved.

Our model of interaction systems has some similarity with the model of input/output-automata in [15], with the difference that in [15] components, represented by automata, share actions. In each step of the global system exactly one action takes place and all components that offer this action participate in this step.

Even though there are many approaches to model component based systems [3,21,1,16,8,9,20,7,10], to our knowledge the question of properties of component based systems has not yet been studied systematically to a great extent. In [7] one can find a condition that ensures the deadlock-freedom of a component based system consisting of two components. In [4] a condition for deadlock-freedom in a strictly interleaving setting of processes that communicate over shared variables is presented. Interface automata [10] have been introduced as a means to specify component interfaces with behavioral types. Two interface automata are compatible if there exists an environment guaranteeing their composition to be deadlock-free. Verifying compatibility amounts to synthesizing the interface automaton of the most permissive environment avoiding deadlock. Liveness, progress or fairness properties are not addressed. In [24] general definitions of fairness are discussed. In [1] components, ports and (binary) connectors are specified in a CSP variant and the question under what conditions deadlock-freedom is preserved under refinement is investigated. There have been attempts to model component based systems with Petri nets [6,5,2]. Once a system is specified as a Petri net, one might use Petri-net tools to investigate properties of systems, provided the specification formalism supports compositionality on the Petri-net level, which is not the case e.g. in [6,5].

Extended versions of our framework, including local variables of components and priority rules as an additional control layer, are presently being implemented. The implementation in the Prometheus tool focusses on compositional verification, in particular. A second implementation, called BIP, focusses on the efficient execution of systems and also includes timed specifications. The work presented here shows some typical results that can be established in this framework. Further results can be found in [11]. The investigation can – and needs to – be extended in different ways for fully incremental construction and verification of a large system. The notion of component can be extended with various additional information, but also observability criteria and associated equivalence relations are important. Other possible interesting extensions concern the introduction of time and probability, as well as dynamic reconfiguration.
References

1. Allen, R. and Garlan, D.: A Formal Basis for Architectural Connection. ACM Trans. Softw. Eng. Methodol. 6 3 (1997) 213–249
2. Aoumeur, N. and Saake, G.: A Component-Based Petri Net Model for Specifying and Validating Cooperative Information Systems. Data Knowl. Eng. 42 2 (2002) 143–187
3. Arbab, F.: Abstract Behavior Types: A Foundation Model for Components and Their Composition. In Proceedings of FMCO 2002, Springer Verlag, LNCS 2582 (2002)
4. Attie, P.C. and Chockler, H.: Efficiently Verifiable Conditions for Deadlock-Freedom of Large Concurrent Programs. In Proceedings of VMCAI'05, LNCS 3385 (2005) 465–481
5. Bastide, R. and Barboni, E.: Component-Based Behavioural Modelling with High-Level Petri Nets. In MOCA'04, Aarhus, Denmark, DAIMI (2004) 37–46
6. Bastide, R. and Barboni, E.: Software Components: A Formal Semantics Based on Coloured Petri Nets. In Proceedings of FACS'05, ENTCS, Elsevier (2005)
7. Baumeister, H., Hacklinger, F., Hennicker, R., Knapp, A., and Wirsing, M.: A Component Model for Architectural Programming. In Proceedings of FACS'05, ENTCS, Elsevier (2005)
8. Berger, K. et al.: A Formal Model for Componentware. In G.T. Leavens, M. Sitaraman (eds), Foundations of Component-Based Systems, Cambridge Univ. Press (2000) 189–210
9. Chouali, S., Heisel, M., and Souquières, J.: Proving Component Interoperability with B Refinement. In Proceedings of FACS'05, ENTCS, Elsevier (2005)
10. de Alfaro, L. and Henzinger, T.A.: Interface Automata. In Proceedings of ESEC 2001 (2001) 109–120
11. Gössler, G., Graf, S., Majster-Cederbaum, M., Martens, M., and Sifakis, J.: Establishing Properties of Interaction Systems (2006) Full paper in preparation
12. Gössler, G. and Sifakis, J.: Component-Based Construction of Deadlock-Free Systems. In Proceedings of FSTTCS 2003, Mumbai, India, LNCS 2914 (December 2003) 420–433
13. Gössler, G. and Sifakis, J.: Priority Systems. In Proceedings of FMCO'03, LNCS 3188 (April 2004)
14. Gössler, G. and Sifakis, J.: Composition for Component-Based Modeling. Sci. Comput. Program. 55 1-3 (2005) 161–183
15. Lynch, N.A. and Tuttle, M.R.: An Introduction to Input/Output Automata. CWI Quarterly 2 3 (September 1989) 219–246
16. Magee, J., Dulay, N., Eisenbach, S., and Kramer, J.: Specifying Distributed Software Architectures. In W. Schafer and P. Botella (eds), Proceedings of ESEC95, Springer, LNCS 989 (1995) 137–153
17. Majster-Cederbaum, M., Martens, M., and Minnameier, C.: A Polynomial-Time-Checkable Sufficient Condition for Deadlock-Freeness of Component Based Systems. Accepted to SOFSEM 07
18. Martens, M., Minnameier, C., and Majster-Cederbaum, M.: Deciding Liveness in Component-Based Systems is NP-Hard. Technical Report tr-2006-017, University of Mannheim, Fakultät Mathematik und Informatik (2006)
19. Minnameier, C.: Deadlock-Detection in Component-Based Systems is NP-Hard. Technical Report tr-2006-015, University of Mannheim, Fakultät Mathematik und Informatik (2006) submitted for publication
20. Moschoyiannis, S. and Shields, M.W.: Component-Based Design: Towards Guided Composition. In Proceedings of ACSD'03, IEEE Computer Society (2003) 122–131
21. Nierstrasz, O. and Achermann, F.: A Calculus for Modeling Software Components. In Proceedings of FMCO 2002, Springer, LNCS 2582 (2002) 339–360
22. Pnueli, A.: In Transition from Global to Modular Temporal Reasoning about Programs. In Logics and Models for Concurrent Systems, NATO, Springer, ASI Series F 13 (1985)
23. Sifakis, J.: A Framework for Component-Based Construction. In Proceedings of SEFM 05, IEEE Computer Society (2005)
24. Völzer, H., Varacca, D., and Kindler, E.: Defining Fairness. In Proceedings of CONCUR'05, Springer-Verlag, LNCS 3653 (2005) 458–472
Appendix

A1) Detailed explanation of Observation 1: Started in a global state where at most one component is in its exec state, Systasks guarantees mutual exclusion with respect to the exec state.
Mutual exclusion is guaranteed because whenever Tj enters execj, either by startj or resumej, then either there is no other task in its exec-state or the task Ti that is in the state execi must leave this state. The following items explain why this is the case for each of the two transitions:
i) for resumej, the reason is that resumej can never happen alone. It can only be executed together with the finishi action if process Ti is currently in the critical state execi;
ii) for startj, which is complete, the reason is the rule of maximal progress: when Ti is in the critical state execi, it can execute the preempti action. Therefore, startj cannot be executed alone, as the pair {preempti, startj} is also enabled. On the other hand, if there is no process in the critical section, process j can enter it by executing startj alone.

A2) Detailed explanation of Observation 5: Let P′ = true. In Sys′tasks every component may P′-progress under the rule of maximal progress.
As all components have identical behavior it suffices to consider one of them, say component 1. The only situation in which component 1 cannot proceed by itself is when it is in state susp1. We have to show that we can reach a global state where it can perform a transition:
case 1) all other components are in the state susp. Then conng can happen and component 1 has proceeded;
case 2) at least one component j is in the state execj. Then {resume1, finishj} may happen;
case 3) not all other components are in state susp and none is in state exec. Then there must be one component j that is in state inacj or waitj. If it is in state inacj then it performs the complete action activatej and reaches state waitj. As there is no component in state exec, there is no preempt action available and startj may be performed alone even under the rule of maximal progress. Now {resume1, finishj} may happen and component 1 has made progress.

A3) For the proof of Theorem 1 we use the following auxiliary lemma:
Lemma 3. Let σ = q0 →α0 q1 →α1 q2 . . . be a P-run. If there is a path k0 → k1 → . . . → kl in Glive and kl participates infinitely often in σ then k0 participates infinitely often in σ.

Proof: by induction on the length l of the path.
Start of induction: l = 1. Then there is an edge k0 → k1. As k1 participates infinitely often in transitions of σ and as the set of actions of k1 that need cooperation of k0 is inevitable in Tk1, we conclude that k0 participates infinitely often in transitions of σ.
Induction step: l → l + 1. Let k0 → k1 → . . . → kl → kl+1 be a path of length l + 1 and let kl+1 participate infinitely often in σ. Then by induction assumption k1 participates infinitely often in σ, and as above we conclude that k0 participates infinitely often.

A4) Proof of Theorem 1
Let σ = q0 →α0 q1 →α1 q2 . . . be a P-run. We have to show that σ encompasses an infinite number of transitions where k participates. As K is finite and σ infinite, there must be some component k̂ that participates in infinitely many transitions of σ.
1. If k̂ = k, then we are done.
2. If k̂ ≠ k, then we know that k̂ ∈ ⋃i Ri(k).
case 1: if k̂ ∈ R0(k) then by the above lemma and the definition of R0(k) we conclude that k participates infinitely often in σ.
case 2: let k̂ ∈ Ri(k) for some i > 0. Then we show by induction on i that k participates infinitely often in σ.
Start of induction, i = 1: if k̂ ∈ R1(k) then for all α ∈ C ∪ Comp with k̂(α) ≠ ∅ there is j ∈ R0(k) with j(α) ≠ ∅. As k̂ participates infinitely often in σ and as there are only finitely many elements in C ∪ Comp, there must be some α with k̂(α) ≠ ∅ which occurs infinitely often in σ. By definition of R1(k) there is j ∈ R0(k) with j(α) ≠ ∅. Hence j participates infinitely often in σ. As j ∈ R0(k), we conclude by the above lemma that k participates infinitely often in σ.
Induction step, i → i + 1: let k̂ ∈ Ri+1(k). As before there is an α ∈ C ∪ Comp with k̂(α) ≠ ∅ such that α occurs infinitely often in σ. Some j ∈ Ri(k) participates in this α, hence j participates infinitely often in σ and by induction assumption k participates infinitely often in σ.
Improved Undecidability Results on the Emptiness Problem of Probabilistic and Quantum Cut-Point Languages
Mika Hirvensalo
Department of Mathematics, University of Turku, FIN-20014 Turku, Finland, and TUCS – Turku Centre for Computer Science. Supported by the Academy of Finland under grant 208797.
[email protected]
Abstract. We give constructions of small probabilistic and MO-type quantum automata that have an undecidable emptiness problem for the cut-point languages.
1 Introduction
A finite (deterministic) automaton consists of a finite set of states and a transition function (see [13] for formal definitions and the language acceptance conditions). The Pumping Lemma [13] makes it clear that the emptiness problem of finite deterministic automata is algorithmically solvable. In this article, we study two variants of finite automata: probabilistic automata [11] and quantum automata of measure-once (MO) type [10]. It is known that the emptiness problems of cut-point languages and strict cut-point languages defined by probabilistic automata are undecidable [11],[1], as is the emptiness problem of cut-point languages defined by quantum automata [2]. Quite surprisingly, the emptiness problem of strict cut-point languages determined by quantum automata turns out to be decidable [2]. In this article, we improve the undecidability results of [1] and [2] by constructing automata with undecidable emptiness problems of smaller size than found previously. In [1] and [2] it has been shown that the emptiness problem for probabilistic cut-point languages and quantum cut-point languages is undecidable for automata of sizes 47 and 43, respectively. Here we prove the undecidability results for automata of sizes 25 and 21, respectively.
2 Preliminaries
A vector y ∈ Rn (seen as a column vector) is a probability distribution if its coordinates are all nonnegative and sum up to 1. A matrix M ∈ Rn×n is called a Markov matrix or stochastic matrix if all its columns are probability distributions. We also say that a matrix M is doubly stochastic if M and MT both are stochastic matrices. Markov matrices M have the following property: if y is
a probability distribution, so is My. Clearly a product of two Markov matrices is again a Markov matrix. A unitary matrix U ∈ Cn×n is a matrix whose columns form an orthonormal set with respect to the Hermitian inner product ⟨x, y⟩ = x1*y1 + . . . + xn*yn, where c* stands for the complex conjugate of c. The orthonormality of the columns is equivalent to U*U = I, where U* is the adjoint matrix of U defined as (U*)ij = (Uji)*. Hence for a unitary matrix U we have U* = U−1, and therefore also UU* = I, which is to say that also the rows of a unitary matrix form an orthonormal set. Another equivalent characterization of unitarity can be given in terms of the L2-norm ||x||2 = √⟨x, x⟩ = √(|x1|² + . . . + |xn|²).
A matrix U is unitary if and only if ||Ux||2 = ||x||2 for each x ∈ Cn [8]. In the sequel we write ||x||2 = ||x||, unless otherwise stated. It is plain that a product of two unitary matrices is unitary. Any subspace V ⊆ Cn defines an (orthogonal) projection x ↦ xV, where x = xV + xV⊥ is the (unique) decomposition of x with xV ∈ V and xV⊥ ∈ V⊥ (the orthogonal complement of V). Each projection is a linear mapping, and it can be shown that P ∈ Cn×n is a matrix of a projection if and only if P² = P and P* = P.

A probabilistic automaton (PFA, see [11] for further details) over an alphabet Σ is a triplet (x, {Ma | a ∈ Σ}, y), where y ∈ Rn (n being the number of states) is the initial probability distribution, each Ma ∈ Rn×n is a Markov matrix, and x ∈ Rn is the final state vector whose ith coordinate is 1 if the ith state is final, and 0 otherwise. An equivalent definition of a probabilistic automaton can be given by using a transition function δ : Q × Σ × Q → [0, 1], where Q = {q1, . . . , qn} is the state set and δ(qi, a, qj) = (Ma)ji. For any probabilistic automaton P we define a function fP : Σ∗ → [0, 1] as follows: if w = a1 . . . ar, where ai ∈ Σ, then

fP(w) = xT Mar · . . . · Ma1 y.   (1)
The interpretation of (1) is as follows: the ith coordinate of the initial distribution y stands for the probability of the automaton being initially in the ith state. Then, after reading the first letter a1 of the input word, the ith coordinate of the vector Ma1 y represents the probability that the automaton has entered the ith state. Similarly, Ma2Ma1 y represents the distribution of states after reading the input letters a1 and a2. Finally, the ith coordinate of Mar · . . . · Ma1 y gives the probability that the automaton is in the ith state after reading the whole input word, and xTMar · . . . · Ma1 y is the probability that, starting from the initial distribution of states and reading the word w, the automaton enters one of the
final states (corresponding to those coordinates where x has 1). If w = a1 . . . ar, we use the notation Mw = Ma1 · . . . · Mar, so we can rewrite (1) as fP(w) = xT MwR y, where wR = ar . . . a1 is the mirror image of the word w = a1 . . . ar.

A measure-once quantum automaton (MO-QFA, see also [10]) over an alphabet Σ is a triplet (P, {Ua | a ∈ Σ}, y), where y ∈ Cn is the initial amplitude vector of unit L2-norm, each Ua ∈ Cn×n is a unitary matrix, and P ∈ Cn×n is the measurement projection. A quantum automaton Q defines a function fQ : Σ∗ → [0, 1] by

fQ(w) = ||P UwR y||².   (2)
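As a small illustration of definitions (1) and (2) — a sketch, assuming numpy; the function names and the encoding of automata as dictionaries of matrices are our own, not from the paper:

import numpy as np

def f_P(x, M, y, w):
    # f_P(w) = x^T M_{a_r} ... M_{a_1} y: letters of w are applied to y
    # in reading order, so M_{a_1} acts first.
    v = y
    for a in w:
        v = M[a] @ v
    return float(x @ v)

def f_Q(P, U, y, w):
    # f_Q(w) = ||P U_{w^R} y||^2 for an MO-QFA (P, {U_a}, y).
    v = y
    for a in w:
        v = U[a] @ v
    return float(np.linalg.norm(P @ v) ** 2)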
We also define integer-weighted automata (ZFA, see [5] for details) exactly as we defined PFA, but instead of an initial distribution and Markov matrices, we have an initial vector in Zn and matrices with integer entries. As PFAs, ZFAs could also be defined by means of a transition function δ : Q × Σ × Q → Z. A ZFA Z = (x, {Ma | a ∈ Σ}, y) defines a function fZ : Σ∗ → Z by fZ(w) = xT MwR y.

For PFA and MO-QFA and a fixed λ ∈ [0, 1] we define cut-point languages and strict cut-point languages: for any λ ∈ [0, 1] and automaton A,
L≥λ(A) = {w ∈ Σ∗ | fA(w) ≥ λ} and L>λ(A) = {w ∈ Σ∗ | fA(w) > λ}.
It is known that there are cut-point languages that are not regular [11]. In this article we study both problems "L≥λ(A) = ∅?" and "L>λ(A) = ∅?", and construct PFAs and MO-QFAs having an undecidable emptiness problem of smaller size than found previously. As in [1] and [2], we prove the undecidability results by showing that for a given instance I of the Post Correspondence Problem (PCP) (see [7]), one can construct an automaton that accepts words if and only if I has a solution. The following theorem [9] is the basis of our constructions:

Theorem 1. For k ≥ 7, it is undecidable whether an instance I = {(u1, v1), . . ., (uk, vk)} of PCP has a solution ui1ui2 . . . uin = vi1vi2 . . . vin.

We will also use the following restriction of PCP [4], [6]:

Theorem 2. There are instances I = {(u1, v1), . . . , (uk, vk)} of PCP such that all minimal solutions¹ ui1ui2 . . . uin = vi1vi2 . . . vin are of the form i1 = 1, in = k, and i2 . . . in−1 ∈ {2, . . . , k − 1}+. For k ≥ 7, PCP remains undecidable even when restricted to these instances.

The instances of the above theorem are called Claus instances. In fact, all the undecidability proofs of PCP known to the author are for Claus instances, hence the question "is a given instance of PCP a Claus instance?" is of no importance.
¹ A solution to PCP is minimal if it is not a concatenation of two solutions.
3 Probabilistic Automata
Let I = {(u1, v1), . . ., (uk, vk)} be an instance of the PCP. We can assume that the ui and vi are over a binary alphabet Σ = {1, 2}, and construct a PFA P such that for some λ ∈ [0, 1], L>λ(P) ≠ ∅ if and only if I has a solution. We also explain how to modify the construction to get a PFA P′ such that L≥λ(P′) ≠ ∅ if and only if I has a solution.
Step 1. (Embedding I in integer matrices) Let σ : Σ∗ → N = {0, 1, 2, 3, . . .} be the bijection defined as σ(ε) = 0 and σ(i1i2 . . . in) = ∑_{j=1}^{n} ij 2^{n−j}. The first target is to find, for some d, an embedding γ : Σ∗ × Σ∗ → Zd×d and (column) vectors x, y ∈ Zd such that xTγ(u, v)y includes the expression (σ(u) − σ(v))². A construction with dimension 6 was given in [1]:

γ(u, v) =
⎛ 2^{2|u|}       0              0              0        0        0 ⎞
⎜ 0              2^{|uv|}       0              0        0        0 ⎟
⎜ 0              0              2^{2|v|}       0        0        0 ⎟
⎜ σ(u)2^{|u|}    σ(v)2^{|u|}    0              2^{|u|}  0        0 ⎟
⎜ 0              σ(u)2^{|v|}    σ(v)2^{|v|}    0        2^{|v|}  0 ⎟
⎝ σ(u)²          2σ(u)σ(v)      σ(v)²          2σ(u)    2σ(v)    1 ⎠   (3)
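A quick sanity check of (3), as a sketch in plain Python with exact integer arithmetic; sigma, gamma and the test words are our own names, and the last-row check anticipates the vectors x1, y1 introduced just below:

def sigma(w):
    # sigma(i1...in) = sum_j i_j 2^(n-j), by Horner's rule
    v = 0
    for c in w:
        v = 2 * v + int(c)
    return v

def gamma(u, v):
    su, sv, lu, lv = sigma(u), sigma(v), len(u), len(v)
    return [
        [2**(2*lu), 0,          0,          0,     0,     0],
        [0,         2**(lu+lv), 0,          0,     0,     0],
        [0,         0,          2**(2*lv),  0,     0,     0],
        [su*2**lu,  sv*2**lu,   0,          2**lu, 0,     0],
        [0,         su*2**lv,   sv*2**lv,   0,     2**lv, 0],
        [su*su,     2*su*sv,    sv*sv,      2*su,  2*sv,  1],
    ]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(6)) for j in range(6)]
            for i in range(6)]

u1, v1, u2, v2 = "12", "1", "2", "212"
assert matmul(gamma(u1, v1), gamma(u2, v2)) == gamma(u1 + u2, v1 + v2)
# The last row against y1 = (-1, 1, -1, 0, 0, 1)^T gives 1 - (sigma(u)-sigma(v))^2:
row, y1 = gamma(u1, v1)[5], [-1, 1, -1, 0, 0, 1]
assert sum(r * y for r, y in zip(row, y1)) == 1 - (sigma(u1) - sigma(v1)) ** 2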
It is straightforward to see that γ(u1, v1)γ(u2, v2) = γ(u1u2, v1v2), and by choosing x1 = (0, 0, 0, 0, 0, 1)T and y1 = (−1, 1, −1, 0, 0, 1)T we get x1Tγ(u, v)y1 = 1 − (σ(u) − σ(v))². Hence x1Tγ(u, v)y1 ≤ 1 always, and x1Tγ(u, v)y1 = 1 if and only if u = v. We define Ai = γ(ui, vi) for each i ∈ {1, . . . , k}. Clearly I has a solution if and only if x1TAj1Aj2 . . . Ajn y1 = 1 for some j1j2 . . . jn ∈ {1, . . . , k}+, and x1TAj1Aj2 . . . Ajn y1 ≤ 1 anyway. As before, we denote Aj1Aj2 . . . Ajn = Aw, where w = j1j2 . . . jn, and Aε is defined to be the identity matrix. Thus I has a solution if and only if x1TAw y1 > 0 for some w ∈ {1, . . . , k}+ (notice however that x1Ty1 = 1 > 0; we will remove this property later).

Remark 1. Notice that (x1, {A1, . . . , Ak}, y1) is a ZFA with 6 states over an alphabet of k symbols. Hence the problem "is fZ(w) > 0 for some nonempty word w?" is undecidable for integer-weighted automata.

Step 2. (Reducing the number of matrices) We can assume that I is a Claus instance. Since all solutions ui1 . . . uin = vi1 . . . vin of Claus instances have i1 = 1, in = k, and i2 . . . in−1 ∈ {2, . . . , k − 1}+, we can define x2 = (x1TA1)T and y2 = Ak y1, B1 = A2, . . ., Bk−2 = Ak−1 to get another ZFA Z = (x2, {B1, . . ., Bk−2}, y2). Notice that Z has 6 states and is over an alphabet of k − 2 symbols. Moreover, fZ(w) = x2TBw y2 = x1TA1BwAk y1, so fZ(w) > 0 for some word w if and only if I has a solution. Now x2Ty2 ≤ 0, since otherwise x1TA1Ak y1 = 1, which would imply that u1uk = v1vk, contradicting Theorem 2.

Step 3. (Reducing the number of matrices to 2) Let us denote the transition function of the ZFA Z introduced in Step 2 by δ(qi, c, qj) = (Bc)ji for each
i, j ∈ {1, . . . , 6} and c ∈ {1, . . . , k − 2}. To find two matrices C1 and C2 that will encode the essential properties of B1, . . . , Bk−2, we encode the k − 2 input symbols of automaton Z into binary strings and add some extra states adjacent to each state of Z that will decode the binary strings back to symbols in the set {1, . . . , k − 2}. However, there is no need to attach the "decoding states" to the state q6, since according to (3) we have, for each c, δ(q6, c, qi) = 1 if i = 6, and 0 otherwise. We will use an injective morphism ψ : {1, . . . , k − 2}∗ → {1, 2}∗ defined as ψ(i) = 1^{i−1}2 for i < k − 2, and ψ(k − 2) = 1^{k−3}. Now if {q1, . . . , q6} is the state set of automaton Z, we define a new automaton Z′ with states qi,j, where i ∈ {1, . . . , 5} and j ∈ {1, . . . , k − 3}, plus state q6,1, so we have 5(k − 3) + 1 = 5k − 14 states altogether for Z′. We define the new transition function δ′ so that when reading a sequence of 1's, the automaton will move from a state of form qi,j into the state qi,j+1 (the case j = k − 3 is an exception), thus counting how many 1's have been read so far. In all these transitions, a weight of 1 is introduced. When the first 2 or the (k − 3)rd 1 occurs, the automaton moves to a state of form qr,1, introducing the weight corresponding to δ(qi, a, qr) of the original automaton, where a ∈ {1, . . . , k − 2} is the letter whose encoding ψ(a) equals the string 1 . . . 12 (or 1^{k−3}) that was just read. More precisely, the transition function δ′ of the automaton Z′ is defined as (for (i, r) ≠ (6, 6))

δ′(qi,j, 1, qr,s) = δ(qi, k − 2, qr) if j = k − 3 and s = 1,
                   1               if i = r ≤ 5 and j + 1 = s < k − 2,
                   0               otherwise;

δ′(qi,j, 2, qr,s) = δ(qi, j, qr) if s = 1, and 0 otherwise;

δ′(q6,1, c, q6,1) = 1 for c ∈ {1, 2}, and δ′(qi,j, c, qr,s) = 0 for the cases not defined before. See Figure 1 for a graphical representation of the automaton Z′.
Fig. 1. Automaton Z ′ . The weights of the arrows are not shown in the picture.
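For concreteness, here is a sketch of the encoding ψ and of the greedy decoding performed by the extra states of Z′; the function names are our own:

def psi(word, k):
    # psi(i) = 1^(i-1) 2 for i < k-2, and psi(k-2) = 1^(k-3)
    return "".join("1" * (k - 3) if i == k - 2 else "1" * (i - 1) + "2"
                   for i in word)

def decode(s, k):
    # Greedy inverse of psi: emit a letter at each 2, or after k-3
    # consecutive 1's; fail if 1's are left over at the end.
    out, ones = [], 0
    for c in s:
        if c == "2":
            out.append(ones + 1); ones = 0
        else:
            ones += 1
            if ones == k - 3:          # must be psi(k-2)
                out.append(k - 2); ones = 0
    if ones:
        raise ValueError("not in the image of psi")
    return out

assert decode(psi([3, 1, 5, 2], 7), 7) == [3, 1, 5, 2]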
Finally we enumerate all 5k − 14 states qi,j in some way, and define a vector x3 ∈ Z^{5k−14} whose coordinates are all zero, except those corresponding to the states qi,1 (i ∈ {1, . . . , 6}), whose values are chosen to be (x2)i. The vector y3 ∈ Z^{5k−14} is defined analogously. We denote the transition matrices of this new automaton by C1 and C2. The dimension of the matrices is 5k − 14. With these definitions, x3TCψ(w)R y3 = x2TBwR y2 for each w ∈ {1, . . . , k − 2}∗, and x3TCwR y3 ≤ 0 if w ∈ Σ+ is not in the image of ψ. To see that the latter claim holds, we observe that any w ∈ {1, 2}∗ that is not in the image of ψ is of the form w = ψ(w1)r, where r ∈ {1, 11, . . . , 1^{k−4}}. For such a word w, we always have x3TCwR y3 = 1 − (σ(uw′) − σ(vw′))² for some w′ ∈ 1{1, . . . , k − 1}+, and because I is a Claus instance, x3TCwR y3 ≤ 0. Hence I has a solution if and only if there is a w ∈ Σ+ such that x3TCw y3 = 1. Notice again that x3TCw y3 ≤ 1 for each w ∈ Σ∗, so I has a solution if and only if x3TCw y3 > 0 for some w ∈ Σ+.

Step 4. (Changing the initial and final vectors into probability distributions) For i ∈ {1, 2} let

Di = ⎛ 0           0        0 ⎞
     ⎜ Ci y3       Ci       0 ⎟
     ⎝ x3TCi y3    x3TCi    0 ⎠
and notice that DuDv = Duv. Hence with x4 = (0, . . . , 0, 1)T and y4 = (1, 0, . . . , 0)T, we clearly have x4TDw y4 = x3TCw y3 if w ≠ ε, and x4Ty4 = 0. Now each Di is a (5k − 12) × (5k − 12)-matrix, and x4 and y4 are probability distributions. Furthermore, I has a solution if and only if x4TDw y4 > 0 for some w ∈ Σ∗.

Step 5. (Embedding the matrices in stochastic ones, Part 1) This and the following part of the construction are due to P. Turakainen [16]. Define (5k − 10) × (5k − 10)-matrices E1 and E2 by

Ei = ⎛ 0     0      0 ⎞
     ⎜ ti    Di     0 ⎟
     ⎝ si    riT    0 ⎠
where ti, ri, and si are chosen so that the row and the column sums of Ei are zero. Notice that the sums of the coordinates of ti and ri are equal (both equal −∑r,s (Di)rs), hence si is definable. It is easy to verify that the central block of Euv is Duv, and that the column and row sums stay zero when performing the multiplication. Let x5 = (0, x4T, 0)T and y5 = (0, y4T, 0)T. Then x5TEw y5 = x4TDw y4, and hence x5TEw y5 > 0 for some word w ∈ Σ∗ if and only if I has a solution.
Step 6. (Embedding the matrices in stochastic ones, Part 2) Let 1 be the n × n-matrix with all entries 1. Clearly 1² = n1, which implies that 1^i = n^{i−1}1 for i ≥ 1. In the continuation, n will be chosen as n = 5k − 10. Since the row and column sums of each Ew (w ≠ ε) are zero, we have Ew1 = 1Ew = 0 whenever w ≠ ε. Define F1 and F2 by Fi = Ei + c1, where c ∈ N is chosen so large that each entry of F1 and F2 is positive. Then the sum of the entries of Fi in each column (and row) is equal to c(5k − 10), and consequently the matrices

Gi = (1 / (c(5k − 10))) Fi

are (doubly) stochastic. Since Ei1 = 1Ei = 0, we have Fw = Ew + (c1)^{|w|} = Ew + c^{|w|}(5k − 10)^{|w|−1}1 whenever w ≠ ε, which implies that

Gw = (1 / (c(5k − 10))^{|w|}) Ew + (1 / (5k − 10)) 1.

Now letting x6 = x5 and y6 = y5, we get (to compute x6T1y6, recall that x6 and y6 have exactly one coordinate equal to 1 and all other coordinates 0)

x6TGw y6 = (1 / (c(5k − 10))^{|w|}) x5TEw y5 + 1 / (5k − 10).
Hence I has a solution if and only if there is w ∈ Σ∗ such that

x6TGw y6 > 1 / (5k − 10),

and (x6, {G1, G2}, y6) is a (5k − 10)-state PFA P such that L>1/(5k−10)(P) ≠ ∅ if and only if I has a solution.
Remark 2. According to Theorem 2, we conclude that the problem L>λ(P) ≠ ∅? is undecidable for a 5 · 7 − 10 = 25-state PFA over a binary alphabet for λ = 1/25.

Modification: Step 3.5. We can define matrices

Ci′ = ⎛ Ci  0 ⎞
      ⎝ 0   1 ⎠

and x3′ = (x3T, 1)T, y3′ = (y3T, −1)T to notice that x3′TCw′ y3′ = x3TCw y3 − 1. Hence x3′TCw′ y3′ ≥ 0 if and only if I has a solution w ∈ Σ+. Then the construction above gives an automaton P′ with 5k − 9 states such that L≥λ(P′) ≠ ∅ if and only if I has a solution.
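The bordering and padding of Steps 4–6 are mechanical, so we sketch them below, assuming numpy; all names and the interface are our own. Given the matrices Ca and vectors x3, y3 of Step 3, the function returns doubly stochastic matrices Ga with x6TGw y6 = x5TEw y5 / (cn)^{|w|} + 1/n, where n = len(x3) + 4.

import numpy as np

def turakainen(Cs, x3, y3):
    m = len(x3)
    Ds = []
    for C in Cs:                         # Step 4: border C with the vectors
        D = np.zeros((m + 2, m + 2))
        D[1:m+1, 1:m+1] = C
        D[1:m+1, 0] = C @ y3
        D[m+1, 1:m+1] = x3 @ C
        D[m+1, 0] = x3 @ C @ y3
        Ds.append(D)
    Es = []
    for D in Ds:                         # Step 5: make all row/column sums 0
        E = np.zeros((m + 4, m + 4))
        E[1:m+3, 1:m+3] = D
        E[1:m+3, 0] = -D.sum(axis=1)     # t_i fixes the row sums
        E[m+3, 1:m+3] = -D.sum(axis=0)   # r_i^T fixes the column sums
        E[m+3, 0] = D.sum()              # s_i
        Es.append(E)
    n = m + 4                            # Step 6: shift by c*1 and rescale
    c = 1 + max(0.0, *(-E.min() for E in Es))
    Gs = [(E + c) / (c * n) for E in Es]
    return Gs, c

Each Ga is doubly stochastic because the rows and columns of Ea sum to 0, so the rows and columns of Ea + c1 all sum to cn.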
4 Quantum Automata
For quantum automata, we begin the constructions by finding two 2 × 2 unitary complex matrices that generate a free semigroup. These matrices will form the basis of our constructions.

Lemma 1. Let

U1 = (1/5) ⎛ 3  −4 ⎞ ,   U2 = (1/5) ⎛ 3   4i ⎞ ,   and y = (1, 0)T.
           ⎝ 4   3 ⎠                ⎝ 4i  3  ⎠

If

Uc1 · . . . · Ucr y = Ud1 · . . . · Uds y,   (4)

where c1 . . . cr, d1 . . . ds ∈ {1, 2}∗, then r = s and ci = di for each i.

Proof. We say that a product

T1 · . . . · Tr,   (5)

where each Ti ∈ {U1, U1−1, U2, U2−1}, is reduced, if the patterns UiUi−1 and Ui−1Ui do not occur in (5). Following the proof in [14] we will show by induction on r that each reduced product (5), where r > 0, is of the form

(1/5^r) ⎛ ar  ∗ ⎞ ,   (6)
        ⎝ br  ∗ ⎠

where ar, br ∈ Z[i] and ar is not divisible by 5. To start with, the case r = 1 is trivial. Now we assume that the claim holds for reduced products of length at most r and divide the induction step for a reduced product T of length r + 1 into several cases: for ε, ε1, ε2 ∈ {−1, 1}, we have either T = U2^{ε2}U1^{ε1}T′, T = U1^{ε1}U2^{ε2}T′, T = U1^{ε}U1^{ε}T′, or T = U2^{ε}U2^{ε}T′. Multiplication of (6) from the left by U1^{ε1} and U2^{ε2} gives the recurrences

⎛ ar+1 ⎞   ⎛ 3ar − ε1 4br ⎞            ⎛ ar+1 ⎞   ⎛ 3ar + ε2 4ibr ⎞
⎝ br+1 ⎠ = ⎝ ε1 4ar + 3br ⎠    and     ⎝ br+1 ⎠ = ⎝ ε2 4iar + 3br ⎠

respectively, and hence we can find out that in the first case

ar+1 = 3ar + ε2 4ibr = 3ar + ε2 4i(ε1 4ar−1 + 3br−1) = 3ar + ε1ε2 16iar−1 + 12iε2 br−1
     = 3ar + ε1ε2 25iar−1 − ε1ε2 9iar−1 + 12iε2 br−1
     = 3ar + ε1ε2 25iar−1 − ε1ε2 3i(3ar−1 − 4ε1 br−1) = (3 − ε1ε2 3i)ar + ε1ε2 25iar−1.

In the rest of the cases we have ar+1 = (3 + ε1ε2 3i)ar − ε1ε2 25iar−1, ar+1 = 6ar − 25ar−1, and ar+1 = 6ar − 25ar−1, respectively. In all the cases we can use the induction assumption 5 ∤ ar to get 5 ∤ ar+1.

Denoting u = c1 . . . cr and v = d1 . . . ds ∈ Σ∗, we can write equation (4) more compactly as

Uu y = Uv y,   (7)

where |u| = r and |v| = s. If (7) holds for some u ≠ v, we can assume without loss of generality that Uu = U1Uu′ and Uv = U2Uv′. Thus we get

Uv′−1 U2−1 U1 Uu′ y = y,   (8)

where Uv′−1 U2−1 U1 Uu′ is a reduced product of length r + s ≥ 1, and we can write (8) as

⎛ 5^{r+s} ⎞   ⎛ ar+s ⎞
⎝ 0       ⎠ = ⎝ br+s ⎠ .

This contradicts the previously observed fact that 5 ∤ ar+s. Notice that the same contradiction can be obtained also if one of the words u or v is empty. ⊓⊔
Corollary 1. The semigroup generated by the unitary matrices U1 and U2 is free.
Proof. If Uu = Uv, then also Uu y = Uv y, and the previous lemma implies that u = v. ⊓⊔
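The invariant 5 ∤ ar of Lemma 1 can be replayed mechanically. The following sketch (our own encoding: Gaussian integers as (re, im) pairs, and the matrices stored as 5U so that all arithmetic stays exact) checks it for every reduced product up to a small length.

from itertools import product

def gmul(x, y):
    (a, b), (c, d) = x, y
    return (a*c - b*d, a*d + b*c)

def gadd(x, y):
    return (x[0] + y[0], x[1] + y[1])

FIVE_U = {                               # 5*U1, 5*U1^-1, 5*U2, 5*U2^-1
    "1":  [[(3, 0), (-4, 0)], [(4, 0), (3, 0)]],
    "1'": [[(3, 0), (4, 0)], [(-4, 0), (3, 0)]],
    "2":  [[(3, 0), (0, 4)], [(0, 4), (3, 0)]],
    "2'": [[(3, 0), (0, -4)], [(0, -4), (3, 0)]],
}
INV = {"1": "1'", "1'": "1", "2": "2'", "2'": "2"}

def apply(M, v):
    return (gadd(gmul(M[0][0], v[0]), gmul(M[0][1], v[1])),
            gadd(gmul(M[1][0], v[0]), gmul(M[1][1], v[1])))

for r in range(1, 7):
    for word in product(FIVE_U, repeat=r):
        if any(INV[a] == b for a, b in zip(word, word[1:])):
            continue                     # skip non-reduced products
        v = ((1, 0), (0, 0))             # y = (1, 0)^T
        for t in reversed(word):
            v = apply(FIVE_U[t], v)
        (re, im), _ = v                  # v = 5^r * (a_r, b_r)
        assert re % 5 != 0 or im % 5 != 0   # 5 does not divide a_r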
For u, v ∈ Σ∗ we define

γ(u, v) = (1/2) ⎛ Uu + Uv   Uu − Uv ⎞ .   (9)
                ⎝ Uu − Uv   Uu + Uv ⎠
It is a straightforward task to verify that γ(u, v) is a unitary matrix, and that γ(u1, v1)γ(u2, v2) = γ(u1u2, v1v2). Moreover,

γ(u, v)(1, 0, 0, 0)T = (1/2) ⎛ (Uu + Uv)y ⎞ .   (10)
                             ⎝ (Uu − Uv)y ⎠

By Lemma 1, u = v if and only if the two last coordinates of (10) are zero. Hence if we denote y1 = (1, 0, 0, 0)T and

P1 = ⎛ 0 0 0 0 ⎞
     ⎜ 0 0 0 0 ⎟
     ⎜ 0 0 1 0 ⎟
     ⎝ 0 0 0 1 ⎠ ,

then P1 is the projection onto the last two coordinates and ||P1γ(u, v)y1||² = 0 if and only if u = v.

Step 1. (Embedding an instance of PCP in unitary matrices) Let again I = {(u1, v1), . . . , (uk, vk)} be an instance of PCP, with ui and vi over a binary alphabet Σ = {1, 2}. We define Ai = γ(ui, vi) for each i ∈ {1, . . . , k}. Hence I has a solution if and only if there exists w ∈ {1, . . . , k}+ such that ||P1Aw y1||² = 0.
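A numerical check of (9) and (10) — a sketch, assuming numpy; the test words and all names are ours:

import numpy as np

U = {"1": np.array([[3, -4], [4, 3]]) / 5,
     "2": np.array([[3, 4j], [4j, 3]]) / 5}

def U_word(w):
    M = np.eye(2, dtype=complex)
    for c in w:
        M = M @ U[c]
    return M

def gamma_q(u, v):
    Uu, Uv = U_word(u), U_word(v)
    return 0.5 * np.block([[Uu + Uv, Uu - Uv], [Uu - Uv, Uu + Uv]])

P1 = np.diag([0, 0, 1, 1]).astype(complex)
y1 = np.array([1, 0, 0, 0], dtype=complex)

G = gamma_q("12", "21")
assert np.allclose(G.conj().T @ G, np.eye(4))                   # unitary
assert np.linalg.norm(P1 @ gamma_q("12", "12") @ y1) < 1e-12    # u = v
assert np.linalg.norm(P1 @ gamma_q("12", "21") @ y1) > 1e-3     # u != v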
Step 2. (Getting rid of P1y1 = 0 and reducing the number of matrices) We assume that I = {(u1, v1), . . . , (uk, vk)} is a Claus instance, i.e., an instance of PCP such that all solutions ui1 . . . uin = vi1 . . . vin are of the form i1 = 1, in = k, and i2 . . . in−1 ∈ {2, . . . , k − 1}+. We define B1 = A2, . . ., Bk−2 = Ak−1. A new initial vector is defined as y2 = Ak y1, and a new final projection is defined as P2 = A1−1P1A1. Since A1 and Ak are unitary, it is easy to see that ||y2|| = 1 and that P2 is a projection. Since also A1−1 is unitary, we have ||P2Bw y2||² = ||A1−1P1A1BwAk y1||² = ||P1A1BwAk y1||², so ||P2Bw y2||² = 0 if and only if I has a solution. Moreover, ||P2 y2||² = ||P1A1Ak y1||² ≠ 0, since u1uk ≠ v1vk, because we are dealing with a Claus instance.
··· ··· .. .
0 0 .. .
· · · P2
⎞
⎟ ⎟ ⎟. ⎠
318
M. Hirvensalo
C1 and C2 are clearly unitary 4(k − 2) × 4(k − 2)-matrices. Let also y 3 = (y T2 , 0T , . . . , 0T )T . It is easy to verify that C2 C1 C2−1 = diag(B2 , . . . , Bk−2 , B1 ), which implies that C2i C1 C2−i = diag(Bi+1 , . . . , Bk−2 , B1 , . . . , Bi ). Now C2−1 = C2k−3 , so the inverse can be always replaced with a positive power, and hence for any word w ∈ {1, . . . , k − 2}∗ there is a word w′ ∈ Σ ∗ such that Cw′ = diag(Bw , . . .). On the other hand, both C1 and C2 are block matrices with exactly one nonzero block in each row and column. The said property is always inherited to the products formed of C1 and C2 , and hence Cw for any w ∈ Σ ∗ is a block matrix that has at most one nonzero block in each row and column, but any nonzero block in Cw is of the form Bw′ , where w′ ∈ {1, . . . , k − 2}∗ . 2 Hence I has a solution if and only if ||P3 Cw y 3 || = 0 for some w ∈ Σ ∗ . 2 Notice carefully that ||P2 y 2 || = 0 implies that also ||P3 y 3 ||2 = 0. This is a very important feature in this step, since if we would have ||P2 y 2 ||2 = 0 (as would be the case without Step 2), the new automaton would always allow words of the 2 r(k−2) 2 form 2r(k−2) , since P3 C2 y 3 = ||P3 y 3 || for each r ∈ Z.
Step 4. (Setting the threshold) Notice that since I−P3 is a projection orthogonal 2 2 2 to P3 , we have ||Cw y 3 || = ||P3 Cw y 3 || +||(I −P3 )Cw y 3 || , and since ||Cw y 3 || = 1 2 always (each Cw is unitary), we have ||(I − P3 )Cw y 3 || ≤ 1 with equality if and 2 only if I has a solution. Therefore, ||(I − P3 )Cw y 3 || ≥ 1 for some w ∈ Σ ∗ if and only if I has a solution. Let 0 < λ < 1 and define, for each i ∈ Σ, Ci 0 . Di = 0 1 √ √ Let also y 4 = ( λy T3 , 1 − λ)T ∈ R4k−7 , and I − P3 0 P4 = . 0 0 Now D1 and D2 are (4k − 7) × (4k − 7)-matrices, and ||P4 Dw y 4 ||2 = 2 √ 2 2 λ(I − P3 )Cw y 4 = λ(1−||P3 Cw y 3 || ). Thus ||P4 Dw y 4 || ≥ λ for some word w ∈ Σ ∗ if and only if I has a solution. If an automata with defining constants in Q[i] is required, one can choose 9 λ = 25 , for example. From the construction it follows that Q = (P4 , {D1 , D2 }, y4 ) is MO-QFA such that L≥λ (Q) = ∅ if and only if I has a solution. Remark 3. Letting k = 7 we see that the problem studied is undecidable for a 21-state MO-QFA over a binary alphabet. Skipping Step 3 we could as well obtain the undecidability result for a 4-state MO-QFA over a 5-symbol alphabet. Acknowledgement. Thanks to Dr. Vesa Halava for pointing out Theorem 2 and to Dr. Matti Soittola for reviewing earlier versions of this article and pointing
Improved Undecidability Results on the Emptiness Problem
319
out the usefulness of [14] and [15] when proving Lemma 1. Thanks also to the anonymous referee for pointing out that we can in fact save one more state when constructing the probabilistic automaton. In fact, since a vector x3 is 0 0 with nonnegative entries, we could in Step 4 define Di = , x4 = Ci y 3 Ci (0, xT3 )T , and y 4 = (1, 0, . . . , 0)T to get xT4 Dw y 4 = xT3 Cw y 3 . Then we can simply renormalize vector x4 into a probability distribution. Well, in the definition of probabilistic automata it was required that y is a probability distribution and x a vector with entries in {0, 1}, but since the construction eventually leads into doubly stochastic matrices, we can take the transpose of each matrix and swap vectors x and y to satisfy the definition.
References 1. Blondel, V.D. and Canterini, V.: Undecidable Problems for Probabilistic Automata of Fixed Dimension. Theory of Computing Systems 36 (2003) 231–245 2. Blondel, V.D., Jeandel, E., Koiran, P., and Portier, N.: Decidable and Undecidable Problems About Quantum Automata. SIAM Journal of Computing 34 6 (2005) 1464–1473 3. Derksen, H., Jeandel, E., and Koiran, P.: Quantum Automata and Algebraic Groups. Journal of Symbolic Computation 39 (2005) 357–371 4. Claus, V.: Some Remarks on P CP (k) and Related Problems. Bulletin of EATCS 12 (1980) 54–61 5. Eilenberg, S.: Automata, Languages, and Machines Vol. A. Academic Press (1974) 6. Halava, V., Harju, T., and Hirvensalo, M.: Lowering the Undecidability Bounds for Integer Matrices Using Claus Instances of the PCP. TUCS Technical Report 766 (2006) 7. Harju, T. and Karhum¨ aki, J.: Morphisms. In G. Rozenberg and A. Salomaa (eds), Handbook of Formal Languages, Springer (1997) 8. Hirvensalo, M.: Quantum Computing, 2nd edition. Springer (2003) 9. Matiyasevich, Y. and S´enizergues, G.: Decision Problems for Semi-Thue Systems with a Few Rules. Theoretical Computer Science 330 1 (2005) 145–169 10. Moore, C. and Crutchfield, J.P.: Quantum Automata and Quantum Grammars. Theoretical Computer Science 237 (2000) 275–306 11. Paz, A.: Introduction to Probabilistic Automata. Academic Press (1971) 12. Renegar, J.: On the Complexity and Geometry of the First-Order Theory of the Reals. Parts I, II, and III. Journal of Symbolic Computation 13 3 (1992) 255–352 13. Sheng Y.: Regular Languages. In G. Rozenberg and A. Salomaa (eds), Handbook of Formal Languages, Springer (1997) ´ 14. Swierczkowski, S.: On a Free Group of Rotations of the Euclidean Space. Indagationes Mathematicae 20 (1958) 376–378 ´ 15. Swierczkowski, S.: A Class of Free Rotation Groups. Indagationes Mathematicae (N.S.) 5 2 (1994) 221–226 16. Turakainen, P.: Generalized Automata and Stochastic Languages. Proceedings of American Mathematical Society 21 (1969) 303–209
On the (High) Undecidability of Distributed Synthesis Problems
David Janin
LaBRI, Université de Bordeaux I, 351, cours de la libération, F-33 405 Talence cedex, France
[email protected]
Abstract. The distributed synthesis problem [11] is known to be undecidable. Our purpose here is to study this undecidability further. For this, we consider distributed games [8], an infinite variant of Peterson and Reif's multiplayer games with partial information [10], in which Pnueli and Rosner's distributed synthesis problem can be encoded and, when decidable [11,6,7], uniformly solved [8]. We first prove that even the simple problem of solving 2-process distributed games with reachability conditions is undecidable (Σ⁰₁-complete). This decision problem, equivalent to 2-process distributed synthesis with fairly restricted FO-specifications, was left open [8]. We then prove that the safety case is Π⁰₁-complete. More generally, we establish a correspondence between 2-process distributed games with Mostowski's weak parity conditions [9] and levels of the arithmetical hierarchy. Finally, distributed games with general ω-regular infinitary conditions are shown to be highly undecidable (Σ¹₁-complete).
1 Introduction
In this paper, we study the undecidability of the distributed synthesis problem as introduced by Pnueli and Rosner [11]. This problem can be stated as follows: Given a distributed architecture (finitely many sites interconnected through a given network with some specified global input channels and global output channels) and a global specification of the expected correct global behaviors of the architecture (defining a set of mappings that map global input sequences to global output sequences, say in First Order (FO) or even Monadic Second Order (MSO) logic), is there a distributed program (a set of mappings, one per site, that map sequences of local inputs to sequences of local outputs) whose resulting global behavior satisfies the global specification?

With a specification language as simple as FO, on the architecture defined by two independent sites with independent (global) input channels and (global) output channels (see Figure 1), this problem is known to be undecidable [11]. Analyzing Pnueli and Rosner's proof, one can observe that with reachability conditions (FO global specifications essentially stating that some properties eventually occur) the distributed synthesis problem is Σ⁰₁-complete in the
Fig. 1. An undecidable architecture
arithmetical hierarchy, i.e. inter-reducible with the halting problem of Turing machines (TM). With safety conditions (FO global specifications essentially stating that some bad properties never occur, thus allowing infinitary behaviors) one may conjecture (although the two kinds of problems are not dual) that the distributed synthesis problem is Π⁰₁-complete in the arithmetical hierarchy. More generally, one may ask what the relationship is between the expressiveness of the global specification language (say within FO logic, or even within MSO logic with more advanced infinitary conditions) and the complexity of the resulting distributed synthesis problems. In this paper, we give an answer to this question.

Main Results

We first prove the following refinement of Pnueli and Rosner's result:

Theorem 1. The 2-process reachability distributed synthesis problem is Σ⁰₁-complete even with fairly restricted global specifications: universally quantified k-local FO-properties.

Next, we prove that:

Theorem 2. The 2-process safety distributed synthesis problem with FO specification is Π⁰₁-complete.

Since the set of finite state (or even computable) distributed programs is Σ⁰₁-definable, this result also implies that:

Corollary 1. There exist safety distributed synthesis problems with distributed solutions but no computable ones.

We then study relationships between more general infinitary conditions for distributed synthesis problems and levels of the arithmetical (or the analytical) hierarchy. More precisely, we show that:

Theorem 3. For every integer n ≥ 1, solving distributed synthesis problems with two (or more) processes and weak parity conditions (in the sense of Mostowski [9]) of range [0, n + 1] (resp. [1, n]) is Π⁰ₙ-complete (resp. Σ⁰ₙ-complete).

With more complex infinitary conditions:

Theorem 4. The distributed synthesis problem with Büchi infinitary conditions (or more general parity infinitary conditions) is highly undecidable (Σ¹₁-complete).
Related Works

These results are achieved by proving analogous statements for 2-process distributed games [8]. These games are a variant (with infinite behaviors) of Peterson and Reif's multiplayer games with partial information [10] into which distributed synthesis problems can be encoded and, in most decidable cases [11,6,7], uniformly solved [8].

A 2-process distributed game is equivalent to a 2-site distributed synthesis problem with fairly restricted specification. In fact, the global specification in a distributed game is implicitly encoded into (1) the description of the players' possible moves and (2) an infinitary condition specifying the allowed infinite behaviors. In logical terms, moves are specified by universally quantified k-local formulas. With a reachability condition, i.e. when no infinite behavior is allowed, a 2-process distributed game is thus a fairly restricted 2-site distributed synthesis problem.

General 2-site distributed synthesis problems with arbitrary LTL specifications are not known to be reducible to 2-process distributed games [8]. The best known reduction is to add a third process that plays the role of (an automaton translation of) the external (LTL or even MSO) specification [2]. This makes our undecidability result in the reachability case stronger than Pnueli and Rosner's result [11]. In Mohalik and Walukiewicz's work [8], the decidability of 2-process distributed games was also left open. Our results show they are not decidable even with the simplest infinitary conditions.
2 Background
A word on an alphabet A is a function w : ω → A with prefix-closed domain. When dom(w) is finite, we say w is a finite word; otherwise, w is an infinite word. The set of finite words (resp. finite non-empty words) on the alphabet A is written A∗ (resp. A+), the set of infinite words is written Aω, and the set of finite or infinite words is written A∞. The empty word is written ε. The concatenation of every word u ∈ A∗ and every word v ∈ A∞ is written u.v.

Given two sets A and B, we write πA or π1 (resp. πB or π2) for the projection from A × B onto A (resp. from A × B onto B). These notations are extended to any subset of A × B and to words on A × B. Given P ⊆ A × B (resp. w ∈ (A × B)∞), we also write P[1] = πA(P) and P[2] = πB(P) (resp. w[1] = πA(w) and w[2] = πB(w)). Given w ∈ (A + B)∞, we also write πA(w) (resp. πB(w)) for the word obtained from w by deleting all letters not in A (resp. not in B).

In the sequel, we will use languages of infinite words as infinitary acceptance conditions. We first review the definitions we will use.

Definition 1 (Parity and weak parity condition [9]). Let L ⊆ Aω be a language of infinite words. Language L is called a parity condition when there are some integers m and n ∈ IN with m ≤ n and some priority mapping Ω : A → [m, n] such that L = {w ∈ Aω : lim inf Ω(w) ≡ 0 (mod 2)}, i.e. L is the set of infinite sequences where the least priority that occurs infinitely often is
even. Language L is a weak parity condition when there is a priority mapping Ω : A → [m, n] as above such that L is moreover restricted to sequences w ∈ Aω for which Ω(w) is an increasing sequence of priorities. In both cases, the interval [m, n] is called the range of the parity condition. A safety condition is a parity (or weak parity) condition with range [0] (hence with L = Aω) and a reachability condition is a parity (or weak parity) condition with range [1] (hence with L = ∅). Distributed games [8] are a special kind of multiplayer games with partial information [10] extended to infinite plays, with a cooperating Process team playing against a unique Environment player.

Definition 2 (Distributed Game Arenas). A one-Process (two-Player) game arena is a quadruple G = ⟨P, E, T, e⟩ with set of Process positions P, set of Environment positions E, set of possible transition moves T ⊆ P × E ∪ E × P and initial position e ∈ E. Given n one-Process game arenas Gi = ⟨Pi, Ei, Ti, ei⟩ for i ∈ [1, n], a synchronous distributed game arena G built from the local game arenas G1, ..., Gn is a game arena G = ⟨P, E, T, e⟩ with P = ∏i Pi, E = ∏i Ei, e = (e1, ..., en) and such that the set of moves T satisfies the following conditions: for every u ∈ P and v ∈ E,
– P-moves: (u, v) ∈ T if and only if for every i ∈ [1, n], (u[i], v[i]) ∈ Ti,
– E-moves: if (v, u) ∈ T then for every i ∈ [1, n], (v[i], u[i]) ∈ Ti.
Observe that there is a unique distributed game arena built from the local arenas G1, ..., Gn with a maximal set of Environment moves. This arena, written G1 ⊗ ··· ⊗ Gn, is called the free synchronous product of the games G1, ..., Gn. Observe that any other distributed arena G built from the same local games can be seen as a subgame of the free product obtained by possibly disallowing some Environment moves. We simply write G ⊆ G1 ⊗ ··· ⊗ Gn to denote this. In [8], a more general notion of distributed games with asynchronous moves is defined. The additional expressiveness gained with asynchronism is not used in this paper. Since we essentially establish lower bound results, this fact makes our statements even stronger.

Definition 3 (Plays and strategies). Given a two-player game arena G = ⟨P, E, T, e⟩, a strategy for the Process player (resp. a strategy for the Environment player) is a mapping σ : P⁺ → E (resp. a mapping τ : E⁺ → P). The play induced by strategies σ and τ, written σ ∗ τ, is defined to be the maximal word w ∈ (P + E)∞ such that w(0) = e and, for every i ∈ dom(w) with i > 0, (w(i − 1), w(i)) ∈ T and, given w′ = w(0) ··· w(i − 1), if w(i − 1) ∈ P then w(i) = σ ◦ πP(w′) and if w(i − 1) ∈ E then w(i) = τ ◦ πE(w′). A Process strategy σ is non-blocking when, for every counter strategy τ, σ ∗ τ ∈ (P + E)*.E ∪ (P + E)ω. Given an n-process game arena G ⊆ G1 ⊗ ··· ⊗ Gn, a Process strategy σ : P⁺ → E is a distributed strategy when there is a set of local process strategies {σi : Pi⁺ → Ei}i∈[1,n] such that, for every word w ∈ P⁺, σ(w) = (σ1 ◦ πP1(w), ..., σn ◦ πPn(w)).
In other words, a Process strategy is distributed when every local Process player plays following only its own local view of the global play.

Definition 4 (Games and winning strategies). A (weak or parity) game is a tuple G = ⟨P, E, T, e, Acc⟩ where ⟨P, E, T, e⟩ is a game arena and Acc ⊆ (P + E)ω is an additional (weak or parity) infinitary condition. A game G (resp. a distributed game) is winning for player P (resp. for the Process team) when there is a Process strategy (resp. a distributed Process strategy) σ : P⁺ → E such that, for every counter strategy τ : E⁺ → P, σ ∗ τ ∈ (P + E)*.E ∪ Acc, i.e. every maximal play allowed by σ is either finite and ends in a position of player E, or is infinite and belongs to Acc.
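As an illustration of Definition 2, the following sketch builds the free synchronous product of local arenas. It is not part of the original paper; all names and the encoding of arenas as tuples are ours.

```python
from itertools import product

# Illustrative sketch of Definition 2 (ours). A local arena is a tuple
# (P, E, T, e) with T a set of moves (pairs of positions).
def free_synchronous_product(arenas):
    """Build the free synchronous product G1 (x) ... (x) Gn: global
    P-moves are exactly the tuples of local P-moves, and the Environment
    receives the maximal set of moves compatible with the E-move rule."""
    Ps, Es, Ts, es = zip(*arenas)
    P = list(product(*Ps))
    E = list(product(*Es))
    n = len(arenas)
    T = set()
    for u in P:
        for v in E:
            if all((u[i], v[i]) in Ts[i] for i in range(n)):
                T.add((u, v))   # P-move, forced by the definition
            if all((v[i], u[i]) in Ts[i] for i in range(n)):
                T.add((v, u))   # E-move, maximal set allowed
    return P, E, T, tuple(es)

# Two toy one-Process arenas; the initial global position is ('e1', 'e2').
G1 = ({'p1'}, {'e1'}, {('p1', 'e1'), ('e1', 'p1')}, 'e1')
G2 = ({'p2'}, {'e2'}, {('p2', 'e2'), ('e2', 'p2')}, 'e2')
print(free_synchronous_product([G1, G2])[3])
```

Any other distributed arena is obtained by removing some of the E-moves produced here, as noted above.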
3 Tilings and Quasi-tilings
In order to prove lower bound results in the next section, we review in this section the notions of finite and infinite tilings [1,5].

Definition 5 (Tilings). Let {n, s, w, e} be the four cardinal directions of the plane. Given a finite set of colors C with a distinguished color # called the border color, a tile is a mapping t : {n, s, w, e} → C that assigns a color of C to each cardinal direction, with the additional requirement that t(s) ≠ # and t(w) ≠ #, i.e. color # will only be used to define East or North borders.
Fig. 2. A finite set of tiles
Given a finite set S of tiles (see Figure 2), a tiling is a partial function m : ω × ω → S such that dom(m) = [0, M − 1] × [0, N − 1] for some (M, N) ∈ ω × ω when m is a finite tiling, or dom(m) = ω × ω when m is an infinite tiling, satisfying, for all (i, j) ∈ dom(m):
– N/S-compatibility: if (i, j + 1) ∈ dom(m) then m(i, j)(n) = m(i, j + 1)(s),
– E/W-compatibility: if (i + 1, j) ∈ dom(m) then m(i, j)(e) = m(i + 1, j)(w),
– E-border condition: (i + 1, j) ∉ dom(m) if and only if m(i, j)(e) = #, and
– N-border condition: (i, j + 1) ∉ dom(m) if and only if m(i, j)(n) = #
(see Figure 3, with the color black standing for the border color #).
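A direct check of these four conditions on a finite tiling can be written as follows (our sketch, not from the paper; tiles are encoded as dicts over the four directions):

```python
# Illustrative checker for Definition 5 (ours): m is a dict {(i, j): tile},
# a tile is a dict over 'n', 's', 'w', 'e'; '#' is the border color.
def is_tiling(m):
    for (i, j), t in m.items():
        if ((i, j + 1) in m) != (t['n'] != '#'):    # N-border condition
            return False
        if ((i + 1, j) in m) != (t['e'] != '#'):    # E-border condition
            return False
        if (i, j + 1) in m and t['n'] != m[(i, j + 1)]['s']:
            return False                             # N/S-compatibility
        if (i + 1, j) in m and t['e'] != m[(i + 1, j)]['w']:
            return False                             # E/W-compatibility
    return True

# A 1x1 tiling: a single tile whose north and east colors are borders.
t0 = {'n': '#', 's': 'a', 'w': 'a', 'e': '#'}
print(is_tiling({(0, 0): t0}))   # True
```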
Fig. 3. A tiling
Theorem 5 (Berger [1], Harel [5]). Given a set of colors C and a set S of tiles with a distinguished tile t0 ∈ S: (1) the problem of finding M and N and a finite M × N-tiling m such that m(0, 0) = t0 is Σ⁰₁-complete, (2) the problem
of finding an infinite tiling m such that m(0, 0) = t0 is Π⁰₁-complete, and (3) the problem of finding an infinite tiling m such that m(0, 0) = t0 and one given color, say red, occurs infinitely often is Σ¹₁-complete.
4 Towards the Proofs: Quasi-tilings
The notion of quasi-tiling defined below, encoded into a one-process game, is essential for our encoding (in the remaining sections) of tilings into 2-process distributed games.

Definition 6 (Quasi-tilings). A function m : ω × ω → S is a quasi-tiling (see Figure 4) when it satisfies N/S-compatibility and the N-border condition on every column, the E-border condition on every line, and E/W-compatibility on the first line.
Fig. 4. A quasi-tiling
It turns out that, for every finite set of tiles S and initial tile t0 ∈ S, there exists a one-process (two-player) game GS,t0 that encodes all quasi-tilings m : ω × ω → S as non-blocking strategies for player P.

Definition 7 (Quasi-tiling games). Given a finite set of colors C, a finite set of C-colored tiles S and an initial tile t0, let GS,t0 = ⟨P, E, T, i⟩ be the two-player game arena defined by:
– P = ({e, n} × S × {Proc}) ∪ {⊥}, E = ({e, n} × S × {Env}) ∪ {∗} and i = ∗,
– T is the set of all pairs of the form ((d, t, Proc), (d, t′, Env)) ∈ P × E such that, if d = e then t′(w) = t(e), and if d = n then t′(s) = t(n) and t′(e) = # if and only if t(e) = # (Process moves), plus the set of all pairs of the form (∗, (x, t0, Proc)) or ((x, t, Env), ⊥), plus all pairs of the form ((d, t, Env), (d′, t, Proc)) ∈ E × P such that, if d = e then d′ ∈ {e, n}, and if d = n or t(e) = # then d′ = n (Environment moves).
The intuition behind this definition is that player E chooses directions along a word of the form e^i.n^ω and, for every prefix e^i.n^j of this word, player P answers by choosing a tile for position (i, j). Since player E chooses where to turn, the full set ω × ω is potentially covered. It turns out, as precisely stated in the next lemma, that player P's non-blocking strategies define exactly the quasi-tilings.

Lemma 1 (Quasi-tilings and strategies). For every non-blocking strategy σ : P⁺ → E in game GS,t0, there is a unique quasi-tiling mσ : ω × ω → S such
that, for all (i, j) ∈ ω × ω, (i, j) ∈ dom(mσ) if and only if there is a counter strategy τ : E⁺ → P such that π1 ◦ πP(σ ∗ τ) = e^i.n^j and π2(πP(σ ∗ τ)(i + j)) = mσ(i, j) (with, in particular, mσ(0, 0) = t0). Conversely, for every quasi-tiling m such that m(0, 0) = t0 there is a non-blocking strategy σm in game GS,t0 such that mσm = m.

Proof. By construction, in every play, player E's task is to choose, at every step, a direction e or n and, once direction n has been chosen or the East border has been reached, to choose repeatedly direction n. It follows that every (blocking) strategy for player E that avoids position ⊥ can be described by (1) choosing some (i, j) ∈ ω × ω and (2) playing the successive directions described by the word e^i.n^j, provided player P does not create the East border. Against player E, player P's strategy just amounts to choosing, for every (i′, j′) ≤ (i, j), a tile ti′,j′. It should be clear that this choice is independent of (i, j) (chosen by player E but unknown to player P), so we can define mσ(i′, j′) = ti′,j′. The fact that mσ is a quasi-tiling follows immediately from the definition of game GS,t0. The converse property is also immediate. Observe that, in game GS,t0, player P chooses to define a tiling bounded in the East direction by choosing the first tile t such that t(e) = #.
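The Process moves of Definition 7 can be spelled out directly; the following sketch (ours, with the same dict encoding of tiles as before) enumerates the legal answers of player P after a move in direction d onto tile t:

```python
# Sketch (ours) of the Process moves of Definition 7: from a Process
# position (d, t, 'Proc') the Process may answer (d, t2, 'Env') whenever
# tile t2 is compatible with the previously played tile t.
def process_successors(d, t, tiles):
    succ = []
    for t2 in tiles:
        if d == 'e' and t2['w'] == t['e']:
            succ.append(('e', t2, 'Env'))   # going east: W matches E
        if d == 'n' and t2['s'] == t['n'] and \
           (t2['e'] == '#') == (t['e'] == '#'):
            succ.append(('n', t2, 'Env'))   # going north: S matches N,
                                            # and the E-border is preserved
    return succ
```

A non-blocking Process strategy must always have at least one such successor, which is exactly the compatibility requirement of a quasi-tiling along the path e^i.n^j.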
5 Undecidability Results: Ground Cases
Theorem 6 (Safety case). The problem of finding a winning distributed strategy in a 2-process distributed game with safety condition is Π⁰₁-complete.

Proof. Clearly, solving a safety distributed game is in Π⁰₁. It remains to prove that it is also Π⁰₁-hard. In order to do so, we encode the infinite tiling problem into a safety distributed game. Let S be a finite set of tiles and let t0 ∈ S be a given initial tile. The idea is to build a distributed game G from the free product GS,t0 ⊗ GS,t0 with a safety condition in such a way that player E checks that (1) if a distributed strategy σ1 ⊗ σ2 is non-blocking then σ1 = σ2, and (2) a distributed strategy of the form σ ⊗ σ is winning if and only if the quasi-tiling mσ : ω × ω → S is also a tiling of ω × ω, i.e. it satisfies the E/W-compatibility condition on all lines. This is done as follows. We first assume, without loss of generality, that every position in game GS,t0 is (1) extended with a new bit value that is set by the Environment player's first move as explained below and (2) also extended in such a way that the last two tiles t (current) and t′ (previous) chosen by the Process player are readable in Environment positions. Environment moves in the product GS,t0 ⊗ GS,t0 are then defined as follows. From the initial position (∗, ∗), player E moves to a position with an arbitrary pair of bit values (one on each side), and, according to these values (which remain unchanged later on):
1. with bit values (0, 0), (0, 1) or (1, 0): player E plays the same directions in both local games and checks equality of the Process strategies,
2. with bit values (1, 1): player E delays by one step the North turn in the second local game and, after this turn, repeatedly checks that t′1(e) = t2(w), where t′1 is the "previous" choice of tiles made by player P1 and t2 is the "current" choice of tiles made by player P2.
Player E moves to (⊥, ⊥) if any of these checks fails or if any of the Process players chooses a tile that contains the border color #. The winning condition for the Process team is to avoid position (⊥, ⊥). This is a safety condition. Let then σ1 ⊗ σ2 be a distributed winning strategy in such a game. By checking equality with bit values (0, 0), (0, 1) or (1, 0), Environment makes sure that the Process players play the same strategy σ = σ1 = σ2 regardless of the initial bit values chosen. Given then the induced quasi-tiling mσ (see Lemma 1), one can check that, when the bit values are (1, 1), Environment does indeed check E/W-compatibility. It follows that mσ is a tiling, infinite by the safety condition. Conversely, for any infinite tiling m such that m(0, 0) = t0, one can check that σm ⊗ σm is a winning distributed strategy. We conclude by applying Theorem 5.

Theorem 7 (Reachability case). The problem of finding a winning distributed strategy in a 2-process distributed game with reachability condition is Σ⁰₁-complete.

Proof. Again, this problem is clearly in Σ⁰₁. It remains to prove that it is Σ⁰₁-hard. In order to do so, we encode the finite tiling problem into reachability distributed games. The encoding is similar to the one in the proof of Theorem 6, except that (1) player E now allows players P1 and P2 to play tiles that contain the border color # and (2) the winning condition for the Process team is to reach, at the end of every local play, a tile t with t(n) = #. Observing that player P will force the East border by playing a tile t with t(e) = # makes it clear that there is a winning distributed strategy in the new (reachability) distributed game G if and only if there is a finite tiling m such that m(0, 0) = t0.
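The one-step delay with bit values (1, 1) means that the tiles compared by player E are exactly horizontally adjacent positions of the induced quasi-tiling. In other words, over all (1, 1) runs, the Environment verifies the following predicate (a sketch of ours, with the same dict encoding as before):

```python
# Sketch (ours) of what the delayed (1, 1) runs verify overall: the
# induced quasi-tiling m satisfies E/W-compatibility on every line, i.e.
# it is a real tiling.
def ew_compatible(m):
    return all(m[(i, j)]['e'] == m[(i + 1, j)]['w']
               for (i, j) in m if (i + 1, j) in m)
```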
6 Within and Above the Arithmetical Hierarchy
The relationship with the arithmetical hierarchy is obtained through the observation that, by Post's Theorem, every level of the arithmetical hierarchy has a computational interpretation by means of Alternating Turing Machines [3] extended with infinite runs and weak parity acceptance conditions.

Theorem 8. For every integer n > 0, a language L ⊆ Σ* is Π⁰ₙ-definable (resp. Σ⁰ₙ-definable) if and only if it is definable by an Alternating Turing Machine with an infinitary weak parity condition of range [0, n − 1] (resp. [1, n]).
Proof. By ATM we mean here ATMs with universal and existential states only (no negation states). ATMs are extended with infinite runs (with infinitary conditions) in the following sense: a computation tree of an ATM is accepting if every finite branch ends in an accepting control state and, for every infinite branch, the corresponding infinite sequence of control states satisfies the infinitary condition. Applying [3], we know that standard ATMs (with reachability conditions) capture the level Σ⁰₁ of the arithmetical hierarchy. By duality, ATMs with safety conditions (hence infinite runs) capture the level Π⁰₁. For higher levels, the proof is based on the observation that alternation allows a machine to (1) guess the answer of an oracle and, at the same time, (2) start a computation of the oracle (or of its complement) that checks that the guess is correct. By construction, since no acknowledgment is expected, the resulting infinitary conditions are weak in the sense of Mostowski [9]. Post's Theorem ensures that such a construction captures, level by level, the arithmetical hierarchy.

Theorem 9 (The weak case). For every integer n > 0, the problem of solving a 2-process distributed weak game with Mostowski range [0, n − 1] (resp. [1, n]) is Π⁰ₙ-complete (resp. Σ⁰ₙ-complete).

Proof (sketch). The upper bounds should be clear. It remains to prove the lower bounds. The main idea is to encode the (accepting) runs of ATMs into (winning) distributed strategies. This can be achieved as follows. At first sight, the tiling encoding defined in the previous section fails to apply here, since a tiling only encodes the run of a non-alternating TM (say with TM configurations encoded by means of the east colors of the tiles in a line). However, in this encoding, somewhat as in a solitaire domino game, the Process team defines (playing identically in copies of the local game GS,t0) one tiling (equivalently, one accepting TM run), while player E's role is bound to checking that all required space is covered and all tiling rules are satisfied (equivalently, it checks that the Process team indeed defines a TM run). The idea for encoding the run of an ATM is thus to let player E choose some of the tiles, say one out of two in a line, in a modified local game G̃S,t0. In this shift from a solitaire to a two-player domino-like game, all branches of an ATM run are encoded by the many tilings that are produced following player E's moves. An analogous synchronization (restriction of player E's global moves) in a distributed game G ⊆ G̃S,t0 ⊗ G̃S,t0 can force both Environment and Process players to play only real tilings (and not quasi-tilings). As the infinitary condition of the ATM immediately transfers to an infinitary condition of the distributed game, this concludes the proof.

Theorem 10 (The Büchi case). The problem of solving a 2-process (or more) distributed game with a Büchi condition (or a higher parity condition) is Σ¹₁-complete.

Proof. It should be clear that solving an n-process distributed game with an arbitrary ω-regular infinitary condition is in Σ¹₁. Conversely, we encode the
construction of an infinite tiling with infinitely many occurrences of the color red (see Theorem 5). Starting from the encoding of the infinite tiling problem in the proof of Theorem 6, the idea is to add in the local game GS,t0 a non-deterministic tree automaton [12,4] that checks that, given a local strategy σ followed by player P, the induced quasi-tiling mσ (seen as a subtree of the binary tree t : (e + n)* → S) uses infinitely many tiles with color red. Such an automaton can be defined with a Büchi acceptance criterion that, in turn, defines the winning condition for the Process team.
7 Conclusion
We have established a correspondence between infinitary conditions in distributed games and levels of the arithmetical (or analytical) hierarchy. These results already hold for the 2-process case (implying undecidability in this very restricted setting). Strictly speaking, they have no direct application. However, a clear understanding of the source of undecidability may help, in future work, to extend the known decidable classes of distributed synthesis problems (or distributed games).

Acknowledgment. Thanks to Anne Dicky for her help in revising a former version of this paper.
References
1. Berger, R.: The Undecidability of the Domino Problem. Memoirs of the American Mathematical Society 66 (1966) 1–72
2. Bernet, J. and Janin, D.: Tree Automata and Discrete Distributed Games. In Fundamentals of Computation Theory, Springer-Verlag, LNCS 3623 (2005) 540–551
3. Chandra, A.K., Kozen, D.C., and Stockmeyer, L.J.: Alternation. Journal of the ACM 28 1 (1981) 114–133
4. Grädel, E., Thomas, W., and Wilke, T. (eds.): Automata, Logics, and Infinite Games. Springer, LNCS Tutorial 2500 (2002)
5. Harel, D.: Effective Transformations on Infinite Trees, with Applications to High Undecidability. J. ACM 33 1 (1986) 224–248
6. Kupferman, O. and Vardi, M.Y.: Synthesizing Distributed Systems. In IEEE Symp. on Logic in Computer Science (LICS) (2001) 389–398
7. Madhusudan, P.: Control and Synthesis of Open Reactive Systems. PhD Thesis, University of Madras (2001)
8. Mohalik, S. and Walukiewicz, I.: Distributed Games. In Found. of Softw. Tech. and Theor. Comp. Science, Springer-Verlag, LNCS 2914 (2003) 338–351
9. Mostowski, A.W.: Hierarchies of Weak Automata and Weak Monadic Formulas. Theoretical Comp. Science 83 (1991) 323–335
10. Peterson, G.L. and Reif, J.H.: Multiple-Person Alternation. In 20th Annual IEEE Symposium on Foundations of Computer Science (October 1979) 348–363
11. Pnueli, A. and Rosner, R.: Distributed Reactive Systems are Hard to Synthesize. In IEEE Symposium on Foundations of Computer Science (1990) 746–757
12. Rabin, M.O.: Decidability of Second Order Theories and Automata on Infinite Trees. Trans. Amer. Math. Soc. 141 (1969) 1–35
Maximum Rigid Components as Means for Direction-Based Localization in Sensor Networks⋆
Bastian Katz, Marco Gaertler, and Dorothea Wagner
Faculty of Informatics, Universität Karlsruhe (TH), Germany
{katz,gaertler,wagner}@informatik.uni-karlsruhe.de
Abstract. Many applications in sensor networks require positional information of the sensors. Recovering node positions is closely related to graph realization problems for geometric graphs. Here, we address the case where nodes have angular information. Whereas Bruck et al. proved that the corresponding realization problem together with unit-disk-graph constraints is NP-hard [2], we focus on rigid components, which allow both efficient identification and fast, unique realizations. Our technique allows us to identify maximum rigid components in graphs with partially known rigid components using a reduction to maximum flow problems. This approach is analyzed for the two-dimensional case, but can easily be extended to higher dimensions.
1 Introduction
A common field of application for sensor networks is monitoring, surveillance, and general data-gathering [10]. Positional information is a key requirement for these applications as well as for other network services such as geographic routing. Where positioning systems like GPS are not available, node positions have to be recovered from the network structure together with a communication model – like the unit disk graph (UDG) or quasi unit disk graph (qUDG) models [7] – and possibly additional information such as distances or directions between communicating nodes. This obviously corresponds to graph realization problems, which target the existence and uniqueness of graph embeddings. Traditionally, distance-based localization is fairly widespread, although there is no tight characterization of how much connectivity is needed for uniqueness of realization [3], and the realization problem is known to be NP-hard for general graphs and (q)UDGs [9,1]. For direction-constrained graph realization, things become easier: albeit the corresponding realization problem for qUDGs is NP-hard [2], it can be reduced to an LP for general graphs. Rigidity theory provides a characterization of subgraphs whose realizations are uniquely determined: uniqueness of a
⋆ This work was partially supported by the German Research Foundation (DFG) within the Research Training Group GRK 1194 "Self-organizing Sensor-Actuator Networks" and under grant WA 654/14-3, and by the EU under grant DELIS (contract no. 001907).
graph's realizability with edge directions coincides with the notion of rigid components [11]. For these rigid components of a network, localization with given communication directions loses most of its hardness: the localization problem reduces to a system of linear equations for these subgraphs [6]. There are some easy techniques to find small rigid structures in a network that work well especially in geometric graphs. It is more challenging to find a partition of a network into maximum such components. To our knowledge, no algorithm exists that exploits the fact that small rigid substructures (or bodies) are easy to compute. Moukarzel [8] proposed an algorithm for the identification of rigid structures in so-called body-bar frameworks, where rigid bodies are connected by (multiple) edges. This approach, like ours, is based on earlier work of Hendrickson [3], who developed efficient algorithms for rigidity testing in the plane, later known as the pebble game [4]. While the original work by Hendrickson cannot take any advantage of rigid subgraphs that are known or easy to get, Moukarzel's approach focuses on a very special case requiring the graph to have certain structural properties. We provide an algorithm that works on general graphs and takes full advantage of known rigid substructures. This paper is organized as follows: in Section 2, we recall some basic notions from rigidity theory and outline some intuitive steps to find small rigid components. In Section 3, we develop a characterization of rigid subgraphs that together form larger rigid components. This leads to an algorithm to efficiently identify maximum rigid subgraphs, which is given in Section 4. We give a short explanation of how to extend this to the layout or localization problem in Section 5 and an outlook in Section 6.
2 Preliminaries
Throughout this paper, we model the network topology as an undirected graph G = (V, E) with a given embedding p : V → R². We sometimes refer to an edge {v, u} as (v, u), assigning an arbitrary orientation. We do not assume this graph to have any specific properties except for a bounded degree¹ and connectivity. To recover the true embedding p, we suppose we are given the directions of all edges as

∀(u, v) ∈ E : αp(u, v) := (p(v) − p(u)) / |p(v) − p(u)| .

Even with fixed edge directions, a graph can still have several embeddings that respect these constraints. Embeddings that yield the same edge directions are called parallel embeddings. No set of constraints can determine a graph's embedding unambiguously in the sense that there are no parallel embeddings at all: for every c > 0 and x ∈ R², the embedding p′ given by p′(v) := c·p(v) + x yields the same direction constraints as p, regardless of the edges involved. We call embeddings that differ only by translation and scaling similar, and say
¹ Which will only become important for the runtime analysis.
that a graph's embedding is uniquely determined if all embeddings that have the same edge directions are similar. Whereas finding realizations for direction-constrained graphs in general leads to an LP [2], the problem can be reduced to a system of linear equations for graphs that are guaranteed to have a unique realization [6]. Fortunately, there is a simple characterization of such graphs, at least if the underlying embedding is in general position. Graphs that allow only similar parallel embeddings for embeddings in general position are called generically parallel rigid or, here for short, parallel rigid.

Theorem 1 (Laman's theorem for parallel embeddings [11]). A graph G = (V, E) is generically parallel rigid if and only if it contains a set of edges E′ ⊆ E with |E′| = 2|V| − 3 such that for all subsets E″ ⊆ E′

|E″| ≤ 2|V(E″)| − 3 .   (1)

We call a set of edges E′ independent if it does not contain a subset that violates (1)². Note that Theorem 1 also implies that a graph G = (V, E) with a sufficient number of edges (|E| ≥ 2|V| − 3) must either be rigid or have a subgraph S = (VS, ES) ⊂ G with |ES| > 2|VS| − 3 (see Figure 1).
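For small instances, the independence condition of Theorem 1 can be checked by brute force over all edge subsets. The following sketch is ours and is exponential; the paper's algorithm avoids exactly this blow-up.

```python
from itertools import combinations

# Brute-force check (ours) of the condition in Theorem 1: a set of edges
# E' is independent iff every non-empty subset E'' satisfies
# |E''| <= 2|V(E'')| - 3. Exponential; for tiny examples only.
def is_independent(edges):
    for k in range(1, len(edges) + 1):
        for sub in combinations(edges, k):
            verts = {v for e in sub for v in e}
            if len(sub) > 2 * len(verts) - 3:
                return False
    return True

# K4 has 6 > 2*4 - 3 edges and is therefore dependent (cf. Figure 1).
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
print(is_independent(k4))   # False
```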
Fig. 1. Two graphs with |E| = 2|V| − 3. The left graph has a subgraph with |E′| > 2|V(E′)| − 3, a K4, and thus admits a parallel drawing (resize the right triangle by stretching the dashed edges). The right graph is parallel rigid.
Apparently, parallel rigidity coincides in the plane with "standard" rigidity, i.e., the property of a graph not to allow continuous deformations without changing edge lengths [11], which is necessary, but by far not sufficient, for unambiguous distance-based realization [3]. In three dimensions, however, the advantages of edge directions over lengths for our purposes become even clearer: here we lack a combinatorial characterization of standard rigidity, whereas the theory of parallel drawings can easily be extended to any dimension: Theorem 1 holds analogously for embeddings p : V → Rᵈ by replacing (1) by (d − 1)|E″| ≤ d|V(E″)| − (d + 1) [11]. As a consequence, the following approach works similarly for the three-dimensional case. We start with the observation that some rigid subgraphs are easily found: first, an edge certainly is a rigid subgraph; second, a rigid subgraph can
² This notion of independence refers to the matroidal character; similarly, a connected graph could be characterized as a graph containing a spanning tree, i.e., a set E′ ⊆ E of |E′| = |V| − 1 edges such that no subset E″ ⊆ E′ is a cycle, i.e., |E″| ≤ |V(E″)| − 1.
Fig. 2. Rigid subgraphs can be found via triangulation (A). If two rigid subgraphs (grey) share an edge, the union is also rigid (B). The same holds if two rigid subgraphs share two nodes (C). These approaches still leave more complex configurations where three (D) or more bodies form a larger rigid body (E).
be augmented by adding a node adjacent to two nodes of that subgraph (triangulation); and third, two rigid subgraphs that overlap in at least two nodes together form a rigid subgraph. Obviously, these techniques are best suited for geometric graphs with high locality, where connected nodes have a high probability of sharing neighbors. Figure 2 (A)–(C) shows structures where these means suffice to find out that the graph is rigid; (D) and (E) show constellations where three or more bodies together form a rigid component. In short, there are some quite easy techniques for the identification of rigid substructures, although they do not end with maximum such components; on the other hand, one expects them to speed up the identification of maximum rigid subgraphs. A greedy sketch of these easy techniques is given below.
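The following Python sketch (ours; components are kept as node sets, and maximality is not guaranteed, cf. Figure 2 (D)–(E)) illustrates the greedy combination of triangulation and two-node-overlap merging:

```python
# Greedy sketch (ours): start from single edges, repeatedly (a) absorb a
# node adjacent to at least two nodes of a component (triangulation) and
# (b) merge components that share at least two nodes.
def grow_rigid_components(nodes, edges):
    comps = [set(e) for e in edges]            # every edge is rigid
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    changed = True
    while changed:
        changed = False
        for c in comps:                        # (a) triangulation
            for v in nodes:
                if v not in c and len(adj[v] & c) >= 2:
                    c.add(v)
                    changed = True
        merged = True
        while merged:                          # (b) two-node overlap
            merged = False
            for i in range(len(comps)):
                for j in range(i + 1, len(comps)):
                    if len(comps[i] & comps[j]) >= 2:
                        comps[i] |= comps[j]
                        del comps[j]
                        merged = changed = True
                        break
                if merged:
                    break
    return comps
```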
3 Maximum Rigid Components
Knowing rigid substructures can indeed help to identify the maximum rigid components much faster. In this section, we present an algorithm to obtain maximum rigid components from a given graph G which is already partitioned into rigid components, for example using edge- and node-overlappings. By a partition, we here refer to a set of rigid subgraphs that cover all of G's edges disjointly but may have common nodes. We call this a Laman partition (see Figure 3):

Definition 1 (Laman partitions). Let G = (V, E) be a simple undirected graph and S be a set of pairwise edge-disjoint, generically rigid subgraphs.
1. The partition graph G(S) := (V(S), E(S)) is defined as the union of the rigid component graphs, i.e.,

V(S) := ⋃_{(V,E)∈S} V ,   E(S) := ⋃_{(V,E)∈S} E .
The set S is also called a Laman partition (of G(S)). It is rigid, if G(S) is rigid and it is independent, if there is no S ′ ⊂ S which is rigid. 2. The redundancy of a node v ∈ V is defined as rdS (v) := | {(V, E) ∈ S | v ∈ V } | − 1 . The notion is extended to rigid partitions by rd (S) := v∈V (S) rdS (v). We denote the redundantly used nodes as R(S) := {v ∈ V (S) | rdS (v) > 0}. 3. The surplus of edges in a graph H = (V ′ , E ′ ) with respect to Laman’s theorem is denoted by sp (H) := |E ′ | − 2|V ′ | + 3. We will also write sp (S) for sp (H(S)). Note that a graph H has at most E − sp (H) independent edges. The simplest Laman partition of a graph is a partition into |E| graphs which all consist of exactly one edge. Although this will work as well, in many scenarios, we have significantly less rigid bodies. Without loss of generality, we assume furthermore that in such a Laman partition every graph S = (V, E) has exactly |E| = 2 · |V | − 3 edges (i.e., sp (S) = 0), where E is independent. Since these graphs are rigid, they contain such a set of edges and probably some more which we simply ignore. The approaches from [3,8] have in common that they manage a growing independent set of edges. Due to the matroidal character of the problem, an edge can greedily be chosen to join this set if there is no dependency to present edges. Rigid areas of the network can be identified en passant. When talking about rigid bodies, we lose some of this ease, since a subgraph can have both, edges that are important for rigidity as well as excessive ones. But the greedy approach still works: If we go through the bodies of a Laman partition and merge bodies as soon as there are bodies that form a larger rigid structure, we end up with a partition into maximum rigid components. Unfortunately, it is not sufficient to look for bodies, that together have a sufficient number of edges. A counterexample is given in Figure 3, where sp ({S1 , . . . , S5 }) = 0, but whereas
the bodies S1, S2, S3, S5 have one edge more than needed, the edge S4 can be stretched without affecting the other subgraphs. We start with the observation that a rigid partition with sufficiently overlapping bodies must have enough edges to fulfill Laman's theorem:

Lemma 1. Let S be a rigid partition. Then sp(S) = 2 · rd(S) − 3 · (|S| − 1).

Proof. As the graphs in a rigid partition have disjoint edge sets, the edges of G(S) just sum up as |E(S)| = Σ_{(V,E)∈S} |E| = Σ_{(V,E)∈S} (2 · |V| − 3), whereas the nodes were counted rdS(v) + 1 times. Thus, |V(S)| = Σ_{(V,E)∈S} |V| − rd(S) holds, which results in the following equation:

sp(S) = |E(S)| − 2 · |V(S)| + 3
      = Σ_{(V,E)∈S} (2 · |V| − 3) − 2 · ( Σ_{(V,E)∈S} |V| − rd(S) ) + 3
      = 2 · rd(S) − 3 · (|S| − 1) .

From the remark following Laman's theorem it follows that a Laman partition S with sp(S) ≥ 0 at least contains a rigid subset. Adapting the iterative scheme, we will use the following theorem to maintain an independent rigid partition, merging bodies whenever a rigid subset appears:

Theorem 2. Let S be a rigid partition and S⋆ ∈ S such that S − S⋆ is independent. Then S′ ⊆ S is rigid if and only if for all non-empty S″ ⊆ S′ that contain S⋆ the inequality sp(S′) ≥ sp(S″) holds.

Proof. First assume that S′ is rigid. If there were any subset S″ of S′ with sp(S′) < sp(S″), one could not choose |E(S′)| − sp(S′) edges from E(S′) without choosing more than |E(S″)| − sp(S″) from E(S″). Therefore, no 2|V(S′)| − 3 edges from G(S′) could be independent. If, on the other hand, for all S″ ⊆ S′ with S⋆ ∈ S″ the inequality sp(S′) ≥ sp(S″) holds, then we know that sp(S′) ≥ 0, as the inequality holds in particular for the single graphs, i.e., sp(H) = 0 for all H ∈ S. Suppose that S′ were not rigid. According to Laman's theorem, there must be a rigid subgraph G′ = (V′, E′) ⊆ G(S′) with |E′| > 2|V(E′)| − 3. This graph G′ spans at least two graphs in S′, which also form a rigid graph with at least one dependent edge. All those non-trivial rigid subsets S″ include S⋆; thus their union Smax forms the unique maximal rigid subgraph G(Smax). But we are able to choose |E(S′)| − sp(S′) edges from E(S′) even if we restrict ourselves to taking only a set of independent edges from E(Smax), where we only have to leave out sp(Smax) ≤ sp(S′). These 2|V(S′)| − 3 edges are either independent, so that S′ must be rigid, or there still is a subgraph G′ = (V′, E′) ⊆ G(S′) with |E′| > 2|V(E′)| − 3 which is not covered by Smax. Both cases are inconsistent with either the assumptions or the definition of Smax.

The detection of subsets with this property is not trivial. We present an efficient algorithm that solves this task by formulating it as a maximum-flow problem.
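Lemma 1 makes the surplus of a partition a matter of counting node multiplicities. A minimal sketch (ours; bodies given as node sets, each assumed to contribute 2|V| − 3 independent edges):

```python
from collections import Counter

# Sketch (ours) of Lemma 1: compute rd(S) and sp(S) for a Laman partition
# given as a list of node sets.
def redundancy_and_surplus(bodies):
    count = Counter(v for body in bodies for v in body)
    rd = sum(c - 1 for c in count.values())        # rd(S)
    sp = 2 * rd - 3 * (len(bodies) - 1)            # sp(S), by Lemma 1
    return rd, sp

# Four triangles: three pairwise glued in single nodes plus a central one.
bodies = [{1, 2, 3}, {3, 4, 5}, {5, 6, 1}, {1, 3, 5}]
print(redundancy_and_surplus(bodies))   # (6, 3)
```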
Definition 2. For a rigid partition S and a particular graph S⋆ ∈ S such that S − S⋆ is an independent rigid partition, the bipartite intersection network B(S, S⋆) = (R(S), S, A, κ, b) is given by

A = {(v, G) ∈ R(S) × S | v ∈ G} ,   κ ≡ 2 ,   b(v) = 2 · rdS(v) ,   b(G) = 3 if G ≠ S⋆ and b(G) = 0 if G = S⋆ .

A flow then is a function f : A → ℕ with f(a) ≤ κ(a) and

b_f(v) := b(v) − Σ_{(v,G)∈A} f(v, G) ≥ 0 ,   b_f(G) := b(G) − Σ_{(v,G)∈A} f(v, G) ≥ 0 .
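Constructing this network from a partition is straightforward. The sketch below is ours: it returns the arcs together with the supplies b(v) and demands b(G); a standard max-flow solver can then be applied after adding a source feeding every node v with b(v) and a sink drawing b(G) from every body G.

```python
from collections import Counter

# Sketch (ours) of Definition 2 as data: redundant nodes on one side,
# bodies on the other, capacity 2 on every arc, supplies b(v) = 2 rd_S(v),
# demands b(G) = 3 for every body except the distinguished one.
def intersection_network(bodies, star):
    count = Counter(v for body in bodies for v in body)
    R = {v for v, c in count.items() if c > 1}      # R(S)
    arcs = [(v, g) for g, body in enumerate(bodies)
            for v in body if v in R]
    b_node = {v: 2 * (count[v] - 1) for v in R}     # b(v) = 2 rd_S(v)
    b_body = {g: (0 if g == star else 3) for g in range(len(bodies))}
    return R, arcs, b_node, b_body
```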
Definition 3. Let S be a Laman partition, S⋆ ∈ S such that S − S⋆ is an independent Laman partition, and f a maximal flow in B(S, S⋆). Then a subset S′ ⊆ S is called saturated iff for every S ∈ S′, Σ_{(v,S)∈A} f(v, S) = b(S), and closed iff for all (v, G), (v, G′) ∈ A: G ∈ S′ ∧ f(v, G) > 0 ∧ f(v, G′) < 2 ⟹ G′ ∈ S′,
i.e., there is no path from a contained graph to one that is not contained by traversing edges in the residual network. For any set of graphs S′, the (minimal) closure is denoted by S̄′. Analogously, the closure R̄ of a set of nodes R ⊆ R(S) is defined as the closure of {G ∈ S | ∃v ∈ R : f(v, G) < 2}.
Fig. 4. The intersection network of the graph from Figure 3 (with S ⋆ = S5 ) with a maximum flow f . Nodes and graphs are annotated with bf (·) (b(·)), edges with f (·).
Figure 4 shows the intersection network of the example from Figure 3: {S1, S2, S3, S5} is a maximum closed and saturated subset. {S2, S3, S5} is a smaller closed and saturated set, whereas {S1, S3, S5} is not, as S2 can be reached from S1 by traversing (v1, S1) and (v1, S2). The following two lemmas ensure that, for a maximum flow, firstly any saturated and closed set is rigid, and secondly, as long as a rigid set is contained, there has to be a saturated and closed set:
Lemma 2. If for any valid flow f in an intersection network B(S, S⋆) there is a saturated, closed set of graphs S′, then the following properties hold:
1. The flow into S′, Σ_{(v,S)∈A, S∈S′} f(v, S), is 2rd(S′) − b_f(R(S′)).
2. The graph S⋆ is contained in S′, i.e., S⋆ ∈ S′.
3. The set S′ is rigid.
Proof. We prove these properties one at a time.
1. As f(v, S) = 2 holds for every v ∈ R(S′) and every S ∉ S′ with (v, S) ∈ A, we obtain the following equalities:

Σ_{(v,S)∈A, S∈S′} f(v, S) = Σ_{v∈R(S′)} ( Σ_{(v,S)∈A} f(v, S) − Σ_{(v,S)∈A, S∉S′} 2 )
                         = Σ_{v∈R(S′)} ( 2rdS(v) − b_f(v) − Σ_{(v,S)∈A, S∉S′} 2 )
                         = Σ_{v∈R(S′)} ( 2rdS′(v) − b_f(v) )
                         = 2rd(S′) − b_f(R(S′)) .

2. As the flow saturates the graphs in S′, Σ_{(v,S)∈A, S∈S′} f(v, S) ≥ 3(|S′| − 1). Thus, 2rd(S′) ≥ 2rd(S′) − b_f(R(S′)) ≥ 3(|S′| − 1). Therefore, S′ at least contains a rigid subset, which then must contain S⋆.
3. By Theorem 2 it is sufficient to show that sp(S′) ≥ sp(S″) for all subsets S″ ⊆ S′ that include S⋆. For a closed, saturated subset S″ ⊆ S′ we have sp(S″) = b_f(R(S″)), since (see the considerations above)

3(|S″| − 1) = Σ_{(v,S)∈A, S∈S″} f(v, S) = 2rd(S″) − b_f(R(S″))
⟺ 3(|S″| − 1) − 2rd(S″) = −b_f(R(S″)) ,

where the left-hand side equals −sp(S″) by Lemma 1. For a saturated, but not necessarily closed, set S″ ∋ S⋆, this becomes sp(S″) ≤ b_f(R(S″)). Therefore, when S′ is a saturated, closed subset with respect to f, and S″ ⊆ S′ such that S⋆ ∈ S″, the following inequality holds:

sp(S″) ≤ b_f(R(S″)) ≤ b_f(R(S′)) = sp(S′) .

Lemma 3. Let S be a rigid partition and S⋆ ∈ S such that S − S⋆ is independent. If S contains a non-trivial rigid subset and S′ is an inclusion-maximal rigid subset, then for any maximum flow in B(S, S⋆), S′ is saturated and closed.
Proof. Let S′ be a non-trivial, inclusion-maximal rigid subset of S. As all rigid subsets overlap in S⋆, S′ is well-defined as the union of all rigid subsets of S. Suppose S′ were not closed or not saturated with respect to a maximum flow f. Then b_f(R(S′)) > sp(S′), and therefore Rf = {v ∈ R(S′) | b_f(v) > 0} must be non-empty. But the closure S″ := R̄f is saturated. As S″ is rigid, S″ ⊆ S′. Furthermore, by this choice we ensure that b_f(R(S″)) = b_f(R(S′)). However, this contradicts b_f(R(S″)) = sp(S″) ≤ sp(S′) < b_f(R(S′)).
4 Implementation
Together, Lemmas 2 and 3 are the foundation of our algorithm, which finds maximum rigid components starting with an arbitrary Laman partition S. It is given in pseudo-code as Algorithm 1.

Algorithm 1. MergeRigidComponents(S)

    S^I ← ∅
    while S ≠ ∅ do
        choose S⋆ from S; S ← S − S⋆
1:      while ∃S ∈ S^I : |V(S⋆) ∩ V(S)| > 1 do
            S^I ← S^I − S; S⋆ ← G({S, S⋆})
2:      f ← maximum flow in B(S^I ∪ {S⋆}, S⋆)
        S′ ← maximum closed and saturated set with respect to f
        if |S′| > 1 then
            S⋆ ← G(S′); S^I ← S^I \ S′
        S^I ← S^I ∪ {S⋆}
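The step "maximum closed and saturated set" can be implemented with the closure operator of Definition 3. The sketch below (ours; flow and arcs as built by the earlier intersection-network sketch) grows a set of bodies along residual arcs; following the proof of Lemma 3, one would seed it with the bodies reachable from nodes v with b_f(v) > 0 and then test saturation.

```python
# Sketch (ours) of the closure operator of Definition 3: an arc (v, g')
# with f(v, g') < 2 is traversable from any arc (v, g) with f(v, g) > 0
# and g already contained.
def closure(seed, arcs, flow):
    closed = set(seed)
    grew = True
    while grew:
        grew = False
        for v, g in arcs:
            if g in closed and flow[(v, g)] > 0:
                for v2, g2 in arcs:
                    if v2 == v and g2 not in closed and flow[(v2, g2)] < 2:
                        closed.add(g2)
                        grew = True
    return closed
```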
First, this algorithm clearly ensures that S^I is the unique partition into maximum rigid subgraphs, since, as an invariant, S^I is independent: before we add a graph S⋆ to S^I, we find the maximum rigid subset of S^I ∪ {S⋆}, remove the involved graphs from S^I and add the graph formed by them to S^I. For this to hold (and thus for the correctness of the algorithm), we do not need the steps marked with '1'; they will, however, play an important role in the runtime analysis. Second, this algorithm runs in O(n + l log l + k²) for k := |S| and l := |R(S)|. We first iterate over all graphs in S and all contained nodes to find the nodes from R(S) and to annotate the graphs with their respective intersection nodes. With a bounded node degree of ∆, this can be done in O(n), as no node can be part of more than ∆ edge-disjoint graphs. This annotation can be kept up-to-date during merging operations by processing only the annotations of the smaller graph (in terms of intersection nodes). This can be done with an overall effort of O(l log l) steps. Now we have k iterations of the outer while-loop. For every S⋆ ∈ S, we first test whether there is a graph in S^I which has two nodes with S⋆
in common. This check can be performed at most 2k − 1 times overall, k times failing (once for every S⋆) and at most k − 1 times succeeding and combining two graphs, i.e., reducing the overall number of graphs. For such a check, at most k intersection nodes must be considered; at the latest, the k-th intersection node is the second common node with one of the other graphs. The analysis of the second ('2'-) part of the algorithm is more cumbersome. We therefore first analyze the structure of B(S^I ∪ {S⋆}, S⋆). Every node in R(S^I ∪ {S⋆}) has at least two incident edges in A (recall the structure of the intersection network from Definition 2). Less than 3/2·|S^I| − 3/2 of them can have more than two edges to graphs in S^I, as every such node has rd_{S^I}(v) > 0 and S^I is independent. On the other hand, nodes with only one edge to graphs in S^I must have an edge to S⋆, which can apply to at most |S^I| nodes. Thus, we have less than 5/2·|S^I| − 3/2 ∈ O(k) intersection nodes. Similarly, as only |S^I| edges can be incident to S⋆, and for every node all but one incident edge correspond to a redundant use, we have rd(S^I) > |A| − |S^I| − (5/2·|S^I| − 3/2). Since rd(S^I) < 3/2·|S^I| − 3/2 holds, we also get |A| ∈ O(k). Furthermore, we know that a maximum flow can have a value of at most 3|S^I|. Naïvely implemented, this could still lead to a complexity of Θ(k²) per solved maximum-flow problem. Fortunately, there is an easy way to re-use solutions from the preceding iteration. If f_i is a valid flow in B(S^I_i, S⋆_i), the intersection network of the i-th iteration, then a valid flow for the network B(S^I_{i+1}, S⋆_{i+1}) can be constructed in O(k) by
f_{i+1}(a) = f_i(a) if a ∈ A_i, and f_{i+1}(a) = 0 otherwise.
This flow f_{i+1} cannot violate any of the conditions, as
– no edge a has f_{i+1}(a) > 2 if this held for f_i,
– no graph S ∈ S^I_{i+1} gets more flow than 3 if no graph S ∈ S^I_i did, and
– every node v ∈ R(S^I_{i+1} ∪ {S⋆_{i+1}}) is either included in the same set of graphs as in the i-th iteration and therefore has the same value for b(v), which is not violated by f_{i+1}, or it must be in S⋆_{i+1}; in this case, there is an edge a = (v, S⋆_{i+1}) ∈ A_{i+1} with f_{i+1}(a) = 0, and it must have b_f(v) ≥ 0 by construction.
Although the changes of S^I look quite complex, we only have k additions of a new graph, and by the re-use of flows, the flow accepted by any graph is non-decreasing. Therefore, we have at most 3k successful augmenting steps and k failing tests in flow maximization overall, which can then be done in O(k²); the sketch after this paragraph illustrates the flow re-use. Figure 5 depicts how the identification of rigid subgraphs reduces the node density that is needed in order to achieve a certain coverage of the largest localized component. Similarly, the number of components for a fixed node density decreases compared to simple techniques such as triangulation and overlapping.
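With flows stored as dictionaries keyed by arcs, carrying a flow into the next iteration is a one-liner (our sketch):

```python
# Sketch (ours) of the flow re-use: extend the previous flow by zero on
# arcs that are new in iteration i + 1; old arcs keep their flow value.
def carry_flow(prev_flow, new_arcs):
    return {a: prev_flow.get(a, 0) for a in new_arcs}
```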
Fig. 5. The coverage of the largest component to be localized by edge- and node-overlapping triangulation and by the identification of maximum rigid components, compared with the largest connected component. Evaluated on random 0.5-quasi-unit-disk graphs; node density refers to the number of nodes per unit square.
5 Layout
Our technique iteratively finds maximum rigid components of a graph, but it does not maintain unique valid embeddings for these components. In order to calculate realizations of the identified components, it is sufficient to always have realizations of the graphs in S. This is trivial for triangulation and for graphs that are constructed by merging two overlapping graphs. If three or more graphs are merged, consistent size ratios can be derived by solving a linear equation system. Here, it is sufficient to consider a reduced graph with artificial edges only between intersection nodes (see Figure 6). Here, the iterative approach turns out to have a big advantage: solving these problems for many merging steps, each with a small number of components to be merged, drastically reduces the effort spent on the linear equation systems from a worst-case Θ(k³) for solving one global equation system to O(k) for solving Θ(k) small equation systems with a constantly bounded number of components to be merged. In our extensive experiments on random qUDGs, only the latter case occurred, usually with very small bounds, making the additional cost of the layout calculation negligible. A more elaborate explanation as well as experimental results can be found in [5,6].
Fig. 6. Artificial edges used to determine edge length ratios
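To make the kind of linear system concrete, the following sketch (ours, not the authors' exact formulation) realizes a direction-constrained component by least squares: each edge contributes one parallelity equation, and pinning one node plus one coordinate of a second node removes translation and scale.

```python
import numpy as np

# Sketch (ours): for each edge (u, v) with unit direction d = (dx, dy),
# a realization p must satisfy dy*(x_v - x_u) - dx*(y_v - y_u) = 0,
# i.e. p(v) - p(u) is parallel to d.
def realize(nodes, edges, dirs, pin, unit):
    idx = {v: i for i, v in enumerate(nodes)}
    rows, rhs = [], []
    def row(coeffs):
        r = np.zeros(2 * len(nodes))
        for (v, axis), c in coeffs.items():
            r[2 * idx[v] + axis] = c
        return r
    for (u, v), (dx, dy) in zip(edges, dirs):
        rows.append(row({(v, 0): dy, (u, 0): -dy,
                         (v, 1): -dx, (u, 1): dx}))
        rhs.append(0.0)
    rows.append(row({(pin, 0): 1.0})); rhs.append(0.0)   # pin x
    rows.append(row({(pin, 1): 1.0})); rhs.append(0.0)   # pin y
    rows.append(row({(unit, 0): 1.0})); rhs.append(1.0)  # fix scale
    sol = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    return {v: (sol[2 * i], sol[2 * i + 1]) for v, i in idx.items()}

# A triangle whose three edge directions force its shape up to scale:
print(realize(['a', 'b', 'c'],
              [('a', 'b'), ('b', 'c'), ('a', 'c')],
              [(1, 0), (0, 1), (2 ** -0.5, 2 ** -0.5)],
              pin='a', unit='b'))
```

For a parallel rigid component the system has a unique solution up to the pinned degrees of freedom, which is exactly the statement of Theorem 1.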
6 Conclusion and Outlook
In this paper, we presented an algorithm that fully exploits rigidity theory for the direction-constrained network localization problem. Unlike for distance-based localization, this theory provides a full characterization of the rigid network structures that are sufficient for this task, and it can be extended to R³ or higher dimensions. Our algorithm not only considers node- and edge-overlapping components but also identifies maximum rigid components. This can be seen either as a stand-alone solution for partial localization (with the guarantee to localize substructures as far as uniquely possible), or as a speed-up technique for approaches that rely on linear programming. Depending on the point of view, the benefits compared to standard techniques like triangulation or overlapping are much larger localized components or much smaller LP instances. The iterative approach in almost all scenarios reduces the complexity by applying the costly operations only to necessary and in most cases very small subproblems. Although the depicted algorithm relies on exact directions, the identification of rigid subgraphs can also be a foundation for iterative localization with noisy direction constraints together with local optimization [5,6].
References
1. Aspnes, J., Goldenberg, D., and Yang, Y.: On the Computational Complexity of Sensor Network Localization. In Proceedings of the First International Workshop on Algorithmic Aspects of Wireless Sensor Networks (2004)
2. Bruck, J., Gao, J., and Jiang, A.: Localization and Routing in Sensor Networks by Local Angle Information. New York, NY, USA, ACM Press (May 2005) 181–192
3. Hendrickson, B.: Conditions for Unique Graph Realizations. SIAM J. Comput. 21 1 (1992) 65–84
4. Jacobs, D. and Hendrickson, B.: An Algorithm for Two-Dimensional Rigidity Percolation: The Pebble Game (1997)
5. Katz, B.: Richtungsbasierte Lokalisierung von Sensornetzwerken [Direction-Based Localization of Sensor Networks] (in German). Master's Thesis (2006)
6. Katz, B., Gaertler, M., and Wagner, D.: Maximum Rigid Components as Means for Direction-Based Localization in Sensor Networks. Technical Report 2006-17, Universität Karlsruhe (2006)
7. Kuhn, F., Wattenhofer, R., and Zollinger, A.: Ad-Hoc Networks Beyond Unit Disk Graphs. In DIALM-POMC'03: Proceedings of the 2003 Joint Workshop on Foundations of Mobile Computing, New York, NY, USA, ACM Press (2003) 69–78
8. Moukarzel, C.: An Efficient Algorithm for Testing the Generic Rigidity of Graphs in the Plane. J. Phys. A: Math. Gen. 29 (1996) 8079
9. Saxe, J.B.: Embeddability of Weighted Graphs in k-Space is Strongly NP-Hard. In Proc. 17th Allerton Conf. Commun. Control Comput. (1979) 480–489
10. Tubaishat, M. and Madria, S.: Sensor Networks: An Overview. IEEE Potentials 22 2 (2003) 20–23
11. Whiteley, W.: Matroids from Discrete Applied Geometry. In Matroid Theory, AMS Contemporary Mathematics (1996) 171–311
Online Service Management Algorithm for Cellular/WLAN Multimedia Networks
Sungwook Kim¹ and Sungchun Kim²
¹ Department of Computer Science, Sogang University, Shinsu-dong 1, Mapo-ku, Seoul, 121-742, South Korea
[email protected]
² Department of Computer Science, Sogang University, Shinsu-dong 1, Mapo-ku, Seoul, 121-742, South Korea
[email protected]
Abstract. An efficient network management system is necessary in order to provide QoS-sensitive multimedia services while enhancing network performance. In this paper, we propose a new online network management algorithm based on an adaptive online control strategy. Simulation results indicate the superior performance of our proposed algorithm under widely varying traffic loads.
1 Introduction

Based on the anywhere-and-anytime service concept, it is becoming important that users can move among various networks seamlessly. Current trends therefore show that cellular networks and WLANs will co-exist and complement each other to provide seamless multimedia service. A network architecture based on the inter-dependence between a WLAN and cellular networks can be defined as an overlay network [1]-[4]. Multimedia networks should take into account the prioritization among different multimedia traffic services. Based on different tolerance characteristics, class I data traffic has higher priority than class II data traffic during network operations [5]-[6]. With the enormous growth of multimedia services, network congestion has become more apparent. Network congestion occurs when the aggregate traffic volume at an input link is higher than the capacity of the corresponding output link. To avoid global synchronization, congestion control mechanisms should detect network congestion early and send feedback to the end-nodes [7]-[8]. In this paper, we focus on adaptive QoS control in cellular/WLAN interworking, taking into account congestion control and reservation policies. An algorithm employing online computations is called an online algorithm. The term "online computation problem" refers to decision problems where decisions must be made in real time based on past events, without information about the future. In wired/wireless networks, the traffic patterns and future arrival rates of requests are generally not known. Furthermore, the fact that traffic patterns can vary dramatically over short periods of time makes the problem more challenging. Therefore, online algorithms are natural candidates for the design of network management control schemes. Optimal offline algorithms are unrealizable for network management because they need full knowledge of the future for an online problem [5].
Earlier work reported in [3] and [4] has also considered cellular/WLAN interworking management. Both schemes are designed to effectively improve the network performance and, at the same time, provide QoS support for higher-priority service. However, these existing schemes have several shortcomings, as described in Section 3. Compared to these schemes, our proposed online algorithm is well suited to, and attains better performance in, multimedia overlay network environments.
2 Proposed Network Management Algorithms

In this section, we develop a control scheme based on an adaptive online approach. Our proposed online scheme provides a coordination paradigm by employing reservation and buffer management mechanisms. For class I traffic, bandwidth reservation is needed to accommodate strictly delay-limited services. For class II traffic, a buffer management strategy is required for services with tolerable delay but strict packet-loss requirements.

2.1 Reservation Mechanism

The bandwidth reservation technique reserves some network capacity for higher-priority class I traffic service. For a given traffic load, there is an optimal reservation amount, but this amount varies dynamically with the network traffic. To determine the optimal reservation amount adaptively, we partition the time axis into equal intervals of length unit_time. Our proposed online algorithm adjusts the amount of reserved bandwidth (ResB) based on real-time measurements during every unit_time. To maintain the reserved bandwidth close to the optimal value, we define a traffic window that is used to keep the history of class I traffic requests (Wclass_I). The traffic window is of size [tc − tclass_I, tc], where tc is the current time and tclass_I is the window length; this size can be adjusted in time steps equal to unit_time. If the class I call dropping probability (CDP) for handoff service is higher (lower) than its predefined target probability (Pclass_I), the traffic window size is increased (decreased). The value of ResB can be estimated as the sum of the bandwidths requested by handoff class I calls during the traffic window:
ResB = Σ_{i ∈ Wclass_I} (Bi × Ni)   (1)
where Ni and Bi are the number of handoff requests and the corresponding bandwidths of data type i, respectively. Therefore, by using this traffic window, we can dynamically adjust the amount of ResB at every unit_time, which is more responsive to changes in the network condition after the bandwidth has been reserved.

2.2 Buffer Management Mechanism

In contrast to class I traffic services, class II services do not need QoS guarantees. Therefore, instead of call-admission rules, congestion control mechanisms are required for class II traffic management in multimedia networks [7]-[8]. Our proposed QoS control scheme differentiates between class I and class II traffic. When network congestion occurs, we attempt to provide a "better effort" service for class II traffic
while maintaining the QoS of the call-admission-controlled class I services, and can thereby achieve a high throughput and a low average delay. In order to cope with the congestion problem, we propose an online buffer management mechanism. This mechanism tries to keep the queue size around the target buffer occupancy to prevent global synchronization. Therefore, under traffic congestion, the router drops arriving packets probabilistically [7]-[8]. This strategy can detect network congestion early so as to utilize bandwidth more efficiently. Moreover, packet losses can also occur during the inter-network vertical handoff process. To overcome this packet loss problem, the packets being sent are temporarily stored in the buffer; then, once the address of the new network is known, the buffered packets are forwarded to the new network. This packet buffering technique can make handoffs successful by recovering packets lost during the handoff. In order to satisfy the above requirements, we propose the Online Buffer Management (OBM) mechanism. With the aim of adaptive buffer management, our OBM mechanism defines two control parameters and adjusts their values in an online manner. The parameters are the queue range (Qr) and the packet marking probability (Mp): Qr is a threshold for packet buffering, and Mp is the probability of randomly dropping arriving class II data packets, which can prevent the average queue size from increasing abruptly. For a seamless handoff, our mechanism ensures that the amount of buffered class I handoff packets can reach Qr. In this paper, Qr is set equal to the current ResB. By inspecting the current reserved bandwidth, the Qr value can thus also be adaptively adjusted at every unit_time. In parallel with the adaptive Qr control, the OBM mechanism also dynamically adjusts the Mp value. The uncertainty of the future traffic makes it impossible to decide Mp optimally, so we also treat the Mp adjustment as an online decision problem. In our OBM mechanism, three system parameters are used to determine Mp: the maximum queue length (ML), the current queue length (L), and Qr. Based on these parameters, Mp is obtained as
Mp = (L − Qr) / (ML − Qr)   (2)
where L is used as the main indicator for determining Mp. At every unit_time, Mp is adaptively adjusted by considering the current queue conditions. If L is greater than the total buffer size (T), the buffer does not have any space for incoming packets; therefore, all arriving class II data packets should be dropped (Mp = 1). When L is less than Qr (0 < L < Qr), the network situation is considered congestion-free and no arriving packets are dropped (Mp = 0). If L is greater than Qr but less than the total buffer size (Qr < L < T), we set Mp using equation (2) to drop congested packets in a randomized manner. The main steps of our proposed QoS control scheme are as follows:
• At every unit_time, our QoS control scheme monitors the current class I CDP and then adjusts the traffic window size accordingly.
• Traffic window sizes are defined as integer multiples of unit_time in this paper.
• If the current CDP is higher (lower) than Pclass_I, the traffic window size is increased (decreased) in steps equal to unit_time.
• Based on the size of the traffic window, we adjust ResB and Qr at every unit_time.
• At every unit_time, Mp is also adaptively adjusted (a sketch of this rule follows below):
  - If L is greater than T, we set Mp = 1: all arriving class II data packets are dropped
  - If L is less than Qr (L < Qr), we set Mp = 0: no arriving packets are dropped
  - If L is between Qr and T (Qr < L < T), we set Mp as given by (2)
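The following minimal sketch illustrates this marking-probability rule. The parameter names (Qr, T, ML) mirror the paper; the function itself and its name are our own illustration, not code from the proposed scheme:

    def marking_probability(L, Qr, T, ML):
        """Packet-marking probability Mp per equation (2) and the rules above.

        L  : current queue length
        Qr : queue range (packet-buffering threshold, set to the current ResB)
        T  : total buffer size
        ML : maximum queue length
        """
        if L >= T:        # buffer exhausted: drop every arriving class II packet
            return 1.0
        if L < Qr:        # congestion free: drop nothing
            return 0.0
        return (L - Qr) / (ML - Qr)   # randomized dropping, equation (2)

For example, with Qr = 50, ML = T = 100, and a current queue length L = 80, the sketch yields Mp = (80 − 50)/(100 − 50) = 0.6.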
3 Simulation Experiments

In this section, we evaluate the performance of our proposed algorithm using a simulation model. Based on this simulation model, we compare the performance of our algorithm with other existing schemes [3]-[4]. Figs. 1 and 2 show the performance comparison for all traffic services in terms of the CBP and CDP of real-time (class I) data traffic services. When the call-arrival rate is low (below 0.5), the performance of the three schemes is identical. This is because all three schemes have enough bandwidth to accept the requested calls. As the call-arrival rate increases, the average amount of unused bandwidth decreases. Thus, new-call requests are likely to be rejected and the CBP increases, but the CDP of handoff calls quickly settles down due to bandwidth reservation. From the simulation results we obtained, it can be seen that our algorithm, in general, performs better than the other existing schemes from low to heavy traffic loads. This feature is highly desirable for providing better network efficiency.
4 Summary and Conclusions

In this paper, we propose online adaptive network management algorithms for multimedia overlay networks. Our algorithm is able to resolve conflicting QoS criteria while ensuring efficient network performance. In addition, our online approach has low complexity, making it practical for real overlay network operations. Performance evaluation results indicate that our algorithm maintains a well-balanced network performance in widely different traffic-load situations.
[Fig. 1. Call Blocking Probability (class I) — CBP vs. offered load (call arrival rate) for Our Framework, the RMI Scheme, and the ALBCA Scheme]

[Fig. 2. Call Dropping Probability (class I) — CDP vs. offered load (call arrival rate) for the same three schemes]
References
1. Stemm, M. and Katz, R.H.: Vertical Handoffs in Wireless Overlay Networks. ACM Mobile Networking (MONET) 3 4 (1998) 335-350
2. Badis, H. and Al Agha, K.: An Efficient Mobility Management in Wireless Overlay Networks. PIMRC 2003 3 (Sep. 2003) 2500-2504
3. Song, W., Jiang, H., Zhuang, W., and Shen, X.: Resource Management for QoS Support in Cellular/WLAN Interworking. IEEE Network 19 5 (2005) 12-18
4. Dahlberg, T. and Jung, J.: Survivable Load Sharing Protocols: A Simulation Study. Wireless Networks 7 3 (2001) 283-296
5. Kim, S. and Varshney, P.K.: An Integrated Adaptive Bandwidth Management Framework for QoS Sensitive Multimedia Cellular Networks. IEEE Transactions on Vehicular Technology (May 2004) 835-846
6. Kim, S. and Varshney, P.K.: An Adaptive Bandwidth Allocation Algorithm for QoS Guaranteed Multimedia Networks. Computer Communications 28 (Oct. 2005) 1959-1969
7. Feng, W., Kandlur, D., Saha, D., and Shin, K.: Blue: An Alternative Approach to Active Queue Management. Proc. of NOSSDAV 2001 (June 2001) 41-50
8. Aweya, J., Ouellette, M., Montuno, D.Y., and Chapman, A.: Enhancing TCP Performance with a Load-Adaptive RED Mechanism. International Journal of Network Management 11 1 (2001) 31-50
A Simple Algorithm for Stable Minimum Storage Merging

Pok-Son Kim1,⋆ and Arne Kutzner2

1 Kookmin University, Department of Mathematics, Seoul 136-702, Rep. of Korea
[email protected]
2 Seokyeong University, Department of E-Business, Seoul 136-704, Rep. of Korea
[email protected]

⋆ This work was supported by the Kookmin University research grant in 2006.
Abstract. We contribute to the research on stable minimum storage merging by introducing an algorithm that is particularly simply structured compared to its competitors. The presented algorithm performs O(m log(n/m + 1)) comparisons and O((m + n) log m) assignments, where m and n are the sizes of the input sequences with m ≤ n. Hence, according to the lower bounds of merging, the algorithm is asymptotically optimal regarding the number of comparisons. As the central new idea we present a principle of symmetric splitting, where the start and end point of a rotation are computed by a repeated halving of two search spaces. This principle is structurally simpler than the principle of symmetric comparisons introduced earlier by Kim and Kutzner. It can be transparently implemented in a few lines of pseudocode. We report concrete benchmarks that prove the practical value of our algorithm.
1 Introduction
Merging denotes the operation of rearranging the elements of two adjacent sorted sequences of sizes m and n, so that the result forms one sorted sequence of m + n elements. An algorithm merges two adjacent sequences with minimum storage [1] when it requires at most O(log²(m + n)) bits of additional space. It is regarded as stable if it preserves the initial ordering of elements with equal value. There are two significant lower bounds for merging. The lower bound for the number of assignments is m + n, because every element of the input sequences can change its position in the sorted output. As shown by Knuth in [1], the lower bound for the number of comparisons is Ω(m log(n/m + 1)), where m ≤ n. The Recmerge algorithm of Dudzinski and Dydek [2] and the Symmerge algorithm of Kim and Kutzner [3] are two minimum storage merging algorithms that have been proposed in the literature so far. Both algorithms are asymptotically optimal regarding the number of comparisons and resemble each other structurally. They perform the merging by a binary partitioning of both input sequences
which operates as the foundation of a rotation that is followed by two recursive calls. The algorithm proposed here operates similarly; however, the partitioning is performed by a novel technique called symmetric splitting. This partitioning technique is structurally simpler than the older ones, because it requires neither the detection of the shorter input sequence nor a binary search as a sub-operation. Further, there is no static selection of any pivot element or centered subsequence. The simplicity leads to a highly transparent and well understandable algorithm that can be implemented in a few lines of pseudocode. Despite its simplicity, our algorithm is asymptotically optimal regarding the number of comparisons and requires O((m + n) log m) assignments. Another class of merging algorithms is the class of in place merging algorithms, where the external space is restricted to merely a constant amount. Recent work in this area includes the publications [4,5,6,7], which describe algorithms that are all asymptotically optimal regarding the number of comparisons as well as assignments. However, these algorithms are structurally quite complex and rely heavily on other concepts, such as Kronrod's idea of an internal buffer [8], Mannila and Ukkonen's technique for block rearrangements [9], and Hwang and Lin's merging algorithm [10]. We included the stable in place merging algorithm proposed in [7] in our benchmarking in order to give an impression of the performance behavior of the different approaches. We will start with a formal definition of our algorithm together with the presentation of a corresponding pseudocode implementation. Afterwards we will prove that our algorithm is stable, minimum storage, and asymptotically optimal regarding the number of comparisons. In a benchmark section we show that our algorithm performs well compared to its competitors. We will finish with a conclusion, where we give some ideas for further research.
2 Formal Definition / Pseudocode Implementation
Let u and v be two adjacent ascending sorted sequences. We define u ≤ v (u < v) iff x ≤ y (x < y) for all elements x ∈ u and all elements y ∈ v.

The Principle of Symmetric Splitting

The algorithm presented here relies on a quite simple idea, the principle of symmetric splitting. Informally, this principle can be described as follows: Let u and v be our two input sequences. By a repeated halving of two search spaces we compute separations u ≡ u′u′′ and v ≡ v′v′′ so that we get u′′ > v′ and u′ ≤ v′′.

... ≤ v21 y, so we preserve u3 > v1 (u1 and v3 stay untouched). ⊓⊔

Additionally, the following holds: If y is marked v1-belonging, we have u3 > v1 y. If x is marked u3-belonging, we have x u3 > v1, or even x u3 > v1 y if y is additionally marked v1-belonging.

Corollary 1. Splitmerge is stable.
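The splitting-plus-rotation skeleton can be illustrated with a short sketch. The Python code below is our own illustration: it finds the split points with a pivot and a single binary search in the style of Recmerge [2], whereas Splitmerge's symmetric splitting instead halves both search spaces simultaneously; the rotation-and-recursion structure is the same.

    import bisect

    def rotate(a, lo, mid, hi):
        # Exchange the adjacent blocks a[lo:mid] and a[mid:hi].
        a[lo:hi] = a[mid:hi] + a[lo:mid]

    def merge(a, lo, mid, hi):
        """Stable rotation-based merge of the sorted runs a[lo:mid] and a[mid:hi]."""
        if lo >= mid or mid >= hi:
            return
        i = (lo + mid) // 2                    # pivot position inside u
        x = a[i]
        j = bisect.bisect_left(a, x, mid, hi)  # v' = elements of v strictly below x
        rotate(a, i, mid, j)                   # u'' v'  ->  v' u''
        c = i + (j - mid)                      # new position of the pivot x
        merge(a, lo, i, c)                     # merge u' with v'
        merge(a, c + 1, j, hi)                 # merge the rest of u'' with v''

    # Example: two adjacent sorted runs inside one list.
    a = [1, 4, 7, 9, 2, 3, 8, 10]
    merge(a, 0, 4, len(a))
    assert a == [1, 2, 3, 4, 7, 8, 9, 10]

Stability follows because bisect_left places the pivot x before all v-elements equal to x, while equal u-elements keep their relative order inside the recursive calls.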
[Fig. 3. Recursion Depth — the tree of subproblem size bounds: (m, n) splits into (≤ m/2, ≤ n) and (≤ m, ≤ n/2), and so on down to (≤ 1, ≤ 1); the left and right branches have depths ⌊log m⌋ and ⌊log n⌋, respectively]
Lemma 2. The recursion depth is limited by min{⌊log m⌋ + ⌊log n⌋, m − 1}.

Proof. We prove both upper bounds separately. (1) After the splitting (step (2) in the formal definition) we get either |u′3| ≤ m/2 and |v′1| ≤ n/2, or |u′1| ≤ m/2 and |v′3| ≤ n/2. This in turn implies (≤ m/2, ≤ n) and (≤ m, ≤ n/2) as sizes of the two recursive calls. Hence, according to Figure 3, the recursion depth is limited by ⌊log m⌋ + ⌊log n⌋. (2) u′1 ≡ u implies that we did not touch the (b)-alternative during the splitting. This in turn implies that v′1 is empty. In the opposite case (u′3 ≡ u) we have to distinguish two alternatives: Either we did not touch the (a)-alternative, and so v′3 is empty, or we touched the (a)-alternative with empty u22 as well as empty u21 and marked x as u1-belonging. In the latter case we get a recursion where u′3 is shortened by one element. So the shorter side loses at least one element with every successful recursive invocation, and the overall recursion depth is limited by m − 1. ⊓⊔

Since m ≤ n, the following corollary holds:

Corollary 2. Splitmerge is a minimum storage algorithm.
[Fig. 4. Maximum spanning case — on every recursion level i, the shorter side of size m − i is split into pieces of sizes 1 and m − (i + 1), with corresponding pieces n_1^i and n_2^i of the longer side, down to recursion level m − 1]
Complexity

Unless stated otherwise, let us denote m = |u| and n = |v| with m ≤ n, and let k = ⌊log m⌋ + 1 if 2^(k−1) < m < 2^k, or k = log m if m = 2^k. Further, m_j^i and n_j^i denote the sizes of sequences merged on the i-th recursion level (initially m_1^0 = m and n_1^0 = n). Lemma 2 shows that the recursion depth is limited by m − 1. We will now consider the relationship between m and n for the maximum spanning case, the case where the recursion depth is m − 1. Here (m, n) can only be partitioned to either (1 (= m_1^1), n_1^1) and (m − 1 (= m_2^1), n_2^1), or (m − 1 (= m_1^1), n_1^1) and (1 (= m_2^1), n_2^1). If there are other partitions with 1 < m_1^1, m_2^1 < m − 1, then the algorithm may reach at most the recursion depth m − 2 (= m − 2 − 1 + 1). Without loss of generality we suppose that (m, n) is partitioned
to (1 (= m_1^1), n_1^1) and (m − 1 (= m_2^1), n_2^1) on recursion level 1. Since the Splitmerge algorithm applies the symmetric splitting principle, it must be satisfied that n_1^1 ≥ n − n/2^⌊log m⌋ and n_2^1 < n/2^⌊log m⌋ (if m = 2^k, then n_2^1 < n/2^⌊log m⌋ = n/m). Further, if m − 1 > n_2^1, the recursion depth would be smaller than m − 1. Thus m − 1 ≤ n_2^1. Here m − 1 ≤ n_2^1 and n_2^1 < n/2^⌊log m⌋ imply 2^⌊log m⌋ · (m − 1) < n. Suppose that, just as on the first recursion level, (m − 1 (= m_2^1), n_2^1) is again partitioned to (1, n_1^2) and (m − 2, n_2^2) on the second recursion level. Then n_1^2 ≥ n/2^⌊log m⌋ − n/(2^⌊log m⌋ · 2^⌊log(m−1)⌋), n_2^2 < (n/2^⌊log m⌋)/2^⌊log(m−1)⌋ = n/(2^⌊log m⌋ · 2^⌊log(m−1)⌋), and m − 2 ≤ n_2^2. Thus from m − 2 ≤ n_2^2 and n_2^2 < n/(2^⌊log m⌋ · 2^⌊log(m−1)⌋) it holds that 2^⌊log m⌋ · 2^⌊log(m−1)⌋ · (m − 2) < n. On the i-th recursion level, suppose (m − (i − 1) (= m_2^(i−1)), n_2^(i−1)) is partitioned to (1, n_1^i) and (m − i, n_2^i). Then n_1^i ≥ n/(2^⌊log m⌋ · 2^⌊log(m−1)⌋ ··· 2^⌊log(m−i+2)⌋) − n/(2^⌊log m⌋ · 2^⌊log(m−1)⌋ ··· 2^⌊log(m−i+2)⌋ · 2^⌊log(m−i+1)⌋), n_2^i < (n/(2^⌊log m⌋ · 2^⌊log(m−1)⌋ ··· 2^⌊log(m−i+2)⌋))/2^⌊log(m−i+1)⌋ = n/(2^⌊log m⌋ · 2^⌊log(m−1)⌋ ··· 2^⌊log(m−i+2)⌋ · 2^⌊log(m−i+1)⌋), and m − i ≤ n_2^i, i.e., 2^⌊log m⌋ · 2^⌊log(m−1)⌋ ··· 2^⌊log(m−i+1)⌋ · (m − i) < n, and so on. Hence, to reach the recursion depth m − 1, we need the assumption 2^⌊log m⌋ · 2^⌊log(m−1)⌋ · 2^⌊log(m−2)⌋ ··· 2^⌊log 1⌋ < n, and can state the following theorem:
Theorem 1. If the Splitmerge algorithm reaches recursion level m − 1 for two input sequences of sizes m, n (m ≤ n), then n > 2^⌊log m⌋ · 2^⌊log(m−1)⌋ · 2^⌊log(m−2)⌋ ··· 2^⌊log 1⌋.

We will now investigate the worst-case complexity of the Splitmerge algorithm regarding the number of comparisons and assignments. Fig. 4 shows the partitions in the maximum spanning case. Note that on recursion level i, a sequence of length m_1^i = 1 (m_2^i = m − i) is merged with a sequence of length n_1^i (n_2^i).

Lemma 3. ([2] Lemma 3.1) If k = Σ_{j=1}^{2^i} k_j for any k_j > 0 and integer i ≥ 0, then Σ_{j=1}^{2^i} log k_j ≤ 2^i log(k/2^i).
Theorem 2. The Splitmerge algorithm needs O(m log(n/m+1)) comparisons.
Proof. Lemma 2 shows that the recursion depth is limited by m − 1 (note that if m = 2^k then m − 1 = 2^0 + 2^1 + 2^2 + ··· + 2^(k−1) = 2^0 + 2^1 + 2^2 + ··· + 2^(log m − 1)). We group the recursion levels into k + 1 recursion groups, say recursion group 0, recursion group 1, ···, recursion group k, so that each recursion group i (i = 1, 2, ···, k) holds at most 2^(i−1) recursion levels (see Fig. 5). Until now m_j^i and n_j^i denoted the lengths of sequences merged on the i-th recursion level. From now on we change the meaning of the indexes so that m_j^i and n_j^i denote the lengths of sequences merged in the i-th recursion group. Then there are at most 2^i partitions in each recursion group i (i = 1, 2, ···, k), say (m_1^i, n_1^i), (m_2^i, n_2^i), ···, (m_{2^i}^i, n_{2^i}^i). Thus the number of comparisons for symmetric splitting within recursion group 0 is equal to ⌊log n⌋ + 1 ≤ ⌊log(m + n)⌋ + 1. For the recursion
[Fig. 5. Construction of recursion groups — the up to m − 1 recursion levels are grouped into recursion groups 0, 1, ..., k such that recursion group i contains at most 2^(i−1) levels, with partitions (m_j^i, n_j^i) inside each group]
group 1 we need max(log m_1^1, log n_1^1) + 1 + max(log m_2^1, log n_2^1) + 1 ≤ log(m_1^1 + n_1^1) + 1 + log(m_2^1 + n_2^1) + 1 comparisons, and so on. For recursion group i we need at most Σ_{j=1}^{2^i} log(m_j^i + n_j^i) + 2^i comparisons. Since for each recursion group i (i = 1, 2, ···, k) Σ_{j=1}^{2^i} (m_j^i + n_j^i) ≤ m + n, it holds that Σ_{j=1}^{2^i} log(m_j^i + n_j^i) + 2^i ≤ 2^i log((m + n)/2^i) + 2^i by Lemma 3. Note the following special case: if each merging of subsequences triggers two nonempty recursive calls, the recursion level becomes exactly k, and recursion groups and recursion levels are identical. In this case each i-th recursion level comprises 2^i (i = 0, 1, ..., k) subsequence mergings, and for each recursion group (level) i = 0, 1, ..., k it holds that Σ_{j=1}^{2^i} (m_j^i + n_j^i) = m + n. Therefore we need at most Σ_{j=1}^{2^i} log(m_j^i + n_j^i) + 2^i ≤ 2^i log((m + n)/2^i) + 2^i comparisons as well. So the overall number of comparisons for all k + 1 recursion groups is not greater than Σ_{i=0}^{k} (2^i + 2^i log((m + n)/2^i)) = 2^(k+1) − 1 + (2^(k+1) − 1) log(m + n) − Σ_{i=0}^{k} i·2^i. Since Σ_{i=0}^{k} i·2^i = (k − 1)·2^(k+1) + 2, the Splitmerge algorithm needs at most 2^(k+1) − 1 + (2^(k+1) − 1) log(m + n) − (k − 1)·2^(k+1) − 2 = 2^(k+1) log(m + n) − k·2^(k+1) + 2^(k+2) − log(m + n) − 3 = 2m(log((m + n)/m) + 2) − log(m + n) − 3 = O(m log(n/m + 1)) comparisons. ⊓⊔

Corollary 3. The Splitmerge algorithm is asymptotically optimal regarding the number of comparisons.

Regarding the sizes of merged sequences, Theorem 2 states Σ_{j=1}^{2^i} (m_j^i + n_j^i) ≤ m + n for all recursion groups i (i = 0, 1, ···, k). Hence, if we take the optimal rotation algorithm proposed in [2], we perform O(m + n) assignments on every recursion group. Because we have at most k + 1 recursion groups, the following theorem holds:

Theorem 3. The Splitmerge algorithm needs O((m + n) log m) assignments.
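For completeness, the closed form Σ_{i=0}^{k} i·2^i = (k − 1)·2^(k+1) + 2 used in the proof of Theorem 2 above can be verified by a short telescoping argument (our addition, not part of the original proof):

\[
S = \sum_{i=0}^{k} i\,2^{i}, \qquad
2S - S = \sum_{i=1}^{k+1} (i-1)\,2^{i} - \sum_{i=1}^{k} i\,2^{i}
       = k\,2^{k+1} - \sum_{i=1}^{k} 2^{i}
       = k\,2^{k+1} - \bigl(2^{k+1} - 2\bigr)
       = (k-1)\,2^{k+1} + 2 .
\]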
Table 1. Runtimes of different merging algorithms

n, m   i      St.-In-Pl.-Merge     Recmerge            Symmerge            Splitmerge
              #comp      te        #comp      te       #comp      te       #comp      te
2^23   2^24   25193982   6774      18642127   12864    21285269   11841    21986651   11587
2^21   2^22   6307320    1652      4660230    2457     5320495    2093     5496000    2128
2^19   2^20   1582913    395       1165009    402      1329813    359      1373814    349
2^19   2^16   1854321    406       962181     311      863284     241      837843     216
2^19   2^12   2045316    307       263410     289      196390     196      119072     187
2^19   2^8    1225279    97        38401      283      27917      164      11478      159
2^19   2^4    1146326    107       4409       276      1477       83       927        60
2^19   2^1    786492     34        687        556      55         16       91         14

n, m: lengths of input sequences (m = n);  i: number of different input elements;
te: execution time in ms;  #comp: number of comparisons
4 Experimental Work / Benchmarking
We did some benchmarking for the Splitmerge algorithm in order to get an impression of its practical value. We compared our algorithm with Dudzinsky and Dydek's Recmerge [2] algorithm, Kim and Kutzner's Symmerge [3] algorithm, and the asymptotically optimal in place merging algorithm proposed in [7]. For rotations we generally used the rotation algorithm proposed in [2], which is optimal with respect to the number of assignments. Table 1 contains a summary of our results. Each entry shows the mean value of 30 runs with different random data. We used a state-of-the-art hardware platform with 2 GHz processor speed and 512 MB main memory; all coding was done in the C programming language, and all compiler optimizations were switched off. The benchmarks show that Splitmerge can fully compete with Recmerge and Symmerge. Please note that, despite a slightly higher number of comparisons, our algorithm performs a bit better than its two sisters. This seems to be due to Splitmerge's simpler structure. The second column of Table 1 shows the number of different elements in both input sequences. Regarding their runtime, all algorithms can profit more or less from a decreasing number of different elements in the input sequences. However, the effect is particularly well visible with Splitmerge.
5 Conclusion
We presented a simply structured minimum storage merging algorithm called Splitmerge. Our algorithm relies on a novel binary partition technique called symmetric splitting and has a short implementation in pseudocode. It requires O(m log(n/m + 1)) comparisons and O((m + n) log m) assignments, so it is asymptotically optimal regarding the number of comparisons. Our benchmarking proved that it is of practical interest. During our benchmarking we observed that none of the investigated algorithms could claim any general superiority. We could always find input sequences
so that a specific algorithm performed particularly well or badly. Nevertheless, we could recognize criteria that indicated the superiority of a specific algorithm for specific inputs. For example, Splitmerge performs well if we have only few different elements in our input sequences. We plan more research on this topic in order to develop guidelines for a clever algorithm selection in the case of merging.
References
1. Knuth, D.E.: The Art of Computer Programming. Vol. 3: Sorting and Searching. Addison-Wesley (1973)
2. Dudzinski, K. and Dydek, A.: On a Stable Storage Merging Algorithm. Information Processing Letters 12 (1981) 5-8
3. Kim, P.S. and Kutzner, A.: Stable Minimum Storage Merging by Symmetric Comparisons. In: Albers, S., Radzik, T. (eds.), Algorithms – ESA 2004, Springer, Lecture Notes in Computer Science 3221 (2004) 714-723
4. Symvonis, A.: Optimal Stable Merging. Computer Journal 38 (1995) 681-690
5. Geffert, V., Katajainen, J., and Pasanen, T.: Asymptotically Efficient In-Place Merging. Theoretical Computer Science 237 (2000) 159-181
6. Chen, J.: Optimizing Stable In-Place Merging. Theoretical Computer Science 302 (2003) 191-210
7. Kim, P.S. and Kutzner, A.: On Optimal and Efficient in Place Merging. In: Wiedermann, J., Tel, G., Pokorný, J., Bieliková, M., Stuller, J. (eds.), SOFSEM 2006, Springer, Lecture Notes in Computer Science 3831 (2006) 350-359
8. Kronrod, M.A.: An Optimal Ordering Algorithm without a Field Operation. Dokladi Akad. Nauk SSSR 186 (1969) 1256-1258
9. Mannila, H. and Ukkonen, E.: A Simple Linear-Time Algorithm for in Situ Merging. Information Processing Letters 18 (1984) 203-208
10. Hwang, F. and Lin, S.: A Simple Algorithm for Merging Two Disjoint Linearly Ordered Sets. SIAM J. Comput. 1 (1972) 31-39
11. Cormen, T., Leiserson, C., Rivest, R., and Stein, C.: Introduction to Algorithms. 2nd edn. MIT Press (2001)
Generating High Dimensional Data and Query Sets⋆

Sang-Wook Kim1, Seok-Ho Yoon1, Sang-Cheol Lee1, Junghoon Lee2, and Miyoung Shin3

1 School of Information and Communications, Hanyang University
{wook, bogely, korly}@hanyang.ac.kr
2 Dept. of Computer Science and Statistics, Cheju National University
[email protected]
3 School of Electrical Engineering and Computer Science, Kyoungpook National University
[email protected]

⋆ This research was supported by the MIC, Korea, under the ITRC support program supervised by the IITA (IITA-2005-C1090-0502-0009).
Abstract. Previous research on multidimensional indexes has typically used synthetic data sets distributed uniformly or normally over multidimensional space for performance evaluation. These kinds of data sets hardly reflect the characteristics of multimedia database applications. In this paper, we discuss issues in generating high dimensional data and query sets for resolving this problem. We first identify the requirements of the data and query sets for fair performance evaluation of multidimensional indexes, and then propose HDDQ Gen (High-Dimensional Data and Query Generator), which satisfies such requirements. HDDQ Gen has the following features: (1) clustered distribution, (2) various object distributions in each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) query distribution depending on data distribution. Using these features, users are able to control the distribution characteristics of data and query sets appropriately for their target applications.
1 Introduction
The nearest neighbor query is one of the most frequently used operations in multimedia databases [1,2,3], aiming at finding the most similar objects in a database. In previous work, each object is considered a point in multidimensional vector space by extracting the features of objects, called feature vectors, such as colors, textures, and brightness [4]. The nearest neighbor query is defined as follows: For a given query point t and object points in multidimensional space, it finds the object point that has the closest Euclidean distance from t in the database [5]. For efficient processing of the nearest neighbor query, most existing methods employ a multidimensional index for fast retrieval of points in multidimensional space [5]. Even if it shows good performance in low dimensional applications
such as geographic information systems (GIS), it is known that the performance of a multidimensional index degrades seriously in higher dimensional applications such as multimedia applications [6,7], which is called the 'dimensionality curse'. Thus, many research efforts trying to devise better high dimensional indexes have been carried out to resolve the dimensionality curse problem. For fair performance evaluation of new indexes, it is necessary to conduct experiments with actual data and query sets used in target applications. When such data and query sets are not available at the time of building indexes, however, we may need to conduct experiments with standardized synthetic data and query sets. Obviously, the synthetic data and query sets should have features similar to the actual ones in such cases [8]. Previous work on the nearest neighbor query has typically used synthetic data sets distributed uniformly or normally over multidimensional space [5,6,7]. However, recent research results have shown that these kinds of data sets hardly reflect the characteristics of multimedia database applications, especially when the nearest neighbor query is performed in high dimensional space [9]. In this paper, we discuss issues in generating high dimensional data and query sets for resolving this problem. We first identify the requirements of the data and query sets that are appropriate for fair performance evaluation of nearest neighbor query processing with multidimensional indexes, and then propose HDDQ Gen (High-Dimensional Data and Query Generator), which satisfies such requirements. HDDQ Gen has the following features: (1) clustered distribution, (2) various object distributions in each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) query distribution depending on data distribution. With these features, users are able to control the distribution characteristics of data and query sets. Our contribution is fairly important in that HDDQ Gen provides a benchmark environment capable of evaluating multidimensional indexes properly. This paper is organized as follows: After presenting design requirements for the proposed HDDQ Gen in Section 2, we briefly explain the singular value decomposition (SVD) employed in HDDQ Gen in Section 3. Section 4 provides the HDDQ Gen algorithm and its implementation details. In addition, we show that users can control the distribution characteristics with HDDQ Gen by giving an example of the generated data and query sets. Finally, Section 5 summarizes and concludes this paper.
2 Design Requirements
In this section, we address the design requirements for HDDQ Gen, a generator of data and query sets for fair performance evaluation of high dimensional indexes.
• Object clustering: According to the analysis results in reference [9], the distance between the nearest and the farthest objects from a given query point decreases gradually with growing dimensionality. Particularly, for uniformly distributed data points whose dimensionality is more than 20, the
distance becomes very small and makes the nearest neighbor meaningless. This result is consistent with the result of reference [6]. Only in applications where objects are distributed in a set of clusters over the entire space is the nearest neighbor query meaningful [9]. Taking this situation into account, HDDQ Gen generates objects distributed in a set of clusters.
• Object distribution in clusters: In real applications, object distribution in each cluster varies greatly. Using HDDQ Gen, users are able to control the shape, size, and distribution of each cluster of objects to be generated.
• Cluster distribution: Cluster distribution is as varied in reality as object distribution in clusters. Using HDDQ Gen, users can also control the distribution characteristics of cluster centroids in multidimensional space.
• Correlations: Besides the uniform distribution assumption, the independence assumption is a basic one for analytically evaluating the performance of multidimensional indexes. Under this assumption, the objects are not mutually correlated in different dimensions. In practice, however, most data sets do have correlations among different dimensions, especially in high dimensional space [10]. To reflect such situations, HDDQ Gen controls all possible correlations between two different dimensions in each cluster.
• Query distribution: One of the mistakes usually made in evaluating the performance of high dimensional indexes is choosing a query distribution independent of the object distribution. However, in many real-world applications, queries are usually issued at points near the target objects. In particular, to make the nearest neighbor query meaningful in high dimensional space, query points should occur within or near the target object clusters [9]. To reflect this situation, HDDQ Gen considers the object distribution when distributing query points.
3 Singular Value Decomposition
As background for understanding HDDQ Gen, in this section we briefly give the definition of SVD (singular value decomposition) and discuss its implications for this work.
3.1 Definition of SVD
Suppose that, for a matrix X = (x_{i,j}) of size M × N, the column mean u_j is defined as follows:

u_j = (1/M) Σ_{i=1}^{M} x_{i,j},   1 ≤ j ≤ N
Also, let 1_M be a matrix of size M × M all of whose elements are 1's. By SVD, the matrix X − 1_M · u^T is represented as the product of U, S, and V, as in Eq. (1), where U is a column-orthonormal¹ matrix of size M × N, S is a diagonal matrix of size M × M, and V is a matrix of size M × M [11].
¹ For an identity matrix I, it holds that U^T · U = I.
X − 1_M · u^T = U · S · V^T    (1)
The covariance matrix C = (c_{i,j})² of X, having the size of M × M, can be represented as follows:

C = (1/M) · X^T · X − u^T · u = V · Λ · V^T,    (2)
where Λ is a diagonal matrix of size M × M, and Λ and V contain the eigenvalues and eigenvectors of C, respectively.
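As a concrete illustration (our own sketch using NumPy, not code from the paper), the decomposition of the centered data and the eigen-structure of the covariance matrix can be obtained as follows:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))       # 100 objects in 5-dimensional space

    u = X.mean(axis=0)                  # per-dimension column means
    Xc = X - u                          # centered data, i.e. X - 1*u^T

    # SVD of the centered matrix: Xc = U * diag(S) * Vt
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Rows of Vt form the new orthonormal axis system; S**2 / len(X)
    # tells how widely the objects spread along each new axis.
    C = np.cov(Xc, rowvar=False, bias=True)   # covariance matrix of the data
    evals = (S ** 2) / len(X)                 # eigenvalues of C obtained via SVD
    assert np.allclose(np.sort(evals), np.sort(np.linalg.eigvalsh(C)))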
3.2 Implications of SVD
X, shown above, denotes N objects in M-dimensional space, and the elements of C correspond to the covariances between two different dimensions of those objects. As in Eq. (2), SVD informs us of the underlying trends of the given object sets through the matrices V and Λ. The matrix V denotes a new axis system minimizing the correlation between the axes in the object distribution, while the matrix Λ carries the information about how widely the objects are distributed in each dimension of the new axis system. Fig. 1 shows objects distributed over two dimensional space. Each point denotes the position of an object in two dimensional space; x and y form the original axis system. Using SVD, we can obtain a new axis system consisting of x′ and y′ which is more suitable for this object distribution. Also, it is possible to obtain the information about how widely the objects are distributed in the new axis system consisting of x′ and y′.
Fig. 1. SVD implications
4 Proposed Method
In this section, HDDQ Gen is explained in detail, including the control parameters for HDDQ Gen, the algorithm for HDDQ Gen, and some examples of data and query sets generated by HDDQ Gen.
² It is defined as c_{i,j} = Σ_{k=1}^{N} (x_{k,i} × x_{k,j})/N − (Σ_{k=1}^{N} x_{k,i}/N) × (Σ_{k=1}^{N} x_{k,j}/N).
4.1 Control Parameters
• numDims: It denotes the number of dimensions of the data and query sets.
• numObjects: It denotes the total number of objects in the data sets and determines the size of a data set.
• struct numObjInCluster: It determines the number of objects in a cluster. For a given input (maximum, minimum), an arbitrary number of objects between the maximum and the minimum is assigned to each cluster. Thus, different numbers of objects are assigned to different clusters in a data set.
• struct objDistInCluster: It determines the distribution of objects in a cluster, which can be one of three: uniform, normal, or exponential. In the case of the uniform distribution, a pair (minimum, maximum) is taken as an input for each dimension, and the uniform distribution is generated over a randomly chosen range between the minimum and the maximum. In the case of the normal distribution, a pair (minimum, maximum) of the standard deviation is taken as an input for each dimension, and the normal distribution with mean 0 and standard deviation s is generated for a randomly chosen value s between the minimum and the maximum. In the case of the exponential distribution, a pair (minimum, maximum) of the mean is taken as an input for each dimension, and the exponential distribution with mean a is generated for a randomly chosen value a between the minimum and the maximum. In all three cases, the dimensions are mutually independent.
• struct clusterDist: It determines the distribution of cluster centroids, which can be picked among the three: uniform, normal, and exponential. That is, cluster centroids can be generated to have a uniform, normal, or exponential distribution in a way similar to struct objDistInCluster.
• queryRatio: It denotes the percentage of the number of query points relative to the number of all objects. For example, if it is 10, as many query points as 10% of the objects are generated.
• queryDist: It determines the distribution of query points; either 'independent' or 'dependent' can be taken as an input. 'Independent' means that query points are generated so as to be uniformly distributed in multidimensional space, independently of the object distribution. 'Dependent' means that query points are generated to have the same distribution as the objects. A sketch of such a parameter set follows below.
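For illustration only, a Python configuration mirroring these controls might look as follows; the parameter names and example values come from the paper (Section 4.3 and Table 1), while the dictionary layout is our assumption:

    hddq_params = {
        "numDims": 10,
        "numObjects": 1000,
        "numObjInCluster": {"min": 30, "max": 70},         # MC: many small clusters
        "objDistInCluster": {"type": "normal",             # uniform | normal | exponential
                             "stddev": {"min": 0.005, "max": 0.035}},  # SS range
        "clusterDist": {"type": "uniform"},                # distribution of centroids
        "queryRatio": 10,                                  # query points = 10% of objects
        "queryDist": "dependent",                          # follow the object distribution
    }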
4.2 HDDQ Gen Algorithm
This subsection explains the HDDQ Gen algorithm, shown below, in detail. HDDQ Gen takes the control parameters mentioned in Section 4.1 as inputs and generates object and query sets into dataFile and queryFile, respectively. The variables numObjInCluster and numQueriesInCluster are used only within the algorithm. HDDQ Gen produces a bunch of object and query points in a cluster until the required number of objects has been fully generated (Line 1). In Lines 2-5, the features of clusters are determined. Line 2 determines the number of objects
Algorithm HDDQ Gen
Input: numDims, numObjects, struct numObjInCluster, struct objDistInCluster, struct clusterDist, queryRatio, struct queryDist;
Output: dataFile, queryFile;
Local variables: numObjInCluster, numQueriesInCluster;
1.  while (numObjects > 0) {
2.    determine numObjInCluster, the number of objects in the cluster, using struct numObjInCluster;
3.    numQueriesInCluster = numObjInCluster * queryRatio;
4.    get centerPoint, the center point of the cluster, using struct clusterDist;
5.    determine the axis system for this cluster;
6.    while (numObjects > 0 && numObjInCluster > 0) {
7.      generate an object belonging to the cluster using struct objDistInCluster;
8.      adjust the object to the axis system of the cluster;
9.      shift the object so that all the objects in the cluster are centered around the centerPoint;
10.     output the object into dataFile;
11.     numObjects−−, numObjInCluster−−;
      }
12.   while (numQueriesInCluster > 0) {
13.     generate a query point belonging to the cluster using struct objDistInCluster;
14.     adjust the query point to the axis system of the cluster;
15.     shift the query point so that all the query points in the cluster are centered around the centerPoint;
16.     output the query point into queryFile;
17.     numQueriesInCluster−−;
      }
    }
in each cluster, and Line 3 determines the number of query points in proportion to the number of objects. Line 4 decides the locations of cluster centroids. These decisions are controlled by the user-specified control parameters. Line 5 determines the axis system so as to have correlations between different dimensions in each cluster. If the number of dimensions in the target space is M, this axis system consists of M orthonormal vectors. Since such vectors cannot be easily generated in a random fashion, we employ the SVD described in Section 3 for this purpose. That is, to generate M orthonormal vectors, we first randomly generate each element of an M × M matrix. This matrix serves as a virtual covariance matrix. By performing SVD on this matrix, we can obtain the result of (2), and this leads to a new axis system consisting of M orthonormal vectors. Lines 6-11 produce objects reflecting the features of clusters. First, Line 7 generates objects satisfying the user-specified characteristics of the distribution
given via struct objDistInCluster. Line 8 adjusts the objects to the new axis system determined in Line 5. Line 9 moves the adjusted objects to their new positions, taking into account the cluster centroids determined in Line 4. Finally, Line 10 outputs these objects into dataFile. In fact, Lines 8 and 9 can be described in a simple matrix form as follows:

y = A + x · B    (3)
Given an M-dimensional space, x is a matrix of 1 × M which includes the objects generated by Line 7. B is a matrix of M × M which is the axis system of the cluster generated by Line 5. A is a matrix of 1 × M which includes the cluster centroids given by Line 4. Finally, y is a matrix of 1 × M, which is the finally generated object. Lines 12-17 generate query points that have the features of clusters. The operating principle is the same as that for generating the objects. The query points are finally stored into queryFile.
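A compact sketch of this axis-system construction and of transformation (3), again our own NumPy illustration under the paper's description (random virtual covariance matrix, SVD, then y = A + x · B):

    import numpy as np

    def random_axis_system(M, rng):
        """Line 5: SVD of a random M x M 'virtual covariance' matrix
        yields M orthonormal vectors (the rows of Vt)."""
        virtual_cov = rng.normal(size=(M, M))
        _, _, Vt = np.linalg.svd(virtual_cov)
        return Vt                          # B in equation (3)

    rng = np.random.default_rng(42)
    M = 10
    B = random_axis_system(M, rng)         # cluster axis system (Line 5)
    A = rng.uniform(0.0, 1.0, size=M)      # cluster centroid (Line 4, uniform clusterDist)

    x = rng.normal(0.0, 0.02, size=M)      # one object around the origin (Line 7)
    y = A + x @ B                          # Lines 8-9 as equation (3)
    assert np.allclose(B @ B.T, np.eye(M)) # rows of B are orthonormal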
4.3 Examples
In this subsection, by providing examples of data and query sets generated by HDDQ Gen, we show that users can control the characteristics of data and query sets properly. Table 1 shows the control parameters employed.

Table 1. Control parameter settings for generating data and query sets

control parameters          assigned values
numDims                     10
numObjects                  1,000
struct numObjInCluster      MC, FC
struct objDistInCluster     normal(SS, LS)
struct clusterDist          uniform
queryRatio                  10
queryDist                   dependent
This is an example of generating 1,000 objects in 10-dimensional space. For struct numObjInCluster, the value MC (Many Clusters) is an option that adjusts the number of objects in a cluster to the range [30, 70], while FC (Few Clusters) adjusts it to [90, 210]. Also, for struct objDistInCluster, the normal distribution is taken. SS (Small Standard deviation) is an option that adjusts the standard deviation to [0.5%, 3.5%] of the entire range, and LS (Large Standard deviation) adjusts it to [3.5%, 7%] of the entire range. The cluster centroids are uniformly distributed over the multidimensional space. The number of query points is set to 10% of that of the objects, and the query points take the same distribution as the objects.
Fig. 2. Some examples of generating data sets
Figs. 2 and 3 show four different types of data and query sets that can be generated by taking different combinations of (MC, FC) and (SS, LS).³ First, let us look at the data sets. MC leads to many clusters consisting of a small number of objects, while FC leads to a few clusters of many objects. On the other hand, LS yields data sets distributed more widely than SS, and shows various correlations between two dimensions in each cluster. The parameter values can be set to make synthetic data sets reflect the distribution characteristics of real-world applications. Also, it can be seen that the query sets have characteristics similar to the data sets and successfully reflect the distribution characteristics of the objects.
³ Originally, 10-dimensional data and query sets were generated. For visualization, however, only two dimensions were chosen for projection onto 2-dimensional space.
Fig. 3. Some examples of generating query sets
5 Conclusions
In this paper, we pointed out that the synthetic data and query sets used in previous work did not correctly reflect actual situations when evaluating the performance of multidimensional indexes. We also discussed how to resolve this problem. This paper first identified the requirements of the data and query sets that are appropriate for fair performance evaluation of multidimensional indexes and nearest neighbor queries, and then proposed HDDQ Gen (High-Dimensional Data and Query Generator), capable of generating data and query sets that satisfy such requirements. HDDQ Gen can successfully control (1) clustered distribution, (2) various object distributions in each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) query distribution depending on data distribution. So, users are provided with various choices for the
distribution of target data and query sets. This paper is significant in that it provides the basis for correctly evaluating the performance of high dimensional indexes and nearest neighbor queries, as a benchmark environment reflecting the characteristics of the applications. Along with HDDQ Gen, we plan to provide good data sets which can be used for benchmarking in WWW environments by putting together various actual data and query sets used in multimedia applications.

Acknowledgment. Sang-Wook Kim would like to thank Jung-Hee Seo, Grace (Ju-Young) Kim, and Joo-Sung Kim for their encouragement and support.
References
1. Bohm, C., Berchtold, S., and Keim, D.: Searching in High-Dimensional Spaces — Index Structures for Improving the Performance of Multimedia Databases. ACM Computing Surveys 33 (2001) 322-373
2. Ogras, U. and Ferhatosmanoglu, H.: Dimensionality Reduction Using Magnitude and Shape Approximations. Proc. of the 12th Int'l. Conf. on Information and Knowledge Management (2003) 99-107
3. Jeong, S., Kim, S., Kim, K., and Choi, B.: An Effective Method for Approximating the Euclidean Distance in High-Dimensional Space. Int'l. Conf. on Database and Expert Systems Applications (2006) 863-872
4. Arya, M., et al.: QBISM: Extending a DBMS to Support 3D Medical Images. In: Proc. Int'l. Conf. on Data Engineering. IEEE (1994) 314-325
5. Berchtold, S., et al.: Fast Nearest Neighbor Search in High-Dimensional Space. In: Proc. Int'l. Conf. on Data Engineering. IEEE (1998) 209-218
6. Weber, R., Schek, H., and Blott, S.: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. Proc. Int'l. Conf. on Very Large Data Bases (1998) 194-205
7. Berchtold, S., Keim, D., and Kriegel, H.: The X-tree: An Index Structure for High-Dimensional Data. In: Proc. Int'l. Conf. on Very Large Data Bases (1996) 28-39
8. Zobel, J., Moffat, A., and Ramamohanarao, K.: Guidelines for Presentation and Comparison of Indexing Techniques. ACM SIGMOD Record 25 (1996) 10-15
9. Beyer, K., et al.: When Is Nearest Neighbor Meaningful? In: Proc. Int'l. Conf. on Database Theory (1998) 217-235
10. Kim, S., Aggarwal, C., and Yu, P.: Effective Nearest Neighbor Indexing with the Euclidean Metric. Proc. ACM Int'l. Conf. on Information and Knowledge Management (2001) 9-16
11. Jolliffe, I.: Principal Component Analysis. Springer-Verlag (1986)
Partial vs. Complete Domination: t-Dominating Set⋆

Joachim Kneis, Daniel Mölle, and Peter Rossmanith

Department of Computer Science, RWTH Aachen University, Germany
{kneis,moelle,rossmani}@cs.rwth-aachen.de

⋆ Supported by the DFG under grant RO 927/7-1.
Abstract. We examine the parameterized complexity of t-Dominating Set, the problem of finding a set of at most k nodes that dominate at least t nodes of a graph G = (V, E). The classic NP-complete problem Dominating Set, which can be seen to be t-Dominating Set with the restriction that t = n, has long been known to be W[2]-complete when parameterized in k. Whereas this implies W[2]-hardness for t-Dominating Set and the parameter k, we are able to prove fixed-parameter tractability for t-Dominating Set and the parameter t. More precisely, we obtain a quintic problem kernel and a randomized O((4 + ε)^t poly(n)) algorithm. The algorithm is based on the divide-and-color method introduced to the community earlier this year, is rather intuitive, and can be derandomized using a standard framework.
1 Introduction
The widely accepted P≠NP hypothesis implies that there are no polynomial-time algorithms for any NP-hard problem. Nevertheless, many of these problems arise and need to be dealt with in everyday applications. This dilemma has led to several noteworthy concepts such as randomization or approximation, which soften the classical notion of intractability as inspired by NP-completeness theory. Parameterized complexity [6] constitutes another remarkable means of relaxing the worst-case analysis for NP-hard problems. The underlying idea of parameterized complexity lies in investigating the hardness of a problem with respect to a parameter, for example the size of the solution or some measurable quantity of the instance. Many problems turn out to be fixed-parameter tractable (FPT), meaning that they can be solved by an O(f(k) poly(n)) algorithm, where k is the parameter and f a function. On the negative side, there are good reasons to believe that a problem is not in FPT when it turns out to be hard for certain other parameterized complexity classes such as W[1] or W[2]. An intuitive interpretation of W[1]-hardness is that the problem in question cannot be tackled even for small values of the parameter. We refer the reader to the monograph by Downey and Fellows for a detailed explanation of these concepts [6].
Let G = (V, E) be a simple undirected graph. We say that a node d ∈ V dominates a node v ∈ V if either d = v or {d, v} ∈ E, and that a node c ∈ V covers an edge e ∈ E if c ∈ e. A dominating set for G is a subset D ⊆ V of nodes that dominate all of V, and a vertex cover is a subset C ⊆ V of nodes that cover all of E. The corresponding decision problems Dominating Set and Vertex Cover ask for a size-k dominating set or vertex cover, respectively. Both are classical, well-known NP-complete problems. In terms of parameterized complexity and for the parameter k, Dominating Set is W[2]-complete, whereas Vertex Cover allows for a very efficient FPT algorithm whose running time is bounded by O(1.2738^k + kn) [4]. In the same way that Dominating Set and Vertex Cover ask for node subsets dominating all the nodes or covering all the edges of a graph, many problems regard the complete satisfaction of certain constraints. This leads to the question of how asking for only partial satisfaction affects the complexity, which has received a lot of attention as of late [3,7,8,9]. We will refer to the partial-satisfaction variants of Dominating Set and Vertex Cover as t-Dominating Set and t-Vertex Cover. Given a graph G = (V, E) and numbers k, t ∈ ℕ, t-Dominating Set asks for a size-k subset of nodes that dominate at least t nodes, and t-Vertex Cover asks for a size-k subset of nodes that cover at least t edges. Note that t > k in all interesting cases. When investigating the parameterized complexity of t-Vertex Cover and t-Dominating Set, both k and t constitute interesting parameters. A rather intuitive reduction from Independent Set suffices to prove that t-Vertex Cover is W[1]-hard when parameterized in k [8]. The case of t-Dominating Set is obvious: If k is the only parameter, then Dominating Set can be reduced to t-Dominating Set by setting t = n. This implies that t-Dominating Set is even W[2]-hard for the parameter k. In the case that t is chosen as the parameter, positive results abound. Cai, Chan and Chan have applied the new random separation method [3] to several partial-satisfaction problems, obtaining a randomized O(4^t tn + m) algorithm for t-Vertex Cover, where n = |V| and m = |E| as usual. The complexity can be improved to O(2.0911^t n(n + m)k) via another, entirely different approach [12]. The random separation method is notably elegant: Consider a graph problem that consists in looking for a certain subgraph. The basic idea is to color the nodes in red and green randomly, hoping that the nodes of the subgraph in question will be green and surrounded by red neighbors. If the size of the subgraph and its neighborhood is bounded by f(k), the chance of obtaining a helpful random coloring is 2^−f(k). That is, if checking for the desired subgraph is possible in polynomial time, a randomized O(2^f(k) poly(n)) algorithm with exponentially small error probability can be constructed. Surprisingly, it seems that the random separation method cannot be used to design an algorithm for t-Dominating Set: Even if all the k nodes of a solution are colored green and all the surrounding nodes are colored red, there may be many green components that are interdependent in so far as they have dominated surrounding nodes in common. Only if the degree is bounded by d,
an O(2^(td+td²) · tm) algorithm can be obtained [3]. Note that d can be as large as t − 2 for nontrivial instances on general graphs. These obstacles and the fact that Dominating Set is W[2]-hard may create the first impression that t-Dominating Set constitutes an intractable problem when parameterized in t. Fortunately, however, it turns out that the divide-and-color method [5,11] can be applied. This method can be seen as an improvement on the well-known color-coding technique [2], allowing for even faster FPT algorithms for many interesting graph-packing problems. The crucial idea of the divide-and-color paradigm consists in combining the power of random colorings with a recursive halving of the problem size. When looking for a k-node path in a graph, for example, we may recursively color the graph in black and white, hoping that the first and second half of an actual k-node path will be colored all black and all white, respectively. Trying 3 · 2^k random colorings per recursive call results in an O(4^k poly(n)) algorithm with constant error probability, which can easily be amplified to become exponentially small. In this paper, we present a problem kernel and a non-trivial application of the divide-and-color method for t-Dominating Set. The problem kernel has size O(t^5/k), and the resulting randomized FPT algorithm has a running time of O((4 + ε)^t poly(n)). Moreover, the algorithm is rather intuitive: The underlying idea of dividing the task of finding a t-dominating set into two tasks of finding a t/2-dominating set in appropriate subgraphs is natural. We also avoid complex case distinctions or similar constructs. In most cases, such building blocks have the sole purpose of easing the worst-case analysis for certain special cases. Even worse, they often make the algorithm harder to implement or verify—and possibly slower. It is a particular strength of the divide-and-color method to aid us in designing algorithms that are free from counterintuitive artifacts. This design pattern has received increasing attention in recent years [4,10,13].
2 A Problem Kernel for t-Dominating Set
Kernelization constitutes an important concept in the field of parameterized complexity. Given a parameterized problem, the idea is to map any instance (I, k) to a new instance (I′, k′), where both k′ and the size of I′ are bounded by two functions in k, and (I′, k′) is a yes-instance iff (I, k) is a yes-instance. Vertex Cover, for instance, allows for a kernelization that maps any instance (G, k) to an equivalent instance (G′, k′) such that k′ ≤ k and |E′| ≤ k² for G′ = (V′, E′): Firstly, nodes of degree greater than k need to be included in any k-node vertex cover. Secondly, if the remaining graph of degree at most k contains more than k² edges, it is impossible to cover all of them by any k nodes. The problem kernel developed in this section is based on two intuitive observations. Firstly, a graph with an appropriately large number of high-degree nodes should allow for some set of k nodes dominating at least t nodes. Secondly, if the graph has a large number of low-degree nodes, some of them should be interchangeable, allowing for the removal of redundant parts of the graph.
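A sketch of this classical Vertex Cover kernelization (our own illustration; the graph is given as a dict mapping each vertex to the set of its neighbors):

    def vc_kernel(adj, k):
        """Reduce (G, k) for Vertex Cover: force high-degree vertices into the
        cover, then check the k^2 bound on the remaining edges.
        Returns (forced, reduced_adj, remaining_budget) or None for a no-instance."""
        adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
        forced = set()
        changed = True
        while changed:
            changed = False
            for v in list(adj):
                if len(adj[v]) > k - len(forced):     # v must be in every small cover
                    forced.add(v)
                    for w in adj[v]:
                        adj[w].discard(v)
                    adj[v] = set()
                    changed = True
        m = sum(len(ns) for ns in adj.values()) // 2  # edges left after the rule
        k_rest = k - len(forced)
        if k_rest < 0 or m > k_rest * k_rest:         # too many edges: no-instance
            return None
        return forced, adj, k_rest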
Definition 1. Let G = (V, E) be a graph and V′ ⊆ V. We define:
– N(v) := { w ∈ V | {v, w} ∈ E },
– N[v] := N(v) ∪ {v},
– N[V′] := ∪_{v′∈V′} N[v′],
– N(V′) := N[V′] \ V′,
– N²[V′] := N[N[V′]] and N³[V′] := N[N²[V′]],
– G[V′] is the subgraph of G induced by V′, and
– V′[a, b] := { v ∈ V′ | a ≤ deg_G(v) ≤ b }.
Theorem 1. Any instance (G, k, t) of t-Dominating Set can be reduced to a kernel of size t^5/k + t^3/k² = O(t^5/k) in polynomial time.

Proof. It is safe to assume that the maximum degree ∆(G) of G = (V, E) satisfies 2 ≤ ∆(G) ≤ t − 2, because the problem can be solved (and, of course, kernelized) very easily otherwise. We may also assume that |V| ≥ t^5/k + t^3/k², because otherwise the instance already has the desired kernel size. Finally, we have k ≤ t in all interesting cases. Let Vhi := V[t/k, t − 2]. Since G has maximum degree at most t − 2, the maximum number of nodes in N²[v] for any v ∈ V is bounded by 1 + (t − 2) + (t − 2)(t − 3) = t² − 4t + 5 ≤ t². Consequently, if |Vhi| > (k − 1)t², then G contains at least k nodes of degree at least t/k whose pairwise distance is three or more. In this case, (G, k) constitutes a yes-instance: Two nodes with distance three or more cannot have dominated vertices in common, and k nodes each dominating at least t/k other vertices clearly constitute a t-dominating set. The input may thus be replaced by a trivial yes-instance in this case. Otherwise, let V1 := N²[Vhi] and construct a node set V2 by choosing t²/k many nodes of the highest degrees possible from V \ V1. Note that after picking any set S of k − 1 nodes from V2, it is always possible to pick a k-th node whose distance to any of the nodes from S is at least three. We are now going to prove that G′ := G[N[V1] ∪ N[V2]] has a t-dominating set of size k if and only if G does. It is obvious that any t-dominating set for G′ also constitutes a t-dominating set for G. For the other direction, let D be a t-dominating set in G. We transform D into a t-dominating set D′ of the same size for G′ according to the following case distinction for each v ∈ D: If v ∈ V1 ∪ V2, then it remains in D′. Otherwise, if v ∉ V1 ∪ V2, then deg(v) ≤ deg(w) for all w ∈ V2. Since D ∩ V2 contains at most k − 1 nodes, there is a node w ∈ V2 whose distance to any of the nodes from D ∩ V2 is at least three. Using such a node w instead of v in D′ cannot decrease the number of dominated nodes. It remains to estimate the size of G′. Because the maximum degree is bounded by t − 2 and |Vhi| is bounded by (k − 1)t², we get |N[Vhi]| < kt³. Similarly, since the maximum degree of nodes not contained in Vhi is bounded by t/k, we also get |N[V1]| = |N³[Vhi]| < t^5/k. On the other hand, |V2| = t²/k by construction, and this implies |N[V2]| ≤ t³/k². Hence, (G′, t, k) constitutes a problem kernel of the desired size.
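The first step of this kernelization, finding far-apart high-degree nodes, can be sketched as follows (our own illustration; adjacency as a dict of sets). The greedy blocking of N²[v] guarantees that any two chosen nodes have distance at least three, so a positive answer is sound:

    def greedy_far_high_degree(adj, k, t):
        """If k nodes of degree >= t/k exist at pairwise distance >= 3,
        the instance is a yes-instance of t-Dominating Set."""
        high = [v for v in adj if len(adj[v]) >= t / k]
        chosen, blocked = [], set()
        for v in high:
            if v in blocked:
                continue
            chosen.append(v)
            if len(chosen) == k:
                return True                # k nodes, pairwise distance >= 3
            # Block N^2[v]: every node within distance 2 of v.
            blocked.add(v)
            for w in adj[v]:
                blocked.add(w)
                blocked.update(adj[w])
        return False

By the counting argument in the proof, each chosen node blocks at most t² others, so whenever |Vhi| > (k − 1)t² this greedy pass necessarily succeeds.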
3 A Randomized Algorithm for t-Dominating Set
From an intuitive perspective, it may seem that the divide-and-color method [11] can be applied to t-Dominating Set in a straightforward fashion: After randomly coloring the nodes in black and white, it may seemingly suffice to find t/2-dominating sets in the black and the white part whose combined size does not exceed k. The resulting algorithm could be proven to have a running time of O(4^t poly(n)). Unfortunately, it would also be incorrect. This is because it might be impossible to split a t-dominating set D into two subsets D1 ∪̇ D2 = D that dominate about t/2 nodes each—see Figure 1 for a small example.
Fig. 1. Consider this graph and k = 2, t = 10. The solution is unique and unbalanced: No matter which coloring of the nodes in black and white we choose, there are no t/2-dominating sets in the black and the white subgraph.
A first approach to fixing this problem, of course, could consist in solving subproblems of unbalanced size. However, this would lead to prohibitively large running times, because the problem sizes of the larger subproblems may not decrease sufficiently fast. Nevertheless, the divide-and-color approach works very well if we handle such unbalanced solutions in an appropriate way. In order to measure the balancedness of a solution in a formal fashion, we introduce the notion of α-balance:

Definition 2. Let G = (V, E) be a graph and D a t-dominating set. We call D α-balanced iff there are partitions D1 ∪̇ D2 = D and X1 ∪̇ X2 = N[D] with ⌊t/2⌋ − ⌊αt⌋ ≤ |X1| ≤ ⌈t/2⌉ + ⌊αt⌋ and X1 ⊆ N[D1], X2 ⊆ N[D2]. We call X1 and X2 balanced halves of N[D].

For instance, the graph in Figure 1 is 1/10-balanced. The key observation for dealing with unbalanced solutions is that the lack of balance is caused by very few nodes of high degree (as illustrated in Figure 2). If there is no α-balanced t-dominating set in a yes-instance, then it turns out that some ⌈1/(2α)⌉ nodes constitute a t/2-dominating set (this fact will be detailed in the upcoming proof). Algorithm TDS (Table 1) handles both the balanced and the unbalanced case by checking whether a few nodes constitute a t/2-dominating set.
Fig. 2. Balancing numbers: unbalancedness requires large numbers
Table 1. The randomized algorithm (TDS). The global constant α can be used to tweak the exponential vs. the polynomial factors in the running time.
TDS(G, t):
  if there is a t-dominating set of size at most β := ⌈1/(2α)⌉ then
    deterministically compute the size s of a minimum t-dominating set;
    return s;
  fi;
  if V = ∅ then return ∞; fi;
  k_opt := ∞;
  for 4 · 2^t times do
    choose some V′ ∈ 2^V with uniform probability;
    // Unbalanced part:
    find an $A \in \binom{V'}{\beta}$ dominating a maximum number t∗ of nodes in G[V′];
    if t∗ > ⌈t/2⌉ + ⌊αt⌋ then
      s₂ := TDS(G[V \ V′], t − t∗);
      if β + s₂ < k_opt then k_opt := β + s₂; fi;
    fi;
    // Balanced part:
    for t′ from 0 to ⌊αt⌋ do
      s₁ := TDS(G[V′], ⌈t/2⌉ + t′);
      s₂ := TDS(G[V \ V′], ⌊t/2⌋ − t′);
      if s₁ + s₂ < k_opt then k_opt := s₁ + s₂; fi;
    endfor;
  endfor;
  return k_opt;
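A minimal Python sketch of the randomized scheme follows, under simplifying assumptions: only the balanced branch of TDS is modeled, the unbalanced branch and the amplification constants of Table 1 are omitted, and the hypothetical helper min_t_dominating_set_size from the previous sketch is reused as the brute-force base case. It conveys the recursion structure only, not the exact algorithm.

    import random

    def induced(G, W):
        """Induced subgraph of G on the node set W (adjacency-dict representation)."""
        return {v: G[v] & W for v in W}

    def tds_balanced(G, t, beta, reps=64):
        """Randomized estimate of the minimum size of a t-dominating set.
        Only the balanced branch of TDS is modeled; reps caps the iterations."""
        if t <= 0:
            return 0
        if t <= beta or len(G) <= 2 * beta:
            s = min_t_dominating_set_size(G, t)      # brute-force base case
            return s if s is not None else float('inf')
        best = float('inf')
        for _ in range(reps):                        # 4 * 2**t in the real algorithm
            Vp = {v for v in G if random.random() < 0.5}   # random two-coloring
            s1 = tds_balanced(induced(G, Vp), (t + 1) // 2, beta, reps)
            s2 = tds_balanced(induced(G, set(G) - Vp), t // 2, beta, reps)
            best = min(best, s1 + s2)
        return best

Since domination inside an induced subgraph remains valid in G, the returned value is always an upper bound on the optimum, and it matches the optimum with high probability whenever a balanced solution exists.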
Proof. Observe that the algorithm returns ∞ if t > n, and that it cannot return a number k if there is no solution of size k. It remains to show that the algorithm returns optimal values with sufficient probability. More precisely, we employ induction to show that it returns the size of a minimum t-dominating set with probability at least 1/2.
For t ≤ β, the algorithm finds the correct number by brute force: if there is a solution of size k ≤ β, this brute-force approach will consider it. The other case is t > β. We are going to investigate two subcases: either there is an α-balanced solution or there is not.
Assume there is an α-balanced minimum t-dominating set Opt and X = N[Opt] with balanced halves $X_1$ and $X_2$. Let furthermore C := (V′, V \ V′) be a random two-coloring of V. The probability that $X_1 \subseteq V'$ and $X_2 \subseteq V \setminus V'$
is $2^{-|X|}$. Since $s_1$ and $s_2$ are simultaneously correct with probability 1/4 by induction, the success probability of a single iteration is at least $2^{-t} \cdot 1/4$. Amplifying this probability by $4 \cdot 2^t$ repetitions results in an overall success probability of at least 1/2, because
$1 - (1 - 2^{-t} \cdot 2^{-2})^{4\cdot 2^t} \ge 1 - e^{-1} > \tfrac12.$
In the second subcase there is no balanced optimal solution, but there is a minimum t-dominating set Opt with certain properties. Let X = N[Opt] and choose any partition $O_1 \mathbin{\dot\cup} O_2 = Opt$. Since Opt is not balanced, there is no partition $X_1 \mathbin{\dot\cup} X_2 = X$ with
$\lfloor t/2\rfloor - \lfloor\alpha t\rfloor \le |X_1| \le \lceil t/2\rceil + \lfloor\alpha t\rfloor$ and $X_1 \subseteq N[O_1]$, $X_2 \subseteq N[O_2]$.
Let $d : Opt \to \mathcal{P}(N[Opt])$ be a mapping such that $d(v) \cap d(v') = \emptyset$ for any two different nodes $v, v' \in Opt$, $v \in d(v)$, $\sum_{v\in Opt}|d(v)| = t$, and $d(v) \subseteq N[v]$. Then every partition $O_1 \mathbin{\dot\cup} O_2 = Opt$ defines an induced coloring $X_{O_1}, X_{O_2}$ of N[Opt] by $X_{O_1} = \bigcup_{v\in O_1} d(v)$. Let $v_1, \dots, v_\beta$ be the β nodes in Opt with highest $|d(v_i)|$ and set $O' = \{v_1, \dots, v_\beta\}$. Since Opt is not α-balanced, partitioning Opt into $O'$, $Opt \setminus O'$ and using the induced coloring $X_{O'}, X_{Opt\setminus O'}$ yields $|X_{O'}| < \lfloor t/2\rfloor - \lfloor\alpha t\rfloor$ or $|X_{O'}| > \lceil t/2\rceil + \lfloor\alpha t\rfloor$. If $|X_{O'}| < \lfloor t/2\rfloor - \lfloor\alpha t\rfloor$, then we have
$|d(v)| \le \frac{\lfloor t/2\rfloor - \lfloor\alpha t\rfloor}{\beta} \le \frac{t/2}{\beta}$
for any $v \in Opt \setminus O'$. In this case, however, it is always possible to move nodes from $Opt \setminus O'$ to $O'$ in order to obtain an α-balanced t-dominating set using the induced coloring, because $\frac{t/2}{\beta} \le \alpha t$. Therefore we have $|X_{O'}| > \lceil t/2\rceil + \lfloor\alpha t\rfloor$. If Algorithm TDS finds exactly the induced coloring, it will correctly compute the domination of O′ and, with probability 1/2, the correct result for $Opt \setminus O'$. Consequently, the success probability is at least
$1 - (1 - 2^{-t} \cdot \tfrac12)^{4\cdot 2^t} \ge 1 - e^{-2}.$
Lemma 2. Let $0 < \alpha \le 1/25$. The number $T_t$ of recursive calls issued by Algorithm TDS is bounded by $4^{(1+\alpha)t} \cdot t^6$.
Proof. Consider the pseudo code from Table 1. We obtain the recurrence
$T_t \le 4\cdot 2^t\Bigl(T_{\lfloor t/2\rfloor} + \sum_{t'=0}^{\lfloor\alpha t\rfloor}\bigl(T_{\lceil t/2\rceil + t'} + T_{\lfloor t/2\rfloor - t'}\bigr)\Bigr) \le 8\cdot 2^t \sum_{t'=-\lfloor\alpha t\rfloor}^{\lfloor\alpha t\rfloor} T_{\lceil t/2\rceil + t'}.$
Now employ induction to prove the bound from the statement of the lemma. Applying the induction hypothesis yields the upper bound
$8\cdot 2^t \sum_{t'=-\lfloor\alpha t\rfloor}^{\lfloor\alpha t\rfloor} 4^{\lceil t/2\rceil + t'} \cdot (t/2 + \alpha t)^6.$
The fact that $\sum_{i=0}^{l} 4^i \le \frac43 \cdot 4^l$ implies
$T_t \le 32/3 \cdot 2^t\, 4^{\lceil t/2\rceil + \lfloor\alpha t\rfloor} \cdot (t/2 + \alpha t)^6.$
It is now easy to prove the claim using standard calculus:
$T_t \le 32/3 \cdot 4^{t/2}\, 4^{\lceil t/2\rceil + \lfloor\alpha t\rfloor} \cdot (t/2 + \alpha t)^6 \le 32/3 \cdot 4^{t+1+\alpha t} \cdot (t/2 + \alpha t)^6 = 128/3 \cdot (1/2 + \alpha)^6 \cdot t^6\, 4^{(1+\alpha)t} \le t^6\, 4^{(1+\alpha)t}.$
Theorem 2. Let $0 < \alpha \le 1/25$. t-Dominating Set can be solved with exponentially small error probability in time
$O\bigl((4 + 6\alpha)^t \cdot t^6 \cdot n^{\lceil 1/(2\alpha)\rceil + 1}\bigr).$
Proof. By Lemma 1, Algorithm TDS returns the size of a minimum t-dominating set with probability at least 1/2 in time $O(4^{(1+\alpha)t}\, t^6\, n^\beta)$. Hence a linear number of repetitions suffices to obtain exponentially small error probability. To see that $4^{1+\alpha} \le 4 + 6\alpha$ for $0 < \alpha \le 1/25$, note that the Taylor series of the former term is $4 + 4\ln(4)\alpha + O(\alpha^2)$. Therefore, the number of recursive calls is bounded by $(4+6\alpha)^t\, t^6$. Each call takes time $O(n^\beta)$, resulting in the above runtime bound.
In order to get rid of the polynomial in n, we can simply apply the kernelization from the previous section. This can be helpful, because choosing very small values of α results in a high-degree polynomial factor.
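The inequality $4^{1+\alpha} \le 4 + 6\alpha$ invoked in the proof is easy to confirm numerically; a quick illustrative check (our own, not part of the paper):

    # Verify 4**(1 + a) <= 4 + 6*a on a grid of alpha values in (0, 1/25].
    for i in range(1, 101):
        a = (1 / 25) * i / 100
        assert 4 ** (1 + a) <= 4 + 6 * a, a
    # At alpha = 1/25: 4**1.04 is about 4.228, which is below 4 + 0.24 = 4.24.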
4 Derandomization
In order to see how the above algorithm can be derandomized, let us review its usage of random bits. Randomly coloring an n-node graph with two colors requires n random bits. The coloring is helpful as soon as the t nodes in the closed neighborhood X = N[D] of some minimum t-dominating set D are assigned appropriate colors; this happens with probability at least $2^{-|X|}$.
In order to make failure impossible, we have to cycle through a family of colorings deterministically. Doing so will succeed when we make sure that every possible coloring of X is hit at least once. Since we do not know X, we need to hit every coloring of every set of size |X| = t at least once. Fortunately, this
problem has already been addressed by Alon et al., who investigated the concept of (almost) k-wise independence [1]. A set $F \subseteq \{0,1\}^n$ is called k-wise independent if, for $x_1 \dots x_n$ chosen uniformly from F and any positions $i_1 < \dots < i_k$, the probability that $x_{i_1} \dots x_{i_k}$ equals any given k-bit string y is exactly $2^{-k}$. It is called (ε, k)-wise independent if this probability is at least $2^{-k} - \varepsilon$ and at most $2^{-k} + \varepsilon$. Therefore, a $(2^{-t-1}, t)$-wise independent set F guarantees that any coloring of any size-t subset appears with probability at least $2^{-t-1}$. Thus, derandomization can be achieved by cycling through all elements of F. Moreover, we can employ a theorem by Alon et al. that enables us to construct such a $(2^{-t-1}, t)$-wise independent set F of size $O(4^t t^2 \log^2 n)$ in $O(4^t\,\mathrm{poly}(n))$ time. Rather than cycling through $O(2^t)$ random colorings as in Algorithm TDS, it then suffices to go through the $O(4^t t^2 \log^2 n)$ members of F. Converting Algorithm TDS into a deterministic one thus increases the runtime bound to $O((16+\varepsilon)^t\,\mathrm{poly}(n))$, where ε can be made arbitrarily small at the expense of larger and larger polynomial factors.
Proposition 1. t-Dominating Set with parameter t is in FPT.
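To see concretely what the deterministic family must provide, the following toy sketch enumerates all 2^n colorings of an n-node set for tiny n; this trivially hits every coloring of every t-subset, at a cost exponential in n rather than in t. The efficient almost-t-wise independent construction of Alon et al. replaces this exhaustive family; the code below is illustrative only, and all names are our own.

    from itertools import product

    def exhaustive_colorings(nodes):
        """All black/white colorings of `nodes`; every coloring of every subset is hit."""
        for bits in product((0, 1), repeat=len(nodes)):
            yield {v for v, b in zip(nodes, bits) if b}   # the set of black nodes

    # Sanity check: for the pair {'a', 'b'} out of 4 nodes, all 4 colorings occur.
    nodes = ['a', 'b', 'c', 'd']
    seen = {(x, y): set() for x in (0, 1) for y in (0, 1)}
    for black in exhaustive_colorings(nodes):
        seen[('a' in black) * 1, ('b' in black) * 1].add(frozenset(black))
    assert all(seen.values())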
5 Conclusion
We obtained an $O(t^5/k)$ problem kernel and a randomized $O((4+\varepsilon)^t\,\mathrm{poly}(n))$ algorithm for t-Dominating Set. The algorithm can be derandomized to yield a deterministic $O((16+\varepsilon)^t\,\mathrm{poly}(n))$ algorithm.
Comparing the complexity of t-Dominating Set and t-Vertex Cover reveals some curious characteristics. While Dominating Set is much harder than Vertex Cover in terms of parameterized complexity, the respective partial-satisfaction variants are both fixed-parameter tractable. In other words, switching to the parameter t yields a positive result that is surprising considering the W[2]-hardness of Dominating Set.
References
1. Alon, N., Goldreich, O., Håstad, J., and Peralta, R.: Simple Constructions of Almost k-Wise Independent Random Variables. Journal of Random Structures and Algorithms 3 3 (1992) 289–304
2. Alon, N., Yuster, R., and Zwick, U.: Color-Coding. Journal of the ACM 42 4 (1995) 844–856
3. Cai, L., Chan, S.M., and Chan, S.O.: Random Separation: A New Method for Solving Fixed-Cardinality Optimization Problems. In Proc. of 2nd IWPEC, Springer, LNCS 4169 (2006)
4. Chen, J., Kanj, I.A., and Xia, G.: Simplicity is Beauty: Improved Upper Bounds for Vertex Cover. Technical Report TR05-008, School of CTI, DePaul University (2005)
5. Chen, J., Lu, S., Sze, S., and Zhang, F.: Improved Algorithms for Path, Matching, and Packing Problems. In Proc. of SODA 2007, to appear
6. Downey, R.G. and Fellows, M.R.: Parameterized Complexity. Springer-Verlag (1999)
7. Gandhi, R., Khuller, S., and Srinivasan, A.: Approximation Algorithms for Partial Covering Problems. In Proc. of 28th ICALP, Springer, LNCS 2076 (2001) 225–236
8. Guo, J., Niedermeier, R., and Wernicke, S.: Parameterized Complexity of Generalized Vertex Cover Problems. In Proc. of 9th WADS, Waterloo, Canada, Springer, LNCS 3608 (2005) 36–48
9. Halperin, E. and Srinivasan, R.: Improved Approximation Algorithms for the Partial Vertex Cover Problem. In Proc. of 5th APPROX, Springer, LNCS 2462 (2002) 185–199
10. Kneis, J., Mölle, D., Richter, S., and Rossmanith, P.: Algorithms Based on the Treewidth of Sparse Graphs. In Proc. of 31st WG, Springer, LNCS 3787 (2005) 385–396
11. Kneis, J., Mölle, D., Richter, S., and Rossmanith, P.: Divide-and-Color. In Proc. of 32nd WG, Springer, LNCS 4271 (2006)
12. Kneis, J., Mölle, D., Richter, S., and Rossmanith, P.: Intuitive Algorithms and t-Vertex Cover. In Proc. of 17th ISAAC, Springer, LNCS (2006) to appear
13. Schöning, U.: A Probabilistic Algorithm for k-SAT and Constraint Satisfaction Problems. In Proc. of 40th FOCS (1999) 410–414
Estimates of Data Complexity in Neural-Network Learning
Věra Kůrková
Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 2, Prague 8, Czech Republic
[email protected]
Abstract. Complexity of data with respect to a particular class of neural networks is studied. Data complexity is measured by the magnitude of a certain norm of either the regression function induced by a probability measure describing the data or a function interpolating a sample of input/output pairs of training data chosen with respect to this probability. The norm is tailored to a type of computational units in the network class. It is shown that for data for which this norm is “small”, convergence of infima of error functionals over networks with increasing number of hidden units to the global minima is relatively fast. Thus for such data, networks with a reasonable model complexity can achieve good performance during learning. For perceptron networks, the relationship between data complexity, data dimensionality and smoothness is investigated.
1 Introduction
The goal of supervised learning is to adjust the parameters of a neural network so that it approximates, with sufficient accuracy, a functional relationship between inputs and outputs known only through a sample of empirical data (input-output pairs). Many learning algorithms (such as back-propagation [21], [6]) iteratively decrease the average of squared errors on a training set. Theoretically, such learning is modeled as minimization of error functionals defined by data: the expected error is determined by data in the form of a probability measure, and the empirical error by a discrete sample of data chosen with respect to this measure (see, e.g., [20], [5]).
In most learning algorithms, either the number of network computational units is chosen in advance or it is dynamically allocated, but in both cases it is constrained. The speed of decrease of infima of error functionals over networks with an increasing number of computational units can thus play the role of a measure of complexity of data with respect to a given type of computational units (such as perceptrons with a given activation function, or radial or kernel units with a given kernel).
In this paper, we investigate data complexity with respect to a class of networks for the data defining the error functionals: a probability measure ρ and a sample of input-output pairs z = {(uᵢ, vᵢ) | i = 1, ..., m}. We derive an upper bound on
the speed of decrease of infima of error functionals over networks with n hidden units depending on a certain norm, tailored to the type of hidden units, of either the regression function defined by the probability measure describing the data or its discrete approximation in the form of a function interpolating a sample of input-output pairs of training data. We show that the speed of decrease of these infima is bounded from above by 1/n times the square of this norm. Thus over networks with the number of hidden units n greater than 1/ε times the square of this norm, infima of error functionals are within ε of their global minima. We propose to characterize data complexity by the magnitude of this norm of the regression or an interpolating function.
For perceptron networks, we investigate the relationship between data complexity, smoothness and dimensionality. We estimate the norm tailored to perceptrons by the product of a function k(d) of the data dimension d (which decreases exponentially fast to zero) and a Sobolev seminorm of the regression or an interpolating function, defined as the maximum of the $L^1$-norms of the partial derivatives of order d. This estimate shows that for perceptron networks with increasing input dimensionality, the tolerance on the smoothness of the training data (measured by the Sobolev seminorm of the regression or an interpolating function) that allows learning by networks of a reasonable size increases exponentially fast.
The paper is organized as follows. In Section 2, learning is described as minimization of error functionals expressed in terms of distance functionals. In Section 3, tools from approximation theory are applied to obtain upper bounds on the rates of decrease of infima of the error functionals over networks with increasing model complexity, and by inspection of these bounds a measure of data complexity is proposed. In Section 4, the proposed concept of data complexity is illustrated by the example of the class of perceptron networks, for which the relationship between data complexity, data dimensionality and smoothness of the regression or an interpolating function is analyzed.
2 Learning as Minimization of Error Functionals
Let ρ be a non degenerate (no nonempty open set has measure zero) probability measure defined on Z = X × Y, where X is a compact subset of $\mathbb{R}^d$ and Y a bounded subset of $\mathbb{R}$ ($\mathbb{R}$ denotes the set of real numbers). The measure ρ induces the marginal probability measure $\rho_X$ on X defined for every S ⊆ X as $\rho_X(S) = \rho(\pi_X^{-1}(S))$, where $\pi_X : X\times Y \to X$ denotes the projection. Let $(L^2_{\rho_X}(X), \|\cdot\|_{L^2_{\rho_X}})$ denote the Lebesgue space of functions satisfying $\int_X f^2\, d\rho_X < \infty$.
The expected error functional $\mathcal{E}_\rho$ determined by ρ is defined for every f in $L^2_{\rho_X}(X)$ as
$\mathcal{E}_\rho(f) = \int_Z (f(x) - y)^2\, d\rho$
and the empirical error functional $\mathcal{E}_z$ determined by a sample of data z = {(uᵢ, vᵢ) ∈ X × Y | i = 1, ..., m} is defined as
$\mathcal{E}_z(f) = \frac{1}{m}\sum_{i=1}^m (f(u_i) - v_i)^2.$
It is easy to see and well-known [5] that the expected error $\mathcal{E}_\rho$ achieves its minimum over the whole space $L^2_{\rho_X}(X)$ at the regression function $f_\rho$ defined for all x ∈ X as
$f_\rho(x) = \int_Y y\, d\rho(y|x),$
where ρ(y|x) is the conditional (w.r.t. x) probability measure on Y. Thus
$\min_{f\in L^2_{\rho_X}(X)} \mathcal{E}_\rho(f) = \mathcal{E}_\rho(f_\rho).$
Moreover,
$\mathcal{E}_\rho(f) = \int_X (f(x) - f_\rho(x))^2\, d\rho_X + \mathcal{E}_\rho(f_\rho) = \|f - f_\rho\|^2_{L^2_{\rho_X}} + \mathcal{E}_\rho(f_\rho)$
[5, p. 5]. So $\mathcal{E}_\rho$ can be expressed as the square of the $L^2_{\rho_X}$-distance from $f_\rho$ plus a constant:
$\mathcal{E}_\rho(f) = \|f - f_\rho\|^2_{L^2_{\rho_X}} + \mathcal{E}_\rho(f_\rho). \quad (1)$
The empirical error $\mathcal{E}_z$ achieves its minimum over the whole space $L^2_{\rho_X}(X)$ at any function that interpolates the sample z, i.e., at any function $h \in L^2_{\rho_X}(X)$ such that $h|_{X_u} = h_z$, where $X_u = \{u_1, \dots, u_m\}$ and $h_z : X_u \to Y$ is defined as
$h_z(u_i) = v_i. \quad (2)$
For all such functions h,
$\min_{f\in L^2_{\rho_X}(X)} \mathcal{E}_z(f) = \mathcal{E}_z(h).$
Also the empirical error can be expressed in terms of a distance functional. For any $X \subset \mathbb{R}^d$ containing $X_u$ and $f : X \to \mathbb{R}$, let $f_u = f|_{X_u} : X_u \to \mathbb{R}$ denote f restricted to $X_u$, and let $\|\cdot\|_{2,m}$ denote the weighted $\ell^2$-norm on $\mathbb{R}^m$ defined by $\|x\|^2_{2,m} = \frac1m\sum_{i=1}^m x_i^2$. Then
$\mathcal{E}_z(f) = \frac1m\sum_{i=1}^m (f(u_i) - v_i)^2 = \frac1m\sum_{i=1}^m (f_u(u_i) - h_z(u_i))^2 = \|f_u - h_z\|^2_{2,m} = \mathcal{E}_z(f_u).$
So the empirical error $\mathcal{E}_z$ can be expressed as the square of the $\ell^2_m$-distance from $h_z$:
$\mathcal{E}_z(f) = \|f_u - h_z\|^2_{2,m}. \quad (3)$
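A small NumPy check of identity (3): the empirical error of f equals the squared weighted ℓ²-distance between the restriction f_u and the interpolating values h_z. The sample and candidate function are arbitrary illustrations of our own.

    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.uniform(-1.0, 1.0, size=20)            # inputs u_1, ..., u_m
    v = np.sin(3.0 * u)                            # outputs v_i, i.e. h_z(u_i)
    f = lambda x: x ** 2                           # an arbitrary candidate f

    emp = np.sum((f(u) - v) ** 2) / len(u)         # E_z(f) by its definition
    dist = np.linalg.norm(f(u) - v) ** 2 / len(u)  # ||f_u - h_z||_{2,m}^2
    assert np.isclose(emp, dist)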
3 Characterization of Data Complexity with Respect to a Class of Networks
To model neural-network learning, one has to consider minimization of error functionals over subsets of $L^2_{\rho_X}(X)$ formed by functions computable by various classes of networks. Often, neither the regression function $f_\rho$ nor any function interpolating the sample z is computable by a network of the given type. Even if one of these functions can be represented as the input-output function of a network from the class, the network might have too many hidden units to be implementable. In most learning algorithms, either the number of hidden units is chosen in advance or it is dynamically allocated, but in both cases it is constrained. We investigate the complexity of the data ρ and z defining the error functionals with respect to a given class of networks in terms of the model complexity of networks sufficient for learning from these data.
The most common class of networks with n hidden units and one linear output unit can compute functions of the form
$\mathrm{span}_n G = \Bigl\{\sum_{i=1}^n w_i g_i \,\Big|\, w_i \in \mathbb{R},\ g_i \in G\Bigr\},$
where G is the set of functions that can be computed by computational units of a given type (such as perceptrons or radial-basis functions). The number n of hidden units plays the role of a measure of model complexity of the network; its size is critical for the feasibility of an implementation.
For all common types of computational units, the union $\bigcup_{n=1}^\infty \mathrm{span}_n G$ of the nested family of sets of functions computable by networks with n hidden units is dense in $L^2_{\rho_X}(X)$ (see, e.g., [18], [13] and the references therein). Both the expected and the empirical error functionals are continuous on $L^2_{\rho_X}(X)$ (their representations (1) and (3) show that they can be expressed as the square of the $L^2_{\rho_X}$-norm or of the weighted $\ell^2$-norm, respectively, plus a constant). It is easy to see that the minimum of a continuous functional over the whole space is equal to its infimum over any dense subset. Thus
$\inf_{f\in\bigcup_{n=1}^\infty \mathrm{span}_n G} \mathcal{E}_\rho(f) = \mathcal{E}_\rho(f_\rho) \quad\text{and}\quad \inf_{f\in\bigcup_{n=1}^\infty \mathrm{span}_n G} \mathcal{E}_z(f) = 0.$
Note that for G linearly independent, the sets $\mathrm{span}_n G$ are not convex, and thus results from the theory of convex optimization cannot be applied. Moreover, we have to consider merely $\inf_{f\in\mathrm{span}_n G} \mathcal{E}_\rho(f)$ because, for a general set G, minima over the sets $\mathrm{span}_n G$ might not be achieved.
The speed of convergence, as the number n of hidden units increases, of the infima of the error functionals over the sets $\mathrm{span}_n G$ to the global minima over the whole space $L^2_{\rho_X}(X)$ is critical for the learning capability of the class of networks with hidden units computing functions from G (for example, perceptrons with a certain activation function). Inspection of estimates of this speed can suggest a characterization of the complexity of data guaranteeing the possibility of learning from such data by networks with a reasonable number of hidden units computing functions from the class G.
We shall show that one such characterization of the complexity of data with respect to a class of networks is the magnitude of a norm, tailored to the type of hidden units, of either the regression function $f_\rho$ or any function h interpolating the sample z, i.e., a function satisfying $h(u_i) = v_i$ for all i = 1, ..., m. If the magnitude of this norm is “small”, infima of error functionals over $\mathrm{span}_n G$ converge quickly.
The norm, called G-variation, can be defined for any bounded nonempty subset G of a normed linear space $(X, \|\cdot\|)$ (here, we consider the Hilbert space $L^2_{\rho_X}(X)$ and some parameterized sets G corresponding to sets of functions computable by neural networks). G-variation is defined as the Minkowski functional of the closed convex symmetric hull of G, i.e.,
$\|f\|_G = \inf\bigl\{c > 0 : c^{-1}f \in \mathrm{cl\,conv}(G \cup -G)\bigr\}, \quad (4)$
where the closure cl is taken with respect to the topology generated by the norm $\|\cdot\|$ and conv denotes the convex hull. Note that G-variation can be infinite (when the set on the right-hand side is empty). It was defined in [12] as an extension of variation with respect to half-spaces, introduced for Heaviside perceptron networks in [2] (for the properties of variation see [14]).
The following theorem estimates the speed of convergence of the infima of the expected and the empirical error functionals over the sets $\mathrm{span}_n G$ formed by functions computable by networks with n hidden units computing functions from G.
Theorem 1. Let d, m, n be positive integers, both $X \subset \mathbb{R}^d$ and $Y \subset \mathbb{R}$ be compact, z = {(uᵢ, vᵢ) ∈ X × Y | i = 1, ..., m} with all uᵢ distinct, ρ be a non degenerate probability measure on X × Y, and G be a bounded subset of $L^2_{\rho_X}(X)$ with $s_G = \sup_{g\in G}\|g\|_{L^2_{\rho_X}}$. Then
$\inf_{f\in\mathrm{span}_n G} \mathcal{E}_\rho(f) - \mathcal{E}_\rho(f_\rho) \le \frac{s_G^2\,\|f_\rho\|_G^2}{n}$
and for every $h \in L^2_{\rho_X}(X)$ interpolating the sample z,
$\inf_{f\in\mathrm{span}_n G} \mathcal{E}_z(f) \le \frac{s_G^2\,\|h\|_G^2}{n}.$
Proof. By the representation (1), for every $f \in L^2_{\rho_X}(X)$, $\mathcal{E}_\rho(f) - \mathcal{E}_\rho(f_\rho) = \|f_\rho - f\|^2_{L^2_{\rho_X}}$, and so $\inf_{f\in\mathrm{span}_n G} \mathcal{E}_\rho(f) - \mathcal{E}_\rho(f_\rho) = \|f_\rho - \mathrm{span}_n G\|^2_{L^2_{\rho_X}}$. Thus it remains to estimate the distance of $f_\rho$ from $\mathrm{span}_n G$. By an estimate of rates of approximation by $\mathrm{span}_n G$ in a Hilbert space derived by Maurey [19], Jones [8] and Barron [2,3], and reformulated in terms of G-variation in [14], this distance is bounded from above by $\frac{s_G\,\|f_\rho\|_G}{\sqrt n}$. Hence $\inf_{f\in\mathrm{span}_n G} \mathcal{E}_\rho(f) - \mathcal{E}_\rho(f_\rho) \le \frac{s_G^2\,\|f_\rho\|_G^2}{n}$.
Let $G|_{X_u}$ denote the set of functions from G restricted to $X_u = \{u_1, \dots, u_m\}$. By the representation (3), for every $f \in L^2_{\rho_X}(X)$, $\mathcal{E}_z(f) = \|f_u - h_z\|^2_{2,m}$, and so $\inf_{f\in\mathrm{span}_n G} \mathcal{E}_z(f) = \|h_z - \mathrm{span}_n G|_{X_u}\|^2_{2,m}$. By the Maurey-Jones-Barron estimate, $\|h_z - \mathrm{span}_n G|_{X_u}\|_{2,m} \le \frac{s_{G|_{X_u}}\|h_z\|_{G|_{X_u}}}{\sqrt n}$. Hence $\inf_{f\in\mathrm{span}_n G} \mathcal{E}_z(f) \le \frac{s^2_{G|_{X_u}}\|h_z\|^2_{G|_{X_u}}}{n}$. It follows directly from the definitions that if $f|_{X_u} = f_u$, then $\|f_u\|_{G|_{X_u}} \le \|f\|_G$. Thus for every h interpolating the sample z,
$\inf_{f\in\mathrm{span}_n G} \mathcal{E}_z(f) \le \frac{s_G^2\,\|h\|_G^2}{n}.$
So the infima of error functionals achievable over networks with n hidden units computing functions from a set G decrease at least as fast as 1/n times the square of the G-variation norm of the regression function or of an interpolating function. When these norms are small, good approximations of the two global minima, $\min_{f\in L^2_{\rho_X}(X)} \mathcal{E}_\rho(f) = \mathcal{E}_\rho(f_\rho)$ and $\min_{f\in L^2_{\rho_X}(X)} \mathcal{E}_z(f) = 0$, can be obtained using networks with a moderate number of units. Thus the magnitudes of the G-variation norms of the regression function or of some function interpolating the sample z of input-output pairs can be used as measures of complexity of the data given by the probability measure ρ or by a finite sample z chosen from X × Y with respect to ρ. When these magnitudes are “small”, the data have a reasonable complexity for learning by networks with hidden units computing functions from the set G.
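Hypothetical numbers plugged into Theorem 1 make the resulting model-complexity estimate concrete: to push the excess error below ε it suffices to take $n \ge s_G^2\,\|f_\rho\|_G^2/\varepsilon$ hidden units. A minimal sketch with illustrative values:

    import math

    def sufficient_units(s_G, var_norm, eps):
        """Smallest n with s_G**2 * var_norm**2 / n <= eps (Theorem 1)."""
        return math.ceil((s_G * var_norm) ** 2 / eps)

    # e.g. unit-bounded units (s_G = 1), G-variation 10, target accuracy 0.01:
    print(sufficient_units(1.0, 10.0, 0.01))   # -> 10000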
4 Smoothness and Data Complexity with Respect to Perceptron Networks
To get some insight into the complexity of data with respect to various types of networks, one has to estimate the corresponding variation norms. One method of estimation takes advantage of integral representations of functions in the form of “networks with continua of hidden units”. Typically, sets G of computational units are of the form $G = \{\varphi(\cdot, a) \mid a \in A\}$, where $\varphi : X \times A \to \mathbb{R}$. For example, perceptrons compute functions from the set
$P_d(\psi, X) = \{f : X \to \mathbb{R} \mid f(x) = \psi(v \cdot x + b),\ v \in \mathbb{R}^d,\ b \in \mathbb{R}\},$
where $\psi : \mathbb{R} \to \mathbb{R}$ is an activation function (typically a sigmoidal, i.e., a monotonic nondecreasing function $\sigma : \mathbb{R} \to \mathbb{R}$ satisfying $\lim_{t\to-\infty}\sigma(t) = 0$ and $\lim_{t\to\infty}\sigma(t) = 1$). An important type of sigmoidal is the Heaviside function, ϑ(t) = 0 for t < 0 and ϑ(t) = 1 for t ≥ 0. So $P_d(\psi, X) = \{\varphi(x, (v_1, \dots, v_d, b)) \mid v \in \mathbb{R}^d, b \in \mathbb{R}\}$, where $\varphi(x, (v_1, \dots, v_d, b)) = \psi(v \cdot x + b)$.
If, for X and A compact, a continuous function $f : X \to \mathbb{R}$ can be represented as a “neural network” with a continuum of hidden units computing functions φ(·, a) and with output weights w(a), i.e.,
$f(x) = \int_A w(a)\,\varphi(x, a)\,da,$
and the weighting function w is in $L^1_\lambda(X)$, where λ denotes the Lebesgue measure, then
$\|f\|_G \le \|w\|_{L^1_\lambda} \quad (5)$
[15, Theorem 3.1] (see also [7] and [11] for extensions of this result). So the G-variation norm can be estimated using the $L^1_\lambda$-norm of the weighting function. Since, for standard computational units, many functions can be represented as such “infinite” networks and, moreover, the $L^1_\lambda$-norms of the weighting functions can be estimated in terms of norms expressing certain kinds of smoothness, the upper bound (5) gives a method for estimating the data complexity proposed in the previous section.
For all sigmoidals σ, the $P_d(\sigma, X)$-variation in $L^2_{\rho_X}(X)$ is equal to the $P_d(\vartheta, X)$-variation [15]. Thus to investigate complexity with respect to sigmoidal perceptron networks, it is sufficient to estimate variation with respect to Heaviside perceptrons, called variation with respect to half-spaces (perceptrons with the Heaviside activation compute characteristic functions of half-spaces of $\mathbb{R}^d$ intersected with X). To simplify notation, we write $H_d(X)$ instead of $P_d(\vartheta, X)$; so $\|\cdot\|_{H_d} = \|\cdot\|_{P_d(\sigma)}$ for all sigmoidals σ.
An integral representation as a network with Heaviside perceptrons holds for a wide class of functions (including functions on $\mathbb{R}^d$ which are compactly supported, or merely “rapidly decreasing at infinity”, and have continuous partial derivatives of all orders) [15], [10]. For d odd, the representation is of the form
$f(x) = \int_{S^{d-1}\times\mathbb{R}} w_f(e, b)\,\vartheta(e \cdot x + b)\,de\,db, \quad (6)$
where $S^{d-1}$ denotes the unit sphere in $\mathbb{R}^d$ and the weighting function $w_f(e, b)$ is the product of a function a(d) of the number of variables d (converging exponentially fast to zero as d increases) and a “flow of order d through the hyperplane” $H_{e,b} = \{x \in \mathbb{R}^d \mid x \cdot e + b = 0\}$. More precisely,
$w_f(e, b) = a(d)\int_{H_{e,b}} (D_e^{(d)} f)(y)\,dy,$
where $a(d) = (-1)^{(d-1)/2}\,(1/2)\,(2\pi)^{1-d}$ and $D_e^{(d)}$ denotes the directional derivative of order d in the direction e.
The integral representation (6) was derived in [15] for compactly supported functions from $C^d(\mathbb{R}^d)$ and extended in [11] to functions of weakly controlled decay, which satisfy $\lim_{\|x\|\to\infty}(D^\alpha f)(x) = 0$ for all α with 0 ≤ |α| < d, and for which there exists ε > 0 such that for each multi-index α with |α| = d, $\lim_{\|x\|\to\infty}(D^\alpha f)(x)\,\|x\|^{d+1+\varepsilon} = 0$. The class of functions of weakly controlled decay contains all d-times continuously differentiable functions with compact support as well as all
functions from the Schwartz class $S(\mathbb{R}^d)$ [1, p. 251]. In particular, it contains the Gaussian function $\gamma_d(x) = \exp(-\|x\|^2)$.
In [10], the $L^1_\lambda$-norm of the weighting function $w_f$ was estimated by the product of a function k(d), decreasing exponentially fast with the number of variables d, and a Sobolev seminorm of the represented function f:
$\|w_f\|_{L^1_\lambda} \le k(d)\,\|f\|_{d,1,\infty}.$
The seminorm $\|\cdot\|_{d,1,\infty}$ is defined as
$\|f\|_{d,1,\infty} = \max_{|\alpha|=d}\|D^\alpha f\|_{L^1_\lambda(\mathbb{R}^d)},$
where $\alpha = (\alpha_1, \dots, \alpha_d)$ is a multi-index with nonnegative integer components, $D^\alpha = (\partial/\partial x_1)^{\alpha_1}\cdots(\partial/\partial x_d)^{\alpha_d}$ and $|\alpha| = \alpha_1 + \cdots + \alpha_d$. Thus by (5),
$\|f\|_{H_d} \le k(d)\,\|f\|_{d,1,\infty} = k(d)\max_{|\alpha|=d}\|D^\alpha f\|_{L^1_\lambda(\mathbb{R}^d)}, \quad (7)$
where
$k(d) \sim \Bigl(\frac{4\pi}{d}\Bigr)^{1/2}\Bigl(\frac{e}{2\pi}\Bigr)^{d/2} < \Bigl(\frac{4\pi}{d}\Bigr)^{1/2}\Bigl(\frac12\Bigr)^{d/2}.$
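The exponentially fast decrease of k(d) is easy to tabulate; a small script of our own evaluating the upper bound $(4\pi/d)^{1/2}(1/2)^{d/2}$ from (7) for a few odd d:

    import math

    def k_upper(d):
        """The bound (4*pi/d)**(1/2) * (1/2)**(d/2) on k(d) from (7)."""
        return math.sqrt(4.0 * math.pi / d) * 0.5 ** (d / 2.0)

    for d in (1, 3, 5, 11, 21, 51):
        print(d, k_upper(d))
    # k_upper(51) is of order 1e-8, so the tolerance on the seminorm
    # ||f||_{d,1,infty} that keeps k(d)*||f|| small grows roughly like 2**(d/2).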
Note that for large d, the seminorm $\|f\|_{d,1,\infty}$ is much smaller than the standard Sobolev norm $\|f\|_{d,1} = \sum_{|\alpha|\le d}\|D^\alpha f\|_{L^1_\lambda(\mathbb{R}^d)}$ [1], as instead of the summation of the iterated partial derivatives of f over all α with |α| ≤ d, merely their maximum over α with |α| = d is taken.
The following theorem estimates the speed of decrease of minima of error functionals over networks with an increasing number n of Heaviside perceptrons.
Theorem 2. Let d, m, n be positive integers, d odd, both $X \subset \mathbb{R}^d$ and $Y \subset \mathbb{R}$ be compact, z = {(uᵢ, vᵢ) ∈ X × Y | i = 1, ..., m} with all uᵢ distinct, ρ be a non degenerate probability measure on X × Y such that the regression function $f_\rho : X \to \mathbb{R}$ is the restriction of a function $h_\rho : \mathbb{R}^d \to \mathbb{R}$ of weakly controlled decay, and let $h : \mathbb{R}^d \to \mathbb{R}$ be a function of weakly controlled decay interpolating the sample z. Then
$\min_{f\in\mathrm{span}_n H_d(X)} \mathcal{E}_\rho(f) - \mathcal{E}_\rho(f_\rho) \le \frac{c(d)\,\|h_\rho\|^2_{d,1,\infty}}{n}$
and
$\min_{f\in\mathrm{span}_n H_d(X)} \mathcal{E}_z(f) \le \frac{c(d)\,\|h\|^2_{d,1,\infty}}{n},$
where $c(d) \sim \frac{4\pi}{d}\Bigl(\frac{e}{2\pi}\Bigr)^{d} < \frac{4\pi}{d\,2^d}$.
Proof. It was shown in [9] that the sets $\mathrm{span}_n H_d(X)$ are approximatively compact in $L^2_{\rho_X}(X)$, and so each function in $L^2_{\rho_X}(X)$ has a best approximation in the sets $\mathrm{span}_n H_d$. Thus by (1) and (3), both the functionals $\mathcal{E}_\rho$ and $\mathcal{E}_z$ achieve their minima over $\mathrm{span}_n H_d$. It follows from [10] (Theorems 3.3, 4.2 and Corollary 3.4) that for all d odd and all h of weakly controlled decay, $\|h\|_{H_d(X)} \le k(d)\,\|h\|_{d,1,\infty}$, where $k(d) \sim (4\pi/d)^{1/2}(e/2\pi)^{d/2}$. The statement follows by Theorem 1.
Thus for any sample of data z that can be interpolated by a function $h \in C^d(\mathbb{R}^d)$ vanishing sufficiently quickly at infinity and such that the square of the maximum of the $L^1_\lambda$-norms of the partial derivatives of order d does not exceed an exponentially increasing upper bound, more precisely
$\|h\|^2_{d,1,\infty} = \Bigl(\max_{|\alpha|=d}\|D^\alpha h\|_{L^1_\lambda(\mathbb{R}^d)}\Bigr)^2 \le \frac{d}{4\pi}\,2^d < \frac{1}{c(d)} \sim \frac{d}{4\pi}\Bigl(\frac{2\pi}{e}\Bigr)^d,$
the minima of the empirical error $\mathcal{E}_z$ over networks with n sigmoidal perceptrons decrease to zero rather quickly, at least as fast as 1/n. For example, when for d > 4π all the $L^1_\lambda$-norms of the partial derivatives of order d are smaller than $2^{d/2}$, convergence faster than 1/n is guaranteed.
Our estimates of data complexity can be illustrated by the example of the Gaussian function $\gamma_d(x) = \exp(-\|x\|^2)$. It was shown in [10] that for d odd, $\|\gamma_d\|_{H_d} \le 2d$ (see also [4] for a weaker estimate depending on the size of X, which is valid also for d even). Thus by Theorem 1, when the regression function $f_\rho = \gamma_d$, and the sample z of size m is such that the function $h_z$ defined by $h_z(u_i) = v_i$ is the restriction of the Gaussian function $\gamma_d$ to $X_u = \{u_1, \dots, u_m\}$, then
$\min_{f\in\mathrm{span}_n H_d(X)} \mathcal{E}_z(f) \le \frac{4d^2}{n} \quad\text{and}\quad \min_{f\in\mathrm{span}_n H_d(X)} \mathcal{E}_\rho(f) - \mathcal{E}_\rho(f_\rho) \le \frac{4d^2}{n}. \quad (8)$
This estimate gives some insight into the relationship between two geometrically opposite types of computational units: Gaussian radial-basis functions (RBFs) and Heaviside perceptrons. Perceptrons compute plane waves (functions of the form ψ(v·x + b), which are constant on the hyperplanes parallel to the hyperplane $\{x \in \mathbb{R}^d \mid v \cdot x + b = 0\}$), while Gaussian RBFs compute radial waves (functions of the form $\exp(-(b\,\|x - v\|)^2)$, which are constant on spheres centered at v). By (8), minima of the error functionals defined by the d-dimensional Gaussian probability measure over networks with n Heaviside perceptrons converge to zero faster than $\frac{4d^2}{n}$. Note that the upper bound $\frac{4d^2}{n}$ grows only quadratically with the dimension d, and it does not depend on the size m of the sample.
On the other hand, there exist samples z = {(uᵢ, vᵢ) | i = 1, ..., m} whose sizes influence the magnitudes of the variations of the functions $h_z$ defined by $h_z(u_i) = v_i$. For example, for any positive integer k, consider X = [0, 2k], Y = [−1, 1] and the sample z = {(2i, 1), (2i+1, −1) | i = 0, ..., k−1} of size m = 2k. Then one can easily verify that $\|h_z\|_{H_d(X)} = 2k$ (for functions of one variable, variation with respect to half-spaces is, up to a constant, equal to total variation, see [2], [15]). This example indicates that the more the data “oscillate”, the larger the variation of the functions interpolating them.
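The final example is easy to reproduce: for the alternating sample, the total variation of any interpolant grows linearly with k, matching $\|h_z\|_{H_d(X)} = 2k$ up to a constant. A hypothetical script of our own, measuring the variation of h_z on its sample values:

    def alternating_sample(k):
        """z = {(2i, 1), (2i+1, -1) | i = 0..k-1} on X = [0, 2k]."""
        pts = []
        for i in range(k):
            pts += [(2 * i, 1), (2 * i + 1, -1)]
        return pts

    def total_variation(values):
        return sum(abs(b - a) for a, b in zip(values, values[1:]))

    for k in (1, 2, 4, 8):
        vs = [v for _, v in alternating_sample(k)]
        print(k, total_variation(vs))   # 2, 6, 14, 30: growing linearly with k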
5 Discussion
We proposed a measure of data complexity with respect to a class of neural networks, based on the inspection of an estimate of the speed of convergence of the error functionals defined by the data. For data with “small” complexity, expressed in terms of the magnitude of a certain norm (tailored to the network type) of the regression or an interpolating function defined by the data, networks with a reasonable model complexity can achieve good performance during learning.
Our analysis of data complexity in neural-network learning considers only minimization of error functionals. The next step should be to extend the study to the case of regularized expected errors, as for the kernel models in [16], [17]. Various stabilizers could be considered, among which variation with respect to half-spaces seems the most promising. In the one-dimensional case, variation with respect to half-spaces is, up to a constant, equal to total variation [2], [15], which is used as a stabilizer in image processing. Moreover, our estimates show its importance in the characterization of data complexity in learning by perceptron networks.
Acknowledgement. This work was partially supported by the project 1ET100300419 “Intelligent Models, Algorithms, Methods, and Tools for Semantic Web Realization” of the National Research Program of the Czech Republic and the Institutional Research Plan AV0Z10300504.
References
1. Adams, R.A. and Fournier, J.J.F.: Sobolev Spaces. Academic Press, Amsterdam (2003)
2. Barron, A.R.: Neural Net Approximation. In Proc. 7th Yale Workshop on Adaptive and Learning Systems, K. Narendra (ed.), Yale University Press (1992) 69–72
3. Barron, A.R.: Universal Approximation Bounds for Superpositions of a Sigmoidal Function. IEEE Transactions on Information Theory 39 (1993) 930–945
4. Cheang, G.H.L. and Barron, A.R.: A Better Approximation for Balls. Journal of Approximation Theory 104 (2000) 183–200
5. Cucker, F. and Smale, S.: On the Mathematical Foundations of Learning. Bulletin of AMS 39 (2002) 1–49
6. Fine, T.L.: Feedforward Neural Networks Methodology. Springer, New York (1999)
7. Girosi, F. and Anzellotti, G.: Rates of Convergence for Radial Basis Functions and Neural Networks. In Artificial Neural Networks for Speech and Vision, R.J. Mammone (ed.), Chapman & Hall, London (1993) 97–113
8. Jones, L.K.: A Simple Lemma on Greedy Approximation in Hilbert Space and Convergence Rates for Projection Pursuit Regression and Neural Network Training. Annals of Statistics 20 (1992) 608–613
9. Kainen, P.C., Kůrková, V., and Vogt, A.: Best Approximation by Linear Combinations of Characteristic Functions of Half-Spaces. Journal of Approximation Theory 122 (2003) 151–159
10. Kainen, P.C., Kůrková, V., and Vogt, A.: A Sobolev-Type Upper Bound for Rates of Approximation by Linear Combinations of Plane Waves. Submitted; Research Report ICS–900, www.cs.cas.cz/research/publications.shtml
11. Kainen, P.C., Kůrková, V., and Vogt, A.: Integral Combinations of Heavisides. Submitted; Research Report ICS–966, www.cs.cas.cz/research/publications.shtml
12. Kůrková, V.: Dimension-Independent Rates of Approximation by Neural Networks. In Computer-Intensive Methods in Control and Signal Processing: Curse of Dimensionality, K. Warwick and M. Kárný (eds), Birkhäuser, Boston (1997) 261–270
13. Kůrková, V.: Neural Networks as Universal Approximators. In The Handbook of Brain Theory and Neural Networks II, M. Arbib (ed.), MIT Press, Cambridge (2002) 1180–1183
14. Kůrková, V.: High-Dimensional Approximation and Optimization by Neural Networks. In Advances in Learning Theory: Methods, Models and Applications (Chapter 4), J. Suykens et al. (eds), IOS Press, Amsterdam (2003) 69–88
15. Kůrková, V., Kainen, P.C., and Kreinovich, V.: Estimates of the Number of Hidden Units and Variation with Respect to Half-Spaces. Neural Networks 10 (1997) 1061–1068
16. Kůrková, V. and Sanguineti, M.: Error Estimates for Approximate Optimization by the Extended Ritz Method. SIAM Journal on Optimization 15 (2005) 461–487
17. Kůrková, V. and Sanguineti, M.: Learning with Generalization Capability by Kernel Methods of Bounded Complexity. Journal of Complexity 21 (2005) 350–367
18. Pinkus, A.: Approximation Theory of the MLP Model in Neural Networks. Acta Numerica 8 (1998) 277–283
19. Pisier, G.: Remarques sur un résultat non publié de B. Maurey. Séminaire d'Analyse Fonctionnelle 1980–81, Exposé no. V, École Polytechnique, Centre de Mathématiques, Palaiseau, France (1980) V.1–V.12
20. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer-Verlag, New York (1995)
21. Werbos, P.J.: Backpropagation: Basics and New Developments. In The Handbook of Brain Theory and Neural Networks, M. Arbib (ed.), MIT Press, Cambridge (1985) 134–139
Concurrent and Located Synchronizations in π-Calculus⋆
Ivan Lanese
Computer Science Department, University of Bologna, Italy
⋆ Research supported by the Project FET-GC II IST 16004 Sensoria.
[email protected]
Abstract. We present two novel semantics for π-calculus. The first allows one to observe on which channel a synchronization is performed, while the second allows concurrent actions, provided that they do not compete for resources. We present both a reduction and a labeled semantics, and show that they induce the same behavioral equivalence. As our main result we show that bisimilarity is a congruence for the concurrent semantics. This important property fails for the standard semantics.
1 Introduction
Recent years have seen a strong effort in the field of process calculi, trying to find the best suited primitives and tools for describing different properties of concurrent interacting systems. One of the most successful among these calculi is the π-calculus [8], which allows one to model mobility, an interesting feature of modern systems, in a natural way. Different extensions have been considered to describe, for instance, concurrency aspects and locations [11,9,14,3]. Concurrency is usually obtained via mappings to models equipped with concepts of causality and independence, such as graph transformation systems [9], Petri nets [3] or event structures [14]. This allows one to reason about concurrency issues, but it makes harder, or even prevents, the use of standard process calculi tools based on labeled transition systems (LTSs).
We examine which concurrency aspects can be modeled in process calculi using a standard LTS. Clearly, the labels of this LTS will be richer than standard labels. In particular, we allow the execution of many actions inside the same transition, and the label will contain all of them. While some actions do not interfere with each other, others may compete for resources. In real concurrent systems, in fact, actions usually require exclusive access to the communication medium. As a very simple example, you cannot telephone if the line is busy: you need to use another line. This is modeled in π-calculus by requiring concurrent actions to be performed on different channels. This can be done easily for inputs and outputs, but not for synchronizations. In fact, in the standard π-calculus semantics, the label of any complete synchronization is τ, and this does not contain any information on the channel used. This information is necessary for our semantics. Thus, to have a gradual presentation,
first we analyze the effects of adding the location of the synchronization to the label in the standard interleaving scenario, and then we move to the concurrent one. The interleaving case is a necessary step, but it may also be useful by itself: different channels may not be equivalent, for instance because they may be under different accounting policies.
We analyze the properties of the interleaving and the concurrent semantics both at the level of the LTS and of the induced behavioral equivalence. In particular, in both cases we consider a reduction and a labeled semantics, and we show that they induce the same bisimilarity relation. We concentrate on the strong semantics and give some insights on how the results can be extended to the weak case. An important property of the concurrent semantics is compositionality: the induced bisimilarity is a congruence w.r.t. the operators of process composition, while this is not the case for the standard semantics. This property allows one to compute the behavior of large complex systems from the behavior of their components, making analysis techniques scalable.
Structure of the paper. In Section 2 we recall the standard (early) semantics of π-calculus. Section 3 introduces locations in the interleaving setting, while Section 4 moves to the concurrent one. Section 5 describes some comparisons with similar approaches, while Section 6 outlines the weak semantics. Finally, Section 7 presents some conclusions and directions for future work. A version of the paper with full proofs is available at [4].
2 Background
In this section we present the syntax and the standard (early) semantics of π-calculus (we consider only the monadic π-calculus, but the extension to the polyadic case is straightforward). See, e.g., [13] for a more detailed presentation.
Processes, in π-calculus, communicate by exchanging channel names, using channels themselves as communication medium. Therefore we assume a countable set of channel names ranged over by a, b, x, ....
Definition 1 (Syntax).
$P ::= \bar{a}b.P_1 \mid a(x).P_1 \mid P_1|P_2 \mid P_1 + P_2 \mid \nu a\, P_1 \mid\ !P_1 \mid 0$
In the above definition, $\bar ab.P_1$ is a process that outputs the name b on channel a, while a(x).P₁ accepts an input on channel a and, after receiving b, behaves as P₁{b/x}. Both $\bar ab$ and a(x) are called prefixes. Also, P₁|P₂ is the parallel composition of P₁ and P₂, P₁ + P₂ is the process that can behave as P₁ or as P₂, νa P₁ is like process P₁ but with the scope of channel a restricted to P₁, !P₁ stands for an unbounded number of copies of P₁ executing in parallel, and 0 is the idle process. We restrict our attention to prefix-guarded processes, i.e., in P₁ + P₂ both P₁ and P₂ must be either prefixed processes or sums (or 0).
Name x is bound in a(x).P₁ and name a is bound in νa P₁. The functions fn(P), bn(P) and n(P), computing the sets of free names, bound names and all the names of a process P respectively, are defined as usual.
Table 1. Standard reduction semantics
react-S: $(a(x).P + M)\,|\,(\bar ab.Q + N) \rightarrow_S P\{b/x\}\,|\,Q$
par-S: if $P \rightarrow_S P'$ then $P|Q \rightarrow_S P'|Q$
res-S: if $P \rightarrow_S P'$ then $\nu a\, P \rightarrow_S \nu a\, P'$
congr-S: if $P_1 \equiv P_2 \rightarrow_S P_2' \equiv P_1'$ then $P_1 \rightarrow_S P_1'$
We consider processes up to α-conversion of bound names, i.e., we always suppose that all the bound names are different from each other and from the free names. We write a instead of a(x) and $\bar a$ instead of $\bar ab$ if x and b are not important, and if π is a prefix we write π for π.0.
We first describe the allowed transitions, then the behavioral equivalence. We consider both the reduction semantics, analyzing the behavior of the system in isolation, and the labeled semantics, analyzing its interactions with the environment. In the following sections we will show how these semantics must be changed to handle located and concurrent synchronizations. To simplify the presentation of the reduction semantics we exploit a structural congruence equating processes that we never want to distinguish.
Definition 2. The structural congruence ≡ is the least congruence satisfying the monoid laws for parallel composition and summation (with 0 as unit), the replication law P|!P ≡ P and the laws for restriction νa νb P ≡ νb νa P, νa 0 ≡ 0 and νa(P₁|P₂) ≡ P₁|νa P₂ if a ∉ fn(P₁).
Definition 3 (Reduction semantics). The reduction semantics of π-calculus is the set of unlabeled transitions generated by the rules in Table 1.
The subscript S (for standard) is used to distinguish the standard semantics from the ones we will present later. Also, we will use uppercase letters for reduction semantics and lowercase ones for labeled semantics; thus the standard labeled semantics is identified by the subscript s.
Definition 4 (Labeled semantics). The labeled semantics is the LTS defined in Table 2.
We have as labels input ab, output $\bar ab$, bound output $\bar a(b)$ (where b is bound) and internal action τ. We use α as a metavariable for labels. The subject subj(α) and the object obj(α) of an action α are a and b respectively if α is ab, $\bar ab$ or $\bar a(b)$, while subj(τ) and obj(τ) are both undefined.
We now define the behavioral equivalence for our processes. The same definition will be applied also to the LTSs that we will define in the following sections; subscripts will always clarify which underlying LTS is used.
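As a concrete illustration of the reduction semantics, here is a minimal Python encoding of a fragment of the syntax together with rule react-S applied at the top of a parallel composition. All classes, names and the one-step function are our own illustrative assumptions; structural congruence, sums, restriction and replication are omitted for brevity.

    from dataclasses import dataclass

    @dataclass
    class Out:                       # output prefix: a-bar<b>.P
        chan: str; msg: str; cont: 'object'
    @dataclass
    class In:                        # input prefix: a(x).P
        chan: str; var: str; cont: 'object'
    @dataclass
    class Par:                       # parallel composition P|Q
        left: 'object'; right: 'object'
    NIL = None                       # the idle process 0

    def subst(P, x, b):
        """Substitution P{b/x} (bound names assumed distinct, so capture-free)."""
        if P is NIL: return NIL
        if isinstance(P, Out):
            return Out(b if P.chan == x else P.chan, b if P.msg == x else P.msg,
                       subst(P.cont, x, b))
        if isinstance(P, In):
            return In(b if P.chan == x else P.chan, P.var,
                      P.cont if P.var == x else subst(P.cont, x, b))
        return Par(subst(P.left, x, b), subst(P.right, x, b))

    def react(P):
        """One application of react-S at the top of a parallel composition."""
        if isinstance(P, Par) and isinstance(P.left, In) and isinstance(P.right, Out) \
                and P.left.chan == P.right.chan:
            return Par(subst(P.left.cont, P.left.var, P.right.msg), P.right.cont)
        return None

    # a(x).x-bar<c>.0 | a-bar<b>.0  reduces to  b-bar<c>.0 | 0
    step = react(Par(In('a', 'x', Out('x', 'c', NIL)), Out('a', 'b', NIL)))
    assert step == Par(Out('b', 'c', NIL), NIL)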
Table 2. Standard labeled semantics (rules marked ∗ also have a symmetric counterpart)
out-s: $\bar ab.P \xrightarrow{\bar ab}_s P$
inp-s: $a(x).P \xrightarrow{ab}_s P\{b/x\}$
sum-s∗: if $P \xrightarrow{\alpha}_s P'$ then $P + Q \xrightarrow{\alpha}_s P'$
par-s∗: if $P \xrightarrow{\alpha}_s P'$ and $bn(\alpha) \cap fn(Q) = \emptyset$ then $P|Q \xrightarrow{\alpha}_s P'|Q$
com-s∗: if $P \xrightarrow{ab}_s P'$ and $Q \xrightarrow{\bar ab}_s Q'$ then $P|Q \xrightarrow{\tau}_s P'|Q'$
close-s∗: if $P \xrightarrow{ab}_s P'$, $Q \xrightarrow{\bar a(b)}_s Q'$ and $b \notin fn(P)$ then $P|Q \xrightarrow{\tau}_s \nu b\,(P'|Q')$
res-s: if $P \xrightarrow{\alpha}_s P'$ and $a \notin n(\alpha)$ then $\nu a\, P \xrightarrow{\alpha}_s \nu a\, P'$
open-s: if $P \xrightarrow{\bar ab}_s P'$ and $a \neq b$ then $\nu b\, P \xrightarrow{\bar a(b)}_s P'$
rep-s: if $P|!P \xrightarrow{\alpha}_s P'$ then $!P \xrightarrow{\alpha}_s P'$
Definition 5 (Bisimilarity). Let t be an LTS. A bisimulation is a relation R such that P R Q implies:
– $P \xrightarrow{\alpha}_t P'$ with $bn(\alpha) \cap fn(Q) = \emptyset$ implies $Q \xrightarrow{\alpha}_t Q'$ for some Q′ with P′ R Q′;
– vice versa.
A full bisimulation is a substitution-closed bisimulation. We denote with $\approx_t$ (resp. $\sim_t$) the maximal bisimulation (resp. full bisimulation), called bisimilarity (resp. full bisimilarity).
3 Observing Locations in the Interleaving Setting
In this section we present both a reduction semantics and a labeled semantics for π-calculus, based on the idea that synchronizations performed on different channels must be distinguished. We also show that they induce the same behavioral equivalence. These semantics are identified by the subscripts L and l respectively. The information about localities must be added to the labels, thus there are (simple) labels (denoted by S) also in the reduction semantics. However, when speaking about the labeled semantics, we refer to the other style of semantics.
Reductions are labeled by sets of channel names containing the free names on which a synchronization is performed. Thus we may have a singleton if the reduction is performed by synchronizing on a free name, and the empty set otherwise. Synchronizations on local channels cannot be observed, since the restriction operator completely hides the channel. This also follows the intuition that any effect of the channel usage, including for instance its accounting, must be performed before restricting it. See [2] for an example of introducing accounting on a channel.
Definition 6 (Located interleaving reduction semantics). The located interleaving reduction semantics of π-calculus is the LTS generated by the rules in Table 3.
This semantics strictly follows the structure of the standard reduction semantics (Table 1); actually, the only difference is the introduction of labels.
Table 3. Located interleaving reduction semantics
react-L: $(a(x).P + M)\,|\,(\bar ab.Q + N) \xrightarrow{\{a\}}_L P\{b/x\}\,|\,Q$
par-L: if $P \xrightarrow{S}_L P'$ then $P|Q \xrightarrow{S}_L P'|Q$
res-L: if $P \xrightarrow{S}_L P'$ then $\nu a\, P \xrightarrow{S\setminus\{a\}}_L \nu a\, P'$
congr-L: if $P_1 \equiv P_2 \xrightarrow{S}_L P_2' \equiv P_1'$ then $P_1 \xrightarrow{S}_L P_1'$
We now present the labeled semantics, extending the one in Table 2. Technically, the main difference is that there are several labels denoting a complete synchronization instead of just τ. More precisely, we denote a synchronization at a free name a with aτ, while τ is used to denote a synchronization on a restricted channel. The located semantics is obtained by replacing rules com-s and close-s (and their symmetric counterparts) with:
com-l: if $P \xrightarrow{ab}_l P'$ and $Q \xrightarrow{\bar ab}_l Q'$ then $P|Q \xrightarrow{a\tau}_l P'|Q'$
close-l: if $P \xrightarrow{ab}_l P'$, $Q \xrightarrow{\bar a(b)}_l Q'$ and $b \notin fn(P)$ then $P|Q \xrightarrow{a\tau}_l \nu b\,(P'|Q')$
and adding the new rule tau-l:
tau-l: if $P \xrightarrow{a\tau}_l P'$ then $\nu a\, P \xrightarrow{\tau}_l \nu a\, P'$
We extend the definitions of subj(α) and obj(α) by defining subj(aτ) = a, while obj(aτ) is undefined. The next lemma characterizes the correspondence between the standard and the located labeled semantics.
Lemma 1 (Operational correspondence). $P \xrightarrow{\alpha}_l P'$ iff:
– either $\alpha \in \{ab, \bar ab, \bar a(b), \tau\}$ and $P \xrightarrow{\alpha}_s P'$,
– or $\alpha = a\tau$ for some $a \in fn(P)$ and $P \xrightarrow{\tau}_s P'$.
Note that the states of the located and of the standard LTS coincide, but located labels carry more information. Thus located (full) bisimilarity implies the standard one.
Corollary 1. $P \approx_l P' \Rightarrow P \approx_s P'$ and $P \sim_l P' \Rightarrow P \sim_s P'$.
The converse of the previous corollary does not hold.
Counterexample 1 (Located vs standard (full) bisimilarity). $\nu b\,(a + b)|(\bar a + \bar b) \approx_s a|\bar a$, but not $\nu b\,(a + b)|(\bar a + \bar b) \approx_l a|\bar a$.
The only difference between the two processes is that the left one can also perform a τ action on the hidden channel b. In the standard semantics this is indistinguishable from the synchronization on a, while the two are different under the located semantics. The same counterexample holds also for full bisimilarity.
We now analyze the relationships between the reduction and the labeled semantics. First of all we show that the reduction semantics fully captures all the transitions of the labeled semantics that do not require interactions with the environment. We denote with Sτ the label aτ if S = {a} and τ if S = ∅.
Theorem 1. $P \xrightarrow{S}_L P'$ iff $P \xrightarrow{S\tau}_l P''$ with $P'' \equiv P'$.
More interestingly, two processes are bisimilar in any context under the reduction semantics iff they are full bisimilar according to the labeled one.
Definition 7 (Context). A context C[•] is obtained when a • replaces an occurrence of 0 in a process. We denote as C[P] the process obtained by replacing • with P in C[•], if it is well-formed.
Theorem 2. $P \sim_l Q$ iff $C[P] \approx_L C[Q]$ for each context C[•].
This result proves the complete correspondence between the two semantics.
4 Concurrent Synchronizations
We extend the located semantics presented in the previous section to allow the simultaneous execution of many actions, provided that they are performed on different channels. This is justified by the observation that in a system with real parallelism, such as a distributed system, different components can interact at the same time, provided that they do not compete for resources. However, the system is not fully synchronous, thus actions can also occur in isolation; in other terms, some components may stay idle during a transition. More parallel scenarios, where the communication medium can be shared by different actions, will be the subject of future work. We use the subscripts C and c to identify the reduction and the labeled semantics respectively. We start by presenting the reduction semantics.
comp-C
1 ′ P −→ C P
S
2 ′ Q −→ C Q
S1 ∩ S2 = ∅
S1 ∪S2
P |Q −−−−→C P ′ |Q′
Here labels are used to check that concurrent reductions use different resources. The added rule allows in fact parallel processes to concurrently reduce, by syn{a,c}
chronizing on different channels. For instance, ab|a(x).xc|c|c −−−→C bc. The following theorem shows the relation between the concurrent and the interleaving semantics.
Theorem 3. $P \xrightarrow{S}_C P'$ implies $P = P_1 \xrightarrow{S_1}_L P_2 \xrightarrow{S_2}_L \cdots \xrightarrow{S_n}_L P_{n+1} = P'$ with $\bigcup_{i\in\{1,\dots,n\}} S_i = S$.
We now consider the labeled semantics. Technically, labels are essentially multisets of located labels; indeed, they are exactly that when there are no restricted names. Restricted names appear in the label when they are extruded, such as b in $\nu b\,\bar ab \xrightarrow{\bar a(b)}_l 0$. However, many outputs may extrude the same name concurrently; thus the set of extruded names must be attached to the whole label and not to single outputs.
Thus we use labels of the form (Y)act, where Y is the set of extruded names and act is a multiset of basic actions α of the form ab, $\bar ab$, aτ or τ. We use μ as a metavariable for those labels, and we write α ∈ μ if either α has the form ab, $\bar ab$ (with b ∉ Y), aτ or τ and it belongs to act, or if $\alpha = \bar a(b)$, $\bar ab \in act$ and b ∈ Y. We use $[\alpha_1, \alpha_2, \dots, \alpha_n]$ to denote a multiset containing the elements $\alpha_1, \alpha_2, \dots, \alpha_n$, and we use the operators ∪, ⊆, \, ... on multisets with the obvious meaning. We extend the notation to deal with labels, where the operators are applied to both the multiset part and the set of extruded names (but, if a name does not occur in the multiset, then it is removed also from the set of extruded names). We call a label sequential if its multiset part is a singleton, and a transition sequential if its label is. We define $subj(\mu) = \bigcup_{\alpha\in\mu} subj(\alpha)$ and similarly $obj(\mu) = \bigcup_{\alpha\in\mu} obj(\alpha)$. Also, tau(μ) is the largest submultiset of μ containing only (non-located) τ actions. A label μ = (Y)act is well-formed if $[\alpha_1, \alpha_2] \subseteq \mu$ implies $subj(\alpha_1) \neq subj(\alpha_2)$ (if both actions have a subject) and y ∈ Y implies y ∈ obj(μ) and y ∉ subj(μ). We denote as $act_a(\mu)$ the unique action α ∈ μ such that subj(α) = a, if it exists.
In order to define the semantics we introduce two auxiliary operators on labels: @ and \, corresponding intuitively to label composition and label restriction. The label $\mu_1 @ \mu_2$ is defined only if, whenever $x \in subj(\mu_1)$ and $x \in subj(\mu_2)$, $act_x(\mu_1)$ and $act_x(\mu_2)$ are an input and an output (possibly bound) with equal subjects and objects. In that case $\mu_1 @ \mu_2 = (Y)act$ with
$act = tau(\mu_1) \cup tau(\mu_2) \cup \bigcup_{a\in subj(\mu_1)\cup subj(\mu_2)} \begin{cases}[a\tau] & \text{if } a \in subj(\mu_1)\cap subj(\mu_2)\\ [act_a(\mu_i)] & \text{if } a \in subj(\mu_i)\setminus subj(\mu_{3-i}),\ i \in \{1,2\}\end{cases}$
Also, $Y = (bn(\mu_1) \cup bn(\mu_2)) \cap obj(act)$. Similarly, μ \ a is defined only if all the occurrences of a in μ (if any) are as object of a free output or as subject of aτ; in the latter case aτ is replaced by τ, and other actions are preserved. If a ∈ obj(μ), then a is added to Y, otherwise Y is unchanged. We use νA as a shortcut for $\nu a_1\,\nu a_2 \dots \nu a_n$ where $A = \{a_1, a_2, \dots, a_n\}$.
Definition 9 (Concurrent located labeled semantics). The concurrent located labeled semantics of π-calculus is the LTS defined in Table 4.
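A small Python model of the composition operator @ may help: labels are represented as (set of extruded names, multiset of actions), with actions encoded as ('in', a, b), ('out', a, b), ('tau', a) for aτ and ('tau', None) for τ. The encoding and all names are our own illustrative assumptions.

    from collections import Counter

    def subjects(acts):
        return {a[1] for a in acts if a[1] is not None}

    def compose(l1, l2):
        """mu1 @ mu2: merge two labels; shared subjects must pair an input
        with an output on equal subject/object, and they become a located tau."""
        (Y1, a1), (Y2, a2) = l1, l2
        act = Counter()
        for a in list(a1) + list(a2):
            if a[0] == 'tau' and a[1] is None:
                act[a] += 1                             # plain taus are kept
        for s in subjects(a1) | subjects(a2):
            x1 = [a for a in a1 if a[1] == s]
            x2 = [a for a in a2 if a[1] == s]
            if x1 and x2:                               # synchronization on s
                kinds = {x1[0][0], x2[0][0]}
                assert kinds == {'in', 'out'} and x1[0][2] == x2[0][2], "undefined"
                act[('tau', s)] += 1
            else:
                act[(x1 or x2)[0]] += 1                 # action of one side only
        objs = {a[2] for a in act if len(a) > 2 and a[2] is not None}
        return ((Y1 | Y2) & objs, act)

    # An input ab composed with a bound output of b on a yields a located tau on a:
    lab = compose((set(), [('in', 'a', 'b')]), ({'b'}, [('out', 'a', 'b')]))
    assert lab == (set(), Counter({('tau', 'a'): 1}))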
Table 4. Concurrent located labeled semantics (rules marked ∗ also have a symmetric counterpart)
out-c: $\bar ab.P \xrightarrow{[\bar ab]}_c P$
inp-c: $a(x).P \xrightarrow{[ab]}_c P\{b/x\}$
sum-c∗: if $P \xrightarrow{\mu}_c P'$ then $P + Q \xrightarrow{\mu}_c P'$
par-c∗: if $P \xrightarrow{\mu}_c P'$ and $bn(\mu) \cap fn(Q) = \emptyset$ then $P|Q \xrightarrow{\mu}_c P'|Q$
com-c: if $P \xrightarrow{\mu_1}_c P'$, $Q \xrightarrow{\mu_2}_c Q'$ and Φ then $P|Q \xrightarrow{\mu_1 @ \mu_2}_c \nu Z\,(P'|Q')$
res-c: if $P \xrightarrow{\mu}_c P'$ and Φ′ then $\nu a\, P \xrightarrow{\mu\setminus a}_c \nu A\, P'$
rep-c: if $P|!P \xrightarrow{\mu}_c P'$ then $!P \xrightarrow{\mu}_c P'$
Here Φ requires $bn(\mu_1) \cap fn(Q) = bn(\mu_2) \cap fn(P) = bn(\mu_1) \cap bn(\mu_2) = \emptyset$ and defines $Z = (bn(\mu_1) \cup bn(\mu_2)) \setminus bn(\mu_1 @ \mu_2)$, while Φ′ defines A = {a} if a ∉ obj(μ) and A = ∅ otherwise.
The following theorem shows that the concurrent LTS includes the interleaving one. Moreover, when moving to the concurrent framework, no sequential transitions are added.
Theorem 4. $P \xrightarrow{\alpha}_l P'$ iff:
– $\alpha \neq \bar a(b)$ and $P \xrightarrow{[\alpha]}_c P'$;
– $\alpha = \bar a(b)$ and $P \xrightarrow{(b)[\bar ab]}_c P'$.
As an obvious consequence the concurrent bisimilarity implies the located (and the standard) one.
Corollary 2. $P \approx_c P' \Rightarrow P \approx_l P' \Rightarrow P \approx_s P'$.
The following counterexample shows that the concurrent bisimilarity is strictly finer.
Counterexample 2 (Concurrent vs located bisimilarity). $a|b \approx_l a.b + b.a$, but not $a|b \approx_c a.b + b.a$.
The two processes are bisimilar under the located semantics, but not under
the concurrent one, where $a|b \xrightarrow{[a,b]}_c 0$, a transition that cannot be matched by a.b + b.a. This shows that the concurrent semantics highlights the degree of parallelism of a process, distinguishing between concurrency and nondeterminism. This is the same counterexample used to prove that $\approx_s$ is not a congruence, since the two terms have different transitions when placed in a context that merges a and b: the first one can perform a τ action while the second one cannot. This counterexample essentially exploits the fact that the expansion law is no longer valid. However, some instances of the expansion law hold, for instance when actions are on the same channel: $ax|ay \approx_c ax.ay + ay.ax$. Also, the ability
to perform actions in parallel includes the ability to perform the same actions sequentially, thus $c.(a|b) \approx_c c.(a|b) + c.a.b$.
The above counterexample suggests that bisimilarity may be a congruence. This is indeed the case, as proved by the following theorem.
Theorem 5. $\approx_c$ is a congruence w.r.t. all the operators of the calculus and w.r.t. substitutions.
While referring to [4] for the whole proof, we want to highlight some important points. First of all, this theorem suggests that observing concurrency aspects is important for obtaining good compositionality properties. This happens also in similar cases (see [6]). Interestingly, adding a smaller amount of concurrency is enough to get this property: it suffices to allow the concurrent execution of one input and one output on different channels. This alone would yield, however, a semantics that lacks, in our opinion, a clear intuitive meaning.
Building the concurrent semantics on top of the located semantics is fundamental for the congruence result. In fact, consider a concurrent semantics with only normal τ actions. Then the two terms $\nu b\,(a + b)|(\bar a + \bar b)$ and $a|\bar a$ are bisimilar, but when they are inserted into the context a|• the first one can perform [a, τ] going to 0 while the second one cannot. Notice that closure under substitutions implies $\approx_c = \sim_c$.
We now show that a concurrent transition can always be decomposed into a computation containing only sequential transitions, generalizing Theorem 3. Given a label μ and a sequential label α, we define the operation α ; μ only if $bn(\mu) \cap n(\alpha) = \emptyset$ and the union of the two action parts is well-formed. In that case α ; μ is computed by taking the union both of the multisets of actions and of the sets of extruded names.
α
μ
→c P ′′ − →c P ′ . Theorem 6. If P −−−→c P ′ then P − The recursive application of the theorem allows one to decompose a concurrent transition in a sequential computation. In fact, any non sequential label can be written as α; μ for suitable α and μ. Results similar to those in Theorem 1 and in Theorem 2 can be proved also for the concurrent scenario. However there is a little mismatch between the labeled and the reduction semantics we have presented, which are the most direct generalizations of the interleaving ones. The labeled semantics distinguishes between [τ ] and [τ, τ ], while in the reduction one they both correspond to ∅. One can either add the missing information to the reduction labels, or remove it from the labeled setting. We analyze here the second case, but the first one is analogous. We use nc as subscript for this semantics (modifying the rules is trivial: actually it is enough to modify the operator of label restriction). We extend the notation Sτ , denoting by it the multiset containing one action aτ for each a ∈ S. S
Sτ
→C P ′ iff P −−→nc P ′′ with P ′′ ≡ P ′ . Theorem 7. P − Theorem 8. P ≈nc Q iff C[P ] ≈C C[Q] for each context C[•] .
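To make the label operations above concrete, here is a small illustrative sketch (ours, not from the paper) of the composition α ; μ, modeling a concurrent label as a multiset of actions plus a set of extruded names. The class and function names are hypothetical, and the well-formedness check on the union of the action parts is elided.

```python
from collections import Counter

class Label:
    """A concurrent label: a multiset of actions plus a set of extruded names.

    Actions are plain strings, e.g. 'ab' for an output; this simplifies the
    paper's action syntax for illustration purposes."""
    def __init__(self, actions, extruded=()):
        self.actions = Counter(actions)      # multiset of sequential actions
        self.extruded = frozenset(extruded)  # names extruded by this label

    def names(self):
        # crude over-approximation of n(mu): every name occurring in an action
        return {c for a in self.actions for c in a if c.isalpha()} | set(self.extruded)

def compose(alpha, mu):
    """alpha ; mu -- defined only when bn(mu) is disjoint from n(alpha);
    computed as the union of the action multisets and of the extruded sets."""
    if set(mu.extruded) & alpha.names():
        raise ValueError("alpha ; mu undefined: bound names of mu occur in alpha")
    return Label(list((alpha.actions + mu.actions).elements()),
                 alpha.extruded | mu.extruded)

# [ab] ; [cd] yields the two-element multiset [ab, cd]
combined = compose(Label(["ab"]), Label(["cd"]))
print(sorted(combined.actions.elements()), set(combined.extruded))
```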
5 Related Work
Many semantics for π-calculus have been proposed in the literature, focusing on different aspects. We present a comparison with the ones more related to ours. First of all, we take the inspiration for this work from a concurrent semantics for Fusion Calculus [10] derived in [7] using a mapping from Fusion Calculus into a graph transformation framework called SHR [5]. The intrinsic concurrent nature of SHR and the fact that actions there are naturally located on nodes make the main semantic aspects discussed in this paper emerge spontaneously. The semantics presented in [7], however, preserved many of the particularities of SHR synchronization, such as the fact that extrusions are not observed, and that processes are always allowed to perform idle transitions to themselves. Because of the first difference, processes νx ux and νx ux + uz were bisimilar, while they are not even bisimilar with the standard semantics of π-calculus. On the other side, idle transitions allowed the observation of the free names of a process, thus νx xu and 0 were not bisimilar. Furthermore, the semantics in [7] is derived via a mapping from an LTS which is quite different w.r.t. standard process calculi LTSs, while our work presents a similar semantics in a direct and standard way. Some related semantics for π-calculus are described below.

Net semantics [3]: this semantics is obtained via a mapping into Petri nets, and is quite related to ours. The main differences are that actions can use the same channel concurrently, thus a.a and a|a are distinguished, but two outputs cannot extrude the same name at the same time, thus νy xy.zy + zy.xy and νy xy|zy are equivalent.

Open bisimilarity [12]: open bisimilarity instantiates processes throughout the bisimulation game, but it uses distinctions to keep track of which names can never be merged. Open bisimilarity is less distinguishing than concurrent bisimilarity. The inclusion follows easily from the closure under arbitrary substitutions of concurrent bisimilarity. The inclusion is strict since a|b and a.b + b.a are open bisimilar but not concurrent bisimilar. Notice that open bisimilarity is a congruence, but it has no direct coinductive characterization.

Causal bisimilarity [1]: this semantics traces the causal dependencies among actions. It is not comparable with the concurrent semantics, since νb (a + b)|(a + b) and a|a are causally bisimilar (there are no dependencies) but not concurrent bisimilar. Conversely, a|a and a.a are concurrent bisimilar but not causally bisimilar. If we add located τ actions to the causal semantics we get a bisimilarity finer than the concurrent one. The inclusion follows since, if two actions are independent, then they can be executed concurrently; thus from the interleaving transitions and the causal dependencies one can compute the concurrent transitions. The inclusion is strict since !a and !a.a are causally different (in the second one there are causal dependencies between different occurrences of a, while in the first one there are not), but concurrent bisimilar. Similar statements can be made for the (mixed-order) concurrent semantics in [9], which has a causal flavor.
Located bisimilarity [11]: in this bisimilarity a location is associated to each sequential process, thus actions performed by different sequential processes are distinguished. This concept of localities is completely different from ours, and even if it tracks sequential processes this bisimilarity is not comparable with the concurrent one. In fact, νb a.b.c|e.b and νb a.b|e.b.c are concurrent bisimilar, but they are not located bisimilar since in the first one c is executed in the same component as a while in the second one it is not. On the other hand, νbνc b|b|c|c and νbνc b|b.(c|c) are located bisimilar since τ actions do not exhibit locations, while they are not concurrent bisimilar since in the first one the two τ actions can be executed in parallel, while in the second one they cannot.
6 Weak Semantics
In this section we outline the main features of the weak bisimilarities based on the labeled semantics we have introduced in this paper. Usually weak bisimilarity (see [13] for the precise definition) abstracts from internal activities, i.e. from τ actions. However, in our setting, we have two kinds of τ actions: aτ, performed on a free name a, and τ, performed on a hidden name. While one must surely abstract from the latter ones, abstracting also from located synchronizations may lose too much information. We call semiweak the bisimilarity that abstracts only from τ (or ∅ in the reduction semantics), and weak the one that abstracts also from aτ (or from all the labels in the reduction semantics). Semiweak bisimilarity is midway between the strong and the weak semantics.

If we consider semiweak bisimilarity, most of the results shown in the previous sections are still valid. The only notable difference is that the semiweak semantics is not image finite (i.e., a process may have an infinite number of one-step derivatives), even up to bisimilarity, and Theorems 2 and 8 can be proved only for processes that are image finite up to bisimilarity. However, the same hypothesis is required to prove the corresponding property of the standard weak semantics.

If we consider weak bisimilarity instead (based on the labeled semantics), the located and the standard bisimilarities collapse. The concurrent bisimilarity is still strictly finer (Counterexample 2 is still valid) than the standard one, but there is no simple relation with a reduction semantics. In fact, in the reduction semantics labels should be completely abstracted away, thus one must have some other way to observe process behavior. The usual approach of using barbs [13], that is, observing the capabilities to produce inputs or outputs, is not sufficient. For instance, a.b + b.a and a|b are barbed bisimilar in the concurrent scenario (both of them can react when put in a context containing either a or b or both).
7 Conclusions and Future Work
We have presented two semantics for π-calculus, highlighting important information about which channels are used by a synchronization and which actions can be executed concurrently. We have analyzed the semantics both at the level
of LTS and of induced behavioral equivalence. As our main result we have shown that bisimilarity is a congruence for the concurrent located semantics, and this guarantees compositionality. Note that all the shown results hold also for CCS, since mobility is not exploited in the constructions used. As future work we plan to apply the same ideas to other calculi. In particular, preliminary analyses show that similar results can be obtained for Fusion Calculus [10], but more care is required to deal with fusions. Also, we want to study the semantic effect of allowing concurrent actions on the same channel. Preliminary results show that this has a strong impact; for instance, the direct generalization of Theorem 8 fails.

Acknowledgments. The author would like to strongly acknowledge Davide Sangiorgi for many useful discussions and comments, and Ugo Montanari for some early discussions.
References

1. Boreale, M. and Sangiorgi, D.: Some Congruence Properties of pi-Calculus Bisimilarities. Theoret. Comput. Sci. 198 1-2 (1998) 159–176
2. Bruni, R. and Lanese, I.: PRISMA: A Mobile Calculus with Parametric Synchronization. In Proc. of TGC'06, Springer, LNCS (2006) to appear
3. Busi, N. and Gorrieri, R.: A Petri Net Semantics for pi-Calculus. In Proc. of CONCUR'95, Springer, LNCS 962 (1995) 145–159
4. Lanese, I.: Concurrent and Located Synchronizations in π-Calculus, extended version. http://www.cs.unibo.it/~lanese/publications/fulltext/sofsem2007ext.pdf.gz
5. Ferrari, G.L., Montanari, U., and Tuosto, E.: A LTS Semantics of Ambients via Graph Synchronization with Mobility. In Proc. of ICTCS'01, Springer, LNCS 2202 (2001) 1–16
6. Lanese, I.: Synchronization Strategies for Global Computing Models. PhD Thesis, Computer Science Department, University of Pisa, Pisa, Italy (2006)
7. Lanese, I. and Montanari, U.: A Graphical Fusion Calculus. In Proceedings of the Workshop of the COMETA Project on Computational Metamodels, Elsevier Science, ENTCS 104 (2004) 199–215
8. Milner, R., Parrow, J., and Walker, D.: A Calculus of Mobile Processes, I and II. Inform. and Comput. 100 1 (1992) 1–40, 41–77
9. Montanari, U. and Pistore, M.: Concurrent Semantics for the pi-Calculus. In Proc. of MFPS'95, Elsevier Science, ENTCS 1 (1995)
10. Parrow, J. and Victor, B.: The Fusion Calculus: Expressiveness and Symmetry in Mobile Processes. In Proc. of LICS'98, IEEE Computer Society Press (1998) 176–185
11. Sangiorgi, D.: Locality and Interleaving Semantics in Calculi for Mobile Processes. Theoret. Comput. Sci. 155 1 (1996) 39–83
12. Sangiorgi, D.: A Theory of Bisimulation for the pi-Calculus. Acta Inf. 33 1 (1996) 69–97
13. Sangiorgi, D. and Walker, D.: Pi-Calculus: A Theory of Mobile Processes. Cambridge University Press (2001)
14. Varacca, D. and Yoshida, N.: Typed Event Structures and the pi-Calculus. In Proc. of MFPS'06, Elsevier Science, ENTCS 158 (2006) 373–397
Efficient Group Key Agreement for Dynamic TETRA Networks

Su Mi Lee1,⋆, Su Youn Lee2, and Dong Hoon Lee1

1 Center for Information Security Technologies (CIST), Korea University, 1, 5-Ka, Anam-dong, Sungbuk-ku, Seoul, Korea {smlee,donghlee}@korea.ac.kr
2 Baekseok College of Cultural Studies, 393 Anseo-dong, Cheonan, Chungchongnam-do, Korea [email protected]
Abstract. Terrestrial Trunked Radio (TETRA) is the most frequency-efficient standard for mobile communication, and its architecture is fully scalable, from a large high-capacity system to a low-capacity system. In the TETRA standard, various attacks such as a replay attack can occur, and the key-establishment scheme used in the TETRA standard requires high communication costs. In this paper, we propose an efficient group key agreement in TETRA networks that guarantees secure communication among light-weight mobile stations. That is, the computation cost per mobile station is very low (it requires only XOR operations on-line), and our scheme allows mobile stations and a base station to agree on a group key with 1-round complexity.
⋆ This work was supported by grant No. R01-2004-000-10704-0 from the Korea Science & Engineering Foundation.

1 Introduction

Terrestrial Trunked Radio (TETRA) is a new digital transmission standard developed by the European Telecommunication Standards Institute [1,2], and it is becoming the system of choice for public safety organizations (police, fire, ambulance, etc.). TETRA is typically designed for the Professional Mobile Radio market and includes Private Mobile Radio systems for the military. Its greatest attribute is its frequency efficiency in mobile communication, which is equivalent to 6.25 kHz per channel. For this reason its architecture is fully scalable, from a large high-capacity system to a low-capacity system. TETRA also offers fast call set-up time, various communication supports, and direct mode operation between radios, with outstanding security features [4]. The intended TETRA market areas include security services, public access, transport services, closed group members, factory site services, mining, etc. TETRA is now enhancing mobile commerce, including video on demand, video-conferencing, file transfer, e-mail, messaging, and paging. In TETRA networks, a Mobile Station (MS) requests call set-up to another MS in its group via the Switching and Management
Infrastructure (SwMI), which consists of Base Stations (BS). The TETRA standards support point-to-point and point-to-multipoint communications by the use of the SwMI. The standardized TETRA security features [5] are based upon the Digital Enhanced Cordless Telecommunication security, such as authentication, encryption, and key management, with added features which are relevant to Professional Mobile Radio users, such as end-to-end encryption for closed group MSs. These closed groups, each a particular group of MSs, can be isolated from all other MSs in the same system and receive a broadcast message exclusively. One of the main security issues in TETRA networks is access control, i.e. making sure that only legitimate MSs of a group have access to communication. This security mechanism can be provided by encrypting a broadcast message using a common session key (group key) which is not known to any MS not in the group. Therefore the entire security of the system depends on how honest MSs can securely share a group key. In the TETRA standard, to share a group key, each MS shares an individual secret key with the BS, which is used in point-to-point communication, and then the BS unicasts the group key, encrypted under each individual secret key, to each MS. A closed group can then securely execute point-to-multipoint communication using the group key. However, the group key mechanism of the TETRA standard can be vulnerable to various attacks, such as a replay attack, because of the lack of proper key establishment and management.

The main goal of our work is to construct an efficient group key agreement for dynamic groups that can complement the TETRA standard. Our scheme, well suited for the TETRA standard, allows a group of MSs and a BS to dynamically agree on a group key. For dynamic group communications, we propose setup, join, and leave algorithms. In the setup, join, and leave algorithms, each MS performs at most n XOR operations, 1 pseudorandom function evaluation, and 1 hash function evaluation. Since the only operations dependent on the number of group MSs are XOR operations, the total cost of computation is extremely low.
2 TETRA Security Mechanisms
Several years ago the TETRA Security and Fraud Prevention Group was established to deal specifically with security related to TETRA networks. Security mechanisms of the TETRA standard are briefly described in this section. A full description of the security mechanisms can be found in the ETSI standards [1] and [2]. In particular, the group has produced recommendation 02 [2], which defines end-to-end encryption and key management. The encryption uses three different types of keys.

1. Key Encryption Key (KEK): the KEK, derived from the authentication process, is used to protect the uplink wherever possible, as it is the most secure key and is unique to an MS. It should always be used for the MS-BS link.
2. Group Cipher Key (GCK): the GCK is the common session key (group key) shared among the MSs in a group. The group key allows a group of MSs to download Traffic Encryption Keys.
Fig. 1. TETRA services
3. Traffic Encryption Key (TEK): the TEK is the key used to encrypt traffic. The BS encrypts the TEK and sends it to each MS using either the KEK or the GCK.

As illustrated in Fig. 1, TETRA networks provide two communication types: point-to-point communication and point-to-multipoint communication.

– Point-to-point communication: when two MSs want to communicate with each other, they execute point-to-point communication mediated by the BS. For instance, when MS4 wants to communicate with MS6, the BS encrypts the TEK with KEK4 and KEK6 and then unicasts the encryptions to MS4 and MS6 respectively.
– Point-to-multipoint communication: when closed group MSs in local or different areas want to form a group and communicate securely, the BS encrypts the GCK using the KEK of each MS and then unicasts the encryption to each MS.
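The following is a small illustrative sketch (ours, not from the standard) of this point-to-multipoint key distribution pattern. The toy "encryption" (XOR with a hash-derived keystream) and all names are placeholders for the real TETRA primitives; the point is only the n-unicast structure.

```python
import hashlib, os

def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # Toy stream cipher: XOR with a SHA-256-derived keystream (illustration only).
    stream = hashlib.sha256(key + nonce).digest()
    return bytes(p ^ s for p, s in zip(plaintext, stream))

# BS shares an individual KEK with each MS and unicasts the encrypted GCK.
keks = {ms: os.urandom(32) for ms in ["MS1", "MS2", "MS3", "MS4"]}
gck = os.urandom(32)
nonce = os.urandom(16)
unicasts = {ms: toy_encrypt(kek, nonce, gck) for ms, kek in keks.items()}

# Each MS recovers the GCK from its own unicast: n messages for n members,
# which is the communication cost the proposed scheme improves on.
assert all(toy_encrypt(keks[ms], nonce, ct) == gck for ms, ct in unicasts.items())
```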
3 An Efficient Group Key Agreement for Dynamic TETRA Networks
We propose an efficient key agreement in TETRA networks that provides secure group communication among light-weight mobile stations. For individual communication, KEKi is already loaded into MSi and the BS. Actually, to share KEKi, MSi and the BS can execute any 2-party key exchange protocol. Using the KEKi for 1 ≤ i ≤ n, all MSs simply share a group key. Let H : {0, 1}∗ → {0, 1}^k be a public hash function and G be a collision-resistant pseudorandom function family (PRF). Indices are subject to reduction modulo n, i.e. if i = j mod n then MSi = MSj. We consider a signature scheme Sig = (gen, sig, ver). The BS holds a signing private/public key pair (skb, pkb), output by the key generation algorithm gen. To share a group key, the legitimate group MSs execute the following steps.
Setup. Let G0 = {MS1, . . . , MSn} be an initial group with n parties who wish to share a group key. IDi denotes MSi's identity. Let I0 = ID1 || · · · || IDn.

– For 1 ≤ i ≤ n, KEKi is shared between MSi and the BS. The BS computes Ti = G_KEKi(I0) and then broadcasts (Zi−1,i, αi−1,i), where Zi−1,i = Ti−1 ⊕ Ti and αi−1,i = sig_skb(Zi−1,i || I0), for 1 ≤ i ≤ n.
– Each MSi computes Ti = G_KEKi(I0). Upon receiving the (Zj−1,j, αj−1,j) (j ≠ i), if all n signatures are valid, MSi sets T̃i−1 = Ti−1 and T̂i = Ti and computes the Tj (1 ≤ j ≤ n) sequentially as follows:

T̃j = Zj−1,j ⊕ T̃j−1 = Tj−1 ⊕ Tj ⊕ T̃j−1, for j ← i+1, · · · , n, · · · , i,
T̂j = Zj,j+1 ⊕ T̂j+1 = Tj ⊕ Tj+1 ⊕ T̂j+1, for j ← i−1, · · · , 1, · · · , i.

MSi checks whether T̃j is equal to T̂j for each j (1 ≤ j ≤ n). If this holds, MSi sets Tj = T̃j (= T̂j) and is sure of the correctness of Tj. Even though wrong messages (or no message) may be broadcast by illegal MSs or system faults, honest MSs can notice the errors through the above check process and then halt. Using the Ti, all MSs compute a group key sk0 as follows: sk0 = H(T1 || T2 || · · · || Tn−1 || Tn).
[Fig. 2 shows the setup run for G0 = {MS1, MS2, MS3, MS4} with I0 = ID1 || ID2 || ID3 || ID4: the BS computes Ti = G_KEKi(I0), Zi−1,i = Ti−1 ⊕ Ti and αi−1,i = sig_skb(Zi−1,i || I0) for 1 ≤ i ≤ 4 and broadcasts all (Zi−1,i, αi−1,i); each MSi computes its own Ti = G_KEKi(I0) and derives the group key sk0 = H(T1 || T2 || T3 || T4).]
Fig. 2. Setup algorithm with G0 = {MS1, MS2, MS3, MS4} and BS
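As a concrete reading of the setup algorithm, the following is an illustrative sketch (ours, not part of the paper's specification): HMAC-SHA256 stands in for the PRF family G, the signatures α and the forward/backward cross-check are omitted, and all helper names are hypothetical.

```python
import hashlib, hmac, os

def prf(kek: bytes, info: bytes) -> bytes:
    # HMAC-SHA256 stands in for the pseudorandom function family G.
    return hmac.new(kek, info, hashlib.sha256).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

n = 4
ids = [f"ID{i}".encode() for i in range(n)]
info = b"||".join(ids)                      # I0 = ID1 || ... || IDn
keks = [os.urandom(32) for _ in range(n)]   # KEKi, pre-shared with the BS

# BS side: T_i = G_KEK_i(I0); broadcast Z_i = T_{i-1} XOR T_i (indices mod n).
T = [prf(k, info) for k in keks]
Z = [xor(T[i - 1], T[i]) for i in range(n)]

def ms_view(i: int) -> bytes:
    """MS_i reconstructs every T_j from its own T_i and the broadcast Z's."""
    t = {i: prf(keks[i], info)}             # MS_i knows only its own KEK
    for step in range(1, n):                # walk the XOR chain forwards
        j = (i + step) % n
        t[j] = xor(Z[j], t[(j - 1) % n])    # T_j = Z_j XOR T_{j-1}
    # (The paper also walks backwards and cross-checks both runs; omitted.)
    return hashlib.sha256(b"".join(t[j] for j in range(n))).digest()

group_keys = {ms_view(i) for i in range(n)}
assert len(group_keys) == 1                 # every MS derives the same key
```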
Join Event. Assume that a new MS joins a group of size n and that the new MS's identity is MSn+1. Let v be the current session and Iv = ID1 || . . . || IDn+1.

– Before the new MSn+1 joins the group, KEKn+1 is loaded into MSn+1 and has already been shared between MSn+1 and the BS.
• The BS computes Tn+1 = G_KEKn+1(Iv) with the new KEKn+1 and re-calculates Ti = G_KEKi(Iv) with the old KEKi and the new Iv (for 1 ≤ i ≤ n).
• The BS broadcasts (Zi−1,i, αi−1,i), where Zi−1,i = Ti−1 ⊕ Ti and αi−1,i = sig_skb(Zi−1,i || Iv), for 1 ≤ i ≤ n + 1.
– MSn+1 generates Tn+1 = G_KEKn+1(Iv) with the new KEKn+1, and each MSi (for 1 ≤ i ≤ n) re-calculates Ti = G_KEKi(Iv) with his own KEKi and the new Iv. Upon receiving the (Zj−1,j, αj−1,j) (j ≠ i), if all n + 1 signatures are valid, each MSi sets T̃i−1 = Ti−1 and T̂i = Ti and computes the Tj (1 ≤ j ≤ n + 1) sequentially as follows:

T̃j = Zj−1,j ⊕ T̃j−1 = Tj−1 ⊕ Tj ⊕ T̃j−1, for j ← i+1, · · · , n+1, · · · , i,
T̂j = Zj,j+1 ⊕ T̂j+1 = Tj ⊕ Tj+1 ⊕ T̂j+1, for j ← i−1, · · · , 1, · · · , i.

MSi checks whether T̃j is equal to T̂j for each j (1 ≤ j ≤ n + 1). If this holds, MSi sets Tj = T̃j (= T̂j) and is sure of the correctness of Tj. Even though wrong messages (or no message) may be broadcast by illegal MSs or system faults, honest MSs can notice the errors through the above check process and then halt. Using the Ti, all MSs compute a group key skv as follows: skv = H(T1 || T2 || · · · || Tn || Tn+1).

Leave Event. Assume that an MS leaves the group of size n and that the leaving MS's identity is MSl. Let k be the current session and Ik = ID1 || . . . || IDl−1 || IDl+1 || . . . || IDn.

– The BS re-calculates Ti = G_KEKi(Ik) with the old KEKi and the new Ik and broadcasts (Zi−1,i, αi−1,i), where Zi−1,i = Ti−1 ⊕ Ti and αi−1,i = sig_skb(Zi−1,i || Ik), for 1 ≤ i ≠ l ≤ n.
– Each MSi updates Ti = G_KEKi(Ik) with his own KEKi and the new Ik (1 ≤ i ≠ l ≤ n). Upon receiving the (Zj−1,j, αj−1,j) (j ≠ i, j ≠ l), if all n − 1 signatures are valid, each MSi sets T̃i−1 = Ti−1 and T̂i = Ti and computes the Tj (1 ≤ j ≠ l ≤ n) sequentially as follows:

T̃j = Zj−1,j ⊕ T̃j−1 = Tj−1 ⊕ Tj ⊕ T̃j−1, for j ← i+1, · · · , n, · · · , i,
T̂j = Zj,j+1 ⊕ T̂j+1 = Tj ⊕ Tj+1 ⊕ T̂j+1, for j ← i−1, · · · , 1, · · · , i.

MSi checks whether T̃j is equal to T̂j for each j (1 ≤ j ≠ l ≤ n). If this holds, MSi sets Tj = T̃j (= T̂j) and is sure of the correctness of Tj. Even though wrong messages (or no message) may be broadcast by illegal MSs or system faults, honest MSs can notice the errors through the above check process and then halt. Using the Ti, all MSs compute a group key skk as follows: skk = H(T1 || · · · || Tl−1 || Tl+1 || · · · || Tn).
[Fig. 3 shows the join run for Gv = {MS1, MS2, MS3, MS4, MS5} with Iv = ID1 || ID2 || ID3 || ID4 || ID5: the BS computes T5 = G_KEK5(Iv), re-calculates Ti = G_KEKi(Iv) for 1 ≤ i ≤ 4, and broadcasts (Zi−1,i, αi−1,i) with Zi−1,i = Ti−1 ⊕ Ti and αi−1,i = sig_skb(Zi−1,i || Iv) for 1 ≤ i ≤ 5; each MSi computes its own Ti = G_KEKi(Iv) and derives the group key skv = H(T1 || T2 || T3 || T4 || T5).]
Fig. 3. Join algorithm with Gv = {MS1, MS2, MS3, MS4, MS5} and BS
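Rekeying on a membership change reuses exactly the same machinery as the setup: only the identity string and the affected KEK set change. The following self-contained toy sketch (ours; all names hypothetical, signatures again omitted) illustrates the join step.

```python
import hashlib, hmac, os

prf = lambda kek, info: hmac.new(kek, info, hashlib.sha256).digest()
xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))

# State before the join: 4 members with pre-shared KEKs (as in Fig. 2).
ids = [f"ID{i}".encode() for i in range(1, 5)]
keks = [os.urandom(32) for _ in ids]

# Join of MS5: extend the roster, pre-share KEK5, and recompute everything
# under the *new* identity string I_v, so old broadcasts cannot be replayed.
ids.append(b"ID5"); keks.append(os.urandom(32))
info_v = b"||".join(ids)
T = [prf(k, info_v) for k in keks]
Z = [xor(T[i - 1], T[i]) for i in range(len(T))]   # fresh XOR chain to broadcast
```

A leave works symmetrically: the departing member's identity and KEK are dropped from the roster before the recomputation.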
4 Security Analysis

4.1 Security Notions
Pseudorandom Functions. Let G : Keys(G) × D → R be a family of functions, and g : D → R a random function. Ag is an algorithm that takes oracle access to a function and returns a bit. We consider two experiments:

Exp^{prf-1}_{G,Ag}: K ←R Keys(G); d ← Ag^{G_K(·)}; return d.
Exp^{prf-0}_{G,Ag}: g ←R Rand^{D→R}; d ← Ag^{g(·)}; return d.

The advantage of an adversary Ag is defined as

Adv^{prf}_{G,Ag} = Pr[Exp^{prf-1}_{G,Ag} = 1] − Pr[Exp^{prf-0}_{G,Ag} = 1].

The advantage function is defined as

Adv^{prf}_G(k, t) = max_{Ag} {Adv^{prf}_{G,Ag}(k)},

where Ag ranges over adversaries with time complexity t. The scheme G is a secure pseudorandom function family if the advantage of any adversary Ag with time complexity polynomial in k is negligible.
[Fig. 4 shows the leave run for Gk = {MS1, MS2, MS4, MS5} with Ik = ID1 || ID2 || ID4 || ID5 (MS3 has left): the BS re-calculates Ti = G_KEKi(Ik) for 1 ≤ i ≠ 3 ≤ 5, forms Z5,1 = T5 ⊕ T1, Z1,2 = T1 ⊕ T2, Z2,4 = T2 ⊕ T4, Z4,5 = T4 ⊕ T5 with signatures αi,j = sig_skb(Zi,j || Ik), and broadcasts all (Zi,j, αi,j); the remaining MSs compute the group key skk = H(T1 || T2 || T4 || T5).]
Fig. 4. Leave algorithm with Gk = {MS1, MS2, MS4, MS5} and BS
Secure Signature Scheme (SS). A signature scheme consists of three algorithms, S = {K.gen, S.sig, V.ver}. K.gen generates a private-public key pair for the user. S.sig produces a signature on a message with the private key. V.ver verifies a message-signature pair with the public key and returns 1 if it is valid and 0 otherwise. Let Gg be a group generator which generates g and a group G of order |G|. Let k ∈ N be a security parameter and let S be a signature scheme. Consider the following experiment.

Exp^{SS}_{S,Af}(k): (g, |G|) ← Gg(k); (sk, pk) ← K.gen(k); (σ, M) ← Af^{O_{S.sig_sk}(·)}(pk); if (σ, M) = ⊥ then return 0; else if S.ver_pk(σ, M) = 1 and the oracle O_{S.sig_sk}(·) never returned σ on input M then return 1, else return 0.

The advantage of an adversary Af is defined as

Adv^{SS}_{S,Af}(k) = Pr[Exp^{SS}_{S,Af}(k) = 1].

The advantage function of the scheme is defined as

Adv^{SS}_S(k, t) = max_{Af} {Adv^{SS}_{S,Af}(k)},

where the maximum is taken over all adversaries running in time t. S is said to be secure if Adv^{SS}_S(k, t) is negligible.

4.2 Security Models
To define a notion of security, we define the capability of an adversary as in [3]. Π_i^k represents the k-th instance of MSi. An instance Π_i^k has a unique identifier sid_i^k and a partner identifier pid_i^k. Here Π_b^t represents the t-th instance of the BS; an instance Π_b^t has a unique identifier sid_b^t and a partner identifier pid_b^t. We say that Π_i^k, Π_j^l and Π_b^t are partnered if pid_i^k, pid_j^l and pid_b^t are equal, and sid_i^k = sid_j^l = sid_b^t. Let P be our protocol AGKA generated using individual KEKs. Let A be an adversary attacking P. We consider an experiment where A asks queries to oracles, and the oracles answer back to A. Oracle queries model attacks which A may use in the real system.

• A query Send(b, k, m) is used to send a message m to instance Π_b^k. An adversary may use this query to perform attacks by modifying and inserting messages.
• A query Execute(Π_b^k) represents passive eavesdropping by an adversary.
• A query Reveal(Π_i^k (or Π_b^k)) models known key attacks in the real system. The adversary is given the session key of instance Π_i^k (or Π_b^k).
• A query Corrupt(Π_b^k) models exposure of the long-term key (skb) held by the BS.
• A query Test(Π_i^k (or Π_b^k)) is used to define the advantage of an adversary. If Π_i^k (or Π_b^k) is fresh (defined below), the oracle flips a coin b. If b is 1, then the real group key is returned; otherwise a random string is returned. The adversary is allowed to make a single Test query, at any time during the experiment.

At the end of the experiment, the adversary A outputs a guess b′ for the bit b. The advantage of A, denoted Adv^{agka}_{P,A}(k) for a security parameter k, is defined as |2 · Pr[b = b′] − 1|. For protocol P, we define its security as Adv^{agka}_P(k, t) = max_A {Adv^{agka}_{P,A}(k)}, where the maximum is taken over all adversaries running in time t. AGKA P is said to be secure if Adv^{agka}_P(k) is negligible.

Definition 1. Π_i^k and Π_b^k are fresh if 1) neither Π_i^k and Π_b^k nor one of their partners has been asked a Reveal query after Π_i^k, Π_b^k and their partners have computed a group key, and 2) the adversary has not queried Corrupt(Π_b^k).

4.3 Security Proof
Theorem. Let P be our protocol AGKA generated using individual KEKs. For an adversary A attacking P within a polynomial time t, with less than qex Execute queries and qs Send queries, the advantage of A is bounded as follows:

Adv^{agka}_{P,A}(k, t) ≤ Adv^{SS}_{S,Af}(k, t) + n · Adv^{prf}_{G,Ag}(k, t),

where n is an upper bound on the number of honest parties.
Proof (Sketch). The advantage comes from the following two cases:
– Case 1: there is at least one forged signature.
– Case 2: there is no forged signature.

Thus, Adv^{agka}_{P,A} = Adv^{Case1}_{P,A} + Adv^{Case2}_{P,A}. We bound the advantage of A from each case in the following claims.

Claim 1. Adv^{Case1}_{P,A} ≤ Adv^{SS}_{S,Af}(k, t).
Proof of Claim 1. To prove Claim 1, we construct an Af which breaks the underlying signature scheme using A, if the advantage from Case 1 is non-negligible. Af is given a public key pk and a signing oracle S.sig(·) in the unforgeability experiment. Af sets pk as the public key of the BS and uses S.sig(·) to produce the signatures of the BS; that is, the signature of the BS on a message m is S.sig(m). The more concrete description of Af^{S.sig(·)}(pk) is as follows:

1. For each oracle query of A:
• For Send(Π_b^k): to make a signature of the BS, use the signing oracle S.sig(·).
• For Corrupt(Π_b^k): halt, since we cannot know the secret key of the signing oracle.
• For all other oracle queries of A, answer them following the protocol under the above restriction.
2. If a forged signature σ is found during the simulation such that σ is a valid signature of the BS, output σ and quit.

The probability of success of Af depends on whether or not A produces a forged signature.

Claim 2. Adv^{Case2}_{P,A} ≤ n · Adv^{prf}_{G,Ag}(k, t).
Proof of Claim 2. The main idea of the proof is that if there is an adversary who breaks P with non-negligible advantage, then there is an adversary breaking the pseudorandomness of the pseudorandom function family G. To prove this, we construct a distinguisher Ag against the pseudorandomness of G. Ag is given an oracle function O(·) in the pseudorandomness experiment for the function family G and uses O(·) to compute Ti = O(I0). Then Ag simulates Exp^{prf-1}_{G,Ag} or Exp^{prf-0}_{G,Ag}, depending on whether O(·) is a function from G or not. Let the advantage of the distinguisher be Adv^{prf}_{G,Ag}. The concrete description of Ag is as follows:
1. For 1 ≤ i ≤ n, use O(·) instead of G_KEKi to compute O(I0).
2. For all 1 ≤ j ≤ i − 1, select a random value rj and use it instead of KEKj to compute G_rj(I0).
3. For all i + 1 ≤ j ≤ n, use KEKj to compute G_KEKj(I0).
4. For the oracle queries of the adversary, answer them following the protocol.
5. Let the output of A be b′. Then output b′ and quit.
The BS (or an Authentication Centre) and an MS may perform any 2-party key exchange protocol to share the KEK in the registration process. In this process, for authentication, the MS should use its own long-term key, such as a password or a signing key. We omitted this process in our paper since our goal is to construct a group key agreement scheme suitable for the TETRA standard. In our scheme, an MS does not use a long-term key while executing our protocol; therefore, we do not consider corruption of MSs in our security model. Due to space limitations, we omit the detailed proof.
5 Conclusion
In the TETRA standard, security is a very important part of TETRA, and the security of TETRA is well defined in the standard. However, there are several weak points [6] in the TETRA standard. We improve the part of the TETRA standard concerned with sharing a group key, in terms of both communication complexity and security.
References

1. SFPG Recommendation 01 - Key Distribution. http://www.tetramou.net/sfpg
2. SFPG Recommendation 02 - End to End Encryption. http://www.tetramou.net/sfpg
3. Katz, J. and Yung, M.: Scalable Protocols for Authenticated Group Key Exchange. In Advances in Cryptology - Crypto'03, Springer-Verlag, LNCS 2729 (2003) 110–125
4. Lammerts, E., Slump, C.H., and Verweij, K.A.: Realization of a Mobile Data Application in TETRA. STW/SAFE99, 249–253
5. Roelofsen, G.: TETRA Security. Information Security Technical Report 5 (2000)
6. Roelofsen, G.: Security Issues for TETRA Networks. TETRA Conference (1998)
Algorithmic Aspects of Minimum Energy Edge-Disjoint Paths in Wireless Networks

Markus Maier, Steffen Mecke, and Dorothea Wagner

Universität Karlsruhe (TH), Germany
Abstract. The problem of finding k minimum energy, edge-disjoint paths in wireless networks (MEEP) arises in the context of routing and belongs to the class of range assignment problems. A polynomial algorithm which guarantees a factor-k-approximation for this problem has been presented before, but its complexity status was open. In this paper we prove that MEEP is NP-hard and give new lower and upper bounds on the approximation factor of the k-approximation algorithm. For MEEP on acyclic graphs we introduce an exact, polynomial algorithm which is then extended to a heuristic for arbitrary graphs.
1 Introduction
Links between nodes of a wireless network are less reliable than connections in wired networks, because of effects like fading, interference, or obstructions. For reliable routing in wireless networks it can therefore be desirable to communicate not only over one path but over several disjoint paths. This can help to achieve connections that are more reliable, have lower latency, or higher bandwidth. Energy is a sparse resource in ad hoc and especially in sensor networks, so it is usually vital to achieve the connectivity goal with minimum energy usage. The advantage of wireless network nodes, in this respect, is their ability to multicast to all of their neighbors using the energy of only one transmission. If several paths have a node in common, energy can be saved by doing just one multicast at this node instead of several unicast transmissions. In [1] Srinivas and Modiano presented several algorithms for finding sets of k edge-disjoint or node-disjoint paths in wireless ad hoc networks. They gave a polynomial time algorithm for finding an energy-minimal set of k node-disjoint paths and a polynomial time algorithm for finding an energy-minimal pair of edge-disjoint paths. The node-disjoint case is less complex in the sense that, as the paths share no nodes except the start node s, only s can save energy by doing a multicast to its neighbors. For the edge-disjoint case a k-approximation algorithm was presented in [1] (the LDMW algorithm). However, the complexity of the problem has remained unknown. Therefore, in this paper we concentrate on the edge-disjoint case. Here, energy can be saved also at intermediate nodes. The disadvantage of paths that are merely edge-disjoint is that they may not protect against node failures. One main difference between node and edge failures is that the reasons for the latter are often only temporary (e.g., interference or obstruction) whereas reasons for node failures are often permanent (e.g., power
loss or mobility). The permanent interruption of a path, therefore, has to be dealt with differently, namely by establishing a new path, whereas, in case of a transient failure, the system might just use the alternative paths until the failing link becomes available again.

1.1 Related Work and Overview
Energy-efficient routing has been looked at many times before. One of the first works was probably [2], followed by many more. Disjoint path routing is already present in [3] and has been rediscovered in the context of wireless sensor and ad hoc networks, e.g. in [4] or [5]. But [1] seems to be the first work on the combined problem of finding energy-minimal disjoint paths (namely MEEP). It is closely related to work on finding energy-efficient strongly connected ([6]) or strongly k-connected ([7]) subgraphs in the context of topology control. These two problems and the k-MEEP problem belong to the class of range assignment problems. One of the first works on this type of problem was [8], and [9] was another predecessor of [1]. Range assignment has been studied widely in the meantime; see also [10] for a survey. In a successive paper ([11]), MEEP is extended and analyzed under lifetime aspects. As Srinivas and Modiano already pointed out, the problem of finding minimum energy disjoint paths is more focused than finding k-connected subgraphs, in the sense that it is only concerned with finding disjoint paths between one pair of nodes and therefore does not need to maintain routes between nodes that may never have to communicate with each other at all. Whereas they gave polynomial time algorithms for finding minimum energy node-disjoint paths and pairs of edge-disjoint paths, our first result (Sect. 3) is that the MEEP problem is NP-complete, at least if k is part of the input. It was shown in [1] that an algorithm for finding k minimum length edge-disjoint paths (like, for example, Suurballe's algorithm [3]) provides a k-approximation. In Sect. 4 we show that the factor of k is exact. However, if we assume all edge weights to be equal, we can prove an asymptotically tight bound of Θ(√k) for the approximation factor of this algorithm. Finally, in Sect. 5, we present an exact, polynomial time algorithm for acyclic graphs. Based on this algorithm, we describe a simple heuristic for arbitrary graphs in Sect. 6.
2 Network Model
We use a slightly more general network model than that of [1]. A network consists of n nodes. Each node v has a maximum transmission power Emax(v) and can transmit at any power in the interval [0, Emax(v)]. For each (ordered) pair (u, v) of distinct nodes we are given a weight w(u, v). The node u can establish a link to a node v if it transmits with power greater than or equal to w(u, v). These weights need not be symmetric, nor does the triangle inequality need to hold. This includes energy metrics (with d(u, v) = δ(u, v)^α for α ∈ {0, 1, 2, ...}) as well as other, more general metrics.
Clearly, such a network can be modeled by a weighted directed graph where the nodes are the nodes of the network and there is an edge from a node u to a node v if Emax(u) ≥ w(u, v), i.e. if there can be a link from u to v. In this case the corresponding edge weight is w(u, v). As mentioned above, we can use the wireless character of a network to save energy in a multicast setting (the so-called wireless multicast advantage (WMA)). Consequently, the cost of a set of paths should not be measured by the sum of the edge weights. Instead we define the following cost function, which we call energy:

Definition 1 (Energy). The energy E(P) of a set P of paths in a directed graph is defined as

E(P) = Σ_{u ∈ V(P)} max_{(u,v) ∈ A(P)} w(u, v),

where V(P) denotes the nodes and A(P) the edges in the paths, and we set for convenience max ∅ = 0. The weight of P is

w(P) = Σ_{(u,v) ∈ A(P)} w(u, v).
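As a concrete reading of Definition 1, here is a small illustrative sketch (ours) computing E(P) and w(P) for a set of paths given as node sequences; all names are hypothetical.

```python
def energy(paths, w):
    """E(P): each node pays only for its single strongest outgoing edge used,
    reflecting the wireless multicast advantage."""
    power = {}  # node -> max weight over its used outgoing edges
    for path in paths:
        for u, v in zip(path, path[1:]):
            power[u] = max(power.get(u, 0), w[(u, v)])
    return sum(power.values())

def weight(paths, w):
    """w(P): the plain sum of the used edge weights (each edge paid separately)."""
    return sum(w[(u, v)] for path in paths for u, v in zip(path, path[1:]))

# Two paths leaving s: s multicasts once at power 2 instead of paying 2 + 1.
w = {("s", "a"): 2, ("s", "b"): 1, ("a", "t"): 1, ("b", "t"): 1}
paths = [["s", "a", "t"], ["s", "b", "t"]]
print(energy(paths, w), weight(paths, w))  # 4 vs 5
```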
Formally, the decision version of the problem of finding k minimum energy edge-disjoint paths can be stated as follows:

Definition 2 (MEEP). Given a directed acyclic graph D = (V, A) with weights w : A → R+, two nodes s, t ∈ V, B ∈ N and the number k ∈ N of paths. Are there k edge-disjoint paths P from s to t with E(P) ≤ B?
3 Complexity
There are polynomial time algorithms for finding k edge-disjoint paths of minimum weight (i.e. sum of edge weights) in a graph [3]. However, since in MEEP a different cost function is used, the problem becomes NP-complete for general k (i.e. for k being part of the input). We will show this in the following by reduction of SET COVER to MEEP.

Theorem 1. Given a directed graph D = (V, A) with weights w : A → R+, start node s ∈ V, end node t ∈ V, k ∈ N and a threshold B ∈ N. Then it is NP-complete to decide if there is a set P of k edge-disjoint paths from s to t with E(P) ≤ B.

Proof. We will show the theorem by a reduction of the classic SET COVER problem to MEEP. Let us first remind the reader of the definition of SET COVER:

Definition 3 (SET COVER). Given a set U = {u1, . . . , un}, a family F = {S1, . . . , Ss} of subsets of U and an integer B. Can we select B (or less) subsets from F such that every element of U is in at least one of the selected subsets?
It is a well-known fact that SET COVER is NP-complete [12]. Given an instance of SET COVER, we can construct a directed acyclic graph D = (V, A) in polynomial time such that there is a correspondence between n edge-disjoint paths in D and the set covers. The construction is as follows: the nodes V of D consist of the elements of U and F, two nodes s and t, and, for every set Si ∈ F, nodes vi,1, . . . , vi,|Si|. From s there is an edge to every vi,j (i ∈ {1, . . . , s}, j ∈ {1, . . . , |Si|}). From the nodes vi,j there are edges to the corresponding sets Si. From each set Si there is an edge to every node u ∈ U with u ∈ Si. Finally, there are edges from all the nodes in U to t. All edges are assigned a weight of 1. Fig. 1 shows an example of this reduction.
Fig. 1. Reduction of an instance of SET COVER with U = {u1 , . . . , u7 }, F = {S1 , . . . , S5 }, S1 = {u1 , u3 }, S2 = {u2 , u4 , u5 }, S3 = {u1 , u2 , u3 }, S4 = {u5 , u6 }, S5 = {u4 , u7 } to MEEP
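For readers who want to experiment, here is a small illustrative sketch (ours, assuming the networkx library) building the reduction graph from Theorem 1; all names are hypothetical.

```python
import networkx as nx

def set_cover_to_meep(universe, family):
    """Builds the digraph of Theorem 1: s -> v_{i,j} -> S_i -> u -> t,
    all edge weights 1; n edge-disjoint s-t paths correspond to set covers."""
    D = nx.DiGraph()
    for i, S in enumerate(family):
        for j in range(len(S)):
            D.add_edge("s", ("v", i, j), weight=1)    # one slot per element of S_i
            D.add_edge(("v", i, j), ("S", i), weight=1)
        for u in S:
            D.add_edge(("S", i), ("u", u), weight=1)  # S_i covers u
    for u in universe:
        D.add_edge(("u", u), "t", weight=1)
    return D

# The example of Fig. 1 (S1..S5 over u1..u7).
family = [{1, 3}, {2, 4, 5}, {1, 2, 3}, {5, 6}, {4, 7}]
D = set_cover_to_meep(range(1, 8), family)
print(len(list(nx.edge_disjoint_paths(D, "s", "t"))))  # 7 = |U|
```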
Clearly, the size of this graph is polynomial in the size of the SET COVER problem and it can be constructed in polynomial time. We will show that there is a set cover of size less than or equal to B if and only if there are n edge-disjoint paths P from s to t with E(P) ≤ B + 2n + 1. Given a set cover of size B - w.l.o.g. the sets S1, . . . , SB - we can construct n edge-disjoint paths from s to t as follows: for every element of U we can find a set in S1, . . . , SB that covers the element. Let ni ≤ |Si| (i ∈ {1, . . . , B}) denote the number of elements that are thus associated with set Si. Clearly n1 + . . . + nB = n. Now we construct ni edge-disjoint paths from s to Si (via the nodes vi,1, . . . , vi,ni) and ni edge-disjoint paths from Si to t (via the elements that are covered by Si). Together, we have constructed n edge-disjoint paths from s to t. Since all edge weights are 1, the energy consumed by the paths is equal to
the number of nodes (except t), namely E(P) = 1 + n + B + n = B + 2n + 1, as desired. Given n edge-disjoint paths from s to t, they must visit exactly n of the nodes vi,j and n of the nodes u1, . . . , un, by construction of the graph. Thus, the energy of the paths is B + 2n + 1, where B is now the number of used nodes in S1, . . . , Ss. The paths easily induce a set cover by taking for every element u ∈ U the predecessor on the path visiting u (one of S1, . . . , Ss). As a result, we have found a set cover with B or fewer sets. We have shown that there is a set cover of size B if and only if there is a set of n edge-disjoint paths from s to t in D with energy 1 + 2n + B, which implies NP-completeness. ⊓⊔

Remark 1. From the NP-completeness proof we can derive a result about the best possible approximation factor for MEEP, using a result of Feige: Feige showed in [13] that there cannot exist an approximation algorithm for SET COVER with an approximation factor better than (1 − o(1)) ln n unless NP has slightly superpolynomial time algorithms. Using this result one can easily show that under the same conditions the same bound holds for the approximation of MEEP.
4 Approximation
As we have seen in the previous section, we cannot expect to find an approximation algorithm with an approximation factor of less than O(log k) for MEEP. In [1] the so-called Link-Disjoint Minimum-Weight (LDMW) algorithm was proposed and shown to possess an approximation factor of at most k. In this algorithm, the k paths of minimum weight are computed instead of the paths of minimum energy (e.g. using Suurballe's algorithm [3]). The example in Fig. 2 shows that the approximation factor of the LDMW algorithm is exactly k.
Fig. 2. An example that shows an approximation factor of k for the LDMW approximation algorithm
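Since LDMW only needs k edge-disjoint s-t paths of minimum total weight, it can be implemented as a unit-capacity min-cost flow of value k. The sketch below is ours (assuming the networkx library and, preferably, integral weights); Suurballe's algorithm [3] is the specialized alternative.

```python
import networkx as nx

def ldmw_paths(D, s, t, k):
    """LDMW step: k edge-disjoint s-t paths of minimum total *weight*, via a
    min-cost flow of value k with unit capacities, decomposed into paths."""
    G = nx.DiGraph()
    for u, v, data in D.edges(data=True):
        G.add_edge(u, v, capacity=1, weight=data["weight"])
    G.add_node(s, demand=-k)            # push k units out of s ...
    G.add_node(t, demand=k)             # ... into t, at minimum cost
    flow = nx.min_cost_flow(G)          # raises if no k edge-disjoint paths exist
    paths = []
    for _ in range(k):                  # peel off one unit-flow path at a time
        path = [s]
        while path[-1] != t:
            u = path[-1]
            v = next(x for x, f in flow[u].items() if f > 0)
            flow[u][v] -= 1
            path.append(v)
        paths.append(path)
    return paths
```

The returned paths minimize w(P), not E(P); comparing their energy with the optimum is exactly what the analysis in this section is about.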
In this example, all of the upper paths (via w) have a weight of 1 + 3ε, whereas the lower paths have a weight of 1 + ε. Thus, the k paths found by the LDMW algorithm are the lower paths which need an energy of k + ε. The energy of the upper paths, however, is 1 + (2k + 1)ε. For ε → 0 the quotient of the LDMW solution and the optimal solution approaches k. 4.1
The Binary Case
The example in Fig. 2 works because of great differences in edge weights. It is an interesting question if we can attain a better approximation factor if no such great differences can occur. We studied the case where all edges have the same weight (w. l. o. g. 1) and called this restricted problem BMEEP (for Binary MEEP), because the nodes can either send or not, but not send at different energy levels. Note that our proof of NP-completeness works for BMEEP as well. As explained above, however, the example that shows an approximation factor of k for the LDMW algorithm applied to MEEP does not work for BMEEP. Can we expect LDMW to work better on BMEEP? We can give an example that shows for k ∈ N of the form k = 1 + 2 + . . .+ l for some l ∈ N that the approximation factor cannot be better than 1 + 2l which is √ Ω( k). Fig. 3 shows the example for k = 1 + 2 = 3. We will discuss this example and briefly show how it can be extended to the general case.
Fig. 3. A lower bound on the approximation factor of LDMW applied to BMEEP
Let
P1 = (s, u1 , v1 , v2 , . . . , v2m , u′1 , t), P2 = (s, u2 , v2 , v4 , . . . , v2m , u′2 , t), P3 = (s, u3 , v1 , v3 , . . . , v2m−1 , u′3 , t), ′ P1′ = (s, u′ , v1′ , . . . , v2m , t)
and
denote four s-t-paths. Clearly, the weights of the paths are W (P1 ) = 2m + 3, W (P2 ) = W (P3 ) = m + 3 and W (P1′ ) = 2m + 2. Thus, the three paths of minimum weight are P1′ , P2 and P3 with an energy of E(P1′ , P2 , P3 ) = 4m + 6. On the other hand E(P1 , P2 , P3 ) = 2m + 7. Thus the factor between the LDMW solution and the optimal solution is 4m+6 2m+7 which approaches 2 for m → ∞. The idea of the general case is to use i paths of “step size” i where i ∈ {1, . . . , l} as optimal paths (P1 with step size 1; P2 and P3 with step size 2
416
M. Maier, S. Mecke, and D. Wagner
above) and m = l(l−1) parallel paths that are slightly shorter than the paths of 2 “step size” l − 1. Then one can show the factor of 1 + 2l between the LDMW solution and the optimal solution. The next lemma shows that this approximation factor is asymptotically tight. Lemma √ 1. For k ≥ 6 the approximation factor of the LDMW algorithm is at least 2 k. Proof. Let P ∗ be the set of k edge-disjoint paths with minimum energy and P ′ the set of k edge-disjoint paths with minimum weight. It is sufficient to √ show that E(P ′ ) ≤ W (P ′ ) ≤ W (P ∗ ) ≤ 2 k · E(P ∗ ) where all but the last inequality is obvious. Let’s have a look at the graph that is induced by P ∗ . It is the union of k shortest (directed) s-t-paths and therefore (w.l.o.g.) a directed acyclic graph of n nodes (if there are cycles, they can be removed). There exists an order s = v1 , v2 , v3 , . . . , vn−1 , vn = t of the nodes of this graph (for instance the topological order) such that every edge goes “upward”. The weight of P ∗ is the number of edges of this graph. The energy is the number of nodes (minus 1). The more edges the graph has, the shorter they have to be (where the length of an edge (vi , vj ) is |j − i|). There are at most n − 1 edges of length 1, n − 2 edges of length 2 and so on. Every path leads from node s to node t, so the “distance” it crosses is n − 1. Even if P ∗ uses only the shortest possible √ edges, we claim that the total distance is “used up” after edges of length 2 k are used because it is at least √ 2 k i=1
i(n − i) =
in −
√ √ √ i2 = 2nk + n k − 8/3k k − 6 k − 1
≥ k(n − 1) for n ≥ k ≥ 6. Therefore the number of used edges is at most ∗
W (P ) ≤
√ 2 k i=1
√ √ √ (n − i) = 2n k − 2k − k ≤ 2 kE(P ∗ ) .
⊓ ⊔
For the case k = 3 we could show that the approximation factor is exactly k. The proof is rather long and technical and cannot be given here. It can be found in other works by the authors. For the case of general k, the lower bound of 1 + 2l √ (for k = l(l+1) 2 ) asymptotically matches the upper bound of 2 k.
5
An Algorithm for Acyclic Graphs
In Sect. 3 we have shown that MEEP is NP-complete for weighted directed graphs and the number k of paths part of the input. In this section we will show that there is an exact, polynomial time algorithm if we restrict the graphs to be considered to acyclic graphs and fix a certain k ∈ N. The algorithm relies on a notion from graph drawing that has to be presented first, so-called layerings.
Algorithmic Aspects of Minimum Energy Edge-Disjoint Paths
417
We will first give an algorithm for properly layered graphs and then show briefly how we can transform an acyclic graph to a properly layered graph. Combining these steps we get a polynomial time algorithm for acyclic (directed) graphs. 5.1
Algorithm for Properly Layered Graphs
Layerings are a well-studied problem in graph drawing. The following definitions are from [14].

Definition 4. A layering of an acyclic digraph D = (V, A) is a partition of V into subsets (layers) L1, . . . , Lh, such that if (u, v) ∈ A, u ∈ Li and v ∈ Lj, then i > j. The span d(e) of an edge e = (u, v) where u ∈ Li and v ∈ Lj is defined as d(e) = j − i − 1. A layering is called proper if d(e) = 0 for all edges e ∈ A.

For every acyclic digraph a layering can be computed in linear time, e.g. by longest path layering [14]. From now on, we confine ourselves to finding k edge-disjoint paths in properly layered graphs for fixed k.

Theorem 2. Given a weighted acyclic digraph D = (V, A) with weights w : A → R+, a proper layering into layers L0, . . . , Lh, start node s and end node t, we can compute k minimum-energy edge-disjoint paths from s to t in time O(n^k m^k).

Proof. W.l.o.g. we can assume that L0 = {s} and Lh = {t}. Since we have a proper layering, all edges go from one layer to the next. Thus the set of edges can be divided into layers as well. Let Ai = {(u, v) ∈ A | u ∈ Li−1, v ∈ Li} for i = 1, . . . , h. Clearly A is the disjoint union of A1, . . . , Ah. Obviously, a set of k edge-disjoint paths from s to t must use exactly k (different) edges from each edge layer. In the following we will consider k edge-disjoint paths from the source s to k-combinations of nodes from the same layer. For a k-combination (without repetitions) (e1, . . . , ek) of edges from a layer Ai, let φ(e1, . . . , ek) be the combination of the start nodes and ψ(e1, . . . , ek) the combination of the end nodes. Let (u1, . . . , uk) be a k-combination (with repetitions) of nodes from layer Li (i = 0, . . . , h − 1). E(u1, . . . , uk) denotes the minimum energy of k edge-disjoint paths from s to (u1, . . . , uk), and E(e1,...,ek)(u1, . . . , uk) the minimum energy of k edge-disjoint paths from s to the node combination (u1, . . . , uk) using the edge combination (e1, . . . , ek). Given a k-combination with repetitions (u1, . . . , uk) of nodes from a layer Li (i = 1, . . . , h), let A(u1,...,uk) denote the set of k-combinations of edges leading to (u1, . . . , uk). Then

E(u1, . . . , uk) = min_{(e1,...,ek) ∈ A(u1,...,uk)} E(e1,...,ek)(u1, . . . , uk)
             = min_{(e1,...,ek) ∈ A(u1,...,uk)} E(φ(e1, . . . , ek)) + ∆(e1,...,ek).
Here, ∆(e1,...,ek) is the increase in energy when a set of k paths to a combination of nodes of a layer Li is extended by the edges e1, . . . , ek to a combination of nodes of layer Li+1. Due to the multicast advantage, we get the following formula:

∆(e1,...,ek) = Σ_{u ∈ φ(e1,...,ek)} max_{(u,v) ∈ (e1,...,ek)} w(u, v).
This leads to a dynamic programming approach. In order to compute the k minimum-energy paths to a combination (v1, . . . , vk) of nodes in layer Li+1, we use the minimum-energy paths to all combinations of nodes in layer Li: we enumerate all possible k-combinations (without repetitions) of edges from edge layer Ai+1 leading to (v1, . . . , vk) and pick the combination with minimum total energy. The energy E(t, . . . , t) is the energy of k minimum-energy edge-disjoint s-t-paths, and the paths themselves can be found by backtracking: for every combination of nodes we have to store the k edges on the minimum-energy paths leading there. The predecessor edges on the k edge-disjoint minimum-energy paths to a combination (u1, . . . , uk) are denoted by pred(u1, . . . , uk). In summary, we can give the following dynamic programming algorithm:

– Initialization for all combinations (with repetitions) (v1, . . . , vk) of nodes from layers L0, . . . , Lh: Emin(v1, . . . , vk) = 0 if v1 = v2 = . . . = vk = s, and ∞ otherwise.
– For all edge layers Ai = A1, . . . , Ah do:
  • For all combinations (without repetitions) (e1, . . . , ek) of edges from Ai:
    ∗ If Emin(ψ(e1, . . . , ek)) > Emin(φ(e1, . . . , ek)) + ∆(e1, . . . , ek):
      · Emin(ψ(e1, . . . , ek)) = Emin(φ(e1, . . . , ek)) + ∆(e1, . . . , ek)
      · pred(ψ(e1, . . . , ek)) = (e1, . . . , ek)
– E(t, . . . , t) is the energy of k minimum-energy edge-disjoint paths.
– The k minimum-energy edge-disjoint paths are found by backtracking.

The running time of the algorithm above is determined by the total number of edge combinations to be considered. If we set mi = |Ai| (i = 1, . . . , h), this number is

Σ_{i=1}^{h} (mi choose k) ≤ (m choose k) ∈ O(m^k).
The algorithm needs to hold in memory a table of predecessor edges and the energy of a minimum-energy path for every combination of nodes from the same layer. Setting ni = |Li|, the number of combinations is

Σ_{i=0}^{h} (ni + k − 1 choose k) ≤ Σ_{i=0}^{h} ni^k ∈ O(n^k),
since the number of combinations with repetitions can clearly be bounded by the number of permutations with repetitions. ⊓ ⊔
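As a concrete companion to the proof, here is a small illustrative implementation (ours; all names hypothetical) of the dynamic program for properly layered graphs, returning the optimal energy only.

```python
from itertools import combinations

def k_min_energy_paths(layers, edges, w, k):
    """DP over a properly layered DAG: layers[0] = [s], layers[-1] = [t];
    edges[i] lists the edges between layers[i] and layers[i+1].
    Returns the minimum energy of k edge-disjoint s-t paths."""
    s, t = layers[0][0], layers[-1][0]
    E = {tuple([s] * k): 0}                      # node multiset -> min energy
    pred = {}
    for layer_edges in edges:
        nxt = {}
        for combo in combinations(layer_edges, k):   # k distinct edges
            phi = tuple(sorted(u for u, v in combo)) # start-node multiset
            if phi not in E:
                continue
            # wireless multicast advantage: each start node pays its max edge
            delta = sum(max(w[e] for e in combo if e[0] == u)
                        for u in set(u for u, v in combo))
            psi = tuple(sorted(v for u, v in combo)) # end-node multiset
            cost = E[phi] + delta
            if cost < nxt.get(psi, float("inf")):
                nxt[psi] = cost
                pred[psi] = combo   # backtracking info (keyed per layer in full code)
        E = nxt
    return E[tuple([t] * k)]

# Tiny example: k = 2; both paths leave s with a single multicast of cost 2.
layers = [["s"], ["a", "b"], ["t"]]
edges = [[("s", "a"), ("s", "b")], [("a", "t"), ("b", "t")]]
w = {("s", "a"): 2, ("s", "b"): 1, ("a", "t"): 1, ("b", "t"): 1}
print(k_min_energy_paths(layers, edges, w, 2))  # 4
```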
5.2 Algorithm for Acyclic Graphs
For every acyclic digraph, a layering can be computed in linear time, e.g. by longest path layering [14]. From there, we can easily construct a proper layering by introducing new nodes for all edges e = (u, v) ∈ A that span more than one layer. If, for example, u ∈ Li, v ∈ Lj and j > i + 1, we introduce j − i − 1 new nodes ve,1, . . . , ve,j−i−1 and replace e by the path (u, ve,1, ve,2, . . . , ve,j−i−1, v). The weights of the new edges are set to w′(u, ve,1) = w(e) and w′(e′) = 0 for all other introduced edges e′. An example of the transformation of a layered graph to a properly layered graph, and of a mapping of paths in one graph to the other graph, can be seen in Fig. 4.
(a) Two edge-disjoint paths in a layered acyclic graph
(b) The corresponding paths in the properly layered graph constructed by our algorithm
Fig. 4. An example of the transformation of layered graphs to properly layered graphs and corresponding paths
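The subdivision step is mechanical; the following is a small illustrative sketch (ours, with hypothetical names) that preserves the energy of any path set, since only the first hop of a subdivided edge carries its weight.

```python
def make_proper(edges, w, layer):
    """Subdivide every edge spanning more than one layer, so that all edges
    run between consecutive layers. New edges carry weight w(e) on the first
    hop and 0 afterwards, preserving the energy of any set of paths."""
    new_edges, new_w = [], {}
    for (u, v) in edges:
        span = layer[v] - layer[u]
        chain = [u] + [f"{u}->{v}#{j}" for j in range(1, span)] + [v]
        for j, (x, y) in enumerate(zip(chain, chain[1:])):
            new_edges.append((x, y))
            new_w[(x, y)] = w[(u, v)] if j == 0 else 0
            layer.setdefault(y, layer[x] + 1)
    return new_edges, new_w

layer = {"s": 0, "a": 1, "t": 2}
edges = [("s", "a"), ("a", "t"), ("s", "t")]        # (s, t) spans two layers
w = {("s", "a"): 3, ("a", "t"): 1, ("s", "t"): 5}
proper_edges, proper_w = make_proper(edges, w, layer)
print(proper_edges)  # (s, t) becomes (s, 's->t#1') and ('s->t#1', t)
```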
Combining the algorithms, we can derive an algorithm for general acyclic graphs. Given an acyclic graph D = (V, A) with n nodes and m edges, we first compute a layering. Then we compute in time O(nm) a properly layered graph D′ = (V′, A′) with O(mn) nodes and edges. Applying the algorithm for properly layered graphs to D′, we can compute k edge-disjoint minimum-energy paths in D′ in time O(|A′|^k) = O(m^k n^k) and with space in O(|V′|^k) = O(m^k n^k). Finally, we have to find the corresponding paths in D, which can be done in linear time (given appropriate data structures, e.g. pointers from the edges in A′ to the corresponding edges in A). For a fixed k ∈ N we can thus find k minimum-energy edge-disjoint paths in polynomial time.
6 A Heuristic for General Graphs
Most graphs that arise from real-world networks are not acyclic; e.g., if we assume symmetry of our weights w(u, v) and equal maximum energy at all nodes, we get a symmetric graph. However, we can apply our algorithm for acyclic graphs to derive a heuristic for the general case: in a first step we compute an appropriate acyclic subgraph and then use our exact algorithm in the acyclic subgraph. One natural way of doing this assumes that the coordinates of the nodes are known (i.e. we have information about the geometry of the network). Then we can just remove any edge whose end point is further from the target than its starting point (in terms of Euclidean distance). Edges adjacent to s are treated differently: all edges leaving s remain in the graph, whereas edges leading to s are removed. A sketch of this filtering step is given after Fig. 5 below.

We did some experiments with graphs of different sizes and randomly created layouts. We placed nodes uniformly at random in a square of a given size and computed the LDMW paths in the original graph and the exact solution in the acyclic subgraph. We assumed that the energy (i.e., the edge weights) depends only on the Euclidean distances of the nodes (i.e. we used the network model of [1]). Due to the high running time and memory requirements of the algorithm we could only make comparisons for k = 3. They showed that our heuristic usually outperformed the LDMW algorithm. Energy savings were up to 40% and the average was between 10% and 15%, depending on the "density" of the graph. We also found that removing edges in order to get an acyclic graph did not decrease the number of edge-disjoint s-t-paths dramatically. In summary, we could show that the paths found by the LDMW approximation algorithm are usually far from optimal. Thus it would be worth searching for better approximation algorithms.
(a) three paths with E(P) = 415381; (b) three paths with E(P) = 359295
Fig. 5. Comparison between the LDMW heuristic (left) and the acyclic graph heuristic (right)
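The geometric filtering step described above is a few lines of code; here is an illustrative sketch (ours, with hypothetical names).

```python
from math import dist

def acyclic_subgraph(pos, edges, s, t):
    """Keep an edge (u, v) only if v is strictly closer to the target t than u
    (Euclidean distance); edges leaving s are kept, edges entering s dropped.
    The result is acyclic because distance-to-t strictly decreases."""
    kept = []
    for (u, v) in edges:
        if v == s:
            continue                       # never route back into the source
        if u == s or dist(pos[v], pos[t]) < dist(pos[u], pos[t]):
            kept.append((u, v))
    return kept

pos = {"s": (0, 0), "a": (1, 1), "b": (2, 0), "t": (3, 0)}
edges = [("s", "a"), ("a", "s"), ("a", "b"), ("b", "a"), ("b", "t")]
print(acyclic_subgraph(pos, edges, "s", "t"))
# [('s', 'a'), ('a', 'b'), ('b', 't')]
```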
7 Conclusion
We have seen that MEEP is NP-complete in the general case where k is not bounded (but part of the input). The complexity of MEEP for a fixed k ∈ N is still unknown. If we restrict our problem to graphs of equal edge weights, there remains a small gap between the Ω(√k) lower bound and the 2√k upper bound (for k > 6) for the approximation factor of the LDMW algorithm. It is also worth searching for better approximation algorithms, as we are still far away from the theoretical lower bound of around log(k). And there is still no satisfying (heuristic or approximation) distributed algorithm for finding energy-optimal disjoint paths.
References

1. Srinivas, A. and Modiano, E.: Minimum Energy Disjoint Path Routing in Wireless Ad-Hoc Networks. In: Proc. Int. Conf. on Mobile Computing and Networking, MobiCom'03, ACM Press (2003) 122–133
2. Singh, S., Woo, M., and Raghavendra, C.S.: Power-Aware Routing in Mobile Ad Hoc Networks. In: Proc. Int. Conf. on Mobile Computing and Networking, MobiCom'98, ACM Press (1998) 181–190
3. Suurballe, J.W.: Disjoint Paths in a Network. Networks 4 (1974) 125–145
4. Ganesan, D., Govindan, R., Shenker, S., and Estrin, D.: Highly-Resilient, Energy-Efficient Multipath Routing in Wireless Sensor Networks. SIGMOBILE Mob. Comput. Commun. Rev. 5(4) (2001) 11–25
5. Nasipuri, A. and Das, S.: On-Demand Multipath Routing for Mobile Ad Hoc Networks. In: Proc. Int. Conf. on Computer Communications and Networks, ICCCN'99 (1999) 64–70
6. Chen, W.T. and Huang, N.F.: The Strongly Connecting Problem on Multihop Packet Radio Networks. IEEE Transactions on Communications 37(3) (1989) 293–295
7. Lloyd, E.L., Liu, R., Marathe, M.V., Ramanathan, R., and Ravi, S.S.: Algorithmic Aspects of Topology Control Problems for Ad Hoc Networks. Mob. Netw. Appl. 10(1-2) (2005) 19–34
8. Kirousis, L.M., Kranakis, E., Krizanc, D., and Pelc, A.: Power Consumption in Packet Radio Networks (Extended Abstract). In: Proc. Symp. on Theoretical Aspects of Computer Science, STACS'97, Springer-Verlag (1997) 363–374
9. Wieselthier, J.E., Nguyen, G.D., and Ephremides, A.: Energy-Efficient Broadcast and Multicast Trees in Wireless Networks. Mob. Netw. Appl. 7(6) (2002) 481–492
10. Clementi, A., Huiban, G., Penna, P., Rossi, G., and Verhoeven, Y.: Some Recent Theoretical Advances and Open Questions on Energy Consumption in Ad-Hoc Wireless Networks. In: Proc. Workshop on Approximation and Randomization Algorithms in Communication Networks, ARACNE (2002) 23–38
11. Tang, J. and Xue, G.: Node-Disjoint Path Routing in Wireless Networks: Tradeoff between Path Lifetime and Total Energy. In: Proc. IEEE International Conference on Communications 7 (2004) 3812–3816
12. Papadimitriou, C.H.: Computational Complexity. Addison-Wesley (1995)
13. Feige, U.: A Threshold of ln n for Approximating Set Cover. J. ACM 45(4) (1998) 634–652
14. di Battista, G., Eades, P., Tamassia, R., and Tollis, I.G.: Graph Drawing: Algorithms for the Visualization of Graphs. Prentice Hall (1999)
The Pk Partition Problem and Related Problems in Bipartite Graphs

Jérôme Monnot¹ and Sophie Toulouse²

¹ CNRS LAMSADE - UMR 7024, Université Paris-Dauphine, Place du Maréchal De Lattre de Tassigny, 75775 Paris Cedex 16, France
[email protected]
² LIPN - UMR CNRS 7030, Institut Galilée, Université Paris 13, 99 av. Jean-Baptiste Clément, 93430 Villetaneuse, France
[email protected]
Abstract. In this paper, we continue the investigation proposed in [15] about the approximability of Pk partition problems, but focusing here on their complexity. More precisely, we prove that the problem of deciding whether a graph of nk vertices has n vertex-disjoint simple paths {P_1, ..., P_n} such that each path P_i has k vertices is NP-complete, even in bipartite graphs of maximum degree 3. Note that this result also holds when each path P_i is chordless in G[V(P_i)]. Then, we prove that the optimization versions of these problems, denoted by MaxP3 Packing and MaxInducedP3 Packing, are not in PTAS in bipartite graphs of maximum degree 3. Finally, we propose a 3/2-approximation for Min3-PathPartition in general graphs within O(nm + n² log n) time and a 1/3 (resp., 1/2)-approximation for MaxWP3 Packing in general (resp., bipartite) graphs of maximum degree 3 within O(α(n, 3n/2)n) (resp., O(n² log n)) time, where α is the inverse Ackermann function and n = |V|, m = |E|.
1 Introduction
The Pk partitioning problem (Pk Partition for short) consists, given a simple graph G = (V, E) on k × n vertices, of deciding whether there exists a partition of V into (V_1, ..., V_n) such that for 1 ≤ i ≤ n, |V_i| = k and the subgraph G[V_i] induced by V_i contains a Hamiltonian path. In other words, we want to know whether there exist n vertex-disjoint simple paths on k vertices in G. The analogous problem where the subgraph G[V_i] induced by V_i is isomorphic to P_k (the chordless path on k vertices) will be denoted by induced Pk Partition. These two problems are NP-complete for any k ≥ 3, and polynomial otherwise [8,13]. In fact, they are both particular cases of a more general problem called partition into isomorphic subgraphs [8]. In [13], Kirkpatrick and Hell give a necessary and sufficient condition for the NP-completeness of the partition into isomorphic subgraphs problem in general graphs. Pk Partition has been widely studied in the literature, mainly because its NP-completeness also implies the NP-hardness of two famous optimization
problems, namely the minimum k-path partition problem (denoted by Mink-PathPartition) and the maximum Pk packing problem (MaxPk Packing for short). Mink-PathPartition consists of partitioning the vertex set of a graph G = (V, E) into the smallest number of paths such that each path has at most k vertices (for instance, Min2-PathPartition is equivalent to the edge cover problem); the optimal value is usually denoted by ρ_{k−1}(G), and by ρ(G) when no constraint occurs on the length of the paths (in particular, we have ρ(G) = 1 iff G has a Hamiltonian path). Mink-PathPartition has been extensively studied in the literature [19,18,22], and has applications in broadcasting problems, see for example [22].

MaxPk Packing (resp., MaxInducedPk Packing) consists, given a simple graph G = (V, E), of finding a maximum number of vertex-disjoint (resp., induced) P_k. In their weighted versions (denoted MaxWPk Packing and MaxWInducedPk Packing, respectively), the input graph G = (V, E) is given together with a weight function w : E → N on its edges; the goal is to find a collection P = {P_1, ..., P_q} of vertex-disjoint (resp., induced) P_k maximizing w(P) = Σ_{i=1}^{q} Σ_{e ∈ P_i} w(e). Some approximation results for MaxWPk Packing when the graph is complete on k × n vertices are given in [9,10,15]. In this case, each solution contains exactly n vertex-disjoint paths of length k − 1 (note that, in this particular case, the minimization version may also be considered). This problem is related to the vehicle routing problem [22,3].

Here, we study the complexity of Pk Partition and induced Pk Partition in the case of bipartite graphs. We first show that Pk Partition and induced Pk Partition are NP-complete for any k ≥ 3 in bipartite graphs of maximum degree 3. Moreover, for k = 3, this remains true even if the graph is planar. On the opposite, Pk Partition, induced Pk Partition, Mink-PathPartition and MaxWPk Packing trivially become polynomial-time computable in graphs of maximum degree 2 and in forests. Then, we prove that, in bipartite graphs of maximum degree 3, MaxPk Packing and MaxInducedPk Packing are not in PTAS. More precisely, we prove that there is a constant ε_k > 0 such that it is NP-hard to decide whether a maximum (induced) Pk-packing of a bipartite graph of maximum degree 3 on kn vertices is of size n or of size upper-bounded by (1 − ε_k)n. Finally, we propose a 3/2-approximation for Min3-PathPartition in general graphs and a 1/3 (resp., 1/2)-approximation for MaxWP3 Packing in general (resp., bipartite) graphs of maximum degree 3.

This paper is organized as follows: in the next section, we briefly present previous related works about the hardness of solving bounded-size-path packing problems. Then, the third section is dedicated to complexity results concerning the problems Pk Partition, induced Pk Partition, MaxInducedPk Packing and MaxPk Packing in bipartite graphs. Finally, some approximation results concerning MaxWP3 Packing and Min3-PathPartition are proposed in a fourth section. A full version of this paper has been published as a Technical Report [16].

The notations are the usual ones of graph theory. Moreover, we exclusively work on undirected simple graphs. In this paper, we often identify a path P of length k − 1 with P_k, even if P contains a chord. However, when
we deal with induced Pk Partition, the paths considered will be chordless. We denote by opt(I) and apx(I) the value of an optimal and of an approximate solution, respectively. We say that an algorithm A is an ε-approximation with ε ≥ 1 for a minimization problem (resp., with ε ≤ 1 for a maximization problem) if apx(I) ≤ ε × opt(I) (resp., apx(I) ≥ ε × opt(I)) for any instance I (for more details, see for instance [2]).
2 Previous Related Work
The minimum k-path partition problem is obviously NP-complete in general graphs [8], and remains intractable in comparability graphs [19], in cographs [18], and in bipartite chordal graphs [19] (when k is part of the input). Note that most of the proofs of NP-completeness actually establish the NP-completeness of Pk Partition. Nevertheless, the problem turns out to be polynomial-time solvable in trees [22], in cographs when k is fixed [18], and in bipartite permutation graphs [19]. Note that one can also find in the literature several results about partitioning a graph into disjoint paths of length at least 2 [20,11].

Concerning the approximability of related problems, Hassin and Rubinstein [9] proposed a generic algorithm to approximate MaxWP4 Packing in complete graphs on 4n vertices that guarantees an approximation ratio of 3/4 for a general distance function. More recently, in [15] it has been proven that this algorithm is also a 9/10-approximation for the 1,2-instances. For the minimization version, it provides respectively a 3/2- and a 7/6-approximation for the metric and the 1,2-instances in complete graphs on 4n vertices (in this case, we seek a maximal P4-packing of minimum weight). In [10], the authors proposed a (35/67 − ε)-approximation for MaxP3 Partition in complete graphs on 3n vertices using a randomized algorithm. To our knowledge, there are no specific approximation results for MaxWP3 Packing in general graphs. However, using approximation results for the maximum weighted 3-set packing problem (mainly based on local search techniques) [1], we can obtain a (1/2 − ε)-approximation for MaxWP3 Packing. Finally, there is, to our knowledge, no approximation result for Mink-PathPartition. Nevertheless, when the problem consists of maximizing the number of edges used by the paths, approximation results can be found in [21] for the general case and in [5] for dense graphs.
3 Complexity Results
Theorem 1. Pk Partition and induced Pk Partition are NP-complete in bipartite graphs of maximum degree 3, for any k ≥ 3. As a consequence, the problems MaxPk Packing and Mink-PathPartition are NP-hard in bipartite graphs of maximum degree 3, for any k ≥ 3.

Proof (sketch). The proof is based on a reduction from the k-dimensional matching problem, denoted by kDM, which is known to be NP-complete [8]. Since the paths of length k − 1 that are used in this reduction are chordless, the
The Pk Partition Problem and Related Problems in Bipartite Graphs
425
Fig. 1. The gadget H(c_i) when c_i is a 3-tuple
Fig. 2. The gadget H(e_j) for k = 3 and d^j = 2
result holds for both Pk Partition and induced Pk Partition. An instance of kDM consists of a subset C = {c_1, ..., c_m} ⊆ X_1 × ... × X_k of k-tuples, where X_1, ..., X_k are k pairwise disjoint sets of size n. A matching is a subset M ⊆ C such that no two elements of M agree in any coordinate, and the purpose of kDM is to answer the question: does there exist a perfect matching M on C, that is, a matching of size n?

We first do the proof for odd values of k. Given an instance I = (C, X_1 × ... × X_k) of kDM, we build an instance G = (V, E) of Pk Partition, where G is a bipartite graph of maximum degree 3, as follows:

• To each k-tuple c_i ∈ C, we associate a gadget H(c_i) that consists of a collection P^{i,1}, ..., P^{i,k} of k vertex-disjoint P_k with P^{i,q} = (a_1^{i,q}, ..., a_k^{i,q}) for q = 1, ..., k. We add to H(c_i) the edges [a_1^{i,q}, a_1^{i,q+1}] for q = 1 to k − 1, in order to form a (k + 1)-th P_k (a_1^{i,1}, ..., a_1^{i,k}) (see Figure 1 for an illustration when k = 3).

• For each element e_j ∈ X_1 ∪ ... ∪ X_k, let d^j denote the number of k-tuples c_i ∈ C that contain e_j; the gadget H(e_j) is defined as a cycle (v_1^j, ..., v_{N^j+1}^j, v_1^j) on N^j + 1 vertices, where N^j = k(2d^j − 1). Moreover, for p = 1 to d^j, we denote by l_p^j the vertex of index 2k(p − 1) + 1 (see Figure 2 for an illustration of H(e_j) when k = 3 and d^j = 2).

• Finally, for any couple (e_j, c_i) such that e_j is the value of c_i on the q-th coordinate, the two gadgets H(c_i) and H(e_j) are connected using an edge [a_2^{i,q}, l_p^j]. The vertices l_p^j that will be linked to a given gadget H(c_i) must be chosen in such a way that each vertex l_p^j from any gadget H(e_j) is connected to exactly one gadget H(c_i) (this is possible since each H(e_j) contains exactly d^j vertices l_p^j).

This construction leads to a graph on 3k²m + (1 − k)kn vertices: consider, on the one hand, that each gadget H(c_i) is a graph on k² vertices and, on the
Fig. 3. Two possible vertex partitions of an H(c_i) gadget into paths of length 2
other hand, that Σ_{j=1}^{kn} d^j = km (w.l.o.g., we assume that each element e_j appears at least once in C). Finally, G is obviously bipartite of maximum degree 3. We claim that there exists a perfect matching M ⊆ C iff there exists a partition P* of G into P_k. The following property can be easily proved:

Property 1. In any partition of G into P_k, and for any i = 1, ..., m, one uses either P^i or Q^i, where P^i and Q^i are the collections of paths defined, for i = 1, ..., m and q = 1, ..., k, by

P^{i,q} = (a_2^{i,q}, ..., a_k^{i,q}, l^{i,q}),   Q^{i,q} = (a_1^{i,q}, a_2^{i,q}, ..., a_k^{i,q})

(where l^{i,q} denotes the vertex from some H(e_j) linked to a_2^{i,q}), and

P^i = ∪_{q=1}^{k} P^{i,q} ∪ {(a_1^{i,1}, a_1^{i,2}, ..., a_1^{i,k})},   Q^i = ∪_{q=1}^{k} Q^{i,q}.

Let M be a perfect matching on C; we build a packing P by applying the following rule: if a given element c_i belongs to M, then we use P^i to cover H(c_i), and we use Q^i otherwise; Figure 3 illustrates this construction for 3DM. Since M is a perfect matching, exactly one vertex l_p per gadget H(e_j) is already covered by some P^{i,q}. Thus, on a given cycle H(e_j), the N^j = k(2d^j − 1) vertices that remain uncovered can easily be covered using a sequence of (2d^j − 1) vertex-disjoint P_k.

Conversely, let P* = {P_1, ..., P_r} be a partition of G into P_k; since each gadget H(e_j) has N^j + 1 = k(2d^j − 1) + 1 vertices, at least one edge e of some P_ℓ in P* links H(e_j) to a given H(c_i), using an l_p vertex; we deduce from Property 1 that P_ℓ is some P^{i,q} path and thus, that l_p is the only vertex from H(e_j) that intersects P_ℓ. Consider now any two vertices l_p and l_{p′}, p < p′, from H(e_j); since l_p = v_{2k(p−1)+1} and l_{p′} = v_{2k(p′−1)+1}, there are 2k(p′ − p) − 1 vertices between l_p and l_{p′}, which could not all be covered by any collection of P_k. Hence, exactly one vertex from each H(e_j) is covered by some P^{i,q}. Concerning H(c_i), we already know that its vertices may be covered by either P^i or Q^i; hence, by setting M = {c_i | P^i ⊆ P*}, we define a perfect matching, and the proof is complete.

The proof is quite identical for even values of k. The only difference lies in the H(e_j) gadgets, which consist of a cycle (v_1^j, ..., v_{N^j}^j, v_1^j) on N^j vertices, plus an additional vertex v_{N^j+1}^j attached by the edge [v_{N^j}^j, v_{N^j+1}^j].
If we decrease the maximum degree of the graph down to 2, we can easily prove that Pk Partition, induced Pk Partition, MaxPk Packing and Mink-PathPartition are polynomial-time computable. The same fact holds for MaxWPk Packing, although it is a little more complicated. Moreover, this result holds in forests.

Proposition 1. MaxWPk Packing is polynomial in graphs with maximum degree 2 and in forests, for any k ≥ 3.

Proof (sketch). We reduce the problem of computing an optimal solution of MaxWPk Packing in graphs with maximum degree 2 (or in a forest) to the problem of computing a maximum-weight independent set (MaxWIS for short) in an interval (or chordal) graph, which is known to be polynomial [7]. The reduction is the usual one when dealing with set packing problems: from an instance of MaxWPk Packing, we construct a graph G′ = (V′, E′) where V′ is isomorphic to the set of P_k of the initial graph and where E′ describes the intersection relation between the P_k; the weight associated with a vertex from V′ is naturally set to the weight of the P_k this vertex represents.

On the other hand, if we restrict our attention to planar bipartite graphs of maximum degree 3, P3 Partition and induced P3 Partition remain intractable.

Theorem 2. P3 Partition and induced P3 Partition are NP-complete in planar bipartite graphs with maximum degree 3. As a consequence, MaxP3 Packing and Min3-PathPartition are NP-hard in planar bipartite graphs with maximum degree 3.

Proof (sketch). The construction made in Theorem 1 transforms an instance of the planar 3-dimensional matching problem (Planar 3DM-3 for short), which is still NP-complete [6], into a planar graph (just note that the choice of the vertex l_p^j from H(e_j) that will be linked to H(c_i) is no longer free, but depends on the characteristic graph of the input instance of Planar 3DM-3).

Lemma 1. For any k ≥ 3, there is a constant ε_k > 0 such that, for every instance G = (V, E) of MaxPk Packing (resp., MaxInducedPk Packing) where G is a bipartite graph of maximum degree 3, it is NP-complete to decide whether opt(G) = |V|/k or opt(G) ≤ (1 − ε_k)|V|/k, where opt(G) is the value of a maximum (resp., maximum induced) P_k-packing on G.

Proof (sketch). The argument is based on an APX-hardness result concerning the optimization version of kDM (denoted by MaxkDM): for any k ≥ 3, there exists a constant ε′_k > 0 such that, for every instance I = (C, X_1 × ... × X_k) of MaxkDM with n = |X_q| for all q, it is NP-complete to decide whether opt(I) = n or opt(I) ≤ (1 − ε′_k)n, where opt(I) is the value of a maximum matching on C. Furthermore, this result also holds if we restrict our attention to instances of MaxkDM with bounded degree, namely, to instances verifying d^j ≤ f(k) for all j, where f(k) is a constant (we refer to [17] for k = 3, and to [12] for other values of k). Let I be an instance of MaxkDM such that d^j ≤ f(k) for all e_j ∈ X_1 ∪ ... ∪ X_k. Consider
the graph G = (V, E) produced in Theorem 1. We recall that G is bipartite, of maximum degree 3, on |V| = 3k²m + (1 − k)kn vertices (where m = |C|). Furthermore, all paths of length k − 1 in G are chordless. Let P* be an optimal solution of MaxPk Packing with value opt(G). The argument lies in the fact that we can assume w.l.o.g. the following two facts:

(i) For any k-tuple c_i, P* contains either the packing P^i or the packing Q^i of the gadget H(c_i).
(ii) For any element e_j, P* contains exactly 2d^j − 1 paths from the gadget H(e_j).

Under these assumptions, if m_0 denotes the number of elements c_i such that P* contains P^i, we observe that opt(I) = m_0 and thus we have opt(G) = (3km − kn) + opt(I). Hence, deciding whether opt(I) = n or opt(I) ≤ (1 − ε′_k)n and deciding whether opt(G) = (3km − kn) + n or opt(G) ≤ (3km − kn) + (1 − ε′_k)n are equivalent. By setting ε_k = nε′_k / (3km − kn + n), we have (3km − kn) + (1 − ε′_k)n = (1 − ε_k)(3km − kn + n). Finally, since d^j ≤ f(k) where f(k) is a constant, we deduce that km ≤ kf(k)n and then ε_k ≥ ε′_k / (3kf(k) + 1 − k), which completes the proof. The APX-hardness immediately follows.

Some interesting questions concern the complexity of Pk Partition (or induced Pk Partition) for k ≥ 4 in planar bipartite graphs with maximum degree 3, and the APX-hardness of MaxPk Packing and MaxInducedPk Packing for k ≥ 3 in planar bipartite graphs with maximum degree 3.
4 Approximation Results
We present some approximation results for the problems MaxWP3 Packing and Min3-PathPartition, mainly based on matching and spanning-tree heuristics.

4.1 MaxWP3 Packing in Graphs of Maximum Degree 3
For this problem, the best approximation algorithm known so far provides a ratio of (1/2 − ε), within high (but polynomial) time complexity. This algorithm is deduced from the one proposed in [1] to approximate the weighted k-set packing problem for sets of size 3. Furthermore, a simple greedy 1/k-approximation of MaxWPk Packing consists of iteratively picking a path of length k − 1 that is of maximum weight. For k = 3 and in graphs of maximum degree 3, the time complexity of this algorithm is between O(n log n) and O(n²) (depending on the encoding structure). Actually, in such graphs, one may reach a 1/3-approximate solution, even in time O(α(n, m)n), where α is the inverse Ackermann function and m ≤ 3n/2.

Theorem 3. MaxWP3 Packing is 1/3-approximable within O(α(n, 3n/2)n) time complexity in graphs of maximum degree 3; this ratio is tight for the algorithm we analyze.
Fig. 4. The tightness
Proof. We assume that the graph is connected (otherwise, we apply the same proof on each connected component containing at least 3 vertices). The argument lies in the following observation: for any spanning tree of maximum degree 3 containing at least 3 vertices, one can build a cover of its edge set by 3 packings of P3 within linear time (a formal proof is given in the appendix). Hence, given a weighted connected graph G = (V, E) of maximum degree 3, we compute a maximum-weight spanning tree T = (V, E_T) on G. Because G is of maximum degree 3, this can be done in O(α(n, 3n/2)n) time [4]. We then compute a P3-packing cover (P¹, P², P³) of T and finally pick the best P3-packing among P¹, P² and P³. The value of this packing is at least 1/3 times the weight of T, which is at least the weight of an optimal P3-packing on G, since any P3-packing can be extended into a spanning tree.

The tightness of this algorithm is illustrated in Figure 4: the edges of E_T are drawn in solid lines, whereas the edges of E \ E_T are drawn in dotted lines; all edges with no mention of their weight are of weight 1. Observe that an optimal P3-packing on T is of weight n + 3, whereas opt(I) = 3n + 3.

For the unweighted case, we easily see that an optimal P3-packing uses at most 2|V|/3 edges. Moreover, computing a spanning tree can be done in linear time, and we can prove that the 3 packings output by the algorithm together cover at least |V| vertices. Thus, using Theorem 3, we deduce:

Corollary 1. MaxP3 Packing is 1/2-approximable within linear time complexity in graphs of maximum degree 3.
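A skeleton of the algorithm of Theorem 3 might look as follows; here cover_tree_into_three_packings is a placeholder for the linear-time edge-cover construction proved in the paper's appendix, which we do not reproduce, and all other identifiers are ours.

import networkx as nx

def p3_third_approx(G):
    """1/3-approximation sketch for MaxWP3 Packing in a connected graph
    of maximum degree 3 whose edges carry a 'weight' attribute.
    cover_tree_into_three_packings must return three P3-packings (lists
    of vertex triples) that together cover all edges of the tree."""
    T = nx.maximum_spanning_tree(G, weight="weight")
    packings = cover_tree_into_three_packings(T)      # placeholder helper
    def weight(packing):
        return sum(G[a][b]["weight"] + G[b][c]["weight"]
                   for a, b, c in packing)
    return max(packings, key=weight)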
4.2 MaxWP3 Packing in Bipartite Graphs of Maximum Degree 3
If we restrict our attention to bipartite graphs, we slightly improve the ratio of 1/2 − ε ([1]) up to 1/2. We then show that, in the unweighted case, this result holds without any constraint on the maximum degree of the graph.

From I = (G, w), where G = (L ∪ R, E) is a bipartite graph of maximum degree 3, we build two weighted graphs (G_L, d_L) and (G_R, d_R), where G_L = (L, E_L) and G_R = (R, E_R). Two vertices x ≠ y from L are linked in G_L iff there exists in G a path P_{x,y} of length 2 from x to y; rigorously, [x, y] ∈ E_L iff ∃z ∈ R s.t. [x, z], [z, y] ∈ E. The distance d_L(x, y) is defined as d_L(x, y) = max{w(x, z) + w(z, y) | [x, z], [z, y] ∈ E}. (G_R, d_R) is defined by considering R instead of L. If G is of maximum degree 3, then the following fact holds:
Lemma 2. From any matching M on G_L (resp., on G_R), one can deduce a P3-packing P_M of weight w(P_M) = d_L(M) (resp., w(P_M) = d_R(M)), when G is of degree at most 3.

Proof. We only prove the result for G_L. Let M be a matching on G_L. For any edge e = [x, y] ∈ M, there exists in G a chain P_e = {x, z_e, y} with w(P_e) = d_L(e). Let us show that P_M = {P_e | e ∈ M} is a packing. Assume the reverse: then there exist two edges e_1 = [x_1, y_1] and e_2 = [x_2, y_2] in M such that P_{e_1} ∩ P_{e_2} ≠ ∅. Since {e_1, e_2} is a matching, the four vertices x_1, x_2, y_1 and y_2 are pairwise distinct, and then necessarily z_{e_1} = z_{e_2}. Hence, z_{e_1} is linked to 4 vertices in G, which contradicts the fact that the maximum degree in G does not exceed 3.

Weighted P3-Packing
1 Build the weighted graphs (G_L, d_L) and (G_R, d_R);
2 Compute a maximum weight matching M_L* (resp., M_R*) on (G_L, d_L) (resp., on (G_R, d_R));
3 Deduce from M_L* (resp., M_R*) a P3-packing P_L (resp., P_R) according to Lemma 2;
4 Output the best packing P among P_L and P_R.

The time complexity of this algorithm is mainly the time complexity of computing a maximum weight matching in graphs of maximum degree 9, that is, O(|V|² log |V|) [14].

Theorem 4. Weighted P3-Packing provides a 1/2-approximation for the problem MaxWP3 Packing in bipartite graphs with maximum degree 3, and this ratio is tight.

Proof. Let P* be an optimum P3-packing on I = (G, w); we denote by P_L* (resp., P_R*) the paths of P* whose two endpoints belong to L (resp., R); thus, opt(I) = w(P_L*) + w(P_R*). For any path P = P_{x,y} ∈ P_L*, [x, y] is an edge from E_L of weight d_L(x, y) ≥ w(P_{x,y}). Hence, M_L = {[x, y] | P_{x,y} ∈ P_L*} is a matching on G_L that satisfies:

d_L(M_L) ≥ w(P_L*)   (1)
Moreover, since M_L* is a maximum weight matching on G_L, we have d_L(M_L) ≤ d_L(M_L*). Thus, using inequality (1) and Lemma 2 (and by applying the same arguments on G_R), we deduce:

w(P_L) ≥ w(P_L*),   w(P_R) ≥ w(P_R*)   (2)
Finally, the solution output by the algorithm satisfies w(P) ≥ 1/2 (w(P_L) + w(P_R)); thus, we directly deduce from inequalities (2) the expected result. The instance I = (G, w) that provides the tightness is depicted in Figure 5. It consists of a graph on 12n vertices, on which one can easily observe that w(P_L) = w(P_R) = 2n(n + 2) and w(P*) = 2n(2n + 2).
Fig. 5. The tightness
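For illustration, a possible implementation sketch of the Weighted P3-Packing algorithm above, using networkx; the node attribute 'side' ('L' or 'R') and the edge attribute 'weight' are our assumptions, not part of the paper.

import networkx as nx

def weighted_p3_packing(G):
    """Sketch of Weighted P3-Packing for a bipartite graph of maximum
    degree 3."""
    def packing_for(side):
        # build (G_side, d_side): x and y on `side` are linked iff a
        # 2-edge path x - z - y exists; keep the heaviest middle vertex
        H = nx.Graph()
        for z in (v for v in G if G.nodes[v]["side"] != side):
            nbrs = list(G[z])
            for i, x in enumerate(nbrs):
                for y in nbrs[i + 1:]:
                    w = G[x][z]["weight"] + G[z][y]["weight"]
                    if not H.has_edge(x, y) or w > H[x][y]["weight"]:
                        H.add_edge(x, y, weight=w, mid=z)
        M = nx.max_weight_matching(H)         # maximum weight matching
        # Lemma 2: with degree <= 3 the middle vertices are all distinct
        return [(x, H[x][y]["mid"], y) for x, y in M]

    def total(P):
        return sum(G[a][b]["weight"] + G[b][c]["weight"] for a, b, c in P)

    PL, PR = packing_for("L"), packing_for("R")
    return max(PL, PR, key=total)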
Concerning the unweighted case, we may obtain the same performance ratio without the restriction on the maximum degree of the graph. The main difference with the previous algorithm lies in the construction of the two graphs G_L, G_R: starting from G, we duplicate each vertex r_i ∈ R by adding a new vertex r_i′ with the same neighborhood as r_i (this operation, often called multiplication of vertices in the literature, is used in the characterization of perfect graphs). Finally, we add the edge [r_i, r_i′]. If R_L denotes the vertex set {r_i, r_i′ | r_i ∈ R}, then the following property holds (a sketch of the construction follows Theorem 5 below):

Property 2. From any matching M on G_L, one can deduce a matching M′ on G_L that saturates R_L, and such that |M′| ≥ |M|.

Let M be a matching on G_L. If none of the two vertices r_i and r_i′ for some i is saturated by M, then set M′ = M ∪ {[r_i, r_i′]}. If exactly one of them is saturated by a given edge e from M, then set M′ = (M \ {e}) ∪ {[r_i, r_i′]}. In any case, M′ is still a matching of size at least |M|. Thus, the expected result is obtained by applying this process to each vertex of R_L.

Theorem 5. There is a 1/2-approximation for MaxP3 Packing in bipartite graphs, and this ratio is tight. The time complexity of this algorithm is O(m√n).
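The vertex-multiplication step announced above can be sketched as follows; the twin naming (r, "copy") is a convention of ours.

import networkx as nx

def multiply_right_side(G, R):
    """Vertex multiplication: every r in R gets a twin r' with the same
    neighborhood, plus the edge [r, r']."""
    H = G.copy()
    for r in R:
        twin = (r, "copy")
        H.add_edges_from((twin, x) for x in G[r])
        H.add_edge(r, twin)
    return H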
4.3 Min3-PathPartition in General Graphs
To our knowledge, the approximability of Mink-PathPartition (or MinPathPartition) has not been studied so far. Here, we propose a 3/2-approximation for Min3-PathPartition. Although this problem can be viewed as an instance of 3-set cover (interpret the set of all paths of length 0, 1 or 2 in G as sets on V), Min3-PathPartition and the minimum 3-set cover problem are different. For instance, consider a star K_{1,2n}: the optimum value of the corresponding 3-set cover instance is n, whereas the optimum value of the 3-path partition is 2n − 1. Note that, concerning MinPathPartition (that is, the approximation of ρ(G)), we can trivially see that it is not (2 − ε)-approximable, since deciding whether ρ(G) = 1 or ρ(G) ≥ 2 is NP-complete. Actually, we can more generally establish that ρ(G) is not in APX: otherwise, we could obtain a PTAS for the traveling salesman problem with weights 1 and 2 restricted to instances with opt(I) = n, which is not possible unless P=NP.
Computing ρ_2(G)
1 Compute a maximum matching M_1* on G;
2 Build a bipartite graph G_2 = (L, R; E_2) where L = {l_e | e ∈ M_1*}, R = {r_v | v ∈ V \ V(M_1*)}, and [l_e, r_v] ∈ E_2 iff the corresponding isolated vertex v ∉ V(M_1*) is adjacent in G to the edge e ∈ M_1*;
3 Compute a maximum matching M_2* on G_2;
4 Output the 3-path partition P′ deduced from M_1*, M_2*, and V \ V(M_1* ∪ M_2*). Precisely, if M_1′ ⊆ M_1* is the set of edges adjacent to M_2*, then the paths of length 2 are given by M_1′ ∪ M_2*, the paths of length 1 are given by M_1* \ M_1′, and the paths of length 0 (that is, the isolated vertices) are given by V \ V(M_1* ∪ M_2*).

The time complexity of this algorithm is O(nm + n² log n) [14].

Theorem 6. Min3-PathPartition is 3/2-approximable in general graphs; this ratio is tight for the algorithm we analyze.

Proof (sketch). Let G = (V, E) be an instance of Min3-PathPartition. Let P* = (P_2*, P_1*, P_0*) be an optimal solution on G, where P_i* denotes, for i = 0, 1, 2, the set of paths of length i. By construction of the approximate solution, we have:

apx(I) = |V| − |M_1*| − |M_2*|   (3)
We consider a subgraph G_2′ = (L, R′; E_2′) of G_2, where R′ and E_2′ are defined as follows: R′ = {r_v ∈ R | v ∉ P_0*}, and E_2′ contains the edges [l_e, r_v] ∈ E_2 such that v is adjacent to e via an edge that belongs to the optimal solution. By construction of G_2′, from the optimality of M_1*, and because P* is a 3-path packing, we deduce that d_{G_2′}(r) ≥ 1 for any r ∈ R′ and that d_{G_2′}(l) ≤ 2 for any l ∈ L, where d_{G_2′}(v) is the degree of vertex v in the graph G_2′. Hence, G_2′ contains a matching of size at least one half of |R′|, and thus:

|M_2*| ≥ (1/2)|R′| ≥ (1/2)(|V| − 2|M_1*| − |P_0*|)   (4)

Using inequalities (3) and (4), and considering that |V| = 3|P_2*| + 2|P_1*| + |P_0*|, we deduce:

apx(I) ≤ (1/2)(|V| + |P_0*|) and opt(I) ≥ (1/3)(|V| + |P_0*|)
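A possible implementation sketch of Computing ρ_2(G) with networkx; all identifiers are ours, and nx.max_weight_matching with maxcardinality=True serves as the maximum-matching routine.

import networkx as nx

def three_path_partition(G):
    """Sketch of 'Computing rho_2(G)': a 3/2-approximation for
    Min3-PathPartition.  Returns a list of paths (lists of 1-3 vertices)
    that together cover V."""
    # step 1: maximum (cardinality) matching M1 on G
    M1 = nx.max_weight_matching(G, maxcardinality=True)
    matched = {v for e in M1 for v in e}
    isolated = [v for v in G if v not in matched]
    # step 2: bipartite graph G2 between matched edges and isolated vertices
    G2 = nx.Graph()
    for e in M1:
        for v in isolated:
            if G.has_edge(e[0], v) or G.has_edge(e[1], v):
                G2.add_edge(("e", e), ("v", v))
    # step 3: maximum matching M2 on G2
    M2 = nx.max_weight_matching(G2, maxcardinality=True)
    # step 4: assemble the paths of length 2, 1 and 0
    paths, used_edges, covered = [], set(), set()
    for a, b in M2:
        (_, e), (_, v) = (a, b) if a[0] == "e" else (b, a)
        x, y = e if G.has_edge(e[1], v) else (e[1], e[0])
        paths.append([x, y, v])            # length-2 path x - y - v
        used_edges.add(e)
        covered.update((x, y, v))
    for e in M1:
        if e not in used_edges:
            paths.append(list(e))          # length-1 path
            covered.update(e)
    paths.extend([v] for v in G if v not in covered)   # length-0 paths
    return paths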
References

1. Arkin, E. and Hassin, R.: On Local Search for Weighted Packing Problems. Mathematics of Operations Research 23 (1998) 640–648
2. Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., and Protasi, M.: Complexity and Approximation (Combinatorial Optimization Problems and Their Approximability Properties). Springer, Berlin (1999)
3. Bazgan, C., Hassin, R., and Monnot, J.: Approximation Algorithms for Some Routing Problems. Discrete Applied Mathematics 146 (2005) 3–26
4. Chazelle, B.: A Minimum Spanning Tree Algorithm with Inverse-Ackermann Type Complexity. J. ACM 47 (2000) 1028–1047
5. Csaba, B., Karpinski, M., and Krysta, P.: Approximability of Dense and Sparse Instances of Minimum 2-Connectivity, TSP and Path Problems. SODA (2002) 74–83
6. Dyer, M. and Frieze, A.: Planar 3DM is NP-Complete. J. Algorithms 7 (1986) 174–184
7. Frank, A.: Some Polynomial Time Algorithms for Certain Graphs and Hypergraphs. Proceedings of the 5th British Combinatorial Conference, Congressus Numerantium XV, Utilitas Mathematicae, Winnipeg (1976) 211–226
8. Garey, M.R. and Johnson, D.S.: Computers and Intractability. A Guide to the Theory of NP-Completeness. CA, Freeman (1979)
9. Hassin, R. and Rubinstein, S.: An Approximation Algorithm for Maximum Packing of 3-Edge Paths. Inf. Process. Lett. 63 (1997) 63–67
10. Hassin, R. and Rubinstein, S.: An Approximation Algorithm for Maximum Triangle Packing. ESA, LNCS 3221 (2004) 403–413
11. Kaneko, A.: A Necessary and Sufficient Condition for the Existence of a Path Factor Every Component of Which is a Path of Length at Least Two. Journal of Combinatorial Theory, Series B 88 (2003) 195–218
12. Karpinski, M.: Personal communication (2006)
13. Kirkpatrick, D.G. and Hell, P.: On the Completeness of a Generalized Matching Problem. Proc. STOC'78 (1978) 240–245
14. Lovász, L. and Plummer, M.D.: Matching Theory. North-Holland, Amsterdam (1986)
15. Monnot, J. and Toulouse, S.: Approximation Results for the Weighted P4 Partition Problem. The Symposia on Fundamentals of Computation Theory, FCT 2005, LNCS 3623 (2005) 377–385
16. Monnot, J. and Toulouse, S.: The Pk Partition Problem and Related Problems in Bipartite Graphs. Technical Report (2006) (available at http://www.lamsade.dauphine.fr/~monnot/publications(journal).htm)
17. Petrank, E.: The Hardness of Approximation: Gap Location. Computational Complexity 4 (1994) 133–157
18. Steiner, G.: On the k-Path Partition Problem in Cographs. Cong. Numer. 147 (2000) 89–96
19. Steiner, G.: On the k-Path Partition of Graphs. Theor. Comput. Sci. 290 (2003) 2147–2155
20. Wang, H.: Path Factors of Bipartite Graphs. Journal of Graph Theory 18 (1994) 161–167
21. Vishwanathan, S.: An Approximation Algorithm for the Asymmetric Travelling Salesman Problem with Distances One and Two. Information Processing Letters 44(6) (1992) 297–302
22. Yan, J.-H., Chang, G.J., Hedetniemi, S.M., and Hedetniemi, S.T.: k-Path Partitions in Trees. Discrete Applied Mathematics 78 (1997) 227–233
Spatial Selection of Sparse Pivots for Similarity Search in Metric Spaces⋆

Oscar Pedreira and Nieves R. Brisaboa

Database Laboratory, Facultade de Informática, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain
{opedreira,brisaboa}@udc.es
Abstract. Similarity search is a necessary operation for applications dealing with unstructured data sources. In this paper we present a pivot-based method that is useful not only to obtain a good pivot selection without specifying the number of pivots in advance, but also to gain insight into the complexity of the metric space. Sparse Spatial Selection (SSS) adapts itself to the dimensionality of the metric space, is dynamic, and is suitable for secondary memory storage. In this paper we provide experimental results that confirm the advantages of the method on several metric spaces. Moreover, we explain how SSS can be easily parallelized. Finally, we conceptualize Nested Metric Spaces, and we prove that, in some application areas, objects can be grouped in different clusters with different associated metric spaces, all of them nested into the general metric space that explains the distances among clusters.
1 Introduction
Similarity search has become a very important operation in applications that deal with unstructured data sources. The computational cost of the algorithms that determine the similarity between two objects makes similarity search an expensive operation. This fact has motivated the development of many research works aiming at efficient similarity search over large collections of data.

The similarity search problem can be formally defined through the concept of metric space. A metric space (X, d) is composed of a universe of valid objects X and a distance function d : X × X → R⁺ defined among them. This function satisfies several properties: strict positiveness (d(x, y) > 0, and d(x, y) = 0 implies x = y), symmetry (d(x, y) = d(y, x)), and the triangle inequality (d(x, z) ≤ d(x, y) + d(y, z)). The finite subset U ⊆ X, with size n = |U|, represents the collection of objects where searches are performed. A k-dimensional vector space is a particular case of metric space in which every object is represented by a vector of k real coordinates. The dimensionality of a vector space is clearly k,
⋆ This work has been partially supported by CYTED VII.J (RITOS2), MCYT (PGE and FEDER) grants TIC2003-06593 and TIN2006-15071-C03-03, and Xunta de Galicia grant PGIDIT05SIN10502PR.
the number of components of each vector. Although general metric spaces do not have an explicit dimensionality, we can talk about their intrinsic dimensionality, following the idea presented in [1], where it is defined as μ²/2σ² (μ and σ² being the mean and the variance of d, respectively). The higher the dimensionality, the more difficult the search.

The definition of the distance function d depends on the type of the objects we are managing. For example, in the case of a vector space, d could be a distance function of the family L_s, defined as L_s(x, y) = (Σ_{1≤i≤k} |x_i − y_i|^s)^{1/s}. For instance, L_1 is known as the Manhattan distance, L_2 is the Euclidean distance, and L_∞ = max_{1≤i≤k} |x_i − y_i| is the maximum distance.

There are three main queries of interest in a metric space: (i) range search, which retrieves all the objects u ∈ U within a radius r of the query q, that is, {u ∈ U | d(q, u) ≤ r}; (ii) nearest neighbor search, which retrieves the most similar object to the query q, that is, {u ∈ U | ∀v ∈ U, d(q, u) ≤ d(q, v)}; and (iii) k-nearest neighbors search, retrieving the set A ⊆ U such that |A| = k and ∀u ∈ A, v ∈ U − A, d(q, u) ≤ d(q, v). The range query is the most used, and the others can be implemented in terms of it [1]. In any case, the distance function is the only information that can be used to perform searches.

The naive way of implementing those operations is to compare all the objects in the collection against the query. The problem is that the evaluation of the distance function is very expensive, and therefore searches become inefficient if the collection has a high number of elements. Thus, reducing the number of evaluations of the distance function is the main goal of the methods for similarity search in metric spaces. The existing techniques usually differ in some features. Some of them allow only discrete (and not continuous) distances. There are also static methods, where the index has to be built on the whole collection, and dynamic techniques, where the index is built as elements are added to an initially empty collection. Another important factor is the possibility of storing the index efficiently in secondary storage, and the number of I/O operations needed to access it. In general, the applicability and efficiency of a method depend on these issues.

Search methods can be classified into two types [1]: clustering-based and pivot-based techniques. Clustering-based techniques split the metric space into a set of equivalence regions, each of them represented by a cluster center. During searches, whole regions can be discarded depending on the distance from their cluster center to the query. The technique we present here is pivot-based, so a more detailed explanation of pivot-based methods will be provided later.
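For concreteness, the L_s distances and the naive (baseline) range search can be written in a few lines; this is an illustrative sketch, not code from the paper.

def l_s(x, y, s):
    """Minkowski distance L_s between two k-dimensional vectors;
    s = float('inf') gives the maximum distance L_inf."""
    if s == float("inf"):
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** s for a, b in zip(x, y)) ** (1.0 / s)

def range_search(U, q, r, d):
    """Naive range search over the collection U: one evaluation of the
    distance d per stored object -- the cost that every indexing method,
    including pivot-based ones such as SSS, tries to reduce."""
    return [u for u in U if d(q, u) <= r]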
parallelized, as we show in this paper. On the other hand, SSS can be extended to deal with more complex metric spaces where the distances among subsets of objects depend on specific dimensions that are not relevant for other sets of objects. That is, in some application areas, objects can be grouped in different clusters with different associated metric spaces, all of them nested into the general metric space that explains the distances among clusters. To deal with these complex spaces we propose