Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen TU Dortmund University, Germany Madhu Sudan Microsoft Research, Cambridge, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany
6147
Seokhie Hong Tetsu Iwata (Eds.)
Fast Software Encryption 17th International Workshop, FSE 2010 Seoul, Korea, February 7-10, 2010 Revised Selected Papers
Volume Editors

Seokhie Hong, Korea University, CIST, Seoul, Korea
E-mail: [email protected]

Tetsu Iwata, Nagoya University, Dept. of Computational Science and Engineering, Japan
E-mail: [email protected]
Library of Congress Control Number: 2010929207
CR Subject Classification (1998): E.3, K.6.5, D.4.6, C.2, J.1, G.2.1
LNCS Sublibrary: SL 4 – Security and Cryptology
ISSN 0302-9743
ISBN-10 3-642-13857-8 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-13857-7 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © International Association for Cryptologic Research 2010 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper 06/3180
Preface
Fast Software Encryption (FSE) 2010, the 17th in a series of workshops on symmetric cryptography, was held in Seoul, Korea, during February 7–10, 2010. Since 2002, the FSE workshop has been sponsored by the International Association for Cryptologic Research (IACR). The first FSE workshop was held in Cambridge, UK (1993), followed by workshops in Leuven, Belgium (1994), Cambridge, UK (1996), Haifa, Israel (1997), Paris, France (1998), Rome, Italy (1999), New York, USA (2000), Yokohama, Japan (2001), Leuven, Belgium (2002), Lund, Sweden (2003), New Delhi, India (2004), Paris, France (2005), Graz, Austria (2006), Luxembourg, Luxembourg (2007), Lausanne, Switzerland (2008), and Leuven, Belgium (2009).

The FSE workshop concentrates on fast and secure primitives for symmetric cryptography, including the design and analysis of block ciphers, stream ciphers, encryption schemes, analysis and evaluation tools, hash functions, and message authentication codes.

This year 67 papers were submitted. Each paper was reviewed by at least three reviewers, and papers (co-)authored by Program Committee members were reviewed by at least five reviewers. From the 67 papers, 21 were accepted for presentation at the workshop, and these proceedings contain the revised versions of the papers. At the end of the review phase, the Program Committee selected the paper "Attacking the Knudsen-Preneel Compression Functions" by Onur Özen, Thomas Shrimpton, and Martijn Stam to receive the best paper award. The workshop also featured two invited talks, "The Survey of Cryptanalysis on Hash Functions" by Xiaoyun Wang and "A Provable-Security Perspective on Hash Function Design" by Thomas Shrimpton. Along with the presentation of the papers and the invited talks, the rump session was organized and chaired by Orr Dunkelman.

We would like to thank all the authors for submitting their papers to the workshop. The selection of the papers was a challenging task, and we are deeply grateful to the Program Committee and to all the external reviewers for their hard work to ensure that each paper received a thorough and fair review. We would like to thank Shai Halevi for letting us use his Web Submission and Review Software, which was used for the entire review process from paper submission to preparing these proceedings. We would also like to thank the General Co-chairs, Jongin Lim and Jongsung Kim, for their hard work, and we also would like to express our gratitude to CIST, Korea University and Korea Institute of Information Security and Cryptology (KIISC) for their support in organizing the workshop. The financial support given to the FSE 2010 workshop by Electronics and Telecommunications Research Institute (ETRI), Ellipsis, Korea University, LG CNS, and National Institute for Mathematical Science (NIMS) is also gratefully acknowledged.

April 2010
Seokhie Hong Tetsu Iwata
FSE 2010 Seoul, Korea, February 7–10, 2010 Sponsored by the International Association for Cryptologic Research (IACR)
General Co-chairs

Jongin Lim, Korea University, Korea
Jongsung Kim, Kyungnam University, Korea

Program Co-chairs

Seokhie Hong, Korea University, Korea
Tetsu Iwata, Nagoya University, Japan

Program Committee

Daniel J. Bernstein, University of Illinois at Chicago, USA
Alex Biryukov, University of Luxembourg, Luxembourg
Joan Daemen, STMicroelectronics, Belgium
Orr Dunkelman, École normale supérieure, France and Weizmann Institute, Israel
Helena Handschuh, Katholieke Universiteit Leuven, Belgium and Intrinsic-ID, USA
Seokhie Hong (Co-chair), Korea University, Korea
Tetsu Iwata (Co-chair), Nagoya University, Japan
Thomas Johansson, Lund University, Sweden
Antoine Joux, DGA and Université de Versailles, France
Charanjit S. Jutla, IBM T.J. Watson Research Center, USA
Stefan Lucks, Bauhaus-Universität Weimar, Germany
Mitsuru Matsui, Mitsubishi Electric, Japan
Willi Meier, FHNW, Switzerland
Kaisa Nyberg, Aalto University and NOKIA, Finland
Elisabeth Oswald, University of Bristol, UK
Josef Pieprzyk, Macquarie University, Australia
Bart Preneel, Katholieke Universiteit Leuven, Belgium
Christian Rechberger, Katholieke Universiteit Leuven, Belgium
Thomas Ristenpart, UC San Diego, USA
Matt Robshaw, Orange Labs, France
Palash Sarkar, Indian Statistical Institute, India
Serge Vaudenay, EPFL, Switzerland
Kan Yasuda, NTT, Japan
External Reviewers

Elena Andreeva, Gilles Van Assche, Jean-Philippe Aumasson, Steve Babbage, Guido Bertoni, Andrey Bogdanov, Christophe De Cannière, Ji Young Cheon, Joo Yeon Cho, Martin Cochran, Ewan Fleischmann, Christian Forler, Praveen Gauravaram, Benedikt Gierlichs, Michael Gorski, Johann Großschädl, Jian Guo, Risto Hakala, Philip Hawkes, Miia Hermelin, Shoichi Hirose, Sebastiaan Indesteege, Shahram Khazaei, Dmitry Khovratovich, Ilya Kizhvatov, Simon Knellwolf, Atefeh Mashatan, Krystian Matusiewicz, Cameron McDonald, Sarah Meiklejohn, Florian Mendel, Kazuhiko Minematsu, Petros Mol, Nicky Mouha, Tomislav Nad, Ivica Nikolić, Khaled Ouafi, Andrea Röck, Yu Sasaki, Martin Schläffer, Pouyan Sepehrdad, Yannick Seurin, Thomas Shrimpton, Przemyslaw Sokolowski, Daisuke Suzuki, Kerem Varıcı, Martin Vuagnoux
Organizing Support CIST, Korea University Korea Institute of Information Security and Cryptology (KIISC)
Financial Support Electronics and Telecommunications Research Institute (ETRI) Ellipsis Korea University LG CNS National Institute for Mathematical Science (NIMS)
Table of Contents

Stream Ciphers and Block Ciphers

Cryptanalysis of the DECT Standard Cipher .......... 1
  Karsten Nohl, Erik Tews, and Ralf-Philipp Weinmann

Improving the Generalized Feistel .......... 19
  Tomoyasu Suzaki and Kazuhiko Minematsu

Nonlinear Equivalence of Stream Ciphers .......... 40
  Sondre Rønjom and Carlos Cid

RFID and Implementations

Lightweight Privacy Preserving Authentication for RFID Using a Stream Cipher .......... 55
  Olivier Billet, Jonathan Etrog, and Henri Gilbert

Fast Software AES Encryption .......... 75
  Dag Arne Osvik, Joppe W. Bos, Deian Stefan, and David Canright

Hash Functions I

Attacking the Knudsen-Preneel Compression Functions .......... 94
  Onur Özen, Thomas Shrimpton, and Martijn Stam

Finding Preimages of Tiger Up to 23 Steps .......... 116
  Lei Wang and Yu Sasaki

Cryptanalysis of ESSENCE .......... 134
  María Naya-Plasencia, Andrea Röck, Jean-Philippe Aumasson, Yann Laigle-Chapuy, Gaëtan Leurent, Willi Meier, and Thomas Peyrin

Theory

Domain Extension for Enhanced Target Collision-Resistant Hash Functions .......... 153
  Ilya Mironov

Security Analysis of the Mode of JH Hash Function .......... 168
  Rishiraj Bhattacharyya, Avradip Mandal, and Mridul Nandi

Enhanced Security Notions for Dedicated-Key Hash Functions: Definitions and Relationships .......... 192
  Mohammad Reza Reyhanitabar, Willy Susilo, and Yi Mu

Message Authentication Codes

A Unified Method for Improving PRF Bounds for a Class of Blockcipher Based MACs .......... 212
  Mridul Nandi

How to Thwart Birthday Attacks against MACs via Small Randomness .......... 230
  Kazuhiko Minematsu

Constructing Rate-1 MACs from Related-Key Unpredictable Block Ciphers: PGV Model Revisited .......... 250
  Liting Zhang, Wenling Wu, Peng Wang, Lei Zhang, Shuang Wu, and Bo Liang

Hash Functions II

Higher Order Differential Attack on Step-Reduced Variants of Luffa v1 .......... 270
  Dai Watanabe, Yasuo Hatano, Tsuyoshi Yamada, and Toshinobu Kaneko

Rebound Attack on Reduced-Round Versions of JH .......... 286
  Vincent Rijmen, Deniz Toz, and Kerem Varıcı

Hash Functions III (Short Presentation)

Pseudo-cryptanalysis of the Original Blue Midnight Wish .......... 304
  Søren S. Thomsen

Differential and Invertibility Properties of BLAKE .......... 318
  Jean-Philippe Aumasson, Jian Guo, Simon Knellwolf, Krystian Matusiewicz, and Willi Meier

Cryptanalysis

Rotational Cryptanalysis of ARX .......... 333
  Dmitry Khovratovich and Ivica Nikolić

Another Look at Complementation Properties .......... 347
  Charles Bouillaguet, Orr Dunkelman, Gaëtan Leurent, and Pierre-Alain Fouque

Super-Sbox Cryptanalysis: Improved Attacks for AES-Like Permutations .......... 365
  Henri Gilbert and Thomas Peyrin

Author Index .......... 385
Cryptanalysis of the DECT Standard Cipher

Karsten Nohl (University of Virginia, [email protected]), Erik Tews (Technische Universität Darmstadt, [email protected]), and Ralf-Philipp Weinmann (University of Luxembourg, [email protected])
Abstract. The DECT Standard Cipher (DSC) is a proprietary 64-bit stream cipher based on irregularly clocked LFSRs and a non-linear output combiner. The cipher is meant to provide confidentiality for cordless telephony. This paper illustrates how the DSC was reverse-engineered from a hardware implementation using custom firmware and information on the structure of the cipher gathered from a patent. Beyond disclosing the DSC, the paper proposes a practical attack against DSC that recovers the secret key from 2^15 keystreams on a standard PC with a success rate of 50% within hours; somewhat faster when a CUDA graphics adapter is available.

Keywords: DECT, DECT Standard Cipher, stream cipher, cryptanalysis, linear feedback shift register.
1 Introduction
Cordless phones using the Digital Enhanced Cordless Telecommunications standard (DECT) are among the most widely deployed security technologies with 90 million new handsets shipping every year [1]. However, DECT does not provide sufficient security for its intended application 'cordless telephony' as it fails to deliver confidentiality and access control. The technology is also popular in other applications with even higher security needs including machine automation, building access control, alarm systems, and wireless credit card terminals [2]. DECT's need for security is covered by two proprietary algorithms: The DECT Standard Authentication Algorithm (DSAA) for authentication [3] and the DECT Standard Cipher (DSC) for encryption.

The first attacks on DECT became known in 2008 [3]. Researchers demonstrated that encryption and even authentication could easily be switched off due to insecure DECT implementations that do not enforce them. Furthermore, the researchers observed that even when security is switched on, many devices use highly predictable random numbers thereby undermining the level of protection the DSC aims to achieve. Since the initial findings, some handsets and base stations have been patched to enforce encryption and to use strong random numbers to mitigate the previous
attacks. Nonetheless, this paper demonstrates that even these improved devices can be attacked by exploiting weaknesses of the DSC.

The DECT Standard Cipher is an asynchronous stream cipher with low gate complexity that takes a 64-bit secret key and a 35-bit initialization vector, IV, to generate keystream. DSC is similar to GSM's A5/1 and was reverse-engineered from a DECT device using a combination of firmware probing and hardware reverse-engineering. The cipher, publicly disclosed for the first time in this paper (footnote 1), is vulnerable to a clock-guessing attack similar to the Ekdahl-Johansson attack [4] against A5/1. The attack – although it has a large data complexity – can be executed on a PC in hours and allows passively sniffed voice and data connections to be decrypted. However, we were not able to carry over later improvements of the Ekdahl-Johansson attack [5,6] due to specific traits of A5/1 being used in them that are not present in the DSC.

DSC is stronger than A5/1 by statistical indicators such as the non-linearity of the round and filter function, key size and state size. However, DSC as used in DECT is initialized in less than half the number of rounds when compared to A5/1 in GSM. This underpins that the number of initialization rounds is a major security metric of stream ciphers. The attack on DSC presented in this paper provides a trade-off between the number of available data samples and the time needed to calculate the secret state. When 2^15 samples are available, the attack executes in about 22 minutes on a 16-core Opteron machine clocked at 2.3 GHz.

DSC and its use in DECT could be improved in several ways, most simply by increasing the number of initialization rounds. Incidentally, switching off encryption for the DECT control channel effectively increases the number of initialization rounds, which as a side effect protects the data channel better. While this countermeasure does make our attack on data confidentiality more difficult, the DSC cipher – like many proprietary ciphers – is conceptually flawed and should not be used for security applications. The successor technology to DECT will hopefully include an open cipher that underwent extensive peer review to provide the appropriate level of confidentiality and authentication strength.

The paper is structured as follows: Section 2 gives a description of the DSC, Section 3 describes how the cipher was reverse-engineered, and Section 4 shows attacks against the DSC. A high-performance implementation of DSC is described in Section 5; Section 6 discusses DSC's weaknesses and compares it to A5/1.

1.1 Notation
The internal state of DSC is represented as an 81-bit vector, s ∈ GF(2)^81, comprised of the state of four linear feedback shift registers (LFSRs) and the memory bit of the output combiner.
Footnote 1: A partial description was given by the deDECTed.org project – of which the authors are members – at the 25th Chaos Communication Congress in Berlin in December 2008. At that point the output combiner and the key loading had not yet been reverse-engineered. This presentation also included the practical attacks described in [3].
Since state transitions are performed by linear operations, we will use matrices to describe them. The matrices in Table 1 represent the DSC operations:

Table 1. Matrices describing linear operations on the internal state

  Matrix   Dimension   Description
  C1       81 × 81     single clock of register R1
  C2       81 × 81     single clock of register R2
  C3       81 × 81     single clock of register R3
  L        81 × 128    load key and IV into state
  S        6 × 81      extract the first two leading bits from R1, R2, and R3
The output combiner O of DSC is a non-linear mapping, depending on the previous bit of output y and 6 bits of the state s. The DSC round function that translates a state into the next round's state is a non-linear mapping. The pre-ciphering phase, which consists of loading the secret key and initialization vector (IV) into the DSC registers and then applying the round function i times, is denoted D^i. The i initialization rounds are referred to as pre-ciphering steps.

2 Description of the DECT Standard Cipher
The DECT Standard Cipher (DSC) is an irregularly clocked combiner with memory. Its internal state is built from 4 Galois LFSRs R1, R2, R3, R4 of length 17, 19, 21 and 23, respectively, as well as a single bit of memory y for the output combiner. The bits of the state of the LFSR Ri shall be denoted by x_{i,j}, with the lowest-most bit being x_{i,0}. The taps of R1 are located at bit positions 5, 16; the taps of R2 are at bit positions 2, 3, 12, 18; the taps of R3 at bit positions 1, 20; the taps of R4 are at bit positions 8, 22. For each bit of output, register R4 is clocked three times whereas R1 to R3 are clocked either two or three times. The clocking decision is determined individually for each of the irregularly clocked registers. The decisions linearly depend on one of the three lowest bits of R4 and the middle bits of the other irregularly clocked registers. More specifically, the number of clocks c_i for each of the registers is calculated as follows:

  c_1 = 2 + (x_{4,0} ⊕ x_{2,9} ⊕ x_{3,10})
  c_2 = 2 + (x_{4,1} ⊕ x_{1,8} ⊕ x_{3,10})
  c_3 = 2 + (x_{4,2} ⊕ x_{1,8} ⊕ x_{2,9})
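For concreteness, the clocking rule can be expressed in a few lines of C. The sketch below mirrors the reference implementation in Appendix B; the helper names are ours and purely illustrative.

/* Number of clocks for R1-R3 in one DSC round, following the three
 * formulas above. st[i] holds the state of register R(i+1) in its low
 * bits (17, 19, 21, 23 bits respectively). Illustrative sketch only. */
#include <stdint.h>

#define BIT(x, n) (((x) >> (n)) & 1u)

void clock_counts(const uint32_t st[4], int c[3])
{
    c[0] = 2 + (BIT(st[3], 0) ^ BIT(st[1], 9) ^ BIT(st[2], 10)); /* c_1 */
    c[1] = 2 + (BIT(st[3], 1) ^ BIT(st[0], 8) ^ BIT(st[2], 10)); /* c_2 */
    c[2] = 2 + (BIT(st[3], 2) ^ BIT(st[0], 8) ^ BIT(st[1], 9));  /* c_3 */
}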
2.1 The Output Combiner
The output combiner is a cubic function that involves the lowest-most two bits of the registers R1, R2 and R3 as well as the memory bit y:
  O((x_{1,0}, x_{1,1}, x_{2,0}, x_{2,1}, x_{3,0}, x_{3,1}), y) =
      x_{1,1} x_{1,0} y ⊕ x_{2,0} x_{1,1} x_{1,0} ⊕ x_{1,1} y ⊕ x_{2,1} x_{1,0} y ⊕ x_{2,1} x_{2,0} x_{1,0} ⊕ x_{3,0} y ⊕ x_{3,0} x_{1,0} y
    ⊕ x_{3,0} x_{2,0} x_{1,0} ⊕ x_{3,1} y ⊕ x_{1,1} x_{1,0} ⊕ x_{2,0} x_{1,1} ⊕ x_{3,1} x_{1,0} ⊕ x_{2,1} ⊕ x_{3,1}

The output of the combiner function gives a keystream bit and is loaded into the memory bit for the next clock.

2.2 Key Loading and Initialization
Initially all registers and the memory bit are set to zero. The 35-bit IV is zero-extended (most significant bits filled with zeros) to 64 bits and concatenated with the 64-bit cipher key CK to form the session key K:

  K = Z(IV) || CK

The bits of K are clocked into the most significant bit of all four registers, bit by bit, starting with the least significant bit. During the key loading each LFSR is clocked once after each bit. After the session key has been loaded, 40 pre-cipher rounds are performed. In these pre-cipher rounds, the irregular clock control is used but the output is discarded. If one or more registers have all bits set to zero after executing 11 rounds, the most significant bit of the corresponding registers is set to 1 before starting the next round.

3 Reverse-Engineering the DSC from Hardware
We did not find any software implementations of DSC. Instead our starting point was a patent [7] describing the general structure of the DSC. From this document we learn that DSC is an LFSR-based design, together with the lengths of the individual registers. Furthermore the patent discloses that the cipher has an output combiner with a single bit of memory, irregular 2-3 clocking, and the number of initialization rounds. On the other hand, the tap positions of the LFSRs, the clocking functions, the combiner function as well as the exact key loading routine are not described in this patent. The rule that after 11 initialization rounds a check had to be performed to make sure that no register is zero at that point is also stated in the patent. Luckily, for the National Semiconductor SC14xxx DECT chipset that is used by the deDECTed.org project, we found instructions that allow us to load and store an arbitrary internal state of the stream cipher. Moreover, the stream cipher can be clocked in two modes: a regular clocking of the LFSRs for key loading, and a second mode clocking irregularly as specified by the clocking functions, generating output. However, we are not able to directly capture these output bits.
Fig. 1. The DSC keystream generator with LFSRs in Galois configuration. Bit positions that are inverted (white on black) are used in clocking decisions.
To reverse-engineer the unspecified details of the cipher we proceed as follows: Using the first mode allows us to determine the tap positions of the LFSRs. After that, we are able to determine the clocking functions in the second mode by loading a random vector of low Hamming weight into the internal state and observing how single-bit changes affect the clocking decisions. The most elaborate part to reverse-engineer is the output combiner function. To do this, we set up one machine with a modified firmware to send out frames containing zero-stuffed payloads. Another machine acting as the receiving side then “decrypts” these using a chosen internal state (no key setup), yielding keystream. Starting from random states, we sequentially flip single bit positions of the state and inspect the first bit to see whether the bit flip affected the output. If the output remains constant for a large number of random states, we assume that the flipped bit is not used in the output combiner. Having identified the bits that indeed are fed into the combiner, we recover the combiner function by using multivariate interpolation for a number of keystreams. Finally we determine the correct key loading by systematically trying different bit and byte-orders for both key and IV combined with both different orders of key and IV. In parallel to having done the above, we also reverse-engineered the DSC cipher including its output combiner from silicon applying the techniques previously used to discover the Crypto-1 function [8].
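To illustrate the combiner-recovery step, the following sketch shows one way such an interpolation can be carried out: once the bits feeding the combiner are known, the function can be tabulated over all 2^7 inputs (6 state bits plus the previous output bit) and its algebraic normal form recovered with a binary Moebius transform. This is our own illustration, not the tooling actually used by the authors.

/* Recover the ANF of a 7-input Boolean function from a measured truth
 * table. tt[x] would be filled with the observed first keystream bit for
 * the chosen internal state x; it is left as a stub here. */
#include <stdio.h>
#include <stdint.h>

#define NVARS 7
#define TABLE (1u << NVARS)

static uint8_t tt[TABLE];   /* truth table, to be filled experimentally */

int main(void)
{
    uint8_t anf[TABLE];
    unsigned step, x;

    for (x = 0; x < TABLE; x++)
        anf[x] = tt[x] & 1;

    /* binary Moebius transform: anf[m] becomes the coefficient of the
     * monomial whose variables are selected by the bit mask m */
    for (step = 1; step < TABLE; step <<= 1)
        for (x = 0; x < TABLE; x++)
            if (x & step)
                anf[x] ^= anf[x ^ step];

    for (x = 0; x < TABLE; x++)
        if (anf[x])
            printf("monomial mask 0x%02x\n", x);
    return 0;
}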
4 Attacking the DSC
For this section, we will assume that an adversary has access to a list of DSC keystreams with matching initialization vectors, all of which were generated under the same secret key. A tuple of keystream and IV is referred to as a session.
We will use clock-guessing techniques very similar to the Ekdahl-Johansson attack against A5/1 [5], but adapted to the case of a non-linear output combiner. Further improvements of this attack discussed in [5,6] seem to be too specific to the A5/1 structure.

4.1 Simple Clock Guessing
Despite its relatively large state and non-linearity, DSC is easily broken because of one major design flaw: the small number of pre-ciphering rounds makes clock guessing easy. After the key loading, there are only 40 clocking decisions made, compared to 100 clocking decisions for A5/1.

If an internal state for DSC is randomly chosen from a uniform distribution of all states, every irregularly clocked register clocks twice with 50% probability or three times with 50% probability. We assume for now that the probability that one register is clocked twice is independent of the clocking decision of the other irregularly clocked registers. The probability that one register is clocked k times during the pre-ciphering phase is

  \binom{40}{k-80} \cdot 2^{-40}

and the probability that a register has been clocked k times after i bits of output is

  \binom{40+i}{k-(80+2i)} \cdot 2^{-(40+i)}

The total number of clocks per register after i bits of output is distributed according to a shifted binomial distribution with mode (i+1)/2 + 2i + 100.

In general, let D_{i,j,k} = S × C_1^i × C_2^j × C_3^k × L × (key, iv) be the state of the six bits of the registers which generate the output, after key and iv have been loaded, and register R1 has been clocked i times, register R2 has been clocked j times, and register R3 has been clocked k times. The attack focuses on the internal DSC state from which the second bit of output is produced. An attacker who has observed the first bit of output knows the state z_0 of the memory bit of the output combiner. The second bit of output depends on 6 bits of the registers R1, R2, and R3. With a probability of

  p = ( \binom{41}{21} \cdot 2^{-41} )^3 ≈ 2^{-9.09}

all of these irregularly clocked registers will be clocked exactly 103 times before the second bit of output z_1 is produced, and we have

  D_{103,103,103}(key, iv) = S × D^{41}(key, iv)
with probability 1. If the number of clocks per register is different, this equation will hold by chance with probability 1/64. Therefore, we have

  Prob[ D_{103,103,103}(key, iv) = S × D^{41}(key, iv) ] = p + (1-p)/64 ≈ 2^{-5.84}

Based on this guess, six affine-linear equations for an unknown key can be derived, given that sufficiently many keystreams (about 2^18) are available. For every IV iv, the attacker computes

  I_{103,103,103} = D_{103,103,103}(0, iv)

He then checks for every possible state s of the six output bits whether O(s, z_0) = z_1 holds. If so, this is an indication that D_{103,103,103}(key, 0) = s + I_{103,103,103} holds. After having processed all available keystreams, the attacker may assume the most frequent value for D_{103,103,103}(key, 0) to be correct. Using these six affine-linear equations allows an attacker to recover the correct key by trying only 2^58 instead of 2^64 possible keys. This basic attack, however, is still too time-consuming to be practical on a single PC.
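As a numeric sanity check, the two probabilities above can be recomputed directly from the binomial model. The stand-alone program below is purely illustrative and not part of the attack implementation; it only evaluates the formulas quoted in the text (compile with -lm).

/* Recompute p = (binom(41,21) * 2^-41)^3 and p + (1-p)/64. */
#include <stdio.h>
#include <math.h>

/* binomial coefficient via lgamma to avoid integer overflow */
static double binom(int n, int k)
{
    return exp(lgamma(n + 1.0) - lgamma(k + 1.0) - lgamma(n - k + 1.0));
}

int main(void)
{
    double p_one  = binom(41, 21) * pow(2.0, -41); /* one register clocked 103 times */
    double p      = pow(p_one, 3.0);               /* all three registers */
    double p_corr = p + (1.0 - p) / 64.0;          /* guess also holds by chance */

    printf("p      = 2^%.2f\n", log2(p));      /* approx -9.09 */
    printf("p_corr = 2^%.2f\n", log2(p_corr)); /* approx -5.84 */
    return 0;
}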
4.2 Breaking DSC on a PC
We can refine the basic attack principle to give us a much faster attack that allows us to practically recover a DSC key on a PC. Note that the attack scope can be extended to different assumptions for the number of clockings for the registers R1, R2 and R3. In the basic attack, we use the mode of the distribution of the number of clocks of all registers. If the total number of clocking decisions is odd, another set of clockings with the same success rate always exists. For example, if the attacker assumes that R1 and R2 have been clocked 103 times as in the previous subsection, but R3 has been clocked 102 times instead of 103 times, the previous attack works with the same computational effort and success rate. In total, there are 8 possible assumptions about the number of clocks with the same success rate as the previous attack. However, these different assumptions share many equations for the key. An attacker will only obtain nine different affine-linear equations for the key using these eight assumptions (compared to six equations for a single assumption). Extending the attack scope further – i.e., assuming that R1 has been clocked 101 times, and R2 and R3 have been clocked 102 times – increases the success rate of the attack, but at an even smaller incremental gain per additional assumption. Another way to broaden the attack is to focus on different keystream bits. The basic attack only uses the first two bits of the output, z_0 and z_1. Instead of guessing how many times the registers have been clocked before producing z_1, one could guess how many times the registers have been clocked before z_2 is produced. For example, an attacker can try using z_1 and z_2 of the output and
guess that R1, R2, and R3 have been clocked exactly 105 times. The resulting correlation will have the same success rate as the one from the basic attack. Using multiple output bits for a single clocking triplet is possible.

Combining these two time-success trade-offs, we developed a more advanced key-recovery attack on the DSC that merely requires hours of computation on a PC given enough keystreams. We chose a clocking interval C = [102, 137] and generated all 35^3 = 42875 possible approximations with the number of clocks of R1 to R3 in C. We introduce new variables x_{i,j}^{(t)} for the state of bit j of register Ri after it has been clocked t times. Assuming that Ri has been clocked t times for an approximation gives us information about x_{i,0}^{(t)} and x_{i,1}^{(t)}. In total, a clocking interval of length l gives us information about 6l variables (3 registers, 2 variables per clocking amount). However, x_{i,1}^{(t)} = x_{i,0}^{(t+1)} holds for all registers Ri, because x_{i,1}^{(t)} is just shifted to x_{i,0}^{(t+1)} with the next clock of Ri. Choosing a different feedback polynomial with a feedback position between the two bits contributing to the output combiner would destroy this structure; for DSC, however, none of the feedback polynomials has a feedback position there. Effectively, this gives us information about 3(l + 1) variables for a clocking interval of length l. We will always use x_{i,0}^{(t+1)} instead of x_{i,1}^{(t)} for the rest of this paper.

There are also linear relations between these variables. For example, x_{1,5}^{(t+1)} = x_{1,6}^{(t)} ⊕ x_{1,0}^{(t)} holds. In general, having determined a consecutive sequence of variables x_{i,0}^{(t)}, x_{i,0}^{(t+1)}, . . . for a register Ri is equivalent to knowing the output sequence of Ri. If more variables than the length of Ri have been determined, one might use these linear relations to check if a given assignment is feasible. However, we did not use this in our attack.

The success rate that register R1 is clocked i times, register R2 is clocked j times, and register R3 is clocked k times after l bits of output have been produced is

  p_{i,j,k,l} = 2^{-3(40+l)} \binom{40+l}{i-(80+2l)} \binom{40+l}{j-(80+2l)} \binom{40+l}{k-(80+2l)}

In theory, one could use all available bits of keystream for which the correlation has a better-than-zero success rate; however, after 19 bits of keystream all of these correlations have negligible success probability. For example, the probability that all registers have been clocked 137 times (the end of our clocking interval) for the 19th bit of output is below 2^-26. As in the basic attack, we evaluate all correlations separately and create a frequency table for every correlation. Following the ideas of Maximov et al. [5], we add the log-likelihood ratio ln(p/(1−p)) for key = s + iv to every entry in the table, with

  p = (1 − Σ_l p_{i,j,k,l}) · 1/2 + Σ_l p_{i,j,k,l} · [O(s, z_{l−1}) = z_l]

Here [O(s, z_{l−1}) = z_l] = 1 if O(s, z_{l−1}) = z_l, and 0 otherwise. Instead of writing the equations in all correlations as a linear combination of key bits, we now write all equations in the form x_{{1,2,3},0}^{(i)} = {0, 1}.
Taking the entry with the highest probability from the frequency table of every approximation, we obtain 42875 · 6 = 257250 equations, each with a given probability. (Every approximation – 42875 in total – gives us information about the value of 6 state variables. In total, these state variables can take 2^6 = 64 possible values; the value with the highest probability in the frequency table is most likely. We use the number of weighted votes for the top entry as an estimate p_i of how likely these equations are to be correct.) For every variable x, we take all equations of the form x = b_i, b_i ∈ {0, 1} with estimate p_i, compute s_x = Σ_i (2b_i − 1) · p_i, and assume that x = 0 if s_x < 0, and x = 1 otherwise.

Combining all equations into a single equation system gives 108 equations, each of which depends only on a single variable, together with a corresponding probability p_v that this equation is correct. We sort these equations according to |p_v|, rewrite all variables in terms of key bits, and add them in order to a new linear equation system for the key bits. If adding an equation would make the resulting system unsolvable, we skip that equation. If enough linearly independent equations (for example 30) have been added to the system, we stop the process. We then iterate through all solutions of this system and check for every solution whether it is the correct key, by comparing it against some sample keystreams.

We created a proof-of-concept implementation of this attack written in Java. Processing all available keystreams and generating a linear equation system takes about 20 minutes on a Sun X4440 with four quad-core AMD Opteron 8356 processors running at 2.3 GHz. The main workload here is the generation of all the frequency tables for all approximations. The post-processing and the generation of the final equation system is negligible. We think that this time can be reduced to a few minutes using parallel computation and a more efficient implementation. For the time needed for the final search of the correct key, see Section 5.

The success rate of this attack depends on the number of available keystreams and the number of equations in the final equation system for the key bits. Using more equations makes the final search for the correct key faster, but increases the probability of having at least one incorrect equation in the system, which makes the attack fail. If i equations are used in the final system, one still needs to search through at most 2^{64−i} different keys to find the correct key (assuming the equation system is correct). Using 30 equations in the final system (one still needs to check at most 2^34 different keys), the attack was successful in 48 out of 100 simulations with 32768 different keystreams available. Using only 16384 keystreams, the success rate dropped down to 1%. With 49152 keystreams, the attack was successful in 95% of all simulations. If only 8 equations are used, the attack had a success rate of 8% using just 8192 keystreams. However, an adversary would then need to conduct a final search for the key over 2^56 different keys, which is roughly equivalent to a brute-force attack against DES.
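The per-variable voting step described above can be sketched as follows. This is an illustration with our own variable names; the authors' implementation is in Java and not reproduced here.

/* Combine the per-approximation votes for one state variable x.
 * votes[i] is the value b_i in {0,1} proposed by approximation i and
 * weight[i] its reliability estimate p_i; the sign of the weighted sum
 * decides the value of x. Illustrative sketch only. */
#include <stddef.h>

int decide_variable(const int *votes, const double *weight, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += (2 * votes[i] - 1) * weight[i];   /* maps 0/1 to -1/+1 */
    return (s < 0.0) ? 0 : 1;
}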
4.3 Keystream Recovery
To break the DSC stream cipher, keystream needs to be recovered from the encrypted frames, which is only possible when the user data is known or can
[Fig. 2. Success rate of the attack – success probability versus number of available keystreams (up to 65536), with one curve each for final equation systems of 10, 20, 30, and 40 equations.]
be guessed. Known user data is regularly sent over DECT's control channel (C-channel). The C-channel messages (e.g., for a button press) share a common structure in which the majority of the first 40 bits stays constant. There are at most 50 C-channel packets sent per second, which provides an upper bound on the number of known keystream segments from the C-channel. Especially in newer phones, the C-channel is extensively used for status updates including RSS feeds and other data communication, which opens the possibility that a significant number of known keystreams can be gathered. Keystreams can also be collected from the voice channel (B-field), but assumptions have to be made about the voice being transmitted (i.e., segments of silence). Even when these assumptions do not hold in all cases, the data is still usable in the attacks outlined below as they are error-resilient. More information can be found in Appendix A.

4.4 Extending the Attack to B-Field Data
Thus far we have assumed the adversary to have access to the first bits of output of DSC after pre-ciphering. However, these bits are only used to encrypt the C-channel data in DECT. If C-channel data is not frequently used in a conversation, the adversary is unable to recover a sufficient number of keystreams using the techniques previously described. Henceforth, we adapt our attack to also work when the first 40 bits of keystream are not available. To achieve this, we need to change the clocking interval from [102, 137] to [204, 239]. We then use 21 bits of the keystream starting from bit 41. The best approximation which exists is to assume that every register has been clocked 202 times when the second bit for the B-field is produced. This happens with probability 2^-10.527, instead of 2^-9.0915 for the best approximation for the C-channel bits.
[Fig. 3. Success rate of the B-field attack – success probability versus number of available keystreams (up to 114688), with one curve each for final equation systems of 10, 20, 30, and 40 equations.]
As expected, the number of keystreams required for the same success rate is increased by a factor of 2–3. To make the attack work with a success rate of 50%, the attack requires 75,000 keystreams. Again we conducted 100 simulations to experimentally verify the success rate and to generate the plot in Figure 3. However, B-fields are sent 100 times per second from FP to PP while a call is in progress. This allows recovering the corresponding keystreams in less than 13 minutes if a predictable plaintext pattern is used in the B-field.
5 High-Performance DSC
The DSC is optimized for hardware implementations where LFSRs can be implemented in a small number of logic gates. To minimize the complexity of our attack implementation, we did not use FPGAs or build custom ASICs for DSC computations. Instead we optimized implementations for an x64 CPU, an NVIDIA CUDA graphics card – both of which are commonly found in home computers – and a Cell processor. Optimizations we applied to accelerate the attack include bit slicing and the use of bit vectors. The combination of these tweaks resulted in a 25x speed-up on Intel CPUs when compared to a straightforward DSC implementation. Performance figures for components in a standard gaming PC (CPU, NVIDIA graphics adapter) and a PlayStation 3 (Cell processor) are provided in Table 2, including the search time for the C-channel attack searching through 2^34 keys. Our optimizations have been implemented for Intel CPUs (128-bit SSE), the Cell processor in PlayStations (128-bit wide SIMD units in both the SPEs and the PPE) and for CUDA GPUs, where only 64-bit wide general-purpose registers are available. However, due to the large number of compute kernels, a high-end CUDA graphics card is almost one order of magnitude faster than a high-end Intel CPU for our attack.
Table 2. Efficiency of brute-forcing large numbers of DSC keys on different architectures

  Compute Node                Keys tried per second   Attack time for final search
  Intel 2 GHz CPU (2 cores)   24 million              716 s
  Cell processor              25 million              687 s
  CUDA GTX260                 148 million             116 s
Since the problem is parallelizable without dependencies, utilization of multi-core systems comes without penalty, allowing the key cracking to be distributed over a large number of CPU, Cell, and CUDA nodes.

Bit slicing. In the optimized implementation, the 81-bit state of DSC is stored in 81 registers, each register holding 128 bits for 128 DSC engines. With only 3 XOR instructions the clocking decision for R1 can be made – for 128 DSC states at once. A first implementation for an Intel Core 2 Duo at 2 GHz can verify 12 million key candidates per second per core, including extraction of the 64-bit key from the equation system, the 64-bit key setup procedure to clock the key into the DSC state, 40 clockings of DSC in the pre-cipher phase, and the generation and comparison of 8 keystream bits. The amortized cost is about 2 processor clocks per bit of DSC output. The bit-slice implementation has the drawback that there is no efficient way to shift a register, so for a simple one-bit shift of R1, 17 locations of memory have to be copied in the ring buffer. Fortunately, when the clocking is regular, copying is not necessary and a single combined head and tail pointer can be incremented and decremented to facilitate the shift operation. This optimization is usable during key setup and for R4 during all DSC stages.

Bit vectors. The extraction of the candidate keys from the equation system is further optimized through the use of bit vectors, so that 64 operations are needed to produce 128 64-bit keys: first a template is built that encodes the information of any set of 128 consecutive candidate keys, and this template is then used to produce 128 keys anywhere in the key space with linear complexity over the number of bits (64).

CUDA tweaks. High-end graphics cards for computer gaming can be expected to perform at least 10 times faster than a single CPU. One such GPU has 240 ALUs clocked at about 1.2 GHz. The slower clock speed and the smaller register size of 64 bits each halve the effect of the larger number of cores. Furthermore, CPU features such as superscalar execution, branch prediction and out-of-order execution narrow the gap further. At about three times the power consumption, a high-end CUDA GPU still executes about ten times as fast as an Intel CPU, making it the preferred host for our attack.
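As an illustration of the bit-sliced clocking decision mentioned above, the decision bit for R1 can be computed for 128 DSC instances with three XORs on 128-bit lane vectors. This is a sketch with our own helper types, not the authors' optimized SSE/Cell/CUDA code.

/* Bit-sliced evaluation of the R1 clocking decision for 128 DSC
 * instances at once. Each state bit of the cipher has its own 128-bit
 * lane vector (here: two uint64_t words); lane n of every vector belongs
 * to instance n. Names are ours and purely illustrative. */
#include <stdint.h>

typedef struct { uint64_t w[2]; } lane128;

static inline lane128 lane_xor(lane128 a, lane128 b)
{
    lane128 r = { { a.w[0] ^ b.w[0], a.w[1] ^ b.w[1] } };
    return r;
}

/* c_1 = 2 + (x_{4,0} ^ x_{2,9} ^ x_{3,10}): the returned vector holds, per
 * instance, the bit deciding whether R1 is clocked twice (0) or three
 * times (1). */
lane128 r1_clock_decision(lane128 x4_0, lane128 x2_9, lane128 x3_10)
{
    return lane_xor(lane_xor(x4_0, x2_9), x3_10);   /* three XORs total */
}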
6 DSC Weaknesses and Mitigations
The DSC cipher is vulnerable to the attack described in this paper because it does not accumulate enough non-linearity before producing the first keystreams. Our attack on DSC exploits the fact that the cipher can be expressed in relatively simple equations that hold true with non-negligible probability. These equations can be generated because three weaknesses come together in DSC:

– a round function with a low level of non-linearity,
– an insufficient number of rounds before the first keystream bit is produced,
– access to keystreams through known plaintext in the C-channel.

The latter two properties make DSC much weaker than the related A5/1 stream cipher used in GSM (footnote 2). Attacks on A5/1 still require more keystream than can be inferred from GSM packets, or extensive precomputations for time-memory trade-offs [9]. In other dimensions of statistical strength, the DSC cipher is stronger than A5/1, again emphasizing how serious the above-mentioned weaknesses are.

Table 3. Comparing A5/1 against the DSC

                                                           A5/1     DSC
  number of registers                                      3        4
  irregularly clocked registers                            3        3
  internal state in bits                                   64       81
  output combiner                                          linear   non-linear
  bits used for output                                     3        7
  bits used for clocking                                   3        6
  clocking decision                                        0/1      2/3
  clocks per register until first bit of output            0-100    80-120
  average clocks of registers until first bit of output    75       100
  pre-cipher rounds                                        100      40
The larger internal state makes practical time-memory trade-offs infeasible for DSC. Since more bits are used in the output combiner, the sampling of special states [10] is much harder for DSC than for A5/1. At the same time, the non-linearity of the output combiner in DSC improves its resilience against divide-and-conquer strategies. DSC has a register which only affects the clocking control and does not directly generate the keystream; in A5/1 every register affects the output directly. Differential attacks against A5/1 [11] use the fact that this cipher does not always clock all registers; the DSC clocks every register at least two times after each bit of output.

Footnote 2: A5/1 and DSC were standardized by the same organization, A5/1 in 1987 and DSC in 1992.
The attack outlined in this paper is enabled through available keystream and the small number of clocking rounds. Short-term countermeasures to mitigate the risk imposed by this particular attack include:

– frequent re-keying to prevent an attacker from collecting sufficiently many keystreams;
– switching off encryption of the C-channel (which might lead to privacy concerns over dialed numbers etc.).

Both measures can be deployed to many existing base stations and handsets through firmware updates. However, it is our belief that the current DECT standard includes too many vulnerabilities to be made secure through quick fixes. Also, DECT with DSC does not provide sufficient security for its intended uses. To provide this level of protection, a strong peer-reviewed cipher and security protocol are needed in the next version of the DECT standard.
7 Conclusions
Cryptographic functions can be reverse-engineered from hardware devices and need to hold up to security analyses when they are disclosed. The widely used DSC cipher in DECT cordless phones did not hold up to the test of a curious review. While the attack presented in this paper does rely on strong assumptions on the availability of keystreams, other attacks will very likely further degrade the security level of DSC. We believe that the DSC cipher as used in DECT is not sufficient to protect the confidentiality of personal conversations, network traffic, and credit card information. We propose that the DECT standard be amended with a strong, peer-reviewed cipher in order to provide sufficient protection for its users.

The fact that the DSC cipher has not undergone public review, despite being deployed in hundreds of millions of locations, raises the question of why more proprietary security algorithms have not been reverse-engineered. The techniques used for reversing DSC from a firmware and a chip implementation can certainly be further generalized. This opens interesting research avenues; research targets are aplenty, as new proprietary ciphers in embedded applications such as car buses and RFID chips are constantly being created.

Acknowledgements. This paper builds on software and firmware created within the deDECTed.org project. Moreover, a DECT kernel stack for Linux written by Patrick McHardy was used in the reverse-engineering process. We would like to thank Andreas Schuler, who helped us write firmware for the SC14421 to reverse-engineer the cipher, Sascha Krissler for implementing the DSC on CUDA, and Starbug for his silicon reverse-engineering work. Especially we would like to thank the anonymous reviewers, who had very valuable ideas for improvements of the attack. In particular they pointed us to a publication [4] by Ekdahl and Johansson which shows an interesting attack against A5/1; this helped us to significantly improve our attack.
Open problems. Our attacks against DSC should be seen as a starting point. Although we were not able to carry over the improvements of the Ekdahl-Johansson attack against A5/1 to the DSC, we challenge other researchers to give it a try. The statistical methods we used to generate our equation systems can certainly be improved. Thus far, all approximations in the clocking interval have been used. A significant amount of CPU time in the attack is used for processing all keystreams with all approximations. A more sophisticated method could select a subset of the approximations to get an improved running time. Moreover, it is an interesting problem to see whether the number of keystreams required can be reduced by taking the linear equations from the feedback polynomials into account. The most interesting problem to us, however, is to find an attack against the DSC that works with a very small number of keystreams. Due to the structure of our attack we do not believe that it can be adapted to this scenario. At the same time, other attacks with a low data complexity that work against A5/1 cannot be carried over to DSC due to the larger internal state size.
References

1. MZA Telecoms & IT Analysts: Global cordless phone market. Press Release (August 2009)
2. DECT Forum: Positioning of DECT in relation to other radio access technologies. Report (June 2002)
3. Lucks, S., Schuler, A., Tews, E., Weinmann, R.P., Wenzel, M.: Attacks on the DECT authentication mechanisms. In: Fischlin, M. (ed.) RSA Conference 2009. LNCS, vol. 5473, pp. 48–65. Springer, Heidelberg (2009)
4. Ekdahl, P., Johansson, T.: Another attack on A5/1. IEEE Transactions on Information Theory 49(1), 284–289 (2003)
5. Maximov, A., Johansson, T., Babbage, S.: An improved correlation attack on A5/1. In: Handschuh, H., Hasan, M.A. (eds.) SAC 2004. LNCS, vol. 3357, pp. 1–18. Springer, Heidelberg (2004)
6. Barkan, E., Biham, E.: Conditional estimators: An effective attack on A5/1. In: Preneel, B., Tavares, S. (eds.) SAC 2005. LNCS, vol. 3897, pp. 1–19. Springer, Heidelberg (2006)
7. Alcatel: Data ciphering device. U.S. Patent 5,608,802 (1994)
8. Nohl, K., Evans, D., Starbug, Plötz, H.: Reverse-engineering a cryptographic RFID tag. In: van Oorschot, P.C. (ed.) USENIX Security Symposium 2008, pp. 185–194. USENIX Association (2008)
9. Barkan, E., Biham, E., Keller, N.: Instant ciphertext-only cryptanalysis of GSM encrypted communication. Journal of Cryptology 21(3), 392–429 (2008)
10. Biryukov, A., Shamir, A., Wagner, D.: Real time cryptanalysis of A5/1 on a PC. In: Schneier, B. (ed.) FSE 2000. LNCS, vol. 1978, pp. 1–18. Springer, Heidelberg (2001)
11. Biham, E., Dunkelman, O.: Differential cryptanalysis in stream ciphers. Cryptology ePrint Archive, Report 2007/218 (2007), http://eprint.iacr.org/2007/218
A Technical Background on DECT for Keystream Recovery
DECT divides carrier frequencies into multiple timeslots. An interval of 10 ms is divided into 24 timeslots of equal length. Connections in DECT are always between a base station – in DECT terminology an FP (Fixed Part) – and a handset, called PP (Portable Part). An FP typically uses a single timeslot i ∈ {0, . . . , 11} to transmit a full frame to the PP; the PP then responds in timeslot i + 12 with a frame. DECT supports multiple frame formats with different modulations, some of which use half a timeslot or two consecutive timeslots. A single DECT full frame using GFSK modulation consists of a 16-bit static preamble, a 64-bit A-Field, a 320-bit B-Field, and two 4-bit checksums. The A-Field can transport data for the C-, M-, N-, P-, or Q-channel. If an A-Field is used to transport C-channel messages, only 40 bits of the A-Field contain C-channel data; the rest is used for header bits.

If encryption is active, the DECT Standard Cipher (DSC) generates 720 consecutive bits of keystream for every frame exchange. The output is divided into two keystream segments (KSS): the first 360-bit KSS is used to encrypt the frame sent from the FP to the PP, the second KSS is used to encrypt the frame sent from PP to FP. If C-channel data is present in the A-Field, the first 40 bits of the KSS are XORed with these bits, otherwise they are discarded. The remaining 320 bits of the KSS are XORed with the B-Field.

Keystream can only be recovered for cryptanalysis from frames where the plaintext is known. Two examples where plaintext is guessable are:

– Some phones display a counter with the duration of the current call in the hh:mm:ss format. This counter is usually implemented on the base station. The display of the phone is updated once per second by the base station with the next counter value. We observed a single C-channel message being split into 5 frames. Intercepting these messages recovers 5 different keystreams per second for which most of the first 40 bits are known.
– When (perfect) silence is transmitted, the G.721 audio codec produces plaintext consisting only of ones. Some applications like voice mailboxes transmit silence in one direction after the greeting message. This can be used to recover up to 100 frames per second with 320 bits of keystream known each (see the sketch after this list). The first 40 bits are only used for the C-channel and cannot be recovered using this method.
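Under the silence assumption of the second example, recovering a B-Field keystream segment reduces to an XOR with the known all-ones plaintext. The following is a minimal sketch; frame parsing, checksum handling, and error filtering are omitted.

/* Recover the 320-bit (40-byte) B-Field keystream segment from a captured
 * encrypted frame under the silence assumption described above:
 * keystream = ciphertext XOR known plaintext (all ones for G.721 silence). */
#include <stdint.h>
#include <stddef.h>

void recover_bfield_keystream(const uint8_t ct[40], uint8_t ks[40])
{
    for (size_t i = 0; i < 40; i++)
        ks[i] = ct[i] ^ 0xFF;   /* plaintext bits are all 1 */
}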
B An implementation of the DSC in C
# include <stdio.h>
# include <stdint.h>
# include <stdlib.h>
# include <string.h>
# include <unistd.h>

# define R1_LEN 17
# define R2_LEN 19
# define R3_LEN 21
# define R4_LEN 23

# define R1_MASK 0x010020 /* tap bits: 5, 16       */
# define R2_MASK 0x04100C /* tap bits: 2, 3, 12, 18 */
# define R3_MASK 0x100002 /* tap bits: 1, 20       */
# define R4_MASK 0x400100 /* tap bits: 8, 22       */

# define R1_CLKBIT    8
# define R2_CLKBIT    9
# define R3_CLKBIT    10
# define R1_R4_CLKBIT 0
# define R2_R4_CLKBIT 1
# define R3_R4_CLKBIT 2

# define OUTPUT_LEN   90 /* 720 bits */
# define TESTBIT(R,n) (((R) & (1 << (n))) != 0)

/* one step of a Galois LFSR with the given feedback mask */
uint32_t clock(uint32_t lfsr, uint32_t mask)
{
    return (lfsr >> 1) ^ (-(lfsr & 1) & mask);
}

/* non-linear output combiner with memory bit c */
uint32_t combine(uint32_t c, uint32_t r1, uint32_t r2, uint32_t r3)
{
    uint32_t x10, x11, x20, x21, x30, x31;

    x10 = r1 & 1;
    x11 = (r1 >> 1) & 1;
    x20 = r2 & 1;
    x21 = (r2 >> 1) & 1;
    x30 = r3 & 1;
    x31 = (r3 >> 1) & 1;

    return ((x11 & x10 & c) ^ (x20 & x11 & x10) ^ (x21 & x10 & c) ^
            (x21 & x20 & x10) ^ (x30 & x10 & c) ^ (x30 & x20 & x10) ^
            (x11 & c) ^ (x11 & x10) ^ (x20 & x11) ^ (x30 & c) ^
            (x31 & c) ^ (x31 & x10) ^ (x21) ^ (x31));
}

void dsc_keystream(uint8_t *key, uint32_t iv, uint8_t *output)
{
    uint8_t input[16];
    uint32_t R1, R2, R3, R4, N1, N2, N3, COMB;
    int i, keybit;

    memset(output, 0, OUTPUT_LEN);

    input[0] = iv & 0xff;
    input[1] = (iv >> 8) & 0xff;
    input[2] = (iv >> 16) & 0xff;
    for (i = 3; i < 8; i++) {
        input[i] = 0;
    }
    for (i = 0; i < 8; i++) {
        input[i + 8] = key[i];
    }

    R1 = R2 = R3 = R4 = COMB = 0;

    /* load IV || KEY */
    for (i = 0; i < 128; i++) {
        keybit = (input[i / 8] >> (i & 7)) & 1;
        R1 = clock(R1, R1_MASK) ^ (keybit << (R1_LEN - 1));
        R2 = clock(R2, R2_MASK) ^ (keybit << (R2_LEN - 1));
        R3 = clock(R3, R3_MASK) ^ (keybit << (R3_LEN - 1));
        R4 = clock(R4, R4_MASK) ^ (keybit << (R4_LEN - 1));
    }

    for (i = 0; i < 40 + (OUTPUT_LEN * 8); i++) {
        /* check whether any registers are zero after 11 pre-ciphering steps.
         * if a register is all-zero after 11 steps, set input bit to one
         * (see U.S. patent 5608802) */
        if (i == 11) {
            if (!R1) R1 ^= (1 << (R1_LEN - 1));
            if (!R2) R2 ^= (1 << (R2_LEN - 1));
            if (!R3) R3 ^= (1 << (R3_LEN - 1));
            if (!R4) R4 ^= (1 << (R4_LEN - 1));
        }

        N1 = R1;
        N2 = R2;
        N3 = R3;
        COMB = combine(COMB, R1, R2, R3);

        /* irregular clocking: each of R1-R3 is clocked two or three times */
        if (TESTBIT(R2, R2_CLKBIT) ^ TESTBIT(R3, R3_CLKBIT) ^ TESTBIT(R4, R1_R4_CLKBIT))
            N1 = clock(R1, R1_MASK);
        if (TESTBIT(R1, R1_CLKBIT) ^ TESTBIT(R3, R3_CLKBIT) ^ TESTBIT(R4, R2_R4_CLKBIT))
            N2 = clock(R2, R2_MASK);
        if (TESTBIT(R1, R1_CLKBIT) ^ TESTBIT(R2, R2_CLKBIT) ^ TESTBIT(R4, R3_R4_CLKBIT))
            N3 = clock(R3, R3_MASK);

        R1 = clock(clock(N1, R1_MASK), R1_MASK);
        R2 = clock(clock(N2, R2_MASK), R2_MASK);
        R3 = clock(clock(N3, R3_MASK), R3_MASK);
        R4 = clock(clock(clock(R4, R4_MASK), R4_MASK), R4_MASK);

        if (i >= 40) {
            output[(i - 40) / 8] |= ((COMB) << (7 - ((i - 40) & 7)));
        }
    }
}

int main(int argc, char **argv)
{
    uint8_t key[8];
    uint8_t output[OUTPUT_LEN];

    if (argc != 2) {
        fprintf(stderr, "usage: %s iv\n", argv[0]);
        exit(1);
    }
    if (read(STDIN_FILENO, key, 8) < 8) {
        fprintf(stderr, "short read\n");
        exit(1);
    }

    dsc_keystream(key, atoi(argv[1]), output);
    /* write the 720-bit (90-byte) keystream */
    write(STDOUT_FILENO, output, OUTPUT_LEN);
    return 0;
}
Improving the Generalized Feistel

Tomoyasu Suzaki (NEC Corporation and Chuo University) and Kazuhiko Minematsu (NEC Corporation)
NEC Corporation, 1753, Shimonumabe, Nakahara, Kawasaki 211-8666, Japan; {t-suzaki@pd,k-minematsu@ah}.jp.nec.com
Chuo University, 1-13-27, Kasuga, Bunkyo, Tokyo 112-8551, Japan
Abstract. The generalized Feistel structure (GFS) is a generalized form of the classical Feistel cipher. A popular version of GFS, called Type-II, divides a message into k > 2 sub blocks, applies a (classical) Feistel transformation for every two sub blocks, and then performs a cyclic shift of the k sub blocks. Type-II GFS has many desirable features for implementation. A drawback, however, is its low diffusion property with a large k. This weakness can be exploited by some attacks, such as the impossible differential attack. To protect against them, Type-II GFS generally needs a large number of rounds. In this paper, we improve the Type-II GFS's diffusion property by replacing the cyclic shift with a different permutation. Our proposal makes it possible to reduce the number of rounds needed to attain a sufficient level of security. Thus, we improve the security-efficiency trade-off of Type-II GFS. In particular, when k is a power of two, we obtain a significant improvement using a highly effective permutation based on the de Bruijn graph.

Keywords: block cipher, generalized Feistel, diffusion, de Bruijn graph.
1
Introduction
The generalized Feistel structure (GFS) is one of the basic structures of a block cipher. While basic Feistel ciphers divide a message into two sub blocks, GFS divides a message into k sub blocks for some k > 2, which is called the partition number. One popular form of GFS is so-called Type-II1 [28], where the output of a single round of Type-II GFS for input (m0 , m1 , . . . , mk−1 ) is (c0 , c1 , . . . , ck−1 ) = (F0 (m0 ) ⊕ m1 , m2 , F1 (m2 ) ⊕ m3 , m4 , . . . , Fk−2/2 (mk−2 ) ⊕ mk−1 , m0 ), where Fi s are round functions. As we can see, this operation is equivalent to applying Feistel transformation (x, y) → (x, F (x)⊕y) for every two blocks and then performing a (left) cyclic shift of sub blocks. Recently, Type-II GFS receives a lot of attention for its simplicity and high parallelism. We have some modern Type-II-based block ciphers, e.g., CLEFIA [25] (k = 4), and HIGHT [11] (k = 8). When the length of message is fixed, the width (i.e., I/O lengths) of round function gets shorter as the partition number grows. Since the width of round function is a critical factor of the size of implementation, Type-II GFS with a large 1
Zheng et al. [28] refers to this as the Type-II Feistel-Type Transformation. Some studies use the word GFS to mean Type-II GFS, e.g., [19].
S. Hong and T. Iwata (Eds.): FSE 2010, LNCS 6147, pp. 19–39, 2010. c International Association for Cryptologic Research 2010
20
T. Suzaki and K. Minematsu
partition number is considered to be suitable for small-scale implementations. Moreover, as well as the Feistel cipher, Type-II GFS’s round function needs not be invertible. Thus, the implementation cost for decryption would be negligibly small when the encryption has been implemented. This can be an advantage over Substitution-Permutation Network (SPN) ciphers such as AES, if we focus on small-scale implementation while need decryption in its usage. This holds true for many block cipher modes such as CBC, OCB [24], and the most of storage encryption modes, e.g. EME [10]. However, Type-II GFS with a large k has one big drawback, namely its low diffusion. In order to diffuse the input difference to all output sub blocks, we need about k rounds (see Section 2.2 for details). If diffusion of input difference is imperfect, there will be some attacks, such as impossible differential attack [2] or saturation attack [7], etc. Hence, there is a treading off between efficiency (i.e. the number of rounds) and compactness (i.e. the partition number). This might be the reason why recent GFS ciphers have relatively small k to keep a balance of implementation size and speed. To our knowledge, there has been no comprehensive study trying to improve Type-II GFS up to now. Nyberg [21] proposed a variant of Type-II GFS called Generalized Feistel Network (GFN), where a permutation of sub blocks (which we call block shuffle) different from the cyclic shift is used. She evaluated GFN’s immunity against differential cryptanalysis (DC) and linear cryptanalysis (LC). However, the analysis of [21] can not be used to evaluate the goodness of diffusion. In this paper, we allow GFS to use an arbitrarily block shuffle (but identical for each round). Our goal is to find block shuffles having a better diffusion than the cyclic shift. For this purpose, we formally define a criterion for the goodness of diffusion called the maximum diffusion round, DRmax, which tells how many rounds are needed to achieve the full diffusion. Hence, a smaller DRmax would imply a faster, better diffusion. Moreover, we observe that DRmax is closely related to the security against impossible differential and saturation attacks, and the pseudorandomness analysis. This demonstrates the usefulness of our notion. We exhaustively searched the shuffles up to k = 16, using a computer. As a result, a better shuffle than the cyclic shift exists for k ≥ 6. In addition, we present a family of highly diffusive shuffles when k is a power of two. This is based on the de Bruijn graph, and achieves DRmax being about 2 log2 k. As DRmax of Type-II GFS is k, this means a significant improvement for a large k. To see the validity of our proposal, we also investigated our proposal’s resistance against DC and LC, and experimentally confirmed that ours have the same resistance against these attacks as those provided by the cyclic shift. Our result enables us to build a secure GFS cipher having a fewer rounds than Type-II without increasing the implementation cost. From practical viewpoint, the primal application of our result would be the construction of small-scale block ciphers, where many ciphers are recently proposed in this category [4][11]. Additionally, it will also be useful to build large-block ciphers, such as 256 or 512-bit block. Applications of large-block ciphers are, e.g., storage encryption or block cipher-based hash functions, as mentioned by Junod and Macchetti [22].
Improving the Generalized Feistel
2
21
Generalized Feistel Structure
2.1
Definition of GFS
First, let us make clear what GFS means in this paper. Let k be an even integer. A single round of k-partition GFS is a permutation over ({0, 1}n )k defined as (X0 , X1 , . . . , Xk−1 ) → π(X0 , F0 (X0 ) ⊕ X1 , X2 , F1 (X2 ) ⊕ X3 , . . . , F(k−2)/2 (X0 ) ⊕ X1 ),
(1)
where Fi : {0, 1}n → {0, 1}n is a cryptographic keyed function called a round function, and π : ({0, 1}n )k → ({0, 1}n)k is a deterministic permutation. Here, we restrict π to be a block-wise permutation, i.e., a shuffle of k sub blocks. An encryption of a GFS cipher is done by iterating the above permutation for certain number of rounds, r, where the first input is the plaintext and the rround output is the ciphertext. For the decryption, we perform an inversion of Eq. (1) using the inverse of π, denoted by π −1 . Throughout the paper, k denotes the partition number (the number of sub blocks) and n denotes the bit length of sub block. Thus a GFS cipher is always a kn-bit block cipher. As mentioned, the most popular instance of GFS is Type-II proposed by Zheng et al.[28], which uses the left cyclic shift as π, i.e., π(X0 , X1 , . . . , Xk−1 ) = (X1 , X2 , . . . , Xk−1 , X0 ), as shown by Fig. 1(Left). Another known instance of GFS is Nyberg’s Generalized Feistel Network (GFN)[21]. It uses a different permutation π. A GFS using block shuffle π is denoted by GFSπ . Thus, if π is the (left) cyclic shift GFSπ is identical to Type-II GFS. For convenience, we define the following notations. An input data to the i + 1i th round for i ≥ 0 is written as X i = (X0i , X1i , · · · , Xk−1 ), and the intermediate i+1 i+1 i+1 i+1 i+1 i data is Y = (Y0 , Y1 , · · · , Yk−1 ), where Yj = Xj if j is even and Yji+1 = i i i i Xj ⊕ F(j−1)/2 (Xj−1 ) if j is odd. Here Fh for h = 0, 1, . . . , m − 1 is the h-th (from left to right) F function in the i-th round. If underlying block shuffle is π, the output of i + 1-th round (which is equivalent to the i + 2-th round input) is X i+1 = π(Y i+1 ). See Fig. 1 for reference.
F0i+1
X 0i +1
X 3i
X 1i X 2i
X 0i
F1i+1
X 1i +1 X 2i +1
X ki −1
X ki − 2
F0i+1
Fmi−+11
X ki +−13 X ki +−12
Y0i +1
X 3i
X 1i X 2i
X 0i
Fmi−+11
F1i+1
Y1i +1 Y2i +1
X ki −1
X ki − 2
Y3i +1
Yki−+21
Yki−+11
X ki +−12
X ki +−11
block shuffle
X ki +−11 X 0i +1
X 1i +1 X 2i +1
X 3i +1
Fig. 1. Type-II GFS (Left) and Our generalization (Right)
22
T. Suzaki and K. Minematsu
2.2
Diffusion Property of Type-II GFS
In this section, we introduce a formal notion of the diffusion property of GFS. What we mean by ‘diffusion’ here is the state that a sub block input affects all of the sub blocks of output. More formally, if Xjr2 can be expressed by an equation containing Xir1 for some i and r1 < r2 , we say Xjr2 is affected by Xir1 . If all of the output sub blocks of the r2 -th round is affected by Xir1 , then we say Xir1 has diffused to all of the sub-blocks in round r2 . For instance, in Type-II GFS, X0i can be expressed as F0i (X0i−1 ) ⊕ X1i−1 . Therefore X0i is affected by X0i−1 and X1i−1 . Note that the expression is allowed to include the target sub block in the raw or as an argument of F . Using this, we make the following definition. Definition 1. For GFSπ , let DRi (π) be the minimum number of rounds such that the i-th sub input block of the first round, Xi0 , is diffused to all sub output blocks. Then, the maximum diffusion rounds for GFSπ , denoted by DRmax(π), def is defined as DRmax(π) = max0≤i≤k−1 DRi (π). If π is clear from the context, we simply write DRi or DRmax. It is trivial to see that Xi0 is diffused to all sub output blocks after any round greater than DRi (π). Thus, for any GFSπ , any sub output block is affected by any sub input block after DRmax(π) rounds. We call this state the full diffusion. As we will see, if full diffusion has not been attained a certain attack is possible. Hence, any GFSπ cipher needs at least DRmax(π) rounds for its security2 , implying that a block shuffle π with a small DRmax(π) is desirable. To understand the property of DRmax, Fig. 2 shows traces of the paths that represents how X70 diffuses to all sub blocks in Type-II GFS with k = 8. The thick solid line is the data path that does not pass through any F function and the thick dotted lines are data paths that pass through at least one F function. The implications of these paths can be explained as follows. Let two distinct 0 = (X 0, . . . , X 0 ) with X 0 = X 0 for i ≤ 6, inputs be X 0 = (X00 , . . . , X70 ) and X 0 7 i i 0 0 and X7 ⊕ X7 = δ for some δ = 0. Then, the thick solid path indicates that 77 = δ holds with probability 1, as XORs on the path bring no difference X77 ⊕ X 7 for to the initial difference, δ. In contrast, dotted paths indicate that Xi7 ⊕ X i all i = 0, . . . , 6 is close to random due to the randomness of round functions. As Fig. 2 suggests, for a k-partition Type-II GFS, it is easy to prove that DRmax = k. Then, how we can improve this? Note that X43 (= X70 ) affects X53 via F24 , but X53 is also affected by X70 and therefore a collision of paths occur. Such collisions continue to occur in round 5 and subsequent rounds. Intuitively, what we have to do is to reduce such collisions, as frequent collisions may imply a large DRmax. For example, the above-mentioned collision can be avoided if Y53 and Y63 are given to Fi4 for some 0 ≤ i ≤ 2 and Fj4 for some i = j, 0 ≤ j ≤ 2. Table 1 shows the number of collisions and its proportion to the number of XORs for DRmax rounds. As k increases, the number and the proportion of collisions also increase. This exhibits the low diffusion property of Type-II GFS. 2
In fact, we have to take care of chosen plaintext and ciphertext attacks, thus roughly need DRmax(π) + DRmax(π −1 ) rounds. See Section 5.2.
Improving the Generalized Feistel
23
Table 1. Collision of data paths for k-partition Type-II GFS Partition number k Number of collision Proportion (%)
2 0 0
4 6 8 10 12 14 16 1 4 9 16 25 36 49 12.5 22.2 28.1 32.0 34.7 36.7 38.3
X 70 1 0
1 1
1 3
1 2
F
F
F
F
F02
F12
F22
F32
F03
F13
F23
F04
F14
F05
F15
F25
F35
F06
F16
F26
F36
F07
F17
F27
F37
F08
F18
F28
F38
X 43
F24
Y53 Y63 X 53
F33
F34
Fig. 2. Diffusion path of Type-II GFS
3
Exhaustive Search for Optimum Shuffles
For each 4 ≤ k ≤ 16, we investigated DRmax(π) for all shuffles with a computer program. Let Πk be the set of all shuffles of k sub blocks. Note that |Πk | = k!. As we think the securities of encryption and decryption are equally important, we focus on finding shuffle π that has a small DRmax± (π) = max{DRmax(π), DRmax(π −1 )}. def
Thus the optimum shuffle is one that provides DRmax∗k = min {DRmax± (π)}. def
π∈Πk
We denote the optimum shuffle for a given k by πk∗ . Note that πk∗ may not be unique. The results are presented in Table 2 (also, see Appendix A for the specific values of block shuffles). Interestingly, πk∗ always fulfills DRmax(πk∗ ) =
24
T. Suzaki and K. Minematsu Table 2. Search result Partition number k DRmax∗k
4 4
6 5
8 6
10 7
12 8
14 8
16 8
DRmax((πk∗ )−1 ). For example, for k = 8, there was a block shuffle π having DRmax(π) = 5 and DRmax(π −1 ) = 7, which is not optimal in our sense. As well as DRmax, it is easy to prove DRmax± = k for the cyclic shift. Thus Table 2 shows that the gain of optimum shuffle from the cyclic shift is obtained from k = 6 and gradually increasing. We also include Nyberg’s GFN in our investigation (See [21] for the exact definition of shuffle). When k = 4, we have no gain, as there are only three valid shuffles: right and left cyclic shifts and Nyberg’s GFN and they have DRmax = 4. For k up to 16, the DRmax of Nyberg’s GFN was the same as for Type-II GFS (we did not prove this for arbitrary k, however we think the proof is easy.). As far as we searched, any optimum block shuffle πk∗ has the property that any even-number input block is mapped to an odd-number output block, and vice versa. We refer to such shuffles as even-odd shuffles. From this fact, we hereafter focus on even-odd block shuffles, and we generally use the word “shuffle” to mean an even-odd shuffle. An example of π8∗ is shown in Fig. 3 (corresponding to Table 4, k = 8, No.1 of Appendix A). The diffusion path is represented in the same way as in Fig. 2. The collision that occurred in the fourth round of Fig. 2 is avoided, and the number of collisions is reduced from nine to four. Note that if we use different shuffles for each round, we could attain the same improvement. Since this approach will increase the implementation cost, we only consider using the same shuffle for every round.
F01
F11
F21
F31
F02
F12
F22
F32
F03
F13
F23
F33
F04
F14
F24
F34
F05
F15
F25
F35
F06
F16
F26
F36
Fig. 3. GFS with an optimum shuffle for k = 8
Improving the Generalized Feistel
25
Lower Bound. A lower bound of DRmax∗ for even-odd shuffles can be derived as follows. For a fixed one block input difference, let Nio (Nie ) be the number of odd-number (even-number) sub blocks in the i-th round output affected by that input block. Initially we have N0e = 0 and N0o = 1. If the shuffle works ideally, e o e e e we have Nie = Ni−1 + Ni−1 , and Nio = Ni−1 , thus Ni+2 = Nie + Ni+1 holds e true. Hence Ni is Fibonacci sequence. For a GFS with an even-odd shuffle, if a certain number of rounds is sufficient to achieve the diffusion to all even output blocks, the full diffusion is achieved by one more round. Therefore, if i is the smallest integer that satisfies Nie ≥ k/2, i + 1 is the lower bound of DRmax for all even-odd shuffles for k blocks √ (not necessarily achievable). As i-th term of i Fibonacci sequence is about τ / 5, where τ is the golden ratio, this lower bound √ is roughly logτ 5k/2 log2 1.44k. In our search, we found block shuffles that attain the lower bound described above for all k ≤ 8 (see Fig. 4). 18 16
Type-II, Nyberg's GFS Optimum GFS Lower Bound
DRmax**k DRmax
14 12 10 8 6 4 2
4 4
6 6
8 8
10 10
12 12
Partition Partition Number Number kk
14 14
16 16
Fig. 4. Search result and lower bound for k ≤ 16
4 4.1
A Shuffle Family with Good Diffusion Graphical Interpretation of Diffusion Rounds
The search result of previous section reveals optimum shuffles up to k = 16. Since the cost of exhaustive search exponentially grows with k, a different approach is certainly required to find a shuffle with a good diffusion for larger k. In this section we introduce a graph-theoretic interpretation of the DRmax evaluation problem for even-odd shuffles, and propose an even-odd shuffle family having much better DRmax than that of cyclic shift, for k being a power of two. For any shuffle of order k, π, let π[∗] : {0, . . . , k − 1} → {0, . . . , k − 1} denote the corresponding index mapping (i.e., π(x0 , . . . , xk−1 ) = (xπ[0] , . . . , xπ[k−1] )).
26
T. Suzaki and K. Minematsu
For any even-odd shuffle π, there is an equivalent, compact directed graph. Definition 2. For any even-odd shuffle π of even order k, the corresponding graph of π, denoted by G[π], is a directed graph with order m = k/2. The vertices of G[π] are labeled with {0, 1, . . . , m − 1}. Every arc (a directed edge) of G[π] is colored red or blue, and is determined as – if π[i] = j for even i and odd j, there is an arc colored red from node i/2 to node (j − 1)/2 – if π[i ] = j for odd i and even j , there is an arc colored blue from node (i − 1)/2 to node j/2. For example, the graph of the cyclic shift is in Fig, 5, where red arcs are written as thin lines and blue ones are written as thick ones.
0
1
2
3
Fig. 5. Graph for the cyclic shift with k = 8 r
b
For two vertices x, x of G[π], we write x − → x (x − → x ) if there is a red (blue) arc from x to x . Clearly, the in- and out-degrees of G[π] are 2. Every node of G[π] has one outgoing red arc and one outgoing blue arc (i.e. one arc is red and the other is blue) and has one incoming red arc and one incoming blue arc. This condition is known as the arc-coloring of the second type [27]. If we have a path (a sequence of connected arcs) between any two vertices of a directed graph G, we say G is strongly connected. Moreover, if G has in- and out-degrees being 2, and has an arc-coloring of the second type, we say G is a proper shuffle graph. For any proper shuffle graph G of order N , we have a corresponding even-odd shuffle for 2N elements. The following definition plays a crucial role in evaluation of DRmax. Definition 3. Let G be a proper shuffle graph. A directed path between two vertices of G is appropriate if its first and last arcs are blue-colored and there are no successive red arcs. If there always exists an appropriate path of length L between any two (possibly the same) vertices, G is said to be L-appropriatelyreachable. The minimum of such L is defined as the sufficient distance (SD) of G, and denoted by SD(G). In addition, if there always exists a path (not necessarily be appropriate) of length L between any two vertices, G is said to be L-reachable [27] and the minimum of such L is defined as the weak sufficient distance (WSD) of G, denoted by W SD(G). Note that WSD can be defined for directed graphs with single-colored arcs. For any proper shuffle graph G with SD(G) = L, G is (L + i)-appropriatelyreachable for any i ≥ 0 and Diam(G) ≤ W SD(G) ≤ SD(G), where Diam(G) is the diameter of G, i.e., the maximum distance of any two vertices.
Improving the Generalized Feistel
27
Proposition 1. If SD(G[π]) = L for even-odd shuffle π, DRmax(π) ≤ L + 1. This proposition is easy to verify. From the definition of SD, for any sub input block, Xi0 , we have a connected data path from Xi0 to XjL for any even j. As underlying shuffle is even-odd, this means that there are a connected data path from Xi0 to XhL+1 , for any odd h. Also, we have a connected data path from XjL to XhL+1 for any even j and h , by going through an F function. Thus, we have a connected path from Xi0 to XjL+1 for any j. 4.2
Colored de Bruijn Graph
Proposition 1 implies that if shuffle π has a small DRmax, G[π] will also have a small SD. Then, how we can build such a graph? We answer this question by using the well-known de Bruijn graph. Definition 4. The binary de Bruijn graph, denoted by dB(s) = (Vs , Es ), is a directed graph of order N = 2s for non-negative integer s. Its vertex set Vs is {0, 1}s, or corresponding integer set {0, . . . , N −1} (we interchangeably use). The arc set, Es ⊆ Vs × Vs , is defined as Es = {(u, u ) : u = (u1 , u2 , . . . us−1 , us ), u = (u2 , u3 , . . . , us , w), w ∈ {0, 1}}. It is obvious that in- and out-degrees of dB(s) are two. Also, it is well-known that Diam(dB(s)) = s and it is even s-reachable, since we can move to any node u by choosing s successive arcs according to the bits of u in descending order. The diameter of dB(s) is minimal for all directed graphs with order 2s and maximum degree 2. Now, our task is to color the arcs of dB(s) so that it has an arc-coloring of the second type. Our coloring function is quite simple, which is as follows. Definition 5. For any s ≥ 2, let CF : Es → {0, 1} be the coloring function of arcs of dB(s), where 0 and 1 denote red and blue. For u = (u1 , u2 , . . . us−1 , us ) and v = (v1 , v2 , . . . vs−1 , vs ), it is defined as vs if u1 = us , CF(u, v) = = us , vs + 1 if u1 where vs +1 denotes the complement of vs . The colored de Bruijn graph, CdB(s), is defined as the binary de Bruijn with CF arc-coloring. Formally, CdB(s) = (Vs , Es ) with Es = (u, v, CF(u, v)) for all (u, v) in Es of dB(s). That is, if (u, v) ∈ c(CF(u,v))
Es , CdB(s) has an arc u −−−−−−→ v with mapping c(0) = r and c(1) = b. Fig. 6 depicts CdB(3) and CdB(4), where a thick line denotes a blue arc and a r thin line denotes a red arc. Note that, if u − → u in CdB(s), we have u1 = u2 , u2 = u3 , . . . , us−1 = us , us = u1 + us
(2)
28
T. Suzaki and K. Minematsu b
and if u − → u in CdB(s) we have u1 = u2 , u2 = u3 , . . . , us−1 = us , us = u1 + us + 1.
(3)
It is easy to see that CdB(s) for s ≥ 2 has an arc-coloring of second type3 , thus CdB(s) is a proper shuffle graph of order 2s and we have a corresponding even-odd shuffle for 2s+1 elements, for s ≥ 2. For concrete representations of block shuffles built from CdB(s), see Appendix A. As Fig. 6 shows, the graph is symmetric including the arc-coloring, thus the corresponding shuffles are also symmetric. This property will be beneficial for implementation.
0000 1000
000 100
0001 0100
001
1001
010 1100
1010
101 110
0101
0011
0110 011
111
0010
1101
1011
1110
0111 1111
Fig. 6. Colored de Bruijn Graphs CdB(3) (Left), and CdB(4) (Right)
4.3
Sufficient Distance of Colored de Bruijn Graph
We want to know (a bound of ) SD(CdB(s)). It can be expected to be small from the minimality of dB(s)’s diameter, however, this expectation has to be theoretically verified. To do this, we focus on the successive arc pairs of CdB(s). Definition 6. Let G = (V, E) be a proper shuffle graph of order N , where E ⊆ = (V, E), is a directed graph V × V × {r, b}. Its double-path graph, denoted by G ⊆ V × V × {rb, bb}. For any (u, w, v) ∈ V 3 with the same vertex set as G, and E and for any (u , w , v ) ∈ V 3 with (u, w, b), (w, v, b) ∈ E, we have (u, v, bb) ∈ E with (u , w , r), (w , v , b) ∈ E, we have (u , v , rb) ∈ E. 3
This does not hold true when s = 1. The valid colorings for dB(1) are ones that implement the left and right cyclic shifts. Also, CF is not the unique solution to provide an arc-coloring of the second type.
Improving the Generalized Feistel
29
b b bb r b has u− In other words, if G has u− →w− →v for some w, G →v, and if G has u− →w − →v rb has u−→v. Note that we did not use all pair of arcs (such as br for some w , G may not be strongly connected. and rr), and G For the double-path graph of CdB(s), we have the following.
Lemma 1. The double-path graph of the colored de Bruijn graph, CdB(s), is isomorphic to CdB(s) itself under the arc-label mapping r → rb and b → bb. Proof. If x is an s-bit value, we write xi to denote its i-th bit, i.e., x = (x1 , . . . , xs ). From Let u, u , v, v be vertices of the double-path graph of CdB(s), CdB(s). rb
bb
Equations (2) and (3), when u −→ v and u −→ v , we have v = (u3 , u4 , . . . , us , u1 + us , u1 + u2 + us + 1), v = (u3 , u4 , . . . , us , u1 + us + 1, u1 + u2 + us ).
(4)
To prove the lemma, we do separate analyses for even and odd s. First, assume s is even. Let t = s/2 + 1. We define the mapping f : {0, 1}s → {0, 1}s . For f (x) = y, x, y ∈ {0, 1}s , f is defined as if i is even x 2i +t−1 (5) yi = x i−1 +1 + x i−1 +t + 1 if i is odd, 2
2
for i = 1, . . . , s. Note that f is invertible; to obtain x from y, we first get xt , . . . , xs as corresponding y’s even bits, and add them to the odd bits of y. What we shall prove is that f is an isomorphism from CdB(s) to CdB(s) with an arc-label mapping defined as r → rb and b → bb. To prove this, we need to r rb b → x in CdB(s) then f (x) −→ f (x ) in CdB(s), and (2) iff x − → x show (1) iff x − bb r in CdB(s) then f (x) −→ f (x ) in CdB(s). Let us assume x − → x in CdB(s), and let y = f (x) and y = f (x ). Since xi = xi+1 for i = 1, . . . , s − 2, we have yi = xi +t−1 = x i +t−1+1 = x i+2 +t−1 = yi+2 , 2
2
yj
2
= x j−1 +1 + x j−1 +t + 1 = x (j+2)−1 +1 + x (j+2)−1 +t+1 + 1 = yj+2 , 2
2
2
(6)
2
, we have for all even 2 ≤ i ≤ s − 2 and all odd 1 ≤ j ≤ s − 3. For ys−1 = xs−2 +1 + xs−2 +t + 1 = xt−1 + xs + 1 = xt + (x1 + xs ) + 1 = y1 + ys , ys−1 2
2
from Eq. (2). For ys , we have ys = xs = x1 + xs = y1 + y2 + ys + 1. Hence rb
y = (y3 , y4 , . . . , ys , y1 + ys , y1 + y2 + ys + 1), which means y −→ y from Eq. (4). b
Next, we assume x ˆ− →x ˆ in CdB(s) and let yˆ = f (ˆ x) and yˆ = f (ˆ x ). The first s − 2 bits of yˆ are the same as Eq. (6), and =x ˆt−1 + x ˆs + 1 = x ˆt + (ˆ x1 + x ˆs + 1) + 1 = yˆ1 + yˆs + 1, yˆs−1
30
T. Suzaki and K. Minematsu
from Eq. (3). Also yˆs = x ˆs = x ˆ1 + xˆs + 1 = yˆ1 + yˆ2 + yˆs . Thus yˆ = bb
(ˆ y3 , yˆ4 , . . . , yˆs , yˆ1 + yˆs + 1, yˆ1 + yˆ2 + yˆs ), which means the walk yˆ −→ yˆ from Eq. (4). This proves the direct part of the lemma for even s. The converse (i.e., r rb b if x − → x in CdB(s) then f (x) − → f (x ) in CdB(s) and if x − → x in CdB(s)
bb → f (x ) in CdB(s)) is easy. For odd s, we use a slight different then f (x) − isomorphism. Since the proof is almost the same, we omit it here.
Let us consider to build an appropriate path from u to v, for two (possibly b
the same) vertices u, v of CdB(s). We assume u − → w. From Lemma 1 and is s-reachable, there always exists a path of length 2s from w to that CdB(s) v in CdB(s), where the last arc is colored blue. This implies the existence of appropriate path of length 2s + 1 from u to v in CdB(s). Thus we have proved the following. Lemma 2. SD(CdB(s)) ≤ 2s + 1. Using Lemma 2 and Proposition 1, we can build a block shuffle of k = 2s+1 (for any s ≥ 2) whose DRmax is at most 2s + 2 = 2 log2 k. As mentioned in Section 3, the lower bound of DRmax is about 1.44 log2 k, derived from Fibonacci sequence. Hence, the diffusion property of CdB(s) is close to the optimum. Related Work. Massey [15] also proposed a graphical representation of block shuffles. However the meaning of arc-coloring is different, i.e., a thick (thin) line denotes a mapping from even (odd) input to even (odd) output block. He combined a block shuffle, called Armenian Shuffle, with a two-block linear operation called PHT to form a diffusion layer of an Substitution-Permutation Network (SPN) block cipher, SAFER+. His notion of diffusion (for the diffusion layer, not for the block shuffle itself) is different from us, which is close to the branch number. Armenian Shuffle is based on dB(3). However, it is not even-odd. No arc-coloring rule of the second type (and the idea of SD) was presented in [15]. Hence, even though the basic methodology of us and [15] have some similarities, our proposal has many important differences.
5 5.1
Security Pseudorandomness
For a new block cipher structure, it is typical to ask its pseudorandomness in the idealized setting. This kind of analysis is needed to see if there is a structural flaw in the proposal, as mentioned by [20]. For this, we have to prove the maximum prp-advantage and sprp-advantage in an idealized setting, defined as prp
def
max | Pr[AC = 1] − Pr[APn = 1]|, and ,
AdvC (q) =
A:q−CPA
(q) = Advsprp C
A:q−CCA
def
max
−1
| Pr[AC,C
−1
= 1] − Pr[APn ,Pn = 1]|.
(7)
Improving the Generalized Feistel
31
Here, C is the encryption function of an n-bit block cipher with some idealized functions as internal modules. Pn is an n-bit uniform random permutation (URP), which is distributed uniformly over all n-bit permutations. Their inversions are written as C−1 and P−1 n . The adversary, A, tries to distinguish C from Pn using q encryption queries, i.e., chosen-plaintext attack (CPA), or (C, C−1 ) from (Pn , P−1 n ) using q encryption and decryption queries, i.e., chosen-ciphertext attack (CCA). The final guess of A is either 0 or 1, and the probability that A’s guess is 1 when A queries C is written as Pr[AC = 1], where probability is defined by the randomness of A and C. The maximums in Eq. (7) are taken for all adversaries with q queries without computational restriction. For example, the prp sprp seminal Luby-Rackoff’s result [14] proved that AdvΦ3 and AdvΦ4 are O(q 2 /2n ), where Φr is the r-round, 2n-bit block Feistel structure with round functions being n-bit uniform random functions (URFs). This also means that 3-round Feistel is a pseudorandom permutation (PRP) and 4-round one is a strong PRP (SPRP), if round functions are pseudorandom functions (PRFs). These results are considered as a theoretical justification of basic Feistel ciphers. Similar analysis was done for other structures, e.g., Misty structure [9][12]. For the pseudorandomness of Type-II GFS, the following result is known. Lemma 3. (by [28][20]) Let TypeIIr,k be the r-round, kn-bit Type-II GFS with partition number k and round functions being n-bit URFs. Then we have prp
AdvTypeIIk+1,k (q) ≤
k2 q2 k2 q2 sprp , and AdvTypeII2k,k (q) ≤ n . n 2 2
Using the idea of Mitsuda and Iwata [19], the pseudorandomness of any GFS with an even-odd shuffle π can be evaluated via its sufficient distance. Theorem 1. Let π be an even-odd shuffle of order k, and let GFSr,k denote the r-round GFS with shuffle π. Its block size is kn bits, partition number is k, and all round functions are independent n-bit URFs. Then we have kL 2 q if SD(G[π]) ≤ L, and 2n+1 kL sprp AdvGFS2L+2,k (q) ≤ n q 2 if max{SD(G[π]), SD(G[π −1 ])} ≤ L. 2 prp
AdvGFSL+2,k (q) ≤
Proof. For x = (x0 , . . . , xk−1 ) ∈ ({0, 1}n)k , let x[i] = xi . Following [19], if keyed permutation over ({0, 1}n)k , H, satisfies max Pr[H(x)[i] = H(x )[i] for some even i ∈ {0, . . . , k − 1}] ≤ , and x =x
max Pr[H(x)[i] = H(x )[i] for some odd i ∈ {0, . . . , k − 1}] ≤ , x =x
then H is called -AUe and -AUo , respectively [19]. Then we obtain q k prp AdvGFS2,k ◦H1 (q) ≤ + n+1 · 2 2
(8)
32
T. Suzaki and K. Minematsu
by extending the lemma 9 and theorem 7 of Maurer [17] (We omit the proof here. This slightly improves the constant of the result of [19] for Type-II GFS.). For any two distinct x and x , we have L , for any even i ∈ {0, . . . , k − 1}. (9) 2n Eq. (9) is easily proved as follows. W.l.o.g. we assume (x0 , x1 ) = (x0 , x1 ) and estimate the probability of GFSL,k (x)[0] = GFSL,k (x )[0] . From the assumption, there is an appropriate path of length L in G[π], whose start and goal are vertex 0. For h = 1, . . . , L, we can define a sequence of internal outputs, Zh = GFSh,k (x)[s(h)] , with s(h) following that appropriate path (e.g. s(1) = π[1] and s(L) = 0). Obviously we have Pr[Z1 = Z1 ] = Pr[F (x0 ) ⊕ x1 = F (x0 ) ⊕ x1 ] ≤ 1/2n , when F is URF. Moreover, using the independence of all round functions, we have Pr[Zj = Zj |Zj−1 = Zj−1 ] ≤ 1/2n for any j = 2, . . . , L. Therefore, L Pr[ZL = ZL ] is at most j=2 Pr[Zj = Zj |Zj−1 = Zj−1 ] + Pr[Z1 = Z1 ] ≤ L/2n . This indicates that Eq. (9) is true, and thus GFSL,k is 2k·L n+1 -AUe from the union bound. From this and Eq. (8), prp-advantage of GFS L+2,k is at most q k 2 ( 2k·L ≤ 2k·L n+1 + 2n+1 ) 2 n+1 q . This proves the first claim of Theorem 1. To prove the second, we consider two independently-keyed permutations over ({0, 1}n )k , H1 and H2 . We assume H1 is 1 -AUe and H2 is 2 -AUo . Then we have q 1 sprp AdvH −1 ◦GFS ◦H (q) ≤ 1 + 2 + kn (10) ≤ (1 + 2 ) q 2 1 2,k 2 2 2 Pr[GFSL,k (x)[i] = GFSL,k (x )[i] ] ≤
using similar arguments as those of [17][19][18]. −1 −1 Let GFSr,k be GFS−1 r,k without the final shuffle (i.e. GFSr,k = πk ◦GFSr,k ). As −1 k·L SD(G[π −1 ]) ≤ L, GFS−1 L,k is 2n+1 -AUe . As π is even-odd, πk is also even-odd. k·L Hence GFSL,k is 2n+1 -AUo , and the sprp-advantage of (GFSL,k )−1 ◦ GFS2,k ◦ 2 = k·L q 2 from Eq. (10). Of course GFSL,k = π −1 ◦ GFS2L+2,k is at most 2 2k·L n+1 q 2n the last application of π −1 has no impact on security, hence this bound also holds for GFS2L+2,k . This proves the second claim. Combining Lemma 2 and Theorem 1, we obtain the pseudorandomness result for GFS with shuffle derived from CdB(s), when k = 2s+1 . Corollary 1. Let Ωr,k be the kn-bit GFS with shuffle from CdB(s) for k = 2s+1 for s ≥ 2, where all round functions are independent n-bit URFs. Then we have 2k log k 2 4k log k 2 sprp q , and AdvΩ4 log k,k (q) ≤ q , n 2 2n where the base of logarithm is 2. prp
AdvΩ2 log k+1,k (q) ≤
Corollary 1 demonstrates the power of colored de Bruijn: TypeIIr,k needs k ∼ 2k rounds, while Ωr,k needs only 2 log k ∼ 4 log k rounds. The gain is obtained for any k ≥ 8 being a power of two. If k is not a power of two, we cannot use CdB(s). In such a case, we have to search a proper shuffle graph of order 2k having a small SD. The search result of Section 3 implies the existence of a graph with SD about 2 log k for any k, but proving this is an open problem.
Improving the Generalized Feistel
5.2
33
Evaluation of Security against Cryptanalysis
We also evaluate the security against the known cryptanalysis. In particular, we consider impossible differential attack, saturation attack and differential/linear cryptanalysis and evaluate the security of GFS with the optimum block shuffles found by the search. In the evaluation, we treat the round function as an arbitrary bijective function and assume that there is no pair of nonzero input and output differential that holds with probability 1. Each evaluation is carried out to derive the number of rounds for characteristic we focus. Specific numbers of round for characteristic are provided in Appendix A. Impossible Differential Attack. An impossible differential attack [2] uses the differential characteristic of probability zero (impossible differential characteristic, IDC) to eliminate wrong key candidates. Here, IDC is represented as a pair of input and output differences, e.g., (α β), α, β ∈ ({0, 1}n )k for a k-partition cipher E with P r[E(x) + E(x + α) = β] = 0 for any x. To find an IDC, we use the U-method of Kim et al. [13]. U-method can efficiently search for IDCs using a truncated difference in sub block units classified by five types: zero difference, nonzero unfixed difference, nonzero fixed difference, exclusive-or of nonzero fixed and unfixed differences, and unfixed difference. These types are denoted by 0, δ, γ, δ + γ, and R, respectively. For r ≥ 1, αr = (αr0 , αr1 , . . . , αrk−1 ) denotes the output difference after r-round and α0 denotes the input difference. Each coordinate of αr is classified into one of the five types described above. Note that α0 always consists of types 0 and γ. For decryption, the output difference is denoted r2 by β r . If there is a contradiction between αr1 i and βi (0 ≤ i ≤ k − 1), the path from α to β is impossible, i.e., (α β) is an IDC of (r1 + r2) rounds. We can determine the number of round for IDC from DRmax, which is as follows. After DRmax(π) rounds, we have two cases: for some odd i has type γ, there exists a data path, P , Case 1: If αDRmax i that does not pass through any F (i.e., the equation corresponding to that path has type δ, then does not contain Xi0 as a part of arguments of F ). If αDRmax i−1 DRmax+1 DRmax αj with j = π[i] has type γ ⊕ δ. Here, if βj has type γ, it is an IDC for 2DRmax + 1 rounds. Case 2: If all data paths pass through at least one F function, Both αDRmax and β DRmax do not contain type γ. From the definition of DRmax, DRmax−1 rounds does not achieve full diffusion. Thus αDRmax must contain type 0. This implies that we can detect a contradiction involving difference of type 0 and difference δ or γ. Accordingly, the number of round for IDC is at most 2DRmax − 1. To see the tightness of above analysis, we also perform U-method. As a result, the number of round for IDC for any GFS with optimum shuffle is one of 2DRmax − 2, 2DRmax − 1, and 2DRmax + 1. Saturation Attack. Saturation Attack [7] works for block ciphers using permutations over a small space, hence it is a strong attack on GFS with invertible round functions. This attack exploits the fact that the output sum of a
34
T. Suzaki and K. Minematsu
permutation is zero when all inputs are given to determine whether the guessed key is correct. We define the following four states for a set of 2n n-bit inputs: : if ∀i, j i = j ⇔ xi = xj Constant (C) : if ∀i, j xi = xj All (A)
2n −1 Balance (B) : if x = 0 Unknown (U ) : Other i i The saturation characteristics (SC) is of form (α → β), α ∈ {C, A}k , β ∈ {C, A, B, U }k , where the input state α contains at least one A and the output state β is not all U s. We investigated the maximum number of rounds for which an SC exists, for all optimum shuffles. For this purpose, we used a method developed by the evaluation report of CLEFIA [26]. Here we briefly describe the method. See [26] for details. To apply the method of [26], we can use DRmax, as well as the case of IDC. Note that, in case of IDC we only need to focus on a data path not passing through F , while in case of SC we also have to consider data paths passing through F , as input state A (to some F ) implies output state A. We first search an SC, (α → β), such that α consists of one A for an even sub block and k − 1 Cs. From the definition of DRmax, the state after DRmax encryption rounds does not contain C. Let us assume that the state after DRmax contains two As for i-th and i + 1-th blocks for some even i. By adding one more round, the state of j-th(j = π[i + 1]) sub block is B(= F (A) ⊕ A). After another round, the state of s-th(s = π[j + 1]) sub block is U (= F (B) ⊕ any) and that of t-th(t = π[j]) sub block is B. Then, in the next round, the states of all sub blocks become U (= F (U ) ⊕ any). Therefore, an SC (containing one A and k − 1 Cs) exists for at most DRmax + 2 rounds. For other cases, it is easy to confirm that such an SC exists for at most DRmax + 1 rounds. Next, the SC (α → β) we found in the above is used to find another SC, (α → α → β), where α is an intermediate state and α is not all As (as this means the attack using all plaintexts). Following [26], this can be done by adding rounds to the input state α. Using the fact that α contains only one A, we can add at most DRmax − 2 rounds to obtain a valid SC. Therefore, the maximum number of rounds for SC is at most DRmax + 2 + DRmax − 2 = 2DRmax. In fact, we confirmed that the maximum number of rounds for SC was either 2DRmax or 2DRmax − 1 for all optimum shuffles we searched. Differential / Linear Cryptanalysis. Differential cryptanalysis (DC) and linear cryptanalysis (LC) are the most basic attacks on block ciphers. Because it is generally difficult to obtain a strict maximum differential/linear probability, we usually count the number of active S-boxes instead [25][1]. If the maximum differential probability of S-box is p and the number of active S-boxes is N , pN is the maximum differential characteristic probability (DCPmax ), which serves as one index of security against DC. For sufficient security, DCPmax ≤ 2−kn is required. Similarly, by counting the minimum number of active S-boxes with respect to linear masking, we can derive the maximum linear characteristic probability (LCPmax ), which works as an index of security against LC. For each GFS with an optimum shuffle, we evaluate the minimum numbers of active S-boxes for DC and LC, denoted by ASD and ASL , for 20 rounds. The result is in Appendix A. The number of active S-boxes is different for individual shuffles, even
Improving the Generalized Feistel
35
Table 3. Number of active S-boxes of every round (Differential and Linear) Round k=8 Type-II No.2 k = 16 Type-II No.1
1 0 0 0 0
2 1 1 1 1
3 2 2 2 2
4 3 3 3 3
5 4 4 4 4
6 6 6 6 6
7 7 8 7 8
8 9 10 9 11
9 10 12 10 14
10 13 12 13 19
11 15 14 15 21
12 17 16 17 24
13 18 16 19 25
14 19 18 22 27
15 20 20 24 30
16 21 20 26 31
17 22 22 28 33
18 24 24 32 36
19 25 24 35 37
20 27 26 39 39
if their DRmax are the same; some are better than the Type-II and others are worse. From this result, we expect that the security against DC/LC of GF Sπ with optimum π is the same level as Type-II GFS. A typical construction of F is of the form F (x) = S(K ⊕x), where K is the key and S : {0, 1}n → {0, 1}n is S-box. Following AES, if S-box is based on the field inversion over GF (2n ), its DPmax (LPmax ) is 2−n+2 if n is even and 2−n+1 if n is odd. Assuming such F for n = 8 (thus DPmax and LPmax are 2−6 ), we derive the number of rounds for differential/linear characteristics. Here we focus on Type-II and an optimum GFS using CdB(s). For these two structures, Table. 3 shows ASD and ASL for every round up to 20. Due to their duality the figures of ASD and ASL are basically the same. Let us assume k = 8, i.e., a 64-bit block cipher. In this case, if ASD is less than 11 there is an exploitable differential characteristic as (2−6 )10 > 2−64 . Thus, Type-II GFS has 9-round differential characteristic while optimum GFS has 8-round one. Similarly, if we set k = 16 (i.e., 128-bit block), ASD must be less than 22 to have an exploitable differential characteristic. Table 3 shows the existences of 13-round characteristics for TypeII GFS and 11-round one for optimum GFS. The same results hold true for LC. From these analyses, we think our proposal slightly improves the security against DC/LC, or at least provides the same level of security as that of Type-II GFS.
6
Concluding Remarks
We have shown that the diffusion property of Type-II GFS can be improved by only changing the internal block shuffle from the cyclic shift. Based on a concrete notion of diffusion, we have searched all optimum shuffles up to partition number k ≤ 16. We also proposed a block shuffle family based on a de Bruijn graph for k being a power of two, and proved that their diffusion property is close to the best possible. We then confirmed that such block shuffles can be used to improve the resistance of GFS against some cryptanalysis, such as impossible differential attack and saturation attack, and improve the efficiency with respect to pseudorandomness. While known instances of Type-II GFS ciphers have relatively small partition number to keep a balance of speed and implementation size, our results enables to build a fast, secure GFS cipher having a large partition number. One might think of using different shuffles for each round to achieve a better (or at least comparable) diffusion than ours in return for a larger implementation
36
T. Suzaki and K. Minematsu
cost. For example, we can alternately use a two block-wise swap (i.e. (a, b, c, d) → (b, a, d, c), two-round classical Feistel), which offers a local diffusion, and another global shuffle. For some small k we observed that this two-round Feistel-based scheme can offer a diffusion property as good as our optimum ones. However, it is open if multiple shuffles can contribute to a smaller DRmax than ours.
Acknowledgments We would like to thank Yukiyasu Tsunoo, Norifumi Kamiya, Toshihiko Okamura, Hiroyasu Kubo, Maki Shigeri, Teruo Saito, Takeshi Kawabata, Hirokatsu Nakagawa, Jinhui Chao and anonymous referees for helpful comments.
References 1. Aoki, K., Ichikawa, T., Kanda, M., Matsui, M., Moriai, S., Nakajima, J., Tokita, T.: Camellia: A 128-bit block cipher suitable for multiple platforms. In: Stinson, D.R., Tavares, S. (eds.) SAC 2000. LNCS, vol. 2012, pp. 41–54. Springer, Heidelberg (2001) 2. Biham, E., Biryukov, A., Shamir, A.: Cryptanalysis of skipjack reduced to 31 rounds using impossbile differentials. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 12–23. Springer, Heidelberg (1999) 3. Biham, E., Shamir, A.: Differential cryptanalysis of DES-like cryptosystems. In: Menezes, A., Vanstone, S.A. (eds.) CRYPTO 1990. LNCS, vol. 537, pp. 2–21. Springer, Heidelberg (1991) 4. Bogdanov, A., Knudsen, L.R., Leander, G., Paar, C., Poschmann, A., Robshaw, M.J.B., Seurin, Y., Vikkelsoe, C.: PRESENT: An Ultra-Lightweight Block Cipher. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 450–466. Springer, Heidelberg (2007) 5. Carter, L., Wegman, M.: Universal Classes of Hash Functions. Journal of Computer and System Science 18, 143–154 (1979) 6. Crowley, P.: Mercy: A Fast Large Block Cipher for Disk Sector Encryption. In: Schneier, B. (ed.) FSE 2000. LNCS, vol. 1978, pp. 49–63. Springer, Heidelberg (2001) 7. Daemen, J., Knudsen, L.R., Rijmen, V.: The block cipher SQUARE. In: Biham, E. (ed.) FSE 1997. LNCS, vol. 1267, pp. 149–165. Springer, Heidelberg (1997) 8. Fiol, M.A., Alegre, I., Yebra, J.L.A., F´ abrega, J.: Digraphs with walks of equal length between vertices. In: Graph Theory with Applications to Algorithms and Computer Science. John Wiley and Sons, Inc., Chichester (1985) 9. Gilbert, H., Minier, M.: New Results on the Pseudorandomness of Some Blockcipher Constructions. In: Matsui, M. (ed.) FSE 2001. LNCS, vol. 2355, pp. 248–266. Springer, Heidelberg (2002) 10. Halevi, S., Rogaway, P.: A Parallelizable Enciphering Mode. In: Okamoto, T. (ed.) CT-RSA 2004. LNCS, vol. 2964, pp. 292–304. Springer, Heidelberg (2004) 11. Hong, D., Sung, J., Hong, S., Lim, J., Lee, S., Koo, B., Lee, C., Chang, D., Lee, J., Jeong, K., Kim, H., Kim, J., Chee, S.: HIGHT: A New Block Cipher Suitable for Low-Resource Device. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 46–59. Springer, Heidelberg (2006) 12. Iwata, T., Yoshino, T., Yuasa, T., Kurosawa, K.: Round Security and SuperPseudorandomness of MISTY Type Structure. In: Matsui, M. (ed.) FSE 2001. LNCS, vol. 2355, pp. 233–247. Springer, Heidelberg (2002) 13. Kim, J., Hong, S., Sung, J., Lee, C., Lee, S.: Impossible Differential Cryptanalysis for Block Cipher Structures. In: Johansson, T., Maitra, S. (eds.) INDOCRYPT 2003. LNCS, vol. 2904, pp. 82–96. Springer, Heidelberg (2003)
Improving the Generalized Feistel
37
14. Luby, M., Rackoff, C.: How to Construct Pseudo-random Permutations from Pseudo-random functions. SIAM J. Computing 17(2), 373–386 (1988) 15. Massey, J.: On the Optimality of SAFER+ Diffusion. In: Second AES Candidate Conference. National Institute of Standards and Technology (1999) 16. Matsui, M.: Linear cryptanalysis of the data encryption standard. In: Helleseth, T. (ed.) EUROCRYPT 1993. LNCS, vol. 765, pp. 386–397. Springer, Heidelberg (1994) 17. Maurer, U.: Indistinguishability of Random Systems. In: Knudsen, L.R. (ed.) EUROCRYPT 2002. LNCS, vol. 2332, pp. 110–132. Springer, Heidelberg (2002) 18. Minematsu, K.: Tweakable Enciphering Schemes from Hash-Sum-Expansion. In: Srinathan, K., Rangan, C.P., Yung, M. (eds.) INDOCRYPT 2007. LNCS, vol. 4859, pp. 252–267. Springer, Heidelberg (2007) 19. Mitsuda, A., Iwata, T.: Tweakable Pseudorandom Permutation from Generalized Feistel Structure. In: Baek, J., Bao, F., Chen, K., Lai, X. (eds.) ProvSec 2008. LNCS, vol. 5324, pp. 22–37. Springer, Heidelberg (2008) 20. Moriai, S., Vaudenay, S.: On the Pseudorandomness of Top-Level Schemes of Block Ciphers. In: Okamoto, T. (ed.) ASIACRYPT 2000. LNCS, vol. 1976, pp. 289–302. Springer, Heidelberg (2000) 21. Nyberg, K.: Generalized Feistel Networks. In: Kim, K.-c., Matsumoto, T. (eds.) ASIACRYPT 1996. LNCS, vol. 1163, pp. 90–104. Springer, Heidelberg (1996) 22. Junod, P., Macchetti, M.: Revisiting the IDEA Philosophy. In: Dunkelman, O. (ed.) FSE 2009. LNCS, vol. 5665, pp. 277–295. Springer, Heidelberg (2009) 23. Rivest, R.L., Robshaw, M.J.B., Sidney, R., Yin, Y.L.: The RC6 block cipher, v1.1, August 20 (1998), http://people.csail.mit.edu/rivest/Rc6.pdf 24. Rogaway, P., Bellare, M., Black, J., Krovetz, T.: OCB: a block-cipher mode of operation for efficient authenticated encryption. In: ACM Conference on Computer and Communications Security, ACM CCS 2001, pp. 196–205 (2001) 25. Shirai, T., Shibutani, K., Akishita, T., Moriai, S., Iwata, T.: The 128-bit Blockcipher CLEFIA. In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 181–195. Springer, Heidelberg (2007) 26. Sony Corporation, The 128-bit Blockcipher CLEFIA Security and Performance Evaluations, Revision 1.0 (June 1, 2007), http://www.sony.co.jp/Products/cryptography/clefia/technical/data/ clefia-eval-1.0.pdf 27. West, D.B.: Introduction to Graph Theory. Prentice-Hall, NJ (1996) 28. Zheng, Y., Matsumoto, T., Imai, H.: On the Construction of Block Ciphers Provably Secure and Not Relying on Any Unproved Hypotheses. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 461–480. Springer, Heidelberg (1989)
A
Optimum Block Shuffles
Table 4,5 and 6 show the optimum shuffles found in the search and their security evaluation. We eliminate isomorphic shuffles. Type-II and Nyberg’s GFN are also evaluated. Shuffles based on de Bruijn graph is indicated by ∗. A shuffle is presented in list: π = {3, 0, 1, 4, 5, 2} means that the first input sub block is mapped to the third sub block of output, etc. Here, D denotes DRmax. IDC and SC denote the maximum numbers of rounds for impossible differential characteristics and saturation characteristics. ASD and ASL are as defined by Section 5.2. We evaluated both π and π −1 . If SC (or ASD , ASL ) is written as x/y, x is for π and y is for π −1 . Otherwise the evaluations were the same for π and π −1 .
38
T. Suzaki and K. Minematsu Table 4. Result of security evaluation for k=6,8,10 and 12
k=6 TypeII Nyberg No.1 k=8 TypeII Nyberg No.1 No.2∗ k = 10 TypeII Nyberg No.1 No.2 No.3 k = 12 TypeII Nyberg No.1 No.2 No.3 No.4 No.5 No.6 No.7 No.8 No.9 No.10 No.11 No.12 No.13 No.14 No.15 No.16 No.17 No.18 No.19 No.20 No.21 No.22 No.23 No.24 No.25 No.26 No.27 No.28 No.29 No.30 No.31 No.32
block shuffle π {5,0,1,2,3,4} {2,0,4,1,5,3} {3,0,1,4,5,2} block shuffle π {7,0,1,2,3,4,5,6} {2,0,4,1,6,3,7,5} {3,0,1,4,7,2,5,6} {3,0,7,4,5,6,1,2} block shuffle π {9,0,1,2,3,4,5,6,7,8} {2,0,4,1,6,3,8,5,9,7} {5,0,7,2,9,6,3,8,1,4} {3,0,1,4,7,2,5,8,9,6} {3,0,7,4,1,6,5,8,9,2} block shuffle π {11,0,1,2,3,4,5,6,7,8,9,10} {2,0,4,1,6,3,8,5,10,7,11,9} {3,0,7,2,9,4,11,8,5,10,1,6} {3,0,7,2,11,4,1,8,5,10,9,6} {7,0,9,2,11,4,1,8,5,10,3,6} {5,0,9,2,1,6,11,4,3,10,7,8} {5,0,7,2,1,6,11,8,3,10,9,4} {5,0,7,2,3,6,11,8,1,10,9,4} {5,0,7,2,3,6,11,8,9,10,1,4} {5,0,7,2,9,6,11,8,3,10,1,4} {5,0,9,2,1,6,11,8,7,10,3,4} {5,0,9,2,3,6,11,8,1,10,7,4} {5,0,9,2,3,6,11,8,7,10,1,4} {3,0,1,4,7,2,9,8,5,10,11,6} {3,0,1,4,7,2,11,8,9,10,5,6} {3,0,7,4,9,2,11,8,1,10,5,6} {3,0,7,4,11,2,1,8,9,10,5,6} {3,0,7,4,11,2,5,8,1,10,9,6} {7,0,1,4,9,2,11,8,3,10,5,6} {7,0,1,4,11,2,5,8,3,10,9,6} {7,0,3,4,11,2,1,8,5,10,9,6} {7,0,9,4,11,2,5,8,3,10,1,6} {3,0,7,4,1,6,11,8,5,10,9,2} {3,0,7,4,5,6,11,8,1,10,9,2} {3,0,7,4,9,6,5,8,1,10,11,2} {3,0,7,4,9,6,11,8,5,10,1,2} {3,0,7,4,11,6,1,8,5,10,9,2} {3,0,7,4,11,6,5,8,9,10,1,2} {3,0,9,4,1,6,5,8,7,10,11,2} {3,0,9,4,1,6,11,8,7,10,5,2} {3,0,9,4,11,6,1,8,7,10,5,2} {3,0,9,4,11,6,5,8,7,10,1,2} {3,0,11,4,1,6,5,8,9,10,7,2} {3,0,11,4,9,6,1,8,5,10,7,2}
block shuffle π −1 {1,2,3,4,5,0} {1,3,0,5,2,4} {1,2,5,0,3,4} block shuffle π −1 {1,2,3,4,5,6,7,0} {1,3,0,5,2,7,4,6} {1,2,5,0,3,6,7,4} {1,6,7,0,3,4,5,2} block shuffle π −1 {1,2,3,4,5,6,7,8,9,0} {1,3,0,5,2,7,4,9,6,8} {1,8,3,6,9,0,5,2,7,4} {1,2,5,0,3,6,9,4,7,8} {1,4,9,0,3,6,5,2,7,8} block shuffle π −1 {1,2,3,4,5,6,7,8,9,10,11,0} {1,3,0,5,2,7,4,9,6,11,8,10} {1,10,3,0,5,8,11,2,7,4,9,6} {1,6,3,0,5,8,11,2,7,10,9,4} {1,6,3,10,5,8,11,0,7,2,9,4} {1,4,3,8,7,0,5,10,11,2,9,6} {1,4,3,8,11,0,5,2,7,10,9,6} {1,8,3,4,11,0,5,2,7,10,9,6} {1,10,3,4,11,0,5,2,7,8,9,6} {1,10,3,8,11,0,5,2,7,4,9,6} {1,4,3,10,11,0,5,8,7,2,9,6} {1,8,3,4,11,0,5,10,7,2,9,6} {1,10,3,4,11,0,5,8,7,2,9,6} {1,2,5,0,3,8,11,4,7,6,9,10} {1,2,5,0,3,10,11,4,7,8,9,6} {1,8,5,0,3,10,11,2,7,4,9,6} {1,6,5,0,3,10,11,2,7,8,9,4} {1,8,5,0,3,6,11,2,7,10,9,4} {1,2,5,8,3,10,11,0,7,4,9,6} {1,2,5,8,3,6,11,0,7,10,9,4} {1,6,5,2,3,8,11,0,7,10,9,4} {1,10,5,8,3,6,11,0,7,2,9,4} {1,4,11,0,3,8,5,2,7,10,9,6} {1,8,11,0,3,4,5,2,7,10,9,6} {1,8,11,0,3,6,5,2,7,4,9,10} {1,10,11,0,3,8,5,2,7,4,9,6} {1,6,11,0,3,8,5,2,7,10,9,4} {1,10,11,0,3,6,5,2,7,8,9,4} {1,4,11,0,3,6,5,8,7,2,9,10} {1,4,11,0,3,10,5,8,7,2,9,6} {1,6,11,0,3,10,5,8,7,2,9,4} {1,10,11,0,3,6,5,8,7,2,9,4} {1,4,11,0,3,6,5,10,7,8,9,2} {1,6,11,0,3,8,5,10,7,4,9,2}
D 6 6 5 D 8 8 6 6 D 10 10 7 7 7 D 12 12 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
IDC 13 11 9 IDC 17 14 11 10 IDC 21 17 12 13 12 IDC 25 20 15 15 14 15 14 14 15 15 14 15 15 15 14 14 14 14 14 14 14 14 14 14 14 17 14 14 14 14 15 15 14 14
SC 12 12 10 SC 16 15 11 11 SC 20 18 13 13 13 SC 24 21 15 16 15 15 15 15 15 15 15 15 15 15/16
15 15 14/15
15 15 15 15 15 15 14 15 16/15
15 15/14
15 15 15 15 15 15
ASD 25 18 25 ASD 27 18 30 26 ASD 32 20 34 33 35 ASD 37 16 34 33 36 37 35 35 35 33 35 30 35 33 37 34 28 35 35 33 37 34 35 36 34 35 36 34 33 34 33 33 35 34
ASL 25 18 25 ASL 27 18 30 26 ASL 32 20 34 33 35 ASL 37 16 34 33 36 37 35 35 35 33 35 30 35 33 37 33 28 35 35 34 37 34 34 34 35 35 36 36 33 35 34 33 35 33
Improving the Generalized Feistel
39
Table 5. Result of security evaluation for k=14 k = 14 TypeII Nyberg No.1 No.2 No.3 No.4 No.5 No.6 No.7 No.8 No.9 No.10 No.11 No.12 No.13 No.14 No.15 No.16 No.17 No.18 No.19 No.20 No.21 No.22 No.23
block shuffle π {13,0,1,2,3,4,5,6,7,8,9,10,11,12} {2,0,4,1,6,3,8,5,10,7,12,9,13,11} {1,2,9,4,3,6,13,8,7,10,11,12,5,0} {1,2,9,4,13,6,7,8,5,10,3,12,11,0} {1,2,11,4,3,6,13,8,9,10,7,12,5,0} {5,2,1,4,11,6,3,8,13,10,9,12,7,0} {5,2,9,4,1,6,13,8,7,10,3,12,11,0} {5,2,13,4,11,6,3,8,1,10,9,12,7,0} {1,2,7,4,3,6,13,8,5,12,9,10,11,0} {1,2,7,4,5,6,11,8,3,12,13,10,9,0} {1,2,7,4,11,6,13,8,5,12,3,10,9,0} {1,2,9,4,3,6,7,8,11,12,13,10,5,0} {1,2,9,4,5,6,11,8,7,12,13,10,3,0} {1,2,9,4,11,6,7,8,5,12,13,10,3,0} {1,2,11,4,9,6,3,8,7,12,13,10,5,0} {1,2,11,4,13,6,7,8,5,12,9,10,3,0} {5,2,1,4,11,6,9,10,7,8,13,0,3,12} {5,2,9,4,1,6,13,10,11,8,7,0,3,12} {5,2,9,4,3,6,1,10,7,8,13,0,11,12} {5,2,9,4,11,6,3,10,7,8,13,0,1,12} {7,2,1,4,9,6,3,10,11,8,13,0,5,12} {7,2,1,4,9,6,5,10,3,12,13,0,11,8} {1,2,11,4,3,8,5,6,13,0,7,12,9,10} {1,2,9,6,3,4,13,0,5,10,7,12,11,8} {1,2,9,6,13,4,3,0,7,10,5,12,11,8}
block shuffle π −1 {1,2,3,4,5,6,7,8,9,10,11,12,13,0} {1,3,0,5,2,7,4,9,6,11,8,13,10,12} {13,0,1,4,3,12,5,8,7,2,9,10,11,6} {13,0,1,10,3,8,5,6,7,2,9,12,11,4} {13,0,1,4,3,12,5,10,7,8,9,2,11,6} {13,2,1,6,3,0,5,12,7,10,9,4,11,8} {13,4,1,10,3,0,5,8,7,2,9,12,11,6} {13,8,1,6,3,0,5,12,7,10,9,4,11,2} {13,0,1,4,3,8,5,2,7,10,11,12,9,6} {13,0,1,8,3,4,5,2,7,12,11,6,9,10} {13,0,1,10,3,8,5,2,7,12,11,4,9,6} {13,0,1,4,3,12,5,6,7,2,11,8,9,10} {13,0,1,12,3,4,5,8,7,2,11,6,9,10} {13,0,1,12,3,8,5,6,7,2,11,4,9,10} {13,0,1,6,3,12,5,8,7,4,11,2,9,10} {13,0,1,12,3,8,5,6,7,10,11,2,9,4} {11,2,1,12,3,0,5,8,9,6,7,4,13,10} {11,4,1,12,3,0,5,10,9,2,7,8,13,6} {11,6,1,4,3,0,5,8,9,2,7,12,13,10} {11,12,1,6,3,0,5,8,9,2,7,4,13,10} {11,2,1,6,3,12,5,0,9,4,7,8,13,10} {11,2,1,8,3,6,5,0,13,4,7,12,9,10} {9,0,1,4,3,6,7,10,5,12,13,2,11,8} {7,0,1,4,5,8,3,10,13,2,9,12,11,6} {7,0,1,6,5,10,3,8,13,2,9,12,11,4}
D 14 14 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
IDC SC ASD ASL 29 28 39 39 23 24 15 15 15 15 39 39 14 15 40 40 14 15 40 40 15 16 39 40/39 15 15/16 37 37 15 15 37 37 15 15 39 39 15 15 37/38 39 15 15/16 39 39 15 16 39 39 15 15 39 37 14 15 38 40 14 15 33 33 14 15 40 38 15 15 38 38 15 15/16 39 39 15 15 38 38 14 15 39 39 15 15 39 39 14 15 40 40 14 15 38 38 14 15 41 41 15 16 36 36
Table 6. Result of security evaluation for k=16 k = 16
block shuffle π
block shuffle π −1
D IDC
SC
TypeII
{15,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}
{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0}
16
33
32
39
39
Nyberg
{2,0,4,1,6,3,8,5,10,7,12,9,14,11,15,13}
{1,3,0,5,2,7,4,9,6,11,8,13,10,15,12,14}
16
26
27
16
16
No.1∗
{1,2,9,4,15,6,5,8,13,10,7,14,11,12,3,0}
{15,0,1,14,3,6,5,10,7,2,9,12,13,8,11,4}
8
15
16
39
39
No.2
{1,2,11,4,9,6,7,8,15,12,5,10,3,0,13,14}
{13,0,1,12,3,10,5,6,7,4,11,2,9,14,15,8}
8
15
15
35
36
No.3
{1,2,11,4,9,6,15,8,5,12,7,10,3,0,13,14}
{13,0,1,12,3,8,5,10,7,4,11,2,9,14,15,6}
8
15
15
38
38
No.4
{5,2,9,4,1,6,11,8,15,12,3,10,7,0,13,14}
{13,4,1,10,3,0,5,12,7,2,11,6,9,14,15,8}
8
15
15
39
26
No.5
{5,2,9,4,11,6,15,8,3,12,1,10,7,0,13,14}
{13,10,1,8,3,0,5,12,7,2,11,4,9,14,15,6}
8
14
15
41
41
No.6
{5,2,11,4,1,6,15,8,3,12,13,10,7,0,9,14}
{13,4,1,8,3,0,5,12,7,14,11,2,9,10,15,6}
8
15
15
26
39
No.7
{1,2,11,4,3,6,7,8,15,12,5,14,9,0,13,10}
{13,0,1,4,3,10,5,6,7,12,15,2,9,14,11,8}
8
14
15
40
40
No.8
{1,2,11,4,9,6,7,8,15,12,13,14,3,0,5,10}
{13,0,1,12,3,14,5,6,7,4,15,2,9,10,11,8}
8
15
15
26
26
No.9
{1,2,11,4,9,6,15,8,5,12,7,14,3,0,13,10}
{13,0,1,12,3,8,5,10,7,4,15,2,9,14,11,6}
8
15
16/15
42
42
No.10
{7,2,13,4,11,8,3,6,15,0,9,10,1,14,5,12}
{9,12,1,6,3,14,7,0,5,10,11,4,15,2,13,8}
8
14
15
44
44
No.11
{7,2,13,4,11,8,9,6,15,0,3,10,5,14,1,12}
{9,14,1,10,3,12,7,0,5,6,11,4,15,2,13,8}
8
15
16
38
38
No.12
{1,2,11,4,15,8,3,6,7,0,9,12,5,14,13,10}
{9,0,1,6,3,12,7,8,5,10,15,2,11,14,13,4}
8
15
16
42
42
No.13
{5,2,11,6,13,8,15,0,3,4,9,12,1,14,7,10}
{7,12,1,8,9,0,3,14,5,10,15,2,11,4,13,6}
8
15
15
35
35
ASD ASL
Nonlinear Equivalence of Stream Ciphers Sondre Rønjom1 and Carlos Cid2 1 Crypto Technology Group, Norwegian National Security Authority, Bærum, Norway 2 Information Security Group, Royal Holloway, University of London Egham, United Kingdom
Abstract. In this paper we investigate nonlinear equivalence of stream ciphers over a finite field, exemplified by the pure LFSR-based filter generator over F2 . We define a nonlinear equivalence class consisting of filter generators of length n that generate a binary keystream of period dividing 2n −1, and investigate certain cryptographic properties of the ciphers in this class. We show that a number of important cryptographic properties, such as algebraic immunity and nonlinearity, are not invariant among elements of the same equivalence class. It follows that analysis of cipher-components in isolation presents some limitations, as it most often involves investigating cryptographic properties that vary among equivalent ciphers. Thus in order to assess the resistance of a cipher against a certain type of attack, one should in theory determine the weakest equivalent cipher and not only a particular instance. This is however likely to be a very difficult task, when we consider the size of the equivalence class for ciphers used in practice; therefore assessing the exact cryptographic properties of a cipher appears to be notoriously difficult. Keywords: Stream ciphers, sequences, nonlinear equivalence.
1   Introduction
A stream cipher [8] is a type of encryption algorithm which encrypts individual alphabet elements of a plaintext, one at a time, with a time-varying transformation. Stream ciphers are very popular due to their many attractive features: they are generally fast, can usually be implemented efficiently in hardware, have no (or limited) error propagation, and are particularly suitable for environments where no buffering is available and alphabet-elements need to be processed individually. It is very common to construct stream ciphers based on linear feedback shift registers (LFSRs). Besides their attractive implementation features, the rich algebraic structure often enables a more formal and detailed security analysis. A filter generator over F2 is perhaps a stream cipher in its simplest form, with a well-defined mathematical description: it consists of a sequence generator and a Boolean function which produce a keystream based on the state of the register.
The security of such a construction is highly reliant on both the properties of the sequence-generator, as well as the properties of the Boolean function. Boolean functions play a very important role in stream cipher design and analysis (as well as in several other cryptographic primitives), and a significant amount of literature has been devoted to the study of cryptographic properties of Boolean functions. Cryptanalytic techniques that may exploit these properties include correlation attacks, algebraic attacks, inversion attacks, among others. We note however that for several methods of analysis one often investigates the Boolean function in isolation from the associated sequence generator. For instance, the algebraic normal form of a Boolean function can be constructed and related properties such as algebraic immunity, algebraic degree, nonlinearity and correlation immunity can be computed to derive the cipher's security. On the other hand, other types of attacks take advantage of certain properties of the sequence generator. For instance, the Hamming weight of a feedback polynomial should not be low in order to resist correlation attacks; likewise, to resist inversion attacks, the positions of the cipher's LFSR which a Boolean function taps from should satisfy additional requirements. In this paper, we discuss and attempt to combine the analysis of both the generator and the corresponding Boolean function. Such an approach has for instance been taken by the authors of [10], enabling a very efficient attack on a class of stream ciphers by identifying certain characteristic structures which are not evident from isolated analysis of the cipher components. Our main focus is to investigate (nonlinear) equivalence of LFSR-based stream ciphers using basic properties of Galois fields and certain isomorphisms between the corresponding multiplicative groups. This can be seen as a way of constructing isomorphic ciphers (examples of cipher representations and isomorphisms were provided in [1,9]; the subject was discussed in detail in [2]). We show here that important cryptographic properties such as nonlinearity and algebraic immunity are variant with respect to such an equivalence. The focal point of this paper is therefore: since there are many ciphers generating the same keystream, any cryptographic property should be defined with respect to the weakest equivalent cipher. However, without some type of provable construction, it seems difficult to assess the exact security of a filter generator for practical sizes, since the class of equivalent ciphers is very large in practice. For instance, there are about 2^121 nonlinearly equivalent filter generators with an LFSR of length 128 over F2 generating a keystream of period 2^128 − 1. We note however that we are not concerned here with affine equivalences, as such equivalences are not particularly revealing in general. This paper is organized as follows. In section 2 we present some basic definitions and define the notation used in the paper. In section 3, the basic principle of equivalence and change of basis is introduced, and in section 4 we introduce an equivalence class of filter generators with respect to a periodic sequence. In section 5 we explain how to determine equivalences realised as nonlinear polynomial functions, and in section 6 we reflect on some consequences for the design and cryptanalysis of LFSR-based stream ciphers.
2   Preliminaries
In this section we provide some definitions which are essential in our analysis; see [6] and [5] for a more detailed discussion of sequences over finite fields. Let p be a prime, q = p^n, and let Fq denote the finite field with q elements. The order of an element α ∈ Fq is the smallest positive integer k such that α^k = 1, denoted by ord(α). An element α with order q − 1 is called a primitive element and its minimal polynomial gα(x) ∈ Fp[x] is called a primitive polynomial. The primitive elements are exactly the generators of Fq^*, the multiplicative group consisting of the non-zero elements of Fq. If α is a primitive element of Fq and gcd(k, q − 1) = 1, then any element α^k is also primitive. In particular, the conjugates α^{p^i} of α are all primitive and form the roots of the primitive polynomial gα(x) = ∏_{i=0}^{n−1} (x − α^{p^i}) of degree n over Fp[x]. It follows that there are φ(q − 1) primitive elements of Fq, where φ denotes Euler's totient function, and that the number of primitive polynomials over Fp of degree n is given by φ(q − 1)/n. If k divides n, then p^k − 1 divides q − 1 = p^n − 1, and it follows that there is an element β ∈ Fq with order p^k − 1. Furthermore, β is a primitive element of Fp(β) ≅ F_{p^k} ⊆ Fp(α) ≅ Fq. The absolute trace of an element β ∈ F_{p^k} ⊆ Fq is given by

    Tr^k_1(β) = Σ_{i=0}^{k−1} β^{p^i},
where Tr^k_1(x) denotes the trace function from F_{p^k} to Fp. We write Tr(x) = Tr^n_1(x) when there is no risk of confusion. If α ∈ Fq is a primitive element, then {1, α, . . . , α^{n−1}} is a basis of Fq (when considered as a vector space over Fp). Let s denote a periodic sequence over Fp with period e dividing q − 1, viewed as a vector of length q − 1, and let m(x) = Σ_{i=0}^{k} c_i x^i ∈ Fp[x] be a monic polynomial of degree k. We say that the sequence s satisfies the linear recurrence defined by m(x) if

    c_0 s_t + c_1 s_{t+1} + . . . + c_{k−1} s_{t+k−1} + s_{t+k} = 0, for all t ≥ 0.

The minimal polynomial of s is the polynomial of least degree whose linear recurrence is satisfied by s. We say that a sequence s is irreducible if its minimal polynomial is irreducible over Fp. A sequence s is generated by a polynomial g(x) ∈ Fp[x] if the minimal polynomial ms(x) of s divides g(x). Denote by Ω(g(x)) the vector space spanned by the sequences generated by g(x). If g(x) is primitive, then Ω(g) contains q − 1 cyclically equivalent sequences (in addition to the zero-sequence), and every non-zero sequence in Ω(g) has maximal period q − 1. Such sequences are called maximal sequences (or m-sequences). Let s be an m-sequence over Fp with minimal polynomial ms(x) of degree n, and α ∈ Fq be a root of ms(x) (and thus ms(x) = gα(x)). Then s may be written over Fq in terms of the roots of ms(x) as
    s_t = Tr(X α^t) = Σ_{i=0}^{n−1} (X α^t)^{p^i},   t = 0, 1, 2, . . . ,
where X ∈ Fq^*. Furthermore, the q − 1 nonzero choices of X ∈ Fq^* result in the q − 1 distinct shifts of the same m-sequence s. In the remainder of this paper, we will consider sequences defined over the field F2, that is, p = 2 and q = 2^n. It should be noted however that the analysis provided here can be extended trivially to sequences and filter generators over any prime extension F_{p^n}. Let R = F2[x0, x1, . . . , x_{n−1}] and J be the ideal of R generated by the set {x_i^2 + x_i}, 0 ≤ i < n; the quotient ring Bn = R/J then represents the set of Boolean functions in n variables.
3   Equivalent Sequence Generators
Our main motivation results from the following observation: an m-sequence s of period q − 1 = 2^n − 1 may in general be written in terms of the roots of any primitive polynomial of degree n in F2[x]. Indeed, let β = α^k be a primitive element of F2(α) ≅ Fq. Then gcd(k, q − 1) = 1, and the k-power exponentiation is an automorphism of the multiplicative group Fq^*. Furthermore, this automorphism induces the mapping x^k : F2(α) → F2(β), with inverse x^r, where r is the multiplicative inverse of k modulo q − 1. Let s ∈ Ω(gα(x)) be an m-sequence generated by

    s_t = Tr(X α^t),                                    (1)

and β = α^k ∈ F2(α), where ord(β) = q − 1, and let r · k ≡ 1 (mod q − 1). Then we may rewrite (1) in terms of the primitive element β as

    s_t = Tr(X α^t) = Tr((Y β^t)^r),                    (2)

where Y ∈ F2(β) and Y = X^k is an elementary change of basis. Equation (2) shows how an m-sequence s ∈ Ω(gα) may be represented (nonlinearly) in terms of the roots of the minimal polynomial of another m-sequence b ∈ Ω(gβ). In particular, the output of the LFSR satisfying the linear recursion defined by gα(x) may also be generated by a nonlinear filter generator using an LFSR satisfying the linear recursion defined by gβ(x), as illustrated in the following example.
Example 1. Let n = 5, q = 2^n = 32 and let F2(α) ≅ F32, where gα(x) = x^5 + x^4 + x^3 + x^2 + 1 ∈ F2[x] is a primitive polynomial. An m-sequence s ∈ Ω(gα(x)) can be generated by s_t = Tr(X α^t), t = 0, 1, 2, . . . , where X ∈ F32^*. Now let β = α^21 and X^21 = Y ∈ F2(β). It follows that Tr(X α^t) = Tr((Y β^t)^3), t = 0, 1, 2, . . . , since 3 · 21 ≡ 1 (mod 31). The corresponding filter generator over F2(β) is given by s_t = f(b_t, b_{t+1}, . . . , b_{t+4}), t = 0, 1, 2, . . . , where (b_t, b_{t+1}, . . . , b_{t+4}) = (Tr(Y β^t), Tr(Y β^{t+1}), . . . , Tr(Y β^{t+4})), and f(x0, x1, x2, x3, x4) = x0x2 + x2x3 + x1x4 + x2x4 + x1 + x3. The two filter generators (one of them is linear) will generate identical sequences for all possible initial states X and Y = X^21, and they are thus equivalent sequence generators. Notice that the function f has algebraic immunity 2 and nonlinearity 12, which is maximal for a quadratic Boolean function in 5 variables. Thus, on the basis of certain types of analysis, one of the ciphers appears to be secure while the other is not. Example 1 illustrates that the Boolean function corresponding to the trace representation over F2(β) may possess strong cryptographic properties in general. Thus, if we investigate the security of the whole cipher by analysing the Boolean function of a particular filter generator in isolation, we might perhaps conclude (erroneously) that it is a cryptographically strong cipher.
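As a quick sanity check of Example 1, the identity of equation (2) can be verified directly; the following minimal sketch is not part of the paper (the bit-level field representation, the choice of X and all helper names are ours). It implements F32 as polynomials modulo gα(x) = x^5 + x^4 + x^3 + x^2 + 1 and confirms that the two trace representations produce the same keystream.

# Minimal GF(2^5) arithmetic for g_alpha(x) = x^5 + x^4 + x^3 + x^2 + 1; field elements
# are 5-bit integers whose bits are the coefficients of the powers of alpha.
REDUCTION = 0b111101              # x^5 + x^4 + x^3 + x^2 + 1

def gf_mul(a, b):
    """Carry-less multiplication followed by reduction modulo g_alpha."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b100000:          # degree 5 reached: substitute x^5 = x^4 + x^3 + x^2 + 1
            a ^= REDUCTION
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def trace(y):
    """Absolute trace Tr: F_32 -> F_2, i.e. y + y^2 + y^4 + y^8 + y^16 (always 0 or 1)."""
    t = 0
    for _ in range(5):
        t ^= y
        y = gf_mul(y, y)
    return t

alpha = 0b00010                   # the class of x, a root of g_alpha (primitive)
beta = gf_pow(alpha, 21)          # beta = alpha^21
X = 0b10110                       # an arbitrary nonzero initial value
Y = gf_pow(X, 21)                 # the change of basis Y = X^21

# Equation (2): Tr(X alpha^t) = Tr((Y beta^t)^3), since 3 * 21 = 63 = 1 (mod 31).
lhs = [trace(gf_mul(X, gf_pow(alpha, t))) for t in range(31)]
rhs = [trace(gf_pow(gf_mul(Y, gf_pow(beta, t)), 3)) for t in range(31)]
assert lhs == rhs

The assertion holds for every nonzero X, since (Y β^t)^3 = X^63 α^{63t} = X α^t in F32.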
4   Equivalence of Filter Generators
In order to simplify the presentation, we introduce the following notation.

Definition 1. Let X, α ∈ Fq^*. Then define the vector

    S(X α^t) = (Tr^k_1(X α^t), Tr^k_1(X α^{t+1}), . . . , Tr^k_1(X α^{t+k−1})) ∈ F2^k,

where k = dim(F2(α)) and k divides n.

The vector S(X α^t) is equivalent to the state at time t of an LFSR with characteristic polynomial gα(x) of degree k and initial state S(X) ∈ F2^k. In the remainder of this paper, we will only consider the case k = n. We view any Boolean function in r ≤ n variables as a polynomial in Bn = R/J. For convenience in the presentation, we have the following definition.
Definition 2. Let β, X ∈ Fq^*, b_t = Tr(X β^t) be a linear recurrence sequence and f ∈ Bn a Boolean function. Define a sequence

    Lβ(f, t, X) = (f(S(X β^t)), f(S(X β^{t+1})), . . . , f(S(X β^{t+q−2}))),

of length q − 1, with entries f(S(X β^t)) = f(b_t, b_{t+1}, . . . , b_{t+n−1}). Let Lβ(f) be the set of sequences defined as Lβ(f) = {Lβ(f, 0, X) | X ∈ Fq^*}.

The set Lβ(f) can be seen as the set of all possible keystream output sequences (of length q − 1) from a filter generator whose LFSR has feedback polynomial gβ(x) and filtering function f. The non-zero elements X ∈ Fq^* determine the initial state of the LFSR. The period of the sequences in Lβ(f) depends on the order of β and the function f; for instance, it should be clear that the period of any sequence in Lβ(f) cannot be greater than ord(β), and in fact must divide ord(β) (in particular, it divides q − 1). In general, if ord(β) = q − 1, then for a random function f ∈ Bn, the sequences in Lβ(f) almost surely have period q − 1. When considering the set of (polynomial) Boolean functions Bn = R/J, we can define a surjective homomorphism ϕ from Bn to the set of sequences over F2 of length q − 1 = 2^n − 1 as

    ϕ : Bn → F2^{q−1},  f → s_f,

where s_f corresponds to the truth-table of f in all points of F2^n except (0, . . . , 0). It can be shown that ker(ϕ) = ⟨h⟩, where h(x0, . . . , x_{n−1}) = ∏_{i=0}^{n−1} (x_i + 1), and as a result ϕ(f1) = ϕ(f2) if, and only if, f1 ≡ f2 mod h. Moreover, since ⟨h⟩ = {0, h}, the counter-image of any sequence s ∈ F2^{q−1} is given by

    ϕ^{−1}(s) = {f_s, f_s^*} ⊂ Bn,                      (3)

where f_s^* = f_s + h. Note that f_s, f_s^* are the functions that coincide on the set F2^n \ {(0, . . . , 0)} (with image on this set given by the sequence s), but with f_s(0, . . . , 0) ≠ f_s^*(0, . . . , 0).

Definition 3. For a sequence s ∈ F2^{q−1} and β ∈ Fq^*, let Vβ(s) = {f ∈ Bn | s ∈ Lβ(f)}.

In other words, we can consider Vβ(s) as the set of all filter generators with feedback polynomial gβ(x) that generate s as its first q − 1 terms. The following lemma summarises the conjugation property of such sets.

Lemma 1. For any s ∈ F2^{q−1} and β ∈ Fq^*, we have V_{β^{2^i}}(s) = V_{β^{2^j}}(s), 0 ≤ i, j ≤ n − 1.
The above lemma follows directly from the fact that g_{β^{2^i}}(x) = g_{β^{2^j}}(x). We then have the following lemma.

Lemma 2. Let s ∈ F2^{q−1} denote a periodic sequence with e = per(s) and β ∈ Fq^*, where per(s) | ord(β). Then

    |Vβ(s)| ≤ (e(q − 1)/ord(β)) · 2^{q−ord(β)}.

Proof. Let w = ord(β) and X be the subgroup of Fq^* generated by β. The subgroup X has index k = (q − 1)/w in Fq^*, and thus there are k elements 1 = X0, X1, X2, . . . , X_{k−1} ∈ Fq^* such that the sets Xi = Xi·X form a partition of Fq^* (these are the cosets of X in Fq^*). We can thus associate the sets Xi ⊆ Fq^* with the distinct and non-intersecting ordered sets Vi = {S(Xi β^t) = v_i^t | t = 0, 1, . . . , w − 1} ⊆ F2^n. It is clear that the elements X0, X1, . . . , X_{k−1} ∈ Fq^* result in the k distinct and shift-nonequivalent state-cycles of the corresponding LFSR with period w. Let H = {h0, h1, . . . , h_{k−1}} ⊆ Bn be the set of Boolean polynomials such that h_i(x) = 0 if x ∈ Vi and h_i(x) = 1 if x ∈ F2^n \ Vi. The ideal ⟨h_i⟩ consists of the set of all functions in Bn that are zero when restricted to Vi. Since Vi has cardinality w, then |⟨h_i⟩| = 2^{q−w} for every i. Given s ∈ F2^{q−1} with period e, for every Vi we can define the function f_i ∈ Bn as f_i(v_i^t) = s_t for 0 ≤ t ≤ w − 1, and f_i(x) = 0 if x ∈ F2^n \ Vi. Thus f_i ∈ Vβ(s). Furthermore, it is clear that if g_i ≡ f_i mod h_i, then g_i(v_i^t) = s_t for 0 ≤ t ≤ w − 1, and g_i ∈ Vβ(s). Now, by considering the w shift-equivalent sets of the ordered set Vi, we obtain shift-equivalent functions to the elements of the set F_i = {f_i + g_i | g_i ∈ ⟨h_i⟩} ⊂ Bn. In fact we get e = per(s) such functions for each element in F_i. Thus, for every Vi, we have e · 2^{q−w} functions in Vβ(s). We can repeat the above with all k sets Vi to obtain

    k · e · 2^{q−w} = (e(q − 1)/w) · 2^{q−w}

elements in Vβ(s).
The inequality in Lemma 2 is necessary in case per(s) < 2^n − 1, since it may then be the case that some of the functions are counted several times. However, the main motivation of this paper is sequences of maximal period, and one should note that when per(s) = ord(β) = 2^n − 1, then |Vβ(s)| = 2(q − 1); in fact, we have that ϕ^{−1}(s) contains the two representatives of the shift equivalence classes in Vβ(s) (assuming the natural ordering on the elements of F2^n induced by the cyclic group generated by β). This fact also implies the following lemma.
Lemma 3. Let β be a primitive element of Fq , and f ∈ Bn . If s1 and s2 are sequences in the set Lβ (f ), then Vβ (s1 ) = Vβ (s2 ). We note that when β is not primitive, then lemma 3 is not necessarily true.
In the following definition, we assume sequences with period e | (q − 1), where e is not a divisor of 2^k − 1, 0 < k < n. That is, we assume that the sequences are generated by filter generators consisting of irreducible LFSRs of length n.

Definition 4. Let s ∈ F2^{q−1} be a sequence with period e dividing q − 1, where e is not a divisor of 2^k − 1, with 0 < k < n. Then let

    Gn(s) = {Vβ(s) | β ∈ Fq, e | ord(β)}.

In other words, the set Gn(s) may be viewed as a class of filter generators of length n that generate s as a keystream. For sequences with period e dividing q − 1, the size of Gn is given by the following proposition.

Proposition 1. If s ∈ F2^{q−1} has period e dividing q − 1, where e is not a divisor of 2^k − 1, with 0 < k < n, then

    |Gn(s)| = Σ_{e|w} φ(w)/n,

where the sum is extended over all positive divisors w of q − 1.

Proof. By restricting the class Gn(s) to sequences with period e, where e is not a divisor of 2^k − 1, with 0 < k < n, we are restricting the sets Vβ to elements β with minimal polynomial of degree n over F2. Thus, we need only count the distinct irreducible polynomials in F2[x] of degree n with periods of which e is a divisor.

The following corollary, which is of most interest for this paper, then follows immediately.

Corollary 1. If s ∈ F2^{q−1} has period q − 1, then

    |Gn(s)| = φ(q − 1)/n,

where φ(q − 1) is the number of generators of the multiplicative group of Fq.

Thus when per(s) = q − 1, the set Gn(s) contains φ(q − 1)/n elements, where each element Vβ(s) contains two equivalent functions with respect to F2^n \ {0} (without counting affine equivalences). There are thus in total

    2 · φ(q − 1)/n                                      (4)
distinct filter generators with feedback-polynomial of degree n that generate s (again, without counting affine equivalences).

Remark 1. Assume that we determine Gn for a sequence s of period e < q − 1 and assume that the sequence stems from a filter generator with irreducible (but not primitive) feedback polynomial of degree n. Such filter generators (most often) produce r = (q − 1)/e shift-nonequivalent sequences of period e. Thus, the equivalence only encapsulates one out of the r = (q − 1)/e sequences generated by that generator, and we are only guaranteed that the two generators are equivalent for a subset of initial states. Thus, Gn induces a strong equivalence for sequences with periods 2^k − 1 (see Proposition 2), and a weak form of equivalence otherwise. This will be studied in closer detail in a follow-up paper.
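For the parameters quoted in the introduction (n = 128 and period 2^128 − 1), the count (4) can be evaluated directly from the known prime factorisation of 2^128 − 1. The short sketch below is ours and only serves as an illustration; it recovers the figure of roughly 2^121 equivalent filter generators mentioned in Section 1.

from math import log2, prod

# 2^128 - 1 is the product of the Fermat numbers F_0, ..., F_6; the list below is its
# prime factorisation (all factors are distinct primes).
factors = [3, 5, 17, 257, 65537, 641, 6700417, 274177, 67280421310721]
assert prod(factors) == 2**128 - 1

phi = prod(p - 1 for p in factors)     # Euler's totient, multiplicative over distinct primes

print(log2(phi // 128))                # |G_128(s)| = phi(2^128 - 1)/128, roughly 2^120
print(log2(2 * (phi // 128)))          # total count 2*phi(q-1)/n, roughly 2^121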
We have restricted Gn(s) to the set of filter generators with feedback polynomial of degree n for the purpose of simplicity and clarity of the presentation. Our main focus is sequences with period q − 1, the case of filter generators with a primitive feedback polynomial, in which the equivalence class Gn becomes especially simple and clear. While it is possible to generalise Gn into more complex equivalence classes offering more insight in cryptanalysis, it is out of the scope of this paper. In particular, one may generalise Gn by incorporating combiner generators that generate the same sequences or, for instance, filter generators based on NLFSRs. For instance, it should be clear that a sequence generated by a combiner generator can also be generated by a filter generator, and vice-versa. It is especially simple to deduce equivalent ciphers generating a sequence of period q − 1 in terms of nonlinear equivalences of Boolean functions. In the following section, we describe how to deduce isomorphic filter generators in the case of sequences of period q − 1.
5   Structure of Equivalent Functions
With access to a filter generator that generates a sequence a, we may in fact generate all other equivalent filter generators. Let F2(α) ≅ Fq and let β = α^k be a primitive element of F2(α). Then for any elements X ∈ F2(α) and Y ∈ F2(β), let φβ(x0, x1, . . . , x_{n−1}) be the vectorial Boolean function which maps states S(X α^t) ∈ F2^n to states S(Y β^t) ∈ F2^n. Moreover, we have that φβ(x0, x1, . . . , x_{n−1}) = (y0, y1, . . . , y_{n−1}), and thus φβ(S(X α^t)) = S(Y β^t). Now if we select a function fα(x0, x1, . . . , x_{n−1}) ∈ Bn, then we may compute another function by

    fα(x0, x1, . . . , x_{n−1}) = fα ∘ φβ^{−1}(y0, y1, . . . , y_{n−1})
                               = fα(φ_{β,0}^{−1}(y0, . . . , y_{n−1}), . . . , φ_{β,n−1}^{−1}(y0, . . . , y_{n−1}))
                               = fβ(y0, y1, . . . , y_{n−1}),

where φβ^{−1}(y0, y1, . . . , y_{n−1}) is the inverse of φβ(x0, x1, . . . , x_{n−1}). And since Y = X^k, it follows that

    a_t = fβ(S(Y β^t)) = fα(S(X α^t)),   t = 0, 1, 2, . . . ,

which corresponds to two filter generators with distinct LFSRs and filter functions, but which generate the same sequence a. In the case of sequences of period q − 1, we need only determine one element fα ∈ Vα(a) ∈ Gn(a), and then determine the other elements of Gn(a) by composing fα with nonlinear maps φγ^{−1} for each primitive element γ ∈ Fq^*.
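The construction above can be made concrete with a small sketch, again ours rather than the authors'; it reuses the toy F32 arithmetic (gf_mul, gf_pow, trace, alpha, beta) from the sketch after Example 1 and takes the trivial filter fα(x) = x0 purely for illustration. The map φβ is tabulated from one run of the known generator, and the resulting table is the truth table of an equivalent function fβ on the nonzero states.

def state(Z, gamma, t):
    """S(Z gamma^t) = (Tr(Z gamma^t), ..., Tr(Z gamma^{t+4}))."""
    return tuple(trace(gf_mul(Z, gf_pow(gamma, t + i))) for i in range(5))

def f_alpha(s):
    return s[0]                    # the plain LFSR over g_alpha: output its first state bit

# Tabulate f_beta = f_alpha o phi_beta^{-1} on the 31 nonzero states, using the
# reference initial values X0 = 1 and Y0 = X0^21 = 1.
f_beta = {state(1, beta, t): f_alpha(state(1, alpha, t)) for t in range(31)}

# The derived generator is equivalent for every initial state X and Y = X^21.
for X in range(1, 32):
    Y = gf_pow(X, 21)
    assert [f_alpha(state(X, alpha, t)) for t in range(31)] == \
           [f_beta[state(Y, beta, t)] for t in range(31)]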
Remark 2. From the trace-representation of one filter generator (using a univariate polynomial P(x) ∈ Fq[x]/(x^q − x)), it is much simpler to derive the trace-representation of the equivalent filter generators and then transform back to the ANF form. The univariate representation of the equivalent sequence generators is of the form P(x^k), where all such polynomials have exactly the same weight, and the equivalent functions are no more complicated in this sense.

The following proposition follows from Lemma 3 and the discussion in this section.

Proposition 2. Let s1, s2 ∈ F2^{q−1} be sequences of period q − 1, and assume there is a primitive element β ∈ Fq such that Vβ(s1) = Vβ(s2). Then Gn(s1) = Gn(s2).
6   Cryptanalytic Implications
If we restrict ourselves to keystream-sequences of period q − 1, which is the common case for sequences generated by filter generators, then it follows from (4) that there are 2 · |Gn(s)| isomorphic filter generators generating the same keystream sequence, excluding affine equivalence. Thus, in order to assess the cryptographic properties of a filter generator, one should in theory check whether there exist in this class weak isomorphic ciphers with respect to some cryptographic property. In particular, it should be clear that any cryptographic property must be defined with respect to the weakest isomorphic cipher. This motivates a definition of the following type.

Definition 5. Let P be a cryptographic measurement of a filter generator S, which generates a sequence s. Then the filter generator S is said to be P-resistant only if there is no isomorphic filter generator S′ with measurement P′ < P.

For example, consider a stream cipher S with a filter generator structure, which may employ a weak filter function that enables a successful algebraic attack. The results of the previous section imply that it is likely that there exists a cipher S′ isomorphic to S that has a cryptographically much stronger Boolean function, and in turn may have been considered secure in that sense. An argument that supports this, while certainly not a proof, is given in Example 1 and in the next two subsections. One can make the same type of argument with respect to any other filter generator, in that a randomly chosen isomorphic cipher may look more secure than a specifically designed instance, in the classical view of cryptanalysis. In cryptanalysis, it is clear that one would go for the weakest isomorphic cipher. It would in principle be possible to construct a trap-door function this way. However, such a direction would require further analysis, as such applications appear inefficient in general.

Remark 3. Although out of the scope of this paper, as a general result, it would be interesting to divide Bn into classes of Boolean functions which are equivalent with respect to both nonlinear and linear equivalence. The main goal would be to
measure the number of cryptographically strong Boolean functions. This could be achieved by dividing Bn into classes consisting of nonlinearly equivalent functions, together with the affine equivalences of those, and picking one representative from each such class. Such a class would be invariant regardless of the generator of Fq^*. It should be noted that such a class would be much larger and more general than the usual affine equivalences studied in the literature, and would restrict the set of representatives of Boolean functions much further.

In the following section we discuss two properties of filter generators of cryptographic relevance, and how the results of this paper may be applied in the analysis of stream ciphers.

6.1   Algebraic Attacks
Algebraic attacks against stream ciphers were originally proposed by Courtois and Meier in [3]. The attack is a powerful technique against filter generators, and works by constructing systems of equations derived from the cipher operations, which can be solved using a choice of methods. Protection against algebraic attacks may for instance be reached by using filtering functions f of high degree such that neither f nor its complement f + 1 has low-degree multiples. Algebraic degree and algebraic immunity are two properties of Boolean functions which are affine invariant. However, we have the following lemma when considering the equivalence Gn.

Lemma 4. The algebraic degree and algebraic immunity of a Boolean function f are not invariant with respect to Gn(s).

This is clearly seen in Examples 1 and 2 (in the Appendix). It is then for instance useful to have the following definition of algebraic immunity with respect to the equivalence Gn.

Definition 6. Let f ∈ Bn be a filter function used in a filter generator generating a sequence s ∈ Lα(f) of period q − 1, where we let F2(α) ≅ Fq. A more general algebraic immunity of a Boolean function f can be defined as

    GAI(f) = min(AI(fβ) | fβ ∈ Vβ(s), for all Vβ(s) ∈ Gn(s)).

However, it is not apparent whether the algebraic immunity of fβ is less than that of f or not in general. One could argue that if f contains less than n variables, then the equivalent functions will probably have higher algebraic immunity (since they probably involve all n variables). We consider this as a general open problem arising from our work.

6.2   Correlation Attacks
The correlation attack (see [11] and [7]) is another type of attack which has been shown to be particularly successful against stream ciphers. A full treatment of the potential impact of our analysis on correlation attacks will be given in a forthcoming paper. Nevertheless, the purpose of this section is to show that:
1) the current analysis of the distance from a nonlinear function to the space of affine (linear) functions is incomplete with respect to LFSR-based stream ciphers; 2) the notion of so-called weak feedback polynomials needs refinement.

In order to address 1), we only need to point out the fact that there is not only one linear basis, but several. Assume that F2(α) ≅ Fq. If we let g_{α^k}(x) = ∏_{i=0}^{n_k−1} (x + α^{k·2^i}), where n_k = dim(F2(α^k)), it follows that

    x^{q−1} − 1 = ∏_{k ∈ C(n)} g_{α^k}(x),

where C(n) ⊂ {0, 1, 2, 3, . . . , q − 2} denotes the coset-leaders modulo q − 1. In the following, for a polynomial p(x) ∈ F2[x] dividing x^q − x, denote by

    dH(Ω(p), a) = min( dH(s, a) | s ∈ Ω(p) )

the minimal distance between the vector space Ω(p) of sequences spanned by p(x) and a sequence a ∈ F2^{q−1}. Then we have the following definition of generalised correlations and distance to linear functions (linear subspaces).

Definition 7. Let a ∈ F2^{q−1}. Then define the minimal distance between a and a linear subspace by

    N1(a) = min( dH(Ω(g_{α^k}), a) | 0 ≤ k ≤ q − 2 ).

Assume that a is in Lα(f). Then, if for l2 ∈ Bn we have that dH(a, Lα(l1)) > dH(a, Lβ(l2)) for all linear functions l1 ∈ Bn, it follows that a correlation attack is more successful on the equivalent function fβ. Some correlation attacks (see for instance [4]) involve analysing LFSRs with low-weight feedback polynomials (or certain other nice properties). Such correlation analysis assumes that the Boolean function models a binary symmetric channel (BSC) with a certain correlation probability. Thus, it is sometimes possible to construct parity-check equations that relate the keystream to the underlying sequence-generator, allowing one, for instance, to mount a distinguishing attack. However, due to the fact that one may choose an equivalent filter generator with any desirable primitive polynomial (for instance a trinomial), it is clear that such analysis is not complete without taking into account the exact channel modelled by the Boolean function. If not, then this would mean that there always exists a cipher among the equivalent ciphers that is susceptible to correlation analysis, which is probably not true.
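To make Definition 7 concrete, the toy parameters of Example 1 are small enough to evaluate N1(a) by exhaustive search. The sketch below is ours (not part of the paper) and reuses the F32 helpers gf_mul, gf_pow, trace and alpha from the sketch after Example 1; for n = 5 the coset leaders modulo 31 are {0, 1, 3, 5, 7, 11, 15}.

def omega_sequence(Z, gamma):
    """The sequence Tr(Z gamma^t), t = 0..30, an element of Omega(g_gamma)."""
    return [trace(gf_mul(Z, gf_pow(gamma, t))) for t in range(31)]

def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))

def N1(a):
    """Brute-force minimum of d_H(a, Tr(Z alpha^{k t})) over Z in F_32 and coset leaders k."""
    coset_leaders = (0, 1, 3, 5, 7, 11, 15)
    return min(hamming(a, omega_sequence(Z, gf_pow(alpha, k)))
               for k in coset_leaders for Z in range(32))

# An m-sequence from Omega(g_alpha) is itself "linear", so its distance is 0; a keystream
# produced by a genuinely nonlinear filter will typically sit at some positive distance.
print(N1(omega_sequence(1, alpha)))            # prints 0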
7   Conclusions and Future Research
Given an LFSR-based stream cipher S generating a sequence s, we showed how to define an equivalence class Gn(s), consisting of all filter generators of length n that produce s as output (and in most cases of interest, of all filter generators equivalent to S). In general, several properties of cryptographic relevance
are not invariant among the elements of Gn(s), and as a result it does not appear to make sense to draw conclusions about the security of a filter generator by, for instance, analysing the algebraic degree or algebraic immunity of the corresponding Boolean function, properties such as the weight of the polynomial defining the LFSR, or the position of the registers that are tapped as input to the Boolean function. In particular, our analysis makes it clear that one cannot generally analyse the components of a stream cipher separately, as is usual in practice. The natural object of analysis is the equivalence class Gn(s), and thus we believe that no analysis is complete without considering all of its elements. Furthermore, we note that the idea presented here can be generalised into more complete equivalence classes. For example, instead of restricting oneself to the set of filter generators generating a particular sequence, one may instead define an equivalence with respect to the set of all possible combiner-generators generating a periodic sequence, in which cryptanalysis becomes much more fine-grained. We plan to explore this subject in more detail in a follow-up paper.
Acknowledgements The work described in this paper was carried out while the first author was visiting Royal Holloway, University of London, supported by the Norwegian Research Council. The work has also been supported in part by the European Commission through the IST Programme under contract ICT-2007-216646 ECRYPT II.
References
1. Barkan, E., Biham, E.: How Many Ways Can You Write Rijndael? In: Zheng, Y. (ed.) ASIACRYPT 2002. LNCS, vol. 2501, pp. 160–175. Springer, Heidelberg (2002)
2. Cid, C., Murphy, S., Robshaw, M.J.B.: An Algebraic Framework for Cipher Embeddings. In: Smart, N.P. (ed.) Cryptography and Coding 2005. LNCS, vol. 3796, pp. 278–289. Springer, Heidelberg (2005)
3. Courtois, N., Meier, W.: Algebraic Attacks on Stream Ciphers with Linear Feedback. In: Biham, E. (ed.) EUROCRYPT 2003. LNCS, vol. 2656, pp. 345–359. Springer, Heidelberg (2003)
4. Englund, H., Hell, M., Johansson, T.: Correlation attacks using a new class of weak feedback polynomials. In: Roy, B., Meier, W. (eds.) FSE 2004. LNCS, vol. 3017, pp. 127–142. Springer, Heidelberg (2004)
5. Golomb, S.W., Gong, G.: Signal Design for Good Correlation: For Wireless Communication, Cryptography, and Radar. Cambridge University Press, New York (2004)
6. Lidl, R., Niederreiter, H.: Introduction to Finite Fields and their Applications. Cambridge University Press, Cambridge (1994) (revised edition)
7. Meier, W., Staffelbach, O.: Fast correlation attacks on stream ciphers (extended abstract). In: Günther, C.G. (ed.) EUROCRYPT 1988. LNCS, vol. 330, pp. 301–314. Springer, Heidelberg (1988)
8. Menezes, A.J., Van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography. CRC Press, Boca Raton (1996)
9. Murphy, S., Robshaw, M.J.B.: Essential Algebraic Structure Within the AES. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 1–16. Springer, Heidelberg (2002) 10. Rønjom, S., Helleseth, T.: A new attack on the filter generator. IEEE Transactions on Information Theory 53(5), 1752–1758 (2007) 11. Siegenthaler, T.: Correlation-immunity of nonlinear combining functions for cryptographic applications. IEEE Transactions on Information Theory 30(5), 776–780 (1984)
A   Appendix
The following example illustrates the lack of invariance of cryptographic properties of Boolean functions with respect to the equivalence classes G5(s).

Example 2. Consider the binary sequence s = (1011111101000100110001010110001) of length 31. There are φ(31)/5 = 6 primitive polynomials over F2 of degree 5. For each (distinct) generator β of the multiplicative group of F2(α), we compute a function fβ such that s ∈ Lβ(fβ), where we let gα = x^5 + x^2 + 1. The distinct nonzero coset-leaders modulo 31 are K = {1, 3, 5, 7, 11, 15}, and thus we may compute six functions fαk, k ∈ K, where we let αk = α^k and pick one function fαk from each class Vαk ∈ G5(s). The columns of the table below are ordered by the six functions fαk ∈ Vαk(s) ∈ G5(s), k ∈ K:

        fα1   fα3   fα5   fα7   fα11  fα15
n         5     5     5     5     5     5
d         4     4     4     3     3     2
wH       16    16    16    16    16    16
NL       10    10    10     8    12     8
AI        2     3     2     2     3     2
CI        0     0     0     1     0     1
PC        0     0     0     0     0     1
AB       16    16    16    16     8    32
SS     2432  2816  2816  3584  2048  8192

In the table above, wH denotes the Hamming weight of the functions, NL denotes nonlinearity, AI denotes algebraic immunity, CI denotes correlation immunity, PC denotes the propagation criterion of order 0, AB denotes the absolute indicator and SS denotes the sum-of-squares indicator. As one would expect, the weight of the truth-tables and the number of variables remain the same for each function. But notice that none of the other properties remain the same with respect to the transformation; and yet most
of these are properties that are invariant with respect to affine transformations. The functions are:

fα1 = x0x1x2x3 + x0x1x2x4 + x0x1x3x4 + x1x2x3x4 + x0x1x2 + x0x1x3 + x0x2x3 + x1x2x3 + x0x1x4 + x2x3x4 + x0x2 + x0 + x1
fα3 = x0x1x2x3 + x0x1x3x4 + x1x2x3x4 + x0x1x2 + x0x1x4 + x0x3x4 + x1x3x4 + x2x3x4 + x0x1 + x1x3 + x2x4 + x2 + x3
fα5 = x0x1x2x4 + x0x2x3x4 + x1x2x3x4 + x0x1x2 + x0x1x3 + x0x2x3 + x1x2x3 + x0x1x4 + x0x3x4 + x0x2 + x0x4 + x1x4 + x2x4 + x0 + x1 + x2 + x3 + x4
fα7 = x0x1x3 + x0x2x3 + x1x2x4 + x1x3x4 + x2x3x4 + x1x2 + x0x3 + x0x4 + x1x4 + x3x4 + x0 + x3
fα11 = x0x1x2 + x0x2x3 + x1x2x3 + x0x1x4 + x1x2x4 + x0x1 + x0x2 + x1x3 + x0x4 + x2
fα15 = x0x1 + x1x2 + x1x3 + x0x4 + x1x4 + x2x4 + x3x4 + x0 + x1 + x3

For instance, we now pick two of the above functions, say fα1 and fα15. If we let αi = α^i, X1 ∈ F2(α1)^* and X15 ∈ F2(α15)^*, and assume X1 = α1^10, then we have that X15 = X1^15 = (α1^10)^15 = α15^10. Thus, if S(X1) = (1, 1, 1, 1, 0) denotes the initial state of an LFSR L1 with generator polynomial gα1(x), then S(X15) = (1, 1, 0, 1, 1) denotes the initial state of the register L2 with generator polynomial gα15(x). It then follows that

    fα1(S(X1 α1^t)) = fα15(S(X15 α15^t)),   t = 0, 1, 2, 3, . . . ,

and so the two different filter generators generate the same keystream-sequence (0001001100010101100011011111101). Thus, if one recovers the initial state of one cipher, it is a simple matter to recover the initial state of an isomorphic cipher. One would in this case for instance choose to attack the filter generator with the weakest function, say fα15.
Lightweight Privacy Preserving Authentication for RFID Using a Stream Cipher

Olivier Billet, Jonathan Etrog, and Henri Gilbert

Orange Labs, Issy-les-Moulineaux, France
[email protected],
[email protected],
[email protected]
Abstract. In this paper, a privacy preserving authentication protocol for RFID that relies on a single cryptographic component, a lightweight stream cipher, is constructed. The goal is to provide a more realistic balance between forward privacy and security, resistance against denial of service attacks, and computational efficiency (in tags and readers) than existing protocols. We achieve this goal by solely relying on a stream cipher—which can be arbitrarily chosen, for instance a stream cipher design aimed at extremely lightweight hardware implementations—and we provide security proofs for our new protocol in the standard model, under the assumption that the underlying stream cipher is secure. Keywords: RFID protocol, authentication, privacy, DoS resistance, provable security.
1   Introduction
Radio frequency identification, RFID, is a fast expanding technology that allows for the identification of items in an automated way through attached RFID tags, i.e. small low-cost devices equipped with an integrated circuit and an antenna. An RFID system typically consists of three main components: (1) a set of RFID tags, (2) readers capable of communicating with the RFID tags through their radio interface, and (3) a centralized or distributed back-end system connected to the readers through a network. The applications are numerous: automated management of the supply chain, ticketing, access control, automatic tolls, transportation, prevention of counterfeiting, pet tracking, airline luggage tracking, library management, to name only a few. Various RFID systems designed to address these different needs have varying radio and tag power supply characteristics, memory and processing capabilities, and hence costs. Unsurprisingly, with such a broad range of applications and physical characteristics, the security and privacy needs for RFID systems are quite diverse:
Work performed while at Orange Labs. Partially supported by the national research project RFIDAP ANR-08-SESU-009-03.
as a mere replacement of bar codes and not delivered to the end consumers, the emergence of more and more RFID applications where the tags enter the life of end users (e.g. library management) has resulted in an ever increasing level of concern regarding the potential compromise of their privacy. The fear is that through RFID tags attached to the objects she is carrying, a person might leave electronic tracks of her moves and actions and become traceable by a malicious party equipped with a radio device.
– Applications such as ticketing or access control where owning an RFID tag materializes some rights need to prevent the counterfeiting or impersonation of legitimate RFID tags, which can for instance result from the cloning of a legitimate tag or the replay of data previously transmitted by a legitimate tag. In order to address these security needs, an authentication mechanism that allows the system to corroborate the identity of the tag is required.
The latter need for security and the former need for privacy are often combined, for instance in the case of ticketing, public transportation, etc. However, as will be seen in the sequel, accommodating both needs for security and privacy in RFID systems using adequate cryptographic solutions is not an easy task, primarily because of strong limitations on computing and communication resources that result in strong cost constraints encountered in most RFID systems, and to a lesser extent because these two requirements are not easy to reconcile. Security and privacy in RFID systems have now become a very active research topic in cryptography, and the design of efficient algorithms and protocols suitable for such systems is a major challenge in the area of lightweight cryptography. Authentication, which addresses the above-mentioned security threat, i.e. preventing the cloning or impersonation of legitimate tags, is probably the most explored topic in lightweight cryptography. Efficient authentication solutions for RFIDs are gradually emerging, even for the most constrained settings. To take into account the strong limitation of computing resources in the tags (3000 GE is often considered the upper limit for the area reserved to the implementation of security in low-cost RFID tags), dedicated lightweight block ciphers such as DESXL, PRESENT, and KATAN [35,11,15] have been developed. Such block ciphers can be used for authentication purposes in a traditional challenge-response protocol. Some stream ciphers with a very low hardware footprint, e.g. Grain v1 or Trivium [25,16], are also known to have the potential to lead to extremely efficient authentication solutions. On the other hand, few explicit stream cipher based authentication schemes have been proposed so far; an example is the relatively complex stream cipher based protocol from [36] which requires up to six message exchanges. Lightweight authentication protocols not based on a symmetric primitive like SQUASH [47] and the HB family of RFID schemes [34,23] represent another promising avenue of research, even though it remains a complicated task to identify practical instances from those families resisting all the partial cryptanalysis results obtained so far [42,21,22,41]. In this paper, we will use the following distinction between identification and authentication: a protocol allowing an RFID system to identify a tag, but not to corroborate this identity and thus resist cloning or impersonation attacks will be named an
identification protocol, while a protocol allowing the system to both identify a tag and corroborate this identity will be named an authentication protocol or equivalently an authentication scheme. If an authentication protocol additionally results in the corroboration by the tag that the RFID reader involved in the exchange is legitimate, we will call it a mutual authentication protocol. Privacy preserving identification or authentication protocols for RFID have also been much researched in recent years. It is however fair to say that these still represent a less mature area than mere authentication protocols, and designing realistic lightweight protocols that take into account the constraints at both the tag and the reader side remains a very challenging problem. Following the seminal work of [30,33,4,49], definitions and formalizations of various notions of privacy have been proposed and their mutual links have been explored. Without going into the detailed definition of the various privacy notions introduced so far (a rather comprehensive typology is proposed in [49]), it is worth mentioning that a basic requirement on any private RFID identification or authentication protocol is to prevent a passive or active adversary capable of accessing the radio interface from tracing a tag—i.e. to ensure both the anonymity and the unlinkability of the exchanges of a legitimate tag. This property, named weak privacy by some authors [49], is easy to provide in a symmetric setting, for example by using a lightweight block cipher in a challenge-response protocol and trying the keys of all the tags in the system at the reader side in order to avoid transmitting the identity of the tag before the authentication exchange. A significantly more demanding privacy property is forward privacy, which is motivated by the fact that the cost of RFID tags renders any physical tamper-resistance measures prohibitive. In addition to the former weak privacy requirements, a forward private protocol must ensure that an adversary capable of tampering with a tag remains unable to link the data accessed in the tag to any former exchange she might have recorded. It is easy to see that the former simple example of block cipher based protocol is not forward private at all. A paradigmatic example of an RFID identification protocol providing some forward privacy is the OSK scheme [39,40] which relies on the use by the tag of two one-way hash functions.1 Variants of the OSK scheme turning it into a forward private authentication protocol have been proposed in [5], thus making it resistant to replay attacks.2 It was however noticed that the OSK protocol and its authentication variants are vulnerable to Denial of Service attacks (DoS) that desynchronize a tag from the system. Furthermore, such DoS attacks compromise the forward privacy if the adversary can learn whether the identification or authentication exchanges involving a legitimate reader she has access to are successful or not (in this paper the conservative assumption that adversaries have access to this side information
2
One hash function updates the current state of the tag at each identification while the other derives an identification value from the current internal state. The identification value received by the reader is then searched in the back-end in hash chains associated to each tag in the system. A time-memory trade-off speeding up the back-end computations at the expense of pre-computations was proposed for the original scheme and some of its variants [6].
is made). An alternative to the OSK family of authentication schemes named PFP was recently proposed [10] which is based on less expensive cryptographic ingredients than the one-way hash functions involved in OSK (namely a pseudorandom number generator and a universal family of hash functions) and provably offers a strong form of forward privacy under the assumption that the maximum number of authentications an adversary can disturb is not too large. Its main practical drawback is the significant workload at the reader end.

Our contribution. In this paper, we address the problem of rendering forward private authentication protocols fully practical. We show how to convert any lightweight stream cipher such as Grain or Trivium into a simple and highly efficient privacy preserving mutual authentication protocol. Our main motivation is to find a more realistic balance than existing protocols of the OSK family or the PFP protocol between forward privacy, resistance to DoS attacks, and computational efficiency of the tag and the reader. If one accepts to slightly relax the unlinkability requirements in the definition of a forward private protocol (for that purpose we introduce the notion of almost forward private protocol and only require our scheme to be almost forward private), we escape the dilemma between forward privacy and resistance to DoS attacks otherwise encountered in a symmetric setting. Desynchronisation can no longer occur even if there is no limitation on the maximum number of authentications an adversary can disturb, and a significant gain in complexity is achieved in the tag and the reader compared to former schemes. We provide formal proofs in the standard model that if the underlying stream cipher is secure then our mutual authentication protocol is correct, secure, and almost forward private. We provide the definitions of the security and privacy properties required for an RFID authentication protocol and introduce the security notions needed in the subsequent proofs in Section 2. We describe our mutual authentication protocol in Section 3 and prove its security, almost privacy, and correctness in Section 4. In Section 5, we briefly discuss implementation.
2   Security and Privacy Model
In this section we introduce a simple security and privacy model inspired by [10], an adaptation of the more comprehensive typology of security and privacy models introduced by Vaudenay in [49] to the symmetric setting where (1) the internal states of the tags are initialized with independent individual secret keys and (2) these initial internal states are updated throughout the lifetime of the tags. The main differences with the security and privacy model of [10] lie in the modification of the definition of a secure protocol to address mutual authentication instead of authentication (similar to the adaptation of [49] in [43]), in the adaptation of the definition of correctness to systems with unlimited lifetime, and in the introduction of a distinction between the notions of forward private protocol and the slightly relaxed notion of almost forward private protocol.

Assumptions. We denote by N the number of initialized tags of the system. During their lifetime, initialized tags enter mutual authentication exchanges with
a reader. Each exchange results in (possibly distinct) success or failure outcomes at both sides. A mutual authentication exchange involving a legitimate tag and a reader is said to be undisturbed if all messages sent by all parties are correctly transmitted and neither modified nor lost in either direction. We consider powerful active adversaries capable of tracking an individual tag during a limited time period named an exposure period, i.e. to identify and read the messages exchanged by this tag and its reader, to modify these messages while they are transmitted or to themselves transmit messages viewed by one side as coming from the opposite side, and finally to access the authentication success or failure information at both ends (reader and tag) at the completion of a mutual authentication exchange. In other words, we consider active adversaries capable of performing man-in-the-middle attacks. We assume that after an exposure period of a tag, no physical characteristics of the tag nor information unrelated to mutual authentication exchanges allow an adversary to differentiate it from any of the N − 1 other tags.

2.1   Security
We say that a mutual authentication protocol is secure if it resists impersonation attacks. An impersonation attack proceeds in two phases. During the first phase (assumed, without loss of generality, to take place during a single exposure period) an adversary interacts both with a legitimate reader and a legitimate tag Ti and is allowed to trigger, observe, and disturb or entirely replace up to q mutual authentication exchanges involving the tag Ti and the reader, and to access the outcomes of the authentication (success or failure). During the second phase, he only interacts with the reader (or with the tag Ti, depending on which party is being impersonated) and initiates a mutual authentication exchange to impersonate the tag Ti (respectively the reader). The impersonation succeeds if the mutual authentication is successful and the adversary is identified as tag Ti (respectively a legitimate reader).

Definition 1. A mutual authentication protocol is said to be (q, T, ε)-secure iff for any adversary running in time upper-bounded by T, the probability that an impersonation attack involving at most q authentication exchanges during phase 1 be successful is at most ε.

2.2   Forward Privacy
Let us consider the following forward privacy experiment involving a (q, T)-privacy adversary A with a running time upper-bounded by T. During a first phase, A interacts with any two legitimate tags Ti0 and Ti1, and a legitimate reader. These interactions happen, without loss of generality, during a single exposure period of Ti0 and a single exposure period of Ti1 where the adversary is allowed to trigger, observe, and disturb at most q mutual authentication exchanges involving Ti0 and possibly the reader and at most q mutual authentication exchanges involving Ti1 and possibly the reader. During a second phase, A
again interacts with a tag Tib randomly selected among the two tags Ti0 and Ti1, and b is concealed to A. First, A is allowed to trigger, observe, and disturb at most q additional mutual authentication exchanges involving Tib and is given access to the corresponding mutual authentication outcome (success or failure). Then, A is given access to the internal state value of Tib. Eventually, A outputs a guess b′ for the value of b, and succeeds if b′ is equal to b.

Definition 2. An RFID mutual authentication protocol is said to be (q, T, ε)-private iff any (q, T)-privacy adversary in the above game has an advantage at most ε:

    | Pr[ A succeeds ] − 1/2 | ≤ ε.

We now slightly relax the above forward privacy requirements by introducing the notion of almost forward privacy. We only require that adversaries be unable to link the internal state recovered when tampering with a tag with any mutual authentication exchanges involving the tag up to the last successful authentication exchange of the tag. This removes the constraint that adversaries be unable to link a failed mutual authentication exchange of a tag with its internal state immediately after the failed exchange. (A similar limitation of the considered privacy attacks is also encountered in the privacy notion proposed in [48].) In real life scenarios, almost forward privacy seems to be a relevant privacy notion. Let us for instance assume that tags are monthly access passes. To thwart adversaries who first collect information from tags by eavesdropping legitimate readers or using false readers, and try later on to correlate this information to (say) thrown-away tags, almost forward privacy is sufficient in practice. To define a (q, T)-almost private adversary A, we therefore restrict the former forward privacy experiment as follows. During a first phase, A interacts with any two legitimate tags exactly in the same way as in the first phase of the former definition. Before the second phase, an undisturbed exchange between a legitimate reader and each of the two tags Ti0 and Ti1 takes place and A is assumed not to have access to this exchange. During the second phase, A interacts with a tag Tib randomly selected among the two tags Ti0 and Ti1 exactly in the same way as in the former definition. A is finally given access to the internal state value of Tib, outputs a guess b′ for the value of b, and succeeds if b′ is equal to b.

Definition 3. An RFID mutual authentication protocol is said to be (q, T, ε)-almost forward private iff any (q, T)-almost privacy adversary in the above game has an advantage at most ε:

    | Pr[ A succeeds ] − 1/2 | ≤ ε.

2.3   Correctness
We first define the notion of correctness in a setting where the mutual authentication exchanges of legitimate tags are not disturbed by transmission errors or by an adversary. In such a setting, the protocol executions of a legitimate tag Ti must succeed with overwhelming probability, i.e. result in an authentication success outcome at both sides and a correct identification of Ti by the reader.
Definition 4. An RFID mutual authentication protocol is -correct iff for any legitimate tag Ti , the probability (over the initial secrets of the legitimate tags in the system and the random numbers chosen during the execution of the protocol) that an undisturbed execution of the protocol between Ti and a legitimate reader fails is upper-bounded by . We further extend the former definition of correctness by considering a setting where the mutual authentication exchanges of legitimate tags may be disturbed by a DoS adversary who succeeds if she causes the failure of a mutual authentication attempt of a legitimate tag with a legitimate reader. This allows to incorporate resistance to DoS attacks into the definition of correctness. Although we consider a unique adversary, this is not restrictive since situations where transmission errors occur and/or where mutual authentication exchanges are disturbed by a coalition of adversaries can be viewed as coming from a single adversary. We introduce limitations on the capabilities of the adversary: an adversary with a running time upper-bounded by T and able to disturb at most q mutual authentication exchanges is called a (q, T )-adversary. The correctness experiment proceeds in two phases. During the first phase the (q, T )-adversary interacts with the whole system. During the second phase, an undisturbed execution of the protocol between Ti and a legitimate reader occurs. The adversary succeeds if the mutual authentication protocol execution fails. Definition 5. An RFID mutual authentication protocol is said (q, T, )-correct iff the probability (over the initial secrets of the legitimate tags in the system, the random numbers chosen during the executions of the protocol, and the random numbers used by the adversary) that an undisturbed execution of the protocol between any tag Ti and a legitimate reader fails is upper-bounded by , even in the presence of a (q, T )-adversary. 2.4
2.4 Definitions and Properties
We now introduce a few general security definitions. The starting point for the construction of our mutual authentication protocol is a stream cipher such as Grain or Trivium [25,16], that takes a secret key and a non-secret initialization value (IV) as input and produces a binary sequence (the keystream). An IV-dependent stream cipher of key length k bits and IV length n bits that produces a keystream sequence of length up to m bits can be conveniently represented as a family of functions F = {f_K} : {0, 1}^n → {0, 1}^m indexed by a key K randomly chosen from {0, 1}^k; f_K thus represents the function mapping the IV to the keystream associated with key K. For such an IV-dependent stream cipher to be considered secure when producing keystreams of length at most m bits, one usually requires [9] that the associated family of functions F be a pseudo-random function (PRF). In order to formalize and quantify what we mean by a secure stream cipher, we therefore need to introduce the notion of a PRF distinguisher.

Definition 6 (PRF distinguisher). Let F = {f_K} : {0, 1}^n → {0, 1}^m be a family indexed by a key K randomly chosen from {0, 1}^k. A PRF distinguisher
for F is a probabilistic testing algorithm A, modeled as a Turing machine with a random tape and an oracle tape, that produces a binary output 0 or 1 and is able to distinguish a randomly chosen function f_K of F from a perfect random n-bit to m-bit function f* with an advantage

Adv^PRF_F(A) = |Pr[A^{f_K} = 1] − Pr[A^{f*} = 1]|,

where the probabilities are taken over K and over all the random choices of A. We then define the (q, T) PRF advantage for distinguishing the family F as

Adv^PRF_F(q, T) = max_A Adv^PRF_F(A),

where the maximum is taken over all possible attackers A working in time at most T and able to query an n-bit to m-bit oracle up to q times. We call the family a PRF if this advantage is smaller than a threshold for T and q suitably chosen to reflect realistic upper limits on the resources of an adversary.

Definition 7 (Secure stream cipher). An IV-dependent stream cipher associated with a family F = {f_K} of IV-to-keystream functions is (q, T, ε)-secure if Adv^PRF_F(q, T) ≤ ε.

Note that in the above definition of a PRF distinguisher, the experiment performed by a PRF distinguisher involves a single randomly chosen instance of F. In the stream cipher based construction presented in this paper, however, the keystream output by the stream cipher is used to produce the key used during the next invocation of the stream cipher. Therefore the proofs for this construction require considering testing experiments involving several instances of F instead of a single one. We address this issue by introducing the notion of a multiple oracle PRF distinguisher. To avoid confusion we sometimes use the name single oracle distinguisher to refer to the former (classical) notion of PRF distinguisher.

Definition 8 (Multiple oracle PRF distinguisher). Let us consider a family F = {f_K} : {0, 1}^n → {0, 1}^m indexed by a key K randomly chosen from {0, 1}^k. A multiple oracle PRF distinguisher for F is a probabilistic testing algorithm A distinguishing λ randomly chosen instances f_{K_i} of F from λ independent perfect random n-bit to m-bit functions f*_i (i = 1, . . . , λ) with an advantage

Adv^PRF_F(A) = |Pr[A^{(f_{K_i})_{i=1,...,λ}} = 1] − Pr[A^{(f*_i)_{i=1,...,λ}} = 1]|,

where the probabilities are taken over the K_i and over all the random choices of A. We then define the (λ, q, T) PRF advantage for distinguishing the family F as

Adv^PRF_F(λ, q, T) = max_A Adv^PRF_F(A),

where the maximum is taken over all possible attackers A working in time at most T and able to query up to λ n-bit to m-bit oracles up to q times each.
Theorem 1 (Link between single and multiple oracle distinguishers). Let us denote by F = {f_K} : {0, 1}^n → {0, 1}^m a family of functions with K ∈ {0, 1}^k. The resistance of F against λ-oracle distinguishers is related to its resistance against single-oracle distinguishers via the following formula (where T_F is the time needed to compute one instance of F at one point):

Adv^PRF_F(λ, q, T − λT_F) ≤ λ · Adv^PRF_F(q, T).
Proof. See Appendix A.

Lemma 1 (PRF product). If no adversary making q queries in time T can distinguish F from F* with an advantage greater than ε₁, and if no adversary making q queries in time T can distinguish G from G* with an advantage greater than ε₂, then no adversary making q queries in time T − qT_G can distinguish F × G = {(f_{K_1}, g_{K_2})} : {0, 1}^n → {0, 1}^m × {0, 1}^m from F* × G* with an advantage greater than ε₁ + ε₂.

Proof. See Appendix B.
3 A Stream Cipher Based Protocol

3.1 DoS Resistance and Privacy
To achieve resistance against DoS attacks, a natural idea is to use mutual authentication instead of one-way authentication so that the tag only updates its internal state after the reader has been authenticated. However, as discussed in [10], it is not possible to aim for full DoS resistance and forward privacy in symmetric key based protocols. We thus need to find a trade-off, and it seems a reasonable approach to somewhat relax the privacy requirements while keeping full DoS resistance with mutual authentication. Different protocols have been designed to achieve both DoS resistance and almost forward privacy. An example in the OSK family is the C2 protocol [12], which reaches this goal by using cryptographic hash functions. Hash functions are unfortunately prohibitively expensive for RFID tags and have security properties (e.g. collision resistance) that are unnecessary in these applications. While the S-protocol [36] has no privacy goals, an example of a stream cipher based privacy preserving protocol is the recently proposed protocol O-FRAP and its variants [48]. O-FRAP, however, does not achieve almost forward privacy because it stores a pseudo-random number in the tag that is transmitted during the last pass of the protocol. Due to this feature, an attacker can compare the value of the pseudo-random number found by tampering with a tag with the last pseudo-random number used by an unknown tag to immediately determine whether it is the same tag or not.
3.2 Our Protocol: PEPS
We present a DoS-resistant, almost-forward private mutual authentication RFID protocol accommodating any secure IV-dependent stream cipher which we call
Fig. 1. Our protocol: PEPS. The reader sends a challenge a; the tag, holding the current key K, draws b and answers with b and c = Gt(a||b, K); the reader searches for an index i and a key K′ ∈ {K^i, K^i_new} such that c = Gt(a||b, K′), replies with d = Gr(a||b, K′), and updates (K^i, K^i_new) ← (K′, Gs(a||b, K′)); the tag updates its key if d = Gr(a||b, K).
PEPS: a Private and Efficient Protocol based on a Stream cipher. The stream cipher is keyed with the current internal state K of the tag. The initial value of K is randomly selected at tag initialization and known by the RFID system. The guiding idea of our design is to use the input-expanding PRF G associated with the stream cipher, with input values resulting from random numbers generated by the tag and the reader, in order to (1) generate mutual authentication responses at both sides and (2) refresh the current internal state of the tag in a simple three-pass protocol. In order to avoid any desynchronisation due to lost messages or DoS attacks, the back-end system keeps and updates, for each active tag T_i of the system, a pair (K^i, K^i_new) of potential current keys for T_i. More explicitly, let us denote by K the k-bit key and by IV the n-bit IV of the stream cipher. The stream cipher is used to produce a keystream sequence G(K, IV) of length m = 2l + k, where l represents the length of the authentication responses of our protocol, and the keystream G(K, IV) is viewed as the concatenation Gt(K, IV)||Gr(K, IV)||Gs(K, IV) of three subsequences of respective lengths l, l, and k. Thus Gt and Gr produce l-bit sequences while Gs produces a k-bit sequence (the symbols t, r, and s stand here for "tag", "reader", and "secret"). When tag T_i is initialized, a random initial internal value K^i_0 is drawn and installed in the tag. At the back-end side, the current pair (K^i, K^i_new) associated with tag T_i is initialized with (K^i_0, _). An execution of the mutual authentication protocol between a tag and a reader is illustrated in Figure 1. It works as follows: first the reader randomly generates an authentication challenge a of length n/2 bits and sends it to the tag. Upon receipt of a, the tag (whose current key value is denoted by K) randomly generates an n/2-bit number b, derives
the value IV = a||b, and computes G(K, IV) using the stream cipher. Then it sends (b, c)—where c = Gt(K, IV)—to the reader. The reader authenticates the tag by searching for a tag index i and a key K′ ∈ {K^i, K^i_new} such that Gt(K′, IV) = c. The key K′ represents the conjectured internal state of the tag from the reader's point of view. If the reader finds such an index i then the tag is considered as successfully authenticated as tag T_i, otherwise the outcome of the authentication exchange is an authentication failure. (In the case of an authentication failure, the reader can either terminate the exchange or send a dummy message back to the tag. This does not matter here, since we assume that adversaries have access to the positive or negative authentication outcome anyway.) If the tag has been authenticated as tag T_i, the reader updates the current pair associated with tag T_i to (K′, Gs(K′, IV)), computes the reader authentication answer d = Gr(K′, IV) and sends it back to the tag. Upon receipt of the reader's answer the tag checks whether d = Gr(K, IV). If this equality holds, it replaces its current key value K by Gs(K, IV): the reader is considered as successfully authenticated and this terminates the mutual authentication exchange. Otherwise it keeps its current key value.
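To make the message flow concrete, the following C sketch outlines one PEPS exchange. It is a minimal sketch under stated assumptions: the keystream function G, the random-number source random_bytes, the parameter sizes L, KB and NB, and the single-entry back-end structure are illustrative placeholders rather than part of the specification, and a real back end would search over all tag entries rather than a single one.

```c
#include <stdint.h>
#include <string.h>

#define L   8                 /* response length in bytes (illustrative)      */
#define KB 10                 /* key length in bytes, e.g. an 80-bit key      */
#define NB  8                 /* IV length in bytes; a and b are NB/2 each    */

/* Hypothetical primitives: a stream cipher expanding (key, IV) into `len`
 * keystream bytes, and a random-number source. Not defined here.            */
void G(const uint8_t key[KB], const uint8_t iv[NB], uint8_t *out, size_t len);
void random_bytes(uint8_t *buf, size_t len);

/* Keystream layout: out = Gt (L bytes) || Gr (L bytes) || Gs (KB bytes).    */

typedef struct { uint8_t K[KB]; } tag_t;                     /* tag state    */
typedef struct { uint8_t K[KB], Knew[KB]; } reader_entry_t;  /* back-end pair*/

/* Tag, pass 2: on challenge a, draw b and send c = Gt(a||b, K).             */
void tag_respond(tag_t *t, const uint8_t a[NB / 2], uint8_t b[NB / 2],
                 uint8_t c[L], uint8_t ks[2 * L + KB]) {
    uint8_t iv[NB];
    random_bytes(b, NB / 2);
    memcpy(iv, a, NB / 2);
    memcpy(iv + NB / 2, b, NB / 2);
    G(t->K, iv, ks, 2 * L + KB);          /* ks = Gt || Gr || Gs             */
    memcpy(c, ks, L);
}

/* Tag, after pass 3: update the key only if the reader authenticated.       */
void tag_finish(tag_t *t, const uint8_t d[L], const uint8_t ks[2 * L + KB]) {
    if (memcmp(d, ks + L, L) == 0)        /* d == Gr(a||b, K) ?              */
        memcpy(t->K, ks + 2 * L, KB);     /* K <- Gs(a||b, K)                */
}

/* Reader, pass 3: try both stored keys, answer with d, refresh the pair.    */
int reader_verify(reader_entry_t *e, const uint8_t a[NB / 2],
                  const uint8_t b[NB / 2], const uint8_t c[L], uint8_t d[L]) {
    uint8_t iv[NB], ks[2 * L + KB];
    memcpy(iv, a, NB / 2);
    memcpy(iv + NB / 2, b, NB / 2);
    for (int i = 0; i < 2; i++) {
        const uint8_t *cand = (i == 0) ? e->K : e->Knew;
        G(cand, iv, ks, 2 * L + KB);
        if (memcmp(c, ks, L) == 0) {      /* c == Gt(a||b, K') ?             */
            memcpy(d, ks + L, L);         /* d  = Gr(a||b, K')               */
            if (i == 1)
                memcpy(e->K, e->Knew, KB);   /* K    <- K'                   */
            memcpy(e->Knew, ks + 2 * L, KB); /* Knew <- Gs(a||b, K')         */
            return 1;                     /* tag authenticated               */
        }
    }
    return 0;                             /* authentication failure          */
}
```

Note that in this sketch the tag never updates its key on a failed exchange, while the back end keeps both the previous and the refreshed key, which is what prevents desynchronisation in the presence of lost messages or DoS attacks.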
4 Security, Almost Forward Privacy, and Correctness
To prove the security and privacy properties of PEPS, we essentially need to show that any information available to the attacker defined previously behaves pseudo-randomly. We denote Gt(K, ·)||Gr(K, ·) : {0, 1}^n → {0, 1}^{2l} by f^1_K and Gs(K, ·) : {0, 1}^n → {0, 1}^k by f^2_K. If G is a PRF, {f^1_K} and {f^2_K} are obviously also PRFs satisfying the same indistinguishability bounds. Figure 2 shows every possible output, with known or partly chosen input, that an adversary can access in a security or privacy experiment involving PEPS. In other words, attackers of the system have access to a composed function, and we show that this function is indistinguishable from an ideal one resulting from a sequence of independent perfect random functions (with the right number of arguments for each coordinate); these independent random functions are shown in Figure 2. More formally:
Fig. 2. On the left F_{t+1}, on the right F*_{t+1}
Theorem 2 (Composition). If F = {f^1_K || f^2_K : {0, 1}^n → {0, 1}^{2l} × {0, 1}^k} is a PRF with Adv^PRF_F(q, T) ≤ ε, then for any integer t the function generator

F_t = {(a_1, . . . , a_t) → (f^1_{K_1}(a_1), f^1_{K_2}(a_2), . . . , f^1_{K_t}(a_t), f^2_{K_t}(a_t))}, where K_1 = K and K_{i+1} = f^2_{K_i}(a_i),

is indistinguishable from the ideal function generator

F*_t = {(a_1, . . . , a_t) → (f*_1(a_1), f*_2(a_1, a_2), . . . , f*_t(a_1, . . . , a_t), g*_t(a_1, . . . , a_t))},

where f*_i : {0, 1}^{ni} → {0, 1}^{2l} and g*_t : {0, 1}^{nt} → {0, 1}^k are independent random functions, using q queries, in time T − (t − 1)qT_PRF, and with advantage greater than ((t − 1)q + 1)ε.

Proof. See Appendix C.
4.1 Security of the Scheme
Theorem 3 (Security). If Adv^PRF_G(q, T) ≤ ε then PEPS is (q − 1, T − q(q − 1)T_G, ε_s)-secure with ε_s = (q − 1)/2^{n/2} + 1/2^l + (q(q − 1) + 1)ε.
Proof. A (q − 1, T)-attacker A against PEPS succeeds if it is authenticated as the legitimate tag by the legitimate reader or as the legitimate reader by the legitimate tag. We denote the different internal states of the tag through the different updates during the experiment by K_i, 1 ≤ i ≤ I, and A's queries by x_l. In the first phase the attacker A collects some values (Gt(K_i, x_l), Gr(K_i, x_l)). Then, in the second phase, it has to guess Gt(K_I, x) or Gr(K_I, x) when it is challenged by a legitimate party. We use A to construct a distinguisher B for F_q. We again use the notation f^1_K(x) = Gt(K, x)||Gr(K, x) and f^2_K(x) = Gs(K, x). B works as follows: it simulates a tag and a reader to answer A's queries using its oracle F = (f_1, f_2, . . . , f_q, g_q). The f_i are used to simulate both the tag's and the reader's behavior; for example, to simulate the tag, B generates a random r and computes some f_i(((x_j)||(r_j))_{j≤i}) as answers, or verifies an equality to know whether the state should be updated. In the latter case B keeps in memory the challenge a which provokes the update, adds it to the list (a_i) of its memorized challenges, and uses the next coordinate with the tuple ((a_i)_i) as the first arguments for the sequel of the simulation. Finally, if in the second phase the value computed by A matches the correct value, as verified by B using an additional query, then B outputs '1' (i.e. it guesses F ∈ F_q), otherwise it outputs '0' (i.e. it guesses F ∈ F*_q). Clearly B uses q queries and runs in the same time as A, and since Adv^PRF_G(q, T) ≤ ε, Theorem 2 upper-bounds B's advantage by ((t − 1)q + 1)ε with t = q. So A's success probability is upper-bounded by (q(q − 1) + 1)ε + ε_r, where ε_r is the maximum of the success probability of a (q − 1, T − q(q − 1)T_G)-attacker in the case where it has access to F*_q. In this case it is obvious that the best strategy
for an attacker is to make the state of its target constant and to make the same challenges each time, hoping that the last challenge will be equal to one of the previous ones. So in the random case an attacker has a probability of success upper-bounded by (q − 1)/2^{n/2} + 1/2^l, where the first term corresponds to the case where the last challenge is equal to one of the previous ones and the second term is the probability of a random guess of an unknown challenge. This ends the proof.

In order to allow the reuse of this security result in the proof of correctness hereafter, we introduce an extended notion of security. The adversary is now considered successful if it manages to be successfully authenticated as one of the legitimate tags of the system or as a legitimate reader. We have the following:

Theorem 4. If Adv^PRF_G(q, T) ≤ ε then, in a system with N tags and a legitimate reader, PEPS is (q − 1, T − q(q − 1 + N)T_G, ε_S)-secure (under the extended security notion introduced above) with ε_S = (q − 1)/2^{n/2} + N/2^l + N(q(q − 1) + 1)ε.
Proof. A (q − 1, T)-attacker A which interacts with different tags has access to different instances of F_q, so we use it to derive a multiple oracle distinguisher B against F_q with N instances (B needs to query each instance corresponding to each tag in the system to simulate the reader), using at most q queries to each and working in time T − q(q − 1 + N)T_G, with a process similar to that in the proof of Theorem 3. As T_{F_q} = qT_F, Theorem 1 upper-bounds B's advantage by N(q(q − 1) + 1)ε, and a proof similar to that of Theorem 3 upper-bounds the advantage of an attacker running in time T − q(q − 1 + N)T_G against the system in the case where the oracles are random by (q − 1)/2^{n/2} + N/2^l, where the first term corresponds to the probability that the last challenge from the target matches a previous one and the second term is the probability of a random guess of an unknown challenge among the N tags.
4.2 Almost Forward Privacy of the Scheme
Theorem 5 (Almost forward privacy). If Adv^PRF_G(2q + 1, T) ≤ ε then PEPS is (q, T′, ε_f)-almost forward private with T′ = T − q(2q + 1)T_G and

ε_f = q/2^{n/2−1} + 1/2^{l−1} + 2(q(q + 1) + 1)ε + (2q + 1)((2q(2q + 1) + 1)ε + ((2q + 1)(q − 1) + 1)ε).
Proof. We consider a (q, T)-privacy attacker A with advantage ε_a and we need to prove that ε_a ≤ ε_f. As this is obvious if ε_a ≤ 2ε_s = q/2^{n/2−1} + 1/2^{l−1} + 2(q(q + 1) + 1)ε, we assume that ε_a ≥ 2ε_s. We use the notation F*_{i,j} = F*_i × F*_j and F_{i,j} = F_i × F_j. We denote by α the number of updates of the state of the tag T_{i_b} during the almost forward privacy experiment conducted by A. We use A to derive a distinguisher B between {F_{i,q}} and {F*_{i,q}} for a randomly chosen integer i such that 1 ≤ i ≤ 2q + 1. B uses its oracle to simulate the system to A as previously; B works in the same time as A and uses 2q + 1 queries. If α ≠ i (which means that B cannot answer correctly when A asks for the internal state) then B aborts the simulation and returns a random guess. We denote by Q the event that α = i and by S the event that the undisturbed execution of the protocol between T_{i_b} and the reader has been successful. In the case where we have both S
and Q, then B perfectly simulates the behavior of the system, so its probability of success is exactly that of A. In the sub-case where B's oracle is in F*_{α,q}, the oracles used in phase 2 are independent random functions that are independent of the random functions used in phase 1, and so the probability of success of A is exactly 1/2. By Lemma 1, (2q(2q + 1) + 1)ε + ((2q + 1)(q − 1) + 1)ε upper-bounds the advantage of B. Since the probability that the undisturbed execution of the protocol at the beginning of phase 2 does not succeed is upper-bounded by the probability that a (q, T)-attacker against the security of the scheme succeeds in phase 1, Theorem 3 shows that it is upper-bounded by q/2^{n/2} + 1/2^l + (q(q + 1) + 1)ε. Now we have

Pr[B^f = 1] = (1/(2q + 1)) Pr[B^f = 1 | Q] + (1 − 1/(2q + 1)) · 1/2,

together with

Pr[B^f = 1 | Q] = Pr[S] Pr[B^f = 1 | S, Q] + Pr[¬S] Pr[B^f = 1 | ¬S, Q],

which implies Pr[B^{f ∈ F_{i,q}} = 1 | Q] ≥ ε_a + 1/2 − ε_s and Pr[B^{f* ∈ F*_{i,q}} = 1 | Q] ≤ 1/2 + ε_s. Therefore we have

Pr[B^{f ∈ F_{i,q}} = 1] − Pr[B^{f* ∈ F*_{i,q}} = 1] ≥ (1/(2q + 1))(ε_a − 2ε_s),

so ε_a ≤ (2q + 1) Adv^PRF_{F_{i,q}}(B) + 2ε_s, which concludes the proof.
4.3 Correctness of the Scheme
Theorem 6 (Correctness). If Adv^PRF_G(q, T) ≤ ε with T ≥ N·T_c + (1 + 2N)T_G then PEPS is (q − 1, T − q(q − 1 + N)T_G, ε_c)-correct, where T_c denotes the time needed to store the answer of one oracle and ε_c = 2Nε + N(N − 1)/2^l + (q − 1)/2^{n/2} + N/2^l + N(q(q − 1) + 1)ε.

Proof. We denote the current key pair for tag T_i by (K^i_1, K^i_2) instead of (K^i, K^i_new) to simplify the notation of the proof. The failure of the final authentication can only come from two scenarios: either the attacker has provoked an undesired update during the first phase, or a collision occurs during the final authentication and provokes the incorrect identification of T_i as another tag. As the first event has a probability bounded by Theorem 4, we only need to upper-bound the probability of a collision. To upper-bound the probability of a collision between a given Gt(K^i, x) and Gt(K^{i′}_b, x), we construct a multiple oracle distinguisher B for F_2 which queries each of the N instances f_i of its oracle with one random query x and compares, for each i, the values f_i(x). If B finds a collision between the first l bits of the first coordinate of f_{i_0} and the first l bits of the first or the second coordinate of another f_i, then it guesses F_2; otherwise it guesses a truly random function generator. B works in time N·T_c ≤ T − (1 + 2N)T_G. As previously, Adv^PRF_{F_2}(B) ≤ 2Nε. For a truly random function generator the probability of a collision is upper-bounded by N(N − 1)/2^l, so the probability of a collision in the case of F_2 is upper-bounded by N(N − 1)/2^l + 2Nε. We add these probabilities to the probability of an impersonation attack by a (q − 1, T − q(q − 1 + N)T_G)-attacker on the whole system to find ε_c.
5 Efficient Implementation of PEPS
The eSTREAM project [18] has led to the design of several stream ciphers which offer very lightweight hardware implementations [24]. The hardware foot-
print of some implementations of Grain and Trivium (two stream ciphers with a hardware profile selected in the eSTREAM portfolio) is quite low: Grain uses 1294 GE and Trivium 2580 GE, both with a conjectured 80-bit security level. It is also possible to use stream ciphers offering some provable security arguments and efficient implementations. An example is QUAD [8], also conjecturing 80-bit security for instances of the algorithm requiring less than 3000 GE to implement [1]. Also, note that a very interesting feature of our design is that it allows for easy bit-by-bit processing. Therefore, when the key and IV setup of the underlying stream cipher also loads the key and the IV bit by bit, it is possible to implement the protocol inside the tag with just a few additional GE for the storage of the next key K. To see this, note that once the key is loaded, the tag can load its seed b at the same time as it outputs it. Then, it switches to keystream production mode and outputs c bit by bit. Finally, as it inputs d, it checks it by comparing it bit by bit to the keystream bits it produces. Eventually, if d was correct, it accumulates the next key in a buffer. In the case of Grain with an 80-bit secret key, this strategy only increases the size by about 4 × 80 GE, leading to an overall implementation of size about 1700 GE.
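As an illustration of this bit-by-bit strategy, the sketch below shows the tag-side processing in C under the assumption of hypothetical one-bit interfaces (next_keystream_bit, recv_bit, send_bit, commit_key) and illustrative lengths; it also assumes the key and the IV (challenge a and seed b) have already been clocked into the cipher.

```c
#include <stdint.h>
#include <string.h>

#define L_BITS 64             /* authentication response length (illustrative) */
#define K_BITS 80             /* key length, e.g. a Grain-like 80-bit key      */

/* Hypothetical bit-oriented interfaces; not part of the paper's design.       */
extern int  next_keystream_bit(void);   /* next bit of Gt || Gr || Gs          */
extern int  recv_bit(void);             /* receive one bit from the reader     */
extern void send_bit(int bit);          /* transmit one bit to the reader      */
extern void commit_key(const uint8_t *new_key);  /* overwrite the stored key   */

/* One PEPS exchange on the tag, processed one keystream bit at a time; the
 * only extra storage is the K_BITS-bit buffer accumulating the next key.      */
void tag_exchange_bitwise(void) {
    uint8_t next_key[K_BITS / 8];
    int ok = 1;
    memset(next_key, 0, sizeof next_key);

    /* 1. Send c = Gt bit by bit as the keystream is produced.                 */
    for (int i = 0; i < L_BITS; i++)
        send_bit(next_keystream_bit());

    /* 2. Check d against Gr bit by bit while it is being received.            */
    for (int i = 0; i < L_BITS; i++)
        ok &= (recv_bit() == next_keystream_bit());

    /* 3. Accumulate the candidate next key Gs in the buffer.                  */
    for (int i = 0; i < K_BITS; i++)
        next_key[i / 8] = (uint8_t)((next_key[i / 8] << 1) | next_keystream_bit());

    /* 4. Commit the new key only if the reader authenticated correctly.       */
    if (ok)
        commit_key(next_key);
}
```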
6 Conclusion
We presented an RFID protocol that provably achieves both DoS resistance and a very strong form of privacy, close to the notion of forward privacy. Our protocol can be instantiated with any secure stream cipher, and by choosing a stream cipher that admits a very low hardware complexity we have shown that our protocol is also suitable for the highly constrained setting of RFID systems.
References 1. Arditti, D., Berbain, C., Billet, O., Gilbert, H.: Compact FPGA implementations of QUAD. In: Bao, F., Miller, S. (eds.) ASIACCS 2007. ACM, New York (2007) 2. Auto-ID Center. 860MHz 960MHz Class I RFID Tag Radio Frequency & Logical Communication Interface Spec., v1.0.0. RR MIT-AUTOID-TR-007 (2002) 3. Avoine, G.: Privacy Issues in RFID Banknote Protection Schemes. In: Quisquater, J.-J., Paradinas, P., Deswarte, Y., Abou El Kadam, A. (eds.) CARDIS 2004, pp. 33–48. Kluwer, Dordrecht (2004) 4. Avoine, G.: Adversarial model for radio frequency identification. Cryptology ePrint Archive, Report 2005/049 (2005), http://eprint.iacr.org/ 5. Avoine, G., Dysli, E., Oechslin, P.: Reducing Time Complexity in RFID Systems. In: Preneel, B., Tavares, S. (eds.) SAC 2005. LNCS, vol. 3897, pp. 291–306. Springer, Heidelberg (2006) 6. Avoine, G., Oechslin, P.: A Scalable and Provably Secure Hash Based RFID Protocol. In: PerSec 2005. IEEE Computer Society Press, Los Alamitos (2005) 7. Avoine, G., Oechslin, P.: RFID traceability: A multilayer problem. In: Patrick, A., Yung, M. (eds.) FC 2005. LNCS, vol. 3570, pp. 125–140. Springer, Heidelberg (2005) 8. Berbain, C., Gilbert, H., Patarin, J.: QUAD: A practical stream cipher with provable security. In: Vaudenay, S. (ed.) EUROCRYPT 2006. LNCS, vol. 4004, pp. 109–128. Springer, Heidelberg (2006)
9. Berbain, C., Gilbert, H.: On the security of IV dependent stream ciphers. In: Goos, G., Hartmanis, J., van Leeuwen, J. (eds.) FSE 2007. LNCS, vol. 4593, pp. 254–273. Springer, Heidelberg (2007) 10. Berbain, C., Billet, O., Etrog, J., Gilbert, H.: An Efficient Forward-Private RFID Protocol. In: ACM CCS 2009 (2009) 11. Bogdanov, A., Knudsen, L.R., Leander, G., Paar, C., Poschmann, A., Robshaw, M.J.B., Seurin, Y., Vikkelsoe, C.: present: An Ultra-Lightweight Block Cipher. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 450–466. Springer, Heidelberg (2007) 12. Canard, S., Coisel, I.: Data Synchronization in Privacy-Preserving RFID Authentication Schemes. In: Conference on RFID Security (2008) 13. CASPIAN, http://www.spychips.com 14. Damgård, I., Østergaard, M.: RFID Security: Tradeoffs between Security and Efficiency. Cryptology ePrint Archive, Report 2006/234 (2006) 15. De Cannière, C., Dunkelman, O., Knežević, M.: KATAN and KTANTAN—A Family of Small and Efficient Hardware-Oriented Block Ciphers. In: Clavier, C., Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747, pp. 272–288. Springer, Heidelberg (2009) 16. De Cannière, C., Preneel, B.: Trivium. In: Robshaw, M.J.B., Billet, O. (eds.) New Stream Cipher Designs: The eSTREAM Finalists. LNCS, vol. 4986, pp. 244–266. Springer, Heidelberg (2008) 17. Dimitriou, T.: A lightweight RFID protocol to protect against traceability and cloning attacks. In: SECURECOMM 2005. IEEE Computer Society, Los Alamitos (2005) 18. ECRYPT. The eSTREAM Project (2008), http://www.ecrypt.eu.org/stream/ 19. Electronic Product Code Global Inc., http://www.epcglobalinc.com 20. Feldhofer, M., Rechberger, C.: A case against currently used hash functions in RFID protocols. In: Meersman, R., Tari, Z., Herrero, P. (eds.) OTM 2006. LNCS, vol. 4275. Springer, Heidelberg (2006) 21. Gilbert, H., Robshaw, M., Sibert, H.: An active attack against HB+ —a provably secure lightweight authentication protocol. IEE Electronic Letters 41, 1169–1170; See also Cryptology ePrint Archive, Report 2005/237, http://eprint.iacr.org 22. Gilbert, H., Robshaw, M., Seurin, Y.: Good variants of HB+ are hard to find. In: Tsudik, G. (ed.) FC 2008. LNCS, vol. 5143, pp. 156–170. Springer, Heidelberg (2008) 23. Gilbert, H., Robshaw, M., Seurin, Y.: HB # : Increasing the Security and Efficiency of HB. In: Smart, N. (ed.) EUROCRYPT 2008. LNCS, vol. 4965, pp. 361–378. Springer, Heidelberg (2008) 24. Good, T., Benaissa, M.: Asic hardware performance. In: Robshaw, M.J.B., Billet, O. (eds.) New Stream Cipher Designs. LNCS, vol. 4986, pp. 267–293. Springer, Heidelberg (2008) 25. Hell, M., Johansson, T., Meier, W.: Grain—A Stream Cipher for Constrained Environments. In: Robshaw, M., Billet, O. (eds.) New Stream Cipher Designs: The eSTREAM Finalists. LNCS, vol. 4986, pp. 179–190. Springer, Heidelberg (2008) 26. Hellman, M.: A Cryptanalytic Time-Memory Trade-Off. IEEE Transactions on Information Theory 26(4), 401–406 (1980) 27. Hennig, J.E., Ladkin, P.B., Sieker, B.: Privacy Enhancing Technology Concepts for RFID Technology Scrutinised. RVS-RR-04-02, Univ. of Bielefeld (2004) 28. Henrici, D., Muller, P.: Hash-based Enhancement of Location Privacy for RadioFrequency Identification Devices using Varying Identifiers. In: Pervasive Computing and Communications Workshops (2004)
29. International Organisation for Standardisation, http://www.iso.org 30. Juels, A.: Minimalist Cryptography for Low-Cost RFID Tags. In: Blundo, C., Cimato, S. (eds.) SCN 2004. LNCS, vol. 3352, pp. 149–164. Springer, Heidelberg (2005) 31. Juels, A., Pappu, R.: Squealing Euros: Privacy Protection in RFID-Enabled Banknotes. In: Wright, R.N. (ed.) FC 2003. LNCS, vol. 2742, pp. 103–121. Springer, Heidelberg (2003) 32. Juels, A., Rivest, R., Szydlo, M.: The Blocker Tag: Selective Blocking of RFID Tags for Consumer Privacy. In: Atluri, V. (ed.) ACM CCS (2003) 33. Juels, A., Weis, S.: Defining strong privacy for RFID. ePrint, Report 2006/137 34. Juels, A., Weis, S.A.: Authenticating Pervasive Devices With Human Protocols. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 293–308. Springer, Heidelberg (2005) 35. Leander, G., Paar, C., Poschmann, A., Schramm, K.: A Family of Lightweight Block Ciphers Based on DES Suited for RFID Applications. In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 196–210. Springer, Heidelberg (2007) 36. Lee, J., Yeom, Y.: Efficient RFID Authentication Protocols Based on Pseudorandom Sequence Generators. Cryptology ePrint Archive, Report 2008/343 37. Molnar, D., Wagner, D.: Privacy and security in library RFID: Issues, practices, and architectures. In: Pfitzmann, B., Liu, P. (eds.) ACM CCS 2004, pp. 210–219 (2004) 38. Oechslin, P.: Making a faster cryptanalytic time-memory trade-off. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 617–630. Springer, Heidelberg (2003) 39. Ohkubo, M., Suzuki, K., Kinoshita, S.: Cryptographic Approach to “PrivacyFriendly” Tags. In: RFID Privacy Workshop (2003) 40. Ohkubo, M., Suzuki, K., Kinoshita, S.: Efficient hash-chain based RFID privacy protection scheme. In: Ubiquitous Computing—Privacy Workshop (2004) 41. Ouafi, K., Overbeck, R., Vaudenay, S.: On the Security of HB# against a Man-inthe-Middle Attack. In: Pieprzyk, J. (ed.) ASIACRYPT 2008. LNCS, vol. 5350, pp. 108–124. Springer, Heidelberg (2008) 42. Ouafi, K., Vaudenay, S.: Smashing SQUASH-0. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 300–312. Springer, Heidelberg (2009) 43. Paise, R.-I., Vaudenay, S.: Mutual Authentication in RFID: security and privacy. In: Abe, M., Gligor, V.D. (eds.) ASIACCS 2008, pp. 292–299. ACM, New York (2008) 44. Robshaw, M., Billet, O. (eds.): New Stream Cipher Designs: The eSTREAM Finalists. LNCS, vol. 4986. Springer, Heidelberg (2008) 45. Stop RFID, http://www.stoprfid.de/en/ 46. Sarma, S., Weis, S., Engels, D.: RFID Systems and Security and Privacy Implications. In: Kaliski, B., Koç, C., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 454–469. Springer, Heidelberg (2002) 47. Shamir, A.: SQUASH—a New MAC With Provable Security Properties for Highly Constrained Devices Such As RFID Tags. In: Nyberg, K. (ed.) FSE 2008. LNCS, vol. 5086, pp. 144–157. Springer, Heidelberg (2008) 48. van Le, T., Burmester, M., de Medeiros, B.: Universally composable and forwardsecure RFID authentication and authenticated key exchange. In: Bao, F., Miller, S. (eds.) ASIACCS 2007, pp. 242–252. ACM press, New York (2007) 49. Vaudenay, S.: On privacy models for RFID. In: Kurosawa, K. (ed.) ASIACRYPT 2007. LNCS, vol. 4833, pp. 68–87. Springer, Heidelberg (2007)
50. Weis, S., Sarma, S., Rivest, R., Engels, D.: Security and Privacy Aspects of LowCost Radio Frequency Identification Systems. In: Hutter, D., Müller, G., Stephan, W., Ullmann, M. (eds.) SPC 2003. LNCS. Springer, Heidelberg (2003) 51. Wolkerstorfer, J., Dominikus, S., Feldhofer, M.: Strong authentication for RFID systems using the AES algorithm. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 357–370. Springer, Heidelberg (2004)
A Proof of Theorem 1
We show that, if for a given λ ≥ 1 there exists a multiple oracle distinguisher A for {f_K} using λ oracles and asking at most q queries to each, of computing time lower than T′ = T − λT_PRF, and of advantage at least ε, then there is a single oracle distinguisher B able to distinguish {f_K} with an advantage of at least ε/λ, asking at most q queries and of computing time lower than T′ + λT_PRF. We use a classical proof technique relying on a hybrid argument. For 0 ≤ i ≤ λ, let K_1, . . . , K_i denote values chosen at random from {0, 1}^k if i ≥ 1 and the empty list if i = 0, and let f_1, . . . , f_{λ−i} denote random functions if i ≤ λ − 1 and the empty list if i = λ. Let x^i_j (1 ≤ j ≤ q) denote the challenges of {0, 1}^n chosen by A (the challenges x^i_j with 1 ≤ j ≤ q are given to the i-th oracle). Let X_i be the following λqm-bit random vector:

((f_{K_1}(x^1_j))_{j=1,...,q}, . . . , (f_{K_i}(x^i_j))_{j=1,...,q}, (f_1(x^{i+1}_j))_{j=1,...,q}, . . . , (f_{λ−i}(x^λ_j))_{j=1,...,q}),

with the conventions that (f_l(x^l_j)) represents the empty string for i = λ and (f_{K_l}(x^l_j)) represents the empty string for i = 0. We see that X_0 is the random vector obtained by A when the oracles are random functions, X_λ is the vector obtained by A when the oracles are chosen from {f_K}, and the X_i are intermediate between X_0 and X_λ. Let p_i denote the probability that A accepts when receiving a vector X_i in answer to its challenges. The hypothesis about algorithm A is |p_0 − p_λ| ≥ ε. Algorithm B works as follows: given an oracle f, it randomly selects an integer i_0 such that 1 ≤ i_0 ≤ λ and i_0 − 1 random values K_1, . . . , K_{i_0−1}. When receiving the challenges x^i_j from A with 1 ≤ i ≤ i_0 − 1 it returns to A the values f_{K_i}(x^i_j). When receiving the challenges x^{i_0}_j from A, B sends them to its oracle f and forwards the answers f(x^{i_0}_j) to A. When receiving the challenges x^i_j from A with i_0 + 1 ≤ i ≤ λ, it chooses q(λ − i_0) random values r^k_l (with 1 ≤ k ≤ λ − i_0 and 1 ≤ l ≤ q) such that if x^k_{o_1} = x^k_{o_2} then r^k_{o_1} = r^k_{o_2}. These random values are used to simulate λ − i_0 random functions f_1, . . . , f_{λ−i_0} to A. To summarize, B constructs and sends to A the λqm-bit vector Y_{i_0} defined as

(f_{K_1}(x^1_1), . . . , f_{K_1}(x^1_q), . . . , f_{K_{i_0−1}}(x^{i_0−1}_q), f(x^{i_0}_1), . . . , f(x^{i_0}_q), f_1(x^{i_0+1}_1), . . . , f_{λ−i_0}(x^λ_q)).

If the oracle given to B is a perfect random function then the vector Y_{i_0} is distributed in the same way as X_{i_0−1}. On the other hand, if the oracle given to B belongs to {f_K} then the vector is distributed in the same way as X_{i_0}. To distinguish {f_K} from a perfect random function generator, B calls A with input Y_{i_0} and outputs A's output. Since |Pr_f[B(f) = 1] − Pr_{f_K}[B(f_K) = 1]| can be written as
(1/λ) |Σ_{i=1}^{λ} (p_{i−1} − p_i)| = (1/λ) |p_0 − p_λ| ≥ ε/λ,

B distinguishes {f_K} from a perfect random function generator with an advantage of at least ε/λ in time at most T′ + λT_PRF.
B Proof of Lemma 1
To upper-bound the advantage of a (q, T − qT_G)-adversary A, we consider the intermediate situation where the oracle function is (f*, g_K) and f* is a random function. The triangle inequality gives

Adv^PRF_{F×G}(A) ≤ |Pr(A^{f_K, g_K} = 1) − Pr(A^{f*, g_K} = 1)| + |Pr(A^{f*, g_K} = 1) − Pr(A^{f*, g*} = 1)|.

We bound each term by expressing it as the advantage of a distinguisher against F or G. For the first term, we consider a single oracle distinguisher B against F constructed as follows: first it chooses a random K; then, having access to an oracle f, it answers the challenges x_i of A with (f(x_i), g_K(x_i)). Clearly Pr(B^{f*} = 1) = Pr(A^{f*, g_K} = 1) and Pr(B^{f_K} = 1) = Pr(A^{f_K, g_K} = 1), and as B works in the same time as A plus q computations of g_K, the first term is bounded by ε₁. For the second term, we consider a single oracle distinguisher C against G constructed as follows: having access to an oracle g, it answers the queries x_i of A with (y_i, g(x_i)) where the y_i are random values simulating a random function, so that x_i = x_j ⇒ y_i = y_j. Clearly Pr(C^{g*} = 1) = Pr(A^{f*, g*} = 1) and Pr(C^{g_K} = 1) = Pr(A^{f*, g_K} = 1), and as C runs in the same time as A, the second term is bounded by ε₂.
C Proof of Theorem 2
We prove the theorem by induction on t. The case t = 1 is trivial. To establish the property at rank t + 1 we consider the intermediate situation where the oracle function is

(a_1, . . . , a_{t+1}) → (f*_1(a_1), . . . , f*_t(a_1, . . . , a_t), f^1_{g*_t(a_1,...,a_t)}(a_{t+1}), f^2_{g*_t(a_1,...,a_t)}(a_{t+1})),

where the f*_i and g*_t are independent random functions (see Figure 3). Let A
Fig. 3. Intermediate setting
be a single oracle distinguisher against F_{t+1} using q queries and working in time T′ = T − tqT_PRF. Denoting by H the intermediate oracle above, its advantage is upper-bounded via the triangle inequality:

Adv^PRF_{F_{t+1}}(A) ≤ |Pr(A^{F_{t+1}} = 1) − Pr(A^{H} = 1)| + |Pr(A^{H} = 1) − Pr(A^{F*_{t+1}} = 1)|.
tion of (xji )1≤j ≤j , and finally returns to A the value ((rij )1≤j≤n , h1I (yi ), h2I (yi )) and outputs A’s output. C works in the same time as A. When C’s oracles are chosen into F as it is equivalent to choose a key among q random keys K1 , . . . , Kq by selecting a random function of (a1 , . . . , at ) as index in {1, . . . , q} and to choose as a key g ∗ (a1 , . . . , at ) ∈ {0, 1}k where g ∗ is a perfect random function we have 1
f1∗ (),f2∗ (),...,ft∗ (),ff1∗
2
Pr(C (fKi ,fKi )i=1,...,q = 1) = Pr(A
t+1
2 (.,...,.) (.),ff ∗ (.,...,.) (.) t+1
= 1).
When C’s oracles are perfect random functions, as it is equivalent (as long as at most q distinct t tuples (a1 , . . . , at ) are considered) to choose a perfect random function of (a1 , . . . , at+1 ) and to choose a perfect random function of at+1 among q parametrized by a perfect random function of (a1 , . . . , at ), we have ∗
∗
∗
∗
∗
Pr(C (hi )i=1,...,q = 1) = Pr(A(f1 (),...,ft (),ft+1 (),gt+1 ()) = 1). RF RF RF The inequality AdvP (C) ≤ AdvP (q, q, T − tqTP RF ) ≤ qAdvP (q, T − F F F P RF (t − 1)qTF ) ≤ qAdvF (q, T ) of Theorem 1 gives the upper bound of the second term and concludes the proof.
Fast Software AES Encryption

Dag Arne Osvik¹, Joppe W. Bos¹, Deian Stefan², and David Canright³

¹ Laboratory for Cryptologic Algorithms, EPFL, CH-1015 Lausanne, Switzerland
² Dept. of Electrical Engineering, The Cooper Union, New York, NY 10003, USA
³ Applied Math., Naval Postgraduate School, Monterey, CA 93943, USA
Abstract. This paper presents new software speed records for AES-128 encryption for architectures at both ends of the performance spectrum. On the one side we target the low-end 8-bit AVR microcontrollers and 32-bit ARM microprocessors, while on the other side of the spectrum we consider the high-performing Cell Broadband Engine and NVIDIA graphics processing units (GPUs). Platform-specific techniques are detailed, explaining how the software speed records on these architectures are obtained. Additionally, this paper presents the first AES decryption implementation for GPU architectures. Keywords: Advanced Encryption Standard (AES), Advanced Virtual RISC (AVR), Advanced RISC Machine (ARM), Cell Broadband Engine, Graphics Processing Unit (GPU), Symmetric Cryptography.
1 Introduction
In 2001, as the outcome of a public competition, Rijndael was announced as the Advanced Encryption Standard (AES) by the US National Institute of Standards and Technology (NIST). Today, the AES is one of the most widely used encryption primitives. A wide range of computational devices, from high-end machines, such as dedicated cryptographic servers, to low-end radio frequency identification (RFID) tags, use this encryption standard as a primitive to implement security. Besides its well-regarded security properties¹, the AES is extremely efficient on many different platforms, ranging from 8-bit microcontrollers to 64-bit processors to FPGAs. Indeed, efficiency was a crucial metric in making Rijndael an encryption standard. There is an active research area devoted to not only creating more efficient and secure implementations, but also evaluating the performance of the AES on different architectures. For example, the recent AES performance records for the Intel Core i7 by Käsper and Schwabe [15] were rewarded with one of the best paper awards at CHES 2009. Many improved performance results are aided by techniques such as byte- and bitslicing, as introduced by Biham [5], and by improved single-instruction multiple-data (SIMD) instruction set extensions to commonly available architectures such as x86.
¹ The only attack on the full AES is applicable in the related-key scenario to the 192-bit [6] and 256-bit key versions [6,7].
It is expected that the use of lightweight devices, i.e., low-end smart cards and radio frequency identification tags, in electronic commerce and identification will grow rapidly within the near future. For instance, the passive RFID tag market is expected to reach up to US$ 486M by 2013 [12], and the AES has already attracted significant attention due to its capabilities for such devices [11]. This work further investigates the performance of the AES on low-power devices; specifically, 8-bit AVR microcontrollers and 32-bit ARM microprocessors. Other platforms we target in this article are the high-end Cell Broadband Engine architecture (Cell) and the NVIDIA Graphics Processing Units (GPUs). For these platforms, which allow the use of vectorization optimization techniques, multiple input streams are processed at once using SIMD and SIMT (single instruction, multiple threads) techniques for the Cell and GPUs, respectively. Due to the low prices and wide availability of these devices it is interesting to evaluate their performance as cryptologic accelerators. We present new software implementations of AES-128 with high speed and small code size. To the best of our knowledge, our results set new performance records on all the targeted platforms. These performance records are achieved by carefully mapping the different components of the AES to the various platforms for optimal performance. Our AVR implementation requires 124.6 and 181.3 cycles per byte for encryption and decryption with a code size of less than 2 kilobytes. Compared to the previous AVR records our encryption code is 0.62 times the size and 1.24 times faster. Our ARM implementation requires 34.0 cycles per byte for encryption, which is 1.17 times faster than the previous record on this platform. Our 16-way SIMD byte-sliced implementations for the synergistic processing elements of the Cell architecture achieve speeds of 11.3 and 13.9 cycles per byte, which are 1.10 and 1.23 times faster than the previous Cell records, for encryption and decryption respectively. Similarly, our fastest GPU implementation, running on a single GPU of the NVIDIA GTX 295 and handling many input streams in parallel, delivers throughputs of 0.32 cycles per byte for both encryption and decryption. When running on the older GeForce 8800 GTX our results are 1.2 times and 1.34 times faster than the previous records on this GPU with and without memory transfer, respectively. Furthermore, this is the first AES implementation for the NVIDIA GPU which implements both encryption and decryption. The paper is organized as follows. Section 2 briefly recalls the design of the AES. In Section 3 our target platforms are described. Section 4 describes the techniques used and decisions made when porting the AES to the different architectures. In Section 5 we present our results and a comparison is made to other reported results in literature. Finally, we conclude in Section 6.
2 A Brief Introduction to the AES
The AES is a fixed block length version of the Rijndael block cipher [9,19], with support for 128-, 192-, and 256-bit keys. The cipher operates on an internal state of 128 bits, which is initially set to the plaintext block, and after transformations,
becomes the output ciphertext block. The state is organized in a 4 × 4 array of 8-bit bytes, which is transformed according to a round function Nr times. The number of rounds is Nr = 10 for 128-bit keys, Nr = 12 for 192-bit keys, and Nr = 14 for 256-bit keys. In order to encrypt, the state is first initialized, then the first 128 bits of the key are xored into the state, after which the state is modified Nr − 1 times according to the round function, followed by the slightly different final round. The round function consists of four steps: SubBytes, ShiftRows, MixColumns and AddRoundKey (except for the final round which omits the MixColumns step). Each step operates on the state, at each round r, as follows:
1. SubBytes: substitutes every entry (byte) of the state with an S-box entry,
2. ShiftRows: cyclically left shifts every row i of the state matrix by i, 0 ≤ i ≤ 3,
3. MixColumns: multiplies each column, taken as a polynomial of degree less than 4 with coefficients in F_{2^8}, by a fixed polynomial, modulo x^4 + 1,
4. AddRoundKey: xors the r-th round key into the state.
Each transformation has an inverse from which decryption follows in a straightforward way by reversing the steps in each round: AddRoundKey (inverse of itself), InvMixColumns, InvShiftRows, and InvSubBytes. The key expansion into the Nr 128-bit round keys is accomplished using a key scheduling algorithm, the details of which can be found in [19] and [9]. The design of the key schedule allows for the full expansion to precede the round transformations, which is advantageous if multiple blocks are encrypted using the same key, while also providing the option for on-the-fly key generation. On-the-fly key generation proves useful in memory-constrained environments such as microcontrollers. For 32-bit (and greater word length) processors, Daemen and Rijmen detail in [9] a fast implementation method that combines the SubBytes, ShiftRows, and MixColumns transformations into four 256-entry (each entry is 4 bytes) lookup tables, T_i, 0 ≤ i ≤ 3. Following [9], the "T-table" approach reduces the round transformations to updating the j-th column according to

(s_{0,j}, s_{1,j}, s_{2,j}, s_{3,j})^T = ⊕_{i=0}^{3} T_i[s_{i, j+C_i}],  for 0 ≤ j ≤ 3,   (1)
where s_{j,k} is the byte in the j-th row and k-th column of the state, and C_i is a constant offset that performs the ShiftRows step in place (column indices taken modulo 4). After the columns are updated, the remaining transformation is AddRoundKey (which is a single 4-byte lookup and xor per column). We note, however, that since the T_i's are simply rotations of each other, some implementations of (1) benefit from using a single table and performing the necessary rotations.
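As a concrete illustration of equation (1), the C sketch below performs one inner round on a state packed as four 32-bit column words. The tables T0–T3 (assumed to be the standard 4-byte T-tables folding SubBytes and MixColumns together), the byte order inside the words, and the helper name are assumptions of this sketch, not code from the implementations described later.

```c
#include <stdint.h>

/* Assumed precomputed tables: Ti[x] holds the 4-byte column contribution of
 * S-box output x in row i (SubBytes and MixColumns folded together).        */
extern const uint32_t T0[256], T1[256], T2[256], T3[256];

/* One inner AES round following equation (1). Column j of the state is held
 * in s[j], with row 0 in the most significant byte (an assumed packing).    */
static void aes_round(uint32_t out[4], const uint32_t s[4], const uint32_t rk[4])
{
    for (int j = 0; j < 4; j++) {
        out[j] = T0[(s[ j         ] >> 24) & 0xff]   /* row 0: C0 = 0        */
               ^ T1[(s[(j + 1) & 3] >> 16) & 0xff]   /* row 1: C1 = 1        */
               ^ T2[(s[(j + 2) & 3] >>  8) & 0xff]   /* row 2: C2 = 2        */
               ^ T3[(s[(j + 3) & 3]      ) & 0xff]   /* row 3: C3 = 3        */
               ^ rk[j];                              /* AddRoundKey          */
    }
}
```

The trade-off mentioned above is visible here: with 4 KB of tables a round costs 16 lookups and 16 xors, whereas a single rotated table saves memory at the cost of extra rotations.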
3 Target Platforms
On the one hand we target low-end lightweight devices—the performance and code-size of the AES on such devices, e.g., RFID-tags, is crucial (see for instance [11]) for the use of this encryption primitive in practice. Contrastingly,
on the other side of the performance spectrum we target the many-core, high-performing Cell and GPU platforms. Fast multi-stream implementations of the AES on these architectures show the potential of these platforms as AES accelerators which could be used in high-end servers. Below, we introduce the target platforms and discuss their overall design and execution models.

8-bit Advanced Virtual RISC Microcontroller. Advanced Virtual RISC (AVR) is a family of 8-bit microcontrollers designed by Atmel, targeting low-power embedded systems. Although a lightweight microcontroller, the AVR has 32 8-bit registers, a large number of instructions (125 for the AT90USB82/162), between 512 B and 384 KB of in-system programmable (ISP) flash, 0 to 4 KB of EEPROM and 0 to 32 KB of SRAM. Additionally, the microcontrollers are equipped with timers, counters, USART, SPI, and many other features and peripherals that make the AVR a favorable platform for embedded applications [3]. The AVR CPU is a modified Harvard architecture (program and data memories are separate) with a two-stage, single-level pipeline supporting instruction pre-fetching. The (memory) parallelism and pipelining greatly improve the microcontroller's performance. Moreover, the majority of AVR instructions take up only a single 16-bit word and execute with a single-cycle latency; only a small number of instructions require two or four cycles to complete. Features like free pre-decrement and post-increment of pointer registers also contribute towards small and efficient program code. The data memory consists of the register file, I/O memory, and SRAM. As such, the various direct and indirect addressing modes (through the 16-bit pointer registers X, Y, Z) can be used to access not only the data memory, but also the 32 registers. The ability to access the register file as memory, in addition to the optimized direct register access, provides a designer with additional flexibility in optimizing an application implementation. We note, however, that although direct addressing can access the whole data space, indirect addressing with displacement is limited to using the Y or Z registers as base pointers, in addition to the displacement being limited to 64 address locations (including 0) [3]. Similarly, conditional branches have a 6-bit displacement restriction. Hence, it is often necessary for an implementer to use the trampoline technique to address these limitations. Only the Z register may be used for addressing flash memory, e.g., for the AES S-box lookups, and in some AVR devices this is not possible at all. Additionally, the flash memory is relatively small, and because in practical applications it is of little interest to dedicate the whole flash to a cryptographic primitive, it is critical that the code size of the AES remain small. Most AVRs are clocked between 0 and 20 MHz, and with the ability to execute one instruction per cycle the embedded microcontrollers can achieve throughputs of up to 20 MIPS (16 MIPS for the AT90USB162). Thus, given the relatively high computational power of these low-cost and low-power devices, the performance of block ciphers, such as the AES, is of practical consideration for applications requiring cryptographic primitives (e.g., electronic automobile keys).
32-bit Advanced RISC Machine. Advanced RISC Machine (ARM) is an instruction set architecture (ISA) designed by ARM for high-performance 32-bit embedded applications. Although ARM cores are usually larger and more complex than AVRs, most ARM designs are also well regarded for their low power consumption and high code density—the ARM is one of the most widely used 32-bit processors in mobile applications [16]. The ARM family has evolved from the ARM1 to the prominent ARM7TDMI core (ARMv4T ISA) to the ARM9 (ARMv5), ARM11 (ARMv6) and the most recent Cortex (ARMv7) [30]. The number of pipeline stages increased from 3 on the ARM7 to 8 on the ARM11, with different cores including various caches, memory management units, SIMD extensions, and other capabilities using coprocessors. Different cores even implement different memory architectures: some use the von Neumann model, while others, such as the StrongARM SA-1110 (ARMv4), are (modified) Harvard architectures. Most of the RISC design features are, however, constant across the families, making code written for the ARM7 core compatible with the ARM11. Like the AVR, the ARM microprocessor is a load-store architecture with simple, yet powerful, addressing modes. The processor has 37 32-bit registers, of which only 16 are visible at any one time [27]. Register banking is employed to "hide" the registers used in operating modes other than user (e.g., supervisor, abort, etc.). Most instructions on the ARM execute in a single cycle, and their opcode length is fixed to 32 bits.² To eliminate the need for branching, almost all ARM instructions can be made conditional by adding a suffix to the instruction mnemonic. Furthermore, the condition bits are only modified by an instruction if the programmer so desires (expressed by adding the S suffix to the instruction). We mention two additional features of the ARM which are extensively used in this work: the inline barrel shifter and the load/store-multiple instructions. The inline barrel shifter allows the second operand of most data-processing instructions (e.g., arithmetic and logical) to be shifted or rotated before the operation is performed. The load- and store-multiple instructions copy any number of general-purpose registers from/to a block of sequential memory addresses.

The Cell Broadband Engine. The Cell architecture [14], jointly developed by Sony, Toshiba, and IBM, is equipped with one dual-threaded, 64-bit in-order "Power Processing Element" (PPE), which can offload work to the eight "Synergistic Processing Elements" (SPEs) [31]. The SPEs are the workhorses of the Cell processor and the target for the AES implementation in this article. Each SPE consists of a Synergistic Processing Unit (SPU), 256 kilobytes of private memory called the Local Store (LS), and a Memory Flow Controller (MFC). The latter handles communication between each SPE and the rest of the machine, including main memory, as explicitly requested by programs. If one wants to
² The fixed-size opcodes simplify decoding, but also increase the overall code size. To address this, a separate and reduced (16-bit) instruction set, Thumb-2, can be used instead.
avoid the complexity of sending explicit DMA (Direct Memory Access) requests to the MFC, all code and data must fit within the LS. Most SPU instructions are 128-bit wide SIMD operations performing sixteen 8-bit, eight 16-bit, four 32-bit, or two 64-bit operations in parallel. Each SPU is also equipped with a large register file containing 128 registers of 128 bits each. This provides space for unrolling and software pipelining of loops, hiding the relatively long latencies of the instructions. Unlike the processor in the PPE, the SPUs are asymmetric processors, having two pipelines (denoted the odd and the even pipeline) which are designed to execute two disjoint sets of instructions. Most of the arithmetic instructions are executed in the even pipeline while most of the load and store instructions are executed in the odd pipeline. In an ideal scenario, two instructions (one even and one odd) can be dispatched per cycle. The SPUs are in-order processors with no hardware branch prediction. The programmer (or compiler) must instead notify the instruction fetch unit in advance where a (single) branch instruction will jump to. Hence, for most code with infrequent jumps and where the target of each branch can be computed sufficiently early, perfect branch prediction is possible. One of the first applications of the Cell processor was to serve as the heart of Sony's PlayStation 3 (PS3) game console. The Cell contains eight SPEs, but in the PS3 one is disabled, allowing improved yield in the manufacturing process as any chip with a single faulty SPE can still be used. One of the remaining SPEs is reserved by Sony's hypervisor, a software layer providing a virtual machine environment for running an external operating system. Hence, we have access to six SPEs when running GNU/Linux on (the virtual machine on) the PS3. Fortunately, the virtualization does not slow down programs running on the SPUs, as they are naturally isolated and protection mechanisms only need to deal with requests sent to the MFC. Besides its use in the PS3, the Cell has been placed on a PCI-Express card such that it can serve as an arithmetic accelerator. The Cell has also established itself in the high-performance market with the Roadrunner supercomputer in the Top 500 supercomputing list [10]. This supercomputer consists of a revised variant of the Cell, the PowerXCell 8i, which is available in the IBM QS22 blade servers.

Graphics Processing Units Using the Compute Unified Device Architecture. Similar to the PS3, Graphics Processing Units (GPUs) have mainly been game- and video-centric devices. Due to the increasing computational requirements of graphics-processing applications, GPUs have become very powerful parallel processors and this, moreover, has incited research interest in computing outside the graphics community. Until recently, however, programming GPUs was limited to graphics libraries such as OpenGL [28] and Direct3D [8], and for many applications, especially those based on integer arithmetic, the performance improvements over CPUs were minimal, sometimes even degrading. The release of NVIDIA's G80 series and ATI's HD2000 series GPUs (which implemented the unified shader architecture), along with the companies' release of higher-level language support with the Compute Unified Device Architecture (CUDA), Close to Metal (CTM) [24] and the more recent Open Computing Language
(OpenCL) [18], however, facilitates the development of massively parallel general-purpose applications for GPUs [21,1]. These general-purpose GPUs have become a common target for numerically intensive applications given their ease of programming (relative to previous-generation GPUs) and their ability to outperform CPUs in data-parallel applications, commonly by orders of magnitude. In this paper we focus on NVIDIA's GPU architecture with CUDA; programming ATI GPUs using the Stream SDK (the successor of CTM) is part of our ongoing work. In addition to the common floating-point processing capabilities of previous-generation GPUs, starting with the G80 series NVIDIA's GPU architecture added support for integer arithmetic, including 32-bit addition/subtraction and bitwise operations, scatter/gather memory access and different memory spaces [20,21]. Each GPU contains between 10 and 30 streaming multiprocessors (SMs), each equipped with: eight scalar processor (SP) cores, fast 16-way banked on-chip shared memory (16 KB/SM), a multithreaded instruction unit, a large register file (8192 registers for G80-based GPUs, 16384 for the newer GT200 series), read-only caches for constant (8 KB/SM) and texture memories (varying between 6 and 8 KB/SM), and two special function units (for transcendentals). We refer to [21] for further details. CUDA is an extension of the C language that employs the new massively parallel programming model, single-instruction multiple-thread (SIMT). SIMT differs from SIMD in that the underlying vector size is hidden and the programmer is restricted to writing scalar code that is parallel at the thread level. The programmer defines kernel functions, which are compiled for and executed on the SPs of each SM, in parallel: each lightweight thread executes the same code, operating on different data. A number of threads (at most 512) are grouped into a thread block which is scheduled on a single SM, the threads of which time-share the SPs. This additional hierarchy allows threads within the same block to communicate using the on-chip shared memory and synchronize their execution using barriers. Moreover, multiple thread blocks can be executed simultaneously on the GPU as part of a grid; a maximum of eight thread blocks can be scheduled per SM and, in order to hide instruction and memory (among other) latencies, it is important that at least two blocks be scheduled on each SM.
4 Porting the AES
When porting the AES to our target platforms, different implementation decisions have to be made. These decisions are influenced by the features and restrictions of the instruction sets and the available memory on the target platforms. We started by optimizing the AES for the 8-bit AVR microcontroller architecture. This 8-bit version of the AES is used as a framework to create a byte-sliced implementation on the SIMD architecture of the SPE. Hence, 16 instances of the AES are processed in parallel per SPE using 16-way SIMD arithmetic on the 128-bit registers. Unlike the Cell and AVR, the ARM and GPU architectures do not directly benefit from the byte-sliced framework; instead, the T-table approach is used (see Section 2).
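To make the byte-sliced framework concrete, the following C sketch (our illustration, not code from the actual implementation; all names are ours) shows one natural data layout in which byte position j of 16 independent AES states is packed into a single 16-byte vector, so that one 128-bit SPE operation acts on all 16 streams at once. AddRoundKey is shown as the simplest case; on the SPE the inner loop over lanes collapses into a single SIMD xor per byte position.

    #include <stdint.h>

    #define STREAMS 16                /* one AES instance per byte lane */

    /* Byte-sliced state: v[j][s] is byte j (0..15) of stream s. On the SPE
     * each v[j] occupies one 128-bit register. */
    typedef struct {
        uint8_t v[16][STREAMS];
    } sliced_state;

    /* Pack 16 ordinary 16-byte AES states into the sliced layout. */
    void slice(sliced_state *st, const uint8_t blocks[STREAMS][16])
    {
        for (int s = 0; s < STREAMS; s++)
            for (int j = 0; j < 16; j++)
                st->v[j][s] = blocks[s][j];
    }

    /* AddRoundKey for all 16 streams: 16 vector xors in total. */
    void sliced_add_round_key(sliced_state *st, const sliced_state *rk)
    {
        for (int j = 0; j < 16; j++)          /* byte position         */
            for (int s = 0; s < STREAMS; s++) /* stream (one per lane) */
                st->v[j][s] ^= rk->v[j][s];
    }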
In order to illustrate some of the techniques used, we often use the functionality of xtime as an example. The multiplication of a variable b ∈ F_{2^8}, using its polynomial representation, by the constant value 0x02 (which corresponds to the polynomial x) is denoted by xtime. The functionality of xtime can be implemented by a shift and a conditional xor with the constant 0x1B, depending on whether the most-significant bit of b is set.
8-bit Advanced Virtual RISC Microcontroller. Our implementations of the AES on 8-bit AVR microcontrollers use the conventional lookup-based approach. The lookup tables used are the forward and inverse S-boxes, each 256 bytes; to keep the memory requirements of the implementation low, no other tables are used. Lookups are performed by putting the index value in the lower half of a pointer register; when the table is in flash memory, this needs to be the Z pointer register. In order to allow for this simple way of calculating a lookup address, the tables must be 256-byte aligned, and in our implementation the S-box pointer is always in the Z register. This allows the placement of the S-box in flash or SRAM memory to be selected at compile time, without any additional code modifications. Load instructions require 2 cycles when loading from SRAM and 3 cycles when loading from flash memory, so the former provides higher performance in cases where SRAM is available. The functionality of xtime is implemented as a left shift followed by a conditional branch (depending on the shifted-out bit) which skips the xor (with 0x1B) if the most-significant bit is 0. On the AVR, a branch costs 2 cycles if taken and 1 cycle if not. In the latter case, the xor operation is performed, taking an additional cycle. Hence, the total number of required cycles (including the left shift) is independent of the branch and is always 3 cycles. The MixColumns step is implemented without the use of lookup tables as a series of register copies, xors and xtime operations, taking a total of 26 cycles. The InvMixColumns step is implemented similarly, but is more complicated and takes a total of 42 cycles. On the AVR, conditional branches are limited to jumping 63 instructions back or 64 instructions forward. Hence, in our implementation, the branch to the code for the last round cannot be accomplished in a single step. However, since this branch is close to the beginning of the encryption loop, we instead branch to an unconditional rjmp (relative jump) instruction just before the encryption loop. The relative jump instruction is able to jump 2047 instructions backward or 2048 instructions forward, and using this trampoline technique we efficiently reach the code for the last round. We use our own internal, assembly-only calling convention for the interface between the mode-of-operation code and the AES encryption core. However, the mode-of-operation code fully supports the C programming language calling convention.
32-bit Advanced RISC Machine. For the ARM microprocessor we only implemented AES-128 encryption. As the ARM has considerably more resources than the AVR, the main design goal is to optimize the implementation for speed; we do not take any code-size or memory restrictions into account when designing the code for this platform. Furthermore, we do not restrict ourselves
to any specific ARM processor—the code is portable across the ARM family of processors. The 32-bit AES-128 encryption implementation for the ARM uses the T-table approach (see Section 2), storing only one of the four tables. This reduces the required storage for the T-tables at no additional performance cost by taking advantage of the inline barrel shifter. Other features of the ARM are used as building blocks of the AES implementation, including conditional execution of instructions and the ability to perform multiple load or store operations using a single instruction. The code has been designed to work without slowdown for processors with up to 3 cycles of load-to-use latency, and with a minor slowdown when 4 cycles are required before a loaded value can be used. In practice, these latencies are often lower; for example, the ARM SA-1110 we used for benchmarking has a 2-cycle latency for cache hits. In addition, all the steps in the round function and all the rounds have been completely unrolled to maximize speed. We would like to emphasize that an iterated version would have a much smaller code size and still be nearly as fast as the implementation described. Furthermore, although the encryption code is unrolled, the key expansion is not. The key expansion is implemented such that a single byte from each 4-byte T-table element is used when an S-box entry is required. This is possible since each 4-byte entry of the T-tables is composed of four S-box multiples (e.g., T0[a] = (0x02 · S-box[a], S-box[a], S-box[a], 0x03 · S-box[a])).
Synergistic Processing Elements. The 16-way SIMD capabilities of the SPE, working on 16 bytes simultaneously, are used to create a byte-sliced implementation of the AES. In an optimistic scenario one may expect to achieve roughly a 16-fold speedup compared to machines with a word size of one byte and a comparable instruction set. For many of the operations required by the AES, e.g., bitwise operations, this speedup holds on the Cell. Unlike most other modern architectures, on the SPE all distinct binary operations f : {0,1}² → {0,1} are available. Other instructions of particular interest for the implementation of the AES are the shuffle and select instructions. The shuffle instruction can pick any 16 bytes of a 32-byte (two 128-bit registers) input, or select one of the constants {0x00, 0xFF, 0x80}, and place them in any of the 16 byte positions of the 128-bit output register. We note that this allows 16 lookups in a 32-entry one-byte table using a single instruction. The select instruction acts as a 2-way multiplexer: depending on the input pattern, the corresponding bit from either the first or the second input quadword is selected as output. Some operations which are necessary to implement the AES are not, however, so trivial to perform in a SIMD fashion. A prime example is the S-box lookup. Typically this can be done in a few instructions, depending on the index, by calculating the address to load from and loading the S-box value from this address. Neither the Cell nor any other current mainstream architecture supports parallel lookups, so this approach would need to be performed sequentially. In [22], Osvik describes a technique for doing this efficiently on the SPE architecture. The idea is to use the five least-significant bits of each of the 16 bytes
in parallel, as input to eight different table lookups—one for each possible value of the three most-significant bits. Then, one by one, the three remaining bits are extracted. Depending on the value of each of these bits, one or the other half of the lookup results is selected. After all three bits have been processed, the correct S-box output has been fetched, 16 times in parallel. Simply implementing the AES on the Cell architecture in this way results in an imbalanced number of even and odd instructions, since the majority of the instructions are even (mainly due to all the required xors). Rebalancing the instructions requires an unconventional way of thinking: increasing the number of instructions and the latency of a given part of the encryption might result in faster overall code if we are able to "hide" these extra instructions in the odd pipeline. Consider the implementation of xtime as an example. Recall that the functionality of xtime can be implemented by a shift and a conditional xor. However, when performing SIMD arithmetic (where the same steps need to be performed for all concurrent streams), conditional statements are to be avoided. We present an approach (which can be applied in a 16-way SIMD fashion) that eliminates the conditional instruction by creating masks using the compare-greater-than (cmp-gt) instruction. This instruction compares the 16 8-bit values of two vectors simultaneously and returns either all ones or all zeros in every corresponding output byte, depending on whether the value is greater or not. The constant value is selected according to this select-mask using a single and instruction. As there is no 16-way SIMD shift instruction on the Cell, this operation is mimicked by using an and, to clear the most-significant bit of the 16 entries, and performing a global 128-bit left shift. The xor of these two values is the output of xtime. We note that this approach requires four even and one odd instruction, with a total latency of eight cycles. An alternative approach to xtime first rotates the entire 128-bit vector one bit to the left, making the most-significant bit of byte number i the least-significant bit of byte number i − 1 mod 16. In a separate register this bit is then cleared using an and instruction; the result is used for the final xor. Just as in the previous approach a select-mask is created; however, in this case we use the least-significant bit of the 16 elements of the rotated value. This is achieved by first gathering these least-significant bits, using the gather instruction, and forming the mask with the help of the form-select-byte-mask instruction maskb (both odd instructions). Next, these masks are rotated one byte to the right such that the index of the mask corresponds to the index of the original value of the least-significant bit of the rotated value. Following this, the masks are in the correct position and are used to select the constant value. Finally, this masked constant is used to create (using an xor) the output of xtime. Figure 1 shows the program flow of both methods; we denote the first as the unbalanced and the second as the balanced approach. The figure also highlights their respective latencies. Despite the longer latency of the balanced approach (20 versus 8 cycles), it requires one fewer even instruction in comparison

Fig. 1. Two different program flows, with cycle latencies, of a 16-way SIMD implementation of xtime on the Cell. Even and odd instructions are denoted by italic and bold text respectively. The unbalanced implementation requires 4 even and 1 odd instruction and has a latency of 8 cycles. The balanced implementation requires 3 even and 4 odd instructions and has a latency of 20 cycles.

to the unbalanced approach. Moreover, the three extra odd instructions can, in most instances of xtime in the AES encryption, be dispatched for "free" in pairs with surrounding even instructions. In order to obtain an overall well-balanced (in terms of odd and even instruction count) implementation for the SPE on the Cell, similar techniques are applied to implement a more balanced variant of the S-box fetching algorithm.
Graphics Processing Units. As the instruction set of the GPU is substantially less rich than that of the Cell and ARM, when optimizing an implementation using CUDA it is essential to be able to execute many threads concurrently and thereby maximally utilize the device. Hence, our GPU implementation processes thousands of streams. Additionally, we consider implementations with on-the-fly key scheduling, key expansion in texture memory, key expansion in shared memory, and variants of the former two with storage of the T-tables in shared memory. To maximize the throughput between the device and the host, our GT200 implementations use page-locked host memory with concurrent memory copies and kernel execution. Since kernel execution and copies between page-locked host
memory and device memory are concurrent, the latency of the memory copies is hidden (except for the first and last kernel). The older G80 series GPUs do not, however, support concurrent memory transfers and kernel execution. Our first three, and simplest, variants are similar to the implementation of [17] in placing the T-tables in constant memory. Because the constant memory cache size is 8KB, the full tables can be cached with no concern for cache misses. Although this approach has the advantage of simplicity, unless all the threads of a half-warp (16 threads executing concurrently) access the same memory location (broadcast), the accesses must be serialized and thus the gain in parallelism is severely degraded. Hence, to lower the memory access penalties, in implementing the transformation according to (1) we only used a single T-table (specifically, T2)—unlike the ARM, which allows for inline rotates, the rotates on the GPU were implemented with two shifts and an or. Moreover, we improve on [17] by including on-the-fly key scheduling and key expansion in texture and shared memory. The AES implementation of [17] assumes the availability of the expanded key in global memory, which is of practical interest only for single-stream cryptographic applications; for multi-stream cryptographic and cryptanalytic applications, however, key scheduling is critical, as deriving many keys on the CPU is inefficient. Our on-the-fly key scheduling variants are ideal for key-search applications and multi-stream applications with many thousands of streams, since each thread independently encrypts/decrypts multiple (different) blocks with a separate key. Since our implementation processes multiple blocks, for the on-the-fly key generation we buffer the first round key in shared memory, from which the remaining round keys are derived. Adopting the method of [9], during each round four S-box lookups and five xors are needed to derive the new round key. Additional caching (e.g., of the last round key) can further improve the performance of the implementation. For many applications having on the order of a few hundred to a few thousand streams is sufficient, and thus further speedup can be achieved by doing the key expansion in texture or shared memory. Texture memory, unlike constant memory, allows for multiple non-broadcast accesses and, like constant memory, has the advantage that it can be written once and retain its contents (the expanded keys) across multiple kernel launches. Similarly, when accessing shared memory it is important to understand that although a single instruction is issued per warp, the warp execution is split into two half-warps, and if no threads of a half-warp access the same shared memory location, i.e., there are no bank conflicts, 16 different reads/writes can be completed in two cycles (per SM). As with using texture memory, this can further increase the throughput of the AES when encrypting/decrypting multiple blocks. Although shared memory bank conflicts on the GT200 series result in only serializing the conflicting accesses, as opposed to the G80's serialization of all the threads in the half-warp, we carefully implemented the shared memory access to avoid any bank conflicts. For the key expansion into texture and shared memory we create 16 stream-groups per block, each group consisting of multiple threads that share a common expanded key. The key expansion for the texture memory variant is performed
using a separate kernel, and all subsequent kernel executions reuse the expanded key. The shared memory variant, however, performs the key expansion at the start of every kernel. Furthermore, the number of stream-groups per block was chosen between 8 and 24 (most commonly 16) to allow a higher number of blocks to be scheduled concurrently and thus hide block-dependent latencies. We emphasize that for all variants (including on-the-fly), in addition to launching multiple grids, each kernel processes multiple blocks. Of course, for the key expansion into texture and shared memory no additional speedup is attained unless multiple blocks are processed. As previously mentioned, the throughput of constant memory for random access is quite low when compared to shared and texture memory, and so we further optimize the AES by placing the T-tables in shared memory. To avoid bank conflicts one Ti (of size 1KB) must be stored in each bank; this, of course, is not directly possible because kernel arguments are also usually placed in shared memory, and furthermore, if most of the table (save a few entries), as in [13], is placed in shared memory, the maximum number of blocks assigned to that SM would be limited to one. Thus, the overall gain would not be very high. The authors of [13] also propose a quad-table approach in shared memory, though they do not specify whether the design contains bank conflicts. Our shared memory version is a "lazy" approach that simply lays out the tables in order. Because we are targeting the newer generation of GPUs, a bank conflict is resolved by serializing only the colliding accesses; thus, although bank conflicts are expected (simulations show that roughly 35% of the memory accesses are serialized, so 6 of the 16), on average the gain from using shared memory is much higher than from constant memory. For these variants we store all four T-tables, as the measured performance was higher than when using a single T-table. We recall that the key scheduling for decryption consists of running the encryption key scheduling algorithm and then applying InvMixColumns to all except the first and last round keys. It is clear that on-the-fly key generation for decryption is considerably more complex than for encryption; the key expansion for all other variants is a direct application of this method. For on-the-fly decryption we buffer the first round key and (after running the encryption key scheduler) the InvMixColumns of the final key. We derive all successive keys from the second-to-last round key using six xors, a MixColumns transformation (of part of the key), and a transformation combining InvMixColumns and InvSubBytes per round. These complex transformations also take advantage of the T-tables, with the additional need for an S-box[S-box[·]] table to further lower the memory pressure. We note that this efficient on-the-fly key scheduling for decryption is not GPU specific and can be applied to any other T-table based implementation.
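To illustrate the on-the-fly key scheduling used in the GPU variants, the following C sketch (ours; it shows the algorithmic step only, not the CUDA kernel code, and the sbox array is assumed to be available) derives one AES-128 round key from its predecessor. Counted on 32-bit words, this step is the "four S-box lookups and five xors" per round mentioned above, so a thread only ever needs to keep the current round key in registers.

    #include <stdint.h>

    extern const uint8_t sbox[256];   /* forward AES S-box; definition omitted */

    /* Derive round key i+1 from round key i (AES-128), with bytes stored in
     * the usual column order rk[0..3], rk[4..7], rk[8..11], rk[12..15].
     * rcon is the round constant (0x01, 0x02, ..., 0x36 for rounds 1..10). */
    void aes128_next_round_key(uint8_t nxt[16], const uint8_t rk[16], uint8_t rcon)
    {
        /* first column: RotWord + SubWord of the last column, xor with rcon */
        nxt[0] = rk[0] ^ sbox[rk[13]] ^ rcon;
        nxt[1] = rk[1] ^ sbox[rk[14]];
        nxt[2] = rk[2] ^ sbox[rk[15]];
        nxt[3] = rk[3] ^ sbox[rk[12]];
        /* remaining columns: xor with the freshly computed previous column */
        for (int j = 4; j < 16; j++)
            nxt[j] = rk[j] ^ nxt[j - 4];
    }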
5 Results
Table 1 shows AES-128 performance and code size (including lookup tables) for our implementations on the 8-bit AVR microcontroller and the 32-bit ARM microprocessor. Depending on the AVR model used, the availability of RAM and flash
Table 1. AES-128 implementation results on an 8-bit AVR microcontroller and a 32-bit ARM microprocessor

Reference          Key scheduling   Encryption (cycles)   Decryption (cycles)   Code size (bytes)
8-bit AVR microcontroller
[32] Fast          on-the-fly       1,259                 1,259                 1,708
[32] Compact       on-the-fly       1,442                 1,443                 840
[26]               precompute       3,766                 4,558                 3,410
[25] Fast          precompute       2,474                 3,411                 3,098
[25] Furious       precompute       2,739                 3,579                 1,570
[25] Fantastic     precompute       4,059                 4,675                 1,482
[23]               precompute       2,555                 6,764                 2,070
[23]               precompute       2,555                 3,193                 2,580
New - Low RAM      precompute       2,153                 2,901                 1,912
New - Fast         precompute       1,993                 2,901                 1,912
32-bit ARM microprocessor
[2] Atasu et al.   precompute       639                   638                   5,966
New                precompute       544                   -                     3,292

Notes column of the original table: Hardware ext. cost: 1.1 kGates; Key setup: Enc 756 cycles, Dec 4,977 cycles; Key setup: 2,039 cycles; Key setup: 789 cycles; 747 cycles.
memory varies. We created two variants: a fast and a compact version. The compact version only stores the key (176 bytes) in RAM—no additional tables; this version is designed for AVR devices with little RAM. The faster version trades RAM usage for speed by placing the 256-byte S-box in RAM. Our timing results are obtained by running our compact version on the AT90USB162 (16 MHz, 512 bytes of RAM and 16 kilobytes of flash) and the fast version on the larger AT90USB646 (16 MHz, 4 kilobytes of RAM and 64 kilobytes of flash). Although a direct comparison is not possible, for completeness we also include estimates of an AVR implementation using hardware extensions [32] in Table 1. Figure 2 graphically shows the code size versus the required cycles for decryption and encryption of different AVR implementations of AES-128. Our AVR encryption and decryption routines are 1.24 and 1.10 times faster, respectively, than the previous fastest results. In both cases we also achieve a smaller code size. For comparison, let us briefly analyze the previous fastest AVR AES implementation: the fast variant from [25]. This implementation integrates xtime into its S-box lookup tables, resulting in twice the table size for encryption (all elements of the S-box multiplied by 0x01 and 0x02) and five times the table size for decryption (all elements of the inverse S-box multiplied by 0x01, 0x0E, 0x09, 0x0D and 0x0B). Hence seven tables are used, while we use only two. Although more tables may seem like a good way of speeding up the code, they also require frequent changing of the S-box pointer. Furthermore, computing xtime costs the same as a lookup from flash (3 cycles), but does not require the input value to be copied to a particular register, nor any change of pointer registers.
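For reference, the xtime operation and the table-free MixColumns of a single column discussed above can be written in portable C roughly as follows (our sketch; the AVR code realizes the same data dependencies in assembly, where each xtime costs 3 cycles and the whole column transformation 26 cycles). In the SIMD setting of Section 4, the conditional xor is instead replaced by an and with a compare-derived mask so that all 16 streams follow the same instruction sequence.

    #include <stdint.h>

    /* Multiplication by x in F_{2^8} (AES polynomial 0x11B): a left shift
     * plus a conditional xor with 0x1B when the shifted-out bit was set. */
    uint8_t xtime(uint8_t b)
    {
        return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1B : 0x00));
    }

    /* MixColumns on one 4-byte column, using only copies, xors and xtime. */
    void mix_column(uint8_t a[4])
    {
        uint8_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
        uint8_t t  = a0 ^ a1 ^ a2 ^ a3;
        a[0] = a0 ^ t ^ xtime(a0 ^ a1);
        a[1] = a1 ^ t ^ xtime(a1 ^ a2);
        a[2] = a2 ^ t ^ xtime(a2 ^ a3);
        a[3] = a3 ^ t ^ xtime(a3 ^ a0);
    }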

Fig. 2. Code size versus cycle count for decryption and encryption of different AES-128 AVR implementations ([26], [25], [23], and this work).

To our knowledge, the previous speed record for the ARM microprocessor is reported in [2] by Atasu, Breveglieri and Macchetti, where performance results are measured on the StrongARM SA-1110. Our implementation is benchmarked on this same ARM processor, and the results are presented at the bottom of Table 1. The code size for our AES-128 implementation, using a T-table of 1024 (256 × 4) bytes, is 2156 bytes. The code size
for the key expansion is 112 bytes, and it is calculated to run in 260 cycles. This could not be measured accurately due to practical limitations: this ARM lacks a timestamp counter. The encryption code is calculated to run in 540 cycles; in practice, when performing repeated encryption of large messages, we measured 544 cycles per encryption using wall-clock time. The code size of the implementation from [2] includes encryption, decryption and key-scheduling code for the AES with 128-, 192- and 256-bit key lengths. Our result for encryption is 1.17 times faster than their 128-bit variant. Table 2 gives AES-128 performance results obtained when running on the SPE architecture and various GPUs. We note that although we use sophisticated techniques to implement the AES on the Cell and the GPUs (see Section 4), we did not implement the AES in hand-optimized assembly (except for one GPU variant). Despite the fact that we implement the AES in the C programming language, we are able to set new performance records for both platforms.
Table 2. Different AES-128 performance results obtained when running on a single SPE of a PS3 and various GPU architectures. (P) = key scheduling is pre-computed, (F) = key scheduling is on-the-fly, (T) = key expansion in texture memory, (S) = key expansion in shared memory.

Reference               Algorithm   Architecture                  Cycles/byte   Gb/sec
[17], 2007              Enc (P)     NVIDIA 8800 GTX, 1.35GHz      1.30          8.3
[33], 2007              Enc (F)     ATI HD 2900 XT, 750MHz        1.71          3.5
[13], 2008              Enc (P)     NVIDIA 8800 GTX, 1.35GHz      0.70          15.4
This article, T-smem    Enc (P)     NVIDIA 8800 GTX, 1.35GHz      0.52          23.3
This article, T-smem    Enc (F)                                   0.84          12.9
This article, T-smem    Enc (T)                                   0.64          16.8
This article, T-smem    Dec (F)                                   1.23          8.8
This article, T-smem    Dec (T)                                   0.65          16.6
This article            Enc (F)     NVIDIA GTX 295, 1.24GHz       1.13          8.8
This article            Enc (T)                                   0.91          10.9
This article            Enc (S)                                   0.92          10.8
This article, T-smem    Enc (F)                                   0.42          23.8
This article, T-smem    Enc (T)                                   0.32          30.9
This article            Dec (F)                                   2.46          4.0
This article            Dec (T)                                   1.37          7.2
This article            Dec (S)                                   1.38          7.1
This article, T-smem    Dec (F)                                   0.66          15.1
This article, T-smem    Dec (T)                                   0.32          30.8
[29], 2005              Enc (P)     SPE, 3.2GHz                   12.4          2.1
[29], 2005              Dec (P)                                   17.1          1.5
This article            Enc (P)     SPE, 3.2GHz                   11.3          2.3
This article            Dec (P)                                   13.9          1.8
Few benchmarking results for the AES on the SPE architecture are reported in the literature. The only reported performance data we could find are from IBM [29]. This single-stream high-performance implementation is optimized for the SPE architecture to take full advantage of its SIMD properties. Compared to the IBM implementation, our SPE implementation, benchmarked on a PS3, is 1.10 and 1.23 times faster for encryption and decryption, respectively. Our byte-sliced key generation routine runs in 62 clock cycles per stream. Our AES-SPE implementation uses the pipeline balancing techniques described in Section 4. For encryption, in the MixColumns step, the xtime variant with the longer latency but fewer even instructions is used to achieve the best performance. When decrypting, in the InvMixColumns step, xtime is called six times; in this case, calling each variant three times leads to optimal performance. There have been numerous implementations of the AES on GPUs; we, however, only compare against GPUs with support for integer arithmetic. Table 2 compares our GTX 295 and GeForce 8800 GTX implementations with those in [17,33,13]. The GTX 295 results in the table include the memory transfer along with the kernel execution, each stream encrypting 256 random blocks. We further note that although the GTX 295 contains two GPUs (clocked-down versions of the GTX 280 GPU), we benchmark on only one of them—performance on both GPUs scales linearly (assuming the memory transfer over PCI-Express is not a limitation). Since the GT200 series GPUs address many of the limitations of the G80 series GPUs, a direct comparison is not appropriate; nonetheless, we note that our implementations using shared memory to store the T-tables outperform previous GPU implementations of the AES. To our knowledge the previous record on the 8800 GTX is that presented in [13]. As shown in Table 2, our fastest GTX 295 (single-GPU) implementation is roughly 4.1 times faster than [17], 5.3 times faster than [33] and 2.2 times faster than [13]. Additionally, although our implementations target the GT200 GPU, which in addition to the previously mentioned advantages over the G80 has more relaxed memory access pattern restrictions and the ability for concurrent memory access and kernel execution, for completeness we benchmark our fastest implementations on the 8800 GTX with no modification. We also implemented a comparable parallel thread execution (PTX) assembly design using a pre-expanded key (in constant memory). The PTX implementation includes an additional double-buffering (to registers) optimization which allows for the encryption of a block while (hiding the latency in) reading the subsequent block. The peak throughput of our 8800 GTX implementation, measuring only the kernel run time, is 2.5, 3.3 and 1.3 times faster than [17], [33], and [13], respectively. With memory transfer, our PTX implementation is about 1.2 times faster than that of [13], delivering 8.6 Gb/sec versus 6.9 Gb/sec. Moreover, with respect to [17] and [13], our C implementation results are comparable and also include key scheduling. Compared to [13], we do not limit our implementation to CTR mode, for which additional improvements can be made [4]. Finally, when compared to the AES implementation in [33],
our streams encrypt different plaintext messages with different keys; tweaking our implementations for applications of key searching as in [33] would further speed up the AES implementation by at least 35% as only one message block would be copied to the device.
6 Conclusion
New software speed records for AES-128 encryption and decryption when running on 8-bit AVR microcontrollers, the synergistic processing elements of the Cell Broadband Engine architecture, 32-bit ARM microprocessors and NVIDIA graphics processing units are presented. To achieve these performance records a byte-sliced implementation is employed for the first two architectures, while the T-table approach is used for the latter two. Each implementation uses platform-specific techniques to increase performance; the implementations targeting the Cell and GPU architectures process multiple streams in parallel. Furthermore, this is the first AES implementation for the GPU which implements both encryption and decryption.
References

1. AMD. ATI CTM Reference Guide. Technical Reference Manual (2006)
2. Atasu, K., Breveglieri, L., Macchetti, M.: Efficient AES implementations for ARM based platforms. In: Symposium on Applied Computing 2004, pp. 841–845. ACM, New York (2004)
3. Atmel Corporation. 8-bit AVR Microcontroller with 8/16K Bytes of ISP Flash and USB Controller. Technical Reference Manual (2008)
4. Bernstein, D.J., Schwabe, P.: New AES software speed records. In: Chowdhury, D.R., Rijmen, V., Das, A. (eds.) INDOCRYPT 2008. LNCS, vol. 5365, pp. 322–336. Springer, Heidelberg (2008)
5. Biham, E.: A Fast New DES Implementation in Software. In: Biham, E. (ed.) FSE 1997. LNCS, vol. 1267, pp. 260–272. Springer, Heidelberg (1997)
6. Biryukov, A., Khovratovich, D.: Related-key Cryptanalysis of the Full AES-192 and AES-256. Cryptology ePrint Archive, Report 2009/317 (2009), http://eprint.iacr.org/
7. Biryukov, A., Khovratovich, D., Nikolić, I.: Distinguisher and Related-Key Attack on the Full AES-256. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 231–249. Springer, Heidelberg (2009)
8. Blythe, D.: The Direct3D 10 system. ACM Trans. Graph. 25(3), 724–734 (2006)
9. Daemen, J., Rijmen, V.: The Design of Rijndael. Springer, New York (2002)
10. Dongarra, J., Meuer, H., Strohmaier, E.: Top500 Supercomputer Sites, http://www.top500.org/
11. Feldhofer, M., Dominikus, S., Wolkerstorfer, J.: Strong authentication for RFID systems using the AES algorithm. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 85–140. Springer, Heidelberg (2004)
12. Frost & Sullivan: Asia Pacific's Final Wireless Growth Frontier, http://www.infoworld.com/t/networking/passive-rfid-tag-market-hit-486m-in-2013-102
13. Harrison, O., Waldron, J.: Practical Symmetric Key Cryptography on Modern Graphics Hardware. In: USENIX Security Symposium, pp. 195–210 (2008)
14. Hofstee, H.P.: Power Efficient Processor Architecture and The Cell Processor. In: HPCA 2005, pp. 258–262. IEEE Computer Society, Los Alamitos (2005)
15. Käsper, E., Schwabe, P.: Faster and timing-attack resistant AES-GCM. In: Clavier, C., Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747, pp. 1–17. Springer, Heidelberg (2009)
16. Klami, K., Hammond, B., Spencer, M.: ARM Announces 10 Billionth Mobile Processor (2009), http://www.arm.com/news/24403.html
17. Manavski, S.A.: CUDA Compatible GPU as an Efficient Hardware Accelerator for AES Cryptography. In: ICSPC 2007, November 2007, pp. 65–68. IEEE, Los Alamitos (2007)
18. Munshi, A.: The OpenCL Specification. Khronos OpenCL Working Group (2009)
19. National Institute of Standards and Technology (NIST). FIPS-197: Advanced Encryption Standard, AES (2001), http://www.csrc.nist.gov/publications/fips/fips197/fips-197.pdf
20. NVIDIA. NVIDIA GeForce 8800 GPU Architecture Overview. Technical Brief TB02787-001 v0.9 (2006)
21. NVIDIA. NVIDIA CUDA Programming Guide 2.3 (2009)
22. Osvik, D.A.: Cell SPEED. In: SPEED 2007 (2007), http://www.hyperelliptic.org/SPEED/slides/Osvik_cell-speed.pdf
23. Otte, D.: AVR-Crypto-Lib (2009), http://www.das-labor.org/wiki/Crypto-avr-lib/en
24. Owens, J.: GPU architecture overview. In: SIGGRAPH 2007, p. 2. ACM, New York (2007)
25. Poettering, B.: AVRAES: The AES block cipher on AVR controllers (2006), http://point-at-infinity.org/avraes/
26. Rinne, S., Eisenbarth, T., Paar, C.: Performance Analysis of Contemporary Light-Weight Block Ciphers on 8-bit Microcontrollers. In: SPEED 2007 (2007), http://www.hyperelliptic.org/SPEED/record.pdf
27. Seal, D.: ARM Architecture Reference Manual, 2nd edn. Addison-Wesley Professional, Reading (2001)
28. Segal, M., Akeley, K.: The OpenGL Graphics System: A Specification (Version 2.0). Silicon Graphics, Mountain View, CA (2004)
29. Shimizu, K., Brokenshire, D., Peyravian, M.: Cell Broadband Engine Support for Privacy, Security, and Digital Rights Management Applications (October 2005), https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/3F88DA69A1C0AC40872570AB00570985
30. Sloss, A., Symes, D., Wright, C.: ARM System Developer's Guide: Designing and Optimizing System Software. Morgan Kaufmann, San Francisco (2004)
31. Takahashi, O., Cook, R., Cottier, S., Dhong, S.H., Flachs, B., Hirairi, K., Kawasumi, A., Murakami, H., Noro, H., Oh, H., Onish, S., Pille, J., Silberman, J.: The circuit design of the synergistic processor element of a Cell processor. In: ICCAD 2005, pp. 111–117. IEEE Computer Society, Los Alamitos (2005)
32. Tillich, S., Herbst, C.: Boosting AES Performance on a Tiny Processor Core. In: Malkin, T.G. (ed.) CT-RSA 2008. LNCS, vol. 4964, pp. 170–186. Springer, Heidelberg (2008)
33. Yang, J., Goodman, J.: Symmetric Key Cryptography on Modern Graphics Hardware. In: Kurosawa, K. (ed.) ASIACRYPT 2007. LNCS, vol. 4833, pp. 249–264. Springer, Heidelberg (2007)
Attacking the Knudsen-Preneel Compression Functions

Onur Özen¹, Thomas Shrimpton², and Martijn Stam¹

¹ EPFL IC IIF LACAL, Station 14, CH-1015 Lausanne, Switzerland
{onur.ozen,martijn.stam}@epfl.ch
² Dept. of Computer Science, Portland State University, Room 120, Fourth Avenue Building, 1900 SW 4th Avenue, Portland OR 97201, USA
[email protected]
Abstract. Knudsen and Preneel (Asiacrypt’96 and Crypto’97) introduced a hash function design in which a linear error-correcting code is used to build a wide-pipe compression function from underlying blockciphers operating in Davies-Meyer mode. In this paper, we (re)analyse the preimage resistance of the Knudsen-Preneel compression functions in the setting of public random functions. We give a new non-adaptive preimage attack, beating the one given by Knudsen and Preneel, that is optimal in terms of query complexity. Moreover, our new attack falsifies their (conjectured) preimage resistance security bound and shows that intuitive bounds based on the number of ‘active’ components can be treacherous. Complementing our attack is a formal analysis of the query complexity (both lower and upper bounds) of preimage-finding attacks. This analysis shows that for many concrete codes the time complexity of our attack is optimal.
1 Introduction
Cryptographic hash functions remain one of the most used cryptographic primitives, and the design of provably secure hash functions (relative to various security notions) is an active area of research. From an appropriate perspective, most hash function designs can be viewed as the Merkle-Damgård iteration of a blockcipher-based compression function (where a single permutation can be regarded as a degenerate or fixed-key blockcipher). The classical PGV blockcipher-based compression functions [16] have an output size matching the blocksize n of the underlying blockcipher. Yet even for the optimally secure ones [1, 23], the (time) complexity of collision- and preimage-finding attacks is at most 2^{n/2}, resp. 2^n; when n = 128 (e.g. AES) the resulting bounds have been deemed unacceptable for current practice.
This work has been supported in part by the European Commission through the ICT programme under contract ICT-2007-216676 ECRYPT II. Supported by a grant of the Swiss National Science Foundation, 200021-122162. Supported by NSF grants CNS-0627752 and CNS-0845610.
This mismatch between desired output sizes for blockciphers versus hash functions has been recognized early on (dating back to Yuval [27]), and blockcipher-based compression functions that output more than n bits have been put forth. This output expansion is typically achieved by calling the blockcipher multiple times and then combining the resulting blockcipher outputs in some clever way. In the 1990s many so-called double-length constructions (where 2n bits are output) were put forth, but large classes of these were subsequently broken. Only recently have a few double-length constructions been supported by formal security proofs; see e.g. [5, 6, 7, 13, 15] for an overview. In any case, the standard approach in designing wider-output compression functions has been to fix a target output size (and often a target number of blockcipher calls as well) and then to build a compression function that is optimally collision-resistant for that size. In three papers [8, 9, 10], Knudsen and Preneel adopted a different approach, namely to let the output size and (relatedly) the number of blockcipher calls vary as needed in order to guarantee a particular security target. Specifically, given r independent ideal compression functions f_1, ..., f_r, each mapping cn bits to n bits, they create a new 'bigger' compression function outputting rn bits.¹ The f_1, ..., f_r are run in parallel, and each of their inputs is some linear combination of the blocks of message and chaining variable that are to be processed; the rn-bit output of their construction is the concatenation of the outputs of these parallel calls. The elegance of the KP construction is in how the inputs to f_1, ..., f_r are computed. They use the generator matrix of an [r, k, d] error-correcting code over F_{2^c} to determine how the ck input blocks of the 'big' compression function are xor'ed together to form the inputs to the underlying r functions. (In a generalization they consider the f_i as mapping from bcn to bn bits instead and use a code over F_{2^{bc}}.) Under a broad—but prima facie not unreasonable—assumption related to the complexity of finding collisions in parallel compression functions, Knudsen and Preneel show that any attack needs time at least 2^{(d−1)n/2} to find a collision in their construction. Thus for a code with minimum distance d = 3, one obtains a 2^n collision-resistance bound. For preimage resistance, Knudsen and Preneel conjecture that attacks will require at least 2^{(d−1)n} time. They also give preimage- and collision-finding attacks that, curiously, are mostly independent of the minimum distance. For preimage resistance the attacks of Knudsen and Preneel meet their conjectured bound (at least for MDS codes). For collision resistance the story is different, as for many of the codes they consider there is a considerable gap between the actual complexity of their attacks and their 2^{(d−1)n/2} bound. Watanabe [25] has subsequently shown a collision attack that is more efficient than Knudsen and Preneel's attack for many of their parameter sets. He was even able to show that the collision resistance lower bound given by Knudsen and Preneel is wrong for certain parameters (e.g. for 3 < d ≤ k).
¹ Note that Knudsen and Preneel also propose to instantiate the underlying ideal compression functions with a blockcipher run in Davies-Meyer mode and to iterate the compression function to obtain a full blockcipher-based hash function. See the full version for details.
Table 1. Knudsen-Preneel constructions (cf. [10, Table V]) based on a 2n-to-n bit primitive (PuRF or single-key blockcipher). Non-MDS parameters are marked with an asterisk.

                                         Preimage resistance:
Code              sn + mn → sn           Query          Our Attack    KP-Conj.      KP-Attack
[r, k, d]_{2^e}   (2kn → rn)             Complexity     Time          Time          Time
                                         2^{rn/k}       (Sec. 5)      2^{(d−1)n}    (Thm. 1)
[5, 3, 3]_4       (5 + 1)n → 5n          2^{5n/3}       2^{5n/3}      2^{2n}        2^{2n}
[8, 5, 3]_4 *     (8 + 2)n → 8n          2^{8n/5}       2^{8n/5}      2^{2n}        2^{3n}
[12, 9, 3]_4 *    (12 + 6)n → 12n        2^{4n/3}       2^{4n/3}      2^{2n}        2^{3n}
[9, 5, 4]_4 *     (9 + 1)n → 9n          2^{9n/5}       2^{11n/5}     2^{3n}        2^{4n}
[16, 12, 4]_4 *   (16 + 8)n → 16n        2^{4n/3}       2^{7n/3}      2^{3n}        2^{4n}
[6, 4, 3]_16      (6 + 2)n → 6n          2^{3n/2}       2^{3n/2}      2^{2n}        2^{2n}
[8, 6, 3]_16      (8 + 4)n → 8n          2^{4n/3}       2^{4n/3}      2^{2n}        2^{2n}
[12, 10, 3]_16    (12 + 8)n → 12n        2^{6n/5}       2^{6n/5}      2^{2n}        2^{2n}
[9, 6, 4]_16      (9 + 3)n → 9n          2^{3n/2}       2^{2n}        2^{3n}        2^{3n}
[16, 13, 4]_16    (16 + 10)n → 16n       2^{16n/13}     2^{2n}        2^{3n}        2^{3n}
Our contribution. This paper offers a new security analysis of the KP construction when the underlying compression functions are modeled as public random functions (PuRFs). In the process we also introduce a precise formalization of the Knudsen-Preneel transform and, more generally, blockwise-linear schemes. We directly address the conjectured preimage-resistance security by describing an attack taking into account both query and time complexity; we see the latter as especially important when considering attacks. Our attacks go well below the conjectured lower bound by Knudsen and Preneel, demonstrating its incorrectness and, more generally, that intuition about security derived from the number of active functions can be misleading. Our main result is a new preimage attack whose time and query complexity (ignoring constant and logarithmic factors) is summarized² in Tables 1 and 2. From a practical point of view, the time complexity of our attack beats the one given by KP in every case but two, namely when the code is [4, 2, 3]_8 or [5, 2, 4]_8; in both cases we match the original attack, and moreover we show that it is optimal for the former. Startlingly, in the [12, 9, 3]_4 case our preimage attack is even faster than the collision attack proposed by Knudsen and Preneel. So in that case we have uncovered a new collision attack as well!
Reducing the query complexity. We begin with the simple observation that (0^a ∥ x_1) ⊕ (0^a ∥ x_2) yields a string of the form (0^a ∥ X). More generally, any linear combination of strings with the same pattern of fixed zero bits will yield a string with the same form. By restricting the queries (to the PuRFs) to strings with the same (blockwise) pattern, we can optimize the yield (the maximum
² We note that our attack is not specific to the cases c ∈ {2, 3}; it also works, for instance, against the compression functions suggested by Knudsen and Preneel with c = 5 (mimicking the MD4 and MD5 situation).
Table 2. Knudsen-Preneel constructions (cf. [10, Table VIII]) based on a 3n-to-n bit primitive (PuRF or double-key blockcipher).

                                         Preimage resistance:
Code              sn + mn → sn           Query          Our Attack    KP-Conj.      KP-Attack
[r, k, d]_{2^e}   (3kn → rn)             Complexity     Time          Time          Time
                                         2^{rn/k}       (Sec. 5)      2^{(d−1)n}    (Thm. 1)
[4, 2, 3]_8       (4 + 2)n → 4n          2^{2n}         2^{2n}        2^{2n}        2^{2n}
[6, 4, 3]_8       (6 + 6)n → 6n          2^{3n/2}       2^{3n/2}      2^{2n}        2^{2n}
[9, 7, 3]_8       (9 + 12)n → 9n         2^{9n/7}       2^{9n/7}      2^{2n}        2^{2n}
[5, 2, 4]_8       (5 + 1)n → 5n          2^{5n/2}       2^{3n}        2^{3n}        2^{3n}
[7, 4, 4]_8       (7 + 5)n → 7n          2^{7n/4}       2^{9n/4}      2^{3n}        2^{3n}
[10, 7, 4]_8      (10 + 11)n → 10n       2^{10n/7}      2^{2n}        2^{3n}        2^{3n}
number of compression function evaluations an adversary can make given a particular number of queries to the oraclized, ideal objects that underlie it). This observation allows us to reduce the query complexity of a preimage-finding attack to the bare minimum, and it also allows the attack to be deterministic and non-adaptive. The results we derive here (Section 4) are relevant beyond the KP construction. In particular, they apply to all constructions in which the inputs to the blockciphers (or PuRFs) are determined by blockwise linear combinations of blocks of compression function input. This includes the schemes discussed by Peyrin et al. [15] and Seurin and Peyrin [20].
Exploiting the dual code to reduce the time complexity. When mounting our reduced-query attack against a KP construction with parameters [r, k, d]_{2^e}, the result is r lists of partial preimages (under each of the f_i, respectively) and, with high probability, a full preimage is 'hiding' among these lists. That is to say, when we consider all possible combinations of partial preimages, some will correspond to a codeword and others will not. To reduce the time complexity we need to be able to find such a 'codeword' (being an actual, full preimage) efficiently among all possibilities. The main innovation of our attack (Section 5) is in how to find full preimages from the lists of partial preimages. It is based on the observation that codewords in the dual code can be used to express relations between PuRF-inputs that correspond to a codeword. Using known techniques to solve the generalized birthday problem (see e.g. [2, 3, 19, 24]), this allows us to prune the lists and consequently find a preimage for the compression function faster (than a naive approach or than Knudsen and Preneel). In the full version we explore additional reductions in the memory requirements.
Proving optimality. A secondary result of this paper is a security proof for preimage resistance of the Knudsen-Preneel compression functions in the information-theoretic model. Here (Theorem 3) we determine a lower bound on the query complexity for a computationally unbounded adversary to successfully find
preimages. We give a concrete bound and, to interpret it, switch to an asymptotic assessment. This shows that the query complexity of our new attack is essentially optimal (up to a small factor). Since the lower bounds on the query complexity serve as 'best case' lower bounds for the complexity of real-world attacks, we can conclude that our new preimage-finding attack is optimal whenever the time complexity of our attack matches its query complexity. This happens for 9 out of the 16 schemes: for the seven MDS schemes with d = 3, and for the codes [8, 5, 3]_4 and [12, 9, 3]_4. For the remaining seven schemes we leave a gap between the information-theoretic lower bound and the real-life upper bound.
2 Preliminaries
Blockwise-linear compression functions. A compression function is a mapping H : {0,1}^{tn} → {0,1}^{sn} for some blocksize³ n > 0 and integer parameters t > s > 0. For positive integers c and n, we let Func(cn, n) denote the set of all functions mapping {0,1}^{cn} into {0,1}^n. A compression function is PuRF-based if its mapping is computed by a program with oracle access to a finite number of specified oracles f_1, ..., f_r, where f_1, ..., f_r ←$ Func(cn, n). When a PuRF-based compression function operates on input W, we write H^{f_1,...,f_r}(W) for the resulting value. Of primary interest for us will be single-layer PuRF-based compression functions without feedforward. These call all oracles in parallel and compute the output based only on the results of these calls; in particular, input to the compression function is not further considered. Most PuRF-based (and blockcipher-based) compression functions are of a special type. Instead of arbitrary pre- and postprocessing, one finds only functions that are blockwise linear. For example, consider all of the PGV hash functions. An advantage of a blockwise approach is that it yields simple-looking hash functions whose security is easily seen to be determined by the blocksize n. Linearity allows for relatively efficient implementation via bitwise exclusive-or of n-bit blocks. The Knudsen-Preneel construction is also blockwise linear, so let us define formally what is a blockwise-linear single-layer PuRF-based compression function without feedforward, an unwieldy name we shorten to blockwise-linear scheme.
Definition 1 (Blockwise-linear scheme). Let r, c, b, t, s be positive integers and let matrices C^pre ∈ F_2^{rcb×tb}, C^post ∈ F_2^{sb×rb} be given. We define H = BL^b(C^pre, C^post) to be a family of single-layer PuRF-based compression functions H_n : {0,1}^{tn} → {0,1}^{sn}, for all positive integers n with b|n. Specifically, let n̄ = n/b and f_1, ..., f_r ∈ Func(cn, n). Then on input W ∈ {0,1}^{tn} (interpreted as a column vector), H_n^{f_1...f_r}(W) computes the digest Z ∈ {0,1}^{sn} as follows:
³ We include the blocksize in the definition for convenience later on—it is not a necessity and only mildly restrictive.
1. Compute X ← (C^pre ⊗ I_n̄) · W;
2. Parse X = (x_i)_{i=1...r} and for i = 1...r compute y_i = f_i(x_i);
3. Parse (y_i)_{i=1...r} = Y and output Z = (C^post ⊗ I_n̄) · Y,
where ⊗ denotes the Kronecker product and I_n̄ the identity matrix in F_2^{n̄×n̄}.
In the definition above we silently identified {0,1}^n with the vector space F_2^n, etc. The map corresponding to (C^pre ⊗ I_n̄) will occasionally be denoted C^pre. It will be convenient for us to write the codomain of C^pre as a direct sum, so we identify {0,1}^{rcn} with ⊕_{i=1}^{r} V_i where V_i = F_2^{cn} for i = 1, ..., r. If x_1 ∈ V_1 and x_2 ∈ V_2, then consequently x_1 + x_2 will be in V_1 ⊕ V_2. (This extends naturally to L_1 + L_2 when L_1 ⊂ V_1, L_2 ⊂ V_2.) If we want to add 'normally' in F_2^{cn} we write x_1 ⊕ x_2, which conveniently corresponds to exclusive-or, and the result will be in F_2^{cn} as expected.
Preimage resistance. A preimage-finding adversary is an algorithm with access to one or more oracles, and whose goal is to find a preimage of some specified compression function output. We will consider adversaries in two scenarios: the information-theoretic one and a more realistic concrete setting. For information-theoretic adversaries the only resource of interest is the number of queries made to their oracles. Otherwise, these adversaries are considered (computationally) unbounded. In the concrete setting, on the other hand, we are interested in the actual runtime of the algorithm (when fixing any reasonable computational model) and, to a lesser extent, its memory consumption (and code-size⁴). Without loss of generality, in both settings adversaries are assumed not to repeat queries to oracles nor to query an oracle outside of its specified domain. There exist several definitions of preimage resistance, depending on the distribution of the element for which a preimage needs to be found. The strongest notion is that preimage resistance should hold with respect to any distribution, which can be formalized as everywhere preimage resistance [17].
Definition 2 (Everywhere preimage resistance). Let c, r, s, t > 0 be integer parameters, and fix a blocksize n > 0. Let H : {0,1}^{tn} → {0,1}^{sn} be a PuRF-based compression function taking r oracles f_1, ..., f_r ∈ Func(cn, n). The everywhere preimage-finding advantage of adversary A is defined to be
    Adv^epre_H(A) = max_{Z ∈ {0,1}^{sn}} Pr[ f_1, ..., f_r ←$ Func(cn, n), W ← A^{f_1...f_r}(Z) : Z = H^{f_1...f_r}(W) ].
Define Adv^epre_H(q) and Adv^epre_H(t) as the maximum advantage over all adversaries making at most q queries to each of their oracles, respectively running in time at most t.
⁴ We force algorithms to read their own code, so the runtime is naturally lower bounded by the code-size.
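As an illustration of Definition 1 at sub-block granularity (our sketch; the matrix encoding and dimensions are purely illustrative), the map X = (C^pre ⊗ I_n̄) · W amounts to the following: sub-block i of X is the xor of all n̄-bit sub-blocks j of W for which C^pre has a one in position (i, j).

    #include <stdint.h>
    #include <string.h>

    #define NBAR_BYTES 8   /* nbar = 64 bits here, purely for illustration */

    /* Compute X = (Cpre (x) I_nbar) . W over F_2 at sub-block granularity.
     * cpre is an (rcb x tb) 0/1 matrix stored row-major, W consists of tb
     * sub-blocks of nbar bits each, and X of rcb such sub-blocks. */
    void blockwise_linear(uint8_t *X, const uint8_t *cpre,
                          const uint8_t *W, int rcb, int tb)
    {
        for (int i = 0; i < rcb; i++) {
            uint8_t *xi = X + (size_t)i * NBAR_BYTES;
            memset(xi, 0, NBAR_BYTES);
            for (int j = 0; j < tb; j++) {
                if (cpre[i * tb + j]) {
                    const uint8_t *wj = W + (size_t)j * NBAR_BYTES;
                    for (int k = 0; k < NBAR_BYTES; k++)
                        xi[k] ^= wj[k];
                }
            }
        }
    }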
100
¨ O. Ozen, T. Shrimpton, and M. Stam
Linear error-correcting codes. An [r, k, d]_{2^e} linear error-correcting code C is the set of elements (codewords) in a k-dimensional subspace of F_{2^e}^r, where the minimum distance d is defined as the minimum Hamming weight (taken over all nonzero codewords in C). The dual code [r, r−k, d^⊥]_{2^e} is the set of all elements in the (r−k)-dimensional subspace orthogonal to C (with respect to the usual inner product), and its minimum distance is denoted d^⊥. Not all parameter sets are possible; in particular r ≥ k (trivial) and the Singleton bound puts a (crude) limit on the minimum distance: d ≤ r − k + 1. Codes matching the Singleton bound are called maximum distance separable (MDS). The dual code of an MDS code is MDS itself as well, so d^⊥ = k + 1. An [r, k, d]_{2^e} code C can be generated by a matrix G ∈ F_{2^e}^{k×r}, meaning that C = {x · G | x ∈ F_{2^e}^k} (using row vectors throughout). Without loss of generality, we restrict ourselves to systematic generator matrices, that is, G = [I_k | P] for P ∈ F_{2^e}^{k×(r−k)} and I_k the identity matrix in F_{2^e}^{k×k}.
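In particular, for a systematic generator matrix the dual code is easy to write down: if G = [I_k | P], then H = [P^T | I_{r−k}] generates the dual code, since (in characteristic 2) G · H^T = P + P = 0; the attack of Section 5 exploits such dual codewords to express relations among the PuRF inputs.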
3 The Knudsen-Preneel Hash Functions
Knudsen and Preneel [8, 9] introduced a family of hash functions employing error-correcting codes. (We use the journal version [10] as our frame of reference.) Although their work was ostensibly targeted at blockcipher-based designs, the main technical thread of their work develops a transform that extends the range of an 'ideal' compression function (blockcipher-based, or not) in a manner that delivers some target level of security. As is nowadays typical, we understand an ideal compression function to be a PuRF. In fact, the KP transform is a special instance of a blockwise-linear scheme (Definition 1), in which the inputs to the PuRFs are determined by a linear code over a binary field with extension degree e > 1, i.e. F_{2^e}, and with C^post being the identity matrix over F_2^{rb×rb} (corresponding to concatenating the PuRF outputs). The extension field itself is represented as a subring of the matrix ring (of dimension equalling the extension degree) over the base field. We formalize this by an injective ring homomorphism ϕ : F_{2^e} → F_2^{e×e} and let ϕ̄ : F_{2^e}^{r×k} → F_2^{re×ke} be the component-wise application of ϕ and subsequent identification of (F_2^{e×e})^{r×k} with F_2^{re×ke} (we will use ϕ̄ for matrices over F_{2^e} of arbitrary dimensions).
Definition 3 (Knudsen-Preneel transform). Let [r, k, d] be a linear code over F_{2^e} with generator matrix G ∈ F_{2^e}^{k×r}. Let ϕ : F_{2^e} → F_2^{e×e} be an injective ring homomorphism and let b be a positive divisor of e such that ek > rb. Then the Knudsen-Preneel compression function H = KP^b([r, k, d]_{2^e}) equals H = BL^b(C^pre, C^post) with C^pre = ϕ̄(G^T) and C^post = I_{rb}.
If H = KP^b([r, k, d]_{2^e}), then H_n : {0,1}^{kcn} → {0,1}^{rn} with c = e/b is defined for all n for which b divides n. Moreover, H_n is based on r PuRFs in Func(cn, n). For use of H in an iterated hash function, note that per invocation (of H) one can compress (ck − r) message blocks (hence ek > rb ensures that compression actually takes place), and the rate of the compression function is ck/r − 1.
We will concentrate on the case (b, e) ∈ {(1, 2), (2, 4), (1, 3)} and then in particular on the 16 parameter sets given by Knudsen and Preneel. (Since b is uniquely determined given e, we will often omit it.) For an illustrative example of this formalism, please see Appendix A.
Knudsen and Preneel's security claims. Knudsen and Preneel concentrate on the collision resistance of their compression function in the complexity-theoretic model. Under a fairly generous (but plausible) assumption, they essentially⁵ show that if H = KP^b([r, k, d]_{2^e}), then finding collisions in H_n takes time at least 2^{(d−1)n/2}. The intuition behind this result is fairly simple. The use of a code of minimum distance d implies that for any pair of differing compression function inputs W ≠ W′ there are at least d different PuRF inputs. That is, if (x_i)_{i=1..r} and (x′_i)_{i=1..r} are the respective PuRF inputs, then there is an index set I ⊆ {1, ..., r} such that |I| ≥ d and for all i ∈ I it holds that x_i ≠ x′_i. Thus, for W and W′ to collide, one needs to find collisions for the PuRFs f_i for all i ∈ I simultaneously. Also, as the dimension of the code is k, there exist k PuRFs that can be attacked independently, say f_1, ..., f_k. Now, this is where their assumption comes into play. Namely, finding a collision is assumed to take 2^{vn/2} time, where v is the number of PuRFs f_j, j ∈ {k + 1, ..., r}, whose inputs satisfy x_j ≠ x′_j once W ≠ W′. From the Singleton bound, one has r − k ≥ d − 1. Hence, v ≥ d − 1. For preimage resistance Knudsen and Preneel do not give a corresponding theorem and assumption, yet they do conjecture it to be essentially the square of the collision resistance; that is, they conjecture that finding a preimage will take time at least 2^{(d−1)n}.
Known attacks. To lowerbound the security of their construction, Knudsen and Preneel also present two attacks, one for finding preimages [10, Proposition 3] and one for finding collisions [10, Proposition 4]. We summarize these here, using our formalism.
Theorem 1 (Knudsen-Preneel attacks). Let H = KP^b([r, k, d]_{2^e}) be given and consider H_n (with b dividing n). Then
1. Preimages can be found in time max(2^{n(r−k)}, k·2^{rn/k}), using as many PuRF evaluations and requiring ek·2^{(r−k)n/k} n-bit blocks of memory;
2. Collisions can be found in time max(2^{n(r−k)/2}, k·2^{(r+k)n/(2k)}), using as many PuRF evaluations and requiring ek·2^{(r−k)n/(2k)} n-bit blocks of memory.
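For example, for KP¹([8, 5, 3]_4) (so r = 8, k = 5, e = 2), item 1 gives a preimage-finding time of max(2^{3n}, 5·2^{8n/5}) ≈ 2^{3n} and item 2 a collision-finding time of max(2^{3n/2}, 5·2^{13n/10}) ≈ 2^{3n/2}; the former is the value listed in the KP-Attack column of Table 1.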
4
Information-Theoretic Considerations
Bounding the yield. In the information-theoretic setting, the yield (Definition 4) of an adversary captures the number of compression function evaluations the adversary can make given the queries made so far. The concept has proven very fruitful in attacking schemes [12] and proving security [20, 21]. 5
Their actual theorem statements [10, Theorems 3 and 4] are phrased existentially.
102
¨ O. Ozen, T. Shrimpton, and M. Stam
Definition 4. Let H f1 ,...,fr be a compression function based on (ideal) primitives f1 , . . . , fr . The yield of an adversary after a set of queries to f1 , . . . , fr , is the number of inputs to H for which he can compute H f1 ,...,fr given the answers to his queries. With yield(q) we denote the maximum expected yield given q queries to each of the oracles f1 , . . . , fr . For an arbitrary tn-to-sn bit compression function with r underlying cn-to-n primitives (each called once), the known lower bound on the yield [22, Theorem 6] is yield(q) ≥ 2tn (q/2cn )r . However, for the blockwise-linear schemes (Definition 1) it is possible to obtain a much bigger yield (being single-layer does not help either) and in particular it is independent of the number of primitive calls r. Theorem 2. Let H = BLb (Cpre , Cpost ) be a blockwise linear scheme with parameters c, t, s, r. Consider Hn with b dividing n. Then yield(q) ≥ 2
lg q bc bt
≈ q t/c .
Proof. Recall that t is the number of external n-bit input blocks and c the number of internal n-bit input blocks and that all n-bit blocks are subdivided into b n -bit blocks. Set nq = lg q/(bc) and X = (0n −nq × {0, 1}nq )bc . For each of the bc internal input subblocks, set the first n − nq bits identical zero and let the rest range over all possibilities in {0, 1}nq . All combinations of the internal input subblocks are combined (under concatenation) to give (2nq )bc ≤ q distinct inputs for any particular internal function. Query the fi on these inputs (precisely corresponding to X defined above), for i = 1, . . . , r. Consider an external input W that consists of a concatenation of subblocks each with the first n − nq bits set to zero. Then, due to linearity, (Cpre ⊗ In ) · W will map to a collection of PuRF-inputs all corresponding to the queries formed above, and hence this W will contribute to the yield. Since there are 2nq bt possible W that adhere to the format, we get the stated lower bound on the yield. The approximation follows by ignoring the floor and simplifying the resulting expression. Although this can lead to slight inaccuracies, for increasing n there will be more and more values of q for which the expression is precise (so when q = 2αn for rational α the expression is precise infinitely often). Information-theoretic attacks. Intuitively, when the yield for a tn-to-sn compression function gets close to 2sn/2 , a collision is expected (birthday bound) and once it surpasses 2sn a collision is guaranteed (pigeon hole) and a preimage expected. The bounds for permutation-based compression functions by Rogaway and Steinberger [18] are based on formalizing this intuition. In the claims below we relate what the bound on the yield implies for preimage and collision resistance, assuming that the yield results in more or less uniform values. Note that the assumption certainly does not hold in general (hence ‘presumably’), see also the discussion below. Claim (Consequences for blockwise-linear schemes). Let H = BLb (Cpre , Cpost ) be a blockwise linear scheme with parameters c, t, s, r. Consider Hn with b dividing n.
Attacking the Knudsen-Preneel Compression Functions
103
1. If q ≥ 2scn/t then yield(q) ≥ 2sn and a collision in Hn can be found with certainty; preimages can presumably be found with high probability. 2. If q ≥ 2scn/(2t) then yield(q) ≥ 2sn/2 and collisions in Hn can presumably be found with high probability. Information-theoretic security proof. The following result provides a security proof for preimage resistance of the Knudsen-Preneel compression functions in the information-theoretic model. That is, we give a lower bound on the query complexity (for a computationally unbounded adversary) of any preimagefinding attack. This bound shows that the query complexity of our new attack is optimal, up to a small factor. Therefore, the time complexity of our preimage attack is optimal whenever the time complexity of our attack matches its query complexity, and this is the case for 9 out of the 16 Knudsen-Preneel schemes. Theorem 3. Let H = KPb ([r, k, d]2e ) and, for b dividing n, consider Hn based on underlying PuRFs fi ∈ Func(cn, n) for i= 1, . . . , r with c = e/b. Then for q ≤ 2cn queries to each of the oracles and δ ≥ 0 an arbitrary real number: Advepre H (q) ≤
q 1+δ 2(r−k)n
+p
where p = Pr B[kq; 2−n ] > kq (1+δ)/k and B[kq; 2−n ] denotes a random variable counting the number of successes in kq independent Bernoulli trials, each with success probability 2−n . Proof (Sketch). Let Z = z1 || · · · || zr be the range point to be inverted. Recall that f1 , . . . , fk are the functions corresponding to the systematic part of the [r, k, d]2e code. Without loss of generality we will restrict our attention to an adversary A asking exactly q queries to each of its oracles and consider the transcript of the oracle queries and responses. Necessarily, in this transcript there is at least one tuple (x1 , . . . , xk ) of queries to f1 , . . . , fk such that for all i= 1, . . . , k we have fi (xi ) = zi . Notice that because f1 , . . . , fk correspond to the systematic portion of the code, any tuple (x1 , . . . , xk ) of queries to these k PuRFs uniquely defines a tuple of queries (xk+1 , . . . , xr ) to the remaining r − k PuRFs. Thus the number of tuples (x1 , . . . , xk ) in the transcript such fi (xi ) = zi for all i= 1, . . . , k determines the number of tuples (xk+1 , . . . , xr ) that could possibly be a (simultaneous) preimage for zk+1 || · · · || zr . Intuitively, if this number is bounded to be sufficiently small, then the probability that A could have won the epre game will also be small. Assuming kq (1+δ)/k as an upperbound on the number of partial preimages for the systematic portion assures that at most q (1+δ) tuples (by the arithmetic-geometric mean inequality) can possibly ”work” for the non-systematic portion. So under that assumption, the probability of finding a preimage is at most q 1+δ /2(r−k)n . The bound follows. The following corollary makes the theorem more concrete. By considering a parameter δ that provides a good balance between the first and second terms in the bound, it follows that Ω(2rn/k ) queries are necessary to win the epre
104
¨ O. Ozen, T. Shrimpton, and M. Stam
experiment. Since we already knew that O(2rn/k ) queries are sufficient (under a reasonable uniformity assumption), this gives a complete characterization (up to increasingly small factors) of the query-complexity of finding preimages in Knudsen-Preneel construction. Corollary 1. Let H = KPb ([r, k, d]e ). Then asymptoticallly for n (with b divid n r/k ing n) and q ≤ g(n) 2e with g(n) = o(1): Advepre H (q) = o(1) . n r/k and substitute q = g(n) 2e in the state 1 (r−k) ment of Theorem 3. Substitution in the first term yields e (g(n))1+δ which clearly vanishes whenever g does. For the second term in the bound of Theorem 3 we need to bound the tail probability of a binomial distribution. A standard Chernoff bound and substitution of the above δ and q gives that p vanishes as well for g(n) = o(1). Proof (Sketch). Set δ =
5
r(k−1)−k2 r
Preimage Attack against the Knudsen-Preneel Constructions
Setting the stage. Section 4 contains a theoretical attack with a minimal number of queries. This already allows us to turn Knudsen-Preneel’s preimage attack from an adaptive one into a non-adaptive one and reduce its query consumption. In this section, we address reducing the time complexity as well. Let H = KPb ([r, k, d]2e ) and n be given, where b|n (and bn = n and c = e/b as before). Consider a non-adaptive preimage-finding adversary A against Hn , trying to find a preimage for Z ∈ {0, 1}rn. For each i= 1, . . . , r, A will commit to query lists Qi ⊆ Vi which, after querying, will result in a list of partial preimages Li = {xi ∈ Qi |fi (xi ) = zi }. Since fi is presumed random, we can safely assume that Li is a set of approximately |Qi |/2n randomly drawn elements of Qi . Finding a preimage then becomes equivalent to finding an element X in the range of C pre for which
r xi ∈ Li for all i= 1, . . . , r, or—exploiting the direct sum viewpoint—X ∈ i=1 Li . Due to the linearity of C pre at hand, the rangecheck itself is efficient
rfor any given X, so a naive approach would be to simply exhaustively search i=1 Li . This would take time |L|r . An improvement can already be obtained by observing that, when ka systematic matrix G is used to generate the code, any element X1..k ∈ i=1 Vi can uniquely (and efficiently) be extended to some X in the range of C pre . This lies at the heart of Knudsen and Preneel’s adaptive attack and it can be adapted to
k the non-adaptive setting: for all X1..k ∈ i=1 Li compute its unique completion X and check whether for the remaining i= k + 1, . . . , r the resulting xi ∈ Li . This reduces the time-complexity to |L|k . Organization. Still, we can do better. For concreteness, Section 5.1 provides a concrete warm-up example of our attack to the compression function H =
Attacking the Knudsen-Preneel Compression Functions
105
KP([5, 3, 3]4 ). Section 5.2 builds on this and contains the core idea of how to reduce the time complexity of this new non-adaptive attack, as well as its application against compression functions based on MDS codes. The slightly more complicated non-MDS case is discussed in Section 5.3. 5.1
Example: Preimages in KP([5, 3, 3])4 in O(25n/3 ) Time
Before describing our preimage attack in its full generality, we present an example of it applied to the compression function H = KP([5, 3, 3]4 ). Claim. For the compression function H = KP([5, 3, 3]4 ), preimages in Hn can be found in O(25n/3 ) time with a memory requirement of O(24n/3 ) n-bit blocks. Proof. We refer the reader to Appendix A for the details of G, ϕ and Cpre . Let target digest Z = z1 || . . . ||z5 be given, then our aim is to find the PuRF inputs xi = (x1i ||x2i ) ∈ {0, 1}2n such that fi (xi ) = zi holds for all i = 1, . . . , 5, and X = (Cpre ⊗ In ) · W for some compression function input W (where X is comprised of the five xi ). In this case W is a preimage for Z. The attack starts with what we call the Query Phase. Namely, for each i = 1, . . . , 5 and for all x1i , x2i ∈ 0n/6 × {0, 1}5n/6, we query fi (xi ) and keep a list Li of pairs that hit the target digest zi . As a result, a total of 25n/3 queries are made (in 25n/3 time) per PuRF, resulting in |Li | ≈ 25n/3 /2n = 22n/3 (as each query has probability 2−n to hit its target). Since any tuple (x1 , x2 , x3 ) ∈ L1 × . . . × L3 uniquely determines a preimage candidate W , finding a preimage is equivalent to finding an element X ∈ L1 × . . . × L5 in the range of C pre . To do this efficiently, we will first identify the tuples (x1 , x2 , x3 , x4 ) in the lists that can be complemented (not necessarily using x5 ∈ L5 ) to an element in the range. From the generator matrix G it can be seen that this complementation is possible iff x1 ⊕ x2 ⊕ x3 ⊕ x4 = 0. (The task is actually to determine whether a random vector y is a valid codeword; this can easily be detected by checking y · H T = 0 where H is the parity check matrix of the underlying code C.) So let us define L{1,2,3,4} = {(x1 , x2 , x3 , x4 ) ∈ L1 × L2 × L3 × L4 | x1 ⊕ x2 ⊕ x3 ⊕ x4 = 0} . We can construct L{1,2,3,4} efficiently using a standard technique related to the generalized birthday problem. It starts with the Merge Phase, where we create ˜ {1,2} and L ˜ {3,4} defined by the lists L ˜ {1,2} = {((x1 , x2 ), x1 ⊕ x2 ) | (x1 , x2 ) ∈ L1 × L2 } , L ˜ {3,4} = {((x3 , x4 ), x3 ⊕ x4 ) | (x3 , x4 ) ∈ L3 × L4 } L both sorted on their second component. In the Join Phase we look for the ˜ takes collisions in their second components. Since |Li | ≈ 22n/3 , creating either L 4n/3 4n/3 ˜ ) time and O(2 ) memory. (In general, the smallest L is sorted about O(n2 ˜ {1,2} and L ˜ {3,4} both and stored and the other is used for collision check.) Since L 4n/3 have roughly 2 elements and they need to collide on 2n bits, of which n/3 bits
106
¨ O. Ozen, T. Shrimpton, and M. Stam
are set to zero, the expected number of collisions is about (24n/3 )2 /2(2−1/3)n = 2n = |L{1,2,3,4} |. We now have the collision list L{1,2,3,4} and all that needs to be done is to check, for each of its elements, whether the corresponding x5 ∈ Li . If this is the case, then (x1 , x2 , x3 ) is a valid preimage. This final phase we call the Finalization phase. It is clear that it cannot take much longer than it took to create L{1,2,3,4} . Moreover, the expected number of preimages output is 1. Note that |L{1,2,3,4} | ≈ 2n and |L5 | ≈ 22n/3 . Again, we need to check the correspondence on 2n bits, of which n/3 are set to zero. Hence, we do expect to find 2(1+2/3)n /2(2−1/3)n = 1 preimage. Picking up the stepwise time and memory complexities gives the desired result. 5.2
Generic Attack against MDS Schemes
Our attack on the compression function KP([5, 3, 3]4 ) can be generalized to other Knudsen-Preneel compression functions. Note that the attack above consists of four steps: 1. Query phase to generate the lists of partial preimages; 2. Merge phase where two sets of lists are each merged exhaustively; 3. Join phase where collisions between the two merged lists are selected resulting in fewer partial preimages that however are preimage of a larger part of the target digest. 4. Finalization where the remaining partial preimages are filtered for being a full preimage. (As a slight, standard optimization trick to save some memory one could generate and store one merged list only and, when creating the second merged list amortize with the Join and Finalization phases. Further memory optimizations are explored in the full version.) The core observation. From a high level, our approach is simple: we first identify an index set I ⊆ {1, . . . , r} defining a subspace i∈I Vi for which the range of C pre (when restricted to this subspace), is not surjective. By (blockwiselinear) construction, C pre will then map to a subspace of i∈I Vi of at most dimension (|I| − 1)cn (over 2 ). As a consequence, we
will be able to prune significantly the total collection of candidate preimages in i∈I Li , keeping only those elements that are possibly in the range of C pre restricted to i∈I Vi . In the following, we will show how to efficiently find an index set I and how to efficiently prune. It turns out that an important parameter determining the runtime of our preimage attack is d⊥ , the minimum distance of the dual code. Let χ be the function that maps h ∈ r2e to the set of indices of non-zero entries in h. Thus, χ(h) ⊆ {1, . . . , r} and |χ(h)| equals the Hamming weight of the codeword. If h ∈ C⊥
, then for I = χ(h) we have precisely the property that allows us to prune i∈I Li for partial preimages. The following proposition develops the key
Attacking the Knudsen-Preneel Compression Functions
107
result for understanding our attack and the role the dual code plays in it. The interpretation follows the proposition. e×re/b
Proposition 1. Let H = KPb ([r, k, d]2e ) and M ∈ 2 be given. Suppose that M = ϕ(h ¯ T ) for some h ∈ r2e , then for all positive integers n it holds that (M ⊗ In ) · (Cpre ⊗ In ) · W = 0 for all W ∈ {0, 1}ken iff h ∈ C ⊥ .
Proof. Let h ∈ r2e and W ∈ {0, 1}ken be given. Let M = ϕ(h ¯ T ) and recall that pre T = ϕ(G ¯ ) where G is a generator of C. Then C (M ⊗ In ) · (Cpre ⊗ In ) · W = (ϕ(h ¯ T ) ⊗ In ) · (ϕ(G ¯ T ) ⊗ In ) · W ¯ T )) ⊗ In ) · W = ((ϕ(h ¯ T ) · ϕ(G T ) ⊗ In ) · W = (ϕ((Gh) ¯
T The statement that (ϕ((Gh) ¯ ) ⊗ In ) · W = 0 for all W ∈ {0, 1}ken is equivalent T to the statement that ϕ((Gh) ¯ ) = 0. Since ϕ is injective, this in turn is equivalent to (Gh)T = 0. By definition, it holds that Gh = 0 iff h ∈ C ⊥ .
In essence, this proposition tells us that if we are given a codeword h ∈ C ⊥ and an element X ∈ rcn (to be input to the PuRFs), then X can only be in the range 2 of C pre if (ϕ(h ¯ T ) ⊗ In ) · X = 0. Since the only parts of X relevant for this check are those lining up with the nonzero entries of h,
we get that I = χ(h) is the droid we are looking for. Indeed, an element X ∈ i∈χ(h) Li can be completed ¯ ⊗ In ) · (X + 0) = 0 (where we write to an element in the range of C pre iff (ϕ(h) X + 0 for embedding into the larger ⊕ri=1 Vi ). Efficient creation of ⎧ ⎫ ⎨ ⎬ Lh = X ∈ Li | (ϕ(h) ¯ ⊗ In ) · (X + 0) = 0 ⎩ ⎭ i∈χ(h)
is done adapting standard techniques [3, 19, 24] by splitting the codeword in two and looking for all collisions. Suppose that h = h0 + h1 with χ(h0 ) ∩ χ(h1 ) = ∅, and define, for j = 0, 1 ⎫ ⎧ ⎨ ⎬ ˜ hj = (Xj , (ϕ(h L ¯ j ) ⊗ In ) · (Xj + 0)) | Xj ∈ Li . ⎩ ⎭ i∈χ(hj )
˜ h0 , (X1 , Y1 ) ∈ Then Lh consists of the elements X0 + X1 for which (X0 , Y0 ) ∈ L ˜ ˜ Lh1 , and Y0 = Y1 . By sorting the two L ’s the time complexity of creating Lh ˜ h0 , L ˜ h1 , and Lh is then roughly the maximum cardinality of the three sets L involved. It therefore clearly pays dividends to minimize the Hamming weights of h0 and h1 , which is done by picking a codeword h ∈ C ⊥ of minimum distance d⊥ and splitting it (almost) evenly. For an MDS code, we know that d⊥ = k + 1. As a result, if h attains this, the map C pre is injective when restricted to i∈χ(h) Vi (or else the minimum distance
108
¨ O. Ozen, T. Shrimpton, and M. Stam Algorithm 1 (Preimage attack against MDS-based schemes). Input: H = KPb ([r, k, d]2e ), block size n with b|n and target digest Z ∈ {0, 1}rn . Output: A preimage W ∈ {0, 1}tn such that Hn (W ) = Z. 1. Query Phase. Define n
rn
rn
X = ({0} b − ek × {0, 1} ek )e and, for i= 1, . . . , r let Qi = X ⊂ Vi . Query fi on all xi ∈ Qi . Keep a list Li of all partial preimages xi ∈ Qi satisfying fi (xi ) = yi . 2. First Merge Phase. Find a nonzero codeword h ∈ C ⊥ of minimum Hamming weight d⊥ . Let h = h0 + h1 with χ(h0 ) ∩ χ(h1 ) = ∅ and of Hamming weights d⊥ /2 and d⊥ /2 respectively. Create, for j = 0, 1 ⎫ ⎧ ⎨ ⎬ ˜ h = (Xj , (ϕ(h L ¯ j ) ⊗ In ) · (Xj + 0)) | Xj ∈ Li j ⎩ ⎭ i∈χ(hj )
both sorted on their second component. 3. First Join Phase. Create Lh consisting exactly of those elements X0 +X1 ˜ h , (X1 , Y1 ) ∈ L ˜ h , and Y0 = Y1 . for which (X0 , Y0 ) ∈ L 0 1 4. Finalization. For all X ∈ Lh create the unique W corresponding to it and check whether it results in xi ∈ Li for all i= 1, . . . , r. If so, output W .
would be violated). Hence, we know that all possible preimages given all the lists Li ˜ h . We can finalize by simply are represented by the partial preimages contained in L r ˜ checking for all elements in Lh whether its unique completion to X ∈ i=1 Vi corresponds to xi ∈ Li for all i= 1, . . . , r (where checking for i ∈ χ(h) can be omitted). The complete preimage-finding algorithm is given in Algorithm 1. Reinterpreting the example. Let us revisit our preimage attack example on H = KP([5, 3, 3]4 ) to see how it fits within the general framework. In the example we more or less magically came up with the relation x1 ⊕ x2 ⊕ x3 ⊕ x4 = 0. We can now appreciate that this constraint is really imposed by the dual codeword h = (1 1 1 1 0). Thus our example corresponds to Algorithm 1 with χ(h0 ) = {1, 2} and χ(h1 ) = {3, 4} (leading to a completely even division). Note that one can also perform the attack based on other dual codewords of minimum distance, for instance h = 1 w w2 0 1 . These two minimum distance dual codewords can easily be found based on the given systematic generator matrix G = [Ik |P ] of the original code. Namely, the dual codeword in the (j−k)th row of the corresponding generator matrix of the dual code G⊥ = [P T |Ir−k ] is used as h to check the membership for the list Lj for j > k. (In general finding a minimum distance codeword might be more involved, but the dimensions are sufficiently small to allow exhaustive search.)
Attacking the Knudsen-Preneel Compression Functions
109
Analysis of the preimage attack. We proceed with the analysis of the generic preimage attack by providing the justifications of our claims and the overall time and memory complexities. We initially maintain d⊥ in the expressions for future use (when discussing non-MDS codes). The proof of Theorem 4 (together with that of Theorem 5) is given in Appendix B. Theorem 4. Let H = KPb ([r, k, d]2e ) be given and let d⊥ be the minimum distance of the dual code of C. Suppose C is MDS and consider the preimage attack described in Algorithm 1 run against Hn using q = 2rn/k queries (|Qi | = 2rn/k ). Then the expected number of preimages output equals one and the expectations for the internal list sizes are: |Li | = 2
(r−k)n k
, |Lh | = 2
(d⊥ (r−k)−r)n k
˜ h0 | = 2 , |L
d⊥ 2
(r−k)n k
˜ h1 | = 2 , |L
d⊥ 2
(r−k)n k
.
The average case time and memory complexity (expressed in the number cn-bit blocks) of the algorithm is O(2αn ) and O(2βn ) respectively where (substituting d⊥ = k + 1) α = max
r k+1 , k 2
r−k k
which for d = 3 simplifies to α =
r k
,r − k − 1
=1+
2 k
,
k+1 β= 2
and β ≤
r−k k
k+1 k .
In our attack, we set the number of queries as suggested by a yield-based bound. Hence, as long as this first querying phase is dominating, we know that our attack is optimal, as in the case for example against KP([5, 3, 3]4 ). When the querying phase is not dominating (indicated by a gap between the time complexities of our attack and the lower bounds given in Tables 1 and 2) further improvements might be possible. 5.3
Extending the Attack to Non-MDS Constructions
For non-MDS codes we can try to mount the preimage attack given by Algorithm 1, but in the Finalization we encounter a problem. Since d⊥ < k + 1 for non-MDS codes, the map C pre restricted to ⊕i∈χ(h) Vi is no longer injective and we can no longer reconstruct a unique W corresponding to some X ∈ Lh . There are two possible fixes to this problem. One is to simply merge as yet unused lists Li into Lh until reconstruction does become unique. We will refer to this as Algorithm 1 . However, a more efficient approach is to perform a second stage of merging and joining. In Algorithm 2 we simply paste in extra Merge and Join phases in order to maintain the low complexity. We have only included one extra mergejoin phase for non-MDS codes. For the parameters proposed by Knudsen and Preneel, this will always suffice. For other parameters possibly extra merge-join phases are required before full rank is achieved, we did not investigate this.
110
¨ O. Ozen, T. Shrimpton, and M. Stam Algorithm 2 (Preimage attack against non-MDS-based schemes). Input: H = KPb ([r, k, d]2e ), block size n with b|n and target digest Z ∈ {0, 1}rn . Output: A preimage W ∈ {0, 1}tn such that Hn (W ) = Z. 1. Query Phase. As in Algorithm 1. 2. First Merge Phase. As in Algorithm 1. 3. First Join Phase. As in Algorithm 1. 4. Second Merge Phase. Find a codeword h ∈ C ⊥ \ 2e h of minimum Hamming weight (possibly exceeding d⊥ ). Let h = h0 + h1 with χ(h0 ) ∩ χ(h1 ) = ∅, χ(h1 ) ∩ χ(h) = ∅, and of Hamming weights yet to be determined. Create ⎧ ⎫ ⎨ ⎬ ˜ h = L X0 , (ϕ(h ¯ 0 ) ⊗ In ) · (X0 + 0) | X0 ∈ Lh + Li 0 ⎩ ⎭ i∈χ(h0 )\χ(h) ⎧ ⎫ ⎨ ⎬ ˜ h = ) · (X1 + 0) | X1 ∈ X ¯ ) ⊗ I L L . 1 , (ϕ(h i n 1 1 ⎩ ⎭ i∈χ(h1 )
5. Second Join Phase. Create Lh consisting exactly of those elements ˜ h , (X1 , Y1 ) ∈ L ˜ h , and Y0 = Y1 . X0 + X1 for which (X0 , Y0 ) ∈ L 0 1 6. Finalization. For all X ∈ Lh create the unique W corresponding to it and check whether it results in xi ∈ Li for all i= 1, . . . , r. If so, output W .
Analysis of the attack. Although the addition of one extra round of Mergeing and Joining sounds relatively simple, the analysis of it is slightly tedious, mainly because the first joining creates some asymmetry between the lists (that was not present before). We note that in Theorem 5 below the value i for which T1 attains its minimum really only has a choice of two, but its algebraic optimization would not ease readability and obscure the underlying meaning. Note that for the memory analysis, we use a modified version of the algorithm, that is memory-optimized without aversely affecting the running time. Because it is less clear from the theorem what the actual cardinalities will end up being (and consequently which step will be dominating), Table 3 summarizes the relevant quantities for the four non-MDS compression functions KP([r, k, d]4 ) suggested by Knudsen and Preneel [10]. Only for the [9, 5, 4]4 code the second stage dominates the overall runtime. Theorem 5. Let [r, k, d] ∈ {[8, 5, 3], [12, 9, 3], [9, 5, 4], [16, 12, 4]} be given with a generator matrix G for [r, k, d]4 (as given by Magma’s BKLC routine); let d⊥ be the minimum distance of the dual code of C. For H = KP([r, k, d]4 ) consider the preimage attack described in Algorithm 2 run against Hn using q = 2rn/k queries (|Qi | = 2rn/k ). Then, the expected number of preimages output equals
Attacking the Knudsen-Preneel Compression Functions
111
Table 3. Preimage attacks on the KP compression functions based on 2n → n PuRFs and non-MDS-codes Code [r, k, d]2e d [8, 5, 3]4 [12, 9, 3]4 [9, 5, 4]4 [16, 12, 4]4
⊥
4 7 4 11
Cardinalities related to Overall Alg. 1 our attack ˜ h | |L ˜ h | |Lh | max |L ˜ h |, |L ˜ h | |Lh | Time Memory Time |Qi | |L 0 1 0 1 28n/5 24n/3 29n/5 24n/3
26n/5 2n 8n/5 2 25n/3
26n/5 24n/3 28n/5 22n
24n/5 2n 7n/5 2 27n/3
27n/5 24n/3 211n/5 27n/3
2n 2n 22n 22n
28n/5 24n/3 211n/5 27n/3
26n/5 2n 8n/5 2 25n/3
22n 22n 23n 23n
one and the expectations for the internal list sizes are for the first merge-join are as before (see Theorem 4) and for the second merge-join phase ˜ h |, |L ˜ h |) ≤ 2T1 n , min(|L ˜ h |, |L ˜ h |) ≤ 2T2 n , |Lh | ≤ 2(r−k−2)n max(|L 0 1 0 1 where T1 =
min
i∈{0,...,k−d⊥ +1}
max{
(k − d⊥ + 2 − i)(r − k) ((i + d⊥ )(r − k) − r) , } k k
and T2 = r − k + kr − 2 − T1 . The expected time complexity of the algorithm is a small constant multiple of ˜ h1 |, |Lh |, |L ˜ h |, |L ˜ h |, |Lh | max q, |L 0 1 ˜ h0 |, min(|L ˜ h |, |L ˜ h |) (expressed in requiring expected memory around max |L 0 1 the number cn-bit blocks). Choice of code. Our attacks against the four non-MDS codes were based on the generator matrix given by Magma’s BKLC routine. It is conceivable that different, non-equivalent codes perform differently under our attack. Most importantly, they might not have the same d⊥ which will certainly change some of the cardinalities involved in our attack. Although this does not automatically means the attack becomes faster or slower, it is certainly a possibility. We note that there is a trivial bound d⊥ ≤ k (or else the code would be MDS), but in none of the four cases we achieved this bound. Stronger bounds on d⊥ might be possible by extending the recently developed primal-dual distance bounds [11] to the 4 setting.
6
Conclusion
In this paper, we provide a new security analysis of the KP construction by directly addressing its conjectured preimage-resistance security. Firstly, we describe an attack taking into account both query and time-complexities. Our
112
¨ O. Ozen, T. Shrimpton, and M. Stam
attacks demonstrate that the conjectured lower bound by Knudsen and Preneel is incorrect and exemplify that security bounds derived from the number of active functions can be misleading. Secondly, we determine a lower bound on the query complexity for a computationally unbounded adversary to successfully find preimages. This shows that the query complexity of our new attack is essentially optimal (up to a small factor). Moreover, we can conclude that the time complexity of our new preimage-finding attack is optimal for 9 out of the 16 schemes. For the remaining seven schemes we leave a gap between the information-theoretic lower bound and the real-life upper bound. Acknowledgement. We thank the anonymous referees for their comments, in particular pointing out the work of Watanabe [25].
References 1. Black, J., Rogaway, P., Shrimpton, T.: Black-box analysis of the block-cipher-based hash-function constructions from PGV. In: Yung [26], pp. 320–335 2. Camion, P., Patarin, J.: The Knapsack Hash Function proposed at Crypto 1989 can be broken. In: Davies, D.W. (ed.) EUROCRYPT 1991. LNCS, vol. 547, pp. 39–53. Springer, Heidelberg (1991) 3. Chose, P., Joux, A., Mitton, M.: Fast correlation attacks: An algorithmic point of view. In: Knudsen, L.R. (ed.) EUROCRYPT 2002. LNCS, vol. 2332, pp. 209–221. Springer, Heidelberg (2002) 4. Dunkelman, O. (ed.): FSE 2009. LNCS, vol. 5665. Springer, Heidelberg (2009) 5. Fleischmann, E., Gorski, M., Lucks, S.: On the security of Tandem-DM. In: Dunkelman [4], pp. 84–103 6. Fleischmann, E., Gorski, M., Lucks, S.: Security of cyclic double block length hash functions. In: Parker [14], pp. 153–175 7. Knudsen, L., Muller, F.: Some attacks against a double length hash proposal. In: Roy, B.K. (ed.) ASIACRYPT 2005. LNCS, vol. 3788, pp. 462–473. Springer, Heidelberg (2005) 8. Knudsen, L.R., Preneel, B.: Hash functions based on block ciphers and quaternary codes. In: Kim, K.-c., Matsumoto, T. (eds.) ASIACRYPT 1996. LNCS, vol. 1163, pp. 77–90. Springer, Heidelberg (1996) 9. Knudsen, L.R., Preneel, B.: Fast and secure hashing based on codes. In: Kaliski Jr., B.S. (ed.) CRYPTO 1997. LNCS, vol. 1294, pp. 485–498. Springer, Heidelberg (1997) 10. Knudsen, L.R., Preneel, B.: Construction of secure and fast hash functions using nonbinary error-correcting codes. IEEE Transactions on Information Theory 48(9), 2524–2539 (2002) 11. Matsumoto, R., Kurosawa, K., Itoh, T., Konno, T., Uyematsu, T.: Primal-dual distance bounds of linear codes with application to cryptography. IEEE Transactions on Information Theory 52(9), 4251–4256 (2006) 12. Nandi, M., Lee, W., Sakurai, K., Lee, S.: Security analysis of a 2/3-rate double length compression function in black-box model. In: Gilbert, H., Handschuh, H. (eds.) FSE 2005. LNCS, vol. 3557, pp. 243–254. Springer, Heidelberg (2005) ¨ 13. Ozen, O., Stam, M.: Another glance at double-length hashing. In: Parker [14], pp. 176–201
Attacking the Knudsen-Preneel Compression Functions
113
14. Parker, M.G. (ed.): Cryptography and Coding 2009. LNCS, vol. 5921. Springer, Heidelberg (2009) 15. Peyrin, T., Gilbert, H., Muller, F., Robshaw, M.: Combining compression functions and block cipher-based hash functions. In: Lai, X., Chen, K. (eds.) ASIACRYPT 2006. LNCS, vol. 4284, pp. 315–331. Springer, Heidelberg (2006) 16. Preneel, B., Govaerts, R., Vandewalle, J.: Hash functions based on block ciphers: A synthetic approach. In: Stinson, D.R. (ed.) CRYPTO 1993. LNCS, vol. 773, pp. 368–378. Springer, Heidelberg (1994) 17. Rogaway, P., Shrimpton, T.: Cryptographic hash-function basics: Definitions, implications and separations for preimage resistance, second-preimage resistance, and collision resistance. In: Roy, B.K., Meier, W. (eds.) FSE 2004. LNCS, vol. 3017, pp. 371–388. Springer, Heidelberg (2004) 18. Rogaway, P., Steinberger, J.: Security/efficiency tradeoffs for permutation-based hashing. In: Smart, N.P. (ed.) EUROCRYPT 2008. LNCS, vol. 4965, pp. 220–236. Springer, Heidelberg (2008) 19. Schroeppel, R., Shamir, A.: A T = O(2n/2 ), S = O(2n/4 ) algorithm for certain NP-complete problems. SIAM Journal on Computing 10, 456–464 (1981) 20. Seurin, Y., Peyrin, T.: Security analysis of constructions combining FIL random oracles. In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 119–136. Springer, Heidelberg (2007) 21. Shrimpton, T., Stam, M.: Building a collision-resistant compression function from non-compressing primitives. In: Aceto, L., Damg˚ ard, I., Goldberg, L.A., Halld´ orsson, M.M., Ing´ olfsd´ ottir, A., Walukiewicz, I. (eds.) ICALP 2008, Part II. LNCS, vol. 5126, pp. 643–654. Springer, Heidelberg (2008) 22. Stam, M.: Beyond uniformity: Better security/efficiency tradeoffs for compression functions. In: Wagner, D. (ed.) CRYPTO 2008. LNCS, vol. 5157, pp. 397–412. Springer, Heidelberg (2008) 23. Stam, M.: Block cipher based hashing revisited. In: Dunkelman [4], pp. 67–83 24. Wagner, D.: A generalized birthday problem. In: Yung [26], pp. 288–303 25. Watanabe, D.: A note on the security proof of Knudsen-Preneel construction of a hash function (unpublished manuscript) (2006), http://csrc.nist.gov/groups/ST/hash/documents/WATANABE_kp_attack.pdf 26. Yung, M. (ed.): CRYPTO 2002. LNCS, vol. 2442. Springer, Heidelberg (2002) 27. Yuval, G.: How to swindle Rabin. Cryptologia 3, 187–189 (1979)
A
Illustrating the KP-Transform: KP1 ([5, 3, 3]22 )
Consider the compression function H = KP([5, 3, 3]4 ). This builds a 6n → 5n compression function using five underlying PuRFs each mapping 2n → n (so the rate is (2 · 3 − 5)/5 = 1/5). The preprocessing function C pre of H is defined by a generator matrix G of the code [5, 3, 3]4 . Using the G proposed in [10] and defining ϕ by (that is also the one given in [10]) 00 10 11 01 ϕ(0) = , ϕ(1) = , ϕ(w) = and ϕ(w2 ) = . 00 01 01 11
¨ O. Ozen, T. Shrimpton, and M. Stam
114
we get ⎞ 1000001010 ⎜0 1 0 0 0 0 0 1 0 1⎟ ⎟ ⎜ ⎜0 0 1 0 0 0 1 0 1 1⎟ ⎟ . ⎜ =⎜ ⎟ ⎜0 0 0 1 0 0 0 1 0 1⎟ ⎝0 0 0 0 1 0 1 0 0 1⎠ 0000010111 ⎛
⎛
⎞ 1001 1 G = ⎝ 0 1 0 1 w ⎠ and Cpre 0 0 1 1 w2
Therefore, given W ∈ {0, 1}6n , Hn computes the digest Z ∈ {0, 1}5n as follows: 1. Compute X ← (Cpre ⊗ In ) · W ; 2. Parse X = (xi )i=1...5 and for i = 1...5 compute yi = fi (xi ); 3. Parse (yi )i=1...5 = Y and output Z = (I5 ⊗In )·Y , equivalently Z = y1 ||...||y5 .
B
Runtime Analysis (Proof of Theorems 4 and 5)
We will proceed step by step to prove our claims. The first three steps are common for the MDS and non-MDS case (where for MDS codes d⊥ = k + 1). The remaining steps are treated separately. In the complexity estimations below, we concentrate on expected values and largely ignore (the effects of) polynomial factors in n (e.g. due to memory access). Throughout memory is measured in multiples of cn-bit blocks. 1. Query Phase. The time complexity of this step is simply 2rn/k PuRF evaluations as q = 2rn/k . Per Li we need q/2n = 2(r−k)n/k memory. 2. First Merge Phase. The main computational part of this step is the ˜ hj for j = 0, 1. The time required for generating L ˜ h0 and generation of the lists L |χ(h )| 0 ˜ h1 essentially equals their respective sizes, namely |Li | L and |Li ||χ(h1 )| . We left i unspecified since all the Li should be about the same size, namely |Li | ≈ 2(r−k)n/k . Since by construction, |χ(h0 )| = d⊥ /2 and |χ(h1 )| = d⊥ /2, ⊥ ⊥ the relevant cardinalities become 2(r−k)d /2n/k and 2(r−k)d /2 n/k . This is clearly dominated by the latter. 3. First Join Phase. This step constructs Lh by finding collisions between ˜ ˜ h1 in their second components. Since |L ˜ h0 | · |L ˜ h1 | ≈ 2d⊥ (r−k)n/k and Lh0 and L ⊥ we are interested in collisions on rn/k bits, we have |Lh | ≈ 2(d (r−k)−r)n/k (which equals 2(r−k−1)n for MDS codes, given that d⊥ = k + 1). Each colliding element can be forwarded directly to the next step eliminating a need to store Lh . Moreover, the collision search can be performed in conjunction with step 2 ˜ h0 and checking (and processing) collisions on the fly when generating storing L ˜ h1 . This way the memory requirements are reduced to |L ˜ h0 | ≈ 2(r−k)d⊥ /2n/k . L 4. Second Merge Phase and 5. Second Join Phase (for non-MDS codes). For this scenario we restrict to the four codes suggested by Knudsen and Preneel. For the (chosen) systematic generator matrices we can always find (by inspection) h, h ∈ C ⊥ with the property that h has minimal weight, {1, . . . , k} ⊂
Attacking the Knudsen-Preneel Compression Functions
115
χ(h) ∪ χ(h ) (so we reach reach full rank and can Finalize afterwards) and the number of i ∈ χ(h ) for which i ∈ / χ(h) equals k − d⊥ + 2. As a result, the relation defined by h (and thus the second phase) will involve / χ(h) and i ∈ χ(h )) with |Li | = k − d⊥ + 2 ‘fresh’ lists Li (those for which i ∈ (r−k)n/k (d⊥ (r−k)−r)n/k 2 , as well as Lh , for which |Lh | = 2 . Hence, regardless of the way of Mergeing and Joining there will be 2
(d⊥ (r−k)−r)n k
·2
(k−d⊥ +2)(r−k)n k
= 2(r−k+ k −2)n r
elements in total to be checked for collisions. As in the previous step, collisions will be searched for on rn/k bits. This leads to a list Lh of roughly |Lh | = 2(r−k−2)n elements at the end of Second Join Phase. To minimize the complexity of the merging phase, we need to find the sets r χ(h0 ) and χ(h1 ) such that χ(h0 ) ∩ χ(h1 ) = ∅ and the full 2(r−k+ k −2)n elements (involved in the merging) are distributed as evenly as possible without violating the constraints imposed by the asymmetric list sizes. ˜ h | = 2αn and |L ˜ h | = 2βn for α and β to be determined. The Suppose |L 0 1 condition χ(h) ∩ χ(h1 ) = ∅ implies that Lh is assigned to h0 ; assume that i further fresh Li are used for h0 . This automatically means that |χ(h1 )| = ˜ h | = 2(k−d⊥ +2−i)(r−k)n/k and |L ˜ h | = (k − d⊥ + 2 − i) and furthermore that |L 1 0 ⊥
2i(r−k)n/k+(d (r−k)−r)n/k , implying α = ((d⊥ + i)(r − k) − r)/k. Given a particular i, the merging time will be governed by the maximum of α and β whereas the storage requirement is similarly the minimum of that pair. In order to optimize the overall time complexity, we take the minimum (of the maximum just mentioned) over all i and denote the value by T1 and, for the value i used, denote by T2 the corresponding ‘memory’-minimum. Note that T1 + T2 = r − k + kr − 2. Collision finding can then be performed in 2T1 n time with a memory requirement of roughly 2T2 n . 6. Finalization. For each element in Lh (resp. in Lh for non-MDS codes) we need to perform a simple check (that we assume costs unit time and constant memory). For MDS codes, after the First Join Phase, we have that Lh has size roughly |Lh | = 2(r−k−1)n . For the non-MDS case, we have already shown that |Lh | = 2(r−k−2)n (at least for the four non-MDS codes provided by Knudsen and Preneel). Picking up the obtained complexities for the various steps gives the desired overall complexity.
Finding Preimages of Tiger Up to 23 Steps Lei Wang1 and Yu Sasaki2 1
The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo, 182-8585 Japan
[email protected] 2 NTT Information Sharing Platform Laboratories, NTT Corporation 3-9-11 Midori-cho, Musashino-shi, Tokyo, 180-8585 Japan
[email protected]
Abstract. This paper evaluates the preimage resistance of the Tiger hash function. We will propose a pseudo-preimage attack on its compression function up to 23 steps with a complexity of 2181 , which can be converted to a preimage attack on 23-step Tiger hash function with a complexity of 2187.5 . The memory requirement of these attacks is 222 words. Our pseudo-preimage attack on the Tiger compression function adopts the meet-in-the-middle approach. We will divide the computation of the Tiger compression function into two independent parts. This enables us to transform the target of finding a pseudo-preimage to another target of finding a collision between two independent sets of some internal state, which will reduce the complexity. In order to maximize the number of the attacked steps, we derived several properties or weaknesses in both the key schedule function and the step function of the Tiger compression function, which gives us more freedom to separate the Tiger compression function. Keywords: Tiger, hash function, meet-in-the-middle, preimage attack, independent chunks.
1
Introduction
Tiger is a cryptographic hash function designed by Anderson and Biham [1]. It adopts the well-known Merkle-Damg˚ ard structure, and produces 192-bit hash digests. Throughout this paper, “Tiger” and “tiger” are referred to as the Tiger hash function and the Tiger compression function respectively. This paper will evaluate the preimage resistance of Tiger. If Tiger is secure, it should take no less than 2192 tiger computations to find a preimage of a given hash digest. At WEWoRC 2007, Indesteege et al. proposed a preimage attack on Tiger reduced to 13 steps with a complexity of 2128.5 [2], where the full version of Tiger consists of 24 steps. At FSE 2009, Isobe et al. published another preimage attack on Tiger, which extended the number of the attacked steps to 16 with a complexity of 2161 and a memory requirement of 232 words [3]. At AFRICACRYPT 2009, Mendel published his preimage attack on Tiger up to 17 steps with a complexity of 2185 and a memory requirement of 2160 words [4]. S. Hong and T. Iwata (Eds.): FSE 2010, LNCS 6147, pp. 116–133, 2010. c International Association for Cryptologic Research 2010
Finding Preimages of Tiger Up to 23 Steps
117
Besides preimage resistance, cryptographers also pay attention to the collision resistance of Tiger. Several papers have been published to evaluate the collision resistance of Tiger [5] [6] [7]. Here we only point out that the maximum number of the attacked steps of Tiger in the sense of collision resistance is 19 [6]. Our contributions. This paper will propose a preimage attack on Tiger up to 23 steps with a complexity of 2187.5 tiger computations, which are lower than the exhaustive search complexity. This attack is based on a meet-in-the-middle pseudo-preimage attack on tiger with a complexity of 2181 . The memory requirement of the above attacks is 222 words. A comparison with previous related works is shown in Table 1. Table 1. Comparison with previous preimage attacks Reference #steps Complexity Memory requirement [2] 13 2128.5 Negligible [3] 16 2161 232 [4] 17 2185 2160 187.5 This paper 23 2 222
The applicability of the meet-in-the-middle pseudo-preimage attack on tiger essentially depends on the existence of two sets of message words independent from each other and suitable for applying the attack. This paper denotes the independent sets of message words as independent chunks. If such independent chunks do exist as a matter of fact (maybe cryptographers have not found them yet), the preimage resistance of Tiger will surely be broken by the meet-inthe-middle attack. In order to evaluate the maximum number of the attacked steps, we exploit all the properties we found on tiger. From its key schedule function, we derive several properties which can be adopted to make message words independent from each other. In specific, we use the following properties. 1) Bit-shift operations eliminate some information. This gives more freedom to search for independent chunks. 2) In our attack, we add several least significant bits of two variables and several most significant bits of the same two variables. The large word-size (64 bits) helps us to make these two additions independent because the carry from the lower bits is hard to transmit to the upper bits due to the large number of intermediate bits. 3) Even if Tiger uses addition, subtraction, and XOR as its operations, they can be linearized by setting conditions, and it is possible to cancel two different operations. From its step function, we find several properties that enable us to make the related techniques work for more steps. Finally we find the independent chunks that can be applied for a preimage attack on Tiger reduced to 23 out of 24 steps. Organization of the paper. Section 2 describes the specification of Tiger. Section 3 introduces the meet-in-the-middle preimage attack procedure on Tiger. Section 4 shows our independent chunks feasible up to 23 steps. Section 5 illustrates the preimage attack procedure. Section 6 gives a conclusion.
118
2
L. Wang and Y. Sasaki
Specification of Tiger
An input message M of Tiger will be padded and then divided into 512-bit message blocks {M0 , M1 , . . . , Ml−1 }. The padding rule is simple: first add a single ‘1’, then add a minimum number of ‘0’s to make the bit length become 448 modulo 512, and finally add the bit length of the original M to the last 64 bits. Message blocks will be fed into tiger sequentially from M0 until Ml−1 as follows: hi+1 ← tiger(hi , Mi ), for i = 0, 1, . . . , l − 1, where h0 is a public constant and each hi from {h0 , . . . , hl } has 192 bits. hl is the hash digest of M . Specification of tiger. The inputs hi and Mi are divided into 64-bit variables, denoted as (A0 , B0 , C0 ) and (X0 , X1 , . . . , X7 ) respectively. Correspondingly addition, subtraction, and multiplication are carried out with modulo 264 . Hereafter we will omit the description “modulo 264 ” for simplicity. tiger consists of 24 step functions, regrouped into three 8-step passes. The step function at step t (1 ≤ t ≤ 24) is as follows, which is also shown in Fig. 1:
Fig. 1. Step function
At = (Bt−1 + odd(Ct−1 ⊕ Xt−1 )) × st−1 , Bt = Ct−1 ⊕ Xt−1 , Ct = At−1 − even(Ct−1 ⊕ Xt−1 ), where st−1 is a constant, even(·) and odd(·) are two non-linear functions based on S-boxes, and Xt−1 (9 ≤ t ≤ 24) is derived from {X0 , X1 , . . . , X7 }. The constant st−1 differs for each pass, which is 5, 7, and 9 for the first, second, and third passes respectively. Details of even(·) and odd(·) are as follows: even(C) = T0 (c[0]) ⊕ T1 (c[2]) ⊕ T2 (c[4]) ⊕ T3 (c[6]), odd(C) = T3 (c[1]) ⊕ T2 (c[3]) ⊕ T1 (c[5]) ⊕ T0 (c[7]), where each T from {T0 , T1 , T2 , T3 } is a S-box mapping 8-bit values to 64-bit values, and the input C is divided to 8 bytes (c7 , c6 , · · · , c0 ) with c7 as the most significant byte and c0 as the least significant byte.
Finding Preimages of Tiger Up to 23 Steps
119
The variables of {X8 , . . . , X23 } are derived from {X0 , . . . , X7 } by computing a Key Schedule Function (KSF ): (X8 , . . . , X15 ) = KSF (X0 , . . . , X7 ), (X16 , . . . , X23 ) = KSF (X8 , . . . , X15 ). Here we will pick (X8 , · · · , X15 ) as an example to describe the details of KSF . Y0 = X0 − X7 ⊕ const1 ,
X8 = Y0 + Y7 ,
Y1 = X1 ⊕ Y0 , Y2 = X2 + Y1 ,
X9 = Y1 − (X8 ⊕ ((¬Y7 ) 19)), X10 = Y2 ⊕ X9 ,
Y3 = X3 − (Y2 ⊕ ((¬Y1 ) 19)),
X11 = Y3 + X10 ,
Y4 = X4 ⊕ Y3 , Y5 = X5 + Y4 ,
X12 = Y4 − (X11 ⊕ ((¬X10 ) 23)), X13 = Y5 ⊕ X12 ,
Y6 = X6 − (Y5 ⊕ ((¬Y4 ) 23)), Y7 = X7 ⊕ Y6 ,
X14 = Y6 + X13 , X15 = Y7 − X14 ⊕ const2 ,
where const1 and const2 are 0xA5A5A5A5A5A5A5A5 and 0x0123456789ABCDEF respectively, and ¬ means bitwise complement. KSF is invertible. We will denote by KSF −1 the inverse computation of KSF in this paper. Finally the output hi+1 is computed as follows: hi+1 = (A24 ⊕ A0 )||(B24 − B0 )||(C24 + C0 ).
3
Meet-in-the-Middle Preimage Attack on Tiger
This section introduces the application of a meet-in-the-middle attack procedure, which was proposed by Aoki et al. [8], to preimage attacks on Tiger. Isobe et al.’s preimage attack on Tiger up to 16 steps adopted this meet-in-the-middle attack approach [3]. 3.1
Notations
The notations in Table 2 are used to explain the meet-in-the-middle preimage attack procedure. We will describe (Xi2 , Xj ) as independent words, and (X 2 , X ) as independent chunks. Xt ∈ X 2 sometimes is denoted as Xt2 for simplicity. Similarly Xt , Xt,2 and Xt∗ denote Xt ∈ X , Xt ∈ X ,2 and Xt ∈ X ∗ respectively. During the independent computations of E 2 and E , the internal states will be denoted as (A2 , B 2 , C 2 ) and (A , B , C ) correspondingly. 3.2
Meet-in-the-Middle Preimage Attacks on Tiger
A preimage attack on Tiger is constructed by combining a meet-in-the-middle pseudo-preimage attack on tiger and a meet-in-the-middle attack on MerkleDamg˚ ard structure.
120
L. Wang and Y. Sasaki Table 2. Notations for our meet-in-the-middle preimage attack
Xi2 , Xj : X 2: X : X ,2 : X ∗: E 2: E :
Two message words whose values change independently. A set of message words which change with only Xi2 . A set of message words which change with only Xj . A set of message words which change with both Xi2 and Xj . A set of message words which are fixed as constant values. Consecutive step functions with input message words from only X 2 X ∗ . Consecutive step functions with input message words from only X X ∗ .
Pseudo-preimage attacks on tiger. Tiger is designed following the DaviesMeyer scheme. Recall the structure of Davies-Meyer: h = E(M, h) ⊕ h, where E is a block cipher, M is a message block, h is the current intermediate hash value, and h is the next intermediate hash value. More precisely, M is expanded to X0 || · · · ||X23 . Note that h is not calculated by h ⊕ E(M, h) in tiger. But in this section, we regard h as h ⊕ E(M, h) for simplicity. The main novelty of the pseudo-preimage attacks on tiger is dividing X0 || · · · ||X23 into suitable independent chunks. The simplest case is X0 || · · · ||X23 −→ X ||X 2 , which is also shown in Fig. 2. The high-level description of finding a pseudo-preimage (h, M ) for a given value h is as follows. 1. Set a random value to h, which also fixes the output of E as h ⊕ h. 2. For all the values of X , calculate E (h, X ), and store them in a table T . 3. For each value of X 2 , calculate E 2 (h ⊕ h , X 2 ), and compare it with all the elements in T . If it is equal to one element in T , a pseudo-preimage of h is found. 4. If no pseudo-preimage is found after trying all the values of X 2 , change the value of h at step 1, and repeat steps 2 − 4. Suppose there is enough degree of freedom for the independent chunks. The above meet-in-the-middle attack procedure only takes 296 tiger computations and 296 memories to find a pseudo-preimage with a good probability. Moreover, the above attack procedure can be transformed to a memoryless meet-in-themiddle attack [9], where the complexity becomes 297 tiger computations. Meet-in-the-middle attacks on Merkle-Damg˚ ard [10]. Suppose finding a pseudo-preimage on tiger takes 2s tiger computations. Denote by h the given hash digest. First generate 2 m
2
192−s 2
i
192−s 2
i
pseudo-preimages of h : {(h1 , m1 ), . . . , (h2
192−s 2
,
)}, where tiger(h , m ) = h . Then randomly select a message m, calculate n−s
tiger(h0 , m), and compare it with all the values of {h1 , . . . , h2 2 }. If it is equal 192+s to hi for some i, m||mi is a preimage of h . After 2 2 m are tried, one preimage 192+s will be found with a good probability. The total complexity is 2 2 +1 tiger com192−s putations and 2 2 memories, which will be lower than the exhaustive search complexity as long as s < 190.
Finding Preimages of Tiger Up to 23 Steps
3.3
121
Related Techniques
The applicability of the meet-in-the-middle pseudo-preimage attack on tiger depends on whether suitable independent chunks exist in X0 || · · · ||X23 . The example in Section 3.2 is the simplest case. Usually the attacker has to deal with more complicated cases. Cryptographers have developed several techniques for more complicated cases. Aoki et al. proposed splice-and-cut, partial-matching and partial-fixing [8]. Sasaki et al. proposed initial-structure [11]. Splice-and-cut. This technique is based on the fact that once the value h is determined, the output of E will be fixed as h ⊕ h. Therefore, the first step and the last step of E can be regarded to be consecutive. For example, the attacker obtains the independent chunks as follows: X0 || · · · ||X23 −→ X ||X 2 ||X , which is also shown in Fig. 3. Obviously, the procedure in Section 3.2 cannot be applied directly. However, by adopting the splice-and-cut technique, the attacker will randomly determine the internal state IS, where X and X 2 separate from each other, and then compute E and E 2 independently.
Fig. 2. Simplest meet-in-the-middle attack
Fig. 3. Splice-and-cut
Fig. 4. Partial-matching and partial-fixing
Fig. 5. Initial-structure
Partial-matching and partial-fixing. These two techniques are based on the fact that the output of one step function can be partially determined with the knowledge of only part of the input. Therefore internal states at different step positions can be partially compared if their step distance is reasonable. For instance, the attacker obtains the independent chunks as follows: X0 || · · · ||X23 −→ X ||X ,2 ||X 2 , which is also shown in Fig. 4. In such a case, during applying the attack procedure in Section 3.2, the internal state E (h, X ) is not at the same step position as the internal state E 2 (h ⊕ h , X 2 ), but with a severalstep distance. By adopting the partial-matching and partial-fixing techniques, E (h, X ) and E 2 (h ⊕ h , X 2 ) will be partially compared.
122
L. Wang and Y. Sasaki
Initial-structure. We will pick an example to illustrate this technique. Suppose the attacker obtains two independent chunks as follows: X0 || · · · ||X23 −→ X || X ,2 ||X 2 ||X , which is also shown in Fig. 5. By adopting the initial-structure technique, for each value of X , the attacker generates a corresponding IS . For each value of X 2 , the attacker generates a corresponding IS 2 . Moreover, for any pair of (IS , X ) and any pair of (IS 2 , X 2 ), IS always matches with IS 2 using X ,2 . Therefore, the attacker can carry out the independent calculations E and E 2 using (IS , X ) and (IS 2 , X 2 ) respectively.
4
Our Independent Chunks
As we discussed in Section 3, one most important part of the meet-in-the-middle pseudo-preimage attack on tiger is how to separate the message words into two independent chunks (X 2 , X ), which is hard because the key schedule function of tiger is complicated. We implemented an automated independent chunk search program based on several properties of the key schedule function and the step function of tiger that we found. For the details of our program, refer to the full version of this paper [12]. This section will describe the independent chunks, which can be used to attack 23-step tiger. The independent words are X15 and X23 , which will be denoted 2 as X15 and X23 respectively. The overview of the two independent chunks is detailed in Table 3, following the notations in Table 2. Table 3. Our independent chunks X02
X12
X22
Y0,2 Y1,2 Y2
Y3∗
X4∗ Y4∗
X5∗ Y5∗
X6∗ Y6∗
X7,2 Y7,2
X9
∗ X10
∗ X11
∗ ∗ ∗ X12 X13 X14
,2 X15
Y8,2 Y92
2 Y10
2 Y11
2 Y12
2 Y14
,2 Y15
2 X18
2 X19
2 2 2 X20 X21 X22
,2 X23
X8∗ 2 X16
4.1
X3,2
2 X17
2 Y13
The Independent Chunk X
This section will explain the independence/dependence from X23 for each mes∗ ∗ sage word in detail. In this section the notation Xi (resp. Yi ) means that Xi (resp. Yi ) is independent from X23 . Roughly speaking, we will first regard the message words X16 , . . . , X22 as independence from X23 , and then determine the relation between the other message words and X23 backwards utilizing the properties of KSF −1 . Before explaining the details for each message word, we point out that several conditions are set on the message words in order to make this chunk work, which in order to make are listed in Table 4. We can only change the 19 MSBs of X23 Y9 be independent from it. More details are given below.
Finding Preimages of Tiger Up to 23 Steps
123
Table 4. The conditions on the message words Xi,j2 −j1 (resp. Yi,j2 −j1 ) is the consecutive bits from j1 to j2 of Xi (resp. Yi ). X0,63−45 = 1 · · · 1; X1,63−45 = const1,63−45 ; X2,63−45 = 0 · · · 0; Y6,63−45 = const1,63−45 ; Y7,44−26 = 0 · · · 0 X8,63−45 = 1 · · · 1; X10,63−45 = 0 · · · 0; X14,63−45 = const2,63−45 ; Y9,63−45 = 0 · · · 0; Y14,63−45 = const1,63−45 ; X16,63−45 = 1 · · · 1; No carry occurs from bits 44 to 45 during the following computations: X16 − Y15 ; Y8 + (X15 ⊕ const1 ); X15 + (X14 ⊕ const2 ); X9 + (X8 ⊕ ((¬Y7 ) 19)); X8 − Y7 ; Y0 + (X7 ⊕ const1 ); Y2 − Y1 ; The message words (Y8 , . . . , Y15 ). Y10 , . . . , Y14 are independent from X23 −1 ∗ ∗ because they are computed by KSF using X17 , . . . , X22 . ∗ – Y15 : Y15 = X23 + (X22 ⊕ const2 ) Obviously the 19 MSBs of Y15 will change with X23 . ∗ ∗ – Y9 : Y9 = X17 + (X16 ⊕ ((¬Y15 ) 19)) Since only the 19 MSBs of Y15 change with X23 and these bits disappear after the bit-shift operation, Y9 is independent from X23 . This is also the reason why we can only change the 19 MSBs of X23 . ∗ – Y8 : Y8 = X16 − Y15 The 19 MSBs of Y8 will change with X23 . Moreover, from two conditions in Table 4: (1) X16,63−45 = 1 · · · 1; and (2) no carry occurs from bits 44 to 45 during X16 − Y15 , we can get that the 19 MSBs of Y8 are always the , which is denoted as Y8,63−45 = bitwise complement of the 19 MSBs of Y15 ¬Y15,63−45 . Hereafter we will denote all the message words, which change with X23 , as equations on Y15,63−45 . The message words (X8 , . . . , X15 ). X10 , . . . , X14 are independent from X23 −1 ∗ ∗ because they are computed by KSF using Y9 , . . . , Y14 . ∗ ⊕ Y14 – X15 : X15 = Y15 The 19 MSBs of X15 will change with X23 . From one condition in Ta ble 4: Y14,63−45 = const1,63−45 , we can get that X15,63−45 = Y15,63−45 ⊕ const1,63−45 . – X9 : X9 = Y9∗ ⊕ Y8 The 19 MSBs of X9 will change with X23 . From one condition in Table 4: Y9,63−45 = 0 · · · 0, we can get that X9,63−45 = Y8,63−45 = ¬Y15,63−45 .
124
L. Wang and Y. Sasaki
– X8 : X8 = Y8 + (X15 ⊕ const1 ) Since Y8 and X15 will only change their 19 MSBs with X23 , the 45 LSBs of X8 , namely X8,44−0 , are independent from X23 . Moreover, from one condition in Table 4: no carry occurs from bits 44 to 45 during Y8 + (X15 ⊕ const1 ), we can get that X8,63−45 = Y8,63−45 + (X15,63−45 ⊕ const1,63−45 ) = ) + (Y15,63−45 ⊕ const1,63−45 ⊕ const1,63−45 ) = 1 · · · 1. Note that (¬Y15,63−45 X8,63−45 is predetermined to be 1 · · · 1 as a condition in Table 4. Therefore X8,63−45 does not change with X23 . Finally we get that X8 is independent . from X23 beThe messag words (Y0 , . . . , Y7 ). Y3 , . . . , Y6 are independent from X23 −1 ∗ ∗ using X10 , . . . , X14 . cause they are computed by KSF ∗ – Y7 : Y7 = X15 + (X14 ⊕ const2 ) The 19 MSBs of Y7 will change with X23 . From two conditions in Table 4: (1) X14,63−45 = const2,63−45 ; and (2) no carry occurs from bits 44 to 45 during = X15,63−45 = Y15,63−45 ⊕ X15 + (X14 ⊕ const2 ), we can get that Y7,63−45 const1,63−45 . ∗ ⊕ X9 – Y2 : Y2 = X10 The 19 MSBs of Y2 will change with X23 . From one condition in Table 4: X10,63−45 = 0 · · · 0, we can get that Y2,63−45 = X9,63−45 = ¬Y15,63−45 . ∗ – Y1 : Y1 = X9 + (X8 ⊕ ((¬Y7 ) 19)) , which disappear afBecause Y7 will only change its 19 MSBs with X23 ∗ ter the bit-shift operation, X8 ⊕ ((¬Y7 ) 19) is independent from X23 . Therefore the 19 MSBs of Y1 will change with X23 . From three conditions in Table 4: (1) X8,63−45 = 1 · · · 1; (2) Y7,44−26 = 0 · · · 0; and (3) no carry occurs from bits 44 to 45 during X9 + (X8 ⊕ ((¬Y7 ) 19)), we can get Y1,63−45 = X9,63−45 = ¬Y15,63−45 . ∗ – Y0 : Y0 = X8 − Y7 The 19 MSBs of Y0 will change with X23 . From two conditions in Table 4: (1) X8,63−45 = 1 · · · 1; and (2) no carry occurs from bits 44 to 45 during X8 − Y7 , we can get that Y0,63−45 = ¬Y7,63−45 = ¬(Y15,63−45 ⊕ const1,63−45 ).
The message words (X0, ..., X7). X4, X5 and X6 are independent from X23, because they are computed by KSF^{-1} using Y3*, Y4*, Y5* and Y6*.
– X7: X7 = Y7 ⊕ Y6*. The 19 MSBs of X7 will change with X23. From one condition in Table 4, Y6,63-45 = const1,63-45, we can get that X7,63-45 = Y7,63-45 ⊕ const1,63-45 = (Y15,63-45 ⊕ const1,63-45) ⊕ const1,63-45 = Y15,63-45.
– X3: X3 = Y3* + (Y2 ⊕ ((¬Y1) << 19)). Y1 only changes its 19 MSBs with X23, which will disappear after the bit-shift operation. So (¬Y1) << 19 is independent from X23. The 19 MSBs of X3 will
change with X23. Here we cannot determine the relation between X3,63-45 and Y15,63-45, but it is actually not necessary for this chunk. The reason is that step 4, where X3 is used, will be skipped by the partial-matching and partial-fixing techniques. More details are shown in Section 5.3.
– X2: X2 = Y2 - Y1. Since Y2 and Y1 will only change their 19 MSBs with X23, the 45 LSBs of X2, namely X2,44-0, will be independent from X23. From one condition in Table 4, no carry occurs from bit 44 to bit 45 during Y2 - Y1, we can get that X2,63-45 = Y2,63-45 - Y1,63-45 = (¬Y15,63-45) - (¬Y15,63-45) = 0···0. Note that X2,63-45 is predetermined to be 0···0 as a condition in Table 4. X2,63-45 does not change with X23. Therefore X2 is independent from X23.
– X1: X1 = Y1 ⊕ Y0. Similarly the 45 LSBs of X1, namely X1,44-0, are independent from X23. X1,63-45 = Y1,63-45 ⊕ Y0,63-45 = (¬Y15,63-45) ⊕ (¬(Y15,63-45 ⊕ const1,63-45)) = const1,63-45. Note that X1,63-45 is predetermined to be const1,63-45 as a condition in Table 4, so X1,63-45 does not change with X23. Therefore X1 is independent from X23.
– X0: X0 = Y0 + (X7 ⊕ const1). Similarly X0,44-0 is independent from X23. From one condition in Table 4, no carry occurs from bit 44 to bit 45 during Y0 + (X7 ⊕ const1), we can get that X0,63-45 = Y0,63-45 + (X7,63-45 ⊕ const1,63-45) = (¬(Y15,63-45 ⊕ const1,63-45)) + (Y15,63-45 ⊕ const1,63-45) = 1···1. Note that X0,63-45 is predetermined to be 1···1 as a condition in Table 4, so X0,63-45 is also independent from X23. Therefore X0 is independent from X23.
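The arguments above repeatedly use the same piece of bit arithmetic: once the 19-bit slice 63-45 of one operand is the bitwise complement of the other and no carry crosses bit 45, their sum on that slice is 1···1. The following tiny Python check (an illustrative aid, not part of the attack) verifies this fact for a 19-bit block treated in isolation:

```python
import random

MASK19 = (1 << 19) - 1  # models the 19-bit slice at positions 63-45

# Adding a block to its bitwise complement gives all ones and never carries,
# which is why X8,63-45 and X0,63-45 are forced to 1...1 by the Table 4 conditions.
for _ in range(1000):
    v = random.getrandbits(19)
    assert v + ((~v) & MASK19) == MASK19
```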
4.2
The Independent Chunk X^2
This section will explain the independence/dependence from X15^2 for each message word in detail. In this section, the notation Xi* (resp. Yi*) means that Xi (resp. Yi) is independent from X15^2. Roughly speaking, we will first define the message words X8, ..., X14 as independent from X15^2, and then determine the relation between the other message words and X15^2 backwards and forwards, utilizing the properties of KSF^{-1} and KSF respectively. We point out that the independence/dependence of the other message words from X15^2 is determined just by following the specifications of KSF and KSF^{-1}. We only need to pay attention to make sure that this chunk does not influence the bit positions where the conditions in Table 4 are set, in order to guarantee that the two chunks are really independent. Note that all the conditions in Table 4 are located at upper bits. We decide to change several lower bits of X15^2 in order to avoid bit overlap at some message word. Finally we will change bits 19-9 of X15, namely X15,19-9.¹
¹ We choose 11 lower bits because of our attack procedure in Section 5.4.
Moreover, in order to clearly make sure that this chunk will not influence the conditions in Table 4, we set several conditions on the message words to control bit-carry propagations, which are listed in Table 5.

Table 5. Conditions on message words to control carry propagation
X7,22 = const1,22;  Y0,22 = 0;  Y1,41 = 0;  Y2,41 = 1;  Y7,21 = 0;
X8,20 = 1;  X8,21 = 1;  X8,40 = 1;  X9,40 = 0;  X10,21 = 0;  X11,40 = 1;  X13,41 = 0;
X14,20 = const2,20;  X14,42 = 1;  X15,20 = 0;
Y8,43 = 0;  Y9,21 = 0;  Y10,40 = 1;  Y12,41 = 0;  Y13,42 = 0;  Y15,43 = 0
In the following discussion, we will mainly pay attention to which bit positions of the message words will change with X15^2.

The message words (Y0, ..., Y7). Y2, ..., Y6 will not change with X15^2 since they are computed by KSF^{-1} using X9*, ..., X14*.
– Y7: Y7 = X15^2 + (X14* ⊕ const2). From two conditions in Table 5, X15,20 = 0 and X14,20 = const2,20, no carry will occur from bit 20 to bit 21 no matter how X15 changes its bits 19-9. Therefore, only Y7,20-9 will change with X15^2.
– Y1: Y1 = X9* + (X8* ⊕ ((¬Y7^2) << 19)). From two conditions in Table 5, X8,40 = 1 and Y7,21 = 0, bit 40 of X8* ⊕ ((¬Y7^2) << 19) is 0. From another condition in Table 5, X9,40 = 0, no carry will occur from bit 40 to bit 41 while Y7^2 changes. Therefore, only Y1,40-28 will change with X15^2.
– Y0: Y0 = X8* - Y7^2. From two conditions in Table 5, X8,21 = 1 and Y7,21 = 0, no carry will happen from bit 21 to bit 22 while Y7^2 changes. Therefore, only Y0,21-9 will change with X15^2.

The message words (X0, ..., X7). X4, X5 and X6 will not change with X15^2 because they are computed by KSF^{-1} using Y3*, Y4*, Y5* and Y6*.
– X7: X7 = Y7^2 ⊕ Y6*. X7,20-9 will change with X15^2.
– X3: X3 = Y3* + (Y2* ⊕ ((¬Y1^2) << 19)). Since no condition has been set on X3 in Table 4, we do not need to pay attention to which bit positions of X3 will change with X15^2, but only to the fact that it will change with X15^2.
– X2: X2 = Y2* - Y1^2. From two conditions in Table 5, Y1,41 = 0 and Y2,41 = 1, no carry will occur from bit 41 to bit 42 while Y1^2 changes. Therefore X2,41-28 will change with X15^2.
– X1: X1 = Y1^2 ⊕ Y0^2. X1,40-9 will change with X15^2.
– X0: X0 = Y0^2 + (X7^2 ⊕ const1). From two conditions in Table 5, Y0,22 = 0 and X7,22 = const1,22, no carry will occur from bit 22 to bit 23 while Y0^2 and X7^2 change. Therefore X0,22-9 will change with X15^2.
The message words (Y8, ..., Y15).
– Y8: Y8 = X8* - (X15^2 ⊕ const1). From two conditions in Table 5, X8,20 = 1 and X15,20 = 0 (const1,20 = 0), no carry will occur from bit 20 to bit 21 while X15^2 changes. So Y8,20-9 will change with X15^2.
– Y9: Y9 = X9* ⊕ Y8^2. Y9,20-9 will change with X15^2.
– Y10: Y10 = X10* + Y9^2. From two conditions in Table 5, X10,21 = 0 and Y9,21 = 0, no carry will occur from bit 21 to bit 22 while Y9^2 changes. So Y10,21-9 will change with X15^2.
– Y11: Y11 = X11* - (((¬Y9^2) << 19) ⊕ Y10^2). From two conditions in Table 5, Y9,21 = 0 and Y10,40 = 1, bit 40 of ((¬Y9^2) << 19) ⊕ Y10^2 is 0. From another condition in Table 5, X11,40 = 1, no carry will occur from bit 40 to bit 41 while Y9^2 and Y10^2 change. Therefore Y11,40-9 will change with X15^2.
– Y12: Y12 = X12* ⊕ Y11^2. Y12,40-9 will change with X15^2.
– Y13: Y13 = X13* + Y12^2. From two conditions in Table 5, X13,41 = 0 and Y12,41 = 0, no carry will occur from bit 41 to bit 42 while Y12^2 changes. So Y13,41-9 will change with X15^2.
– Y14: Y14 = X14* - (Y13^2 ⊕ ((¬Y12^2) >> 23)). From one condition in Table 5, Y13,42 = 0, bit 42 of Y13^2 ⊕ ((¬Y12^2) >> 23) is 0. From another condition in Table 5, X14,42 = 1, no carry will occur from bit 42 to bit 43 while Y12^2 and Y13^2 change. Therefore Y14,42-0 will change with X15^2.
– Y15: Y15 = X15^2 ⊕ Y14^2. Y15,42-0 will change with X15^2.
The message words (X16, ..., X23). We do not need to pay attention to which bit positions of the message words X17, ..., X23 will change with X15^2, but only to the fact that these message words will change with X15^2.
– X16: X16 = Y8^2 + Y15^2. From two conditions in Table 5, Y8,43 = 0 and Y15,43 = 0, no carry will occur from bit 43 to bit 44 while Y8^2 and Y15^2 change. Therefore X16,43-0 will change with X15^2.
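All the "no carry" conditions of Table 5 can be checked with the same elementary computation. The following generic Python helper (not part of the paper; the function name and interface are ours) shows what it means for no carry to propagate from one bit position into the next during an addition:

```python
def carry_into(a, b, bit):
    """Return the carry that propagates into position `bit` when computing a + b,
    i.e. the carry produced by the lower `bit` positions."""
    mask = (1 << bit) - 1
    return (((a & mask) + (b & mask)) >> bit) & 1

# A Table 5 style condition "no carry occurs from bit 20 to bit 21 during A + B"
# simply asserts carry_into(A, B, 21) == 0 for every value the chunk can take.
```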
4.3
Summary of Our Independent Chunks
We will first find a message block that can satisfy all the conditions in Tables 4 and 5. Then we will change the 19 MSBs of X23 and bits 19-9 of X15 independently to apply a pseudo-preimage attack on tiger.
5
Preimage Attack on 23-Step Tiger
This section will propose a pseudo-preimage attack on tiger up to 23 steps, which will be converted to a preimage attack on 23-step Tiger. The overview of the attack has been shown in Table 6. The attack target is the first 23 steps. Hence, X23 is erased from Table 6.
Table 6. Overview of our pseudo-preimage attack on tiger

steps 0-7:   X0^2  X1^2  X2^2  -->   |  X3^{1,2}  X4*  X5*  X6*  X7^{1,2}   (partial-matching)
steps 8-15:  <--  X8*  X9^1  X10*  X11*  X12*  X13*  X14*  X15^{1,2}        (E computations for the X^1 chunk)  |  initial structure
steps 16-22: X16^2  X17^2  X18^2  X19^2  X20^2  X21^2  X22^2  -->            (E computations for the X^2 chunk)
5.1
Precomputation
Before starting the pseudo-preimage attack on tiger, we need to find a message block X0||···||X7 which can satisfy all the conditions in Tables 4 and 5. The total number of the conditions in these two tables is 237. But the complexity of searching for such a message block will be greatly reduced by the message modification technique. Moreover, we stress that the precomputation will be executed only once during the pseudo-preimage attack on tiger. The search procedure is as follows.
1. Randomly choose a message block and modify X0, X1, X2 and X7 to satisfy the conditions on them.
2. Modify Y0 to satisfy the condition Y0,22 = 0, and then inversely calculate X0 without changing the other message words. Due to the long bit distance from bit 22 to bit 45, the conditions on X0 will not be influenced with an overwhelming probability.
3. Similarly make the conditions on Y1 and Y2 be satisfied by modifying X1, X2 and X3.
4. Modify Y6 to satisfy the conditions, and then inversely compute X6.
5. Make the conditions on Y7 be satisfied by modifying Y6 and X6.
6. Note that the conditions on X8 in Table 4 will be automatically satisfied if both X0 and Y6 satisfy the conditions on them.
7. The remaining conditions will be satisfied by exhaustive search.
In total there are 115 conditions which will be satisfied by the exhaustive search at step 7. Although more conditions can be satisfied by applying message modification, we will not discuss the precomputation further, due to limited space and the fact that 2^115 << 2^192.
5.2
Apply the Initial-Structure Technique at Step 16
We will illustrate how the initial-structure technique works at step 16 of tiger, which is also shown in Fig. 6. Recall that the 19 MSBs of X15 will change with the X chunk, its bits 19 − 9 will change with the X 2 chunk, and the other bits
will be constant. Let the 19 MSBs, the 20 LSBs and the intermediate 25 bits of X15 be X15^1, X15^2 and α respectively. Then X15 is written as (X15^1 || 0^(45)) ⊕ (0^(19) || α || X15^2), where 0^(b) represents b sequential '0' bits. We can analyze the impact on step 16 from X^1 and from X^2 independently. We first fix the constant numbers const, const' and const'' marked in Fig. 6 to randomly chosen values. Then, every time we obtain the value of X15^1, we compute
(A15^1, B15^1, C15^1) <- (const, const', const'' ⊕ (X15^1 || 0^(45))).
Similarly, every time we obtain the value of X15^2, we compute
temp <- const'' ⊕ (0^(19) || α || X15^2),
(A16^2, B16^2, C16^2) <- ((const + odd(temp)) × 7, temp, const - even(temp)).
Finally, we can compute (A15^1, B15^1, C15^1) and (A16^2, B16^2, C16^2) independently even though X15 is affected by both chunks.
Fig. 6. Initial-structure at step 16
5.3
Apply the Partial-Matching and Partial-Fixing Techniques at Steps 8−4
We will illustrate how to partially compare (A3^2, B3^2, C3^2) with (A8^1, B8^1, C8^1) using the partial-matching and partial-fixing techniques, which is also shown in Fig. 7. The main idea is that for both E^1 and E^2, the value of A5 will be partially computed. With this idea, we can compare the 45 LSBs of A5^2 and of A5^1.
– For the E^2 computation, since only the 19 MSBs of X3 change with X^1, bits 44-0 of X3^{1,2} are known. At step 4, we compute bits 44-0 of B4^2. Then we guess byte 6 of X3 ⊕ C3, namely bits 55-48, and compute C4^2. At step 5, we compute bits 44-0 of A5^2.
– For the E^1 computation, we can compute A5^1 easily from step 8.
Fig. 7. Partial-matching and partial-fixing for steps 8-4
5.4
Pseudo-preimage Attack on Tiger
1. Generate a message block satisfying all the conditions in Tables 4 and 5. The details have been shown in Section 5.1.
2. Set const, const' and const'' in Fig. 6 to random values.
3. For all the values of the 19 MSBs of X23:
(a) Compute the value of all Xi ∈ X^1 and the partial value of all Xi ∈ X^{1,2}, namely, all bits of X9^1 and the partially-known bits of X15^{1,2}, X7^{1,2} and X3^{1,2}. Then, compute the corresponding IS^1, that is (A15^1, B15^1, C15^1). The details have been explained in Section 5.2.
(b) From (A15^1, B15^1, C15^1) and X14, X13, ..., X8, compute E^1 to obtain the value of (A8^1, B8^1, C8^1).
(c) By following the backward computation of the partial-matching and partial-fixing techniques explained in Section 5.3, compute the value of A5^1.
(d) Store (X^1, A5^1, A8^1, B8^1, C8^1) in a table T.
4. For all the values of bits 19-9 of X15:
(a) Compute the value of all Xi ∈ X^2 and the partial value of all Xi ∈ X^{1,2}, namely, all bits of X16^2, X17^2, ..., X22^2, X0^2, X1^2, X2^2 and the partially-known bits of X3^{1,2}, X7^{1,2} and X15^{1,2}. Then, compute the corresponding IS^2, that is (A16^2, B16^2, C16^2). The details have been explained in Section 5.2.
(b) From (A16^2, B16^2, C16^2) and X16, X17, ..., X22, X0, X1, X2, compute E^2 to obtain the value of (A3^2, B3^2, C3^2).
(c) At step 4, we know all bits of (A3^2, B3^2, C3^2) and the 45 LSBs of X3^{1,2}. We compute the 45 LSBs of B4^2 by C3^2 ⊕ X3^{1,2}. Then, we exhaustively guess byte 6 of X3^{1,2}, which is 8 bits (bits 48-55) of X3^{1,2}. Based on each guessed value, we compute the even(·) function and obtain all bits of C4^2.
(d) At step 5, we compute the 45 LSBs of A5^2 by using the 45 LSBs of B4^2 and all bits of C4^2.
(e) Check whether or not the obtained bit-values of A5^2 match one A5^1 in T.
(f) If it matches, recover the value of X3^{1,2}. Then, obtain the values of (A8^2, B8^2, C8^2) by updating (A3^2, B3^2, C3^2) with the recovered X3^{1,2} and the already fixed X4, X5, ..., X7, and check whether or not the remaining 147 bits match.
(g) If all bits match, the corresponding M and A0, B0, C0 is a valid pseudo-preimage with probability 2^{-8} (the success probability of the guess at step 4c). If a matched pair does not exist for all the degrees of freedom, we change the values at step 2 and repeat steps 3 and 4.
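The procedure above is a standard meet-in-the-middle loop around the two neutral chunks. The Python skeleton below is only a structural sketch under our own naming: forward_x1, backward_x2 and full_match are hypothetical helpers standing in for the chunk computations of steps 3 and 4, and are not defined in the paper.

```python
def pseudo_preimage_search(forward_x1, backward_x2, full_match):
    """Sketch of the MitM procedure: forward_x1(x1) returns the 45 LSBs of A5^1
    plus (A8^1, B8^1, C8^1) for one choice of the 19 MSBs of X23; backward_x2(x2)
    yields (guess, 45 LSBs of A5^2) pairs, one per guess of byte 6 of X3, for one
    choice of bits 19-9 of X15."""
    table = {}
    for x1 in range(1 << 19):                      # the X^1 chunk
        a5_part, extra = forward_x1(x1)
        table.setdefault(a5_part, []).append((x1, extra))
    for x2 in range(1 << 11):                      # the X^2 chunk
        for guess, a5_part in backward_x2(x2):
            for x1, extra in table.get(a5_part, []):
                if full_match(x1, x2, guess, extra):   # check the remaining 147 bits
                    return x1, x2, guess
    return None                                     # retry with new constants (step 2)
```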
5.5
The Complexity of Our Pseudo-preimage Attack on Tiger
We regard one tiger computation as a unit.
Step 1. The complexity is 2^115. This step will be executed only once. Since 2^115 << 2^192, we will ignore the complexity of step 1.
Step 2. Negligible.
Step 3a. The complexity is 2^19 computations of KSF.
Step 3b. The complexity is 2^19 × 7/23.
Step 3c. The complexity is 2^19 × 3/23.
Step 3d. The memory requirement is 2^22 message words (X^1 consists of 4 words: X9^1, X15^{1,2}, X7^{1,2}, X3^{1,2}).
Step 4a. The complexity is 2^11 computations of KSF.
Step 4b. The complexity is 2^11 × 10/23.
Step 4c. The complexity is 2^19 × 1/23.
Step 4d. The complexity is 2^19 × 1/23.
Step 4e. Negligible.
With a complexity of less than 2^19, we can compare 2^38 pairs and will find 2^-7 pairs that match the 45 LSBs. Note that each guess at step 4c has a success probability of 2^-8. Therefore, by repeating steps 2-4 of the above procedure 2^162 (= 2^{192-45+7+8}) times, we expect to obtain a pseudo-preimage. Finally the complexity of finding a pseudo-preimage for the Tiger compression function is 2^181 (= 2^19 · 2^162). The dominant memory use is 2^22 words at step 3d.
5.6
Preimage Attack on Tiger
Our pseudo-preimage attack on tiger can be converted to a preimage attack on Tiger by adopting the meet-in-the-middle attack on the Merkle-Damgård structure detailed in Section 3.2. The complexity is 2^187.5 and the memory requirement is 2^22 words. Note that we have to fix bit 56 of X6 to be '1' and the 9 LSBs of X7 to be the binary encoding of 447 in order to make the bit length match.
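As a sanity check on the figure 2^187.5, assuming the usual meet-in-the-middle conversion 2^{(n+k)/2+1} for turning a pseudo-preimage of cost 2^k into a preimage on an n-bit Merkle-Damgård hash (the conversion referred to as Section 3.2 of the paper, not reproduced in this excerpt), a few lines of Python reproduce the arithmetic:

```python
n = 192                    # Tiger's chaining value / digest size in bits
k = 181                    # log2 cost of the pseudo-preimage attack (Section 5.5)
preimage_cost = (n + k) / 2 + 1
print(preimage_cost)       # 187.5, matching the complexity quoted above
```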
6
Open Discussion and Conclusion
Compared with the MD4-family, Tiger uses a stronger key schedule function, a stronger step function, but a smaller number of steps. However, based on our analyses, we found several properties of both the key schedule function and the step function, which can be used for the preimage attack. For the key schedule function, we found the following properties. – Bit-shift is easily used to introduce independence of computations. – The large word size is suitable to make upper and lower bits independent with respect to carry. – Mixing the use of addition, subtraction and XOR does not introduce enough non-linearity. They can be linearized by setting conditions. For the step function, we found the following properties. – Even though the whole internal state is updated at each step function, a part of internal state (At , Bt ) are updated by using independent values (only even bytes and only odd bytes of Ct−1 ⊕ Xt−1 ). – Tiger’s S-boxes are so-called target heavy, which map 8-bit values to 64-bit values. This enables us to obtain the knowledge of a large number of bits by only guessing the values of a small number of bits, and later efficiently find out the correct guesses by matching the large bits. So far, we have not found a preimage attack on full-step Tiger yet.2 However, by considering the future attack improvement, the number of steps seems a little bit small with respect to the preimage resistance. For the confidence of security, we suggest that the number of steps should be increased.
7
Conclusion
This paper presented a meet-in-the-middle pseudo-preimage attack on tiger up to 23 steps with a complexity of 2^181. This was converted to a preimage attack on 23-step Tiger with a complexity of 2^187.5. The memory requirement of our attacks is 2^22 words.
Acknowledgments. The authors would like to thank Kazuo Ohta, Kazuo Sakiyama and the anonymous reviewers for their valuable comments.
References 1. Anderson, R., Biham, E.: Tiger: A Fast New Hash Function. In: Gollmann, D. (ed.) FSE 1996. LNCS, vol. 1039, pp. 89–97. Springer, Heidelberg (1996) 2. Indesteege, S., Preneel, B.: Preimages for Reduced-Round Tiger. In: Lucks, S., Sadeghi, A.-R., Wolf, C. (eds.) WEWoRC 2007. LNCS, vol. 4945, pp. 90–99. Springer, Heidelberg (2008) 2
We notice that recently Guo et al. announced that they found a preimage attack on full-step Tiger [13].
3. Isobe, T., Shibutani, K.: Preimage Attacks on Reduced Tiger and SHA-2. In: Dunkelman, O. (ed.) FSE 2009. LNCS, vol. 5665, pp. 139–155. Springer, Heidelberg (2009) 4. Mendel, F.: Two Passes of Tiger Are Not One-Way. In: Preneel, B. (ed.) AFRICACRYPT 2009. LNCS, vol. 5580, pp. 29–40. Springer, Heidelberg (2009) 5. Kelsey, J., Lucks, S.: Collisions and Near-Collisions for Reduced-Round Tiger. In: Robshaw, M.J.B. (ed.) FSE 2006. LNCS, vol. 4047, pp. 111–125. Springer, Heidelberg (2006) 6. Mendel, F., Preneel, B., Rijmen, V., Yoshida, H., Watanabe, D.: Update on Tiger. In: Barua, R., Lange, T. (eds.) INDOCRYPT 2006. LNCS, vol. 4329, pp. 63–79. Springer, Heidelberg (2006) 7. Mendel, F., Rijmen, V.: Cryptanalysis of the Tiger Hash Function. In: Kurosawa, K. (ed.) ASIACRYPT 2007. LNCS, vol. 4833, pp. 536–550. Springer, Heidelberg (2007) 8. Aoki, K., Sasaki, Y.: Preimage Attacks on One-Block MD4, 63-Step MD5 and More. In: Avanzi, R.M., Keliher, L., Sica, F. (eds.) SAC 2008. LNCS, vol. 5381, pp. 103–119. Springer, Heidelberg (2009) 9. Morita, H., Ohta, K., Miyaguchi, S.: A Switching Closure Test to Analyze Cryptosystems. In: Feigenbaum, J. (ed.) CRYPTO 1991. LNCS, vol. 576, pp. 183–193. Springer, Heidelberg (1992) 10. Menezes, A.J., van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography. CRC Press, Boca Raton (1997) 11. Sasaki, Y., Aoki, K.: Finding Preimages in Full MD5 Faster than Exhaustive Search. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 134–152. Springer, Heidelberg (2009) 12. Wang, L., Sasaki, Y.: Finding Preimages of Tiger Up to 23 Steps (full version of this paper), http://www.oslab.ice.uec.ac.jp/member/wang/ 13. Guo, J., Ling, S., Rechberger, C., Wang, H.: Advanced Meet-in-the-Middle Preimage Attacks: First Results on Full Tiger, and Improved Results on MD4 and SHA-2, http://eprint.iacr.org/2010/016.pdf
Cryptanalysis of ESSENCE

María Naya-Plasencia1, Andrea Röck2, Jean-Philippe Aumasson3, Yann Laigle-Chapuy1, Gaëtan Leurent4, Willi Meier5, and Thomas Peyrin6
1 INRIA project-team SECRET, France
2 Aalto University School of Science and Technology, Finland
3 Nagravision SA, Cheseaux, Switzerland
4 École Normale Supérieure, Paris, France
5 FHNW, Windisch, Switzerland
6 Ingenico, France
Abstract. ESSENCE is a hash function submitted to the NIST Hash Competition that stands out as a hardware-friendly and highly parallelizable design. Previous analysis showed some non-randomness in the compression function which could not be extended to an attack on the hash function, and ESSENCE remained unbroken. Preliminary analysis in its documentation argues that it resists standard differential cryptanalysis. This paper disproves this claim, showing that advanced techniques can be used to significantly reduce the cost of such attacks: using a manually found differential characteristic and an advanced search algorithm, we obtain collision attacks on the full ESSENCE-256 and ESSENCE-512, with respective complexities 2^67.4 and 2^134.7. In addition, we show how to use these attacks to forge valid (message, MAC) pairs for HMAC-ESSENCE-256 and HMAC-ESSENCE-512, essentially at the same cost as a collision.
Keywords: cryptanalysis, hash functions, SHA-3.
1
Introduction
Since the results [1,2,3,4] on the two most deployed hash functions, MD5 and SHA-1, recent years have seen a surge of research on cryptographic hashing. The consequent lack of confidence in the current NIST standard SHA-2 [5], stemming from its similarity with those algorithms, motivated NIST to launch the NIST Hash Competition, a public competition to develop a new hash standard, which will be called SHA-3 and announced by 2012 [6,7]. NIST received 64 submissions, accepted 51 as first round candidates, and in July 2009, selected 14 second round
This work is supported in part by the European Commission through the ICT programme under contract ICT-2007-216676 ECRYPT II and by the French Agence Nationale de la Recherche under contract ANR-06-SETI-013-RAPIDE. The work was started during my PhD at INRIA project-team SECRET, France. Work done while this author was with FHNW, Switzerland, and supported by the Swiss National Science Foundation under project no. 113329. Supported by GEBERT RÜF STIFTUNG, project no. GRS-069/07.
candidates [7,8]. That competition catches the attention not only from many academics, but also from industry—with candidates from IBM, Hitachi, Intel, Sony—and from governmental organizations. ESSENCE [9,10] was a first round candidate in the NIST Hash Competition that like many others has two main instances, operating on 32- and 64-bit words, respectively: ESSENCE-256 and ESSENCE-512. These functions process messages using a binary tree structure, and use a simple compression algorithm based on two nonlinear feedback shift registers (NFSR’s). This paper presents collision attacks on the full hash functions ESSENCE-256 and ESSENCE-512. At the heart of our attacks is a single differential characteristic, found manually. Our main technical achievement is an original method for searching inputs conforming to this characteristic at a reduced cost. Supplementary, we describe how to use these attacks for forging valid message/MAC pairs for HMAC-ESSENCE-256 and HMAC-ESSENCE-512 in far fewer than 2n/2 trials. These findings show that ESSENCE does not satisfy the security requirements set by NIST for the future SHA-3. In a parallel work, Mouha et al. [11] presented results on reduced versions of ESSENCE, including a pseudo-collision attack on ESSENCE-512 reduced to 31 steps. They exploited a differential characteristic of a different type than ours, and also use different techniques to search for conforming inputs. Preprints of [11] and of the present paper were published simultaneously in June 2009, and in July ESSENCE was not selected as a second round candidate by NIST. The rest of the paper is organized as follows: §2 briefly introduces ESSENCE; §3 describes our method for searching collisions and its complexity analysis; §4 shows how to attack the HMAC construction when instantiated with ESSENCE, and finally §5 concludes.
2
Brief Description of ESSENCE
We give a brief description of the ESSENCE hash functions, which should be sufficient to understand our attacks. A complete specification can be found in [9,10]. Henceforth statements of (non)linearity are with respect to the field GF(2) = {0, 1} and its extensions.
2.1
Structure
ESSENCE processes a message by constructing a balanced binary tree of bounded depth whose leaves correspond to calls to a compression function with message chunks as input. The size of the message chunks and the height of the tree are tunable parameters. More precisely, each leaf corresponds to a hash done by a Merkle-Damgård (MD) construction [12,13] with a unique initial value for each leaf that depends on several parameters of the hash function. Likewise, nodes correspond to a combination, by an MD construction, of the children's chaining values and a unique IV. After creation of all tree roots, one appends a final block to the data to be hashed. This block contains parameters of the function as well as message-dependent information, and it potentially helps to prevent near-collision attacks.
2.2
Compression Function
The compression function of ESSENCE takes as input an eight-word chaining value and an eight-word message block. Words are 32-bit for ESSENCE-256 and 64-bit for ESSENCE-512, so blocks are respectively 256- and 512-bit. Versions of ESSENCE with 224- and 384-bit digests are derived from the main instances by tweaking parameters and truncation of the final digest. The compression function uses two NFSR’s, each operating on a register of eight words: • r = (r0, . . . , r7) is initialized with the chaining value, and • k = (k0, . . . , k7) is initialized with the message block. At each step of the compression algorithm, the mechanism in Fig. 1 is clocked using a nonlinear bitwise function F (see Fig. 2), and a linear function L that provides diffusion across word slices. Let us consider the feedback in more details for the example of the register handling the message. The word k7 is combined by XOR with F (k6, k5, k4, k3, k2, k1, k0) and L(k0). Thus, the nonlinear function F is influenced by the seven words k0, . . . , k6, any difference in k7 is forwarded directly and any difference δ in k0 is transformed into a difference L(δ). The register initialized by the chaining value employes almost the same feedback function. The only difference is that at each step we combine in addition the word k7 from the second register.
Fig. 1. Overview of the ESSENCE compression function logic
F (a, b, c, d, e, f, g) = abcdef g + abcdef + abcef g + acdef g + abceg + abdef + abdeg + abef g + acdef + acdf g + acef g + adef g + bcdf g + bdef g + cdef g + abcf + abcg + abdg + acdf + adef + adeg + adf g + bcde + bceg + bdeg + cdef + abc + abe + abf + abg + acg + adf + adg + aef + aeg + bcf + bcg + bde + bdf + beg + bf g + cde + cdf + def + deg + df g + ad + ae + bc + bd + cd + ce + df + dg + ef + f g + a+b+c+f +1 Fig. 2. The F function of ESSENCE, which takes seven words as input and operates in a bit sliced way (that is, the i-th bit of the output word only depends on the i-th bits of the input words)
The documentation of ESSENCE recommends at least 24 steps, and sets 32 steps in the actual submission for extra precaution [10, §4]. The whole mechanism defines a permutation and the compression function returns as new chaining value the XOR of the r register with its initial value, as in the Davies-Meyer scheme.
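The following Python sketch restates the clocking and the feedforward exactly as described in this section, with F and L left abstract; it is a reading aid under the description above (the ESSENCE specification may index the registers differently), not a reference implementation.

```python
def compress(chaining, message, F, L, steps=32):
    # r is loaded with the chaining value, k with the message block (Section 2.2).
    r, k = list(chaining), list(message)
    r_init = list(r)
    for _ in range(steps):
        # k7 is combined by XOR with F(k6..k0) and L(k0); the r register
        # additionally absorbs k7 from the message register at every step.
        fk = k[7] ^ F(k[6], k[5], k[4], k[3], k[2], k[1], k[0]) ^ L(k[0])
        fr = r[7] ^ F(r[6], r[5], r[4], r[3], r[2], r[1], r[0]) ^ L(r[0]) ^ k[7]
        # Each word moves up one position and the feedback word enters at index 0.
        k = [fk] + k[:7]
        r = [fr] + r[:7]
    # Davies-Meyer style output: XOR of the final r register with its initial value.
    return [a ^ b for a, b in zip(r, r_init)]
```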
3
Collision Attacks on ESSENCE
Table 1 presents a differential characteristic for finding collisions on the compression function of ESSENCE. It is used for both ESSENCE-256 and ESSENCE512. We found this characteristic manually, i.e., without the assistance of any automated search. Because it has no input difference in the chaining value, it can directly be used for searching colliding message blocks with respect to the same chaining value. The collision attack will then consist of 1. Finding one message block that fulfills the characteristic on the right part. 2. Trying chaining values until one conforms to the characteristic on the left part. For the second phase of the attack, distinct pseudorandom chaining values are obtained by picking a first pseudorandom (sequence of) message block(s), and then checking differences after the insertion of the next message block. This allows us to find a collision for the full hash function. The subsequent sections work out the details of the attack as follows: • §3.1 explains how the characteristic works. • §3.2 presents an efficient method for finding a message block that conforms to the characteristic. • §3.3 discusses computation of the complexity; contrary to many similar differential attacks, an approximation solely based on Hamming weights is insufficient to obtain accurate probability estimates. Actually such heuristics underestimate the actual complexity of the basic attack, as we will see later. Thereafter we use the following notations: ∨ for logical OR between two bits (or two words); ∧ for logical AND; ¬ for bitwise negation; |w| for the Hamming weight of word w; wi for the i-th bit of word w, 0 ≤ i < 32 for ESSENCE-256, and 0 ≤ i < 64 for ESSENCE-512. 3.1
The Differential Characteristic
The differential characteristic in Table 1 starts with a difference in the message block, and no difference in the chaining value. To follow the characteristic, the only assumption that we make is that the function F will “absorb” certain differences (actually most of them) and “preserve” some others (at step 11). Therefore, the probability that a randomly chosen input conforms to the differential characteristic essentially depends on the Hamming weight of the word wise differences α and β = L(α). Critical steps are listed below:
Table 1. Differential characteristic for finding collisions on (both versions of) ESSENCE; α, β and γ are differences such that β = L(α), γ = L(β) and α ∨ β ∨ γ = α ∨ β. A "·" denotes an absence of difference. Values in the columns "Pr." are heuristic approximations of the probability to reach the next difference (exact probabilities significantly differ, and can be estimated empirically, cf. §3.3). The left half is the chaining value part (registers r7, ..., r0), the right half the message part (registers k7, ..., k0).

Step | Pr.      | r7 r6 r5 r4 r3 r2 r1 r0 | Pr.       | k7 k6 k5 k4 k3 k2 k1 k0
   0 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 2^-|β|    |  α  β  ·  ·  ·  ·  ·  ·
   1 | 2^-|α|   |  ·  ·  ·  ·  ·  ·  ·  α | 2^-|α|    |  β  ·  ·  ·  ·  ·  ·  α
   2 | 2^-|α|   |  ·  ·  ·  ·  ·  ·  α  · | 2^-|α|    |  ·  ·  ·  ·  ·  ·  α  ·
   3 | 2^-|α|   |  ·  ·  ·  ·  ·  α  ·  · | 2^-|α|    |  ·  ·  ·  ·  ·  α  ·  ·
   4 | 2^-|α|   |  ·  ·  ·  ·  α  ·  ·  · | 2^-|α|    |  ·  ·  ·  ·  α  ·  ·  ·
   5 | 2^-|α|   |  ·  ·  ·  α  ·  ·  ·  · | 2^-|α|    |  ·  ·  ·  α  ·  ·  ·  ·
   6 | 2^-|α|   |  ·  ·  α  ·  ·  ·  ·  · | 2^-|α|    |  ·  ·  α  ·  ·  ·  ·  ·
   7 | 2^-|α|   |  ·  α  ·  ·  ·  ·  ·  · | 2^-|α|    |  ·  α  ·  ·  ·  ·  ·  ·
   8 | 1        |  α  ·  ·  ·  ·  ·  ·  · | 1         |  α  ·  ·  ·  ·  ·  ·  ·
   9 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 2^-|α|    |  ·  ·  ·  ·  ·  ·  ·  α
  10 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 2^-|α∨β|  |  ·  ·  ·  ·  ·  ·  α  β
  11 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 2^-|α∨β|  |  ·  ·  ·  ·  ·  α  β  ·
  12 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 2^-|α∨β|  |  ·  ·  ·  ·  α  β  ·  ·
  13 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 2^-|α∨β|  |  ·  ·  ·  α  β  ·  ·  ·
  14 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 2^-|α∨β|  |  ·  ·  α  β  ·  ·  ·  ·
  15 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 2^-|α∨β|  |  ·  α  β  ·  ·  ·  ·  ·
  16 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 2^-|β|    |  α  β  ·  ·  ·  ·  ·  ·
  17 | 2^-|α|   |  ·  ·  ·  ·  ·  ·  ·  α | 2^-|α|    |  β  ·  ·  ·  ·  ·  ·  α
  18 | 2^-|α|   |  ·  ·  ·  ·  ·  ·  α  · | 2^-|α|    |  ·  ·  ·  ·  ·  ·  α  ·
  19 | 2^-|α|   |  ·  ·  ·  ·  ·  α  ·  · | 2^-|α|    |  ·  ·  ·  ·  ·  α  ·  ·
  20 | 2^-|α|   |  ·  ·  ·  ·  α  ·  ·  · | 2^-|α|    |  ·  ·  ·  ·  α  ·  ·  ·
  21 | 2^-|α|   |  ·  ·  ·  α  ·  ·  ·  · | 2^-|α|    |  ·  ·  ·  α  ·  ·  ·  ·
  22 | 2^-|α|   |  ·  ·  α  ·  ·  ·  ·  · | 2^-|α|    |  ·  ·  α  ·  ·  ·  ·  ·
  23 | 2^-|α|   |  ·  α  ·  ·  ·  ·  ·  · | 2^-|α|    |  ·  α  ·  ·  ·  ·  ·  ·
  24 | 1        |  α  ·  ·  ·  ·  ·  ·  · | 1         |  α  ·  ·  ·  ·  ·  ·  ·
  25 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 1         |  ·  ·  ·  ·  ·  ·  ·  α
  26 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 1         |  ·  ·  ·  ·  ·  ·  α  ?
  27 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 1         |  ·  ·  ·  ·  ·  α  ?  ?
  28 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 1         |  ·  ·  ·  ·  α  ?  ?  ?
  29 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 1         |  ·  ·  ·  α  ?  ?  ?  ?
  30 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 1         |  ·  ·  α  ?  ?  ?  ?  ?
  31 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 1         |  ·  α  ?  ?  ?  ?  ?  ?
  32 | 1        |  ·  ·  ·  ·  ·  ·  ·  · | 1         |  α  ?  ?  ?  ?  ?  ?  ?
• Step 0: α is fed back to r0 via an XOR and it does not enter F , unlike β. To ensure that no difference appears in the output of F , we need all the |β| bit differences be absorbed, which is expected to occur with probability 2−|β| (such heuristic estimates should not be used systematically, as discussed later). • Step 1: the relation β = L(α) makes differences introduced in r0 vanish. This always works, but we also need that α adds no difference, that is, F needs to absorb |α| bit differences, thus the probability 2−|α| on both parts.
• Steps 2 to 7: we assume again that the |α| differences introduced in F are absorbed. • Step 8: the two α differences cancel out in the middle of the mechanism, but α is also fed back to k0. • Step 9: unlike as in step 1, α introduces a difference L(α) = β in k0, which propagates during steps 11 to 17. • Step 10: to avoid the introduction of new differences, we need the output of F to have differences L(β) = γ, in order for the differences to vanish in the feedback operation. This is only possible if α ∨ β ∨ γ = α ∨ β. As we will see later, to avoid impossibilities in the differential characteristic, we also have to add the condition γ ∧ α ∧ ¬β = 0. • Steps 16 to 24: the characteristic is the same as in steps 0 to 8. • Steps 25 to 32: note that differences in the right side do not affect the value returned by the compression function after 32 steps. We thus put no condition on those particular differences. After finding this generic characteristic, it remains to search for an α that minimizes the cost of the attack. But before that, we present a generic method for finding a message block conforming to the right part of the characteristic. 3.2
Efficient Search for a Conforming Block
Once we have found low-weight α, β = L(α) and γ = L(β) such that
α ∨ β ∨ γ = α ∨ β   and   γ ∧ α ∧ ¬β = 0,
the complexity of finding a conforming block by repeated trials is heuristically 2^{15|α|+2|β|+6|α∨β|}. This complexity is well above the birthday bound 2^{n/2} for all differences we found, let alone the fact that it underestimates the real complexity. For example, for the difference that we use to attack ESSENCE-256, the above expression yields a complexity 2^210, whereas a birthday attack needs only 2^128 trials.
Strategy. To find a conforming block at a reduced cost, we use an "inside-out" strategy similar in spirit to that of the rebound attack [14], namely, we start by finding conforming values for the low-probability characteristic in the middle, then we check that they follow the simpler characteristic in both directions. What we call the middle part corresponds to steps 8 to 17, inclusive. More precisely, we
1. Find many values that conform to the middle part (i.e., steps 8 to 17);
2. Search, among those values, one that conforms to the differential characteristic in steps 0 to 8, and 17 to 24 (any such value then follows the characteristic up to step 32).
We need to find approximately 2^{14|α|+|β|} messages in the first phase, in order to have a conforming one with high probability in the second phase. Below we expose our strategy for efficiently finding many values conforming to the characteristic between steps 8 and 17.
Table 2. Message part in steps 8-17 (registers k7, ..., k0)

Step  8:  x0⊕α  x1    x2    x3    x4    x5    x6    x7
Step  9:  x1    x2    x3    x4    x5    x6    x7    x8⊕α
Step 10:  x2    x3    x4    x5    x6    x7    x8⊕α  x9⊕β
Step 11:  x3    x4    x5    x6    x7    x8⊕α  x9⊕β  x10
Step 12:  x4    x5    x6    x7    x8⊕α  x9⊕β  x10   x11
Step 13:  x5    x6    x7    x8⊕α  x9⊕β  x10   x11   x12
Step 14:  x6    x7    x8⊕α  x9⊕β  x10   x11   x12   x13
Step 15:  x7    x8⊕α  x9⊕β  x10   x11   x12   x13   x14
Step 16:  x8⊕α  x9⊕β  x10   x11   x12   x13   x14   x15
Step 17:  x9⊕β  x10   x11   x12   x13   x14   x15   x16⊕α
Notations. To describe the state during the middle part: in Table 2 each xj corresponds to a 32- or 64-bit word, depending on the version used. We write S for the set of all indices where α ∨ β is nonzero, that is, S = {i, 0 ≤ i < 32, αi ∨ βi = 1} for ESSENCE-256, and S = {i, 0 ≤ i < 64, αi ∨ βi = 1} for ESSENCE-512.
We write s = |α ∨ β| = |S| for the cardinality of S. For example, if α = 80000000 and β = 00000004, then α31 = β2 = 1, and so S = {2, 31} and s = 2. We also write ℓ for the word bit length (32 or 64, depending on the version of ESSENCE).
Efficient Search. To search for values conforming to the middle part, we first look at an arbitrary slice i, and we count the number of possible tuples (x1, ..., x15)_i that fulfill the characteristic between steps 8 and 17. This corresponds to all tuples that satisfy the following equations:
F(x1, x2, x3, x4, x5, x6, x7)_i  = F(x1, x2, x3, x4, x5, x6, x7)_i
F(x2, x3, x4, x5, x6, x7, x8)_i  = F(x2, x3, x4, x5, x6, x7, x8 ⊕ α)_i
F(x3, x4, x5, x6, x7, x8, x9)_i  = F(x3, x4, x5, x6, x7, x8 ⊕ α, x9 ⊕ β)_i ⊕ γi
F(x4, x5, x6, x7, x8, x9, x10)_i = F(x4, x5, x6, x7, x8 ⊕ α, x9 ⊕ β, x10)_i
F(x5, x6, x7, x8, x9, x10, x11)_i = F(x5, x6, x7, x8 ⊕ α, x9 ⊕ β, x10, x11)_i
F(x6, x7, x8, x9, x10, x11, x12)_i = F(x6, x7, x8 ⊕ α, x9 ⊕ β, x10, x11, x12)_i
F(x7, x8, x9, x10, x11, x12, x13)_i = F(x7, x8 ⊕ α, x9 ⊕ β, x10, x11, x12, x13)_i
F(x8, x9, x10, x11, x12, x13, x14)_i = F(x8 ⊕ α, x9 ⊕ β, x10, x11, x12, x13, x14)_i
F(x9, x10, x11, x12, x13, x14, x15)_i = F(x9 ⊕ β, x10, x11, x12, x13, x14, x15)_i
This property is only interesting for i ∈ S, since for i ∈ S there are no differences. For slices such that γi = 1, we need to have a difference in F as well, to erase γi . Table 3 reports the number of solutions for the xi ’s depending on (αi , βi , γi ). As we will see later (Tab. 4), in the case (1, 0, 1) there is no tuple satisfying the whole differential characteristic, thus this case will not be used.
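To make the slice bookkeeping of the Notations paragraph concrete, the following short Python sketch (our illustration, using the toy values quoted above) computes the set S and its cardinality s:

```python
alpha, beta = 0x80000000, 0x00000004   # toy example from the Notations paragraph
ell = 32                                # word bit length for ESSENCE-256

S = [i for i in range(ell) if ((alpha | beta) >> i) & 1]
s = len(S)
print(S, s)   # [2, 31] 2, as stated above
```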
Table 3. Number of solutions for the (x1, ..., x15) depending on the input differences

            (αi, βi)
γi       (0, 1)   (1, 0)   (1, 1)
 0         96       96       96
 1        128      120      176
Then, for each slice i ∈ S we fix one of these tuples and try to compute the missing bits. The number of possibilities to choose the tuples for i ∈ S is
Nα = 96^{|α∧¬β∧¬γ|} × 96^{|α∧β∧¬γ|} × 96^{|¬α∧β∧¬γ|} × 176^{|α∧β∧γ|} × 128^{|¬α∧β∧γ|}.
Note that to follow the characteristic, the equations below (directly derived from the ESSENCE mechanism) must hold:
L(x7)  = x0 ⊕ x8  ⊕ F(x1, x2, x3, x4, x5, x6, x7)                    (1)
L(x8)  = x1 ⊕ x9  ⊕ F(x2, x3, x4, x5, x6, x7, x8 ⊕ α)                (2)
L(x9)  = x2 ⊕ x10 ⊕ F(x3, x4, x5, x6, x7, x8 ⊕ α, x9 ⊕ β) ⊕ γ        (3)
L(x10) = x3 ⊕ x11 ⊕ F(x4, x5, x6, x7, x8 ⊕ α, x9 ⊕ β, x10)           (4)
L(x11) = x4 ⊕ x12 ⊕ F(x5, x6, x7, x8 ⊕ α, x9 ⊕ β, x10, x11)          (5)
L(x12) = x5 ⊕ x13 ⊕ F(x6, x7, x8 ⊕ α, x9 ⊕ β, x10, x11, x12)         (6)
L(x13) = x6 ⊕ x14 ⊕ F(x7, x8 ⊕ α, x9 ⊕ β, x10, x11, x12, x13)        (7)
L(x14) = x7 ⊕ x15 ⊕ F(x8 ⊕ α, x9 ⊕ β, x10, x11, x12, x13, x14)       (8)
L(x15) = x16 ⊕ x8 ⊕ F(x9 ⊕ β, x10, x11, x12, x13, x14, x15)          (9)
The bits fixed in x1, ..., x15 are those in slices i ∈ S. Consider new intermediate variables R8, R9, ..., R14 corresponding to the value of the right-hand sides of Eq. (2)-(8). Each of these equations corresponds to a linear system L(xj) = Rj, for j in {8, ..., 14}. These are systems of equations between bits, wherein 2s variables are fixed and 2(ℓ − s) variables are free. Due to the linearity, we can rewrite them as
L(x_{j,S}) ⊕ R_{j,S} = L(x_{j,S̄}) ⊕ R_{j,S̄},    (10)
where S̄ is the complement of the set S and x_{j,T} is the vector (x_{j,i}) with values 0 for i not in T ∈ {S, S̄}; thus x_{j,S} = xj ∧ (α ∨ β) and x_{j,S̄} = xj ∧ ¬(α ∨ β). The position of the free variables depends only on S. We can therefore perform a Gaussian elimination once and for all on the left-hand side of Equation (10). We have more equations than free variables, so if the system is of maximal rank, we obtain 2s − ℓ equations which must be satisfied by the fixed variables in order for solutions to exist. For our seven linear systems we have in total 7(2s − ℓ) equations. Thus for any choice of (x1, ..., x15) fixed at i ∈ S we have a probability 2^{−7(2s−ℓ)} of finding a valid solution for all the 7 systems. Once we know that our choice corresponds to a solution, we can compute efficiently the remaining bits of xj, Rj, j ∈ {8, ..., 14} by the other 7(2ℓ − 2s) equations of the Gaussian elimination. To find a solution x0, ..., x16 which satisfies the middle part, one thus proceeds as follows:
1. Fix the s bits in x1, ..., x15 to one of the Nα admissible values;
2. Try to solve the linear systems L(xj) = Rj, for j in {8, ..., 14}. If there is no solution, go back to step 1. Once a solution to the seven systems is found, we have all bits of xj, Rj fixed for j in {8, ..., 14}. In x1, ..., x7, x15 we only have the s bits fixed from the previous step, and in x0, x16 no bits at all are fixed.
3. Freely choose the value of the (ℓ − s) remaining bits in x7, since modifying those bits does not affect the previous steps.
4. We have now to consider the system
R8  = x1 ⊕ x9  ⊕ F(x2, x3, x4, x5, x6, x7, x8)     (11)
R9  = x2 ⊕ x10 ⊕ F(x3, x4, x5, x6, x7, x8, x9)     (12)
R10 = x3 ⊕ x11 ⊕ F(x4, x5, x6, x7, x8, x9, x10)    (13)
R11 = x4 ⊕ x12 ⊕ F(x5, x6, x7, x8, x9, x10, x11)   (14)
R12 = x5 ⊕ x13 ⊕ F(x6, x7, x8, x9, x10, x11, x12)  (15)
R13 = x6 ⊕ x14 ⊕ F(x7, x8, x9, x10, x11, x12, x13) (16)
R14 = x7 ⊕ x15 ⊕ F(x8, x9, x10, x11, x12, x13, x14) (17)
where the Rj's are fixed. In these equations, we can skip α, β and γ since we chose admissible values for (x1, ..., x15). This system is almost in triangular form: Eq. (17) fixes x15; Eq. (16) fixes x6; Eq. (15) fixes x5; Eq. (14) fixes x4; Eq. (13) fixes x3; Eq. (12) fixes x2; Eq. (11) fixes x1. Finally, Eq. (1) fixes x0 and Eq. (9) fixes x16. Each valid solution in step 2 gives us 2^{ℓ−s} results, by exploiting the extra degrees of freedom in step 3. We obtain in total about Nα · 2^{7(ℓ−2s)} · 2^{ℓ−s} · 2^{−1} possible
pairs that satisfy the characteristic from step 8 to 17. The factor 2^{−1} comes from the fact that we counted each possible pair twice. We can improve this general method in two ways.
• First, we can do better than trying 2^{7(2s−ℓ)} tuples to find a solution in step 2. This is based on a Gaussian elimination on the 2s − ℓ equations, allowing us to explore the set of all candidate tuples by a depth-first search procedure. For the sake of simplicity, we will explain the details later on the example of ESSENCE-256 in §3.4. Without this method we would need 2^{7(2s−ℓ)} trials to find one solution, which would increase the complexity a lot.
• Secondly, we can improve the choice of (x1, ..., x15)_i, i ∈ S. This time we only consider a tuple (x1, ..., x15)_i admissible if it can be extended to a whole characteristic from step 0 to 24. This reduces the values in Table 3 to the following ones. We can see that the case (αi, βi, γi) = (1, 0, 1) leads to an impossibility.

Table 4. Number of solutions for the (x1, ..., x15) which can be extended to satisfy the whole characteristic

            (αi, βi)
γi       (0, 1)   (1, 0)   (1, 1)
 0         96        2        4
 1        128        0        2
Finally, for each slice i ∈ S we fix one of these tuples and try to compute the missing bits. The number of possibilities to choose the tuples for i ∈ S is
Ñα = 2^{|α∧¬β∧¬γ|} × 4^{|α∧β∧¬γ|} × 96^{|¬α∧β∧¬γ|} × 2^{|α∧β∧γ|} × 128^{|¬α∧β∧γ|}.
This method increases the probability of passing the rest of the characteristic, as we will see in §3.3. The choice of rounds 8-17 for the middle part was done to get the lowest possible value for Ñα, see Appendix A. The subsequent sections discuss the complexity of performing the search of the rest of the characteristic, and give concrete complexity estimates for each instance of ESSENCE.
3.3
Finding Accurate Probabilities
Relying only on the Hamming weight to approximate the probability of the differential characteristic gives unacceptably inaccurate approximations. Indeed, for a given word slice, probabilities of differences to be absorbed at each step are not independent, and neglecting this leads to estimates far from actual values. For example, a single bit difference is absorbed during seven steps with probability 2−8.4 , which is significantly lower than the heuristic estimate 2−7 based on the one bit difference. However, for the characteristic considered, the dependency between word slices seems negligible. We thus give complexities with respect to
empirical estimates, computed independently for each word slice. That is, we compute the probability of the differential characteristic as 32 (or 64) independent differential characteristics, i.e., one for each slice. We could estimate the real probability of our characteristic for any given difference α. We found that having αi = 1, βi = 0 and γi = 1 leads to an impossibility (the differential cannot be satisfied for that α). This is why we need the condition γ ∧ α ∧ ¬β = 0 . When considering the middle part, we also computed the real probability of verifying the sliced characteristic once this part of the characteristic is satisfied. The complexities given in the next section were computed with respect to those empirical estimates, not with the heuristic values based only on the Hamming weight. Reusing our notation α, β, γ, we give in Tab. 5(a) the probabilities for a given slice i to follow the complete characteristic on 32 steps (the impossible cases—of probability zero—are not included), depending on (αi , βi , γi ) ∈ {0, 1}3. To compute the probability for a given difference (αi , βi , γi ) we count the number of possible bit sequences following the whole differential characteristic. The probability that a random input follows the characteristic is the product of those probabilities, with each raised to a power that equals the number of slices corresponding to this case. For the α’s used in our attacks and the exact values of the probabilities, we obtain probabilities 2−240.6 and 2−478.9 , respectively for ESSENCE-256 and ESSENCE-512. Taking into account our basic technique in §3.2 for solving the middle part at a reduced cost, we obtain the probabilities in Tab. 5(b). If we consider only those tuples (x1 , . . . , x15 ) where there is at least one possibility of verifying the whole characteristic we get the values in Tab. 5(c). In both cases, we count for every difference (αi , βi , γi ) the number of extensions of the valid tuples (x1 , . . . , x15 ) satisfying the whole characteristic and compare it to the number of arbitrary extensions. Given those numbers, we find that the probability that a value conforming to the middle part follows the rest of the characteristic is 2−87.1 for ESSENCE-256 and 2−158.7 for ESSENCE-512 with the basic method and respectively 2−62.2 and 2−116.1 for the improved one. There are at least two ways to compute the total number of message pairs that is going to satisfy the whole characteristic. As we will obtain nearly the same result with both of them, we can verify the soundness of the probabilities after
Table 5. Probability of passing the differential characteristic depending on the input differences (αi, βi, γi)

                              (0,0,0)  (0,1,0)   (0,1,1)   (1,0,0)   (1,1,0)   (1,1,1)
(a) complete characteristic     1      2^-9.5    2^-9.1    2^-24.4   2^-23     2^-26
(b) basic method                1      2^-1.1    2^-1.1    2^-16     2^-14.6   2^-18.5
(c) improved method             1      2^-1.1    2^-1.1    2^-10.4   2^-10     2^-12
solving the middle characteristic. First, some additional notations are required: we let ρ0, ..., ρ_{ℓ−1} denote the probabilities for each slice in 0, ..., ℓ − 1 of conforming to the differential, i.e., each ρi lies in {1, 2^-9.5, 2^-9.1, 2^-24.4, 2^-23, 2^-26}; and we let τ0, ..., τ_{ℓ−1} be the conditional probabilities for each slice to follow the differential characteristic, assuming that the middle part is satisfied. Now, the two equivalent ways to express the number of conforming messages are:
1. The probability of the whole characteristic is ∏_{i=0}^{ℓ−1} ρi, hence the number of pairs of conforming messages is
2^{8ℓ} · ∏_{i=0}^{ℓ−1} ρi,
where 8ℓ is the digest bit length.
2. The probability of the characteristic once the middle part is satisfied is ∏_{i=0}^{ℓ−1} τi; calling N the number of pairs conforming to the middle part, the number of conforming message pairs is then
N · ∏_{i=0}^{ℓ−1} τi.
In both cases we find each possible message pair twice. We verified that these two ways of computing the total number yield similar values (up to rounding approximations), which shows that the probabilities of verifying the whole characteristic and the characteristic after solving the middle part correspond to each other. For example, for our α: for ESSENCE-256 we find 2^256 · 2^-240.6 = 2^15.4 message pairs considering the whole characteristic. With our improved version we have Ñα = 2^106.5; a probability of 2^-42 of solving the seven linear systems; 2^13 times more solutions for free; and thus N = 2^77.5. Using the probability 2^-62.1 of passing the rest of the characteristic we again get 2^15.4 message pairs.
3.4
Collisions for ESSENCE-256
For ESSENCE-256, we could perform an exhaustive search over all 2^32 possible differences α and found as optimal value α = 80102040, for which |α| = 4, |β| = 18, and |α ∨ β| = s = 19. Heuristic estimates based on Hamming weights suggest that we need about 2^{14×4+18} = 2^74 messages that conform to the middle part to find at least one conforming to the differential characteristic on the right side. However, the empirical complexity is (cf. §3.3) approximately 2^87.1 for the basic method and 2^62.2 for the improved one. In the following, we focus on the improved version.
Solving the Right Side. For that α, we have in total
Ñα = 96^6 × 2^1 × 128^9 × 2^3 ≈ 2^106.5
possibilities to set the bits in S. We have a probability 2^{7(32−2×19)} = 2^-42 of finding a solution to the seven systems defined by Eq. (3) to (9). Following our
assumption in §3.2, we get about 264.5 solutions. For each solution, we obtain 213 additional solutions by varying the bits (x7 )j for j not in S, yielding in total up to 277.5 solutions. For each message pair found, we must check that it satisfies the rest of the characteristic. As found in §3.3, we need about 262.2 values conforming to the middle part to find one value following the rest of the characteristic. Below we detail the cost of finding those messages. We look for solutions of systems (2) to (8). The linear systems L(xj ) = Rj consist each of 32 equations and 26 free variables and has full rank. We have thus 6 linear equations which the fixed bits must fulfill to guarantee that there exists a solution of the linear system. We can choose those equations such that: • choosing the values of the bit slices 0, 1, 2, 3, 5, 6, 9, 10, 12, 16, 18 fixes the parity of the first equation. This does not change with the choice of the remaining slices; • if we choose in addition the values of the bit slices 7, 11, 13, we fix the second equation; • if we choose in addition the values of the bit slices 4, 17, we fix the third equation; • if we choose in addition the values of the bit slices 14, 15, we fix the fourth equation; • finally, choosing the value for the last slice, the eighth one, fixes the parity of the remaining two equations. This allows us to explore the set of candidate tuples efficiently by a depth-first search. Moreover, we can precompute the parity corresponding to the last three equations for any 3-tuple of choices for slices 8, 14, and 15. That way, we do not even need to test the different tuples, but only to enumerate the ones giving us a valid solution. The cost to find a solution is therefore very low. Using the degree of freedom coming from step 3 of the solving procedure, our implementation is able to generate solutions for the middle part systems and to test the rest of the characteristic at a rate of approximately 650 cycles per candidate on an Intel Core 2 processor, against about 1600 cycles for hashing 256 bits1 . Solving the Left Side. Once a conforming pair of message blocks is found, we just need to try approximately 267.4 distinct random chaining values to find a collision (for comparison, the heuristic estimate based on Hamming weights is 214×4 = 256 ). This value limits our attack. Since there is no α with a hamming weight of 3, which verifies the characteristic on the right side, we cannot improve this value. Note that our attack can be carried out with negligible memory (the 262.2 messages that satisfy the middle part don’t have to be stored: we test repeatedly each candidate message, and discard it if it does not conform to the full characteristic). 1
See eBASH: http://bench.cr.yp.to/ebash.html
If we only search for a semi-free-start collision we can reduce the complexity of the left side to 2^33.7, which makes again the right side the limiting part. We apply the same techniques as for the right side to compute IV pairs that pass from step 1 to step 8. We have to compute about 2^33.7 IV pairs to find one satisfying the whole characteristic. This method was applied in Appendix C.
3.5
Collisions for ESSENCE-512
For ESSENCE-512, the best difference is α = 8408400000480082, giving |α| = 8, |β| = 35, and |α ∨ β| = s = 39. We tested all 64-bit differences with a Hamming weight of up to 10. Since the weight of α is the limiting property on the left side, and thus for the whole attack, we are sure to have found the best value for the whole attack. For our α, the matrix of the linear system has again full rank, thus we can directly apply the same techniques as for ESSENCE-256. As discussed in §3.3, with the basic method we need about 2^158.7 solutions of the middle part to find one solution for the right side of the characteristic (against 2^147 with heuristic estimates based on Hamming weights). With the improved method we only need 2^116.1 solutions. In the following, we consider only the improved method.
Solving the Right Side. For our α we have
Ñα = 96^14 × 2^4 × 4^3 × 128^17 × 2^1 ≈ 2^222.2
possibilities for the tuples at the indices i ∈ S, and a probability of about 2^-98 to find a solution for all the systems of Eq. (2)-(8). Thus, we expect about 2^124.2 solutions. Using the free bits, we get for each solution 2^{64−39} = 2^25 additional solutions. In total, there are thus about 2^149.2 solutions, which is high enough for finding one conforming to the full characteristic (trying 2^116.1 is sufficient).
Solving the Left Side. Now, we have a pair of messages that verify the differential characteristic. The probability for a random chaining value of verifying the differential characteristic is approximately 2^-134.7. Again, this value is the limiting part of our attack.
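As a small sanity check on the counting in §3.4 and §3.5 (our own verification script, using only the figures quoted above), a few lines of Python reproduce the solution counts:

```python
import math

# ESSENCE-256 (alpha = 80102040): admissible tuples, system-solving probability,
# and free bits of x7, as quoted in Section 3.4.
n256 = 6 * math.log2(96) + 1 + 9 * math.log2(128) + 3      # ~106.5
sols256 = n256 - 42 + (32 - 19)                             # ~77.5 middle-part solutions

# ESSENCE-512 (alpha = 8408400000480082), as quoted in Section 3.5.
n512 = 14 * math.log2(96) + 4 + 3 * math.log2(4) + 17 * math.log2(128) + 1  # ~222.2
sols512 = n512 - 98 + (64 - 39)                             # ~149.2 middle-part solutions

print(round(n256, 1), round(sols256, 1), round(n512, 1), round(sols512, 1))
```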
4
Attacking HMAC-ESSENCE
HMAC [15] is a widely used construction for building message authentication codes out of hash functions. Proposed in 1996 by Bellare, Canetti, and Krawczyk, HMAC has been standardized by NIST in 2002 [16] and requirements for SHA-3 include compatibility with HMAC. The results in §3, can directly be turned into a distinguisher for ESSENCE256 and ESSENCE-512 when used in keyed mode, be it with an unknown prefix message, or within HMAC. More precisely, we use the property that we can precompute a conforming message block once, and then separately seek a conforming chaining value. We just make the standard assumption that we can query an oracle (non-adaptively) with messages, and that this returns the digests produced by the keyed ESSENCE with this message as input, for a randomly preselected key.
A distinguisher then works as follows:
1. Find a pair of blocks (x, y) that conforms to the message part differential.
2. Repeat until a collision is found:
   (a) Pick a unique prefix m.
   (b) Query the oracle with m||x and m||y.
Ideally 2^128 trials are expected before a collision for ESSENCE-256, but here we make only 2^67.4 trials on average, after a precomputation of complexity 2^62.2. For ESSENCE-512, we have a complexity 2^134.7 instead of 2^256 ideally.
We can also mount an existential forgery attack by making one additional adaptive query:
1. Run the distinguisher above to obtain blocks m, x, y such that m||x and m||y collide by HMAC-ESSENCE.
2. Pick an arbitrary block m′.
3. Query the oracle for the MAC of m||x||m′, and obtain a value z.
4. Return z as a forgery of m||y||m′.
The complexity of this attack is essentially the same as that of the simple distinguisher.
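The forgery step fits in a few lines. The sketch below is ours: mac_oracle is a hypothetical query interface to HMAC-ESSENCE under an unknown key, and the message pieces are byte strings; it simply restates why the internal collision makes the answer to one query valid for a second, never-queried message.

```python
def forge(mac_oracle, m, x, y, m2):
    """Existential forgery sketch: m||x and m||y collide inside the hash, so the
    MAC returned for m||x||m2 is also valid for m||y||m2."""
    z = mac_oracle(m + x + m2)     # the single additional adaptive query
    return (m + y + m2, z)         # a valid (message, MAC) pair that was never queried
```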
5
Conclusion
We presented collision attacks on ESSENCE-256 and ESSENCE-512 of respective complexities 2^67.4 and 2^134.7. More precisely, these values are upper bounds on the cost of running our attacks, in terms of compression-equivalent units. Implementations of our attacks need only negligible memory, and in particular avoid expensive memory accesses. We combine several methods to achieve our goal: separate treatment of message and chaining value, exact estimation of the probabilities, computation of the low-probability part, efficient solution finding for linear systems, and reduction of the search space by considering the whole characteristic. The attacks were experimentally verified on reduced versions of ESSENCE, and also apply to the versions of ESSENCE with 224- and 384-bit digests. An example of a practical free-start collision on 29 out of 32 rounds can be found in Appendix C. Attacks on HMAC are usually much harder than collision attacks, as the examples of MD4 [17], MD5 [17,18], and SHA-1 [19] show. However, we could show direct applications of our collision attack to the HMAC construction instantiated with ESSENCE, giving a distinguisher and an existential forgery attack with the same complexity as the collision attacks. Our results reveal significant weaknesses in the version of ESSENCE submitted to NIST.

Acknowledgments. We would like to thank Anne Canteaut, Stéphane Jacob, Nicky Mouha, Gautham Sekar, and Fabien Viger for their help.
References 1. Wang, X., Yu, H.: How to break MD5 and other hash functions. In: Cramer, R. (ed.) EUROCRYPT 2005. LNCS, vol. 3494, pp. 19–35. Springer, Heidelberg (2005) 2. Wang, X., Yin, Y.L., Yu, H.: Finding collisions in the full SHA-1. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 17–36. Springer, Heidelberg (2005) 3. Canni`ere, C.D., Rechberger, C.: Finding SHA-1 characteristics: General results and applications. In: Lai, X., Chen, K. (eds.) ASIACRYPT 2006. LNCS, vol. 4284, pp. 1–20. Springer, Heidelberg (2006) 4. Stevens, M., Lenstra, A.K., de Weger, B.: Chosen-prefix collisions for MD5 and colliding X.509 certificates for different identities. In: Naor, M. (ed.) EUROCRYPT 2007. LNCS, vol. 4515, pp. 1–22. Springer, Heidelberg (2007) 5. NIST: FIPS 180-2 – secure hash standard (2002) 6. NIST: Announcing request for candidate algorithm nominations for a new cryptographic hash algorithm (sha-3) family. In: Federal Register, November 2007, vol. 72(212) (2007) 7. NIST: Cryptographic hash algorithm competition, http://csrc.nist.gov/groups/ST/hash/sha-3/index.html 8. ECRYPT II: The sha-3 zoo, http://ehash.iaik.tugraz.at/wiki/The_SHA-3_Zoo 9. Martin, J.W.: ESSENCE: A candidate hashing algorithm for the NIST competition. Submission to NIST (2008) 10. Martin, J.W.: ESSENCE: A family of cryptographic hashing algorithms. Submission to NIST (2008) 11. Mouha, N., Sekar, G., Aumasson, J.P., Peyrin, T., Thomsen, S.S., Turan, M.S., Preneel, B.: Cryptanalysis of the ESSENCE family of hash functions. In: Inscrypt 2009. LNCS. Springer, Heidelberg (2009) 12. Merkle, R.C.: One way hash functions and DES. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 428–446. Springer, Heidelberg (1990) 13. Damg˚ ard, I.: A Design Principle for Hash Functions. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 416–427. Springer, Heidelberg (1990) 14. Mendel, F., Rechberger, C., Schl¨ affer, M., Thomsen, S.S.: The rebound attack: Cryptanalysis of reduced Whirlpool and Grøstl. In: Dunkelman, O. (ed.) FSE 2009. LNCS, vol. 5665, pp. 260–276. Springer, Heidelberg (2009) 15. Bellare, M., Canetti, R., Krawczyk, H.: Keying hash functions for message authentication. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp. 1–15. Springer, Heidelberg (1996) 16. NIST: FIPS 198 – the keyed-hash message authentication code, HMAC (2002) 17. Wang, L., Ohta, K., Kunihiro, N.: New key-recovery attacks on HMAC/NMACMD4 and NMAC-MD5. In: Smart, N.P. (ed.) EUROCRYPT 2008. LNCS, vol. 4965, pp. 237–253. Springer, Heidelberg (2008) 18. Wang, X., Yu, H., Wang, W., Zhang, H., Zhan, T.: Cryptanalysis on HMAC/NMAC-MD5 and MD5-MAC. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 121–133. Springer, Heidelberg (2009) 19. Rechberger, C., Rijmen, V.: New results on NMAC/HMAC when instantiated with popular hash functions. Journal of Universal Computer Science 14(3), 347–376 (2008)
A
Choice of the Position of the Middle Part
We can see in Table 6 that the choice of rounds 8 to 17 minimizes the number of solutions for the middle part (x_1, . . . , x_15). Since the total number of solutions is always the same, a smaller number of solutions in the middle part means that we have a higher probability of passing the rest of the characteristic.

Table 6. Number of solutions for the (x_1, . . . , x_15) which can be extended to satisfy the whole characteristic, depending on the input differences (α_i, β_i, γ_i) and on the rounds

Rounds   (1,0,0)  (0,1,0)  (1,1,0)  (0,1,1)  (1,1,1)
0-9        12      3968       8      4960      4
1-10       12      1984       8      2480      4
2-11        8      3072       8      3840      4
3-12        8      2160       8      2640      4
4-13        4      1152       4      1408      4
5-14        4       576       4       704      4
6-15        4       288       8       352      4
7-16        4       192       8       224      4
8-17        2        96       4       128      2
9-18        4        96       8       128      4
10-19       4        96      12       128      4
11-20       4       176      12       208      4
12-21       4       352      12       384      4
13-22       4       512      12       640      4
14-23       4      1024      16      1280      4

B
Probabilities of the Right Side of the Characteristic
Table 7 compares heuristic probability estimates based on Hamming weights with actual, empirically verified, probabilities. The empirically verified values are taken from the improved method, considering only values (x1 , . . . , x15 )i , i ∈ S, which are able to satisfy the rest of the characteristic. Therefore, the overall probability is higher than the heuristic approximation.
C
Practical Semi-Free-Start Collision on 29 Out of the 32 Rounds
Once we have a message pair passing the right side, we can apply the same techniques that we used from step 8 to step 17 to obtain a semi-free-start collision. We compute IV pairs that pass from step 1 to step 8 on the left side and test whether they pass the rest of the characteristic. With our α for ESSENCE-256 we have to test about 2^33.7 IV pairs. Together with a message pair we found passing 29 out of the 32 rounds, we obtained the semi-free-start collision presented in Table 8.
Table 7. Comparison between the heuristic approximations of the probability to reach the next difference and the real probabilities, empirically estimated

Step     Heuristic approximation   Empirical estimate of improved method
0        2^-18                     2^-21.3
1        2^-4                      2^-4
2        2^-4                      2^-1.7
3        2^-4                      2^-4
4        2^-4                      2^-4
5        2^-4                      1
6        2^-4                      2^-4
7        2^-4                      1
8-16     —                         —
17       2^-4                      1
18       2^-4                      2^-3
19       2^-4                      2^-1
20       2^-4                      2^-4
21       2^-4                      2^-3
22       2^-4                      2^-8
23       2^-4                      2^-4
24-32    1                         1
Total    2^-74                     2^-62

(The full table also lists, for each step, the differences (·, α, β, or ?) in the message-part words k7, . . . , k0 of the characteristic.)
Table 8. Example of semi-free-start collision on 29 of the 32 rounds of the differential characteristic, for α = 80102040 and β = 537874EB

Initial values for k (k7 . . . k0): 4CD35806 4759FB6D 3ED267E5 17641536 BE1F35ED 688B0C3C DF126549 5FAE0827
Initial values for r (r7 . . . r0): B0741769 BA2BA1A1 349A4DC8 54204D82 292006B1 80096194 D23020E1 9098A7EA

(The full table additionally gives the word-wise register differences for each of the rounds 0 to 32.)
Domain Extension for Enhanced Target Collision-Resistant Hash Functions

Ilya Mironov
Microsoft Research, Silicon Valley Campus
Abstract. We answer the question of Reyhanitabar et al. from FSE'09 of constructing a domain extension scheme for enhanced target collision-resistant (eTCR) hash functions with sublinear key expansion. The eTCR property, introduced by Halevi and Krawczyk [1], is a natural fit for hash-and-sign signature schemes, offering an attractive alternative to collision-resistant hash functions. We prove a new composition theorem for eTCR, and demonstrate that eTCR compression functions exist if and only if one-way functions do.
1
Introduction
Hash functions are the staple of cryptographic protocols. Mapping out the necessary and sufficient assumptions on hash functions is an important research program, with implications for constructions and modes of operation of hash functions. While collision resistance is the strongest and arguably the most universally applicable property one may expect of a hash function, weaker security properties are often sufficient and may be easier to design for. The focus of this paper is on a new security property of hash functions put forth by Halevi and Krawczyk [1] for the purpose of strengthening hash-and-sign signatures against collision-finding attacks. The standard hash-and-sign paradigm, which is the basis of virtually all practical signature schemes, calls for hashing the message using a collision-resistant hash function and then signing the result. Should collisions in the hash function be found, the hash-and-sign signature becomes obviously insecure, as a signature on one of the colliding messages is a valid signature on the other. Halevi and Krawczyk propose hashing the message under a randomly chosen key and then including the key as part of the signature. The new security property of the hash function, under which the scheme preserves security of the underlying fixed input-length signature scheme, is called enhanced target collision-resistance. They demonstrate practical and theoretical advantages of this approach as well as a concrete instantiation of the hash function satisfying this property based on a randomized Merkle-Damgård construction, with a minimal computational overhead. The construction can be proved secure under several non-standard assumptions on the fixed input-length keyless compression function. We argue that a compelling alternative is starting with a keyed eTCR compression function with fixed input length and extending its domain with a dedicated construction.
The question of domain extension for eTCR was first considered by Reyhanitabar et al. [2], who found that only one of many known domain extension schemes preserves the eTCR property, and that this construction expands the key linearly, rendering it impractical for application in signatures. In the main contribution of this paper we propose a new eTCR-preserving domain extension scheme with logarithmic key expansion, thus settling the open question from [2]. The organization of the paper is as follows. In Sections 2, 3, and 4 we survey prior and related work, recall standard definitions, and give an overview of domain extension techniques. In Section 5 we consider the question of placing eTCR in the complexity-theoretic hierarchy of hardness assumptions and prove that eTCR compression functions may be constructed from one-way functions. Finally, we introduce a new domain extension scheme in Section 6 and instantiate it in Section 7.
2
Related Work
The hash-and-sign paradigm for cryptographic signatures goes back to the Rabin signature scheme [3], where the hash function was used for message compression and, rather presciently, for input randomization (see also [4]). The notion of collision-resistant hash functions, which are essential for the signatures' security, was formally defined by Damgård [5]. All standards for digital signatures in use today are based on some variant of the hash-and-sign paradigm. In fact, the first standards of dedicated hash functions, such as MD2, MD5, or SHA [6,7,8], were explicitly aimed at securing digital signature schemes. Security of most practical signature schemes cannot be argued based on the collision-resistance property of the hash function alone. Instead, proofs are given in the random oracle model, where the hash function is replaced by an ideal functionality with oracle access [9,10]. As part of a larger research program of reducing the hardness assumptions necessary for proving security of various cryptographic primitives, a series of seminal papers reduced the existence of secure signatures to that of one-way functions [11,12,13] (see also Katz and Koo [14] for a complete proof and Haitner et al. [15] for an alternative proof of the last reduction). The crucial intermediate step of the construction, proposed by Naor and Yung, is universal one-way hash functions (UOWHFs), also known as target collision-resistant (TCR) hashes. TCR hashes appear to be a fundamentally weaker primitive than collision-resistant hashes, and thus may be easier to construct, as evidenced by the work of Simon [16]. He showed that collision-resistant hashes cannot be built from one-way functions in a black-box manner, as opposed to target collision-resistant hashes, which can. In practice, collision resistance of standardized hash functions has been under assault recently, starting with remarkable attacks on the MD and SHA families by Wang et al. [17,18,19,20]. The attacks spurred interest in using hash functions that are less fragile than collision-resistant hashes. Even before that, the Cramer-Shoup signature scheme [21], which was a first efficient short signature scheme provable in the
standard model, included a TCR hash as an option. The main difficulty in using TCR hashes as a drop-in replacement for broken or vulnerable collision-resistant functions is in handling the key, since TCR hashes are keyed, unlike keyless collision-resistant hashes. Moreover, the key of a TCR hash cannot be chosen ahead of time: to take advantage of its security guarantee, the key must be chosen by the signer, signed, and communicated as part of the signature. More concretely, if σ(·) is a secure (existentially unforgeable [11]) signature scheme for fixed-length inputs and H_k(·) is a TCR hash, then σ(k||H_k(M)), k, where k is chosen at signing time, will be a secure signature scheme for variable-length messages. To address the problem of inflating the signature length, Mironov [22], generalized by Pasini and Vaudenay [23], suggests reusing the randomness already present in the signature scheme to key the TCR hash. A different approach, which is the main motivation for the present work, is due to Halevi and Krawczyk [1]. They propose a new security definition for hash functions, called enhanced target collision-resistance (eTCR), which allows one to leave the hash function key out of the signature's input. Namely, if the keyed hash function H_k(·) satisfies the new definition, the following signature, σ(H_k(M)), k, inherits the security of the σ(·) scheme. Constrained by backwards compatibility, Halevi and Krawczyk ingeniously replace legacy keyless hash functions in signature schemes such as RSA or DSA with a keyed eTCR function without changing existing implementations of the signing algorithm. Instead, they transform the message with a randomly chosen key, sign the output of the transformation, and append the key to the signature. Thus, their signature scheme takes the form σ(H(RMX_k(M))), k, where RMX is the keyed randomization scheme. A concrete specification of the RMX transform is available as a NIST special publication [24] and an IETF Internet draft [25]. The security of the combined signature scheme follows from the existential unforgeability of σ(·) and from H(RMX_k(·)) being eTCR if H is an iterative (based on the Merkle-Damgård scheme [26,27]) hash function whose compression function satisfies one of several non-standard cryptographic properties. For the security of the RMX transform applied to blockcipher-based Davies-Meyer hash functions (i.e., most current standards), see Gauravaram and Knudsen [28]. Given its practical significance, we consider the eTCR property a natural and intriguing extension of target collision-resistance that can be studied on its own. Two fundamental questions arise in connection with a new definition of security for hash functions: its place in the hierarchy of hardness assumptions, and the existence and efficiency of domain extension schemes. In other words, what is the simplest primitive eTCR can be reduced to, and once we have a fixed input-length eTCR, how can we apply it to arbitrary-length inputs? The first question was previously considered by Yasuda [29], which we revisit for compressing eTCR in Section 5. The second question was initially raised by Reyhanitabar et al. [2], who found that most known domain extension schemes do not preserve the eTCR property. The only construction on their list that does achieves this by expanding the key linearly with the size of the message. They conclude the
paper by leaving open the problem of constructing a (key-length) efficient eTCR-preserving domain extension scheme. We answer it in Section 6.
3
Definitions
To simplify notation we state standard definitions of security properties for hashes in the asymptotic setting, parameterized by a security parameter λ ∈ N, which is omitted when clear from the context. A function of λ is negligible if it is less in absolute value than 1/|p(λ)| for any polynomial p and large enough λ. Since we deal with both fixed and variable input-length functions, the definitions are given using abstract domain and range sets, which can be instantiated with fixed-length binary strings {0, 1}^n or bounded-length binary strings.

Definition 1 (Target collision-resistance—TCR). We call {H_λ}_{λ∈N} a target collision-resistant family of functions H_λ : K_λ × D_λ → R_λ if for any polynomial-time adversary consisting of two randomized algorithms A = (A_1, A_2) the probability of outputting success in the following experiment is negligible:

Exp^tcr_{H,A}(λ):
  (M, state) ← A_1(1^λ);
  k ←$ K_λ;
  M′ ← A_2(1^λ, k, state);
  output success if H_λ(k, M) = H_λ(k, M′) and M ≠ M′.

The probability is taken over the adversary's random tape and the choice of k. For notation's brevity, we often write the hash function's key as a subscript, also omitting the security parameter, as in H_k(M) instead of H_λ(k, M). When the function's domain D is a direct product of two or more sets, say X × Y, we may write the input of the hash function as several arguments, such as H_k(x, y), where (x, y) ∈ X × Y = D. The difference is purely syntactical.

Definition 2 (Enhanced target collision-resistance—eTCR [1]). We call {H_λ}_{λ∈N} an enhanced target collision-resistant family of functions H_λ : K_λ × D_λ → R_λ if for any polynomial-time adversary consisting of two randomized algorithms A = (A_1, A_2) the probability of outputting success in the following experiment is negligible:

Exp^etcr_{H,A}(λ):
  (M, state) ← A_1(1^λ);
  k ←$ K_λ;
  (k′, M′) ← A_2(1^λ, k, state);
  output success if H_λ(k, M) = H_λ(k′, M′) and (k, M) ≠ (k′, M′).

The probability is taken over the adversary's random tape and the choice of k. In other words, in the Exp^tcr_{H,A}(λ) experiment (the TCR game) the adversary commits to a message M, receives a randomly sampled key k, and is tasked
with producing a message M′ such that it collides with M under H_k: H_k(M) = H_k(M′). The adversary in the eTCR game is more powerful: after committing to M and receiving k as before, he wins the game if he can find M′ and possibly a different key k′ such that H_k(M) = H_{k′}(M′), subject to the condition that (k, M) ≠ (k′, M′). We note that our versions of the definitions of TCR and eTCR do not explicitly require the functions to be compressing. In fact, since we allow variable-length inputs, the functions may sometimes map short inputs into longer strings. However, throughout the paper the ranges R_λ are always assumed to consist of bit strings of some fixed length m (depending on λ), and the domains D_λ typically include strings longer than m.
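The eTCR experiment can be phrased operationally as below; this is only an illustrative harness, with the key space, the hash, and the two adversary stages supplied by the caller (none of these names come from the paper).

```python
import secrets

def etcr_experiment(H, keyspace_bits, adversary):
    """Run Exp^etcr_{H,A}: A1 commits to M; then A2, given a random key k,
    must output (k', M') != (k, M) with H(k, M) == H(k', M')."""
    A1, A2 = adversary
    M, state = A1()
    k = secrets.randbits(keyspace_bits)        # k <-$ K
    k_prime, M_prime = A2(k, state)
    return H(k, M) == H(k_prime, M_prime) and (k, M) != (k_prime, M_prime)
```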
4
Overview of Domain Extension Schemes for TCR
The most common method of designing hash functions is to construct a fixed input-length compression function and then extend its domain by repeated application via some composition scheme. By far the best known such scheme is Merkle-Damgård [26,27], which provably and without any loss in exact security extends a collision-resistant compression function. It is important to note the limitations of this approach: (1) the extended-domain hash is only as secure as the underlying compression function; (2) the domain extension scheme may not preserve security properties other than those for which it was proved secure; (3) exact security (i.e., the one most relevant in practice) may deteriorate under composition. Several recent papers stress these points regarding the Merkle-Damgård scheme: multi-block collision-finding attacks on MD5 and SHA-0 [18,19,30] span several applications of the compression function; multicollisions, second-preimage and other attacks may take advantage of the Merkle-Damgård iterative structure [31,32,33]. More relevantly, and also reinforcing point (2) above, target collision-resistance is not preserved under the Merkle-Damgård iterative composition [34], which motivates the construction of dedicated TCR-preserving domain extension schemes. Two such schemes appear in the Naor-Yung paper, the first being a Merkle-Damgård-like sequential composition of independently keyed compression functions (linear hash), and the second similar to the Wegman-Carter tree-based method [35] (basic tree, according to [34]). Both schemes expand the key, whose length (the shorter, the better) is an important characteristic of a TCR function. Reducing key expansion in sequential and parallelizable settings was the subject of several papers starting with [34]. Remarkably, both the linear hash and basic tree composition schemes preserve the eTCR property. The first statement was shown in [2] and the second, observed by Dodis and Haitner [36], is implied by the following argument. Each level of the tree construction can be modeled as an independently keyed hash function, which is a concatenation of multiple eTCRs and thus an eTCR itself. The basic tree construction is eTCR by applying the same argument as in the proof of the linear hash scheme to the composition of layers.
Let the fixed input-length compression function be H : {0, 1}^ℓ × {0, 1}^m → {0, 1}^n, where ℓ is the key length, and let the message length be L > m. The linear hash expands the key to ℓ⌈L/(m − n)⌉ bits. The Naor-Yung basic tree for hashing messages of size L results in key length ℓ⌈log_{m/n}(L/n)⌉, improved for large ℓ to ℓ + m⌈log_{m/n}(L/n)⌉ by Bellare and Rogaway with the XOR tree construction [34]. A sequential composition scheme due to Shoup [37] expands the key to ℓ + n⌈log_2(L/(m − n))⌉, which is an improvement over the XOR tree construction by approximately a factor of (m/n)/log_2(m/n).
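For concreteness, the key-expansion expressions above can be evaluated for sample parameters. The helper below simply transcribes the formulas as reconstructed in this section (ℓ = key length, m = input length, n = output length, L = message length, all in bits) and is illustrative only.

```python
import math

def key_lengths(ell, m, n, L):
    """Key material (in bits) used by the TCR domain extenders discussed above."""
    blocks = math.ceil(L / (m - n))                          # compression calls
    linear = ell * blocks                                    # linear hash
    depth = math.ceil(math.log(L / n, m / n))                # tree height
    basic_tree = ell * depth                                 # Naor-Yung basic tree
    xor_tree = ell + m * depth                               # Bellare-Rogaway XOR tree
    shoup = ell + n * math.ceil(math.log2(L / (m - n)))      # Shoup masks
    return linear, basic_tree, xor_tree, shoup

# Example (hypothetical parameters): print(key_lengths(128, 640, 256, 1 << 20))
```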
Fig. 1. Extenders for TCR hashes: linear, basic tree, and Shoup constructions

The Shoup construction's
key expansion was proved optimal in a certain restricted model by Mironov [38]. The construction and the proof of optimality were generalized to DAG-based constructions (including tree-based schemes) by Sarkar [39,40]. Depending on the parameters of the compression function and the length of the message, one of the following composition schemes is the least key-expanding: sequential composition, basic tree, or Shoup’s construction. The three schemes are illustrated in Figure 1. Two comments are in order. First, we leave the issue of padding and length encoding for Section 7. Second, we present the linear hash and the Shoup construction without the usual initial value (IV) constant (see also [41]). Instead, the first application of the compression function processes the initial m-bit long block of the message. For the narrowly defined purpose of TCR domain extension these constructions can be proved secure using standard methods. On the upside, our versions have the pleasant property of defaulting to the compression function when the message (after length encoding and padding) is m-bit long and of enabling chaining composition of several such functions (see Figure 2 below). For more comprehensive treatment of domain extension schemes with an eye towards simultaneously preserving many cryptographic properties of hash functions, such as second-preimage resistance, multi-collision resistance, random oracleness, etc., we refer to several recent papers [42,43,44] on domain extension for keyed and keyless functions.
5
From One-Way Hashes to eTCR
Motivated by compatibility with existing standards and APIs, Halevi and Krawczyk describe an elegant eTCR construction framed as a message-preprocessing scheme. Indeed, the composition of the RMX transform (pronounced remix), defined as RMX(r, M) = (r, M_1 ⊕ r, . . . , M_l ⊕ r), with a Merkle-Damgård hash function satisfying certain properties is eTCR. They prove security of the composition based on either one of two rather strong assumptions on the compression function, one of which implies the existence of collision-resistant hashes, and the other is a property of the composition scheme itself. In light of the known difficulty of constructing (plain) TCR functions from one-way functions without key expansion, it is unlikely that RMX can be proved secure based on a property reducible to one-way functions. We pursue a different approach, starting with TCR hashes, which in turn can be reduced to one-way functions (see also [29] for related results). Before presenting our construction, we recall the Naor-Yung method of building TCR compressing functions (originally introduced as UOWHFs) from one-way permutations, and demonstrate that the resulting function is not an eTCR. Select a bijection between GF(2^n) and {0, 1}^n by fixing an irreducible polynomial over F_2 of degree n. Let π : {0, 1}^n → {0, 1}^n be a one-way permutation and let g_{a,b} : {0, 1}^n → {0, 1}^(n−1) be the function defined as g_{a,b}(x) = chop(ax + b), where x, a, b ∈ {0, 1}^n, arithmetic is done in GF(2^n), and chop drops the last bit of its input. Naor and Yung prove that the following composition function
is target collision-resistant: H_{a,b}(x) = g_{a,b}(π(x)). Indeed, assuming the opposite, we find a pre-image under π of a given z chosen uniformly at random from {0, 1}^n as follows. First, the adversary produces some x ∈ {0, 1}^n; we then choose a, b ∈ {0, 1}^n such that g_{a,b}(π(x)) = g_{a,b}(z). If the adversary succeeds in finding y ≠ x that collides with x under H_{a,b}(·), it means that π(y) = z, since H_{a,b}(x) has exactly two preimages under g_{a,b}(·), one of which is π(x) and the other is z. This contradicts π's one-wayness. The above construction compresses the input by one bit. Naor and Yung prove that a sequential composition of independently keyed TCR hashes is also TCR, and thus one can achieve an arbitrary (polynomial) compression ratio. We reproduce the proof here because H_{a,b}(·) fails as an enhanced TCR for the same reason it can be proved (plain) TCR: the function is defined in such a way that, by choosing the key, it can be forced to take any given value on any fixed input. Likewise, a concrete instantiation of a TCR function based on the hardness of the subset-sum problem due to Impagliazzo and Naor [45] is trivially not an eTCR. However, there is a simple transformation that converts a TCR function into an enhanced TCR. If H_k(·) is a TCR, then Ĥ_k(x) = H_k(x)||k is eTCR. More formally:

Proposition 1. If {H_λ} is a TCR family of functions H_λ : K_λ × D_λ → R_λ, then the following family Ĥ_λ : K_λ × D_λ → R_λ × K_λ defined as

  Ĥ(k, M) = H(k, M)||k

is eTCR.

Proof. Assuming the opposite, there is an adversary A = (A_1, A_2) that wins the eTCR game against Ĥ. Let the output of A_1 be M, the random key be k, and the output of A_2 be (k′, M′). Since Ĥ_k(M) = H_k(M)||k and Ĥ_{k′}(M′) = H_{k′}(M′)||k′, it means that k = k′, and the adversary A wins the TCR game against H with the same probability that it wins the eTCR game against Ĥ.

The TCR-to-eTCR transform as applied to the basic Naor-Yung construction does not yield a compressing eTCR, because the key of the original construction is longer than the input. For instance, to achieve a compression ratio of two, the key must be quadratic in the input length. However, domain extenders for TCR with sublinear key expansion, surveyed in Section 4, do result in TCR hashes where the length of the key concatenated with the output is less than the hash function's input length. Combining this with Proposition 1, and invoking the result of Rompel [13,14] stating that one-way functions are sufficient for TCR compressing hashes, we establish the following:

Theorem 1. Compressing eTCR hash functions exist if and only if one-way functions do.

To the best of our knowledge, the above theorem is the first application of domain extension schemes with sublinear key expansion in the complexity-theoretic treatment of cryptographic hash functions.
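The transform of Proposition 1 is trivial to write down. In the toy rendering below, the inner keyed hash (HMAC-SHA-256) is only a stand-in for an arbitrary TCR family, not a construction claimed by the paper.

```python
import hashlib
import hmac

def H(k: bytes, msg: bytes) -> bytes:
    """Stand-in keyed hash playing the role of a TCR family H_k."""
    return hmac.new(k, msg, hashlib.sha256).digest()

def H_hat(k: bytes, msg: bytes) -> bytes:
    """Proposition 1: appending the key to a TCR output yields an eTCR."""
    return H(k, msg) + k

# Since the key is part of the output, H_hat(k, M) == H_hat(k', M') forces
# k == k', so an eTCR win collapses to a plain TCR collision under the same key.
```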
6
Domain Extension for eTCR
As the main contribution of the paper we construct a domain extender for eTCR. The construction is recursive and uses domain extension schemes for TCR. It is based on the observation that the composition of a TCR function with an independently keyed eTCR function is eTCR. Since an eTCR compression function is also (plain) TCR, whose domain we know how to extend, it suffices to iterate the composition scheme until the input into the eTCR function becomes shorter than the compression function's input or a linear hash extension scheme can be applied. Suppose we are given a TCR hash H^tcr and an eTCR hash F^etcr. We argue that the following composition function is eTCR:

  G_{k_1,k_2}(M) = F^etcr_{k_2}(H^tcr_{k_1}(M), k_1).

Indeed, if after committing to M and receiving (k_1, k_2), the adversary finds a collision of the type G_{k_1,k_2}(M) = G_{k′_1,k′_2}(M′), where M ≠ M′, k_1 = k′_1, and H^tcr_{k_1}(M) = H^tcr_{k_1}(M′), it (intuitively) means that the adversary broke the target collision-resistance of the H^tcr function. If k_1 ≠ k′_1 or H^tcr_{k_1}(M) ≠ H^tcr_{k′_1}(M′), we would use the adversary to win the eTCR game against F^etcr. More formally:

Theorem 2. If H^tcr_λ : K^1_λ × D_λ → R^1_λ is a TCR family indexed by security parameter λ and F^etcr_λ : K^2_λ × R^1_λ × K^1_λ → R^2_λ is eTCR, then the following family of functions G : K^1_λ × K^2_λ × D_λ → R^2_λ is eTCR:

  G_{k_1,k_2}(M) = F^etcr_{k_2}(H^tcr_{k_1}(M), k_1).

Proof. Assume towards a contradiction that there is an adversary A = (A_1, A_2) winning the eTCR game against G. Let (M, state) be the output of A_1, and let A_2 produce a collision G_{k_1,k_2}(M) = G_{k′_1,k′_2}(M′) given random (k_1, k_2). We classify the collisions into two types: inner collisions, where M ≠ M′, k_1 = k′_1, and H^tcr_{k_1}(M) = H^tcr_{k_1}(M′), and outer collisions (all others). If we guess at random which of these two cases takes place, we succeed with probability at least one half.

Case I: Inner collision. In this case, we may use the adversary to break the H^tcr function. Define algorithm B^I = (B^I_1, B^I_2) as follows:

Algorithm B^I_1(1^λ):
  1. Run (M, state) ← A_1(1^λ).
  2. Output (M, state).

Algorithm B^I_2(1^λ, state, k_1):
  1. Pick random k_2 ←$ K^2.
  2. (M′, k′_1, k′_2) ← A_2(1^λ, state, k_1, k_2).
  3. Fail if this is not an inner collision.
  4. Output M′.

By the definition of an inner collision (i.e., M ≠ M′, k_1 = k′_1, and H^tcr_{k_1}(M) = H^tcr_{k_1}(M′)), the algorithm B^I outputs a valid collision on H^tcr.
Case II: Outer collision. In this case, we attack the F^etcr function. Let B^O = (B^O_1, B^O_2) be the following:
Algorithm B^O_1(1^λ):
  1. Run (M, state) ← A_1(1^λ).
  2. Pick random k_1 ←$ K^1.
  3. Output (H^tcr_{k_1}(M), k_1, k_1||state).

Algorithm B^O_2(1^λ, k_1||state, k_2):
  1. (M′, k′_1, k′_2) ← A_2(1^λ, state, k_1, k_2).
  2. Fail if this is not an outer collision.
  3. Output (k′_2, H^tcr_{k′_1}(M′), k′_1).
If A succeeds, we know that G_{k_1,k_2}(M) = G_{k′_1,k′_2}(M′) and (k_1, k_2, M) ≠ (k′_1, k′_2, M′). To conclude, we must verify that this gives a valid collision on F^etcr, where the colliding key-message pairs are (k_2, H^tcr_{k_1}(M), k_1) and (k′_2, H^tcr_{k′_1}(M′), k′_1). By definition of an outer collision, at least one of the following holds: M = M′, or k_1 ≠ k′_1, or H^tcr_{k_1}(M) ≠ H^tcr_{k′_1}(M′). In the last two cases, the inputs into F^etcr are obviously distinct. If M = M′, then by definition of the eTCR game won by A, we know that (k_1, k_2) ≠ (k′_1, k′_2), which results in a valid collision on F^etcr.

Remark 1. The domain of the outer function F^etcr in the theorem statement is R^1 × K^1, whereas in practice the function's input is most likely to be a bit string. Thus, to apply the theorem one has to ensure that F^etcr's input can be uniquely parsed into the output of H^tcr and its key. This is indeed the case when either the output of the function (the most common option) or the key has fixed length, or alternatively, when using an encoding scheme where the boundary between the two strings can be unambiguously identified.

The above composition theorem gives a domain extension scheme for eTCR except for one potential problem: since there are no known domain extenders for TCR hashes without key expansion, the key k_1, which is concatenated with H's output and fed into the eTCR function F, may be longer than the compression function's input length. The construction may be applied recursively, or it may terminate by using the linear hash domain extension scheme for eTCR (see below). If one uses the Shoup construction for H and the linear hash for F, the total key expansion is (ℓ + n⌈log_2(L/(m − n))⌉) × (1 + ℓ/(m − n)).
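The composition of Theorem 2 is easy to prototype when the inner key has fixed length, so that the outer input parses unambiguously into H's output and k_1. In the sketch below both primitives are placeholders (an actual instantiation would plug in a TCR domain extender for H^tcr and an eTCR function for F^etcr); nothing here is a security claim about HMAC-SHA-256.

```python
import hashlib
import hmac

KEY_LEN = 16  # bytes; fixed so that (H_tcr output || k1) parses uniquely

def H_tcr(k1: bytes, msg: bytes) -> bytes:
    """Placeholder for the (possibly domain-extended) TCR hash H^tcr."""
    return hmac.new(k1, msg, hashlib.sha256).digest()

def F_etcr(k2: bytes, block: bytes) -> bytes:
    """Placeholder for the fixed input-length eTCR function F^etcr."""
    return hmac.new(k2, block, hashlib.sha256).digest()

def G(k1: bytes, k2: bytes, msg: bytes) -> bytes:
    """Theorem 2: G_{k1,k2}(M) = F^etcr_{k2}(H^tcr_{k1}(M), k1)."""
    assert len(k1) == KEY_LEN
    return F_etcr(k2, H_tcr(k1, msg) + k1)
```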
7
Length Variability and Concrete Scheme
Before we present a concrete instantiation of the scheme proven secure in Theorem 2, we must address the question of message padding and length encoding by the underlying schemes. As is, the domain extenders for TCR hashes from Figure 1 either require message pre-processing or become insecure when applied to variable-length messages. A generic method for domain extension to variable-length inputs is described by Bellare and Rogaway [34] and requires one additional application of an independently keyed compression function. To streamline the construction, we encode the message length into the input of the eTCR function F^etcr, which allows us to use domain extenders satisfying a weaker definition, TCR* (defined in [2]), where the function accepts variable-length inputs but the adversary in the TCR game is restricted to finding a collision of equal-length messages. Namely, assuming that the keyed function of three inputs F^etcr is eTCR and H^tcr is TCR*, the following function is eTCR:
  G*_{k_1,k_2}(M) = F^etcr_{k_2}(H^tcr_{k_1}(M), k_1, |M|).

The proof is analogous to Theorem 2. In our construction below we make sure that the input to F^etcr can be uniquely parsed into three parts. The following construction, which consists of pre-processing followed by application of independently keyed compression functions, is very similar to the linear hash scheme of [34]. They differ in two important aspects: the IV, which is replaced with input material in our construction, and the handling of the message length, which is encoded in the last block. A proof of the claim that the resulting construction is eTCR is a straightforward adaptation of Theorem 7 from the full version of [2].

Pre-processing function
  Input: message M, length len < 2^d
  Output: blocks M_1, . . . , M_t
  format M as M_1, . . . , M_t, where
    (a) |M_1| = |M| if |M| < m
    (b) |M_1| = m if |M| ≥ m
    (c) |M_i| = m − n for 1 < i < t
    (d) |M_t| ≤ m − n
  s ← m if t = 1, and s ← m − n otherwise
  if |M_t| > s − d then
    pad M_t with zeros to s bits
    M_{t+1} ← empty; t ← t + 1
  pad M_t with zeros to m − n − d bits
  M_t ← M_t || [len]^d_2, where [len]^d_2 is len encoded as a d-bit binary string
Linear hashing
  Given: function F : {0, 1}^ℓ × {0, 1}^m → {0, 1}^n
  Input: blocks M_1, . . . , M_t; keys k_1, . . . , k_t
  Output: hash of length n bits
  h_1 ← F_{k_1}(M_1)
  for i := 2 to t do h_i ← F_{k_i}(h_{i−1} || M_i)
  output h_t

The following example (Figure 2) is a domain extension scheme for the compression function H : {0, 1}^128 × {0, 1}^640 → {0, 1}^256, which can be based on the compression function of SHA-256 with 128 input bits allocated for the key. The composite function takes input of length 256 + 4 × 384 = 1792 bits and expands the key to 3 × 128 + 2 × 256 = 896 bits. The example illustrates two features of our construction. First, its two main components, the TCR* and eTCR functions, may be selected independently of each other. In particular,
the choice of the inner function TCR∗ may depend on the message length (for instance, in our example a more keysize-efficient choice of the inner function would have been the linear hash; we prefer the Shoup construction for illustrative purposes). Second, rather than separately encoding the length of its input, as prescribed by the linear hash scheme, and the length of the message, called for by the construction of G∗ , the outer eTCR function may only do the latter as long as the message length uniquely determines the length of eTCR’s input. In fact, our pre-processing function accepts the length of the message as a separate input, which allows this optimization.
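The pre-processing and linear-hashing steps can be prototyped on bit strings (represented here as Python strings of '0'/'1'). The parameters m, n, d and the keyed function F are illustrative stand-ins, and the block-splitting rules follow the pseudocode as reconstructed above.

```python
def preprocess(M: str, m: int, n: int, d: int):
    """Split bit string M into blocks M1..Mt and append the d-bit length."""
    length = len(M)
    assert length < 2 ** d
    # (a)/(b): first block takes up to m bits, later blocks m - n bits each
    blocks = [M[:m]]
    rest = M[m:]
    while rest:
        blocks.append(rest[:m - n])
        rest = rest[m - n:]
    s = m if len(blocks) == 1 else m - n
    if len(blocks[-1]) > s - d:                  # no room for the length field
        blocks[-1] = blocks[-1].ljust(s, '0')    # pad the current block
        blocks.append('')                        # and open a fresh one
    blocks[-1] = blocks[-1].ljust(m - n - d, '0') + format(length, f'0{d}b')
    return blocks

def linear_hash(F, keys, blocks):
    """h1 = F_{k1}(M1); h_i = F_{k_i}(h_{i-1} || M_i); output h_t."""
    h = F(keys[0], blocks[0])
    for k, Mi in zip(keys[1:], blocks[1:]):
        h = F(k, h + Mi)
    return h
```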
Fig. 2. Example of an eTCR domain extension function for H : {0, 1}^128 × {0, 1}^640 → {0, 1}^256
8
Conclusion
We study the enhanced target collision-resistant (eTCR) property of hash functions introduced by Halevi and Krawczyk as a method of securing signature schemes in lieu of traditionally used collision-resistant hash functions [1]. While the definition was initially proposed to facilitate the proof of security of the RMX transform, it is an interesting variant of the TCR property that may have applications of its own. In our first contribution, we explore connections between TCR and eTCR hash functions, demonstrating that the TCR construction of Naor-Yung is provably not eTCR. On the other hand, eTCR hashes can be constructed from TCR compressing functions, placing them in the same complexity-theoretic class of functions that can be based on one-way functions in a black-box manner. This separates them from collision-resistant hashes, which cannot be reduced to one-way functions or permutations via a black-box construction. Secondly, we answer the question raised in [2] on constructing a key-length-efficient domain extender for eTCR hashes by presenting a domain extension scheme with logarithmic key expansion.
Acknowledgements We thank anonymous reviewers and attendees of the FSE’10 conference for their comments, and Yevgeniy Dodis and Iftach Haitner for their valuable observation that the basic tree construction preserves the eTCR property.
References 1. Halevi, S., Krawczyk, H.: Strengthening digital signatures via randomized hashing. In: Dwork, C. (ed.) CRYPTO 2006. LNCS, vol. 4117, pp. 41–59. Springer, Heidelberg (2006) 2. Reyhanitabar, M.R., Susilo, W., Mu, Y.: Enhanced target collision resistant hash functions revisited. In: Dunkelman, O. (ed.) FSE 2009. LNCS, vol. 5665, pp. 327–344. Springer, Heidelberg (2009); Full version available at Cryptology ePrint Archive, Report 2009/506 3. Rabin, M.O.: Digitalized signatures and public-key functions as intractable as factorization. Technical Memo MIT/LCS/TR-212, MIT (January 1979) 4. Davies, D.W., Price, W.L.: The application of digital signatures based on publickey cryptosystems. In: Salz, J. (ed.) Proceedings of the Fifth Intl. Conference on Computer Communications, pp. 525–530 (1980) 5. Damg˚ ard, I.: Collision free hash functions and public key signature schemes. In: Price, W.L., Chaum, D. (eds.) EUROCRYPT 1987. LNCS, vol. 304, pp. 203–216. Springer, Heidelberg (1988) 6. Kaliski Jr., B.S.: The MD2 message-digest algorithm. RFC 1115, The Internet Engineering Task Force (April 1992) 7. Rivest, R.L.: The MD5 message-digest algorithm. RFC 1321, The Internet Engineering Task Force (April 1992) 8. National Institute of Standards and Technology: Secure hash standard (SHS) (May 1993) 9. Fiat, A., Shamir, A.: How to prove yourself: Practical solutions to identification and signature problems. In: Odlyzko, A.M. (ed.) CRYPTO 1986. LNCS, vol. 263, pp. 186–194. Springer, Heidelberg (1987) 10. Bellare, M., Rogaway, P.: Random oracles are practical: A paradigm for designing efficient protocols. In: ACM Conference on Computer and Communications Security, pp. 62–73 (1993) 11. Goldwasser, S., Micali, S., Rivest, R.L.: A digital signature scheme secure against adaptive chosen-message attacks. SIAM Journal on Computing 17, 281–308 (1988) 12. Naor, M., Yung, M.: Universal one-way hash functions and their cryptographic applications. In: Proceedings of the Twenty First Annual ACM Symposium on Theory of Computing, May 15–17, pp. 33–43 (1989) 13. Rompel, J.: One-way functions are necessary and sufficient for secure signatures. In: Proceedings of the Twenty Second Annual ACM Symposium on Theory of Computing, May 14–16, 1990, pp. 387–394 (1990) 14. Katz, J., Koo, C.Y.: On constructing universal one-way hash functions from arbitrary one-way functions. J. Cryptology (to appear); Available on Cryptology ePrint Archive, Report 2005/328 15. Haitner, I., Holenstein, T., Reingold, O., Vadhan, S., Wee, H.: Universal oneway hash functions via inaccessible entropy. In: Advances in Cryptology— EUROCRYPT 2010 (to appear, 2010); Available on Cryptology ePrint Archive, Report 2010/120
16. Simon, D.R.: Finding collisions on a one-way street: Can secure hash functions be based on general assumptions? In: Nyberg, K. (ed.) EUROCRYPT 1998. LNCS, vol. 1403, pp. 334–345. Springer, Heidelberg (1998) 17. Wang, X., Lai, X., Feng, D., Chen, H., Yu, X.: Cryptanalysis of the hash functions MD4 and RIPEMD. In: [46], pp. 1–18 18. Wang, X., Yu, H.: How to break MD5 and other hash functions. In: [46], pp. 19–35 19. Wang, X., Yu, H., Yin, Y.L.: Efficient collision search attacks on SHA-0. In: [48], pp. 1–16 20. Wang, X., Yin, Y.L., Yu, H.: Finding collisions in the full SHA-1. In: [48], pp. 17–36 21. Cramer, R., Shoup, V.: Signature schemes based on the strong RSA assumption. ACM Trans. on Information and System Security (TISSEC) 3(3), 161–185 (2000) 22. Mironov, I.: Collision-resistant no more: Hash-and-sign paradigm revisited. In: Yung, M., Dodis, Y., Kiayias, A., Malkin, T.G. (eds.) PKC 2006. LNCS, vol. 3958, pp. 140–156. Springer, Heidelberg (2006) 23. Pasini, S., Vaudenay, S.: Hash-and-sign with weak hashing made secure. In: Pieprzyk, J., Ghodosi, H., Dawson, E. (eds.) ACISP 2007. LNCS, vol. 4586, pp. 338–354. Springer, Heidelberg (2007) 24. Dang, Q.: Randomized hashing for digital signatures. NIST Special Publication 800-106, National Institute of Standards and Technology (February 2009) 25. Halevi, S., Krawczyk, H.: Strengthening digital signatures via randomized hashing. Internet Draft draft-irtf-cfrg-rhash-01, Internet Engineering Task Force (October 2007) (Work in progress) 26. Merkle, R.C.: One way hash functions and DES. In: [47], pp. 428–446 27. Damg˚ ard, I.: A design principle for hash functions. In: [47], pp. 416–427 28. Gauravaram, P., Knudsen, L.R.: On randomizing hash functions to strengthen the security of digital signatures, pp. 88–105 29. Yasuda, K.: How to fill up Merkle-Damg˚ ard hash functions. In: Pieprzyk, J. (ed.) ASIACRYPT 2008. LNCS, vol. 5350, pp. 272–289. Springer, Heidelberg (2008) 30. Biham, E., Chen, R., Joux, A., Carribault, P., Lemuet, C., Jalby, W.: Collisions of SHA-0 and reduced SHA-1. In: [46], pp. 36–57 31. Joux, A.: Multicollisions in iterated hash functions. Application to cascaded constructions. In: Franklin, M.K. (ed.) CRYPTO 2004. LNCS, vol. 3152, pp. 306–316. Springer, Heidelberg (2004) 32. Kelsey, J., Schneier, B.: Second preimages on n-bit hash functions for much less than 2n work. In: [46], pp. 474–490 33. Kelsey, J., Kohno, T.: Herding hash functions and the Nostradamus attack. In: Vaudenay, S. (ed.) EUROCRYPT 2006. LNCS, vol. 4004, pp. 183–200. Springer, Heidelberg (2006) 34. Bellare, M., Rogaway, P.: ion-resistant hashing: Towards making UOWHFs practical. In: Kaliski Jr., B.S. (ed.) CRYPTO 1997. LNCS, vol. 1294, pp. 470–484. Springer, Heidelberg (1997) 35. Wegman, M.N., Carter, L.: New hash functions and their use in authentication and set equality. J. Comput. Syst. Sci. 22(3), 265–279 (1981) 36. Dodis, Y., Haitner, I.: Private communication 37. Shoup, V.: A composition theorem for universal one-way hash functions. In: Preneel, B. (ed.) EUROCRYPT 2000. LNCS, vol. 1807, pp. 445–452. Springer, Heidelberg (2000) 38. Mironov, I.: Hash functions: From Merkle-Damg˚ ard to Shoup. In: Pfitzmann, B. (ed.) EUROCRYPT 2001. LNCS, vol. 2045, pp. 166–181. Springer, Heidelberg (2001)
39. Sarkar, P.: Masking based domain extenders for UOWHFs: Bounds and constructions. IEEE Transactions on Information Theory 51(12), 4299–4311 (2005) 40. Sarkar, P.: Construction of universal one-way hash functions: Tree hashing revisited. Discrete Applied Mathematics 155(16), 2174–2180 (2007) 41. Sarkar, P.: Domain extender for collision resistant hash functions: Improving upon Merkle-Damg˚ ard iteration. Discrete Applied Mathematics 157(5), 1086–1097 (2009) 42. Bellare, M., Ristenpart, T.: Multi-property-preserving hash domain extension and the EMD transform. In: Lai, X., Chen, K. (eds.) ASIACRYPT 2006. LNCS, vol. 4284, pp. 299–314. Springer, Heidelberg (2006) 43. Andreeva, E., Neven, G., Preneel, B., Shrimpton, T.: Seven-property-preserving iterated hashing: ROX. In: Kurosawa, K. (ed.) ASIACRYPT 2007. LNCS, vol. 4833, pp. 130–146. Springer, Heidelberg (2007) 44. Bellare, M., Ristenpart, T.: Hash functions in the dedicated-key setting: Design choices and MPP transforms. In: Arge, L., Cachin, C., Jurdzi´ nski, T., Tarlecki, A. (eds.) ICALP 2007. LNCS, vol. 4596, pp. 399–410. Springer, Heidelberg (2007) 45. Impagliazzo, R., Naor, M.: Efficient cryptographic schemes provably as secure as subset sum. J. Cryptology 9(4), 199–216 (1996) 46. Cramer, R. (ed.): EUROCRYPT 2005. LNCS, vol. 3494. Springer, Heidelberg (2005) 47. Brassard, G. (ed.): CRYPTO 1989. LNCS, vol. 435. Springer, Heidelberg (1990) 48. Shoup, V. (ed.): CRYPTO 2005. LNCS, vol. 3621. Springer, Heidelberg (2005)
Security Analysis of the Mode of JH Hash Function

Rishiraj Bhattacharyya¹, Avradip Mandal², and Mridul Nandi³
3
Indian Statistical Institute, Kolkata, India rishi
[email protected] 2 Universit´e du Luxembourg, Luxembourg
[email protected] NIST, USA and Computer Science Department, The George Washington University
[email protected]
Abstract. Recently, NIST selected 14 second-round candidates of the SHA-3 competition. One of these candidates will win the competition and eventually become the new hash function standard. In TCC'04, Maurer et al. introduced the notion of indifferentiability as a generalization of the concept of the indistinguishability of two systems. Indifferentiability is the appropriate notion for modeling a random oracle as well as a strong security criterion for a hash design. In this paper we analyze the indifferentiability and preimage resistance of the JH hash function, which is one of the SHA-3 second-round candidates. JH uses a compression function based on a 2n-bit fixed permutation and applies chopMD domain extension with a specific padding.
– We show that, under the assumption that the underlying permutation is a 2n-bit random permutation, the JH mode of operation with output length 2n − s bits is indifferentiable from a random oracle with distinguisher's advantage bounded by O(q^2·σ/2^s + q^3/2^n), where σ is the total number of blocks queried by the distinguisher.
– We show that the padding rule used in JH is essential, as there is a simple indifferentiability distinguisher (with constant query complexity) against the JH mode of operation without length padding outputting an n-bit digest.
– We prove that a small modification (namely, chopping different bits) of the JH mode of operation enables us to construct a hash function based on a random permutation (without any length padding) with a bound similar to that of sponge constructions (with fixed output size) and with the same efficiency.
– On the other hand, we improve the preimage attack of query complexity 2^510.3 due to Mendel and Thompson. Using multicollisions in both the forward and reverse directions, we show a preimage attack on JH with n = 512, s = 512 in 2^507 queries to the permutation.

Keywords: JH, SHA-3 candidate, Indifferentiability, chop-MD, random permutation.
1 Introduction Designing secure hash function is a primary objective of symmetric key cryptography. Popular methods to build a hash function involve two steps. First, one designs a
Supported in part by the National Science Foundation, Grant CNS-0937267.
compression function f : {0, 1}^m → {0, 1}^n where m > n. Then a domain extension algorithm that utilizes f as a black box(1) is applied to implement the hash function H^f : {0, 1}* → {0, 1}^n. This is also known as the design, or mode, of the hash function. The well-known Merkle-Damgård domain extension technique is very popular as it preserves the collision resistance property of the compression function: if f is collision resistant then so is H^f. This enables designers to focus on designing collision-resistant compression functions.

Indifferentiability. While collision resistance remains an essential property of a cryptographic hash function, current usage indicates that it no longer suffices for modern security goals. Today hash functions are used as PRFs, MACs, (second-) preimage-secure functions, or even to replace random oracles in different cryptographic protocols. In [6], Coron et al. considered the problem of designing secure cryptographic hash functions based on the indifferentiability framework of Maurer et al. [15]. Informally speaking, to prove indifferentiability of an iterated hash function H (based on some ideal primitive f), one has to design a simulator S. The job of S is to simulate the behavior of f while maintaining consistency with the random oracle R. If no distinguisher D can distinguish the output distribution of the pair (H^f, f) from that of (R, S^R), the construction H is said to be indifferentiable from a random oracle (RO). By proving indifferentiability, we are guaranteed that there is no trivial flaw in the design of the hash function; the design is secure against generic attacks. Today, indifferentiability is considered a desirable property of any secure hash function design. Coron et al. showed in [6] that the design principle (strengthened Merkle-Damgård) behind current standard hash functions like MD5 or SHA-1 does not satisfy indifferentiability from an RO. They also proved that different variants of MD constructions, including chopped MD constructions, can be proven indifferentiable from a variable input-length random oracle if the compression function is constructed as an ideal component like a fixed input-length random oracle, or from an ideal cipher with the Davies-Meyer technique. Subsequently, the authors of [2,4,9,12] proved indifferentiability of different constructions of iterated hash functions. In [5], Chang and Nandi proved an indifferentiability bound beyond the birthday bound for chopped MD constructions under the assumption that the compression function is a fixed input-length random oracle.

In 2007, NIST announced a competition for a new hash function standard, to be called SHA-3. 64 designs were submitted and, after an internal review of the submissions, 51 were selected for meeting the minimum submission requirements and accepted as the First Round Candidates. Recently, NIST declared the names of 14 candidates for the second round of the competition. One of these candidates will win the competition and eventually become the next standard cryptographic hash function. Hence, it is essential for these candidate designs to be indifferentiable from an RO to guarantee their robustness against generic attacks. In this paper, we consider the mode of operation of the JH hash function, one of the second-round candidates of the SHA-3 competition. It uses a novel construction, somewhat reminiscent of a sponge construction [4], to build a hash algorithm out of a single, large, fixed permutation using chopped-MD domain extension [21]. We also consider a little
The domain extension can be applied independent of compression functions except that it depends on the parameters m and n.
modified mode of operation of JH where the chopping is done on the other bits. For a formal and detailed description of the mode of operation of JH and the modified mode of operation, we refer the reader to Section 2. Although the mode of JH is novel, it has withstood many cryptanalysis attempts so far. The only noticeable attack is due to Mendel and Thompson, who have recently shown a preimage attack on the JH mode of operation through finding r-multicollisions in the forward direction of the JH mode [16]. The query complexity of their attack is 2^510.3 to get a preimage of JH outputting 512 bits.

1.1 Our Result

In this paper we examine the indifferentiability and preimage resistance of the JH mode of operation in the 2n-bit random permutation model. Let s denote the number of chopped bits. We extend the technique of Chang and Nandi [5] to the random permutation model. We prove that, under the assumption that the fixed permutation of JH is a random permutation, the JH mode of operation with the specific length padding is indifferentiable from a random oracle with distinguisher's advantage bounded by O(q^2·σ/2^s + q^3/2^n). When s = 3n/2 (as in the case of the JH hash function with 256-bit output), our result gives a beyond-the-birthday-barrier security guarantee for JH(2). This implies that finding a collision in the output is not enough to distinguish a random oracle from the JH hash function with n/2-bit output.
² According to the birthday paradox, for a uniform random function with n-bit digest a collision can be found with significant probability in O(2^{n/2}) queries. This is known as the birthday barrier, as security against more than O(2^{n/2}) queries is non-trivial, when possible at all.
Fig. 1. Merkle-Damgård mode of operation based on compression function f (the initial value IV and the message blocks M1, M2, M3, …, Mℓ are processed by successive calls to f, producing C^f(M))
– If we set s = n, we get a random permutation based secure mode of operation with n-bit digest using a 2n-bit permutation. We note that this construction is at least as secure as the sponge construction based on a 2n-bit random permutation, where the indifferentiability bound is O(σ²/2^n) [4]. Here σ is the number of blocks that the adversary queries.
On a secondary note, even though our proof techniques for the indifferentiable security bounds are closely related to the techniques used in [5,12], we give a more formal argument behind some implicit assumptions made there.
The rest of the paper is organized as follows. In the next section, we fix notation, give a formal description of the JH mode of operation and the modified mode of operation, a short introduction to indifferentiability of hash functions, and some useful definitions and facts. In Section 3, we build our tools for extending Chang and Nandi's proof to the random permutation model. For simplicity of explanation, we first describe the indifferentiability of the modified JH mode without length padding in the last block in Section 4, followed by the indifferentiability of the original JH mode with padding in Section 5. In Sections 6 and 7, we describe our indifferentiability distinguishers against the JH mode of operation and the modified mode of operation without the padding. Finally, in Section 8, we present our improved preimage attack on the JH mode of operation with padding.
2 Preliminaries

In this section we describe the notation and definitions used throughout the paper. Let us begin with a formal definition of a mode of operation.
Mode of Operation: Informally speaking, a mode of operation is an algorithm to construct a hash function from a compression function.
Definition 1. A mode of operation C with oracle access to a compression function f : {0,1}^m → {0,1}^n is an algorithm which defines a function C^f : {0,1}* → {0,1}^n.
Let IV ∈ {0,1}^n be a fixed initial value. It is well known that, given a compression function f : {0,1}^m → {0,1}^n, the Merkle-Damgård mode of operation is defined as
MD^f(m1‖m2‖…‖mℓ) = f(f(… f(f(IV‖m1)‖m2) …)‖mℓ),
where m1, m2, …, mℓ ∈ {0,1}^{m−n}.
There is a subtle difference between a hash function and a mode of operation. The mode of operation is actually a domain extension algorithm. If we supply a particular compression function f to the mode of operation algorithm, we get a particular hash function. So when we think about a hash function, the compression function is fixed.
JH Mode of Operation: The compression function of JH, f^π : {0,1}^{3n} → {0,1}^{2n}, is defined as follows:
f^π(h1‖h2‖m) = π(h1‖(h2 ⊕ m)) ⊕ (m‖0^n),
where h1, h2, m ∈ {0,1}^n and π : {0,1}^{2n} → {0,1}^{2n} is a fixed permutation.
Fig. 2. The JH compression function (the message block m is xored into the right half of the chaining value before π and into the left half after π)
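To make the definition concrete, here is a minimal Python sketch of the compression function f^π. The permutation pi below is a toy stand-in chosen at random (it is not the real JH permutation), and n is kept tiny purely for illustration.

```python
import random

n = 8                        # toy block size in bits (the real JH-512 uses n = 512)

# toy stand-in for the fixed 2n-bit permutation pi (NOT the real JH permutation)
_perm = list(range(1 << (2 * n)))
random.Random(0).shuffle(_perm)

def pi(x):                   # x is a 2n-bit integer
    return _perm[x]

def f(h1, h2, m):
    """JH compression: f^pi(h1 || h2, m) = pi(h1 || (h2 xor m)) xor (m || 0^n)."""
    out = pi((h1 << n) | (h2 ^ m))
    return out ^ (m << n)    # xor m into the left half, 0^n into the right half

# example: one compression call on arbitrary toy values
print(hex(f(0x12, 0x34, 0x56)))
```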
The JH mode of operation based on a permutation π is the chopMD mode of operation based on the above compression function f^π. The usual Merkle-Damgård technique is applied to f^π and the output of the hash function is the first 2n − s bits of the final f^π output. For any 0 ≤ s ≤ |m|, CHOPR_s(m) is defined as m_L, where m = m_L‖m_R and |m_R| = s. Formally, the JH mode of operation based on a permutation π with initial value IV1‖IV2 is defined as
JH^π(·) : ({0,1}^n)^+ → {0,1}^{2n−s} ≡ CHOPR_s(MD^{f^π}(·)),
where MD^{f^π} is the Merkle-Damgård mode of operation with initial value IV1‖IV2 and compression function f^π. According to [21], typically s = n; it is also suggested to have s ≥ n. We also define a modified version of the JH mode of operation (referred to as JH′ throughout the paper) where, instead of chopping the rightmost s bits, we chop the leftmost s bits. For 0 ≤ s ≤ |m|, let CHOPL_s(m) be defined as m_R, where m = m_L‖m_R and |m_L| = s. Then
JH′^π(·) : ({0,1}^n)^+ → {0,1}^{2n−s} ≡ CHOPL_s(MD^{f^π}(·)).
Throughout the paper, JH-t denotes the JH mode of operation with t-bit output; similarly, JH′-t denotes the JH′ mode of operation with t-bit output. (A small code sketch of the two chop variants follows the next paragraph.)
Padding Rule: To encode messages whose lengths are not a multiple of the block size (n bits) we need some padding rule, so that the padded message becomes a multiple of the block size. A simple padding rule is zero padding, that is, adding a sufficient number of zero bits so that the padded message becomes a multiple of the block size, even though this is not secure. We will see that, as in the case of JH, a well-designed padding rule leads to an additional security guarantee.
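Before formalizing padding, the following sketch illustrates the two chop variants and the resulting JH and JH′ modes on a sequence of n-bit blocks. The parameters n, s and the initial value IV1‖IV2 are illustrative toy choices (not the real JH constants), and the permutation is again a random stand-in; the snippet repeats those definitions so that it is self-contained.

```python
import random

n, s = 8, 8                          # toy parameters; the JH submission suggests s >= n
IV1, IV2 = 0x01, 0x02                # illustrative initial value, not the real JH IV

_perm = list(range(1 << (2 * n)))
random.Random(0).shuffle(_perm)

def pi(x):
    return _perm[x]

def f(h1, h2, m):                    # JH compression function f^pi
    return pi((h1 << n) | (h2 ^ m)) ^ (m << n)

def md(blocks):
    """Plain Merkle-Damgard iteration of f^pi on a list of n-bit blocks."""
    h1, h2 = IV1, IV2
    for m in blocks:
        h = f(h1, h2, m)
        h1, h2 = h >> n, h & ((1 << n) - 1)
    return (h1 << n) | h2            # final 2n-bit chaining value

def chop_r(x):                       # CHOPR_s: drop the rightmost s bits
    return x >> s

def chop_l(x):                       # CHOPL_s: drop the leftmost s bits
    return x & ((1 << (2 * n - s)) - 1)

def JH(blocks):                      # original JH mode: CHOPR_s(MD^{f^pi}(.))
    return chop_r(md(blocks))

def JH_prime(blocks):                # modified mode JH': CHOPL_s(MD^{f^pi}(.))
    return chop_l(md(blocks))

print(JH([0x11, 0x22, 0x33]), JH_prime([0x11, 0x22, 0x33]))
```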
Definition 2. A padding rule P is a tuple of two efficiently computable functions P ≡ (PAD : {0,1}* → ({0,1}^n)^+, DEPAD : ({0,1}^n)^+ → {0,1}* ∪ {⊥}) such that for any M ∈ {0,1}* we have DEPAD(PAD(M)) = M. DEPAD(y) outputs ⊥ if there exists no M ∈ {0,1}* such that PAD(M) = y.
The function PAD takes a message of arbitrary length and outputs the padded message, which is a multiple of the block length, whereas the function DEPAD takes a padded message (a multiple of the block length) and outputs the original message. Normally, when we specify a padding rule we only specify the function PAD, but usually the definition of DEPAD can be trivially derived from the description of PAD. In our context, we are interested in a specialized class of padding rules, namely those with the following additional properties.
1. |PAD(M)|/n = ⌈|M|/n⌉ + 1.
2. For any M ∈ ({0,1}^n)^+, let LB(M) ⊆ {0,1}^n be the set of n-bit elements (possible last blocks) such that DEPAD(M‖m) ≠ ⊥ for any m ∈ LB(M). We want |LB(M)| to be small (smaller than some constant) for all M.
Here, if x ∈ {0,1}*, |x| denotes the length of x in bits; if A is a set, |A| denotes the number of elements in A. Any padding rule which satisfies the above two properties is called a good padding rule. Now we are ready to define the JH mode of operation with padding.
Definition 3. With respect to a padding rule P = (PAD, DEPAD) and a permutation π, the JH_P mode of operation is defined as follows:
JH_P^π(·) : {0,1}* → {0,1}^{2n−s} ≡ JH^π(PAD(·)) ≡ CHOPR_s(MD^{f^π}(PAD(·))).
The JH Padding Rule: In [21], the following padding rule is specified for the JH hash function with block length n = 512. Suppose that the length of the message M is ℓ(M) bits. Append the bit 1 to the end of the message, followed by 384 − 1 + (−ℓ(M) mod 512) zero bits. Then the binary representation of ℓ(M) in big-endian form is concatenated. This padding rule ensures that at least one block of 512 bits is appended after the message (irrespective of whether the message length is a multiple of 512). It is easy to check that the above padding rule is a good padding rule with |LB(M)| ≤ 2.
Indifferentiability: The notion of indifferentiability, introduced by Maurer et al. in [15], is a generalization of the classical notion of indistinguishability. Loosely speaking, if a construction C based on an ideal primitive F is indifferentiable from an ideal primitive G, then G can be safely replaced by C^F in any cryptographic construction. In other words, if a cryptographic construction is secure in the G model then it is secure in the F model.
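Returning briefly to the JH padding rule quoted above, a small sketch follows. The 128-bit size of the length field is inferred here from the 384-bit offset (so that one padding instance fills exactly one 512-bit block); treat that constant as an assumption rather than a restatement of the specification.

```python
def jh_pad(msg_bits):
    """Pad a bit string (a Python string of '0'/'1') per the rule above:
    append '1', then 383 + (-len mod 512) zeros, then the length as a
    128-bit big-endian integer (128 = 512 - 1 - 383 is an assumption)."""
    ell = len(msg_bits)
    zeros = 383 + ((-ell) % 512)
    length_field = format(ell, "0128b")
    padded = msg_bits + "1" + "0" * zeros + length_field
    assert len(padded) % 512 == 0 and len(padded) >= ell + 512
    return padded

# at least one full 512-bit block is always appended, whatever ell mod 512 is
for ell in (0, 1, 511, 512, 513):
    assert len(jh_pad("0" * ell)) == 512 * ((ell // 512) + (2 if ell % 512 else 1))
```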
Definition 4 (Advantage). Let F_i, G_i be probabilistic oracle algorithms. We define the advantage of the distinguisher A at distinguishing (F1, F2) from (G1, G2) as
Adv_A((F1, F2), (G1, G2)) = | Pr[A^{F1,F2} = 1] − Pr[A^{G1,G2} = 1] |.
Definition 5 (Indifferentiability [15]). A Turing machine C with oracle access to an ideal primitive F is said to be (t, q_C, q_F, ε)-indifferentiable from an ideal primitive G if there exists a simulator S with oracle access to G and running time at most t, such that for any distinguisher D it holds that
Adv_D((C^F, F), (G, S^G)) < ε.
The distinguisher makes at most q_C queries to C or G and at most q_F queries to F or S. Similarly, C^F is said to be (computationally) indifferentiable from G if the running time of D is bounded above by some polynomial in the security parameter k and ε is a negligible function of k.
Fig. 3. The indifferentiability notion: the distinguisher D interacts either with (C^F, F) or with (G, S^G)
We stress that in the above definition G and F can be two completely different primitives. As shown in Fig. 3, the role of the simulator is not only to simulate the behavior of F but also to remain consistent with the behavior of G. Note that the simulator does not know the queries made directly to G, although it can query G whenever it needs. In this paper G is a variable-input-length random oracle and F is a random permutation. Intuitively, a random function (oracle) is a function f : X → Y chosen uniformly at random from the set of all functions from X to Y.
Definition 6. f : X → Y is said to be a random oracle if for each x ∈ X the value of f(x) is chosen uniformly at random from Y. More precisely,
Pr[f(x) = y | f(x1) = y1, f(x2) = y2, …, f(xq) = yq] = 1/|Y|,
where |Y| is finite, x ∉ {x1, …, xq} and y, y1, …, yq ∈ Y.
A random permutation is similar to a random oracle except that it is a permutation. So, similarly, one can view a random permutation π : X → X as a permutation chosen uniformly at random from the set of all permutations from X to X.
Definition 7. π : X → X is said to be a random permutation if for each x ∈ X we have
Pr[π(x) = y | π(x1) = y1, π(x2) = y2, …, π(xq) = yq] = 1/(|X| − q),
where |X| is finite, x ∉ {x1, …, xq}, y1, …, yq ∈ X and y ∈ X \ {y1, …, yq}.
Definition 8. FH : {0,1}^{2n} → {0,1}^n is the function which outputs the first n bits of a 2n-bit string. Similarly, LH : {0,1}^{2n} → {0,1}^n is the function which outputs the last n bits of a 2n-bit string. We often refer to FH as the left half and LH as the right half.
Below we state a few basic inequalities as a lemma which will be useful later.
Lemma 1. For any y ∈ {0,1}^{2n−s}, c ∈ {0,1}^n, S ⊆ {0,1}^{2n} and T ⊆ {0,1}^n we have:
1. |{z ∈ {0,1}^s : y‖z ∈ S}| ≤ |S| and |{z ∈ {0,1}^s : z‖y ∈ S}| ≤ |S|;
2. |{z ∈ {0,1}^s : FH(y‖z) ⊕ c ∈ T}| ≤ 2^n·|T| and |{z ∈ {0,1}^s : FH(z‖y) ⊕ c ∈ T}| ≤ (2^s/2^{min(s,n)})·|T|.
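Definitions 6 and 7 are commonly realized operationally by "lazy sampling", which is also how the simulators in the following sections behave; the sketch below shows both (domain sizes are toy values, and the class and variable names are illustrative only).

```python
import random

rng = random.Random(0)

class LazyRandomOracle:
    """Definition 6, operationally: each fresh input gets an independent
    uniform n-bit output; repeated inputs get the same answer."""
    def __init__(self, n):
        self.n, self.table = n, {}
    def query(self, x):
        if x not in self.table:
            self.table[x] = rng.getrandbits(self.n)
        return self.table[x]

class LazyRandomPermutation:
    """Definition 7, operationally: a forward answer is uniform over the
    values not used so far; inverse queries are answered consistently."""
    def __init__(self, bits):
        self.bits, self.fwd, self.inv = bits, {}, {}
    def query(self, x):
        if x not in self.fwd:
            while True:
                y = rng.getrandbits(self.bits)
                if y not in self.inv:          # keep it a permutation
                    break
            self.fwd[x], self.inv[y] = y, x
        return self.fwd[x]
    def query_inv(self, y):
        if y not in self.inv:
            while True:
                x = rng.getrandbits(self.bits)
                if x not in self.fwd:
                    break
            self.fwd[x], self.inv[y] = y, x
        return self.inv[y]

P = LazyRandomPermutation(16)
assert P.query_inv(P.query(0x1234)) == 0x1234
```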
3 Main Tools for Bounding the Distinguisher's Advantage

We follow an approach similar to [5,12] for proving indifferentiability security here. We start by modeling the attacker. Then we construct a simulator for which the information the attacker sees remains statistically close whether the attacker is interacting with the JH hash function and the random permutation it is based on, or with a random function and the simulator. Compared to [5], we do not restrict ourselves to some particular type of irreducible views. The underlying small-domain oracle being a random permutation, we also need to answer inverse queries.
Consistent Oracles. Intuitively, a small-domain oracle is said to be consistent to a big-domain oracle with respect to some mode of operation if querying the mode of operation based on the small-domain oracle is equivalent to querying the big-domain oracle.
Definition 9. A (small-domain) probabilistic oracle algorithm G2 is said to be consistent to a (big-domain) probabilistic oracle algorithm G1 with respect to the MO mode of operation if for any point x (from the big domain) we have Pr[G1(x) = MO^{G2}(x)] = 1.
The notion of consistent oracles is nothing new. In fact, in all the previous works, e.g. [4,5,6,7,9,10,12] and many others, the simulators mentioned there are always consistent to the big-domain oracle (or they abort, when they fail to be consistent). Also note that π is always consistent to JH^π with respect to the JH mode of operation.
Evaluatable Queries. There might be some point x for which the value of MO^{G2}(x) gets fixed by the relations G2(x1) = y1, …, G2(xq) = yq. Such x's are called evaluatable by the relations G2(x1) = y1, …, G2(xq) = yq. Formally,
Definition 10. A point x ∈ Domain(M OG2 ) is called evaluatable with respect to M O-mode of operation (based on G2 ) by the relations G2 (x1 ) = y1 , · · · , G2 (xq ) = yq , if there exist a deterministic algorithm B such that, Pr[M OG2 (x) = B(x, (x1 , y1 ), · · · , (xq , yq ))|G2 (x1 ) = y1 , · · · , G2 (xq ) = yq ] = 1. Modeling the adversary In this paper the adversary is modeled as a deterministic, computationally unbounded3 distinguisher A which has access to two oracles O1 and O2 . Recall that A tries to distinguish the output distribution of (JH π , π) from that of (R, S R ). We say A queries O1 when it queries the oracle JH π or R and queries O2 when it queries the oracle π or S R . As we model π as a random permutation, the distinguisher is allowed to make inverse queries to oracle O2 . We denote the forward query as (O2 (+, ·, ·)) and inverse query as (O2 (−, ·, ·)). The view V of the distinguisher is the list query-response tuple ((M1 , h1 ), . . . , (Mq1 , hq1 ), (x11 , x21 , y11 , y12 ), . . . , (x1q2 +q3 , x2q2 +q3 , yq12 +q3 , yq22 +q3 )) (1) Where, O1 (M1 ) = h1 , . . . , O1 (Mq1 ) = hq1 O2 (+, x11 , x21 ) = (y11 , y12 ), . . . , O2 (+, x1q2 , x2q2 ) = (yq12 , yq22 ) O2 (−, yq12 +1 , yq22 +1 ) = (x1q2 +1 , x2q2 +1 ), . . . , O2 (−, yq12 +q3 , yq22 +q3 ) = (x1q2 +q3 , x2q2 +q3 ) Definition 11. For any view V as in (1), we define Input View I(V) and Output View O(V) as follows, I(V) = (M1 , . . . , Mq , (x11 , x21 ), . . . , (x1q2 , x2q2 ), (yq12 +1 , yq22 +1 ), . . . , (yq12 +q3 , yq22 +q3 )) O(V) = (h1 , . . . , hq , (y11 , y12 ), . . . , (yq12 , yq22 ), (x1q2 +1 , x2q2 +1 ), . . . , (x1q2 +q3 , x2q2 +q3 )) Below we point out some important observations, 1. V, I(V) and O(V) are actually ordered tuples. That means, the position of any element inside the tuple actually denotes the corresponding query number. So, in general O1 (.), O2 (+, (., .)) and O2 (−, (., .)) queries should not be grouped together. But we write it like this to avoid further notational complexity. 2. For any deterministic non-adaptive attacker I(V) is always fixed. 3. For any deterministic adaptive attacker I(V) is actually determined by O(V) [18]. 4. For any deterministic attacker (adaptive or non-adaptive) V is actually determined by O(V). Irreducible Views Loosely speaking an irreducible view does not contain any duplicate query, and none of the O1 queries are evaluatable from the O2 queries present in the view. 3
Any deterministic adversary with unlimited resource is as powerful as a randomized adversary [18].
Definition 12. A view
V = ((M1, h1), …, (M_{q1}, h_{q1}), (x_1^1, x_1^2, y_1^1, y_1^2), …, (x_{q2+q3}^1, x_{q2+q3}^2, y_{q2+q3}^1, y_{q2+q3}^2))
is called irreducible if
– M1, …, M_{q1} are distinct,
– (x_1^1, x_1^2), …, (x_{q2+q3}^1, x_{q2+q3}^2) are distinct,
– (y_1^1, y_1^2), …, (y_{q2+q3}^1, y_{q2+q3}^2) are distinct,
– M1, …, M_{q1} are not evaluatable by the relations π(x_1^1, x_1^2) = (y_1^1, y_1^2), …, π(x_{q2+q3}^1, x_{q2+q3}^2) = (y_{q2+q3}^1, y_{q2+q3}^2) with respect to the MD mode of operation based on f^π.
Also, any view which is not irreducible is called a reducible view.
Definition 13. For an attacker A, an output view OV is called irreducible if the corresponding view V is irreducible. Any output view which is not irreducible is called a reducible output view.
Let OV^A_{O1,O2} be the random variable corresponding to the output view of attacker A, obtained after interacting with O1, O2. Also, let V^A_{O1,O2} be the random variable corresponding to the view of attacker A, obtained after interacting with O1, O2. The theorem below shows that if the probability distributions of all possible output views in the two scenarios are close, then the attacker's advantage is small. Theorems similar to this have appeared in the literature before [5,12,18]; the only difference is that here we concentrate on output views instead of views. In fact, for a fixed attacker A, there is always a one-to-one mapping between views and output views.
Theorem 1. Let F_i, G_i be probabilistic oracle algorithms. If for an attacker A the relation
Pr[OV^A_{F1,F2} = OV] ≥ (1 − ε) Pr[OV^A_{G1,G2} = OV]
holds for all possible output views OV, then we have Adv_A((F1, F2), (G1, G2)) ≤ ε.
In general it is hard to show the condition of Theorem 1 for all possible output views. Theorem 2 proves that it is sufficient to work with irreducible output views instead of all possible output views. In fact, one can reduce any output view to an irreducible output view and then apply Theorem 1.
Theorem 2. If there exists a simulator S^R consistent to a random oracle R with respect to the JH mode of operation, such that for any attacker A making at most q queries the relation
Pr[OV^A_{JH^π,π} = OV] ≥ (1 − ε) Pr[OV^A_{R,S^R} = OV]
holds for all possible irreducible output views OV (with respect to A), then for any attacker A making at most q queries we have Adv_A((JH^π, π), (R, S^R)) ≤ ε.
Proof. This theorem differs from Theorem 1 only in the aspect that here the probability distributions are close only for the irreducible output views. For any reducible output view OV and the corresponding attacker A, let V be the view fixed by OV and A. Let V′ be the view obtained by deleting the computable O1 queries and repeated O2 queries of V. The input view I(V′) actually specifies a non-adaptive attacker A′, and the output view OV′ = O(V′) is an irreducible output view with respect to A′. As π is consistent to JH^π and S^R is consistent to R with respect to the JH mode of operation, we have
Pr[OV^A_{JH^π,π} = OV] = Pr[OV^{A′}_{JH^π,π} = OV′],
Pr[OV^A_{R,S^R} = OV] = Pr[OV^{A′}_{R,S^R} = OV′].
Note that A′ actually makes fewer queries than A. Hence, even for reducible views, we have
Pr[OV^A_{JH^π,π} = OV] = Pr[OV^{A′}_{JH^π,π} = OV′] ≥ (1 − ε) Pr[OV^{A′}_{R,S^R} = OV′] = (1 − ε) Pr[OV^A_{R,S^R} = OV].
So the required condition of Theorem 1 remains true. Now, applying Theorem 1, we get the result.
In many previous works, e.g. [4,5,12], ideas similar to Theorem 2 have been used implicitly, but to our knowledge we are the first to formalize it.
4 Indifferentiability Security Analysis of JH′

4.1 Simulator and Its Interpolation Probability

The simulator maintains one partial permutation e1 : {0,1}^{2n} → {0,1}^{2n}, initially empty, and one partial function e1* : ({0,1}^n)* → {0,1}^{2n}, initialized with e1*(φ) = IV1‖IV2. It also maintains two sets C1, C2, initialized as C1 = {IV1} and C2 = ∅. Let I1 denote the set of points on which e1 is defined and O1 denote the set of output points of e1. FH, LH : {0,1}^{2n} → {0,1}^n are the two functions outputting the first n bits and last n bits of a 2n-bit string, respectively. The goal of the simulator is to remain consistent to R with respect to the JH′ mode of operation while behaving like a random permutation. Before describing the simulator, we give some informal insight into how it works.
1. In the partial permutation e1, the simulator maintains its history.
2. In the partial function e1*, the simulator maintains the list of queries evaluatable by e1 with respect to the JH′ mode of operation.
3. C1 is the set of first halves (first n bits) of the e1* outputs.
4. Even though e1* is evaluatable by the partial permutation e1, it might happen that e1 is also defined at some points which do not help in evaluating e1*. C2 is the set of first halves of such points.
5. The simulator makes sure that C1 and C2 always remain mutually exclusive.
6. Because of 5, there are no so-called accidents. That means that when the attacker is interacting with (R, S^R) and wants to evaluate O1(m1‖···‖mℓ) through a series of O2 queries, she will always have to make a series of queries starting with O2(IV1, IV2 ⊕ m1). The attacker cannot hope to skip a query in the middle.
We note that at any point of time the following conditions hold:
|O1| ≤ q2 + q3, |I1| ≤ q2 + q3, |C1 ∪ C2| ≤ q2 + q3 and |C1| ≤ q2 + 1.
Theorem 3. For any attacker A against JH′ and any irreducible output view OV with respect to it, we have
Pr[OV^A_{R,S^R} = OV] ≤ 1/2^{(2n−s)q1 + 2n(q2+q3)} × 1/(1 − 2(q2+q3)/2^{min(s,n)})^{q2} × 1/(1 − 2(q2+q3)/2^n)^{q3},
where 2^{min(s,n)} > 2(q2 + q3).
Proof. As OV is irreducible, the R query outputs are independent of the other queries; R being a random function, for the q1 R queries we get the factor 1/2^{(2n−s)q1}. For an S^R(+, ·, ·) query where the simulator outputs w‖y, there are two scenarios.
1. y is distributed uniformly over {0,1}^{2n−s} and w is distributed uniformly over {0,1}^s \ {z ∈ {0,1}^s : z‖y ∈ O1 or FH(z‖y) ⊕ (x ⊕ x2) ∈ C1 ∪ C2}.
2. w‖y is distributed uniformly over {0,1}^{2n} \ O1.
By Lemma 1 we know |{z ∈ {0,1}^s : z‖y ∈ O1}| ≤ |O1| ≤ (q2 + q3). On the other hand, again using Lemma 1,
|{z ∈ {0,1}^s : FH(z‖y) ⊕ (x ⊕ x2) ∈ C1 ∪ C2}| ≤ (2^s/2^{min(s,n)})·|C1 ∪ C2| ≤ (2^s/2^{min(s,n)})·(q2 + q3).
Hence, for 2^{min(s,n)} > 2(q2 + q3) and any w‖y ∈ {0,1}^{2n} we have
Pr[S^R(+, ·, ·) query outputs w‖y] ≤ max{ 1/(2^{2n−s}·(2^s − (2^s/2^{min(s,n)})(q2+q3) − (q2+q3))), 1/(2^{2n} − (q2+q3)) } ≤ 1/(2^{2n}·(1 − 2(q2+q3)/2^{min(s,n)})).
For an S^R(−, ·, ·) query giving output z1‖z2 we know:
1. z1 is uniformly distributed over {0,1}^n \ C1;
2. z2 is uniformly distributed over {0,1}^n \ {w ∈ {0,1}^n : z1‖w ∈ I1}.
We know |C1| ≤ (q2 + 1) and |I1| ≤ (q2 + q3). Hence, for any z1‖z2 ∈ {0,1}^{2n} we have
Pr[S^R(−, ·, ·) query outputs z1‖z2] ≤ 1/((2^n − (q2+1))·(2^n − (q2+q3))) ≤ 1/(2^{2n}·(1 − 2(q2+q3)/2^n)).
Hence, altogether we get
Pr[OV^A_{R,S^R} = OV] ≤ 1/2^{(2n−s)q1 + 2n(q2+q3)} × 1/(1 − 2(q2+q3)/2^{min(s,n)})^{q2} × 1/(1 − 2(q2+q3)/2^n)^{q3}.
Next we wish to show that our simulator is efficient. The condition 2^{min(s,n)} > 4(q2 + q3) ensures that the GOTO statement at Step 5 of the forward query in Figure 4 gets executed with probability less than 1/2 at each iteration. We also know |O1| ≤ (q2 + q3) and |C1 ∪ C2| ≤ (q2 + q3). Hence, except with negligible probability, Step 5 takes at most O(q2 + q3) time to satisfy the condition. The same argument holds for the other GOTO statements as well. Hence we get the following result.
Theorem 4. If 2^{min(s,n)} > 4(q2 + q3), the simulator S^R takes at most O(q2 + q3) time to answer any query (except with exponentially small probability).
S^R(+, x1, x2):
– IF e1(x1‖x2) = z, RETURN z.
– IF there exists M such that e1*(M) = x1‖x:
  1. m = x ⊕ x2
  2. y = R(M‖m) ⊕ CHOPL(m‖0^n)
  3. w ∈_R {0,1}^s
  4. z = w‖y
  5. IF (z ∈ O1 OR FH(z) ⊕ m ∈ C1 ∪ C2): GOTO 3
  6. C1 = C1 ∪ {FH(z) ⊕ m}
  7. e1*(M‖m) = z ⊕ (m‖0^n)
  8. e1(x1‖x2) = z
  9. RETURN z
– ELSE:
  10. z ∈_R {0,1}^{2n}
  11. IF z ∈ O1: GOTO 10
  12. e1(x1‖x2) = z
  13. C2 = C2 ∪ {x1}
  14. RETURN z

S^R(−, y1, y2):
– IF there exists z1‖z2 such that e1(z1‖z2) = y1‖y2: RETURN z1‖z2.
– ELSE:
  1. z1 ∈_R {0,1}^n
  2. IF z1 ∈ C1: GOTO 1
  3. z2 ∈_R {0,1}^n
  4. IF z1‖z2 ∈ I1: GOTO 3
  5. C2 = C2 ∪ {z1}
  6. RETURN z1‖z2

Fig. 4. Simulator for JH′
4.2 Interpolation Probability of OV^A_{JH′^π, π}

In Theorem 3 we have shown an upper bound on Pr[OV^A_{R,S^R} = OV] for any irreducible output view OV. The theorem below gives a lower bound on Pr[OV^A_{JH′^π,π} = OV] for any irreducible output view OV. Later we will apply Theorem 2 to prove the indifferentiability bound using these upper and lower bounds.
Theorem 5. For any attacker A and any irreducible output view OV with respect to it, we have
Pr[OV^A_{JH′^π,π} = OV] ≥ 1/2^{(2n−s)q1 + 2n(q2+q3)} × (1 − 2σ²/2^{2n}) × (1 − 2q1(q1+q2+q3)/2^{min(s,n)}).
The proof of the above theorem involves two steps. Starting with an attacker A against JH′^π ≡ CHOPL(MD^{f^π}), we construct another attacker A′ against MD^{f^π} which essentially makes the same queries as A but has access to the unchopped output view.
– First we define the notion of an MD-irreducible view (an irreducible view with respect to the Merkle-Damgård mode of operation) and then we show that for the output view OV_MD corresponding to any MD-irreducible view we have
Pr[OV^{A′}_{MD^{f^π},π} = OV_MD] ≥ 1/2^{2nq1 + 2n(q2+q3)} × (1 − 2σ²/2^{2n}).
– In Theorem 7 we show that, given an irreducible output view OV and an attacker A, if 𝒪V_MD is the set of all MD-irreducible output views for the attacker A such that
Pr[OV^A_{JH′^π,π} = OV | OV^{A′}_{MD^{f^π},π} = OV_MD] = 1 for all OV_MD ∈ 𝒪V_MD,
then |𝒪V_MD| ≥ 2^{sq1} × (1 − 2q1(q1+q2+q3)/2^{min(s,n)}).
The above two results will readily imply Theorem 5.
Definition 14. The set of relations
MD^{f^{O2}}(M1‖m1) = g1, …, MD^{f^{O2}}(M_{q1}‖m_{q1}) = g_{q1},
O2(x_1^1, x_1^2) = (y_1^1, y_1^2), …, O2(x_{q2+q3}^1, x_{q2+q3}^2) = (y_{q2+q3}^1, y_{q2+q3}^2)   (Rel A)
is MD-irreducible if
1. g1 ⊕ (m1‖0^n), …, g_{q1} ⊕ (m_{q1}‖0^n), y_1^1‖y_1^2, …, y_{q2+q3}^1‖y_{q2+q3}^2 are all different.
2. For i = 1, …, q1, one of the following two conditions holds:
(a) FH(g_i) is different from x_1^1, …, x_{q2+q3}^1 and IV1;
(b) let Σ be the set of all message blocks present in the MD^{f^{O2}} queries; if FH(g_i) = IV1, then LH(g_i) ⊕ IV2 ∉ Σ, and if FH(g_i) = x_j^1 for some 1 ≤ j ≤ q2 + q3, then LH(g_i) ⊕ x_j^2 ∉ Σ.
3. M1‖m1, …, M_{q1}‖m_{q1} are not evaluatable by the relations O2(x_1^1, x_1^2) = (y_1^1, y_1^2), …, O2(x_{q2+q3}^1, x_{q2+q3}^2) = (y_{q2+q3}^1, y_{q2+q3}^2) with respect to the MD mode of operation based on f^{O2}.
We also say that the tuple
v = ((M1‖m1, g1), …, (M_{q1}‖m_{q1}, g_{q1}), (x_1^1, x_1^2, y_1^1, y_1^2), …, (x_{q2+q3}^1, x_{q2+q3}^2, y_{q2+q3}^1, y_{q2+q3}^2))
is MD-irreducible if and only if the corresponding Rel A is MD-irreducible.
The definition above is similar to the definition of an irreducible view (Definition 12), but here we are interested in the view without any chopping. Note that condition 2 ensures that M_i‖m_i is not evaluatable even with the help of the relations MD^{f^{O2}}(M_j‖m_j) = h_j for j ≠ i. Loosely speaking, the theorem below gives a lower bound on the probability of getting a particular MD-irreducible tuple v when an attacker interacts with (MD^{f^π}, π).
Theorem 6. Let a tuple v = ((M1‖m1, g1), …, (M_{q1}‖m_{q1}, g_{q1}), (x_1^1, x_1^2, y_1^1, y_1^2), …, (x_{q2+q3}^1, x_{q2+q3}^2, y_{q2+q3}^1, y_{q2+q3}^2)) be MD-irreducible. Then the number of permutations π such that
MD^{f^π}_{IV1‖IV2}(M1‖m1) = g1, …, MD^{f^π}_{IV1‖IV2}(M_{q1}‖m_{q1}) = g_{q1},
π(x_1^1, x_1^2) = (y_1^1, y_1^2), …, π(x_{q2+q3}^1, x_{q2+q3}^2) = (y_{q2+q3}^1, y_{q2+q3}^2)   (Rel B)
holds is at least
(|Π|/2^{2nq1 + 2n(q2+q3)}) × (1 − 2σ²/2^{2n}),
where |Π| = (2^{2n})! is the total number of permutations from {0,1}^{2n} to {0,1}^{2n} and σ is the total number of message blocks queried. Also, for an MD-irreducible tuple v, the probability that Rel B holds is at least
(1/2^{2nq1 + 2n(q2+q3)}) × (1 − 2σ²/2^{2n})
when π is a random permutation.
Proof. Let D be the set of all elements of ({0,1}^n)^+ whose MD^{f^π}_{IV1‖IV2} values are determined by the relations
π(x_1^1, x_1^2) = (y_1^1, y_1^2), …, π(x_{q2+q3}^1, x_{q2+q3}^2) = (y_{q2+q3}^1, y_{q2+q3}^2).
Since v is MD-irreducible, M_i‖m_i ∉ D for all 1 ≤ i ≤ q1. Let P denote the set of all nonempty prefixes of the M_i's; more precisely, P = {M ∈ ({0,1}^n)^+ : M is a prefix of M_i for some 1 ≤ i ≤ q1}. We enumerate the set P \ D ≡ {N1, …, N_{σ′}}. Note that |P| + q1 ≤ Σ_i ℓ_i, where ℓ_i denotes the number of blocks of M_i‖m_i. Now we have
σ = q2 + q3 + Σ_i ℓ_i ≥ q2 + q3 + |P| + q1 ≥ q1 + q2 + q3 + σ′ ≡ σ″.
Similarly to the proof of Lemma 1 in [5], we can choose the outputs of MD^{f^{O2}}_{IV1‖IV2}(N1), …, MD^{f^{O2}}_{IV1‖IV2}(N_{σ′}) in at least
(2^{2n} − 2(q1+q2+q3))·(2^{2n} − 2(q1+q2+q3+1))···(2^{2n} − 2(q1+q2+q3+σ′−1))
ways. (In the negative term, the factor 2 comes from the fact that any output value must differ from the other output values and that the next input value induced by an output value must differ from the other input values.) Hence,
|{π : {0,1}^{2n} → {0,1}^{2n} such that π is a permutation and satisfies Rel B}|
≥ (2^{2n} − σ″)! × ∏_{i=0}^{σ′−1} (2^{2n} − 2(q1+q2+q3+i))
≥ ((2^{2n})!/2^{2nσ″}) × 2^{2nσ′} × (1 − 2σ″²/2^{2n})
≥ (|Π|/2^{2nq1 + 2n(q2+q3)}) × (1 − 2σ²/2^{2n}).
Definition 15. With respect to an irreducible view V = ((M1‖m1, h1), …, (M_{q1}‖m_{q1}, h_{q1}), (x_1^1, x_1^2, y_1^1, y_1^2), …, (x_{q2+q3}^1, x_{q2+q3}^2, y_{q2+q3}^1, y_{q2+q3}^2)), an MD-irreducible tuple v is said to be CHOPL-matching if
v = ((M1‖m1, w1‖h1), …, (M_{q1}‖m_{q1}, w_{q1}‖h_{q1}), (x_1^1, x_1^2, y_1^1, y_1^2), …, (x_{q2+q3}^1, x_{q2+q3}^2, y_{q2+q3}^1, y_{q2+q3}^2))
for some q1-tuple w = (w1, …, w_{q1}). Let M_V be the set of all such CHOPL-matching MD-irreducible tuples.
Theorem 7. For any irreducible view V = ((M1‖m1, h1), …, (M_{q1}‖m_{q1}, h_{q1}), (x_1^1, x_1^2, y_1^1, y_1^2), …, (x_{q2+q3}^1, x_{q2+q3}^2, y_{q2+q3}^1, y_{q2+q3}^2)) we have
|M_V| ≥ 2^{sq1} × (1 − 2q1(q1+q2+q3)/2^{min(s,n)}).
For the proof of this theorem, the reader is referred to the full version of the paper.
Now we are ready to prove Theorem 5 with the help of Theorem 6 and Theorem 7. Let V be the irreducible view determined by A and the irreducible output view OV. Consider an attacker A′ which makes queries at the same input points as A, but has access to MD^{f^{O2}} instead of JH′^{O2}. Hence,
Pr[OV^A_{JH′^π,π} = OV] = Pr[V^A_{JH′^π,π} = V] = Σ_{v ∈ M_V} Pr[V^{A′}_{MD^{f^π},π} = v]
≥ Σ_{v ∈ M_V} (1/2^{2nq1 + 2n(q2+q3)}) × (1 − 2σ²/2^{2n})
≥ (1/2^{2nq1 + 2n(q2+q3)}) × (1 − 2σ²/2^{2n}) × 2^{sq1} × (1 − 2q1(q1+q2+q3)/2^{min(s,n)})
= (1/2^{(2n−s)q1 + 2n(q2+q3)}) × (1 − 2σ²/2^{2n}) × (1 − 2q1(q1+q2+q3)/2^{min(s,n)}).
4.3 Indifferentiability Security Bound

We are now ready to prove the main result of this section. For any attacker A making at most q1, q2, q3 queries to the oracles O1, O2(+, ·, ·), O2(−, ·, ·), respectively, we show an upper bound on Adv_A.
Theorem 8. The JH′^π construction (with (2n − s)-bit output) based on a random permutation π is (O(q2 + q3), q1, q2 + q3, ε)-indifferentiable from a random oracle R, with
ε ≤ 2σ²/2^{2n} + 2q3(q2+q3)/2^n + (2q2(q2+q3) + 2q1(q1+q2+q3))/2^{min(s,n)},
where σ is the maximum number of message blocks queried, q1 is the maximum number of queries to JH′^π or R, and q2 + q3 is the maximum number of queries to π, π^{−1} or S^R(+, ·, ·), S^R(−, ·, ·). Here we also assume q2 + q3 < 2^{min(s,n)}/4.
Proof. For any attacker A and an irreducible output view OV, from Theorem 3 and Theorem 5 we have
Pr[OV^A_{JH′^π,π} = OV] ≥ (1 − [2σ²/2^{2n} + 2q3(q2+q3)/2^n + (2q2(q2+q3) + 2q1(q1+q2+q3))/2^{min(s,n)}]) × Pr[OV^A_{R,S^R} = OV].
Now, applying Theorem 2, we get the required result.
When the maximum query length is smaller than 2^{n/2}, for any attacker A (making at most q queries) against the JH′ construction we have Adv_A = O(q²/2^{min(s,n)}).
5 Indifferentiability Security Analysis of JH_P

In this section we prove the indifferentiability of the JH mode of operation with padding.

5.1 Simulator and Its Interpolation Probability

We describe our simulator in Fig. 5. Similarly to the previous section, we use the following notation in describing the simulator.
– Partial permutation e : {0,1}^{2n} → {0,1}^{2n}, initially empty. I denotes the set of points where e is defined and O denotes the set of output points of e.
– Partial function e* : ({0,1}^n)* → {0,1}^{2n}, initialized to e*(φ) = IV1‖IV2.
– Set C ⊆ {0,1}^n, initialized to C = {IV1}, consisting of the FH (first half) of the e* outputs.
For a padding rule P = (PAD, DEPAD) and M ∈ ({0,1}^n)^+, we recall that LB(M) ⊆ {0,1}^n is defined as {m ∈ {0,1}^n : DEPAD(M‖m) ≠ ⊥}. As in the case of the actual JH padding rule, we assume |LB(M)| ≤ 2.
We recall the design philosophy behind the JH′ simulator from Section 4.1. There the simulator maintains a list of evaluatable queries and their non-chopped outputs in the partial function e1*. When the simulator receives a query, its goals are threefold.
1. Give a random output, keeping in mind the permutation property.
2. Do not create a new evaluatable query unless forced to do so. That means the output of the simulator will never create a new evaluatable query, with the exception of the following scenario.
3. It might happen that the input of the simulator alone forces another new evaluatable query. (This happens if the attacker is trying to find some O1 query output through O2 queries.) If this happens, then adjust the output of the simulator so that it remains consistent to R with respect to the new evaluatable query.
One crucial point is that during one simulator query the simulator must prevent the creation of more than one evaluatable query; otherwise, it cannot remain consistent to both of them. In forward queries to the JH′ simulator with s = n, when the attacker has forced the creation of one new evaluatable query, the LH (last half) of the possible output gets fixed by the R response of that evaluatable query, but the simulator has control over the FH output, with which it makes sure that another evaluatable query is not created. Here the situation is reversed: FH gets fixed by R and the simulator has control only over LH. This is problematic, because only FH can lead to the creation of more evaluatable queries (with one more message block after the current evaluatable query). In fact, in Section 6 the attacker against the JH mode of operation (without length padding in the last block) exploits this fact. But the simulator can play with LH to change the actual evaluatable query (even though it cannot prevent its creation). By doing so, the simulator ensures that the new evaluatable query is not a valid padded message; hence, for that query the simulator does not need to be consistent with R. The simulator also needs to be careful that no new evaluatable queries of length (current evaluatable query length + 2) or more are created; however, that can easily be handled.
S^R(+, x1, x2):
1. IF e(x1‖x2) = z, RETURN z.
2. IF there exists M such that e*(M) = x1‖x:
 (a) m = x ⊕ x2
 (b) IF M ≠ φ AND m ∈ LB(M):
  i. y = R(DEPAD(M‖m)) ⊕ CHOPR(m‖0^n)
  ii. w ∈_R {0,1}^s
  iii. z = y‖w
 (c) ELSE:
  i. z ∈_R {0,1}^{2n}
 (d) IF z ∈ O: GOTO 2b
 (e) e*′ = e*
 (f) C′ = C
 (g) FOR EACH i1‖i2 ∈ I ∪ {x1‖x2}:
  i. IF FH(z) ⊕ m ≠ i1: CONTINUE
  ii. IF LH(z) ⊕ i2 ∈ LB(M‖m): GOTO 2b
  iii. IF i1‖i2 = x1‖x2: o1‖o2 = z
  iv. ELSE: o1‖o2 = e(i1‖i2)
  v. e*′(M‖m‖(LH(z) ⊕ i2)) = (o1 ⊕ LH(z) ⊕ i2)‖o2
  vi. C′ = C′ ∪ {o1 ⊕ LH(z) ⊕ i2}
  vii. FOR EACH i1′‖i2′ ∈ I ∪ {x1‖x2}: IF LH(z) ⊕ i2 = o1 ⊕ i1′: GOTO 2b
 (h) e* = e*′
 (i) C = C′
 (j) e*(M‖m) = z ⊕ (m‖0^n)
 (k) C = C ∪ {FH(z) ⊕ m}
3. ELSE:
 (a) z ∈_R {0,1}^{2n}
 (b) IF z ∈ O: GOTO 3a
4. e(x1‖x2) = z
5. RETURN z

S^R(−, y1, y2):
1. IF there exists z1‖z2 such that e(z1‖z2) = y1‖y2: RETURN z1‖z2.
2. ELSE:
 (a) z1 ∈_R {0,1}^n
 (b) IF z1 ∈ C: GOTO 2a
 (c) z2 ∈_R {0,1}^n
 (d) IF z1‖z2 ∈ I: GOTO 2c
 (e) e(z1‖z2) = y1‖y2
 (f) RETURN z1‖z2

Fig. 5. Simulator for JH with padding
The next two theorems give the running time and an interpolation probability upper bound for this simulator.
Theorem 9. For any attacker A against the JH_P^π mode of operation and any irreducible output view OV with respect to it, we have
Pr[OV^A_{R,S^R} = OV] ≤ 1/2^{(2n−s)q1 + 2n(q2+q3)} × 1/(1 − (q2+q3+3)²/2^s)^{q2} × 1/(1 − (q2+q3+1)²/2^n)^{q3},
when (q2 + q3 + 3)² < 2^{min(s,n)}.
Theorem 10. If 2(q2 + q3 + 3)² < 2^{min(s,n)}, the simulator S^R takes at most O((q2 + q3)²) time to answer any query (except with exponentially negligible probability).
The proofs of the two theorems above are similar to the proofs of Theorem 3 and Theorem 4. Due to space constraints we skip the proofs and refer the reader to the full version of the paper.

5.2 Interpolation Probability of OV^A_{JH_P^π, π}

The following theorem is analogous to Theorem 5, used in Section 4.
Theorem 11. For any attacker A and any irreducible output view OV with respect to it, we have
Pr[OV^A_{JH_P^π,π} = OV] ≥ 1/2^{(2n−s)q1 + 2n(q2+q3)} × (1 − 2σ²/2^{2n}) × (1 − 2σq1(q1+q2+q3)/2^s).
For the proof of the above theorem we refer the reader to the full version.

5.3 Indifferentiability Security Bound

Theorem 12. The JH_P^π mode of operation (with (2n − s)-bit output) based on a random permutation π is (O((q2 + q3)²), q1, q2 + q3, ε)-indifferentiable from a random oracle R, with
ε ≤ 2σ²/2^{2n} + q2(q2+q3+3)²/2^s + q3(q2+q3+1)²/2^n + 2σq1(q1+q2+q3)/2^s,
where σ is the maximum number of message blocks queried, q1 is the maximum number of queries to JH_P^π or R, and q2 + q3 is the maximum number of queries to π, π^{−1} or S^R(+, ·, ·), S^R(−, ·, ·). Here we also assume 2(q2 + q3 + 3)² < 2^{min(s,n)}.
Under reasonable assumptions, for an attacker making at most q queries with σ compression function invocations in total, we have
Adv_A = O(σ²/2^{2n} + q³/2^n + q²σ/2^s).
6 Distinguisher A for JH without Length Padding at the Last Block

Recall that the compression function of JH is based on a fixed permutation π. On input the n-bit message block m and the 2n-bit chaining value h1‖h2, the compression function outputs f(m, h1, h2) = π(h1, h2 ⊕ m) ⊕ (m‖0^n). JH applies a chopped Merkle-Damgård transformation and outputs the first t (t = 2n − s) bits of the output of the final compression function. Here s denotes the number of chopped bits. In the case of JH-n we have s = n.
Our distinguisher first queries h = C^π(M) with a random n-bit message M. The distinguisher appends 0^n to h and queries t1‖t2 = π(+, h‖0^n). Note that when the distinguisher is interacting with (π, C^π), the second π query made by C^π(M‖M2) will be on the input h‖z, where z is the last n-bit
output of π(+, IV1, IV2 ⊕ M1) xored with M2. So if we set M2 to be the last n-bit output of π(+, IV1, IV2 ⊕ M1), then z = 0^n. Note that in the case of JH with length padding we could not choose M2 this way, as the length block is fixed. To get M2, the distinguisher queries z1‖z2 = O2(+, IV1‖(IV2 ⊕ M)). Now D sets M2 = z2 and queries h2 = C^π(M‖z2). Finally, the distinguisher checks whether h2 = t1 ⊕ z2. The formal algorithm of the distinguisher is described in Figure 6(a).
Theorem 13. If the simulator S makes at most k R queries for answering a single query, then Adv_A ≥ 1 − (2k+1)/2^n.

Distinguisher A:
1. M ∈_R {0,1}^n.
2. h = O1(M).
3. t1‖t2 = O2(+, h‖0^n).
4. z1‖z2 = O2(+, IV1‖(IV2 ⊕ M)).
5. h2 = O1(M‖z2).
6. IF t1 = h2 ⊕ z2: return 1.
7. return 0.
(a)

Distinguisher for JH′ without length padding at the last block:
– Choose distinct n-bit values m1, …, mk.
– For i = 1, …, k: y_i^1‖y_i^2 = O2(+, IV1‖(IV2 ⊕ m_i)).
– If the values (y_i^1 ⊕ m_i), i = 1, …, k, are all distinct, return 1.
– Else:
 • Find distinct j1, j2 such that (y_{j1}^1 ⊕ m_{j1}) = (y_{j2}^1 ⊕ m_{j2}).
 • m ∈_R {0,1}^n.
 • x1 = O1(m_{j1}‖(m ⊕ y_{j1}^2)).
 • x2 = O1(m_{j2}‖(m ⊕ y_{j2}^2)).
 • If x1 ⊕ CHOPL((m ⊕ y_{j1}^2)‖0^n) = x2 ⊕ CHOPL((m ⊕ y_{j2}^2)‖0^n), return 1.
– Return 0.
(b)

Fig. 6. (a): Distinguisher for JH-n without length padding; (b): Distinguisher for JH′ without length padding
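As a sanity check on Figure 6(a), the toy implementation below runs distinguisher A against the real pair (JH-n, π) and confirms that the test in step 6 always succeeds there. The parameters (tiny n, a randomly shuffled stand-in permutation, an illustrative IV) are assumptions for illustration only; no length padding is applied and s = n with CHOPR, as in JH-n.

```python
import random

n = 8
IV1, IV2 = 0x0A, 0x0B                      # illustrative IV, not the real JH constants
rng = random.Random(1)
_perm = list(range(1 << (2 * n)))
rng.shuffle(_perm)

def pi(x):                                  # the fixed 2n-bit permutation (toy stand-in)
    return _perm[x]

def f(h1, h2, m):                           # JH compression function
    return pi((h1 << n) | (h2 ^ m)) ^ (m << n)

def jh_n(blocks):                           # JH-n: s = n, output = first n bits, no padding
    h1, h2 = IV1, IV2
    for m in blocks:
        h = f(h1, h2, m)
        h1, h2 = h >> n, h & ((1 << n) - 1)
    return h1

def distinguisher_A(O1, O2_fwd):
    M = rng.getrandbits(n)                  # step 1
    h = O1([M])                             # step 2
    t = O2_fwd(h << n)                      # step 3: t1 || t2 = pi(h || 0^n)
    t1 = t >> n
    z = O2_fwd((IV1 << n) | (IV2 ^ M))      # step 4: z1 || z2
    z2 = z & ((1 << n) - 1)
    h2 = O1([M, z2])                        # step 5
    return 1 if t1 == (h2 ^ z2) else 0      # steps 6-7

# against the real construction the distinguisher always outputs 1
assert all(distinguisher_A(jh_n, pi) == 1 for _ in range(100))
```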
7 Distinguisher for JH′

In this section, we show a distinguisher making Ω(2^{n/2}) queries which is successful against any simulator with non-negligible probability. Hence, when the maximum query length is bounded by 2^{n/2}, we get a tight security bound. The distinguisher has access to two oracles O1, O2 and tries to distinguish whether (O1, O2) is (JH′^π, π) or (R, S^R). A formal description of our distinguisher is given in Fig. 6(b). The success probability of the distinguisher is established by the following theorem; for a proof we refer the reader to the full version of the paper.
Theorem 14. With k = Ω(2^{n/2}), Adv_A is non-negligible for any simulator S.
Note that if we use CHOPR instead of CHOPL, then the same attack also applies to the original JH mode of operation without length padding at the last block.
8 Preimage Attack on JH

In this section we demonstrate a preimage attack on the Merkle-Damgård mode based on the JH compression function. As the JH hash output is a part of MD^{f^π}, a preimage attack on MD^{f^π} immediately translates into a preimage attack on the JH hash function. We use multicollisions, as was done in [16]. Let Q(r) denote the expected number of queries to obtain an r-collision of an n-bit random oracle. In [20] it was shown that Q(r) ∼ 2^{n(r−1)/r}(r!)^{1/r}. In [16], a preimage attack on JH was shown based on multicollisions in the forward direction of the JH mode. The query complexity of that attack is O(Q(r)), where r is a solution of the equation r^{1/2}Q(r) = 2^n. We use two-sided multicollisions (from both the forward and backward directions) to improve the attack complexity a little; the new query complexity is O(Q(r)), where r satisfies rQ(r) = 2^n. We now describe our preimage attack on MD^{f^π}, where f^π is the compression function defined in JH based on a permutation π (see Fig. 2). Let h‖h′ ∈ {0,1}^{2n} be a randomly chosen target. Note that, given any m, h, h′, the value f^{−1}(h‖h′, m) is easily computable by making only one π^{−1} query.
1. Choose an arbitrary message block M5 with correct padding, and compute H4 := h4‖h4′ = f^{−1}(h‖h′, M5).
2. Compute Q(r) candidates for H3 = f^{−1}(H4, M4) to obtain an r-collision on the last half of H3. This is possible since we assume that π is a random permutation. Let L be the list of r values of H3 such that the LH(H3) are identical, say equal to h3.
3. Similarly, we do a forward computation of f for the first message block M1. We obtain a list L′ of r values of H1 such that FH(H1) = h1 for all H1 ∈ L′.
4. Now we run a kind of meet-in-the-middle attack for the chaining value H2. We compute Q(r) values of π(h1‖h1′) and π^{−1}(h3′‖h3), for Q(r) choices of h1′ and h3′. Note that h1 and h3 are fixed from the previous two steps. Find h1′ and h3′ such that FH(π(h1‖h1′) ⊕ π^{−1}(h3′‖h3)) ⊕ h1′ equals LH(H1) for some H1 ∈ L′, and LH(π(h1‖h1′) ⊕ π^{−1}(h3′‖h3)) ⊕ h3′ equals FH(H3) for some H3 ∈ L.
For any pair (h1′, h3′) the probability of the above event is r²/2^{2n}. Since we have Q²(r) such pairs, we can expect one pair (h1′, h3′) satisfying the above condition provided r is at least the solution of the equation rQ(r) = 2^n. Let M2 = FH(π(h1‖h1′) ⊕ π^{−1}(h3′‖h3)) and M3 = LH(π(h1‖h1′) ⊕ π^{−1}(h3′‖h3)). Moreover, we choose M1 and M4 from the candidates behind the lists L′ and L, respectively, so that H1 ∈ L′ and H3 ∈ L are exactly the chaining values matched above.
It is easy to verify that MD^{f^π}(M1‖M2‖M3‖M4‖M5) = h‖h′. In [16], r = 51 is chosen to satisfy the equation r^{1/2}Q(r) = 2^{512}, where n = 512; the query complexity of their attack is roughly 2^{510}. We can choose r = 46 as a solution of rQ(r) ∼ 2^{512}; in this case the query complexity in π and π^{−1} is roughly 2^{507}. Compared with the previous preimage attack, this is not a significant reduction in complexity; however, asymptotically it is a non-trivial improvement for finding preimages of JH. The solution of r in r^{1/2}Q(r) = 2^n is larger than that of rQ(r) = 2^n, and since Q(r) is a strictly increasing function of r, our attack complexity is asymptotically less than that of [16]. However, we do not know concrete closed forms for the query complexities of these two attacks.
9 Conclusion

Fourteen candidates have been selected for the second round of the SHA-3 competition. Over the next few years one of these candidates will win and become the next hash function standard. In this paper we considered the security of a second round candidate, JH, in the indifferentiability framework. We showed that, under the assumption that the underlying permutation is a random permutation, the JH mode of operation with a specific padding rule is indifferentiable from a random oracle. We also considered a modified design of JH, called JH′, obtained by chopping different bits, and analyzed the indifferentiability of the JH′ mode with optimal bounds. We also presented a distinguisher for the JH mode without length padding (with any other prefix-free padding). Finally, we constructed a preimage attack with 2^{507} queries, which is better than the complexity of known preimage attacks. However, our attack does not pose any serious threat to the JH hash function.
Acknowledgements

We sincerely thank Jean-Sébastien Coron for his valuable comments on initial drafts of this paper. We also thank the anonymous reviewers for their thoughtful suggestions.
References 1. Bellare, M., Rogaway, P.: Random Oracles Are Practical: A Paradigm for Designing Efficient Protocols. In: 1st Conference on Computing and Communications Security, pp. 62–73. ACM, New York (1993) 2. Bellare, M., Ristenpart, T.: Multi-Property-Preserving Hash Domain Extension and the EMD Transform. In: Lai, X., Chen, K. (eds.) ASIACRYPT 2006. LNCS, vol. 4284, pp. 299–314. Springer, Heidelberg (2006) 3. Barke, R.: On the Security of Iterated MACs. Diploma Thesis 2003. ETH Zurich (2003) 4. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: On the indifferentiability of the sponge construction. In: Smart, N.P. (ed.) EUROCRYPT 2008. LNCS, vol. 4965, pp. 181–197. Springer, Heidelberg (2008) 5. Chang, D., Nandi, M.: Improved Indifferentiability Security Analysis of chopMD Hash Function. In: Nyberg, K. (ed.) FSE 2008. LNCS, vol. 5086, pp. 429–443. Springer, Heidelberg (2008) 6. Coron, J.S., Dodis, Y., Malinaud, C., Puniya, P.: Merkle-Damgard Revisited: How to Construct a Hash Function. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 430–448. Springer, Heidelberg (2005) 7. Coron, J.S., Patarin, J., Seurin, Y.: The Random Oracle Model and the Ideal Cipher Model Are Equivalent. In: Wagner, D. (ed.) CRYPTO 2008. LNCS, vol. 5157, pp. 1–20. Springer, Heidelberg (2008) 8. Damg˚ard, I.: A Design Principles for hash functions. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 416–427. Springer, Heidelberg (1990) 9. Dodis, Y., Pietrzak, K., Puniya, P.: A new mode of operation for block ciphers and lengthpreserving MACs. In: Smart, N.P. (ed.) EUROCRYPT 2008. LNCS, vol. 4965, pp. 198–219. Springer, Heidelberg (2008) 10. Dodis, Y., Reyzin, L., Rivest, R., Shen, E.: Indifferentiability of Permutation-Based Compression Functions and Tree-Based Modes of Operation, with Applications to MD6. In: Dunkelman, O. (ed.) FSE 2009. LNCS, vol. 5665, pp. 104–121. Springer, Heidelberg (2009)
11. Dodis, Y., Ristenpart, T., Shrimpton, T.: Salvaging Merkle-Damg˚ard for Practical Applications. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 371–388. Springer, Heidelberg (2009) 12. Chang, D., Lee, S., Nandi, M., Yung, M.: Indifferentiable security analysis of popular hash functions with prefix-free padding. In: Lai, X., Chen, K. (eds.) ASIACRYPT 2006. LNCS, vol. 4284, pp. 283–298. Springer, Heidelberg (2006) 13. Canetti, R., Goldreich, O., Halevi, S.: The random oracle methodology, revisited. In: STOC 1998, ACM, New York (1998) 14. Maurer, U.: Indistinguishability of Random Systems. In: Knudsen, L.R. (ed.) EUROCRYPT 2002. LNCS, vol. 2332, pp. 110–132. Springer, Heidelberg (2002) 15. Maurer, U., Renner, R., Holenstein, C.: Indifferentiability, Impossibility Results on Reductions, and Applications to the Random Oracle Methodology. In: Naor, M. (ed.) TCC 2004. LNCS, vol. 2951, pp. 21–39. Springer, Heidelberg (2004) 16. Mendel, F., Thomsen, S.: An Observation on JH-512, http://ehash.iaik.tugraz.at/uploads/d/da/Jh_preimage.pdf 17. Nielsen, J.: Separating Random Oracle Proofs from Complexity Theoretic Proofs: The Noncommitting Encryption Case. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, p. 111. Springer, Heidelberg (2002) 18. Nandi, M.: A Simple and Unified Method of Proving Indistinguishability. In: Barua, R., Lange, T. (eds.) INDOCRYPT 2006. LNCS, vol. 4329, pp. 317–334. Springer, Heidelberg (2006) 19. SHA 3 official website, http://csrc.nist.gov/groups/ST/hash/sha-3/Round1/ submissions rnd1.html 20. Suzuki, K., Tonien, K.D., Kurosawa, K., Toyota, K.: Birthday Paradox for Multi-collisions. In: Rhee, M.S., Lee, B. (eds.) ICISC 2006. LNCS, vol. 4296, pp. 29–40. Springer, Heidelberg (2006) 21. Wu, H.: The Hash Function JH. Submission to NIST (2008), http://icsd.i2r.a-star.edu.sg/staff/hongjun/jh/jh.pdf 22. Vaudenay, S.: Decorrelation: A Theory for Block Cipher Security. J. Cryptology 16(4), 249– 286 (2003)
Enhanced Security Notions for Dedicated-Key Hash Functions: Definitions and Relationships Mohammad Reza Reyhanitabar, Willy Susilo, and Yi Mu Centre for Computer and Information Security Research, School of Computer Science and Software Engineering University of Wollongong, Australia {rezar,wsusilo,ymu}@uow.edu.au
Abstract. In this paper, we revisit security notions for dedicated-key hash functions, considering two essential theoretical aspects; namely, formal definitions for security notions, and the relationships among them. Our contribution is twofold. First, we provide a new set of enhanced security notions for dedicated-key hash functions. The provision of this set of enhanced properties has been motivated by the introduction of the enhanced target collision resistance (eTCR) property by Halevi and Krawczyk at Crypto 2006. We notice that the eTCR property does not belong to the set of the seven security notions previously investigated by Rogaway and Shrimpton at FSE 2004; namely: Coll, Sec, aSec, eSec, Pre, aPre and ePre. The fact that eTCR, as a new useful property, is the enhanced variant of the well-known TCR (a.k.a. eSec or UOWHF) property motivates one to investigate the possibility of providing enhanced variants for the other properties. We provide such an enhanced set of properties. Interestingly, there are six enhanced variants of security notions available, excluding “ePre” which can be demonstrated to be non-enhanceable. As the second and main part of our contribution, we provide a full picture of the relationships (i.e. implications and separations) among the (thirteen) security properties including the (six) enhanced properties and the previously considered seven properties. The implications and separations are supported by formal proofs (reductions) and/or counterexamples in the concrete-security framework. Keywords: hash functions, security notions, definitions, relationships.
1 Introduction
Cryptographic hash functions are widely used in many applications, most importantly in digital signature schemes and message authentication codes (MACs), as well as commitment schemes, password protection, and key derivation, to mention some. Unlike many other cryptographic primitives which are usually intended to fulfill specific security notions, hash functions, as workhorses of cryptography, are often expected to satisfy a wide and application dependent spectrum of security notions, ranging from merely being a one-way function to acting as a truly random function or random oracle (ideal hash).
Cryptographic hash functions originally were used as "secure" compressing functions to make digital signatures more efficient [6, 19, 11, 12, 4, 5], and this application of hash functions in signature schemes, following the hash-and-sign paradigm, requires them to satisfy three well-known classic security properties; namely, collision resistance, second-preimage resistance and preimage resistance. These properties have been traditionally considered as the basic "necessary" security properties for a hash function to be used in signature schemes, as well as in several other applications of hash functions. There seems to be no clear consensus on a specification of a set of properties that can be considered as a sufficient property set for a hash function in the standard model of security [3]. The current literature contains many different informal and formal definitions for some basic and widely-used security properties of hash functions (such as [4, 5, 12, 13, 28, 18, 10, 24, 26]). For a formal treatment of the security properties and their relationships, it is essential to clearly specify the hash function setting; that is, whether the hash function is specified as a keyless function H : M → C which only admits an input message, or it is a dedicated-key (i.e. two-argument) function H : K × M → C with an explicit key input in addition to a message input. A dedicated-key hash function H : K × M → C can also be viewed as a family of functions {H_K : M → C}_{K∈K} by considering the key as the index for the instance functions. Although, historically, most of the widely used hash functions, like MD5 [23] and SHA-xxx (for xxx = 1, 224, 256, 384, 512) [14, 15], are keyless hash functions, the situation seems to be changing in favor of the dedicated-key hash function setting, which has been more popular in rigorous formal treatments of hash functions; e.g. [4, 5, 7]. For example, several new (practical and efficient) dedicated-key hash functions have been proposed to the recent SHA-3 hash function competition run by NIST, e.g. SHAvite-3 and Skein, which do have (optional) dedicated-key inputs [17]. Rogaway and Shrimpton [24, 25] provided formal definitions for seven variants of the three basic properties; namely, collision resistance (denoted by 'Coll' in [24]), three variants of second-preimage resistance (Sec, aSec, eSec) and three variants of preimage resistance (Pre, aPre, ePre), as well as all relationships among these seven properties, in the dedicated-key hash function setting. Figure 1 shows the overall picture of these relationships. We note that the original formal definitions of the collision resistance and UOWHF properties were proposed in the asymptotic-security framework, by Damgård [4], and by Naor and Yung [13], respectively. The UOWHF property was later called "target collision resistance" (TCR) by Bellare and Rogaway [1] (in the concrete-security framework), and also renamed "eSec" according to the nomenclature provided by Rogaway and Shrimpton [24]. Halevi and Krawczyk at Crypto 2006 [8] introduced the "enhanced target collision resistance" (eTCR) property, as a strengthened (or enhanced) variant of the TCR property. eTCR is the property sought from the Randomized Hashing construction [8], recently announced in NIST SP 800-106 [16], for strengthening digital signatures. In our previous work at FSE 2009 [20], we showed a separation
Fig. 1. Known relationships among the security notions for dedicated-key hash functions: a directed path shows an implication (dashed lines represent "provisional implications" in which the strength of the implications depends on the amount of compression achieved by the hash function) and lack of a path shows a separation [24, 25, 20, 21]. Left panel: seven security properties for hash functions and their relationships investigated by Rogaway and Shrimpton in [25]. Right panel: relationships between eTCR and each of the seven security properties for hash functions investigated by Reyhanitabar, Susilo, and Mu in [20, 21].
between the eTCR and Coll properties, and further completed the relationships between eTCR and each of the seven security notions in [21]. Figure 1 also depicts these relationships. In this paper we continue this line of research by further investigating the security notions for dedicated-key hash functions. The fact that the interesting eTCR property is an enhanced variant of the well-known TCR (a.k.a. eSec or UOWHF) property has been our main motivation to investigate the possibility of further completing the set of current security notions for dedicated-key hash functions, by providing enhanced variants for the other properties. We note that an enhanced variant of the collision resistance property, called “eColl”, also was recently noticed by Yasuda in [27]. Nomenclature. For the seven security notions, we use the same nomenclature; i.e. Coll, Sec, aSec, eSec, Pre, aPre, ePre, as proposed by Rogaway and Shrimpton in [24]. The remaining six new strengthened variants (among the thirteen properties) are denoted by adding a prefix ‘s-‘ to the names of the related (weaker) notions; that is, s-Coll, s-Sec, s-aSec, s-eSec, s-Pre, s-aPre; respectively, where s-Coll is the strengthened variant of Coll, and so forth. We use prefix s− (for ‘strengthened’) instead of e− (for ‘enhanced’), to prevent any ambiguity among the names, as the prefix ‘e’ has already been used by Rogaway and Shrimpton in [24] to stand for ‘everywhere’ variants in eSec and ePre properties. Note that now according to our new notations, ‘s-eSec‘ stands for the ‘eTCR’ property of [8] and s-Coll is the same property as eColl in [27].
Our Contributions. First, we provide a new extended set of strengthened (enhanced) security notions for dedicated-key hash functions, which includes the eTCR property, put forth by Halevi and Krawczyk [8] (denoted by 's-eSec' in this paper), the eColl property, introduced by Yasuda [27] (denoted by 's-Coll' in this paper), as well as four new properties which we introduce in this paper; namely, s-Sec, s-aSec, s-Pre, s-aPre. Then, as our second and main contribution, we work out all the relationships among the (thirteen) security properties, including the (six) enhanced properties; namely, s-Coll, s-Sec, s-aSec, s-eSec, s-Pre, s-aPre, and the well-known seven properties; namely, Coll, Sec, aSec, eSec, Pre, aPre, ePre. Figure 2 illustrates the relationships among the security notions. A solid directed edge 'A → B' shows a security-preserving reduction from the notion A to the notion B, and a dashed directed edge 'A ⇢ B' represents a provisional reduction (i.e. with some security loss) from A to B. (Formal definitions of the security-preserving and provisional implications are given in Sec. 3.) The top graph illustrates the essential "edges" that can be composed to construct the "paths" showing all other implications; for instance, combining the Coll → eSec and eSec → Sec edges one gets Coll → Sec (which is not explicitly shown in the graph), and so on. The lack of a directed path from A to B in the graph means a separation. The three tables below the graph detail all the relationships, where an entry at row A and column B shows whether the property A implies the property B, or there is a separation; trivial equivalences are denoted by '='.
Notations. If A is a randomized algorithm then by y = A(x1, ..., xn; R) it is meant that y is the output of A on inputs x1, ..., xn when it is provided with random coins (tape) R. By y ←$ A(x1, ..., xn) it is meant that the tape R is chosen at random and y is set to y = A(x1, ..., xn; R). To show that an algorithm A is run without any input, we use either the notation y ←$ A() or y ←$ A(∅). By the time complexity of an algorithm we mean the running time, relative to some fixed model of computation (e.g. RAM), plus the size of the description of the algorithm using some fixed encoding method. If X is a finite set, by x ←$ X it is meant that x is chosen from X uniformly at random. For a binary string M = M1||M2|| ··· ||Mm, let M1...n denote the first n bits of M and |M| denote its length in bits (where n ≤ m = |M|). Let val(.) be a function that, on input a binary string M = M1 ··· Mm considered as an unsigned binary number with Mm as the least significant bit (LSB), returns its decimal value. For a positive integer x, let ⟨x⟩_b denote the binary representation of x by a string of length exactly b bits, where the rightmost bit represents the LSB and some of the most significant bits are chopped when log2(x) > b. If S is a finite set we denote the size of S by |S|. The set of all binary strings of length n bits (for some positive integer n) is denoted by {0,1}^n, the set of all binary strings whose lengths are variable but upper-bounded by N is denoted by {0,1}^{≤N}, and the set of all binary strings of arbitrary length is denoted by {0,1}^*.
196
M.R. Reyhanitabar, W. Susilo, and Y. Mu
s-aSec
s-Coll
s-aPre s-Sec
s-eSec
s-Pre
aSec
Coll
aPre
Sec eSec
Pre
ePre
s-Coll (eColl) s-Sec s-aSec s-eSec (eTCR) s-Pre s-aPre s-Coll (eColl) s-Sec s-aSec s-eSec (eTCR) s-Pre s-aPre Coll Sec aSec eSec (TCR) Pre aPre ePre
s-Coll (eColl) =
Coll
s-Coll
[20]
s-Sec s-aSec =
Sec
[21]
s-Sec s-aSec
s-eSec (eTCR) [27]
=
aSec
=
eSec (TCR)
[21]
s-eSec (eTCR) [20] [21] [21] [21] [21] [21] [21]
[21]
s-Pre
s-Pre
= Pre
[21]
s-aPre
= aPre
[21]
ePre
[21]
s-aPre
Fig. 2. A full picture of the relationships among the security notions. Note that the top graph only illustrates the essential “edges” that can be composed to construct the “paths” showing all other implications. The lack of a directed path in the graph means a separation, while separations are explicitly denoted by in the tables.
Enhanced Security Notions for Dedicated-Key Hash Functions
2
197
Definitions of Security Notions
In this section, adopting the conventions of the concrete-security framework, we provide definitions of the security notions for a dedicated-key hash function n H : K × M → C, where C = {0, 1} for some positive integer n, the key space ∗ K is some nonempty finite set and the message space M ⊆ {0, 1} ; such that δ {0, 1} ⊆ M for at least a positive integer δ. For any M ∈ M and K ∈ K, we use the notations HK (M ) and H(K, M ) interchangeably. Note that this description of a hash function is generic enough to be applied when one considering: a FixedInput-Length (FIL) hash function (i.e. a compression function), where M = m <λ {0, 1} ; a Variable-Input-Length (VIL) hash function, where M = {0, 1} for 64 some (huge) value λ (e.g. λ = 2 as in SHA-1); or even an Arbitrary-Input∗ Length (AIL) hash function, where M = {0, 1} . Let TH,δ denote the time complexity of the most efficient algorithm that can δ compute H(K, M ), for any M ∈ {0, 1} ⊆ M and K ∈ M, plus the time complexity of the most efficient algorithm that can sample from the finite set K. As usual in concrete-security definitions, we use the resource parameterized function Advxxx H (t, ) to denote the maximal value of the adversarial advantage xxx (i.e. Advxxx H (t, ) = maxA {AdvH (A)} ) over all adversaries A, attacking xxx property of H, that have time complexity at most t and use messages of length at most bits. We say that H is (t, , )-xxx secure if Advxxx H (t, ) < . In the sequel, we firstly review the seven properties; namely, Coll, Sec, aSec, eSec, Pre, aPre and ePre, put forth by Rogaway and Shrimpton in [24]. Then, we proceed by providing a new set of extended properties for a dedicated-key hash function, which includes enhanced (or strengthened) variants of the security properties considered by Rogaway and Shrimpton in [24]. We remind that the security notions for a dedicated-key hash functions can be either known-key properties, or secret-key (a.k.a. hidden-key ) properties. All the security properties considered in this paper belong to the known-key security setting where at some stage during the attack game, key(s) will be known to the adversary. There are some other applications of dedicated-key hash functions; e.g. as a MAC or PRF primitive, where the key must be kept secret throughout the attack game. 2.1
Previously Considered Seven Security Notions
The advantage measures for an adversary A, attacking any of the seven security properties of a dedicated-key hash function H, are defined (compactly) in Fig. 3. Note that for some of the notions (namely, Sec[δ], aSec[δ], eSec[δ], Pre[δ], and aPre[δ]) the advantage function is parameterized by a parameter δ where δ {0, 1} ⊆ M. In the case of eSec property the parameter δ is implicit in the definition and assumed to be the length of the first (i.e. the target) message M output by the adversary.
198
M.R. Reyhanitabar, W. Susilo, and Y. Mu
AdvColl H (A) Sec[δ]
AdvH
aSec[δ]
AdvH
=
(A)
=
(A)
=
$ $ Pr K ← K; (M, M ) ← A(K) : M = M ∧ HK (M ) = HK (M ) $ $ δ K ← K; M ← {0, 1} ; Pr $ : M = M ∧ HK (M ) = HK (M ) M ← A(K, M ) ⎡ ⎤ $ (K, State) ← A1 (); ⎢ ⎥ $ δ Pr ⎣ M ← ⎦ {0, 1} ; $
⎡ eSec[δ]
AdvH
(A)
=
M ← A2 (M, State)
:
$
(M, State) ← A1 ();
⎢ $ Pr ⎣ K ← K;
P re[δ]
(A)
=
Pr
aP re[δ] AdvH (A)
=
M ← A2 (K, State)
$
:
δ
M ← A(K, Y ) $
(K, State) ← A1 (); ⎢ $ δ Pr ⎣ M ← {0, 1} ; Y ← HK (M );
=
HK (M ) = Y ⎤
:
⎥ ⎦
$
re AdveP (A) H
M = M ∧ HK (M ) = HK (M ) K ← K; M ← {0, 1} ; Y ← HK (M ); $
$
⎡
⎤ ⎥ ⎦
$
AdvH
M = M ∧ HK (M ) = HK (M )
M ← A2 (Y, State) $
: $
$
HK (M ) = Y
Pr (Y, State) ← A1 (); K ← K; M ← A2 (K, State) :
HK (M ) = Y
Fig. 3. Definitions of the seven security notions for a dedicated-key hash function [24]
2.2
Enhanced Security Notions
We have noticed that the newly emerged notion of “enhanced target collision resistance” (eTCR), put forth by Halevi and Krawczyk in [8], does not belong to the set of the seven properties, and actually eTCR is an strengthened variant of TCR (i.e. UOWHF or eSec) property. Considering the definition of eTCR property and its application, we are motivated to study whether it is possible to provide (sensible) enhanced variants for the other properties of the set of the seven security properties in [24], in a similar way that TCR (eSec) is enhanced to eTCR. That is, by giving the adversaries more freedom in selecting a new (second) key and relaxing the corresponding success (winning) conditions in the attack games defining the properties. Interestingly, all properties except ‘ePre’ are shown to be enhanceable. The definitions and discussions are provided in the sequel. Definitions. For the six strengthened security notions, the advantage functions of an adversary A attacking H are defined in Fig. 4. For any property xxx ∈ {s-Coll, s-Sec[δ], s-aSec[δ], s-eSec[δ], s-Pre[δ], s-aPre[δ]}, we say that H is (t, , )-xxx if Advxxx H (t, ) < . Note that some of the notions (namely, sSec[δ], s-aSec[δ], s-eSec[δ], s-Pre[δ] and s-aPre[δ]) are parameterized by δ where δ {0, 1} ⊆ M. In the case of s-eSec (i.e. eTCR) property the parameter δ is implicit in the definition and assumed to be the length of the first (i.e. target) message M output by the adversary. If H is a compression function (i.e. an FIL hash function), then parameters δ and will be the same as the (fixed) input length of the compression function and hence are omitted from the notations.
Enhanced Security Notions for Dedicated-Key Hash Functions
199
Advs-Coll (A) H
=
$ $ Pr K ← K; (M, M , K ) ← A(K) : (K, M ) = (K , M ) ∧ HK (M ) = HK (M )
s-Sec[δ] AdvH (A)
=
Pr
=
$
δ
$
⎡
s-aSec[δ] (A) AdvH
$
K ← K; M ← {0, 1} ; K , M ← A(K, M )
: (K, M ) = (K , M ) ∧ HK (M ) = HK (M )
$
(K, State) ← A1 (); ⎢ $ δ Pr ⎣ M ← {0, 1} ; $
⎡
s-eSec[δ] (A) AdvH
=
K , M ← A2 (M, State)
=
Pr
=
⎤ ⎥ ⎦
$
K , M ← A2 (K, State)
: (K, M ) = (K , M ) ∧ HK (M ) = HK (M ) $ $ δ K ← K; M ← {0, 1} ; Y ← HK (M ); $
⎡
s-aPre[δ] AdvH (A)
: (K, M ) = (K , M ) ∧ HK (M ) = HK (M )
(M, State) ← A1 (); ⎢ $ Pr ⎣ K ← K;
s-Pre[δ] (A) AdvH
⎥ ⎦
$
⎤
K , M ← A(K, Y )
HK (M ) = Y ⎤
:
$
(K, State) ← A1 (); ⎢ $ δ Pr ⎣ M ← {0, 1} ; Y ← HK (M );
$
K , M ← A2 (Y, State)
⎥ ⎦ :
HK (M ) = Y
Fig. 4. Definitions of enhanced properties for a dedicated-key hash function
The Case for ePre. Unlike the other six properties, ePre notion of security cannot be strengthened by allowing the adversary to select a new key K in the second phase of its attack, as used to define new enhanced variants in Fig. 4. This is because there will remain no random challenge to be given to the adversary in such a game and hence a trivial adversary will always exist. To make this clear, let’s try to strengthen the ePre property in the same way that was done for other properties in Fig. 4. Doing so, one gets the following advantage measure: $ $ $ Pr (Y, State) ← A1 (); K ← K; (K , M ) ← A2 (K, State) : HK (M ) = Y Clearly, as the winning condition (i.e. HK (M ) = Y ) does not involve the only random challenge (i.e. K) in the attack game, a trivial adversary, which selects arbitrary K and M ; computes HK (M ) = Y , and outputs Y and (K , M ), always wins this game with probability one. Remark 1. We notice that the parametrization of some of the security properties by δ is mainly aimed to handle some subtle technical issues as follows: – Efficient sampling from a set of messages according to the uniform distribution requires the set to be finite. For an arbitrary-input-length hash function, with M = {0, 1}∗ , the message space is infinite, and hence cannot be sampled uniformly at random. Clearly, if H is an FIL hash function (i.e. a compresm sion function), with M = {0, 1} , then parameter δ (and also the resource parameter ) will be the same as the fixed input length of the compression function (i.e. δ = = m), and hence can be omitted.
200
M.R. Reyhanitabar, W. Susilo, and Y. Mu
– The ideal security level, measured in terms of time complexity of attacks, for (variants of) second preimage resistance and preimage resistance properties n considering a hash function H : K × M → {0, 1} is 2n , due to a simple generic (random search) attack. Clearly, if the length of target message strings is “too short” (e.g. δ < n), then one will be able to simply search the input message space in less than 2n steps. On the other hand, for iterated hash functions if the length of a target input message is “too long”; e.g. δ = 2l blocks for some large l, then there are generic long message second preimage attacks, put forth by Kelsey and Schneier [9], with reduced time complexity compared to the ideal 2n level for too long target messages, e.g. when l = n/2. Therefore, explicitly parameterizing the properties by the length of the target messages, i.e. δ, can clearly show these dependencies of the advantage functions on the target message length.
3
Relationships among the Security Notions
In this section, we provide the details of the “new ” relationships (implications and separations) between any two properties among the thirteen security properties as defined in Fig. 3 and Fig. 4. Noticing that the relationships among the seven security properties in Fig. 3 were shown in [24, 25], and the relationships between eTCR (s-eSec) and other properties were demonstrated in [21, 20, 27], we complete all the remaining new relationships. The summary of our results is shown in Fig. 2, where we use the conventions explained in the sequel to represent the relationships. 3.1
Security Preserving Implications
A solid directed line from a security notion xxx to a security notion yyy (i.e. xxx → yyy) is used to represent a security-preserving reduction from xxx to yyy. All security-preserving implications in this paper are easily provable by a tight xxx concrete bound of the form Advyyy H (t ) ≤ AdvH (t), where t = t − c for some n small constant c. That is, for any hash function H : K × M → {0, 1} , if H is secure in the xxx sense then it is also secure in the yyy sense. Lemma 1. For any dedicated-key hash function H : K × M → {0, 1}n , and for δ any fixed value of δ such that {0, 1} ⊆ M, let xxx be any property ∈ {Coll , Sec[δ], aSec[δ], eSec[δ], Pre[δ], aPre[δ] }; we have s-xxx → xxx. s−xxx Proof. It is straightforward to see that Advxxx (t), just by considH (t) ≤ AdvH ering the definitions of the security notions and their “strengthened ” variants in Fig. 3 and Fig. 4, respectively. Note that in defining game for a strengthened notion s-xxx, the adversary is given more power by being allowed to choose a different key K , and hence any adversary A that can succeed in playing xxx game (where it does not get to choose a second key) will clearly succeed in the game defining s-xxx (where it gets to choose a second key at will).
Enhanced Security Notions for Dedicated-Key Hash Functions
201
The following implications are also straightforward to show by simple securitypreserving reductions. A proof can be found in the full version of this paper in [22]. n
Theorem 1. For any dedicated-key hash function H : K × M → {0, 1} , and δ for any fixed value of δ such that {0, 1} ⊆ M, we have: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 3.2
s-Coll → s-Sec[δ] s-Coll → s-eSec[δ] s-Coll → Sec[δ] s-Coll → eSec[δ] s-aSec[δ] → s-Sec[δ] s-aSec[δ] → Sec[δ] s-eSec[δ] → s-Sec[δ] s-eSec[δ] → Sec[δ] s-aPre[δ] → s-Pre[δ] s-aPre[δ] → Pre[δ] Provisional Implications
A provisional implication from a security notion xxx to a security notion yyy; denoted by xxx yyy, means that there is a reduction from xxx to yyy, but the reduction is not security-preserving. That is, we can upper-bound Advyyy H (t ) as xxx a function of AdvH (t), but the inherited security guarantee for the notion yyy using such a bound is provisioned on the exact security degradation characteristics of the reduction, which usually depends on the hash function parameters, such as the lengths of input (δ) and output (n), and the size of the key space (|K|). Therefore, these provisional implications should be interpreted carefully, as for some values of the parameters (e.g. when there is little or no compression, i.e. δ ≈ n) these reductions may effectively vanish. n
Theorem 2. For any dedicated-key hash function H : K × M → {0, 1} , and δ for any fixed value of parameter δ such that {0, 1} ⊆ M, we have: re 1. s-Coll ePre: AdveP (t ) ≤ Advs−Coll (t) + 1/|K| H H 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.
s−P re[δ]
s-Coll s-Pre[δ]: AdvH (t ) ≤ 3 Advs−Coll (t) + 2n−δ H P re[δ] s−Coll s-Coll Pre[δ]: AdvH (t ) ≤ 3 AdvH (t) + 2n−δ s−P re[δ] s−Sec[δ] s-Sec[δ] s-Pre[δ]: AdvH (t ) ≤ 3 AdvH (t) + 2n−δ P re[δ] s−Sec[δ] s-Sec[δ] Pre[δ]: AdvH (t ) ≤ 3 AdvH (t) + 2n−δ s−P re[δ] s−aSec[δ] s-aSec[δ] s-Pre[δ]: AdvH (t ) ≤ 3 AdvH (t) + 2n−δ P re[δ] s−aSec[δ] s-aSec[δ] Pre[δ]: AdvH (t ) ≤ 3 AdvH (t) + 2n−δ aP re[δ] s−aSec[δ] s-aSec[δ] aPre[δ]: AdvH (t ) ≤ 3 AdvH (t) + 2n−δ s−aP re[δ] s−aSec[δ] s-aSec[δ] s-aPre[δ]: AdvH (t ) ≤ 3 AdvH (t) + 2n−δ s−P re[δ] s−eSec[δ] s-eSec[δ] s-Pre[δ]: AdvH (t ) ≤ 3 AdvH (t) + 2n−δ P re[δ] s−eSec[δ] s-eSec[δ] Pre[δ]: AdvH (t ) ≤ 3 AdvH (t) + 2n−δ
202
M.R. Reyhanitabar, W. Susilo, and Y. Mu
where t = t − cTH,δ , for some small non-negative constant c, and TH,δ denotes the time for one computation of H. Proof. We prove the first, fourth, and ninth cases, i.e. ‘s-Coll ePre’, ‘s-Sec[δ] s-Pre[δ]’, and ‘s-aSec[δ] s-aPre[δ]’. Note that these are the three new essential provisional implications as depicted in Fig. 2. (We remind that the other two essential provisional implications, namely ‘Sec Pre’ and ‘aSec aPre’, in Fig. 2, are already known from [25].) All other provisional implications can be straightforwardly obtained by combining these essential provisional implications with the security-preserving implications. Proof of ‘s-Coll ePre’. We employ the Reset Lemma from [2] for our purpose. The first and main step is to express our problem in a format which can be considered as a special case of the Reset Lemma, and then we can apply the probabilistic analysis of the Reset Lemma. To simplify the representation of our proof, we denote an adversary as a single probabilistic algorithm A which uses a State variable to keep track of its several attack steps, rather than viewing A as consisting of two sub-algorithms A = (A1 , A2 ). Let Verify(M, K, Y ) be a deterministic predicate to compute a boolean decision as follows: 1 if HK (M ) = Y Verify(M, K, Y ) = 0 otherwise r
Let R ∈ {0, 1} denote the random tape (i.e. coins) used by the probabilistic algorithm A. Using the above predicate, we can rewrite the experiment defining the ePre attack by A against H (in a format which is appropriate for our analysis) as below; where ∅ means an empty string: ePre Experiment r
$
R ← {0, 1} ; (Y, State) = A(∅; R); $
K ← K; M = A(K, State; R); d = Verify(M, K, Y ); Return d Clearly probability that the above ‘ePre Experiment’ returns 1 is equal to re (A). Now consider the following Reset Experiment: AdveP H Reset Experiment: r
$
R ← {0, 1} ; (Y, State) = A(∅; R); $
K1 ← K; M 1 = A(K1, State; R); d1 = Verify(M 1, K1, Y ); $
K2 ← K; M 2 = A(K2, State; R); d2 = Verify(M 2, K2, Y ); If (d1 = 1 ∧ d2 = 1 ∧ K1 = K2) then return 1 else return 0 The proof of the following proposition can be deduced as a special case of that of the Reset Lemma in [2]. We provide a proof of this probabilistic claim here for completeness, as it will also surface in several other cases in the following.
Enhanced Security Notions for Dedicated-Key Hash Functions
203
Proposition 1. Let p denote the probability that the ePre Experiment returns 1 re (i.e. p = AdveP H √(A)), and q be the probability that the Reset Experiment returns 1; we have p ≤ q + 1/|K|. r
Proof (Proof of the Proposition). For any R ∈ {0, 1} , let YR and MR denote the target hash value and the message, output by ePre adversary A having a random tape R. Define two functions X : {0, 1}r → [0, 1] and Y : {0, 1}r → [0, 1] as follows: X(R) Pr[Verify(MR , K, YR ) = 1] (1) where the probability is taken over random selection of K from the key space k {0, 1} , and Y (R) Pr[Verify(M 1R , K1, YR ) = 1 ∧ Verify(M 2R , K2, YR ) = 1 ∧ K1 = K2] (2) where the probability is taken over random and independent selection of K1 and K2 from the key space K. By a simple argument, noting that K1 and K2 are chosen independently and using the fact that Pr(E ∧ F ) ≥ Pr(E) − Pr(F ) for any two events E and F , we have: Y (R) = Pr[Verify(M 1R , K1, YR ) = 1] . Pr[Verify(M 2R , K2, YR ) = 1 ∧ K1 = K2] ≥ X(R)[X(R) − 1/|K|]
(3)
We can view functions X and Y as random variables over sample space {0, 1}r of random tape used by probabilistic algorithm A. Now, note that the probabilities that the ‘ePre Experiment’ and the ‘Reset Experiment’ return 1 are, respectively, the expected values of the random variables X and Y with respect to R, i.e. p = E[X] and q = E[Y ]. Using the inequality (3) and letting c = 1/|K| we have: q = E[Y ] ≥ E[X(X − c)] = E[X 2 ] − cE[X] ≥ E[X]2 − cE[X] = p2 − cp Using the above relation we have: c c2 c2 (p − )2 = p2 − cp + ≤q+ 2 4 4 √ √ √ and using the fact that a + b ≤ a + b for a, b ≥ 0 we have: p−
c √ c ≤ q+ 2 2
that is, (remembering c = 1/|K|) we get the final result as p ≤
√ q + 1/|K|.
To complete our proof for ‘s-Coll ePre’, we construct an adversary B against s-Coll property of H; such that Advs−Coll (B) = q as follows. Adversary B, on H receiving the first random key K, chooses another random key K and employs A as shown in the ‘Reset Experiment’, by putting K1 = K and K2 = K . B returns (M 1, K1) and (M 2, K2) as colliding pair in its own s-Coll game. Advantage of B in s-Coll game will be the same as the probability that the ‘Reset Experiment’
204
M.R. Reyhanitabar, W. Susilo, and Y. Mu
returns 1. This can be easily verified by considering the condition that the ‘Reset Experiment’ returns 1; noticing the defining game of s-Coll property in Fig. 4, and the definition of predicate Verify(., ., .). Note that the Reset Experiment returns 1 if (Verify(M 1, K1, Y ) = 1 ∧ Verify(M 2, K2, Y ) = 1 ∧ K1 = K2), and from the definition of Verify(., ., .) this means that (HK1 (M 1) = HK2 (M 2) = Y ∧ K1 = K2). Hence, whenever the Reset Experiment returns 1 the pair (K1, M 1) = (K2, M 2) and HK1 (M 1) = HK2 (M 2), i.e. B succeeds in s-Coll attack against H. This ends the proof of ‘s-Coll ePre’. The cases for ‘s-Sec[δ] s-Pre[δ]’ and ‘s-aSec[δ] s-aPre[δ]’. Proofs of these two cases can be found in the full version of this paper in [22]. 3.3
Separations
We use xxx yyy to show that notion xxx does not imply notion yyy. These separation results are shown by providing counterexamples. Namely, assuming k m n that there exists a dedicated-key hash function H : {0, 1} × {0, 1} → {0, 1} that is (t, ) − xxx secure, we construct (as a counterexample) a dedicated-key k m n hash function G : {0, 1} × {0, 1} → {0, 1} which is (t , ) − xxx secure, but yyy completely insecure in yyy sense; i.e. AdvG (c) ≈ 1, where c is a small constant. The concrete relations between adversarial advantages (i.e. = Advxxx H (t) and = Advxxx G (t )) and the resource parameters (t and t ) are given explicitly for each case. The following simple lemma will be quite useful in stating the separation results compactly. Lemma 2. Let xxx, yyy, and zzz be any three security properties defined for a n hash function H : K × M → {0, 1} . If zzz → yyy, then from xxx yyy we can conclude that xxx zzz. zzz Proof. Note that zzz → yyy (in this paper) means that Advyyy H (t ) ≤ AdvH (t), where t = t− c for a small constant c. Hence, if one constructs a counterexample k m n hash function G : {0, 1} × {0, 1} → {0, 1} that has the property xxx, but is yyy insecure in the yyy sense (i.e. AdvG (c) ≈ 1, for a small constant c), then clearly Advzzz G (c ) ≈ 1 for a small constant c ; that is, G will also be insecure in the zzz sense.
Remark 2. We should mention that for extreme ranges of the parameter values, when the provisional implications vanish (e.g. when there is no compression; δ = n), Rogaway and Shrimpton [24] considered the possibility of showing some “unconditional separation” results, but as they stated in [24]: “That unconditional separations are (sometimes) possible in this domain is a consequence of the fact that, for some values of the domain and range, secure hash functions trivially exist (e.g. the identity function HK (M ) = M is collision-free [but not preimage resistant]). ” In this paper, we do not consider such unconditional separations and instead we emphasize that provisional implications must be interpreted carefully according to the exact bounds shown by related reductions.
Enhanced Security Notions for Dedicated-Key Hash Functions
205
Figure 5 lists the counterexamples that we use to prove the separation results. Construction of some of these counterexamples are inspired from those of [24, 20, 21], where they were utilized to show other separation results.
G1K (M ) = G2K (M ) = G3K (M ) = G4K (M ) =
C∗ HK (M )
if K = K ∗ otherwise
K1...n HK (M )
if val(M ) = val(K) otherwise
HK (0m−k ||K) HK (M ) C∗ HK (M )
if M = 1m−k ||K otherwise
if M = 0m ∨ M = 1m otherwise
G5K (M ) = HK (M1···m−1 ||0) G6K (M ) =
G7K (M ) =
C∗ HK (M )
⎧ ⎨ K1...n HK (val(K)m ) ⎩ HK (M )
⎧ ⎨ K1...n HK (M ∗ ) G8K (M ) = ⎩ HK (M ) G9K (M ) =
if M = M ∗ otherwise
K1...n HK (M )
if val(M ) = val(K) if val(M ) = val(K) ∧ HK (M ) = K1...n otherwise
if M = M ∗ if M = M ∗ ∧ HK (M ) = K1...n otherwise
(1) (2) (3)
(1) (2) (3)
if M = M ∗ otherwise
Fig. 5. Construction of counterexample hash functions Gi : {0, 1}k ×{0, 1}m → {0, 1}n , for 1 ≤ i ≤ 9, from a given hash function H : {0, 1}k × {0, 1}m → {0, 1}n . For the cases of G2, G3, G7, G8, G9, it is assumed that m > k ≥ n. The parameters K ∗ ∈ {0, 1}k ; M ∗ ∈ {0, 1}m ; and C ∗ ∈ {0, 1}n have arbitrary and fixed values; e.g. K ∗ = 0k , M ∗ = 0m , C ∗ = 0n .
Referring to (the three tables in ) Fig. 2, it can be seen that there are 87 separations among the properties, of which 11 separations are already known from [20, 21]. In the sequel, we complete the study of all the remaining 76 new separations. The proofs are organized as follows: – Theorem 3 (showing 2 separations) and Theorem 4 (showing 22 separations) together with Lemma 2 and the security-preserving implications (see Fig. 2) provide details of the 41 new separations shown in the top two tables in Fig. 2. – Theorem 5 (showing 7 separations) and Theorem 6 (showing 7 separations) together with Lemma 2 provide the remaining 35 new separations shown in the bottom table in Fig. 2.
206
M.R. Reyhanitabar, W. Susilo, and Y. Mu
Theorem 3. s-Coll aSec and s-Coll aPre Proof. We use counterexample G1, defined in Fig. 5, to prove these separations. Let’s first demonstrate that G1 is completely insecure in both the aSec sense and the aSec aPre sense. – AdvG1 (c ) = 1: Consider the following simple adversary A = (A1 , A2 ) playing aSec game against G1. A1 chooses the key as K = K ∗ , and A2 after receiving the first randomly selected message M , outputs any different message M = M . It can be easily seen that this adversary, spending a small constant c , always wins the aSec game because M = M , and by the construction of G1 we have G1K ∗ (M ) = G1K ∗ (M ) = C ∗ . re – AdvaP G1 (c ) = 1: Consider the following simple adversary A = (A1 , A2 ) playing aPre game against G1. A1 chooses the key as K = K ∗ , and A2 after receiving the hash value Y = G1K ∗ (M ) = C ∗ , outputs any arbitrary message M ∈ {0, 1}m . Adversary A = (A1 , A2 ) always wins the aPre game because, according to the construction of G1, we have G1K ∗ (M ) = C ∗ for m any M ∗ ∈ {0, 1} . To complete the proof, we show that G1 inherits the s-Coll property of H by demonstrating that Advs−Coll (t ) ≤ Advs−Coll (t) + Advs−Coll (t) + 2−k+1 . G1 H H Let A be any adversary that can win s-Coll game against G1 with success probability = Advs−Coll (A) and having time complexity at most t . Consider G1 the following adversary B against s-Coll property of H which uses A as a subroutine (and simply forwards whatever it returns): Algorithm B(K) 10: if K = K ∗ then bad ← true $
20: (M, M , K ) ← A(K); 30: if HK (M ) = C ∗ then bad ← true 40: return (M, M , K ) We note that the use of the flag ‘bad’ (whose initial value is assumed to be false) in the description of B is only aimed to make the proof easier to follow; otherwise, the lines 10 and 30 in the description of B are dummy and can be omitted from B without affecting its operation. Let Bad be the event that the flag bad is set to true by B, i.e. either K = K ∗ or HK (M ) = C ∗ . We show that if Bad does not happen then B will succeeds in the s-Coll attack against H whenever A succeeds in the s-Coll attack against G1. Note that A succeeds in the s-Coll attack against G1 whenever (M, K) = (M , K ) and G1K (M ) = G1K (M ). Assuming that the event Bad does not happen; that is, K = K ∗ ∧ HK (M ) = C ∗ , and referring to the construction of G1, it can be observed that in this case G1K (M ) = G1K (M ) will imply that HK (M ) = HK (M ); that is, B also succeeds in the s-Coll attack against H. As it is assumed that H is (t, )−s-Coll, we have: ≥ Pr[B succeeds] = Pr[A succeeds ∧ Bad] ≥ Pr[A succeeds] − Pr[Bad] = − Pr[Bad]. Rearranging the terms we have: ≤ + Pr[Bad] (4)
Enhanced Security Notions for Dedicated-Key Hash Functions
207
Now we need to upperbound Pr[Bad] = Pr[K = K ∗ ∨ HK (M ) = C ∗ ]. Using the union bound we have: Pr[Bad] ≤ Pr[K = K ∗ ] + Pr[HK (M ) = C ∗ ] = 2−k + Pr[HK (M ) = C ∗ ]
(5)
It remains to upper-bound p = Pr[HK (M ) = C ∗ ]. We claim that: √ Claim. p = Pr[HK (M ) = C ∗ ] ≤ 2−k + . Before continuing to prove this claim, note that the inequalities (4), (5) and the above claim complete √ the proof of the Theorem 3, i.e. we get the target upperbound as ≤ + + 2−k+1 . Clearly, the time complexity of B (denote by t) is that of A (denote by t ) plus a small constant time c, i.e. t = t + c. Proof of the Claim: Let Verify(M, K) be a deterministic boolean predicate which is defined as follows: 1 if HK (M ) = C ∗ Verify(M, K) = 0 otherwise According to the description of B, the probability p = Pr[HK (M ) = C ∗ ] is taken over the random coins used by A and the random selection of the first key K. r Let R ∈ {0, 1} denote the random tape used by A. Referring to the description of B it can be seen that p equals to the probability that the following experiment returns 1: Experiment I R ← {0, 1}r ; $
K ← {0, 1}k ; (M, M , K ) = A(K; R); d = Verify(M, K); return d $
Let q be the probability that the following reset experiment returns 1: Experiment II r
$
R ← {0, 1} ; $
k
$
k
K1 ← {0, 1} ; (M 1, M 1, K1 ) = A(K1; R); d1 = Verify(M 1, K1); K2 ← {0, 1} ; (M 2, M 2, K2 ) = A(K2; R); d2 = Verify(M 2, K2); If (d1 = 1 ∧ d2 = 1 ∧ K1 = K2) then return 1 else return 0 The proof of the following proposition is similar to that of Proposition 1. √ Proposition 2. p ≤ q + 2−k . To complete the proof of the Claim, we show that q ≤ . We construct an adver(C) = q, as follows: The sary C against s-Coll property of H, such that Advs−Coll H adversary C, on receiving a random key K1, chooses another random key K2, and uses A by reseting it as shown in the Experiment II. C returns (K2, M 1, M 2) in its s-Coll game. Advantage of C in s-Coll game will be the same as the probability that the Experiment II returns 1. This can be easily verified by considering
208
M.R. Reyhanitabar, W. Susilo, and Y. Mu
the condition that the Experiment II returns 1; noticing the defining game of s-Coll property in Fig. 4, and the definition of predicate Verify(., .). Note that Experiment II returns 1 if Verify(M 1, K1) = 1 ∧ Verify(M 2, K2) = 1 ∧ K1 = K2, and from the definition of Verify(., .) this means that HK1 (M 1) = HK2 (M 2) = C ∗ ∧ K1 = K2. Hence whenever the Experiment II returns 1, the pair (K1, M 1) = (K2, M 2) and HK1 (M 1) = HK2 (M 2), i.e. C succeeds in s-Coll attack against H. Theorem 4. Fix the values of the parameters for hash functions as indicated in Fig. 5. The following separations hold (where c and c are small constant values and t = t − c): −m+1 1. s-Sec Coll: Advs−Sec (t ) ≤ Advs−Sec (t) + 2 , and AdvColl G3 (c ) = 1. G3 H
2. s-Sec aSec: Advs−Sec (t ) ≤ Advs−Sec (t) + G1 H
Advs−Sec (t) + 2−k−m + 2−k , H
(t ) ≤ Advs−Sec (t) + 3. s-Sec aPre: Advs−Sec G1 H
Advs−Sec (t) + 2−k−m + 2−k , H
and AdvaSec G1 (c ) = 1.
re and AdvaP G1 (c ) = 1.
(t ) ≤ Advs−Sec (t)+ 4. s-Sec eSec: Advs−Sec G4 H
and AdveSec G4 (c ) = 1.
5. s-Sec ePre: Advs−Sec (t ) ≤ Advs−Sec (t)+ G4 H
Advs−Sec (t)+2−k−m +2−m+1 , H
Advs−Sec (t)+2−k−m +2−m+1 , H
re and AdveP G4 (c ) = 1. Coll 6. s-aSec Coll: Advs−aSec (t ) ≤ Advs−aSec (t) + 2−m+1 G3 H , and AdvG3 (c ) = 1.
7. s-aSec eSec: Advs−aSec (t ) ≤ Advs−aSec (t) + G4 H
Advs−aSec (t) + 3 × 2−m , H
8. s-aSec ePre: Advs−aSec (t ) ≤ Advs−aSec (t) + G4 H
Advs−aSec (t) + 3 × 2−m , H
and AdveSec G4 (c ) = 1.
re and AdveP G4 (c ) = 1. 9. s-eSec s-Coll: Advs−eSec (t ) ≤ Advs−eSec (t) + 2−k+1 , and G3 H s−Coll AdvG3 (c ) = 1.
10. s-eSec s-aSec: Advs−eSec (t ) ≤ Advs−eSec (t) + G1 H
Advs−eSec (t) + 2−k+1 , H
11. s-eSec s-aPre: Advs−eSec (t ) ≤ Advs−eSec (t) + G1 H
Advs−eSec (t) + 2−k+1 , H
and Advs−aSec (c ) = 1. G1
12. 13. 14. 15.
re and Advs−aP (c ) = 1. G1 re re s-Pre Coll: Advs−P (t ) ≤ 2Advs−P (t), and AdvColl G5 (c ) = 1. G5 H s−P re s−P re Sec (t), and AdvG5 (c ) = 1. s-Pre Sec: AdvG5 (t ) ≤ 2AdvH re s−P re s-Pre aSec: Advs−P (t ) ≤ 2Adv (t), and AdvaSec G5 (c ) = 1. G5 H s−P re s−P re eSec (t), and AdvG5 (c ) = 1. s-Pre eSec: AdvG5 (t ) ≤ 2AdvH
re re (t ) ≤ Advs−P (t) + 16. s-Pre aPre: Advs−P G1 H
re Advs−P (t) + 2−k , and H
re re 17. s-Pre ePre: Advs−P (t ) ≤ Advs−P (t) + G6 H
re Advs−P (t) + 2−m , and H
re AdvaP G1 (c ) = 1. re AdveP G6 (c ) = 1.
Enhanced Security Notions for Dedicated-Key Hash Functions
18. 19. 20. 21.
s-aPre s-aPre s-aPre s-aPre
209
re re Coll: Advs−aP (t ) ≤ 2Advs−aP (t), and AdvColl G5 (c ) = 1. G5 H s−aP re s−aP re Sec (t ) ≤ 2AdvH (t), and AdvG5 (c ) = 1. Sec: AdvG5 re re (t ) ≤ 2Advs−aP (t), and AdvaSec aSec: Advs−aP G5 (c ) = 1. G5 H s−aP re s−aP re eSec (t ) ≤ 2AdvH (t), and AdvG5 (c ) = 1. eSec: AdvG5
re re 22. s-aPre ePre: Advs−aP (t ) ≤ Advs−aP (t) + G6 H re AdveP G6 (c ) = 1.
re Advs−aP (t) + 2−m, and H
The proof of the cases 2–5, 7–8, 10–11, 16–17, and 22 in this Theorem are quite similar in main parts to that of Theorem 3, where we adapt the Reset Lemma to obtain the square root terms in our upper-bounds. The reductions for the other cases are also straightforward, and hence the proofs are omitted. Theorem 5. For any property xxx ∈ {Coll, Sec, aSec, eSec, Pre, aPre, ePre}, we have xxx s-Pre. Proof. The proof is divided into two parts and can be found in the full version of this paper in [22]: First, G7 is used as a counterexample to show that xxx sPre, for any xxx ∈ {Coll, Sec, aSec, eSec}. Then we use G2 as a counterexample to demonstrate that xxx s-Pre, for any xxx ∈ {Pre, aPre, ePre}. Theorem 6. For any property xxx ∈ {Coll, Sec, aSec, eSec, Pre, aPre, ePre}, we have xxx s-Sec. Proof. The proof can be found in the full version of this paper in [22], where counterexample function G8 is used to show that xxx s-Sec, for any xxx ∈ {Coll, Sec, aSec, eSec}, and G9 is used as a counterexample to show that xxx s-Sec, for any xxx ∈ {Pre, aPre, ePre}.
4
Conclusion
We have extended the set of security notions for dedicated-key hash functions by providing new set of enhanced (strengthened) properties, which includes the well-known enhanced target collision resistance property. The latter property has been proven to be useful to enrich the notions of hash functions, in particular with its application to construct the Randomized Hashing mode for strengthening digital signatures. We have also provided a full picture of relationships among the (thirteen) security properties including the (six) enhanced properties and the previously considered seven properties. It is expected that by future researches the new enhanced properties, introduced in this paper, may also find interesting applications in practice. Meanwhile, we notice that these new enhanced properties can be considered as easier targets to attack, by cryptanalysts who are trying to find (either certificational or major) weaknesses in the dedicated-key hash functions; e.g., in some of the NIST SHA-3 candidates. For instance, it might be the case that a hash function resists against the attacks on the (conventional) Coll property, but becomes vulnerable to attacks against the strengthened Coll (i.e. s-Coll) property.
210
M.R. Reyhanitabar, W. Susilo, and Y. Mu
Acknowledgments. We thank Angela Piper, Josef Pieprzyk, Jennifer Seberry, and the anonymous reviewers of FSE 2010 for their constructive comments and suggestions.
References [1] Bellare, M., Rogaway, P.: Collision-Resistant Hashing: Towards Making UOWHFs Practical. In: Kaliski Jr., B.S. (ed.) CRYPTO 1997. LNCS, vol. 1294, pp. 470–484. Springer, Heidelberg (1997) [2] Bellare, M., Palacio, A.: GQ and Schnorr Identification Schemes: Proofs of Security against Impersonation under Active and Concurrent Attacks. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 162–177. Springer, Heidelberg (2002) [3] Contini, S., Steinfeld, R., Pieprzyk, J., Matusiewicz, K.: A Critical Look at Cryptographic Hash Function Literature. In: ECRYPT Hash Workshop (2007) [4] Damg˚ ard, I.: Collision Free Hash Functions and Public Key Signature Schemes. In: Price, W.L., Chaum, D. (eds.) EUROCRYPT 1987. LNCS, vol. 304, pp. 203–216. Springer, Heidelberg (1988) [5] Damg˚ ard, I.B.: A Design Principle for Hash Functions. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 416–427. Springer, Heidelberg (1990) [6] Diffie, W., Hellman, M.E.: New directions in cryptography. IEEE Trans. on Information Theory IT-22(6), 644–654 (1976) [7] Goldreich, O.: Foundations of Cryptography. Basic Applications, vol. 2. Cambridge University Press, Cambridge (2004) [8] Halevi, S., Krawczyk, H.: Strengthening Digital Signatures Via Randomized Hashing. In: Dwork, C. (ed.) CRYPTO 2006. LNCS, vol. 4117, pp. 41–59. Springer, Heidelberg (2006) [9] Kelsey, J., Schneier, B.: Second Preimages on n-Bit Hash Functions for Much Less than 2n Work. In: Cramer, R. (ed.) EUROCRYPT 2005. LNCS, vol. 3494, pp. 474–490. Springer, Heidelberg (2005) [10] Menezes, A.J., van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography. CRC Press, Boca Raton (1996) [11] Merkle, R.C.: Secrecy, Authentication, and Public Key Systems. UMI Research Press (1979) [12] Merkle, R.C.: One Way Hash Functions and DES. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 428–446. Springer, Heidelberg (1990) [13] Naor, M., Yung, M.: Universal One-Way Hash Functions and Their Cryptographic Applications. In: Proceedings of the 21st ACM Symposium on the Theory of Computing–STOC 1989, pp. 33–43. ACM, New York (1989) [14] National Institute of Standards and Technology. FIPS PUB 180-2: Secure Hash Standard (August 2002) [15] National Institute of Standards and Technology. FIPS PUB 180-3: Secure Hash Standard (June 2007) [16] National Institute of Standards and Technology. NIST SP 800-106: Randomized Hashing for Digital Signatures (February 2009), http://www.csrc.nist.gov/publications/PubsSPs.html#800-106 (September 20, 2009) [17] National Institute of Standards and Technology. Cryptographic Hash Algorithm Competition, http://csrc.nist.gov/groups/ST/hash/sha-3/index.html (September 20, 2009)
Enhanced Security Notions for Dedicated-Key Hash Functions
211
[18] Preneel, B.: Analysis and Design of Cryptographic Hash Functions. Doctoral dissertation, K. U. Leuven (1993) [19] Rabin, M.O.: Digitalized Signatures. In: Lipton, R., DeMillo, R. (eds.) Foundations of Secure Computation, pp. 155–166. Academic Press, New York (1978) [20] Reyhanitabar, M.R., Susilo, W., Mu, Y.: Enhanced Target Collision Resistant Hash Functions Revisited. In: Dunkelman, O. (ed.) FSE 2009. LNCS, vol. 5665, pp. 327–344. Springer, Heidelberg (2009) [21] Reyhanitabar, M.R., Susilo, W., Mu, Y.: An Investigation of the Enhanced Target Collision Resistance Property for Hash Functions. Cryptology ePrint Archive, Report 2009/506 (2009) [22] Reyhanitabar, M.R., Susilo, W., Mu, Y.: Enhanced Security Notions for Dedicated-Key Hash Functions: Definitions and Relationships. Cryptology ePrint Archive, Report 2010/022 (2010) [23] Rivest, R.: The MD5 Message-Digest Algorithm. RFC 1321 (April 1992), http://www.ietf.org/rfc/rfc1321.txt (September 19, 2009) [24] Rogaway, P., Shrimpton, T.: Cryptographic Hash-Function Basics: Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance. In: Roy, B.K., Meier, W. (eds.) FSE 2004. LNCS, vol. 3017, pp. 371–388. Springer, Heidelberg (2004) [25] Rogaway, P., Shrimpton, T.: Cryptographic Hash-Function Basics: Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance. Cryptology ePrint Archive: Report 2004/035 (Revised version of [24]: August 9, 2009) [26] Rogaway, P.: Formalizing Human Ignorance: Collision-Resistant Hashing without the Keys. In: Nguyˆen, P.Q. (ed.) VIETCRYPT 2006. LNCS, vol. 4341, pp. 211– 228. Springer, Heidelberg (2006) [27] Yasuda, K.: How to Fill Up Merkle-Damg˚ ard Hash Functions. In: Pieprzyk, J. (ed.) ASIACRYPT 2008. LNCS, vol. 5350, pp. 272–289. Springer, Heidelberg (2008) [28] Zheng, Y., Matsumoto, T., Imai, H.: Connections among several versions of oneway hash functions. In: Proceedings of IEICE, Special Issue on Cryptography and Information Security, Japan (1990)
A Unified Method for Improving PRF Bounds for a Class of Blockcipher Based MACs Mridul Nandi National Institute of Standards and Technology and The George Washington University, Computer Science Department
[email protected]
Abstract. This paper provides a unified framework for improving PRF (pseudorandom function) advantages of several popular MACs (message authentication codes) based on a blockcipher modeled as RP (random permutation). In many known MACs, the inputs of the underlying blockcipher are defined to be some deterministic affine functions of previously computed outputs of the blockcipher. Keeping the similarity in mind, a class of ADEs (affine domain extensions) and a wide subclass of SADEs (secure ADEs) are introduced in the paper which contain following constructions C = {CBC-MAC, GCBC∗ , OMAC, PMAC}. We prove that all SADEs have PRF advantages O(tq/2n + N (t, q)/2n ) where t is the total number of blockcipher computations needed for all q queries and N (t, q) is a parameter defined in the paper. The PRF advantage of any SADE is O(t2 /2n ) as we can show that N (t, q) ≤ 2t . Moreover, N (t, q) = O(tq) for all members of C and hence these MACs have improved advantages O(tq/2n ). Eventually, our proposed bounds for CBC-MAC and GCBC∗ become strictly better than previous best known bounds. Keywords: affine domain extension, PRF, random permutation, CBCMAC.
1
Introduction
Domain extension is a method to construct an extended function over an arbitrary domain when underlying function(s) over small domain are given. A common practice is to design domain extensions whose extended functions achieve some desired security whenever their underlying functions are assumed to have similar security. For example, it is well known that Merkle-Damg˚ ard with length strengthening padding [7,16] extends a collision resistant compression function to a collision resistant hash function. Similarly MACs (message authentication codes) are also domain extensions extending small domain PRPs (pseudorandom permutations [13]) or PRFs (pseudorandom functions [8]) to arbitrary domain PRFs. A PRF and PRP have negligible advantage to be distinguished from the RF (random function) and RP (random permutation) respectively by any (q, t, )distinguisher (which makes q queries with and t invocations of RP to compute the output of the longest query and all queries respectively). Any tuple of q such S. Hong and T. Iwata (Eds.): FSE 2010, LNCS 6147, pp. 212–229, 2010. c International Association for Cryptologic Research 2010
A Unified Method for Improving PRF Bounds
213
queries or messages are also called (q, t)- or (q, t, )-messages. In this paper we study MAC domain extensions based on a single blockcipher, modeled to be a RP on {0, 1}n (or the Galois field (F2n , +, ·, 0, 1) treated equally in the paper). 1.1
Related Works: PRF Security Analysis of Known MACs
The basic and old domain extension method based on blockcipher is CBC[3] which was proven secure for prefix-free message spaces. Afterwards, many different variants of CBC are proven secure for arbitrary domains. In this paper we are mainly interested in the following domain extensions: C = {CBC-MAC [5], OMAC [9], GCBC∗ [19]1 , PMAC [6]} and (directed acyclic graph) DAG-based PRFs [11,20]. Our paper continues the following two lines of research which have been studied recently. (1) Unifying Known Domain Extensions: In [11] a class of DAG based domain extensions (Jutla’s class) was proposed where each non-singular DAG or a family of non-singular DAGs (see Definition 4) corresponds to a domain extension. Each node of a DAG represents blockcipher invocation with input as a message block xor-ed with previously computed blockcipher-outputs corresponding to the predecessor nodes. Even though the Jutla’s class contains CBC, GCBC∗ and others, it does not include those which encrypt a constant block (e.g. OMAC and PMAC encrypt the zero block 0). If we add a special node representing to the encryption of 0 then OMAC and PMAC can be included (as described in Nandi’s class [20]). (2) Finding Improved Bounds of PRF Advantages: The original PRF bound for the members of C and DAG-based constructions is about t2 /2n (sometimes 2 q 2 /2n ) [3,4,5,6,9,10,11,20,22]. The improved bound q 2 /2n for CBC-MAC was shown first time in [2]. Afterwards, similar or better improved bounds were shown for PMAC, OMAC [14,15,18] and others, e.g. EMAC [22,23], XCBC and TMAC [5,12,14] (see Table 1 for existing PRF bounds). 1.2
Motivation and Our Results
(1) We Unify Many Known Domain Extensions. It is always worthwhile to unify similar objects and study them under one umbrella. It helps us to understand the basic proof nature and to come up with new efficient constructions. In this paper we consider a more general class, called ADEs (affine domain extensions). It consists of all known domain extensions which invoke the underlying blockcipher π in a sequence such that inputs of π are determined from previous outputs via some affine functions. Moreover, the output of the domain extension is the last output of π. 1
GCBC∗ is one example of one-key GCBC [19], a general class of CBC-type constructions which can include any number of keys. For simplicity, we only consider a particular one-key GCBC∗ which is eventually included in [11].
214
M. Nandi
Definition 1. [Affine Domain Extension or ADE] A domain extension D is called ADE over a message space M if for each message M ∈ M there is a lower triangular matrix C = ((ci,j ))0≤j≤ 1≤i≤ (i.e. ci,j = 0, for all i ≤ j) and = (M ) such that Dπ (M ) = y(), ∀π ∈ Pn (the set of all permutations on F2n ) where the ith intermediate output y(i) = π(x(i)) and the ith intermediate input x(i) = ci,0 + i−1 j=1 ci,j · y(j), 1 ≤ i ≤ . The matrix C is called coefficient matrix corresponding to M . In Section 4, we identify a class of PRF secure domain extensions and call them SADE (secure affine domain extensions). It contains all modified non-singular DAG-based PRFs (i.e. Nandi’s class) and the members of C. In Theorem 2 we prove that CBC-MAC with prefix-free message space, OMAC, PAMC and GCBC∗ are secure affine domain extensions. Non-secure ADE does not necessarily mean insecure constructions. We do not know any generic method to distinguish nonsecure ADE. The other mentioned constructions such as EMAC, XCBC, TMAC, etc. do not directly fit into this class due to presence of one or more extra independent keys (auxiliary keys or/and blockcipher keys). They mostly have underlying CBC type structure. For example, the PRF security of EMAC can be reduced to collision probability of a ADE CBC-MAC [23]. Generalized CBC class or GCBC includes these constructions and they have been studied in [19]. This is beyond scope of our paper. (2) We Find a PRF Bound for the Unified Class. Security analysis of all DAG-based constructions are based on the model that the underlying blockcipher is a random function and hence we can not go beyond t2 /2n bound in the RPmodel of blockcipher due to the switching lemma [3] (switching from RF to RP costs t2 /2n ). So we need to find a different method to obtain improved bounds. The proof idea of [2] for the CBC-MAC uses structure graph which is also not suitable for a general ADE. We use equivalence relation which is more appropriate to describe collision patterns on inputs of the blockcipher π during outputs. qthe computation of a ADE i Given any q messages M1 , . . . , Mq , let t = i=1 i , i = (Mi ) and ti = j=1 i . We write all intermediate inputs x(1), . . . , x(t) (also outputs y(1), . . . , y(t)) in the order of the computations of Dπ (M1 ), . . . , Dπ (Mq ). We call the function xπ or x : [1, t] → F2n intermediate input function associated with permutation π (and the messages M1 , . . . , Mq ). Similarly we define intermediate output function y (or y π ). Since intermediate inputs are eventually affine functions of intermediate outputs, there is a t × (t + 1) lower triangular matrix A (called joint coefficient matrix) such that A·y = x where y = (1, y(1), . . . , y(t)) and x = (x(1), . . . , x(t)), called intermediate output and input vectors respectively. Definition 2 (Collision Relation). To any permutation π and a tuple of q messages (M1 , . . . , Mq ) ∈ Mq we associate an equivalence relation ∼ (called collision relation) on [1, t] := {1, 2, . . . , t} where all colliding inputs of π during the
A Unified Method for Improving PRF Bounds
215
computations of Dπ (M1 ), . . . , Dπ (Mq ) represents the equivalence classes. More precisely, i ∼ j if and only if x(i) = x(j) where x(1), . . . , x(t) is the sequence of all inputs of π while computing outputs of M1 , . . . , Mq . Whenever we fix messages M1 , . . . , Mq or understood from the context, we denote the collision relation by ∼π . The collision relation characterizes that which intermediate inputs collide and which do not, independent of the actual values of inputs. There may be more than one permutation associating a collision relation. On the other hand, there may exist equivalence relations which are not collision relations for any permutation. We provide a characterization of collision relation in Lemma 2. We later see that some collisions on inputs (i.e. x(i) = x(j) for i = j) are trivially derived from the previous occurred collisions (see Example 1). A set of accidents is the largest set of collisions which can not be derived from the other collisions only. All non-accident collisions are derived form the accidents. The set of accidents is very much analogous with a basis of a vector space2 . A more formal definition of accidents in terms of basis of a vector space is given in Definition 3. Let N (Mi , Mi ) N (t, q) = max (q,t)−messages (M1 ,...,Mq )
1≤i
where N (Mi , Mi ) is the number of collision relations ∼ with one accident such that Dπ (Mi ) or Dπ (Mi ) collide with a different intermediate output. A precise definition is given in Section 6. In Theorem 3 we prove that for any SADE D, 3qt+N (t,q) Advprf if ≤ 2n/3−1 . Moreover, in Theorem 4 we show that D (q,t,) ≤ 2n t t2 N (t, q) ≤ 2 and hence Advprf D (q, t, ) ≤ 2n−2 . (3) We Find Improved PRF Bounds for CBC and GCBC∗ . Because of Theorem 3, we only need to have a better estimation of N (t, q) for a given SADE. In Section 6, we show that N (t, q) is O(tq) for each member of C and hence we obtain the improved PRF bounds O(tq/2n ) for all members of C. In Theorem 5 we show that CBC with prefix-free message space, OMAC, GCBC∗ , PMAC have PRF advantages O(tq/2n ). We do not know whether this upper bound holds for all secure affine domain extensions or not (this would be a challenging future work). Our improved bounds (see Table 1 for comparison) are better than some of the previously known best bounds, namely q 2 /2n for CBC-MAC[2], and t2 /2n for GCBC∗ [19]. Note that the bound q 2 /2n can be worse compare to tq/2n or even t2 /2n if the query sizes are scattered enough. For example, when = q = t/2 = 2n/3 (this can happen if one message has blocks and all other messages have only one or two blocks) then q 2 /2n = 1 and hence no security is guaranteed with q 2 /2n bound. On the other hand, 2qt/2n = 4t2 /2n = 2n/3 are negligible. So proving t2 /2n or tq/2n bound still guarantee the security. 2
In fact, it is defined via a basis or generator of a set of vectors Veq defined in Section 3.1 (also see Section 3.2).
216
M. Nandi Table 1. PRF bounds for (q, t, )-distinguishers. R1: < 2n/3−1 Name of PRF CBC-MAC [3] PMAC [6] OMAC [9] GCBC∗ [19] DAG-based [11,20] SADE [this paper]
2
Our PRF bounds Best Known Bounds Other Bounds 11tq 20q 2 t2 (R1) (R1) [2] [4,20] n n 2 2 2n 2 5tq 5tq 10q (R1) [15] [14] 2n 2n 2n2 9tq 5tq 3.5t (R1) (R1) [18] [10] 2n 2n 2 2n 11tq 4t (R1) [19] 2n2 2n 2 t t [11,20] 2n−2 2n 3tq N (t, q) + (R1) 2n 2n
Preliminaries
We follow the notations described in introduction throughout the paper. We write P(m, r) = m(m − 1) · · · (m − r + 1). We denote (i, j)th entry and ith row of an s × (s + 1)-matrix A by ai,j and Ai respectively, 1 ≤ i ≤ s, 0 ≤ j ≤ s. The domain and range of a function g : J → F2n are J and g(J) := {g(j) : j ∈ J} respectively and denoted by D(g) and R(g) respectively. If J ⊆ J then g(J ) is the range of g|J restricted on the domain J . We denote the functions having domain [1, t] := {1, 2, . . . , t} (index-set) by x, y, f, g etc. The equivalence class containing i is [[i]] and the minimum element of the class is called leader. The set of all leaders is denoted by Ld(∼) := {ı1 , . . . , ıs }. For any function g : J → F2n , the induced equivalence relation ∼g is defined as i ∼g j if and only if g(i) = g(j).3 Two function f and g of same domain are said to be equality-matching if ∼f =∼g and we denote it by f g. 2.1
Decorrelation Theorem
We state a useful result (Lemma 22 of [26]) for PRF security analysis (the result is also applicable for (strong) pseudorandom permutation [24], pseudo online cipher [20], etc.) and we call it Decorrelation Theorem. The main idea of the theorem was described as Patarin’s “coefficient H-techniques” [21] (according to Vaudenay [26,25]). Different generalized versions of the theorem are stated in [4,20]. For a deterministic adaptive distinguisher, the queries and the number of blocks of queries may be dependent random variables. The decorrelation theorem gets rid of the correlation and reduces PRF security analysis of D to show that the q-decorrelation probability μM,w := Pr[DΠ (M1 ) = w1 , . . . , DΠ (Mq ) = ∗ 1 (equals to q-decorrelation probability for RF on wq : Π ← Pn ] is very close to 2nq F2n ) where M = (M1 , . . . , Mq ) ∈ Mq and w = (w1 , . . . , wq ) ∈ Fq2n two q-tuples 3
We distinguish the notation ∼g (induced relation from a function) and ∼π (collision relation associated with a permutation π) as g is a function and π is a permutation.
A Unified Method for Improving PRF Bounds
217
of distinct elements. We call such M and w coordinate-wise distinct. A big advantage in the computation of μM,w is that the source of randomness is only ∗ from the uniform distribution of Π over Pn which we denote by Π ← Pn . We write the set {w1 , . . . , wq } by W . The distinguishing advantage of a domain extension D over a message space M based on a random permutation Π is defined as follows: Π RF Advprf = 1] − Pr[AD = 1], and D Π (A) = Pr[A prf Advprf D Π (q, t, ) = max AdvD Π (A) A
where the maximum is taken over all (q, t, )-distinguishers. Theorem 1. Decorrelation Theorem (Lemma 22 of [26]) Let q, t and be fixed integers, be some positive real number (may depend on q, t, ) and Dπ : M → F2n be a domain extension, π ∈ P n such that μM,w ≥ (1 − ) × 2−nq for all coordinate-wise distinct M, w with qi=1 (Mi ) ≤ t and (q, t, ) ≤ + q(q−1) . maxi (Mi ) ≤ . Then Advprf 2n+1 DΠ The intuitive reason why it works for bounding PRF advantage is the following: Any adaptive distinguisher eventually makes decision based on all queries and responses. So if for any possible set of queries, the responses of D is almost uniformly random then no adaptive distinguisher can distinguish it from a random function with non-negligible probability. The proof of the above theorem is given in the Appendix. Security analysis of PRF base on any blockcipher EK is same if we incorporate the PRP advantage of the EK by using the well known hybrid argument technique. By using hybrid technique it is well known that prp Advprf (q, t, ) ≤ Advprf D Π (q, t, ) + AdvEK (t). D EK
3
Results on Intermediate Input, Output Functions and Collision Relations
From now onwards we fix an affine domain extension D (see Definition 1) and q distinct messages M1 , . . . , Mq , q ≥ 1. Let At×(t+1) = ((ai,j )) be its joint coefficient matrix (see Section 1). Note that when q = 1 the joint coefficient matrix is nothing but the coefficient matrix. We denote the ith row of the matrix by Ai . Recall that to any permutation π, we associate an intermediate input function x (or xπ ) and output function y (or y π ), respectively where x(i) = i−1 j=0 ai,j · y(j) (denoted by y x) and π(x(i)) = y(i), ∀i. 3.1
Intermediate Output Function and Collision Relation
An intermediate output function y can be associated with more than one permutations, e.g. y π1 = · · · = y πr for some permutations π1 , . . . , πr . Let Pn [y] denote the set of all permutations with y as an output function. In other words, Pn [y] is the collection of all permutations so that the computation of Dπ (M1 ), . . . , Dπ (Mq ) gives exactly the same sequence of intermediate inputs and outputs namely x(1),
218
M. Nandi
y(1), . . ., x(t), y(t). Clearly, all these permutations have to agree on the sets of all intermediate inputs as π(x(i)) = y(i), ∀i, ∀π ∈ Pn [y]. We denote the number of distinct elements of x(i)’s (or y(i)’s) by s. Hence #Pn [y] = (2n − s)! whenever y is an output function. Recall that x y if x(i) = x(j) ⇔ y(i) = y(j) (i.e. the collision patterns of x and y are the same) which is a necessary condition for intermediate input and output functions. So a function y : [1, t] → F2n is not an intermediate output function if x y where y x. Lemma 1. (characterization of an intermediate output function). A function y : [1, t] → F2n is an intermediate output function if and only if y x where y x and in this case #Pn [y] = (2n − s)!. We recall that a collision relation is the equivalence relation capturing the collision pattern of xπ (and equivalently y π ) associated with π (see Definition 2). We have already characterized intermediate output function in Lemma 1. In this section we provide a characterization of collision relations since all equivalence relations on [1, t] are not necessarily collision relations. For any (t+1)-vector v = (v0 , v1 , . . . , vt ) ∈ Ft+1 2n and any arbitrary equivalence relation ∼ we define a ∼reduced vector v∼ = (v0 , v1∼ , . . . , vt∼ ) where vi∼ = j∈[[i]] vj , if i ∈ Ld(∼), o.w. vi∼ = 0. We mainly consider the ∼-reduced vectors for all row vectors Ai ’s of the joint coefficient matrix A. If y = (1, y(1), . . . , y(t)) for an intermediate output ∼ function inducing a collision relation ∼=∼y then (A∼ i −Aj )·y = x(i)−x(j) = 0 if and only if i ∼ j. So we also define the following sets of (t + 1)-vectors. ∼ 1. Veq := {A∼ i − Aj : i ∼ j}, ∼ 2. Vneq := {Ai − A∼ ∼ j}. j : i
Thus v · y = 0 for all v ∈ V_eq, and similarly v · y ≠ 0 for all v ∈ V_neq. Let e_k ∈ F_{2^n}^{t+1} be the (t+1)-vector whose kth entry is 1 and all other entries are 0, 0 ≤ k ≤ t. Then (e_i − e_j) · y = y(i) − y(j) ≠ 0 whenever i ≁ j. Also, e_0 · y = 1 ≠ 0. So we define the following two sets of vectors:

V_neq^* := V_neq ∪ {e_i − e_j : i ≁ j},   V_neq^{**} := V_neq^* ∪ {e_0}.

We have v · y ≠ 0 for all v ∈ V_neq^{**}. In summary, the intermediate output vector y is in the null space of every vector of V_eq and of no vector of V_neq^{**}. So clearly a necessary condition for a collision relation is that no vector of V_neq^{**} lies in the span of V_eq. This necessary condition is also sufficient, as described in the following lemma. We also estimate the size of the set P_n[∼] = {π : ∼_π = ∼}, the set of all permutations inducing the collision relation ∼ (a definition analogous to P_n[y]; we distinguish the two notations by the argument, y being a function and ∼ an equivalence relation). We use the well-known result on the number of solutions of a system of linear equations.
Lemma 2 (Characterization of Collision Relation). Let ∼ be a collision relation then there exists y : [1, t] → F2n such that
N1: v_0 + Σ_{j=1}^{t} v_j · y(j) = 0 for all (v_0, v_1, ..., v_t) ∈ V_eq,
N2: v_0 + Σ_{j=1}^{t} v_j · y(j) ≠ 0 for all (v_0, v_1, ..., v_t) ∈ V_neq^{**}.

Hence, a necessary condition for a collision relation is that each vector of V_neq^{**} is linearly independent of V_eq. Conversely, if V_neq^{**} is linearly independent of V_eq (i.e., ∼ satisfies the above necessary condition), then

(2^n − s)! × 2^{n(s−a)} × (1 − #V_neq^*/2^n) ≤ #P_n[∼] ≤ (2^n − s)! × P(2^n, s − a),

where s = #Ld(∼) and a = acc(∼) := rank(V_eq) (the number of accidents). Hence ∼ is a collision relation if #V_neq^* < 2^n.

The proof of the lemma is given in the full version of the paper [17]; we provide a sketch. It is easy to see that if both N1 and N2 hold for some function y, then y has the same collision pattern as its induced input function x, and hence y is an intermediate output function. From Lemma 1 we know that the number of permutations associated with each such intermediate output function is (2^n − s)!, where s is the number of equivalence classes of ∼. It remains to estimate the number of intermediate output functions inducing the collision relation ∼. By the well-known result on the number of solutions of a system of linear equations, the number of vectors satisfying N1 is exactly 2^{n(s−a)}. If we restrict the solutions so that y(i) ≠ y(j) whenever i ≁ j, then the number of solutions is at most P(2^n, s − a). So #P_n[∼] ≤ (2^n − s)! × P(2^n, s − a). For the lower bound, note that any vector v ∈ V_neq^{**} is linearly independent of V_eq, and hence the rank of V_eq ∪ {v} is a + 1. Removing, for each v ∈ V_neq^{**}, all solutions satisfying N1 and v · y = 0 from the set of solutions of N1 leaves the set of all solutions satisfying both N1 and N2. This yields the lower bound.

Corollary 1. Pr[∼_Π = ∼] ≤ 1/P(2^n − s + a, a), where a = acc(∼).
3.2 Generator and Number of Accidents of a Collision Relation
So far we have defined the intermediate input function x^π, the output function y^π and the collision relation ∼_π associated with any permutation π. Now we observe that not all collisions (i.e., x(i) = x(j) with i ≠ j) are unexpected: there is a set of collision pairs that implies all other collisions independently of the underlying permutation. The generator is a minimum such set, and the number of such collisions is called the number of accidents (which is eventually the rank of V_eq, i.e., the size of a basis of V_eq). There can be more than one basis; we choose a special basis in a particular manner so that it uniquely determines the collision relation.

Definition 3. The generator Gen := Gen(∼) = ((i_1, j_1), ..., (i_a, j_a)) of a relation ∼ corresponds to a maximal linearly independent set of vectors (a basis) B := {A_{i_1}^∼ − A_{j_1}^∼, ..., A_{i_a}^∼ − A_{j_a}^∼} of V_eq, where the pairs of indices (i_k, j_k) are
chosen as small as possible w.r.t. the dictionary order ≺ on [1, t]^(2) := {(i, j) : i > j}.⁴ The number of accidents is defined as a = acc(∼) := rank(V_eq). Note that the number of accidents a and the generator Gen defined above are unique; Gen can be defined recursively as follows. The pair (i_k, j_k) is the smallest related pair (i, j) larger than (i_{k−1}, j_{k−1}) such that A_i^∼ − A_j^∼ is linearly independent of {A_{i_c}^∼ − A_{j_c}^∼ : c < k}. So a relation ∼ uniquely determines its generator Gen(∼). We now state that the converse is also true, i.e., a generator uniquely determines a relation. The proof is given in [17]. From now onwards, all missing proofs of our paper are given in [17].

Lemma 3. Any relation ∼ satisfying the necessary condition of Lemma 2 is uniquely determined by its generator Gen(∼). Hence the number of collision relations with a accidents is at most \binom{t}{2}^a.

Corollary 2. Pr[acc(∼_Π) ≥ 2 : Π ←$ P_n] ≤ t²/2^n if t < 2^{n/2−1}.
Remark 1. The generator of a collision relation represents the set of all unexpected collisions. Each unexpected collision occurs with probability roughly 1/2^n, and these are independent of each other (Corollary 2). All other collisions present in the collision relation are implied by these. For example, CBC can have intermediate collisions for two messages if the messages have a common prefix. In the following example there are three collisions, two of which are unexpected, while the third can be derived from those two.

Example 1. This example was considered for CBC in [2]; we revisit it in our joint coefficient matrix notation. Let M = (α_1, α_2, α_3) and M' = (α'_1, α'_2, α'_3) with α_1 ⊕ α_3 = α'_1 ⊕ α'_3. Consider the relation ∼ = {{1, 6}, {2, 5}, {3, 4}}, so Ld(∼) = {1, 2, 3}. The coefficient matrix A = A^{M,M'} of CBC and the ∼-reduced matrix A^∼ are

A^{M,M'}:
( α_1  0 0 0 0 0 0 )
( α_2  1 0 0 0 0 0 )
( α_3  0 1 0 0 0 0 )
( α'_1 0 0 0 0 0 0 )
( α'_2 0 0 0 1 0 0 )
( α'_3 0 0 0 0 1 0 )

A^∼:
( α_1  0 0 0 0 0 0 )
( α_2  1 0 0 0 0 0 )
( α_3  0 1 0 0 0 0 )
( α'_1 0 0 0 0 0 0 )
( α'_2 0 0 1 0 0 0 )
( α'_3 0 1 0 0 0 0 )

Note that A_1^∼ + A_6^∼ = A_3^∼ + A_4^∼, and hence the collision y_1 = y_6 is determined by the collision y_3 = y_4. The set V_eq = {A_i^∼ − A_j^∼ : i ∼ j} has only two independent vectors. So the rank for this relation is two, even though it has three related pairs (termed true collisions in [2]).
⁴ (i, j) ≺ (i', j') if and only if either i < i', or i = i' and j < j'. The notation (i, j) ⪯ (i', j') means either (i, j) ≺ (i', j') or (i, j) = (i', j'). Whenever we write i ∼ j we mean i > j, i.e., (i, j) ∈ [1, t]^(2). The notions of smaller and larger for pairs are based on this dictionary order.
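To make the ∼-reduction in Example 1 above concrete, the following sketch (our own illustration, with arbitrary 4-bit block values) builds the joint coefficient matrix of CBC for the two messages, applies the ∼-reduction, and checks the dependency A_1^∼ + A_6^∼ = A_3^∼ + A_4^∼. Only XOR is needed, because all non-constant entries of the CBC matrix are 0 or 1.

```python
def reduce_vector(v, classes, leaders):
    """~-reduce a (t+1)-vector: column 0 is kept; for a leader column i the new
    entry is the XOR of v over the whole equivalence class of i; others are 0."""
    out = [v[0]] + [0] * (len(v) - 1)
    for i in leaders:
        for j in classes[i]:
            out[i] ^= v[j]
    return out

def xor_vec(u, v):
    return [a ^ b for a, b in zip(u, v)]

# Example 1: M = (a1, a2, a3), M' = (b1, b2, b3) with a1 ^ a3 == b1 ^ b3.
a1, a2, a3 = 0x3, 0x7, 0x9
b1, b2 = 0x5, 0x2
b3 = b1 ^ a1 ^ a3                       # enforce the required condition

# Joint coefficient matrix of CBC (row i: x(i) = A[i][0] + sum_j A[i][j]*y(j)).
A = [
    [a1, 0, 0, 0, 0, 0, 0],
    [a2, 1, 0, 0, 0, 0, 0],
    [a3, 0, 1, 0, 0, 0, 0],
    [b1, 0, 0, 0, 0, 0, 0],
    [b2, 0, 0, 0, 1, 0, 0],
    [b3, 0, 0, 0, 0, 1, 0],
]

# Relation ~ = {{1,6},{2,5},{3,4}} with leaders Ld(~) = {1,2,3}.
classes = {1: [1, 6], 2: [2, 5], 3: [3, 4]}
Ar = [reduce_vector(row, classes, [1, 2, 3]) for row in A]

# A~_1 + A~_6 == A~_3 + A~_4, so y(1)=y(6) is implied by y(3)=y(4):
print(xor_vec(Ar[0], Ar[5]) == xor_vec(Ar[2], Ar[3]))   # True
```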
4 Secure Affine Domain Extensions
In Definition 1 we have defined affine domain extensions. In this section we study a subclass called secure affine domain extensions, or SADE. The cipher block chaining message authentication code, CBC-MAC [3,22], is a very basic and old method to extend the domain of a PRF, and many CBC-type domain extensions were proposed later. For a message M, let M* = M‖10^d with the smallest nonnegative d such that n | |M*|. If n | |M| then δ = 0 and we keep M as it is; otherwise δ = 1 and we replace M by M*. We represent the resulting string by (α_1, ..., α_b) ∈ F_{2^n}^b, and the integer b := b(M), the number of its n-bit blocks, is called the number of blocks of M. The keyed blockcipher is denoted by π ∈ P_n. We show that some CBC-type domain extensions such as CBC-MAC and GCBC*, and others such as OMAC, PMAC and DAG-based PRFs, are affine domain extensions. Their definitions are based on some distinct non-0, non-1 constants c_i and c_δ such that their differences are not 1; the original choices of constants can be found in the respective papers [6,9,19]. In the following, we define y(i) = π(x(i)).
Fig. 1. C1 , C2 and C3 are the coefficient matrices of CBC, GCBC∗ , and OMAC respectively for the message M
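The following sketch (our own illustration with a toy 8-bit permutation; the function names are not from the paper) shows how the CBC coefficient matrix C1 of Fig. 1 encodes the recursion x(i) = α_i + y(i − 1) defined just below, and evaluates the intermediate inputs and outputs. The general ADE case with non-binary constants c_i would additionally need multiplication in F_{2^n}.

```python
import random

n = 8                                          # toy block size in bits
rng = random.Random(1)
perm = list(range(2 ** n)); rng.shuffle(perm)
pi = lambda v: perm[v]                         # a fixed "secret" permutation on n bits

def cbc_coefficient_matrix(msg_blocks):
    """Coefficient matrix C1 of CBC: row i encodes x(i) = alpha_i + y(i-1)."""
    b = len(msg_blocks)
    rows = []
    for i, alpha in enumerate(msg_blocks, start=1):
        row = [alpha] + [0] * b
        if i > 1:
            row[i - 1] = 1                     # coefficient of y(i-1)
        rows.append(row)
    return rows

def evaluate_ade(rows):
    """Compute x(i), y(i) from an affine description with 0/1 coefficients."""
    x, y = [None], [None]                      # indices start at 1
    for row in rows:
        xi = row[0]                            # addition in F_{2^n} is XOR
        for j, coeff in enumerate(row[1:], start=1):
            if coeff:
                xi ^= y[j]
        x.append(xi)
        y.append(pi(xi))
    return x, y

M = [0x3A, 0x7C, 0x9F]
x, y = evaluate_ade(cbc_coefficient_matrix(M))
print(hex(y[-1]))                              # CBC-MAC(M) under pi: the last intermediate output
```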
CBC-MAC [3]: The CBC-MAC is CBC applied to the padded message. Let ℓ(M) = b; the input function is x(1) = α_1 and x(i) = α_i + y(i − 1), 2 ≤ i ≤ b.

GCBC* [19]: In the case of GCBC* we consider messages with b ≥ 2, and it is defined as (GCBC*)^π(M) = π(α_b + c_δ · CBC^π(α_1, ..., α_{b−1})) for some constants c_0 and c_1. The input function is the same as for CBC-MAC except for the final intermediate input x(b) = α_b + c_δ · y(b − 1).

OMAC [9]: OMAC^π(M) = π(α_b + c_δ · π(0) + CBC^π(α_1, ..., α_{b−1})), where CBC^π(λ) = 0 and λ is the empty string. Let ℓ(M) = b + 1; the input function is x(1) = 0, x(i) = α_{i−1} + y(i − 1) for 2 ≤ i < b + 1, and x(b + 1) = α_b + c_δ · y(1) + y(b).

PMAC [6]: PMAC^π(M) = π(α_b + Σ_{i=1}^{b−1} π(α_i + c_i · π(0)) + c_δ · π(0)). So ℓ(M) = b + 1, and the input function is x(1) = 0, x(i) = α_{i−1} + c_i · y(1) for 2 ≤ i ≤ b, and x(b + 1) = α_b + c_δ · y(1) + Σ_{i=2}^{b} y(i).

DAG-based PRF [11,20]: In [11,20] a domain extension over the message space M = F_{2^n}^ℓ is proposed for every non-singular labeled DAG G = ([1, ℓ], E, c), where E is the set of arcs and c : E → F_{2^n} gives the labels. In [11], a more general domain extension is defined for arbitrary messages by considering a family
of DAGs, where each DAG corresponds to the domain extension for fixed-length messages after padding. The general definition includes CBC-MAC for an arbitrary message space and the version of GCBC considered in our paper. In [20], a much bigger class is considered which includes PMAC and OMAC. All these constructions are affine domain extensions. Here we show it for the construction based on a labeled DAG with message space M = F_{2^n}^ℓ and leave the other cases to the reader.

Definition 4. A DAG G with nodes [1, ℓ] and a color function c : E → F_{2^n} is called non-singular [11] if there exists exactly one source node (in-degree zero) and one sink node (out-degree zero), and for any two nodes v and v' with the same set of incident nodes U (i.e., U = {u : u → v} = {u : u → v'}), there exists u ∈ U such that c(u, v) ≠ c(u, v').

The nodes are numbered in such a way that u → v implies u < v; this is possible since G has no cycle. Given a message M = (α_1, ..., α_ℓ) ∈ F_{2^n}^ℓ, let ℓ(M) = ℓ and DAG_G(M) = y(ℓ), where the input and output functions are

x(v) = α_v + Σ_{v' → v} c(v', v) · y(v'),   y(v) = π(x(v)),   1 ≤ v ≤ ℓ.
Hence any DAG-based domain extension is an ADE. The (i, j)th entry of the coefficient matrix is a_{i,j} = c(j, i) if j → i (the label of the arc entering node i from node j), and a_{i,j} = 0 otherwise; the (i, 0)th entry is the ith message block α_i. It is easy to verify the following result.

Lemma 4. If a DAG G is non-singular then, for any message, all rows of the coefficient matrix are distinct. If M ≠ M' then A^M_ℓ ≠ A^{M'}_i for 1 ≤ i ≤ ℓ.

EMAC [22], XCBC [5] and TMAC [12] (these domain extensions require either auxiliary keys or more than one permutation) and XOR-MAC [1] (whose output is the sum of all previous intermediate outputs instead of the last one) are examples of PRFs that are not ADEs. We now characterize a class of PRF-secure affine domain extensions, called secure affine domain extensions.

Definition 5 (Secure Affine Domain Extension or SADE). An ADE D is called SADE if for any (i, M') ≠ (ℓ(M), M), there exists π ∈ P_n such that y^π(i) ≠ D^π(M), 1 ≤ i ≤ ℓ(M'), where y^π is the intermediate output function associated with π for the message M'.

Informally speaking, an ADE D is non-secure (not necessarily insecure) if D^π(M) always collides with a specific intermediate output of π while computing D^π(M') for some messages M and M'. We call this type of collision a "forced collision" (later we see that it is related to a special collision relation called the forced collision relation). By knowing the value D^π(M) of a non-secure ADE (even for a secretly chosen permutation π), a specific intermediate output in the computation of D^π(M') is leaked. This undesired property can lead to a distinguishing attack. For example, we have the following attacks:
CBC-MAC on {0, 1}* is not a SADE, and it admits a length-extension attack due to a forced collision. If we modify the definition of OMAC by choosing c_0 = 1, then D(00) = π(0) = y^{π,M}(1) for every π and any message M (since x^{π,M}(1) = 0). So it is not a SADE, and one can show a distinguishing attack exploiting this observation (e.g., D(00) = C ⇒ D(C) = C with probability one). This also explains why we should choose non-1 constants for OMAC. Even though we know attacks on some non-secure ADEs, we do not yet know a generic attack on all non-secure constructions. Every affine domain extension avoiding such undesired forced collisions is a SADE.
4.1 Examples of SADE
Now we show that all members of C are SADEs.

Theorem 2. Members of C and (modified) non-singular DAG-based domain extensions are SADE.

Before proving this we introduce a special collision relation called the forced relation. An equivalence relation ∼* is called a forced relation if V_eq contains only the zero vector (i.e., A_i^∼ = A_j^∼ if and only if i ∼ j) and V_neq does not contain the zero vector. A forced relation clearly satisfies the necessary condition of a collision relation, and #P_n[∼*] > 0 provided t(t − 1) < 2^n (see Lemma 2). So for any i ≁* j there exists a permutation π such that y^π(i) ≠ y^π(j). Since rank(V_eq) = 0, there can exist at most one such collision relation (by Lemma 3). The forced relation is a sub-relation of all collision relations: if i ∼* j then y^π(i) = y^π(j) for every permutation π (the converse may not be true). This follows because A_i^∼ = A_j^∼ for all i ∼ j, where ∼ = ∼*. Let M be a message space such that max_{M∈M} ℓ(M) < 2^{n/2−1}. Then for any pair of messages the joint coefficient matrix has at most t rows with t(t − 1) < 2^n.

Lemma 5 (Equivalence characterization of SADE). An affine domain extension is SADE if and only if for every tuple of two distinct messages M = (M, M'), the forced collision relation ∼* is I-isolated (i.e., i ≁* j for all j ≠ i, i ∈ I = {t_1 := ℓ, t_2 := ℓ + ℓ'}).

Proof of Theorem 2. Lemma 4 shows that non-singular DAG-based constructions are SADE. We prove the result for CBC-MAC with a prefix-free message space; a similar argument works for the other members of C. Let M = (α_1, ..., α_ℓ) and M' = (α'_1, ..., α'_{ℓ'}) be two prefix-free messages, i.e., neither is a prefix of the other. Suppose s ≥ 0 with α_1 = α'_1, ..., α_s = α'_s, α_{s+1} ≠ α'_{s+1}. Then s < min{ℓ, ℓ'}, and s is called the length of the common prefix. Now define a collision relation ∼ such that 1 ∼ ℓ + 1, ..., s ∼ s + ℓ, and all other unequal values are unrelated (clearly, i ∼ i for all i since ∼ is an equivalence relation). Now let A = A^{(M,M')}; it is easy to see that A_i^∼ = A_j^∼ if and only if i ∼ j. Hence this must be the forced collision relation, and it is I-isolated. Thus CBC is SADE for any prefix-free message space. However, if we choose two messages such that one is a prefix of the other, then the forced collision relation shows that CBC is not a secure affine domain extension.
5 A Unified PRF Security Analysis for All Secure Affine Domain Extensions
Let (M_1, ..., M_q) be any fixed (q, t)-messages with ℓ_i = ℓ(M_i) and t_i = Σ_{j=1}^{i} ℓ_j. The final and intermediate index sets are I = {t_1, ..., t_q := t} and [1, t] \ I, respectively. To any permutation π we associate the intermediate input and output functions x^π : [1, t] → F_{2^n} and y^π : [1, t] → F_{2^n}, and a collision relation ∼_π characterizing all collisions among the x^π(i) values. We now compute the probability of a collision between an intermediate input and a final input of Π during the computations of D^Π(M_1), ..., D^Π(M_q). More precisely,

ε(M_1, ..., M_q) := Pr[x^Π(i) = x^Π(j) for some i ∈ I and j ≠ i].

It is easy to see that ε(M_1, ..., M_q) ≤ Σ_{1≤i<i'≤q} ε(M_i, M_{i'}), and moreover

ε(M_1, ..., M_q) ≤ Σ_{1≤i<i'≤q} ( N(M_i, M_{i'})/(2^n − 2) + (ℓ_i + ℓ_{i'})^4/2^{2n} ) ≤ (N(M_1, ..., M_q) + 2 + 8ℓ³tq/2^n)/2^n,

where N(M_1, ..., M_q) := Σ_{1≤i<i'≤q} N(M_i, M_{i'}).

Lemma 6. ε(M_1, ..., M_q) := Pr[x^π(i) = x^π(j) for some i ∈ I and j ≠ i] ≤ (N(t, q) + 2 + 8ℓ³tq/2^n)/2^n. Moreover, if N(M, M') ≤ c(ℓ + ℓ') for some constant c and all messages M ≠ M' with ℓ = ℓ(M) and ℓ' = ℓ(M'), then ε(M_1, ..., M_q) ≤ (ct(q − 1) + 2 + 8ℓ³tq/2^n)/2^n.

Definition 6. For a fixed block-wise distinct q-tuple w = (w_1, ..., w_q), a permutation π is said to be w-regular if
type-1: R(y^π|_{[1,t]\I}) (the set of all π-outputs at the intermediate indices, see Section 2) and W = {w_1, ..., w_q} are disjoint (i.e., all intermediate outputs associated with the permutation π are different from the w_i's), and
type-2: x^π(i) ≠ x^π(j) for all i ∈ I and j ≠ i, i.e., ∼_π is I-isolated.
The above Lemma 6 bounds the probability that the type-2 property fails. A random permutation fails the type-1 property if some intermediate output lies in the q-set W. Intuitively, the probability that an intermediate output lies in W for the first time (in terms of index) is less than q/(2^n − t). Since there are at most t such intermediate outputs, we have the following result.

Lemma 7. Pr_{Π ←$ P_n}[y^Π(i) = w_j for some i ∉ I and some j] ≤ qt/(2^n − t) ≤ (qt + t)/2^n.
Now we explain why we have defined w-regular permutations. Conditioned on the set of all w-regular permutations, the probability that the q final outputs of the domain extension for the messages M_1, ..., M_q equal w_1, ..., w_q is at least 1/P(2^n − 1, q), so the conditional decorrelation probability is roughly 2^{−nq}. We can see this as follows: given that Π is w-regular, all final intermediate inputs are fresh, as they differ from all other intermediate inputs of Π. Moreover, the w_j's do not appear among the non-final intermediate outputs. Hence the Π-outputs of the final intermediate inputs (which are the outputs of the domain extension) can be chosen at random subject to being distinct and different from the intermediate outputs, in particular from w_1, ..., w_q.

Lemma 8. Pr_{Π ←$ P_n}[y^Π(t_i) = w_i, 1 ≤ i ≤ q | Π is w-regular] ≥ 1/P(2^n − 1, q).
Armed with these lemmas and the decorrelation theorem (Theorem 1), we can prove the main result of this section.

Theorem 3. For any SADE D, Adv^prf_D(q, t, ℓ) ≤ (3qt + N(t, q))/2^n if ℓ < 2^{n/3−1}.

Proof. μ_{M,w} := Pr_{Π ←$ P_n}[y^Π(t_i) = w_i, 1 ≤ i ≤ q] ≥ Pr[Π is w-regular]/P(2^n − 1, q). Note that if ℓ < 2^{n/3−1} then 8ℓ³tq/2^n ≤ tq, and hence Pr[Π is w-regular] ≥ 1 − (2tq + t + 2 + N(t, q))/2^n. Clearly t + 2 < tq, and so μ_{M,w} ≥ (1 − (3qt + N(t, q))/2^n)/2^{nq}. The result follows from the decorrelation theorem.
Corollary 3. If N(M, M') ≤ c × (ℓ(M) + ℓ(M')) for all messages M ≠ M' and some fixed constant c, then Adv^prf_D(q, t, ℓ) ≤ (3 + c)tq/2^n if ℓ < 2^{n/3−1}.

Theorem 4. For any SADE D, we have Adv^prf_D(q, t, ℓ) ≤ t²/2^{n−2}.

Proof. The result is immediate from Theorem 3 and Lemma 3 if we have the restriction on ℓ required in Theorem 3. To prove the unconditional bound, note that μ_{M,w} ≥ Pr[Π is w-regular]/P(2^n − 1, q) ≥ (1 − (qt + t)/2^n − t(t − 1)/2^n)/2^{nq}, since Pr_{Π ←$ P_n}[x^Π(t_i) ≠ x^Π(j) for all j ≠ i] ≥ Pr[∼_Π = ∼*] ≥ 1 − t(t − 1)/2^n (Lemma 2), where ∼* is the forced collision relation. The result follows by the decorrelation theorem.
6 Improved Security Bounds for Members of C
We provide a sketch (the detail can be found in the full version of the paper [17]) of improved security analysis of members of C. We use the following lemmas
proved in [2] (Lemma 12 and Lemma 17 of [2]) and Corollary 3 to provide improved PRF bounds.

Lemma 12 of [2]. For CBC-MAC and any M ≠ M', the number of collision relations with accident one associated with the messages M and M' such that CBC-MAC(M) = CBC-MAC(M') is at most d'(|ℓ(M) − ℓ(M')|), where d'(m) = max_{m'≤m} d(m') and d(m) denotes the number of divisors of m.

Lemma 17 of [2]. For any two prefix-free messages M ≠ M', N(M, M') ≤ 8(ℓ(M) + ℓ(M')) for CBC-MAC.

Improved Security Bound for CBC: By applying Lemma 17 of [2] we have N(t, q) ≤ 4t(q − 1). Hence CBC-MAC for a prefix-free message space has the following PRF advantage:

Adv^prf_CBC(q, t, ℓ) ≤ 11tq/2^n if ℓ ≤ 2^{n/3−1}.
Improved Security Bound for GCBC*: If (i, j) is a basis of a collision relation ∼ with one accident, where both i, j ∉ {1, ℓ, ℓ + 1, t := ℓ + ℓ'}, then the basis vector A_i^∼ − A_j^∼ = c · e_0 + e_{i−1} + e_{j−1} for some constant c has only two non-zero entries, both equal to 1, with column index 1 or more (ignoring the zeroth column).

Case A (δ_M ≠ δ_{M'}): In this case one can easily show that ∼ is I-isolated. If t ∼ k for some k ≠ t, then A_k^∼ − A_t^∼ cannot be a multiple of c · e_0 + e_{i−1} + e_{j−1}.

Case B (δ_M = δ_{M'} and x ≠ x'): ∼ is I-isolated unless ℓ ∼ t. The latter implies that either ℓ − 1 or t − 1 is related to i (< j, say). Let ℓ − 1 ∼ i − 1. Then A_{i−1}^∼ − A_{ℓ−1}^∼ is a multiple of c · e_0 + e_{i−1} + e_{j−1}. This is possible only if A_{i−1}^∼ = A_{ℓ−1}^∼, hence i − 2 ∼ ℓ − 2, and so on. So we get 1 ∼ ℓ − i + 1, which cannot hold since A_1^∼ − A_{ℓ−i+1}^∼ cannot be a multiple of c · e_0 + e_{i−1} + e_{j−1}. One can argue similarly when i − 1 ∼ t − 1.

Case C (δ_M = δ_{M'} and x = x'): In this case we reduce to the CBC case by dropping the last message block from both messages.

So we may assume that one of i, j belongs to the set {1, ℓ, ℓ + 1, t}, and hence N(M, M') ≤ 8(ℓ + ℓ'). Hence

Adv^prf_{GCBC*}(q, t, ℓ) ≤ 11tq/2^n if ℓ ≤ 2^{n/3−1}.
Security Bound for OMAC:
Case A (δ_M ≠ δ_{M'}): Suppose I is not isolated in a collision relation ∼ of rank one, say t ∼ i'. Let {(i, j)} be the basis of ∼ with i, j ∉ I. The first entry of A_t^∼ − A_{i'}^∼ must be non-zero (either c_δ − c_{δ'}, or c_δ − 1, or c_δ), whereas the first entry of A_i^∼ − A_j^∼ is zero. Thus the rank must be more than one. Hence the only possible collision relations of rank one are those whose basis (i, j) has j ∈ I, so the number of such relations is at most 2(ℓ + ℓ').
Case B (δ_M = δ_{M'}): Suppose t ∼ i' where i' ∉ I. Then, by a similar argument, the basis must contain a pair with one element from I. So there
are at most 2(ℓ + ℓ') such relations. Now consider the case ℓ ∼ t. This implies that CBC(M) = CBC(M') while the accident count is still one for CBC. Since δ_M = δ_{M'}, we have M ≠ M'. By Lemma 12 of [2], there are at most d(|ℓ − ℓ'|) such relations with one accident. Combining the above two cases, the total number of collision relations with one accident is at most 3(ℓ + ℓ'), and hence N(M, M') ≤ 6(ℓ + ℓ'). Thus we have the PRF bound for OMAC:

Adv^prf_OMAC(q, t, ℓ) ≤ 9qt/2^n if ℓ ≤ 2^{n/3−1}.
Security Bound for PMAC: It is easy to see that the basis of an accident-one collision relation must contain a final index, since the first-column entry of row ℓ or t is c_δ, which differs from those of all other rows. So N(M, M') ≤ 2(ℓ + ℓ'), and

Adv^prf_PMAC(q, t, ℓ) ≤ 5qt/2^n if ℓ ≤ 2^{n/3−1}.
Theorem 5. Each member of C has PRF advantage O(tq/2^n) if ℓ < 2^{n/3−1}.
7 Conclusion and Future Work
We have provided a unified framework for improving the PRF advantages of many known blockcipher-based domain extensions. We obtain improved bounds O(tq/2^n) for all members of C, and our general result can also yield similarly improved bounds for any affine domain extension once a better estimate of N(t, q) is available. We believe that N(t, q) = O(tq) for all secure affine domain extensions, and proving this would be an interesting research direction. Another possible direction is to go beyond O(tq/2^n); doing so requires an essentially new proof technique, as neither our general bound nor the existing proof ideas for improved bounds can achieve it.

Acknowledgement. This work was supported in part by the National Science Foundation, Grant CNS-0937267. The author also thanks Ray Perlner, Liting Zhang and the anonymous reviewers for their helpful comments.
References 1. Bellare, M., Gu´erin, R., Rogaway, P.: XOR MACs: New Methods for Message Authentication Using Finite Pseudorandom Functions. In: Coppersmith, D. (ed.) CRYPTO 1995. LNCS, vol. 963, pp. 15–28. Springer, Heidelberg (1995) 2. Bellare, M., Pietrzak, K., Rogaway, P.: Improved Security Analysis for CBC MACs. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 527–545. Springer, Heidelberg (2005) 3. Bellare, M., Killan, J., Rogaway, P.: The security of the cipher block chanining Message Authentication Code. In: Desmedt, Y.G. (ed.) CRYPTO 1994. LNCS, vol. 839, pp. 341–358. Springer, Heidelberg (1994)
4. Bernstein, D.J.: A short proof of the unpredictability of cipher block chaining (2005), http://cr.yp.to/papers.html#easycbc ID 24120a1f8b92722b5e1 5fbb6a86521a0 5. Black, J., Rogaway, P.: CBC MACs for arbitrary length messages. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 197–215. Springer, Heidelberg (2000) 6. Black, J., Rogaway, P.: A Block-Cipher Mode of Operations for Parallelizable Message Authentication. In: Knudsen, L.R. (ed.) EUROCRYPT 2002. LNCS, vol. 2332, pp. 384–397. Springer, Heidelberg (2002) 7. Damg˚ ard, I.B.: A Design Principle for Hash Functions. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 416–427. Springer, Heidelberg (1990) 8. Goldreich, O., Goldwasser, S., Micali, S.: How to construct random functions. JACM 33-4, 792–807 (1986) 9. Iwata, T., Kurosawa, K.: OMAC: One-Key CBC MAC. In: Johansson, T. (ed.) FSE 2003. LNCS, vol. 2887, pp. 129–153. Springer, Heidelberg (2003) 10. Iwata, T., Kurosawa, K.: Stronger Security Bounds for OMAC, TMAC, and XCBC. In: Johansson, T., Maitra, S. (eds.) INDOCRYPT 2003. LNCS, vol. 2904, pp. 402– 415. Springer, Heidelberg (2003) 11. Jutla, C.S.: PRF Domain Extension using DAG. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 561–580. Springer, Heidelberg (2006) 12. Kurosawa, K., Iwata, T.: TMAC: Two-Key CBC MAC. In: Joye, M. (ed.) CT-RSA 2003. LNCS, vol. 2612, pp. 33–49. Springer, Heidelberg (2003) 13. Luby, M., Rackoff, C.: How to construct pseudo-random permutations from pseudorandom functions. SIAM Journal on Computing archive 17(2), 373–386 (1988) 14. Minematsu, K., Matsushima, T.: Improved Security Bounds for PMAC, TMAC, and XCBC. In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 434–451. Springer, Heidelberg (2007) 15. Nandi, M., Mandal, A.: Improved Security Analysis of PMAC. Journal of Mathematical Cryptology 2(2), 149–162 (2008) 16. Merkle, R.: One Way Hash Functions and DES. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 428–446. Springer, Heidelberg (1990) 17. Nandi, M.: A Unified Method for Improving PRF Bounds for a Class of Blockcipher based MACs. Cryptology eprint archive 2009/014 (2009) 18. Nandi, M.: Improved security analysis for OMAC as a pseudorandom function. Journal of Mathematical Cryptology 3(2), 133–148 (2009) 19. Nandi, M.: Fast and Secure CBC-Type MAC Algorithms. In: Dunkelman, O. (ed.) FSE 2009. LNCS, vol. 5665, pp. 375–393. Springer, Heidelberg (2009) 20. Nandi, M.: A Simple and Unified Method of Proving Indistinguishability. In: Barua, R., Lange, T. (eds.) INDOCRYPT 2006. LNCS, vol. 4329, pp. 317–334. Springer, Heidelberg (2006) 21. Patarin, J.: Etude des G´en´erateurs de Permutations Bas´es sur le Sch´ema du D.E.S., Phd Th`esis de Doctorat de l’Universit´e de Paris 6 (1991) 22. Petrank, E., Rackoff, C.: CBC MAC for real-time data sources. Journal of Cryptology 13(3), 315–338 (2000) 23. Pietrzak, K.: A Tight Bound for EMAC. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 168–179. Springer, Heidelberg (2006) 24. Sarkar, P.: Pseudo-Random Functions and Parallelizable Modes of Operations of a Block Cipher, http://eprint.iacr.org/2009/217
25. Vaudenay, S.: Decorrelation over infinite domains: the encrypted CBC-MAC case. Communications in Information and Systems (CIS) 1, 75–85 (2001) 26. Vaudenay, S.: Decorrelation: A Theory for Block Cipher Security. J. Cryptology 16(4), 249–286 (2003)
Appendix: Proof of the Decorrelation Theorem

W.l.o.g. we consider a deterministic distinguisher A and assume the queries are distinct. So the final output of A depends only on the responses w_1, ..., w_q. Let S ⊆ F_{2^n}^q be the set of all possible q-tuples of responses on which A returns 1. For any fixed w := (w_1, ..., w_q), let M := M(w) = (M_1, ..., M_q) be the corresponding distinct queries; these queries are fixed and independent of the oracles. Let Y be the set of all coordinate-wise distinct elements of F_{2^n}^q. So Pr[A^{D^Π} = 1 : Π ←$ P_n] = Σ_{w∈S} μ_{w,M(w)}, where the probability is over the random choice of Π. Then

Adv^prf_D(A) = | #S/2^{nq} − Σ_{w∈S} Pr[D^Π(M_1) = w_1, ..., D^Π(M_q) = w_q] |
 ≤ #(S \ Y)/2^{nq} + Σ_{w∈S∩Y} | 1/2^{nq} − Pr[D^Π(M_1) = w_1, ..., D^Π(M_q) = w_q] |
 ≤ q(q − 1)/2^{n+1} + ε × #(S ∩ Y)/2^{nq}   (from the given condition)
 ≤ q(q − 1)/2^{n+1} + ε.
How to Thwart Birthday Attacks against MACs via Small Randomness Kazuhiko Minematsu NEC Corporation, 1753 Shimonumabe, Nakahara-Ku, Kawasaki, Japan
[email protected]
Abstract. The security of randomized message authentication code, MAC for short, is typically depending on the uniqueness of random initial vectors (IVs). Thus its security bound usually contains O(q 2 /2n ), when random IV is n bits and q is the number of MACed messages. In this paper, we present how to break this birthday barrier without increasing the randomness. Our proposal is almost as efficient as the well-known Carter-Wegman MAC, uses n-bit random IVs, and provides the security bound roughly O(q 3 /22n ). We also provide blockcipher-based instantiations of our proposal. They are almost as efficient as CBC-MAC and the security is solely based on the pseudorandomness of the blockcipher. Keywords: Message Authentication Code, Birthday Bound, Mode of Operation.
1 Introduction
Message Authentication Code. Message Authentication Codes (MACs) are symmetric cryptographic functions used to ensure the authenticity of messages. Their usage is as follows. When Alice wants to send a message M, she computes a MAC function that takes M, a secret key K, and possibly an auxiliary variable called an IV (initial vector), and obtains an authentication tag T as output. She then sends (IV, M, T) to Bob, who shares K. Bob verifies whether (IV, M, T) is authentic by computing the MAC on (IV, M) and K to obtain the local tag T', and checking whether T' matches T. If the IV is a nonce, e.g., a counter, the MAC is said to be stateful. If the IV is random, the MAC is said to be (stateless but) randomized. An adversary observes valid (IV, M, T) tuples and tries to make a forgery, i.e., a new tuple (IV', M', T') which Bob accepts as authentic. If this is hard, we say the MAC is strongly unforgeable [2].

Security of Hash-then-Mask. To build an IV-based MAC, a common approach is Carter and Wegman's [11]: it uses an ε-almost-XOR-universal (AXU, see Sect. 2) hash function H : {0, 1}* → {0, 1}^π and a pseudorandom function (PRF) F : {0, 1}^n → {0, 1}^π, and produces the π-bit tag T = F(IV) ⊕ H(M) for message M using an n-bit IV. We call this structure Hash-then-Mask (HtM). It is denoted by Π^rnd_{n,π,ε} when the IV is random, and by Π^ctr_{n,π,ε} when the IV is a nonce. Let us take a close look at the security of HtM against attacks with q tagging queries and q_v verification queries (see Sect. 2), where the goal of the attack is to
break the strong unforgeability. For Π^ctr_{n,π,ε}, it is well known that the forgery probability is at most εq_v for any q ≤ 2^n [4][9], apart from a term for the computational security of F. However, in the case of Π^rnd_{n,π,ε} the forgery probability degrades to q²/2^n + εq_v, as IVs may collide with probability about q²/2^n, that is, the birthday bound.¹ In fact, it is easy to prove that the above bound is tight in q (see Sect. 3). This degradation is non-negligible when n is relatively small, say 64. In addition, as pointed out by many researchers [3][16], the use of a nonce is sometimes impractical. Hence it is natural to ask whether we can break the above-mentioned birthday bound without being stateful. A trivial solution is a longer random IV: the randomized HtM with a 2n-bit IV (Π^rnd_{2n,π,ε}) provides the bound q²/2^{2n} + εq_v, where F is a PRF with 2n-bit inputs. However, this is problematic since (1) a long random IV increases the communication cost and the sender's effort for generating randomness, and (2) the need for a 2n-bit-input PRF instead of an n-bit-input one limits applicability. The second problem can be avoided by MACRX3 [3], which uses three n-bit-input PRFs and an ε-AXU hash with π-bit output and achieves O(q³/2^{3n} + εq_v)-security.² Unfortunately, MACRX3 requires an even longer, 3n-bit random IV and thus still fails to avoid the first problem. As solutions to both problems, RMAC [16] and FRMAC [17] are known. They use an n-bit random IV and an n-bit blockcipher. The bound of RMAC is O(σ/2^n), where σ is the total number of message blocks over all tagging and verification queries; FRMAC has a similar bound. However, their security proofs rely on a controversial assumption on the internal blockcipher [29][18].
Our Contribution. From the above discussion, what matters is to build a randomized MAC with an n-bit IV whose security bound is better than O(q²/2^n) under standard assumptions. For this purpose, we first allow the use of a 2n-bit-input PRF combined with a universal hash having n-bit output. Our proposal, called RWMAC, is simply a randomized version of a nonce-based MAC called WMAC [8] (and is almost the same as a function appearing in the proof of FRMAC [17]). With an n-bit random IV and π-bit tag, RWMAC has O(q²ε/2^n + q_v(ε + 1/2^π))-security when the universal hash is ε-almost universal (ε-AU). As ε ≥ 1/2^n, we can achieve O(q²/2^{2n} + q_v/2^n)-security at best. Although the proposal itself is not so new, we think our security proof is new and non-trivial. Naturally, the next step is to build a randomized MAC with an n-bit random IV and an n-bit-input PRF, which appears much more challenging. We present a solution, called Enhanced Hash-then-Mask (EHtM), which is the main contribution of this paper. EHtM is very efficient, as it uses only two calls to n-bit-input PRFs and one call to an ε-AXU hash with n-bit output. The tag length π can be set to any value up to the output length of the PRF. In return for this excellent property, the security bound is O(q³ε/2^n + q_v(ε + 2^{−π})), thus O(q³/2^{2n} + q_v/2^n) at best (when ε = O(1/2^n) and π = n). Hence, our scheme certainly provides a
2
“Birthday bound” is somewhat confusing since randomized MAC has many parameters, such as tag length, IV length, etc. In this paper, we exclusively use this word to express the term O(q 2 /2n ) in the bound of randomized MACs with n-bit IVs. In the sense of weak unforgeability. See Sect.2 for definition.
232
K. Minematsu
Table 1. Profiles of randomized MACs. We set π = n for the compatibility with RMAC and FRMAC. We assume n-bit messages. Hu [i, j] (Hxu [i, j]) denotes -AU (-AXU) hash function of i-bit input and j-bit output. F[i, j] denotes a PRF of i-bit input and jbit output, and P[i] denotes an i-bit keyed permutation, i.e., a blockcipher. In deriving the bounds of RWMAC and RMAC, we use σ ≤ (q + qv ) for simplicity. The symbol indicates that the security proof requires a stronger assumption than the PRP, such as the ideal-cipher model, for P[n]. MAC Rand Randomized Hash-then-Mask n MACRX3 [3] 3n RMAC [16] n FRMAC [17] n RWMAC (this paper, similar to [8]) n Enhanced Hash-then-Mask (this paper) n
Efficiency 1Hxu [n, n] + 1F[n, n] 1Hxu [n, n] + 3F[n, n] ( + 1)P[n] 1Hu [n, n] + 1P[n] 1Hu [n, n] + 1F[2n, n] 1Hxu [n, n] + 2F[n, n]
Security O(q 2 /2n + qv ) O(q 3 /23n + qv ) O((q + qv )/2n ) O((q + qv )) O(q 2 /2n + qv ) O(q 3 /2n + qv )
security beyond the birthday bound, however, its bound is generally inferior to that of RWMAC. The profiles of randomized MACs3 are briefly summarized in Table 1. Table 1 clearly shows that the complexity (both computation and communication) of EHtM is the closest to that of the original randomized HtM among others. Mode of Operation. EHtM is a generic construction. This generality allows us to various instantiations. Among them, we present two blockcipher modes called MAC-R1 and MAC-R2. Their complexities are almost the same as that of CBCMAC. To prove its security, we only require that the underlying blockcipher is a pseudorandom permutation (PRP). This is a crucial difference from RMAC, which is also based on CBC-MAC but requires the ideal-cipher model for its security, which is highly problematic as shown by, e.g., Knudsen and Kohno [18]. The concrete bounds of MAC-R1 and MAC-R2 are slight worse than the original EHtM using n-bit PRFs. Still, there is a remarkable gain from CBCMAC and its variants. We think our proposals will be good practical MACs using 64-bit blockciphers, thus suited to resource-constrained environments. A detailed, quantitative comparison will be given in Sect. 6.3.
2
Preliminaries
Basic Notations. A random variable and its sampled value are written by a capital and the corresponding small letters. A sequence of random variables is def written as X i = (X1 , X2 , . . . , Xi ). {0, 1}n is denoted by Σ n , and Σ ∗ denotes 3
A randomized MAC of Dodis et al. [12] also aims at reducing the bound via small randomness. However the scope is different from us. Their purpose is to reduce the security degradation with respect to (not q) due to the use of non-optimal universal hash. Their proposal still contains O(q 2 /2n ) if n-bit universal hash is used.
How to Thwart Birthday Attacks against MACs
233
the set of all finite-length bit sequences, including the empty string φ (which is a unique element of Σ 0 ). The bit length of x is denoted by |x|, with |φ| = 0. A concatenation of two binary sequences, x and y, is written as xy. For any x and π ≤ |x|, chopπ (x) is the first π bits of x. A keyed function is written by a capital letter, and if it has n-bit inputs and m-bit outputs it is written as F : Σ n → Σ m , i.e., we omit the description of key space. F (∗w) is a keyed function Σ n−|w| → Σ m . In particular, the uniform random function (URF) : Σ n → Σ m is denoted by Rn,m . This is a random function whose distribution is uniform over {f : Σ n → Σ m }. The n-bit uniform random permutation (URP), denoted by Pn , is a random permutation with a uniform distribution over all permutations of Σ n . Definition 1. Let H : Σ ∗ → Σ n be a keyed function. If Pr[H(x) = H(x )] ≤ () holds for any distinct x, x with max{|x|, |x |} ≤ n, where probability is defined by H’s key, H is said to be ()-almost universal (()-AU). In addition, if Pr[H(x) ⊕ H(x ) = y] ≤ () holds for any y ∈ Σ n and distinct x, x with max{|x|, |x |} ≤ n, H is said to be an ()-almost XOR universal (()-AXU). We also say H is universal (XOR-universal) if () is minimum, i.e., when H is 1/2n -AU (1/2n -AXU). For any keyed function F , Advprf F (q, τ ) denotes the maximum advantage [1] in distinguishing F from a URF having the same input/output domains using q chosen-plaintext queries and computational complexity τ . Moreover, for any keyed permutation E over Σ n , Advprp E (q, τ ) denotes the maximum advantage in distinguishing E from Pn . Definition 2. A randomized MAC function with η-bit randomness and π-bit tag is defined as a keyed function F : Σ η × Σ ∗ → Σ π . A query to the tagging oracle (called a tagging query) is a message M ∈ Σ ∗ , and the corresponding answer is (U, T ) ∈ Σ η × Σ π , where U is independent and uniform over η bits, and T = F(U, M ). A query to the verification oracle (called a verification query) , M , T) and the corresponding answer, written as a binary digit B, is a tuple (U , M ) and 0 otherwise. is 1 if T = F(U Here, F does not produce U on its own. For any F we implicitly assume the can be arbitrarily chosen. uniform distribution of U . In a verification query, U As mentioned in Introduction, the adversary’s goal is to create a forgery in the sense of strong unforgeability [2] defined as follows. Definition 3. A (q, qv , , τ )-forger, A, against a randomized MAC, F : Σ η × Σ ∗ → Σ π , is an entity that performs q tagging queries and qv verification queries, where every message is at most n-bit and A’s total computational complexity is τ . We use subscripts to express the ordinal number of queries, e.g., Mi denotes j , M j , Tj ) the i-th tagging query. If (U = (Ui , Mi , Ti ), i = 1, . . . , q, and Bj = 1 j , M j , Tj ) is called a successful forgery. holds for some j ∈ {1, . . . , qv }, (U Note that Mi can depend on U i−1 , M i−1 , and T i−1 but not depend on Ui .
234
K. Minematsu
Strong and Weak Unforgeabilities. If we require a stricter condition that j M = Mi for i = 1, . . . , q, we call the corresponding security notion the weak unforgeability. This notion is defined as (mere) unforgeability by Bellare et al. [2]. See [2] for the technical differences in strong and weak unforgealibities. Definition 4. For any forger A and randomized MAC F, the forgery probability is the probability that A produces at least one successful forgery (in the sense of Def. 3) for F. The maximum forgery probability for all (q, qv , , τ )-forgers is denoted by FPF (q, qv , , τ ). By omitting τ we mean the maximum informationtheoretic forgery probability, i.e., FPF (q, qv , ) means FPF (q, qv , , ∞). As pointed out by [6], if we focus on the first successful forgery, we only need to consider forgers that first perform q tagging queries and then perform qv verification queries. I.e., the game is divided into the consecutive two phases; the tagging and verification phases. This restriction does not increase the chance of single successful forgery. Also, the verification phase can be defined as a batch process, j , M j , Tj ) is a (possibly non-deterministic) function of (U q , M q , T q ) and i.e., (U j−1 , M j−1 , Tj−1 , B j−1 ). However, these conventions will not dependent on (U not work if we focus on other security notions, see [8][23].
3
Randomized WMAC
Limitation of Hash-then-Mask. Let us consider a randomized HtM with rnd n-bit IV, π-bit tag, defined as Πn,π, in Introduction. The components are H : ∗ π Σ → Σ which is -AXU and F : Σ n → Σ π which is URF. Then we have FPΠ rnd
n,π,()
(q, qv , ) ≤ q 2 /2n+1 + ()qv ,
(1)
ctr since the bound of Πn,π, is qv [9] and the forgery probability under random IVs is at most the sum of forgery probability under distinct random IVs (i.e., nonce) and the probability of IV collision, which is at most q2 /2n ≤ q 2 /2n+1 . In fact, the above bound is tight as 2n/2 tagging queries are enough to break rnd . The attack is as follows: Πn,π,
1. Make j tagging queries with distinct M j where a collision Ui = Uj for some i < j occurs. 2. Let Mj+1 = Mj . Check if Uj+1 = Uj holds (otherwise try another query with the same message). , M , T) = (Uj+1 , Mi , Ti ⊕ Tj ⊕ Tj+1 ). 3. Make a verification query as (U As Ti ⊕Tj ⊕Tj+1 = F (Uj+1 )⊕H(Mi ), T is a valid tag for a new tuple (Uj+1 , Mi )4 . The attack succeeds with probability almost 1 if we use 2n/2 queries in the step 1. Since the attack does not exploit any specific properties of H and F , it works 4
Here we break the strong unforgeability: it is open if the bound is also tight for the weak unforgeability.
How to Thwart Birthday Attacks against MACs
235
for any randomized HtM5 . Hence, to break the bound O(q 2 /2n ) while keeping the n-bit random IV, we need a different structure from Hash-then-Mask. Randomized WMAC. To avoid the above attack, a promising solution is to process the n-bit hash value, S = H(M ), and the n-bit random IV, U , together with a 2n-bit-input PRF, G. More precisely, the tag T ∈ Σ π for M is generated as T = G(U, H(M )), where H : Σ ∗ → Σ n is ()-AU and G : Σ 2n → Σ π is a PRF. This MAC is denoted by RWMAC[H, G] as it is a randomized version of WMAC [8], a nonce-based MAC. Indeed, RWMAC offers a very high security, since neither an S-collision nor a U -collision can be noticed by adversary, unless both collisions occur simultaneously. The security bound is as follows6 . Theorem 1. If H is ()-AU and q ≤ min{2n−2 , 2n · ()−1 }, FPRWMAC[H,G] (q, qv , , τ ) = Advprf G (q + qv , τ + O(q + qv )) () 1 + q 2 n+1 + qv 2(n − 1)() + π . 2 2 The proof of Theorem 1 is in Appendix A. The structure of the proof is the same as that of our main theorem (Theorem 2), but details are much simpler.
4
Enhanced Hash-then-Mask
Although RWMAC provides a very high security, a big problem still remains: it needs G, a PRF with 2n-bit input, while the original HtM is based on a PRF with n-bit input. One may try some domain extension scheme of an n-bitinput PRF to obtain a 2n-bit-input PRF. However, most known schemes such as CBC-MAC, are only O(q 2 /2n )-secure, thus can not be used for our purpose. One workable scheme of Maurer [21] is a composition of a keyed function that diffuses a 2n-bit input to a cn-bit output for some c ≥ 2 and an encryption function consisting of c PRFs aligned parallel. The output is the sum of each c PRFs’ outputs. The security bound is O(q c+1 /2cn) [21]. However, it is still cumbersome to implement this diffuse-encrypt-xor scheme, as the diffusion must be 2c-locally-uniform [21], which is much costly than the universal hash functions even for a small c. Nevertheless, there seems a chance of a simpler domain extension scheme, because inputs to G of RWMAC can not be arbitrarily chosen. We will prove that this intuition is true: 2n-bit PRF of RWMAC can be safely substituted with an extremely simple function using two n-bit PRFs. The concrete proposal and its security bound is in the following. Definition 5. Let H : Σ ∗ → Σ n and Fi : Σ n → Σ n for i = 1, 2. The enhanced hash-then-mask (EHtM) with π-bit tags (for some π ≤ n) is defined 5 6
This attack has some similarities to the L-collision attack by Semanko [26], though the targets of attacks are different. An equivalent to RWMAC was appeared in Lemma 4 of [17] and the bound O(σ()) was claimed, though we did not scrutinize the proof.
236
K. Minematsu
as EHtM[H, F1 , F2 ](U, M ) = chopπ (U, F1 (U ) ⊕ F2 (H(M ) ⊕ U )) for message M ∈ Σ ∗ , where U ∈ Σ n is independent and uniformly random. def
Theorem 2. Let H : Σ ∗ → Σ n be ()-AXU. Let F1 and F2 be independentlykeyed instances of F : Σ n → Σ n . Then we have prf
FPEHtM[H,F1 ,F2 ] (q, qv , , τ ) ≤ 2AdvF (q + qv , τ ) q 3 () 1 1 + + 3n + qv 4() + π , 6 2n 2 2 −1/3 . Here τ = τ + O(q + qv ). if q ≤ 3 ()/2n + 1/23n Hence, EHtM is secure if q (6·2n ·())−1/3 and qv min{2π , ()−1 } hold. In other words, EHtM guarantees about 2n/3-bit security for q and π-bit security for qv , if ∼ 1/2n. Random IV U
Message M
n
H n
F2
F1
chop π
Tag T
Fig. 1. Enhanced Hash-then-Mask
5
Proof of Theorem 2
Overview. Let us denote two independent n-bit block URFs by R(1) and R(2) . We define EH as EHtM[H, R(1) , R(2) ] with an ()-AXU hash, H, and assume some π ≤ n. We here prove a bound of FPEH (q, qv , ). Computational counterpart is easy, thus omitted. We first provide an intuition for the proof. Let Si = Ui ⊕ H(Mi ) for i-th tagging query. We observe that the finalization of EH, (U, S) → R(1) (U ) ⊕ R(2) (S), is indistinguishable from a 2n-bit-input URF, if set G = {(U1 , S1 ), . . . , (Uq , Sq )} satisfies two linear conditions. These conditions are related to the linear independence of a characteristic vector matrix formed by G, but weaker than that. Here, if we use the identical URF for processing of U and S, we need the linear independence as in the proof of similar structures [3][21]. We show that, with q ≈ 2n/2 tagging queries to EH the probabilities of violating these conditions are negligible: the one is O(q 3 ()/2n ) and the other is O(q 2 ()/2n ). We also
How to Thwart Birthday Attacks against MACs
237
show that, if the above-mentioned conditions are satisfied for tagging phase, the forgery probability is O(qv (α() + 2−π )), where α is the size of largest class of U (i.e. there is an α-collision but not (α + 1)-collision) in the tagging phase. As U s are perfectly random, the probability of (α + 1)-collision is bounded by O(q α+1 2−nα ), thus taking α = 2 will suffice. Setup. Let Hw(V) denote the Hamming weight of a binary sequence V, and let n Hw(V, V ) be (Hw(V), Hw(V )) for a pair (V, V ). For x ∈ Σ n , we use λ(x) ∈ Σ 2 to denote its characteristic vector (CV) by seeing x as an integer in [0, . . . , 2n − 1]. I.e., Hw(λ(x)) = 1 and the bit 1 is in the x-th coordinate of λ(x). For λ(X) to denote X q ∈ (Σ n )q and I ⊆ {1, . . . , q}, we use I i∈I λ(Xi ). Let Q ⊆ {1, . . . , q} denote the index set of unique (U, M ) pairs, i.e., for any i = j, i, j ∈ Q, (Ui , Mi ) = (Uj , Mj ) holds. Note that, if i ∈ Q there exists j ∈ Q with (Ui , Mi , Ti ) = (Uj , Mj , Tj ), and thus all transcripts outside Q are useless for forgers. Here, Q is a random variable whose probability is defined by EH and the forger, and we assume Q is uniquely determined for any fixed (U q , M q ) = (uq , mq ). We will use the following probabilistic events defined on {(Ui , Si )}i∈Q , where Si = Ui ⊕ H(Mi ) as mentioned. – Collision-freeness: cfq = [(Ui , Si ) = (Uj , Sj ) for all distinct i, j ∈ Q]. – Linear independence: def lidq = [Hw( I λ(U ), I λ(S)) = (0, 0) for all I ⊆ Q, |I| = even ≥ 2]. – Non-two-vulnerability : def = (1, 1) for all I ⊆ Q, |I| = odd ≥ 3]. ntvq = [Hw( I λ(U ), I λ(S)) – The size of U ’s largest equivalent class is at most α: def eqs(α) = [maxi ec(Ui ) ≤ α], where ec(Ui ) = |{j ∈ {1, . . . , q} : Uj = Ui }|. def
For convenience, when |Q| = 1, cfq and lidq are defined as true. When |Q| ≤ 2, ntvq is defined the same as lidq . With this convention, ntvq → lidq → cfq holds true (proof for |Q| ≤ 2 is trivial, and proof for |Q| ≥ 3 is obtained via taking contraposition). For a forger A and a MAC F, let P AF denote the probability space defined by A and F (following Defs. 2 and 3). Furthermore, we def define νq,qv , (F, E) = maxA:(q,qv ,)-forger P AF (E) as the maximum probability of event E. The maximum conditional probability of E given another condition E is similarly defined and denoted by νq,qv , (F, E|E ). We also define a weak form of adversary. If A’s tagging and verification queries are independent of T q , j , M j , Tj ) is made from (U q , M q ) for i.e., Mi is made from U i−1 M i−1 and (U all i ≤ q and j ≤ qv , A is said to be T -independent7 . We define μq, (F, E) as the maximum probability of E under all T -independent (q, qv , )-forgers. If E is defined for tagging phase (that is, the probability of E is independent of the result of verification phase), we simply write νq, (F, E) or μq, (F, E). For i = 1, . . . , qv , def let suci denote the event Bi = 1 (see Def. 3) and let suc = suc1 ∨ · · · ∨ sucqv . 7
Here, T -independent forger is stronger than non-adaptive one, who determines M q independent of (U q , T q ).
238
K. Minematsu
Now we have FPEH (q, qv , ) = νq,qv , (EH, suc) ≤ νq,qv , (EH, suc|eqs(α) ∧ ntvq ) + νq, (EH, eqs(α)) + νq, (EH, ntvq ).
(2)
In the following, we analyze each of the three terms in the r.h.s. of Eq. (2). Analysis of the Third Term. Let RW be an idealized RWMAC with n-bit IV and π-bit tag, defined as RW(U, M ) = R2n,π (U, U ⊕ H(M )), where H : Σ ∗ → Σ n is the same as one used by EH. ntvq and cfq are similarly defined with Si = Ui ⊕ H(Mi ). Proposition 1. Let Func ∈ {EH, RW}. Then for E ∈ {lidq , ntvq } we have P Func (Tq = tq |U q = uq , M q = mq , T q−1 = tq−1 , E) =
1 2π
(3)
= (ui , mi ) holds for all possible arguments (tq , tq−1 , uq , mq ), as long as (uq , mq ) for all i ≤ q − 1 (that is, q ∈ Q). Moreover, νq, (EH, ntvq ) = νq, (RW, ntvq ) = μq, (RW, ntvq ) holds.
(4)
Proof. The proof is based on Maurer’s methodology [21]. See Appendix B. Proposition 2. cfq ∧ ntvq is equivalent to the event that there exist distinct i, j, k ∈ {1, . . . , q}, satisfying Ui = Uj = Uk and Si = Sj = Sk with Mi = Mj = Mk (here Mi = Mk is possible), and does not exist distinct i , j ∈ {1, . . . , q} such that (Ui , Si ) = (Uj , Sj ) with Mi = Mj . Proof. See Appendix C. Let T be the set of all T -independent (q, qv , )-forgers. Now we have μq, (RW, ntvq ) ≤ μq, (RW, cfq ) + μq, (RW, cfq ∧ ntvq ) = max P
(5)
BRW ∃
( distinct i, j ∈ {1, . . . , q} : Ui = Uj , Si = Sj , Mi = Mj )
B∈T
+ max P BRW (∃ distinct i, j, k ∈ {1, . . . , q} : Ui = Uj , Sj = Sk , Mi = Mj = Mk ), B∈T
≤ max P BRW (Ui = Uj , H(Mi ) = H(Mj ), Mi = Mj ) 1≤i<j≤q
+
B∈T
max P BRW (Ui = Uj , H(Mj ) + Uj = H(Mk ) + Uk , Mi = Mj = Mk ),
distinct i,j,k ∈{1,...,q}
B∈T
(6) where the first inequality follows from union bound, the second follows from the definition of cfq and Proposition 2. Clearly, for any B ∈ T we have = Mj ), P BRW (Ui = Uj , H(Mi ) = H(Mj ), Mi = P BRW (H(Mi ) = H(Mj ), Ui = Uj |Mi = Mj ) · P BRW (Mi = Mj ), 1 1 ≤ max Pr(H(mi ) = H(mj )) · n ≤ () · n , 2 2 mi =mj ,|mi |,|mj |≤n
(7)
How to Thwart Birthday Attacks against MACs
239
as B is T -independent (thus Ui , Uj , Mi , Mj are independent of H’s key) and H is ()-AXU, and that Ui , Uj are uniformly random. In addition, we observe that P BRW (Ui = Uj , H(Mj ) + Uj = H(Mk ) + Uk , Mi = Mj = Mk ) = P BRW (H(Mj ) + Uj = H(Mk ) + Uk |Ui = Uj , Mi = Mj = Mk ) · P BRW (Mi = Mj = Mk |Ui = Uj ) · P BRW (Ui = Uj ), 1 1 ≤ max Pr(H(mj ) + uj = H(mk ) + uk ) · n ≤ () · n mi =mj =mk ,uj ,uk , 2 2
(8) (9)
|mi |,|mj |,|mk |≤n
from the same reason as above8 . From Eqs. (4) to (9), we have q q () () νq, (EH, ntvq ) ≤ + = (q 3 − q) . 3 2 2n 6 · 2n
(10)
Analysis of the Second Term. Clearly, the probability of eqs(α) is bounded as νq, (EH, eqs(α)) ≤ Pr(∃ distinct i1 , i2 , . . . , iα+1 : Ui1 = Ui2 = · · · = Uiα+1 ) q 1 ≤ . (11) nα α+1 2 Analysis of the First Term. We have the following lemma. Lemma 1. If νq, (EH, eqs(α) ∧ ntvq ) ≤ 1/2,
1 νq,qv , (EH, suc|eqs(α) ∧ ntvq ) ≤ qv 2α() + π . 2
The proof of Lemma 1 is in Appendix D. Combining Terms. From Eqs. (2), (10), (11), and Lemma 1, FPEH (q, qv , ) ≤ q 1 () 3 2α() + 21π for any positive integer α ≥ 2, if α+1 2nα + (q − q) 6·2n + qv q 1 () 3 νq, (EH, eqs(α) ∧ ntvq ) ≤ α+1 2nα + (q − q) 6·2n ≤ 1/2. By setting α = 2 we conclude the proof.
6
Blockcipher-Based Instantiations
6.1
A CBC-Based Mode
The generality of our EHtM allows us to derive various concrete instantiations. Here, we present two blockcipher modes of operation. They look similar to 8
At a glance, p = P BRW (H(Mj ) + Uj = H(Mk ) + Uk |Ui = Uj , Mi = Mj = Mk ) seems 1/2n irrespective of H as Ui , Uj , and Uk are independent and uniform. This is wrong if i < k < j and H is (e.g.) identity function for n-bit inputs: by choosing Mk = Ui and Mj = Uk , p is 1. Moreover p is 1 if H is (a special class of) AU but not AXU. Thus being AU is not the sufficient condition for H.
240
K. Minematsu
RMAC [16]. However they are provably secure on the pseudorandomness of the blockcipher whereas RMAC needs the ideal-cipher model (ICM). Our modes use CBC-MAC and a collision-free message padding, pad : Σ ∗ → i=0,1,... (Σ n )i . For input x, pad appends 10|x| mod n−1 to x if |x| mod n = 0, otherwise appends 10n−1 , then partitions the appended x inton-bit blocks. For empty string φ, we define pad(φ) = 10n−1 . Let CBC[EK ] : i=1,... (Σ n )i → Σ n be CBC-MAC using EK : Σ n → Σ n . For x = (x1 , . . . , x ) ∈ (Σ n ) , CBC[EK ](x) = Y , where Yi = EK (xi ⊕ Yi−1 ) for i ≥ 1 and Y0 = 0n . Our first proposal, MAC-R1, uses two blockcipher keys and is as follows. Definition 6. The mode MAC-R1 generates the π-bit tag, T , for message M ∈ Σ ∗ , using (n − 1)-bit random IV, U , as T = chopπ (EK2 (U 0) ⊕ EK2 (S1)), where S denotes U ⊕ chopn−1 (CBC[EK1 ](pad(M ))). Here K1 and K2 are two keys of an n-bit blockcipher, EK . Fig. 2 depicts MAC-R1, where an internal chop is substituted with a logical OR. One may wonder if this really keeps the security beyond the birthday bound, as the use of PRP-PRF switching lemma will bring O(q 2 /2n ) into the bound. However, this problem is circumvented by the use of Bernstein’s lemma [7] instead of the switching lemma9 . The security bound of MAC-R1 is as follows. Random IV U n-1
Message M[1]
M[0]
M[L]||10n-|M[L]|-1 …
…
||0 n
K1
E
K1
K1
E
E n
n
0n-11 K2
E
K2
E
chop π
Tag T
Fig. 2. MAC-R1 when the last message block is partial
Corollary 1. Let cbc() = 2d( + 1)/2n + 64( + 1)4 /22n , where d(x) denotes def the maximum number of positive integers that divide h, for all h ≤ x. Let δ(a) = a − 2 1 − a−1 . Then, we have 2n def
∗ FPMAC-R1[EK1 ,EK2 ] (q, qv , , τ ) ≤ 2Advprp E (q1 , τ + O(q + qv )) 3 q 2cbc( + 1) 4 1 + + + q 8 ( + 1) + · δ(q2∗ ), v cbc 3 2n 23n 2π 9
Bernstein’s lemma is useful to derive a bound for the ratio (rather than the difference) of two game probabilities where one involves URP and the other involves URF.
How to Thwart Birthday Attacks against MACs
where q1∗ = (q + qv )( + 1), q2∗ = 2(q + qv ), if q 3 ≤ 1.5
2cbc (+1) 2n
+
4 23n
−1
241
.
(2) Proof. Let P(1) n and Pn be independent n-bit URPs. Using Bernstein’s lemma (Theorem 2.2 of [7]), we have
FPMAC-R1[P(1) ,P(2) ] (q, qv , ) ≤ FPR1PR (q, qv , ) · δ(q2∗ ), n
n
(12)
where R1PR denotes MAC-R1[P(1) n , Rn,n ] (recall Rn,n is an n-bit block URF). As a pair of functions (Rn,n (∗0), Rn,n (∗1)) is equivalent to a pair of independent URFs : Σ n−1 → Σ n , R1PR is a complete instantiation of EHtM with (n − 1)bit random IV (and hash value). We then need to analyze the hash function of R1PR, namely HR1PR = chopn−1 ◦ CBC[P(1) n ] ◦ pad. From Bellare et al. [4] and its extension [24], CBC[Pn ] is cbc()-AXU, and thus HR1PR is 2cbc( + 1)AXU. Combining this observation and Theorem 2 proves that FPR1PR (q, qv , ) 3 4 1 + + q is at most q3 2cbc2(+1) n v 8cbc ( + 1) + 2π . From this and Eq. (12), 23n we prove the information-theoretic version of Corollary 1. The computational counterpart is easy. Inside the Bound. We confirmed that δ(q2∗ ) is well approximated via the firstorder approximation, (1 + (q2∗ )2 /2n+1 ), when q2∗ ≤ 2n/2 . Thus MAC-R1’s bound is about q 3 cbc()/2n + qv (cbc () + 1/2π ) when q + qv ≤ 2n/2−1 . Here, cbc () grows much slower than /2n (see [4]). When q2∗ exceeds 2n/2 , δ(q2∗ ) rapidly grows and the bound quickly reaches 1. From this, the bound is almost 1 when q = 2n/2+c for a small positive constant c. This seemingly contradicts with our proposition, but the bound is still negligibly small when q = 2n/2 . This can be verified by numerical results given in Fig. 3. 6.2
CBC-Based, More Secure Mode
As mentioned, the bound of MAC-R1 quickly reaches one as q exceeds 2n/2 . To overcome this problem, we consider a different finalization : (Σ n−2 )2 → Σ n as DTWIN[EK ](x, x ) = EK (x00) ⊕ EK (x10) ⊕ EK (x 01) ⊕ EK (x 11). (13) def
Definition 7. The mode MAC-R2 generates the π-bit tag, T , for message M ∈ Σ ∗ , using (n − 2)-bit random IV, U , as T = chopπ (DTWIN[EK2 ](U, S)), where S is n − 2 bits and defined as U ⊕ chopn−2 (CBC[EK1 ](pad(M ))). To derive a bound, we define TWIN[EK ] : Σ n−1 → Σ n as TWIN[EK ](x) = EK (x0)⊕EK (x1). Here DTWIN[EK ](U, S) corresponds to TWIN[EK ](U 0)⊕ prf TWIN[EK ](S1), and Lucks [20] proved AdvTWIN[Pn ] (q) ≤ 4q/2n + q 3 /3 · 22n−1 . Hence, the concrete bound of MAC-R2 can be derived without Bernstein’s lemma, which is as follows.
242
K. Minematsu
Corollary 2 prp
FPMAC-R2[EK1 ,EK2 ] (q, qv , , τ ) ≤ 2AdvE (2q1∗ , τ + O(q + qv )) q 3 8cbc( + 1) 64 1 8(q + qv ) 16(q + qv )3 + ( + 1) + + + + q 16 + v cbc 3 2n 23n 2π 2n 3 · 22n −1 () 4 if q 3 ≤ 1.5 2cbc + , where cbc() and q1∗ are as defined by Corollary 1. n 3n 2 2 From Corollary 2, the dominant term of MAC-R2’s bound is cbc ()q 3 /2n (without the restriction q + qv < 2n/2−1 ). Thus, MAC-R2 provides the same level of security as that of EHtM with n-bit PRFs. 6.3
A Detailed Comparison
Table. Table 2 presents a detailed comparison of MAC-R1, MAC-R2, and previous MAC modes. Presenting the table is not a straightforward task because of the differences in MAC types, security notions, and parameters. We tried to do a fair comparison while keeping the simplicity. We chose CMAC (a.k.a. OMAC [13]), RMAC, EMAC [10], and MAC-R1 and MAC-R2 with π = n, where n-bit blockcipher is used. The bounds are shown without minor terms. For CMAC and EMAC, only their prf-advantages are published [4][13][14]. For them we have used Proposition 7.3 of [2] to get the bounds of FP. RMAC has several versions, and we employ one defined in [16]. The RMAC proof is based on the ideal-cipher model. For CMAC and RMAC, the bounds using σ (total message blocks of queries) are also known. As σ ≤ (q + qv ) holds we can always translate a bound using σ into one using (, q, qv ). The difference is small unless message length distribution has very long tails. We note that one call (two calls) of blockcipher in MAC-R1 (MAC-R2) can be done only with random IV. Hence, when such precomputation is feasible they will be even faster in practice. Graph. It is still difficult to see the bound shapes from Table 2. Hence, we also perform exact bound computations for n = 64 and 128. The log2 FP – log2 q graphs are shown in Fig. 3. We assume qv = q 1/2 , but the bound shape is almost unchanged if qv is larger, e.g., qv = q. The difference of CMAC and EMAC’s bounds is due to the recent advance in the collision analysis of CBC-MAC [4], and will be smaller if is smaller (or, one can use a result of Nandi [25]). To compute d(), we used that d() < lg2 for < 225 , shown by [4]. This graph enables us to see how much queries or data are acceptable to restrict the forgery probability being smaller than 2−γ , where γ works as a security parameter10 . For example, if we set γ = 20, the maximum acceptable data amount for n = 64 and = 210 is about 14.6 Mbyte for CMAC, 3.2 Gbyte for EMAC, 512.9 Gbyte for RMAC, 40.4 Tbyte for MAC-R1 and 65.6 Tbyte for 10
If we say “it has b-bit security” or “it is secure if q 2b ”, we implicitly assume γ = 0. This is a simple, conventional way. However, it is sometimes too weak to grasp the actual values: q 2 /2n can be much smaller than q/2n/2 but both mean n/2-bit security.
How to Thwart Birthday Attacks against MACs
243
Table 2. Detailed Comparison of MAC Modes MAC CMAC EMAC RMAC MAC-R1 MAC-R2
Key Rand Blockcipher Calls Security Bound 1 − |M |/n + 1 (precomp) σ 2 /2n [14] or 2 (q + qv )2 /2n [13] 2 − (|M | + 1)/n + 1 d()(q + qv )2 /2n [4] n 2 n (|M | + 1)/n + 1 σ/2 [16] or (q + qv )/2n (with ICM) 2 n−1 (|M | + 1)/n + 2 (d()q 3 /22n + d()qv /2n ) · δ(2q + 2qv ) 2 n−2 (|M | + 1)/n + 4 (d()q 3 + qv3 )/22n + (q + d()qv )/2n
20
log FP
40
log q 60
80
10
100
0
0
-20
-10
-40
-20
log FP
-60
20
log q
30
40
50
-30
-80
-40
-100
-50
-120
-60 Legend
Legend MAC-R1 MAC-R2 OMAC EMAC RMAC
MAC-R1 MAC-R2 OMAC EMAC RMAC
Fig. 3. log2 FP – log2 q graphs with qv = q 1/2 . (left) n = π = 128, = 220 (right) n = π = 64, = 210 .
MAC-R2. In this case, our proposal is even superior to RMAC; it is due to a relatively large constant of RMAC bound ((4n + 6)σ/2n is presented in [16]), and the difference in growths of q/2n and q 3 /22n .
Acknowledgments We would like to thank Liang Bo and anonymous referees for helpful comments that improved the paper.
References 1. Bellare, M., Desai, A., Jokipii, E., Rogaway, P.: A Concrete Security Treatment of Symmetric Encryption. In: Proceedings of the 38th Annual Symposium on Foundations of Computer Science, FOCS 1997, pp. 394–403 (1997) 2. Bellare, M., Goldreich, O., Mityagin, A.: The Power of Verification Queries in Message Authentication and Authenticated Encryption. Cryptology ePrint Archive, 2004/309
244
K. Minematsu
3. Bellare, M., Goldreich, O., Krawczyk, K.: Stateless Evaluation of Pseudorandom Functions: Security Beyond the Birthday Barrier. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 270–287. Springer, Heidelberg (1999) 4. Bellare, M., Pietrzak, K., Rogaway, P.: Improved Security Analyses for CBC MACs. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 527–541. Springer, Heidelberg (2005) 5. Bernstein, D.J.: The Poly1305-AES Message-Authentication Code. In: Gilbert, H., Handschuh, H. (eds.) FSE 2005. LNCS, vol. 3557, pp. 32–49. Springer, Heidelberg (2005) 6. Bernstein, D.J.: Stronger Security Bounds for Wegman-Carter-Shoup Authenticators. In: Cramer, R. (ed.) EUROCRYPT 2005. LNCS, vol. 3494, pp. 164–180. Springer, Heidelberg (2005) 7. Bernstein, D.J.: Stronger Security Bounds for Permutations, http://cr.yp.to/papers.html 8. Black, J., Cochran, M.: MAC Reforgeability. In: Dunkelman, O. (ed.) FSE 2009. LNCS, vol. 5665, pp. 345–362. Springer, Heidelberg (2009) 9. Black, J.: Message Authentication Code. PhD dissertation (2000) 10. Bosselaers, A., Preneel, B. (eds.): RIPE 1992. LNCS, vol. 1007. Springer, Heidelberg (1995) 11. Carter, L., Wegman, M.: Universal Classes of Hash Functions. Journal of Computer and System Science 18, 143–154 (1979) 12. Dodis, Y., Pietrzak, K.: Improving the Security of MACs Via Randomized Message Preprocessing. In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 414–433. Springer, Heidelberg (2007) 13. Iwata, T., Kurosawa, K.: OMAC: One-Key CBC MAC. In: Johansson, T. (ed.) FSE 2003. LNCS, vol. 2887, pp. 129–153. Springer, Heidelberg (2003) 14. Iwata, T., Kurosawa, K.: Stronger Security Bounds for OMAC, TMAC, and XCBC. In: Johansson, T., Maitra, S. (eds.) INDOCRYPT 2003. LNCS, vol. 2904, pp. 402– 415. Springer, Heidelberg (2003) 15. Iwata, T.: New Blockcipher Modes of Operation with Beyond the Birthday Bound Security. In: Robshaw, M.J.B. (ed.) FSE 2006. LNCS, vol. 4047, pp. 310–327. Springer, Heidelberg (2006) 16. Jaulmes, E., Joux, A., Valette, F.: On the Security of Randomized CBC-MAC Beyond the Birthday Paradox Limit: A New Construction. In: Daemen, J., Rijmen, V. (eds.) FSE 2002. LNCS, vol. 2365, pp. 237–251. Springer, Heidelberg (2002) 17. Jaulmes, E., Lercier, R.: FRMAC, a Fast Randomized Message Authentication Code. Cryptology ePrint Archive- 2004/166 18. Knudsen, L.R., Kohno, T.: Analysis of RMAC. In: Johansson, T. (ed.) FSE 2003. LNCS, vol. 2887, pp. 182–191. Springer, Heidelberg (2003) 19. Krovetz, T.: Message Authentication on 64-Bit Architectures. In: Biham, E., Youssef, A.M. (eds.) SAC 2006. LNCS, vol. 4356, pp. 327–341. Springer, Heidelberg (2007) 20. Lucks, S.: The Sum of PRPs Is a Secure PRF. In: Preneel, B. (ed.) EUROCRYPT 2000. LNCS, vol. 1807, pp. 470–484. Springer, Heidelberg (2000) 21. Maurer, U.: Indistinguishability of Random Systems. In: Knudsen, L.R. (ed.) EUROCRYPT 2002. LNCS, vol. 2332, pp. 110–132. Springer, Heidelberg (2002) 22. McGrew, D., Viega, J.: The Security and Performance of the Galois/Counter Mode (GCM) of Operation. In: Canteaut, A., Viswanathan, K. (eds.) INDOCRYPT 2004. LNCS, vol. 3348, pp. 343–355. Springer, Heidelberg (2004) 23. McGrew, D., Fluhrer, S.: Multiple forgery attacks against Message Authentication Codes. Cryptology ePrint Archive, 2005/161
How to Thwart Birthday Attacks against MACs
245
24. Minematsu, K., Matsushima, T.: New Bounds for PMAC, TMAC, and XCBC. In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 434–451. Springer, Heidelberg (2007) 25. Nandi, M.: Improved security analysis for OMAC as a pseudorandom function. Journal of Mathematical Cryptology 3(2), 133–148 (2009) 26. Semanko, M.: L-collision Attacks against Randomized MACs. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 216–228. Springer, Heidelberg (2000) 27. Yasuda, K.: A One-Pass Mode of Operation for Deterministic Message Authentication- Security beyond the Birthday Barrier. In: Nyberg, K. (ed.) FSE 2008. LNCS, vol. 5086, pp. 316–333. Springer, Heidelberg (2008) 28. Wegman, M., Carter, L.: New Hash Functions and Their Use in Authentication and Set Equality. Journal of Computer and System Sciences 22, 265–279 (1981) 29. Comments on Draft RMAC Specification, http://csrc.nist.gov/groups/ST/toolkit/BCM/comments.html
A
Proof of Theorem 1
We abbreviate RWMAC[H, R2n,n ] to RW . Let U ∈ Σ n be the random value, and let V = H(M ) ∈ Σ n be the hash value for message M . Then it is trivial to see that the uniqueness of (Ui , Vi ) for all i ∈ Q (see Sect. 5 for definition of Q), denoted by cfq , provides the uniform distribution of tags, T q ∈ (Σ π )q . From this, we easily obtain FPRW (q, qv , ) ≤ νq,qv , (RW , suc|cfq ∧ eqs(α)) + νq, (RW , cfq ) + νq, (RW , eqs(α)) ≤ qv · νq,1, (RW , suc1 |cfq ∧ eqs(α)) + μq, (RW , cfq ) + μq, (RW , eqs(α)) q () q 1 ≤ qv · νq,1, (RW , suc1 |cfq ∧ eqs(α)) + + , (14) n nα 2 2 α+1 2 where event definitions (suc, suc1 , and eqs(α)) and probability definitions (ν , M , T), let V = H(M ). and μ) are the same as Sect. 5. For forgery attempt (U We define col as the event that (U , V ) = (Ui , Vi ) for some i ∈ Q. Now we observe νq,1, (RW , suc1 |cfq ∧ eqs(α)) ≤ νq,1, (RW , col |cfq ∧ eqs(α)) + νq,1, (RW , suc1 |col ∧ cfq ∧ eqs(α)). (15) , M ) is completely unpreHere the last term is 1/2π since the real tag for (U dictable given col ∧ cfq ∧ eqs(α) (the same as Eq. (24)). The remaining task is to evaluate the first term of the r.h.s. of Eq. (15). We have νq,1, (RW , col |cfq ∧ eqs(α)) = μq,1, (RW , col |cfq ∧ eqs(α)), ≤
μq,1, (RW , col |eqs(α)) 1 − μq,1, (RW , cfq ∧ eqs(α))
.
(16) (17)
246
K. Minematsu
We assume the denominator being at least 1/2. The numerator is clearly at most α · () as the target forgers are T -independent and any Ui ’s equivalent class is of size at most α. Thus, we have q () q 1 1 FPRW (q, qv , ) ≤ + + qv 2α · () + π , (18) 2 2n α + 1 2nα 2 q () q 1 n−2 and α = n − 1, we have 2 2n + α+1 2nα ≤ 1/2, for any α ≥ 2. If q ≤ 2 q 1 1 2 n+1 π < . Thus, the above implies q ()/2 + q v (2(n − 1) · () + 1/2 ) 2n α+1 2α when q ≤ min{2n−2 , 2n · ()−1 }. This concludes the information-theoretic part of the proof. The computational part is trivial. if
B
Proof of Proposition 1
For simplicity, we assume π = n and Q = {1, . . . , q} (i.e., all (ui , mi )s are distinct) throughout the proof; proving under this setting is enough to prove other settings. Let FNL : (Σ n )2 → Σ n be the finalization of EH, i.e. FNL(u, s) = R(1) (u)⊕R(2) (s). Note that FNL(u, s) is equivalent to Rn+1,n (U 0)⊕Rn+1,n (S1). Then, a pair of CV (λ(U ), λ(S)) can be expressed as Λ(U, S) = λ(U 0)⊕λ(S1), where λ(U 0) and λ(S1) are 2n+1 bits CVs. Here Λ(U, S)’s weight is always 2, as U 0 and S1 never collide. Then, from Sect. 5.2 of [21] (or [3]), when the set {Λ(U1 , S1 ), . . . , Λ(Uq , Sq )} is linearly independent the outputs of FNL are perfectly random. Since this condition is equivalent to lidq , we have P FNL (Tq = tq |U q = uq , S q = sq , T q−1 = tq−1 , E) = 1/2n
(19)
for all possible arguments, when E = lidq . When E = ntvq , Eq. (19) also holds since ntvq → lidq and that ntvq is defined over (U q , S q ) as well as lidq . As Tq ’s distribution in Eq. (19) is independent of actual values of U q and S q , we immediately obtain P EH (Tq = tq |U q = uq , M q = mq , T q−1 = tq−1 , E) = 1/2n
(20)
for all possible arguments, and for both E = lidq and ntvq . This proves Eq. (3) for Func = EH. When Func = RW, the proof follows from the fact that both ntvq and lidq includes cfq , which assures the uniform distribution of Tq given U q = uq , M q = mq , and T q−1 = tq−1 . We proceed to the proof of Eq. (4). From Eq. (19) it is clear that P FNL (Tq = tq , ntvq |U q = uq , S q = sq , T q−1 = tq−1 , ntvq−1 ) = P R2n,n (Tq = tq , ntvq |U q = uq , S q = sq , T q−1 = tq−1 , ntvq−1 )
(21)
holds for all possible arguments (recall that we assumed unique (uq , sq )). From Lemma 4 of [21], this equality also holds true for FNL ◦ Pre and R2n,n ◦ Pre, for
How to Thwart Birthday Attacks against MACs
247
any independently-keyed pre-processing Pre : (Σ n )2 → (Σ n )2 . Thus, by defining the pre-processing as (U, M ) → (U, U ⊕ H(M )), we obtain P EH (Tq = tq , ntvq |U q = uq , M q = mq , T q−1 = tq−1 , ntvq−1 ) = P RW (Tq = tq , ntvq |U q = uq , M q = mq , T q−1 = tq−1 , ntvq−1 ).
(22)
From Lemma 6 of [21], the above implies νq, (EH, ntvq ) = νq, (RW, ntvq ). In RW, the q tags are independently random as long as ntvq is satisfied, thus the maximum probability of ntvq can be achieved without seeing tags, that is, by T -independent forgers (this is a simple extension of Corollary 1 (iv) of [21]: the difference is that Corollary 1 (iv) of [21] only considers chosen inputs with no randomness while in our case a part of input is independently random). Therefore, we have νq, (EH, ntvq ) = νq, (RW, ntvq ) = μq, (RW, ntvq ), which concludes the proof of Eq. (4).
C
Proof of Proposition 2
Let E = E1 ∧ E2 , where E1 = [∃ i, j, k ∈ Q, Ui = Uj = U k ∧ Si = Sj = Sk ] and def = (Uj , Sj )]. Note that E2 ≡ cfq . Also, it is easy to E2 = [∀i , j ∈ Q, (Ui , Si ) = U k ∧ Si = Sj = see that E1 is equivalent to [∃ i, j, k ∈ {1, . . . , q}, Ui = Uj Sk , M i = Mj = Mk ]. Using this, what we need to prove is cfq ∧ ntvq ≡ E. This equivalence trivially holds when |Q| ≤ 2, as both sides are false in this def case. When q¯ = |Q| ≥ 3, w.l.o.g. we assume the set {(Ui , Mi )}i=1,...,¯q consists of unique elements (i.e., Q = {1, . . . , q¯}). If a subset I ⊆ {1, . . . , q¯} whose size is an odd number ≥ 3 satisfies that Hw( I λ(U, S)) = (1, 1) and any I ⊂ I whose size is an odd number ≥ 3 has Hw( I λ(U, S)) = (1, 1), I is called the minimal index set. When cfq ∧ ntvq holds, there exists at least one minimal index set, which will be denoted by I ∗ (it may not be unique). The set {Ui }i∈I ∗ is uniquely partitioned into equivalent classes, i.e. the sets of identical elements. We say Ui is odd-colliding (evencolliding) if the size of Ui ’s equivalent class in I ∗ is odd (even). We use the same definition for {Si }i∈I ∗ . If Ui and Si are both odd-colliding, we say (Ui , Si ) is an odd-odd pair. In {(Ui , Si )}i∈I ∗ , there is a unique equivalent class of U whose size is odd, and a unique equivalent class of S whose size is odd, too. Here multiple odd-odd pairs do not exist in {(Ui , Si )}i∈I ∗ , as this implies cfq . Moreover, any odd-odd pair does not exist; if it exists when |I ∗ | = 3, cfq occurs by the remaining two pairs, and when |I ∗ | > 3 (as |I ∗ | must be odd, we have |I ∗ | ≥ 5 ), removing the unique odd-odd pair and an even-even pair will result in an index set I ⊂ I ∗ satisfying Hw( I λ(U, S)) = (1, 1), thus contradicting to the minimality of I ∗ . Therefore, we must have at least one odd-even or evenodd pair in I ∗ . Let us assume that (Ui , Si ) is such an odd-even pair. As Si is even-colliding, there exists j = i, j ∈ I ∗ such that Si = Sj . This implies Uj = Ui as Uj = Ui means a (U, S)-collision. From Uj = Ui , we know Uj is even-colliding, and thus there exists k ∈ I ∗ \ {i, j} such that Uj = Uk . Then Sk = Sj holds from cfq . The case that there exists one even-odd pair holds true from the symmetry. def
248
K. Minematsu
This proves the direct part, cfq ∧ntvq → E. The converse clearly holds true, and thus we have cfq ∧ ntvq ≡ E. From the definition of E, the proof is completed.
D
Proof of Lemma 1
First, we have νq,qv , (EH, suc|eqs(α) ∧ ntvq ) ≤ qv · νq,1, (EH, suc1 |eqs(α) ∧ ntvq ), ≤ qv · νq,1, (EH, lidq+1 |eqs(α) ∧ ntvq ) + qv · νq,1, (EH, suc1 |lidq+1 ∧ eqs(α) ∧ ntvq ), (23) ), λ(S))}, where lidq+1 denotes the event that {(λ(Ui ), λ(Si ))}i∈Q ∪{(λ(U where S = U ⊕ H(M ), is linearly independent. Obviously, (U , M ) = (Ui , Mi ) for any , M ) be the real tag for (U , M ). If lidq+1 occurs, Treal i ≤ q. Let Treal = EH(U is uniform and independent of previous transcripts. Hence, Treal is completely unpredictable. This means that νq,1, (EH, suc1 |lidq+1 ∧ eqs(α) ∧ ntvq ) = 1/2π .
(24)
To see νq,1, (EH, lidq+1 |eqs(α) ∧ ntvq ), the occurrence of lidq+1 indicates ), ⊕ λ(U Hw( I λ(U ) I λ(S) ⊕ λ(S)) = (0, 0) for an index set I ⊆ Q. Thus we have Hw( I λ(U ), I λ(S)) = (1, 1). As we have ntvq in the conditional clause, this is impossible if |I| ≥ 3, and also impossible if |I| = 2 (as any index set of even size can not produce (1, 1)). The only possibility is |I| = 1. This def , S) = (Ui , Si )]. Thus we obtain corresponds to the event col = [∃ i ∈ Q, (U νq,1, (EH, lidq+1 |eqs(α) ∧ ntvq ) = νq,1, (EH, col|eqs(α) ∧ ntvq ).
(25)
, M ), and (U q , M q ). When ntvq Note that col is a function of H’s key, (U occurs, any information on H’s key, K, cannot be obtained from T q , as they are independent of K (from Prop. 1). Thus, the maximum of the conditional probability of col given ntvq can be achieved by T -independent forgers. Thus, by defining T as the set of all T -independent (q, qv , )-forgers, we have11 νq,1, (EH, col|eqs(α) ∧ ntvq ) = μq,1, (EH, col|eqs(α) ∧ ntvq ) = max P BEH (col|eqs(α) ∧ ntvq ) B∈T
≤ max B∈T
11
P BEH (col|eqs(α)) , P BEH (ntvq |eqs(α))
Here we derive an upper bound of the probability of a “bad” event B conditioned by a “good” event G. For a (randomized) HtM we need a similar analysis where B is the hash collision between verification and tagging queries, and G is the uniqueness of random IVs. Note that, while the uniqueness of random IVs in HtM gives no information on the hash values, the good event G = eqs ∧ ntv for EHtM may give some, negligble information on the hash values. This is the reason why 2α() is needed rather than α() in Eq. (29).
How to Thwart Birthday Attacks against MACs
≤ max B∈T
≤
P BEH (col|eqs(α)) 1 − P BEH (ntvq ∧ eqs(α))
,
maxB∈T P BEH (col|eqs(α)) 1 − maxB ∈T
P B EH (ntv
q
∧ eqs(α))
249
(26) ≤
μq,1, (EH, col|eqs(α)) 1 − νq, (EH, eqs(α) ∧ ntvq )
, (27)
as T -independent forger is a subclass of normal forger. If μq,1, (EH, col|eqs(α)) is achieved by some B ∗ ∈ T, we have ∗
μq,1, (EH, col|eqs(α)) = P B EH (col|eqs(α))
∗ =u = m, ≤ P B EH (col|U q = uq , M q = mq , U , M eqs(α)) · PB
∗
EH
=u = m|eqs(α)) (U q = uq , M q = mq , U , M
=u = m), ≤ max P EH (∃ i : H(m) = H(mi ), u = ui |U q = uq , M q = mq , U , M
≤ max P H (H(m) = H(mi )) ≤ α(), (28) i∈Q: u=ui
where the first sum and two maximums are taken for (uq , mq , u , m) such that (uq , mq ) satisfies eqs(α) and ( u, m) = ∀(ui , mi ). The third inequality follows and M are independent of H’s key (as B ∗ is T -independent), from that U q , M q , U and the last inequality follows from that |{i : u = ui }| ≤ α as eqs(α), and that H is ()-AXU. From Eqs. (23), (24), (27), (28), we have νq,qv , (EH, suc|eqs(α) ∧ ntvq ) ≤ qv (2α() + 1/2π ) ,
(29)
with the assumption νq, (EH, eqs(α) ∧ ntvq ) ≤ 1/2. This concludes the proof.
Constructing Rate-1 MACs from Related-Key Unpredictable Block Ciphers: PGV Model Revisited Liting Zhang, Wenling Wu, Peng Wang, Lei Zhang, Shuang Wu, and Bo Liang State Key Laboratory of Information Security Institute of Software, Chinese Academy of Sciences, Beijing 100190, P.R. China Graduate University of Chinese Academy of Sciences, Beijing 100049, P.R. China {zhangliting,wwl,zhanglei1015,wushuang,liangb}@is.iscas.ac.cn,
[email protected]
Abstract. Almost all current block-cipher-based MACs reduce their security to the pseudorandomness of their underlying block ciphers, except for a few of them to the unpredictability, a strictly weaker security notion than pseudorandomness. However, the latter MACs offer relatively low efficiency. In this paper, we investigate the feasibility of constructing rate-1 MACs from related-key unpredictable block ciphers. First, we show all the existing rate-1 MACs are insecure when instantiated with a special kind of related-key unpredictable block cipher. The attacks on them inspire us to propose an assumption that all the chaining values are available to adversaries for theoretically analyzing such MACs. Under this assumption, we study the security of 64 rate-1 MACs in keyed PGV model, and find that 1) 15 MACs are meaningless; 2) 25 MACs are vulnerable to three kinds of attacks respectively and 3) 24 MACs are provably secure when their underlying block ciphers are related-key unpredictable. Furthermore, we refine these 24 provably secure rate-1 MACs in Compact PGV model by removing a useless parameter away, and find that the resulting 6 provably secure MACs are in fact equivalent to each other. In the aspect of efficiency, however, the low rate of these secure MACs does not necessarily mean they can run faster than none rate-1 one MACs, due to their large number of key schedules. Keywords: Message Authentication Code, Block Cipher, Mode of Operation, Provable Security.
1 1.1
Introduction Background
In cryptography, block ciphers are symmetric-key primitives, and they can only handle fixed-length messages, such as AES [1]. In order to handle variable-length messages and reach different kinds of security targets, modes of operation for S. Hong and T. Iwata (Eds.): FSE 2010, LNCS 6147, pp. 250–269, 2010. c International Association for Cryptologic Research 2010
Constructing Rate-1 MACs from Related-Key Unpredictable Block Ciphers
251
them are proposed, such as authentication modes, encryption modes and authenticated encryption modes. In this paper, we focus on the design of authentication modes, or block-cipherbased Message Authentication Codes. MACs are widely used to protect data integrity and data origin authentication in communications. To use a MAC, the sender and receiver should share a secret key K beforehand. When sending a message M , the sender computes T ←MAC(K, M ) as a tag, and then sends (M, T ) out. On receipt of a pair (M, T ), the receiver computes T ← MAC(K, M ), and deems message M to be valid only if T = T . The security of a MAC algorithm is evaluated by how unpredictable it is. Informally speaking, an adversary A has access to the MAC algorithm, whose key is randomly selected and kept secret from A. A can query the MAC with any message in the domain, and receives the corresponding tags; in the end, A is asked to make a forgery, i.e. to output a pair (M , T ) such that 1) M was never queried to the MAC algorithm by A and 2) T is the tag of M . The success probability for A to do this is called A’s advantage, and the MAC algorithm is deemed to be secure if all the advantages of reasonably restricted adversaries are sufficiently small. The history of block-cipher-based MACs dates back as early as to CBC-MAC [2]. Although it is secure for fixed-length messages when its underlying block cipher is a PseudoRandom Permutation (PRP) , it is not secure for variable-length messages [3]. Later, several variants of CBC-MAC were proposed to fix this flaw, and usual solutions include different initial and output transformations for CBCMAC, as suggested in the ISO standard [4]. Furthermore, EMAC [5] and RMAC [6] appends an extra block-cipher invocation at the end of CBC-MAC; XCBC [7] adds secret sub-keys to the last message block; TMAC [8], OMAC [9] and CMAC [10] 1 improve XCBC by taking different sub-key deriving methods. Recently, GCBC [11] was proposed as a generalization of XCBC, TMAC and OMAC, and it avoids length-extension attacks by applying shift operations to chaining values. Besides these, f9 [12] sums the chaining values in CBC structure up and also takes an extra block-cipher invocation in the end, while PMAC [13] takes a parallel structure other than CBC structure, and it adds distinct secret masks to message blocks to ensure the security. All these later-proposed block-cipher-based MACs are highly efficient, and can be classified into rate-1 MACs. 2 Nevertheless, the provable security of these MACs is based on the assumption that their underlying block ciphers are PseudoRandom Permutations (PRPs) or even Related-Key PseudoRandom Permutations (RK-PRPs). Recall that the security goal for MACs is only unpredictability, and it is strictly weaker than pseudorandomness (we will give an example in Section 3); so, it is desirable to reduce the provable security of MACs to the unpredictability of their underlying block ciphers, other than the pseudorandomness. On the other hand, practical block ciphers seem to be less secure than expected [16,17], and this depressing fact makes it much more reasonable to reduce MAC security to the unpredictability other than the pseudorandomness of the block cipher. 1 2
CMAC belongs to OMAC family; more specifically, it is OMAC1. Rate is the average number of block-cipher invocations per message block [14,15].
252
L. Zhang et al.
As far as we know, reducing MAC security to unpredictable primitives is first studied by An and Bellare [18], and later works include [19,20,21,22,23,24]; however, all those constructions are based on compression functions, while using length-preserving primitives (e.g. block ciphers) to do this initiated by Dodis et al. They proposed enciphered CBC mode [14] and SS-NMAC mode [15] to address this problem. These two MACs are not only provably secure based on unpredictable block ciphers, but also provably secure against Side Channel Attacks (SCAs) as long as their underlying block ciphers are secure against SCAs; unfortunately, their rates are as much as 2 or 3, and this implies they can only offer relatively low efficiency. Then, there comes a question — How about the security of rate-1 MACs based on unpredictable block ciphers?. 1.2
Our Work
In this paper, we try to answer this question in two aspects. First, we investigate the security of current rate-1 MACs when they are instantiated with related-key unpredictable block ciphers, and find that they are all insecure by constructing a special related-key unpredictable block cipher. Our attacks on them show that the chaining values of those MACs can hardly be kept secret from adversaries, which is fatal to their security as MACs; then, we propose a natural assumption — to study the security of MACs based on unpredictable block ciphers, assume all their chaining values are available to adversaries. Under this assumption, we try to construct rate-1 MACs in PGV model, which was proposed by Preneel, Govaerts and Vandewalle to study the security of block-cipher-based hash functions [25]. Since MACs can be seen as keyed hash functions, PGV model is naturally suitable to discuss MAC constructions after being equipped with a secret key K, as shown in Fig. 1. IB KM ? ? K ⊕->E
?
FF -⊕
?
Ti
Fig. 1. In the keyed PGV model, a basic function f (K, Mi , Ti−1 ) is defined as $
f (K, Mi , Ti−1 ) = E(K ⊕ KM, IB) ⊕ FF, where K ← KE and IB, KM, FF∈ {Mi , Ti−1 , Mi ⊕ Ti−1 , Cst}
In the keyed PGV model (K-PGV for short), there are three kinds of inputs for a block cipher E, i.e. an Input Block IB, a Key Mask KM and a FeedForward FF, each of which have four choices, i.e. the current message block Mi , the last
Constructing Rate-1 MACs from Related-Key Unpredictable Block Ciphers
253
chaining value Ti−1 , their sum Mi ⊕ Ti−1 and a constant Cst. Without loss of generality, we assume T0 = Cst and all these four kinds of values and the secret key K have the same length as the block size of E. Moreover, we restrict the secret key K to be at the exact position where the block cipher key should be, because it is dangerous to take it as other inputs of block ciphers (IB and FF), even when the block ciphers are assumed to be pseudorandom [26]. K-PGV model gives us 43 = 64 rate-1 MACs, among which we find 1) 15 MACs are meaningless, because their inputs are independent of either Mi or Ti−1 ; 2) 6 MACs are vulnerable to fixed-M attack; 3) 6 MACs are vulnerable to fixed-T attack; 4) 13 MACs are vulnerable to fixed-(M ⊕ T ) attack; 5) the remaining 24 MACs are provably secure on the assumption that their underlying block ciphers are independently unpredictable for different keys (or RK-UPs as we will define in Section 2). Furthermore, we find that FF in fact has no influence over the security of these MACs, so we propose the Compact PGV model in which FF is removed from K-PGV model away. In the new model, we have six provably secure MACs, all of which are equivalent to each other in the sense that their basic functions can be transformed into one another by some invertible 2 × 2 matrix over GF(2). Unfortunately, this equivalence implies the security of the six MACs affects each other. That is, if one MAC is used with a secret key K, adversaries can easily make forgeries against all the other five MACs with the same key K, although the other five may never be used with K before. This can be seen as a relatedmode attack introduced by Phan and Siddiqi [27]. To avoid this attack, we break these six MACs into three groups, in each of which the two MACs can take distinct-and-fixed initial value T0 to ensure the security with each other. As we will prove, by taking distinct-and-fixed T0 , the two MACs in the same group are in fact independent of each other. 1.3
Related Works
In PGV model, Preneel et al study the security of 64 block-cipher-based hash functions from the attackers’ point of view, and conclude that 4 schemes are secure and 8 more are less secure, while other schemes are vulnerable to different kinds of attacks [25]. Then, Black et al review these hash functions by provable security techniques [28], and show that the Preneel’s 12 schemes are really secure, and 8 more are provably secure with larger security bounds, while other schemes are not. Interestingly, the 24 secure MAC constructions found in K-PGV model include the previous 20 secure hash constructions (after being equipped with a secret key K), and 4 more schemes are also provably secure as MACs, because here adversaries are not allowed to make inverse queries to block ciphers, different from that of [28]. More clear relationships are illustrated in Table 1 of Section 4. The rest of this paper is organized as follows: section 2 introduces the symbols and security notions we will use in this paper; section 3 gives detailed attacks
254
L. Zhang et al.
on current rate-1 MACs by constructing a special unpredictable block cipher; section 4 lists the results we obtain from K-PGV model and section 5 investigates MAC security and their relationships in Compact PGV model. At last, section 6 concludes the full paper.
2
Preliminaries $
Symbols. Suppose A is a set, then #A denotes the size of set A, and x ← A denotes that x is chosen from set A uniformly at random. If a, b ∈ {0, 1}∗ are strings of equal length then a⊕b is their bitwise XOR. If a, b ∈ {0, 1}∗ are strings, then a||b denotes their concatenation. Sometimes, we write ab for a||b if there is no confusion. Furthermore, msbi (a) stands for the most significant i bits of a, and lsbi (a) stands for the least significant i bits of a. If M ∈ {0, 1}∗ is a string then |M | stands for its length in bits, and we let pad(M ) = M 10n−1−(|M| mod n) = M1 M2 · · · Ml , where |Mi | = n for 1 ≤ i ≤ l. Security Definitions. Denote Perm(n) as the set containing all the permutations over {0, 1}n. An adversary A is an algorithm with an oracle. A can query the oracle with any message in the domain, but should not repeat a query. For a block cipher E : KE × {0, 1}n → {0, 1}n , and a function family F : KF × {0, 1}∗ → {0, 1}n, the security notions of prp and mac are listed below, where the maximum is taken over computation time at most t, oracle queries at most q, and the aggregate length of queries at most σ blocks. In the mac secu rity notions, the event adversary AF (K,·) forges means A outputs a pair (M , T ) such that F (K, M ) = T and M was never queried to F (K, ·) by A. ⎧ $ $ ⎨ Advprp (A) def = | Pr[K ← KE : AE(K,·) = 1] − Pr[P ← Perm(n) : AP (·) = 1]|, E def ⎩ Advprp = max{Advprp E (t, q, σ) E (A)}. A ⎧ $ ⎨ Advmac (A) def = Pr[K ← KF : AF (K,·) forges], F def mac ⎩ Advmac F (t, q, σ) = max{AdvF (A)}. A
More details about these two security notions can be found in [29,3]. Next, we define the unpredictability of a block cipher E : KE × {0, 1}n → {0, 1}n under related-key chosen message attack. A Related-Key-Deriving (RKD) function φ ∈ Φ is a map φ : KE → KE , where Φ is a set of functions mapping KE to KE . Then, for a block cipher E : KE × {0, 1}n → {0, 1}n and a RKD function family Φ : KE → KE , consider the following experiment: Experiment Exprk−up E,A $
K ← KE ; while A makes a query (φ, M ) to E(K, ·), do T ← E(φ(K), M ); return T to A; until A stops and outputs (φ , M , T ) such that
Constructing Rate-1 MACs from Related-Key Unpredictable Block Ciphers
255
1) E(φ (K), M ) = T ; 2) (φ , M ) was never queried to E(K, ·); then return 1 else return 0. Define
⎧ ⎨ Advrk−up (A) def = Pr[Exprk−up = 1], E E,A def ⎩ Advrk−up = (t, q, μ) max{Advrk−up (A)}, E E A
where the maximum is taken over computation time at most t, oracle queries at (t, q, μ) is sufficiently small, most q, whose total length is at most μ. If Advrk−up E we say block cipher E : KE × {0, 1}n → {0, 1}n is secure against Φ-restricted related-key chosen message attack. Remark 1. The way to define rk-up is similar to that of prp-rka, which was proposed by Bellare et al to theoretically study the pseudorandomness of block ciphers under related-key attacks [30]. Nevertheless, rk-up is strictly weaker than prp-rka since unpredictability is strictly weaker than pseudorandomness. Remark 2. The RKD function family Φ plays an important role in rk-up security. If Φ is not properly restricted, there may be no rk-up secure block ciphers. For example, we consider a special RKD function family ΦCst K = {φ|φ : KE → Cst}. $
That is, for any K ← KE and φ ∈ ΦCst K , we have φ(K) = Cst. Obviously, any -restricted related-key attack is easy to predict, block cipher under such a ΦCst K not to mention rk-up security. For more discussions about Φ, refer to [30,31]. Almost all current block-cipher-based MACs take only one secret key for their underlying block ciphers, except for a few of them who aim to get higher security by taking more than one block-cipher keys, e.g. RMAC [6], f9 [12] and some in the ISO standards [4]. In RMAC, the authors suggest the second block-cipher key can be obtained by K2 ⊕R, where K2 is a secret key for all messages and R is a random value for only one message; while f9 just lets K2 = K1 ⊕ Cst to obtain the second block-cipher key. In this paper, we only consider such a commonly used RKD function family Φ⊕ K = {XORKM |XORKM : K → K ⊕ KM, KM ∈ {0, 1}n}. Then, any Φ⊕ K -restricted adversary A attacking the rk-up security of $
E has access to an oracle E(K ⊕ ·, ·) with K ← KE , who will accept queries (KM,M ) ∈ {0, 1}n × {0, 1}n and returns the tag T ← E(K ⊕ KM, M ) to A. At last, A is asked to output a three-tuple (KM ,M , T ) such that 1) (KM ,M ) was never queried to E(K ⊕ ·, ·) by A and 2) T = E(K ⊕ KM , M ). If all reasonably restricted adversaries can do this within sufficiently small probability, we say block cipher E is secure against Φ⊕ K -restricted related-key chosen message attack. For simplicity, we directly say E is rk-up secure in the rest of this paper, without pointing out that all adversaries attacking E are Φ⊕ K -restricted, and we denote E as RK-UP (Related-Key Unpredictable Permutation).
256
3
L. Zhang et al.
Attacks on Current Rate-1 MACs
The provable security of current rate-1 MACs relies on the assumption that their underlying block ciphers are PRPs or RK-PRPs. In this section, we give detailed attacks to show that their provable security can no longer exist if their underlying block ciphers are only RK-UPs. The idea comes from [18], in which An and Bellare show the basic CBC-MAC does not hold unpredictability. In our attacks, we first construct a special block cipher E : K × {0, 1}n → {0, 1}n that is rk-up secure, but not pseudorandom, and then give attacks against the unpredictability of current rate-1 MACs instantiated with E , m1 ||m2 ||m3 ||c, if msb1 (m1 ) = 0, E (K, M ) = m1 ||c||m3 ||m4 , if msb1 (m1 ) = 1, where M = m1 ||m2 ||m3 ||m4 , |mi | = n/4 for 1 ≤ i ≤ 4, c = CBC[QK ](m1 m2 m3 m4 ) and Q : K×{0, 1}n/4 → {0, 1}n/4 is a block cipher with RK-PRP security. Notice that c is obtained by applying a RK-PRP QK to m1 m2 m3 m4 in Cipher-BlockChaining mode, which has been proved to hold pseudorandomness when its inputs are of fixed-length [3]. So, c is pseudorandom, and this indicates E (K, M ) is rk-up secure; however, it is absolutely not pseudorandom since parts of its inputs are listed in the ciphertext directly. Next, we give an attack on the unpredictability of XCBC [7] instantiated with E . Notice that, to authenticate messages of length ln bits, XCBC first deal with its first l − 1 blocks by CBC[EK ], and then XORs a secret sub-key K2 and the 1 ]. Finally, it encrypts the sum by last message block to the output of CBC[EK 1 EK1 . The attack on XCBCE (·) is as follows, 1) Adversary A queries XCBCE (·) with 0n , obtains the tag T 1 = t11 t12 t13 t14 ; 2) A queries XCBCE (·) with 10n−1 , obtains the tag T 2 = t21 t22 t23 t24 ; 3) A makes a forgery (M , T 1 ), where M = (t11 t12 t13 t24 )||T 1 , if msb1 (t11 ) = 0, 2 2 2 1 1 M = (t1 t2 t3 t4 )||T , if msb1 (t11 ) = 1. By the definitions of E and XCBC, it is easy to get that the secret sub-key K2 in XCBC is (t11 t12 t13 t24 ) if msb1 (t11 ) = 0 or (t21 t22 t23 t14 ) if msb1 (t11 ) = 1 after the above attack. Then, the validity of the forgery is obvious. Since TMAC [8], OMAC [9] and CMAC (OMAC1) [10] are variants of XCBC by taking different sub-key deriving methods, the same attack applies to them as well. What is more, the other existing rate-1 MACs are also vulnerable when instantiated with E , and we describe the attacks on them in Appendix A. The reason behind the insecurity of these MACs instantiated with E is that the secrecy of their chaining values can no longer be kept; so, we propose the following assumption,
Constructing Rate-1 MACs from Related-Key Unpredictable Block Ciphers
257
Assumption: To study the security of MACs based on unpredictable block ciphers, assume all their chaining values are available to adversaries. This assumption gives much more power to the attackers than that in the usual black-box model [3], and it may even overkill the current rate-1 MACs; however, it indeed explain why the existing rate-1 MACs are no longer secure when their underlying block ciphers are only RK-UPs, and also it helps to understand why SS-NMAC is provably secure against SCAs as long as its underlying block ciphers are [15]. Moreover, this assumption affects the security definition of MACs a little. That is, under such an assumption adversaries should not forge with a message which after being padded is a prefix of a queried message, although the forgery message may never be queried to the MACs before. This seems to bring trouble into MAC security; however, we can apply prefix-free encoding to messages and it is easy to achieve by simply prepending each message with a block denoting its length in bits, as suggested in [32].
4
Rate-1 MACs from K-PGV Model
In this section, we consider the feasibility of constructing rate-1 MACs from RK-UPs in K-PGV model. As shown in Fig. 1, K-PGV model gives us 64 basic functions fs (K, Mi , Ti−1 ) (s = 1, 2, · · · , 64), all of which can be used in an iterative way to construct MACs Fs (K, M ) who can authenticate arbitrarylength messages. Without loss of generality, we assume pad(M ) = M1 M2 · · · Ml ; then, Fs (K, M ) is defined as follows, MAC Fs (K, M ) $
K ← KE ; for i = 1 to l do Ti ← fs (K, Mi , Ti−1 ) end for return Tl . Next, we study the security of Fs (K, M ) as MACs and find the main results as follows, while the details are listed in Table. 1. 1) 15 MACs are meaningless, because their inputs are independent of either Mi or Ti−1 ; 2) 6 MACs are vulnerable to attack 1 — fixed-M attack, who can make a forgery for M 2 by simply choosing any queried message M 1 , where pad(M 1 ) = M11 M21 · · · Ml1 , and let pad(M 2 ) = pad(M 1 )||Ml1 ; 3) 6 MACs are vulnerable to attack 2 — fixed-T attack, who can forge with M 2 , 1 ||(Ml1 ⊕ Δ) and Δ can be any non-zero where pad(M 2 ) = M11 M21 · · · Ml−1 n value in {0, 1} ; 4) 13 MACs are vulnerable to attack 3 — fixed-(M ⊕ T ) attack, who can forge 1 ); with M 2 , where pad(M 2 ) = pad(M 1 )||(Ml1 ⊕ Tl1 ⊕ Tl−1 5) 24 MACs are provably secure, on the assumption that their underlying block cipher is rk-up secure. We will prove this in Theorem 1.
258
L. Zhang et al.
Table 1. The security of 64 MACs from K-PGV model. “–” means the MAC is meaningless because its inputs are independent of either Mi or Ti−1 ; a number i (i = 1, 2, 3) means the MAC is vulnerable to attack i; the MACs marked with fi (i = 1, 2, · · · , 24) are provably secure.
choice of KM choice of FF Mi Mi Ti−1 Mi ⊕ Ti−1 Cst Ti−1 Mi Ti−1 Mi ⊕ Ti−1 Cst Mi ⊕ Ti−1 Mi Ti−1 Mi ⊕ Ti−1 Cst Cst Mi Ti−1 Mi ⊕ Ti−1 Cst
choice of IB Mi Ti−1 Mi ⊕ Ti−1 – f17 f20 1 f5 f8 1 f7 f6 – f15 f19 f1 2 f4 f21 – f24 f3 2 f2 f23 – f22 f9 f12 3 f11 f10 3 f14 f18 3 f13 f16 3 – 2 3 1 – 3 1 2 3 – – 3
Cst – 1 1 – 2 – 2 – 3 3 3 3 – – 3 –
We also find that all the MACs with a fixed key K ⊕ Cst (KM = Cst) are either insecure or meaningless, and this implies within this model, it is impossible to construct a rate-1 MAC from only unpredictable block ciphers. The basic functions in the 24 provably secure MACs are marked as fi (i = 1, 2, · · · , 24), of which the first 20 (being removed the secret key K away) are the exact compression functions of the 20 provably secure hash functions [28]. The extra 4 (f21 , f22 , f23 , f24 ) can be used to construct provably secure MACs (with K), but not hash functions (without K). The reason is that, in attacks on MACs adversaries are not allowed to make inverse queries to block cipher E, but they can do this in attacks on hash functions, since the latter is considered within the ideal cipher model [33,28]. Theorem 1. Suppose the underlying block cipher E : KE × {0, 1}n × {0, 1}n is rk-up secure, then Fs [E] (s = 1, 2, · · · , 24) is provably secure for prefix-free messages. More concretely, we have rk−up 2 Advmac (t , q , μ ), Fs [E] (t, q, μ) ≤ (σ − σ + 1)AdvE
where σ is the total block length of all q queried messages plus the block length of the forgery message, t = t + O(σ), q = σ − 1, μ = μ + O(σ). Proof. To upper bound the success probability for any adversary A attacking the mac security of Fs [E], we construct an adversary B attacking the rk-up security
Constructing Rate-1 MACs from Related-Key Unpredictable Block Ciphers
259
Game 0 Game 1 Range← {T0 }; Collisionw ←False, for w ≥ 1; z ← 1. when A makes a query M j , where Pad(M j ) = M1j M2j · · · Mljj , j = 1, 2, · · · , q 01. for i = 1 to lj do j j 02. renew KM, IB, FF with (Mij , Ti−1 , Mij ⊕ Ti−1 , Cst) by the definition of fs ; j 03. Ti = OB (K ⊕ KM, IB) ⊕ FF; j1 j 04. if Tij ∈Range and j1 < j s.t. M1j1 M2j1 · · · Mi−1 = M1j M2j · · · Mi−1 05. then { Collisionz ←True; Stop. } 06. end if 07. Range←Range∪{Tij }; z ← z + 1; return Tij to A; 08. end for when A makes a forgery (M , T ), where Pad(M ) = M1 M2 · · · Ml 11. for i = 1 to l − 1 do 12. renew KM, IB, FF with (Mi , Ti−1 , Mi ⊕ Ti−1 , Cst) by the definition of fs ; 13. Ti = OB (K ⊕ KM, IB) ⊕ FF; j1 14. if Ti ∈Range and j1 ∈ {1, 2, · · · , q} s.t. M1j1 M2j1 · · · Mi−1 = M1 M2 · · · Mi−1 15. then { Collisionz ←True; Stop. } 16. end if 17. Range←Range∪{Ti }; z ← z + 1; return Ti to A; 18. end for 19. renew KM, IB, FF with (Ml , Tl −1 , Ml ⊕ Tl −1 , Cst) by the definition of fs ; 20. if T = OB (K ⊕ KM, IB) ⊕ FF return 1 else return 0 end if Fig. 2. Definitions for Game 0 (excluding the boxed codes) and Game 1 (including the boxed codes), in which adversary B simulates A’s oracle Fs [E] with its own oracle OB (·, ·) = E(·, ·) and the definition of Fs , for s = 1, 2, · · · , 24
of E. B will simulate A’s oracle Fs [E](·) with its own oracle OB (·, ·) = E(·, ·) and the definition of Fs , as in Fig. 2. In either Game 0 or Game 1, A can make any prefix-free queries, get not only the corresponding tags but also the chaining values; at last, he is asked to make a forgery. However, the forgery message should not be a prefix of a queried message by the arguments in the end of Section 3. The only differences between Game 0 and Game 1 are the boxed codes in lines 05 and 15, where the flag Collisionz would be true and then B would stop the simulation. We denote such an event by Collz . Let event Coll be Coll1 ∨ Coll2 ∨ · · · ∨ Collσ−1 . So, we get | Pr[A forges in Game 0] − Pr[A forges in Game 1]| = | Pr[A forges in Game 0 ∧ Coll] + Pr[A forges in Game 0 ∧ Coll] − Pr[A forges in Game 1 ∧ Coll] − Pr[A forges in Game 1 ∧ Coll]| = | Pr[A forges in Game 0 ∧ Coll] − Pr[A forges in Game 1 ∧ Coll]| = | Pr[A forges in Game 0|Coll] − Pr[A forges in Game 1|Coll]| × Pr[Coll] ≤ Pr[Coll]. (1)
260
L. Zhang et al.
Furthermore, noticing that Pr[A forges in Game 1] = Pr[A forges in Game 1|Coll] × Pr[Coll] + Pr[A forges in Game 1|Coll] × Pr[Coll] ≤ Pr[Coll] + Pr[A forges in Game 1|Coll],
(2)
by inequalities (1) and (2), we have Pr[A forges in Game 0] ≤ 2 Pr[Coll] + Pr[A forges in Game 1|Coll]
(3)
Next, we show that the two items in the right side of inequality (3) are both sufficiently small, because the occurrence of either Coll or [A forges in Game 1|Coll] implies B can make a successful forgery against the rk-up security of E. If Coll occurs, then at least one of [Colli |Colli−1 ∧· · ·∧Coll0 ] for i = 1, 2, · · · , σ− 1 occurs, where Coll0 is the null event. By this we let B select T ∈ Range uniformly at random, and make a forgery (KM, IB, T ⊕ FF) against the rk-up security of E, where KM, IB, FF are from lines 03 and 13 of Fig. 2 at the moment z = i. Notice that event [Colli |Colli−1 ∧ · · · ∧ Coll0 ] occurs implies B’s forgery is valid; however, E is rk-up secure by assumption, so we get Pr[Colli |Colli−1 ∧ · · · ∧ Coll0 ] × 1i ≤ Advrk−up (ti−1 , i − 1, μi−1 ), which implies Pr[Colli |Colli−1 ∧ · · · ∧ E rk−up Coll0 ] ≤ i × AdvE (ti−1 , i − 1, μi−1 ). Then, we get Pr[Coll] = Pr[Coll1 ∨ Coll2 ∨ · · · ∨ Collσ−1 ] ≤ σ−1 i=1 Pr[Colli |Colli−1 ∧ · · · ∧ Coll0 ] σ−1 (ti−1 , i − 1, μi−1 )) ≤ i=1 (i × Advrk−up E σ−1 rk−up ≤ i=1 i × AdvE (tσ−2 , σ − 2, μσ−2 ) σ(σ − 1) Advrk−up = (tσ−2 , σ − 2, μσ−2 ) E 2
(4)
Similarly, if event [A forges in Game 1|Coll] occurs, we let B directly make a forgery (KM, IB, T ⊕ FF) against the rk-up security of E, where KM, IB, FF are from line 19 in Fig. 2 and T is from A’s forgery. Also, by assumption E is rk-up secure, we have Pr[A forges in Game 1|Coll] ≤ Advrk−up (tσ−1 , σ − 1, μσ−1 ) E
(5)
Combining inequalities (3), (4) and (5), we know that for any adversary A attacking the mac security of Fs [E], the following holds, Advmac Fs [E] (A) = Pr[A forges in Game 0] σ(σ − 1) ≤2× Advrk−up (tσ−2 , σ − 2, μσ−2 ) + Advrk−up (tσ−1 , σ − 1, μσ−1 ) E E 2 ≤ (σ 2 − σ + 1)Advrk−up (tσ−1 , σ − 1, μσ−1 ). E
Constructing Rate-1 MACs from Related-Key Unpredictable Block Ciphers
261
rk−up 2 Finally, we get Advmac (t , q , μ ), where σ is the Fs [E] (t, q, μ) ≤ (σ −σ+1)AdvE total block length of all q queried messages plus the block length of the forgery message, t = t + O(σ), q = σ − 1, μ = μ + O(σ).
In Theorem 1, we reduce the mac security of Fs [E] to the rk-up security of E, under the assumption that adversaries of Fs [E] can observe all its chaining values. This implies in practical implementations for Fs [E], engineers do not have to protect the secrecy of its chaining values, so Fs [E] is provably secure against SCAs as long as E is. In this sense, the security of Fs [E] is more reliable than those from PRF (PseudoRandom Function) to PRF reductions since the latter MACs are treated as black boxes in the analysis [3,5,6,7,8,9,10,11,12,13]. Furthermore, notice that unpredictability requires block ciphers much less than pseudorandomness does; on the other hand, related-key attacks (especially the kind we consider here, Φ⊕ K -restricted as in Section 2) have become common analysis methods for block ciphers, and block ciphers are expected to be secure against such attacks in their designs. So, the security level that Fs [E] asks for E is not hard for practical block ciphers to reach. Nevertheless, we note that rk-up and PRF are two separate security notions from theory, although unpredictability is strictly weaker than PRF, because related key is a notion independent of unpredictability and pseudorandomness. In the aspect of efficiency, the 24 secure rate-1 MACs may not run faster than none rate-1 MACs, due to their large number of key schedules. Since for many practical block ciphers, the time for key schedule is no shorter than that for an encryption. However, notice that there are 8 MACs out of the 24 (Fi for i = 5, 6, 7, 8, 15, 17, 19.20) whose KM are independent of the chaining values Ti−1 , and they may pre-compute the key-schedules once having obtained Mi , so these 8 MACs may offer relatively high efficiency.
5
Compact PGV Model
In the proof for the 24 secure MACs from K-PGV model, it is easy to find that FF in fact has no influence over their security, and this observation can also be gotten from Table 1. So, it is natural to remove FF from K-PGV model away, and we call the remaining Compact PGV model, as illustrated in Fig. 3. In Compact PGV model, we have 16 basic functions gs (K, Mi , Ti−1 ) = E(K ⊕ KM, IB) (s = 0, 1, · · · , 15) to construct rate-1 MACs Gs (K, M ) in the same way as we define Fs (s = 1, 2, · · · , 64) by fs . To be concrete, Gs (K, M ) is defined as follows, MAC Gs (K, M ) $
K ← KE ; for i = 1 to l do Ti ← gs (K, Mi , Ti−1 ) end for return Tl . where pad(M ) = M1 M2 · · · Ml and T0 = Cst.
262
L. Zhang et al. IB KM ? ? K ⊕->E
?
Ti
$
Fig. 3. In Compact PGV model with secret key K ← KE , a block cipher E has two inputs: an input block IB and a key mask KM. A basic function g(K, Mi , Ti−1 ) is defined as g(K, Mi , Ti−1 ) = E(K ⊕ KM, IB), where IB, KM∈ {Mi , Ti−1 , Mi ⊕ Ti−1 , Cst}.
5.1
Rate-1 MACs from Compact PGV Model
The security evaluations for the 16 MACs are shown in Table 2, and they can also be got from Table 1 directly. Table 2. The security of 16 MACs from Compact PGV model. “–” means the MAC is meaningless because its inputs are independent of Mi or Ti−1 ; the number 3 means the MAC is vulnerable to attack 3; the MACs marked with gs (s = 0, 1, · · · , 5) are provably secure.
choice of KM Mi Ti−1 Mi ⊕ Ti−1 Cst
choice of IB Mi Ti−1 Mi ⊕ Ti−1 – g0 g5 g1 – g4 g2 g3 3 – – 3
Cst – – 3 –
Compact PGV model gives us a more clear view on the MACs, whose security is related with the independence of KM and IB. More specifically, the MACs with independent KM and IB are provably secure, while others are not. 5.2
Equivalence of the Six Secure MACs
Next, we study the relationships among these six secure MACs from Compact PGV model, and find that they are in fact equivalent to each other, in the sense that ∀0 ≤ s1 ≤ s2 ≤ 5 there exists an invertible 2 × 2 matrix Ai (1 ≤ i ≤ 6) over gs2 . That is, , IBs2), GF(2) who cantransform gs1into (KM s1 , IBs1 )×A i= (KMs2 10 11 11 01 10 , A2 = , A3 = , A4 = , A5 = , where A1 = 0 1 1 0 0 1 1 1 11 01 A6 = . In other words, the basic functions in these six MACs can be 10 generated from any one of them by invertible 2 × 2 matrixes over GF(2), whose total number is exactly six.
Constructing Rate-1 MACs from Related-Key Unpredictable Block Ciphers
263
The goodness of this equivalence includes not only the convenience for us to get a better understanding on these six secure MACs, but also the fact that any one of the six is as secure as the others; however, this equivalence also implies if one of them is used with secret key K, then adversaries can easily make forgeries against the other five with the same K, although the other five may never be used with K before. As an example, suppose an adversary has access to G0 under a secret key K, and he queries G0 with message M = T0 ||T1 || · · · ||Tl , obtaining the tag Tl+1 (He can make such a query because he can observe the chaining values by our assumption and he can decide Mi+1 = Ti once he has obtained Ti ). Then, he outputs a forgery (M, Tl+1 ) against G1 under the same K. The forgery is valid because the adversary never queried (M, Tl+1 ) to G1 under K. What is more, this adversary can also make similar forgeries against Gs under K (for s = 2, 3, 4, 5) by querying G0 with a carefully selected message. This can be seen as a related-mode attack introduced by Phan and Siddiqi [27], which is dangerous since in many practical protocols, such as those of IPSec [34], there are several comparable algorithms for the users to choose. Some lazy users may take the same key for different algorithms, and in such a case Gs (s = 0, 1, · · · , 5) can not be used in the same protocol together. 5.3
Independence Classes
The equivalence of the six secure MACs makes it inconvenient to use them in practice; luckily, for parts of the six, it is easy to break their equivalence. That is, we let G0 and G3 take distinct-and-fixed T0 , then we can prove they are independent of each other. The same technique also applies to break the independence of G1 and G4 , G2 and G5 ; thus, we have three independent classes. Theorem 2. Suppose Gs1 and Gs2 (s1 = (s2 + 3) mod 6) have the same secret $
key K ← KE but distinct-and-fixed T0 , and their underlying block cipher E : KE × {0, 1}n × {0, 1}n is rk-up secure, then they are independent of each other. The proof idea is that, for any adversary A attacking the independence of Gs1 and Gs2 , we let it have access to two oracles Gs1 [E] and Gs2 [E]. A can query these two oracles with any prefix-free messages and obtain not only the corresponding tags but also the chaining values; at last, A is asked to make a forgery against either Gs1 [E] or Gs2 [E]. If A can do this with a non-trivial probability, we say that Gs1 and Gs2 are not independent of each other. However, as we will prove, the success probability for A to forge against either Gs1 [E] or Gs2 [E] is sufficiently small, and it can be reduced to the rk-up security of E. Thus, Gs1 [E] and Gs2 [E] are independent of each other. The detailed proof is given in Appendix B. However, taking distinct-and-fixed T0 can not guarantee the independence of Gs1 and Gs2 , where s1 = (s2 + 3) mod 6.
264
6
L. Zhang et al.
Conclusions and Future Work
To sum up, we study the provable security of MACs based on related-key unpredictable block ciphers in this paper, and obtain both good news and bad news. The bad news are mainly two folds: firstly, all current rate-1 MACs may not guarantee their provable security when instantiated with related-key unpredictable block ciphers; secondly, in the keyed PGV model 25 MACs are vulnerable to three kinds of attacks respectively. The good news is that 24 provably secure rate-1 MACs are found in the keyed PGV model, whose provable security relies on the related-key unpredictability of their underlying block ciphers. Furthermore, we study the 16 rate-1 MACs in Compact PGV model, and find that the six provably secure MACs are equivalent to each other, which implies relatedmode attacks on them. Then, we give a suggestion for parts of the six to avoid such attacks by taking distinct-and-fixed initial values. In the aspect of efficiency, these provably secure rate-1 MACs may not run faster than none rate-1 MACs due to their large number of key schedules. Furthermore, we find that in the keyed PGV model all the MACs with a fixed key K ⊕ Cst (KM = Cst) are either insecure or meaningless. This implies within this model, it is impossible to construct a rate-1 MAC from only unpredictable block ciphers. However, it is still unknown whether it is possible to do this beyond the keyed PGV model, and we leave this as an open question. Acknowledgments. The authors would like to thank the anonymous referees for their valuable comments. Special thanks to Kan Yasuda for his help to revise this paper. Furthermore, this work is supported by the National High-Tech Research and Development 863 Plan of China (No. 2007AA01Z470), the National Natural Science Foundation of China (No. 60873259, and No. 60903219) and the Knowledge Innovation Project of The Chinese Academy of Sciences.
References
1. FIPS 197. Advanced Encryption Standard (AES). National Institute of Standards and Technology (2001)
2. FIPS 113. Computer Data Authentication. National Institute of Standards and Technology (1985)
3. Bellare, M., Kilian, J., Rogaway, P.: The Security of Cipher Block Chaining. In: Desmedt, Y.G. (ed.) CRYPTO 1994. LNCS, vol. 839, pp. 341–358. Springer, Heidelberg (1994)
4. ISO/IEC 9797-1, Information Technology – Security Techniques – Message Authentication Codes (MACs) – Part 1: Mechanisms Using a Block Cipher. International Organization for Standardization (1999)
5. Petrank, E., Rackoff, C.: CBC MAC for Real-Time Data Sources. J. Cryptology 13(3), 315–338 (2000)
6. Jaulmes, É., Joux, A., Valette, F.: On the Security of Randomized CBC-MAC Beyond the Birthday Paradox Limit: A New Construction. In: Daemen, J., Rijmen, V. (eds.) FSE 2002. LNCS, vol. 2365, pp. 237–251. Springer, Heidelberg (2002)
7. Black, J., Rogaway, P.: CBC MACs for Arbitrary-Length Messages: The Three-Key Constructions. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 197–215. Springer, Heidelberg (2000)
8. Kurosawa, K., Iwata, T.: TMAC: Two-Key CBC MAC. In: Joye, M. (ed.) CT-RSA 2003. LNCS, vol. 2612, pp. 33–49. Springer, Heidelberg (2003)
9. Iwata, T., Kurosawa, K.: OMAC: One-Key CBC MAC. In: Johansson, T. (ed.) FSE 2003. LNCS, vol. 2887, pp. 129–153. Springer, Heidelberg (2003)
10. Special Publication 800-38B. Recommendation for Block Cipher Modes of Operation: The CMAC Mode for Authentication. National Institute of Standards and Technology, http://csrc.nist.gov/groups/ST/toolkit/BCM/current_modes.html
11. Nandi, M.: Fast and Secure CBC-Type MAC Algorithms. In: Dunkelman, O. (ed.) FSE 2009. LNCS, vol. 5665, pp. 375–393. Springer, Heidelberg (2009)
12. Specification of the 3GPP Confidentiality and Integrity Algorithms; Document 1: f8 and f9 Specifications, http://www.3gpp.org/ftp/Specs/archive/35_series/35.201/
13. Black, J., Rogaway, P.: A Block-Cipher Mode of Operation for Parallelizable Message Authentication. In: Knudsen, L.R. (ed.) EUROCRYPT 2002. LNCS, vol. 2332, pp. 384–397. Springer, Heidelberg (2002)
14. Dodis, Y., Pietrzak, K., Puniya, P.: A New Mode of Operation for Block Ciphers and Length-Preserving MACs. In: Smart, N.P. (ed.) EUROCRYPT 2008. LNCS, vol. 4965, pp. 198–219. Springer, Heidelberg (2008)
15. Dodis, Y., Steinberger, J.: Message Authentication Codes from Unpredictable Block Ciphers. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 267–285. Springer, Heidelberg (2009)
16. Biham, E., Dunkelman, O., Keller, N.: A Related-Key Rectangle Attack on the Full KASUMI. In: Roy, B.K. (ed.) ASIACRYPT 2005. LNCS, vol. 3788, pp. 443–461. Springer, Heidelberg (2005)
17. Biryukov, A., Khovratovich, D., Nikolić, I.: Distinguisher and Related-Key Attack on the Full AES-256. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 231–249. Springer, Heidelberg (2009)
18. An, J.H., Bellare, M.: Constructing VIL-MACs from FIL-MACs: Message Authentication under Weakened Assumptions. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 252–269. Springer, Heidelberg (1999)
19. Maurer, U.M., Sjödin, J.: Single-Key AIL-MACs from Any FIL-MAC. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 472–484. Springer, Heidelberg (2005)
20. Bellare, M.: New Proofs for NMAC and HMAC: Security Without Collision-Resistance. In: Dwork, C. (ed.) CRYPTO 2006. LNCS, vol. 4117, pp. 602–619. Springer, Heidelberg (2006)
21. Bellare, M., Ristenpart, T.: Hash Functions in the Dedicated-Key Setting: Design Choices and MPP Transforms. In: Arge, L., Cachin, C., Jurdziński, T., Tarlecki, A. (eds.) ICALP 2007. LNCS, vol. 4596, pp. 399–410. Springer, Heidelberg (2007)
22. Hirose, S., Park, J.H., Yun, A.: A Simple Variant of the Merkle-Damgård Scheme with a Permutation. In: Kurosawa, K. (ed.) ASIACRYPT 2007. LNCS, vol. 4833, pp. 113–129. Springer, Heidelberg (2007)
23. Yasuda, K.: A Single-Key Domain Extender for Privacy-Preserving MACs and PRFs. In: Lee, P.J., Cheon, J.H. (eds.) ICISC 2008. LNCS, vol. 5461, pp. 268–285. Springer, Heidelberg (2009)
24. Yasuda, K.: A Double-Piped Mode of Operation for MACs, PRFs and PROs: Security beyond the Birthday Barrier. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 242–259. Springer, Heidelberg (2009)
25. Preneel, B., Govaerts, R., Vandewalle, J.: Hash Functions Based on Block Ciphers: A Synthetic Approach. In: Stinson, D.R. (ed.) CRYPTO 1993. LNCS, vol. 773, pp. 368–378. Springer, Heidelberg (1994)
26. Wang, P., Feng, D., Wu, W., Zhang, L.: On the Unprovable Security of 2-Key XCBC. In: Mu, Y., Susilo, W., Seberry, J. (eds.) ACISP 2008. LNCS, vol. 5107, pp. 230–238. Springer, Heidelberg (2008)
27. Phan, R.C.W., Siddiqi, M.U.: Related-Mode Attacks on Block Cipher Modes of Operation. In: Gervasi, O., Gavrilova, M.L., Kumar, V., Laganà, A., Lee, H.P., Mun, Y., Taniar, D., Tan, C.J.K. (eds.) ICCSA 2005. LNCS, vol. 3482, pp. 661–671. Springer, Heidelberg (2005)
28. Black, J., Rogaway, P., Shrimpton, T.: Black-Box Analysis of the Block-Cipher-Based Hash-Function Constructions from PGV. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 320–335. Springer, Heidelberg (2002)
29. Luby, M., Rackoff, C.: How to Construct Pseudo-Random Permutations from Pseudo-Random Functions (Abstract). In: Williams, H.C. (ed.) CRYPTO 1985. LNCS, vol. 218, p. 447. Springer, Heidelberg (1986)
30. Bellare, M., Kohno, T.: A Theoretical Treatment of Related-Key Attacks: RKA-PRPs, RKA-PRFs, and Applications. In: Biham, E. (ed.) EUROCRYPT 2003. LNCS, vol. 2656, pp. 491–506. Springer, Heidelberg (2003)
31. Lucks, S.: Ciphers Secure against Related-Key Attacks. In: Roy, B.K., Meier, W. (eds.) FSE 2004. LNCS, vol. 3017, pp. 359–370. Springer, Heidelberg (2004)
32. Coron, J.S., Dodis, Y., Malinaud, C., Puniya, P.: Merkle-Damgård Revisited: How to Construct a Hash Function. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 430–448. Springer, Heidelberg (2005)
33. Shannon, C.E.: Communication Theory of Secrecy Systems. Bell Systems Technical Journal 28(4), 656–715 (1949)
34. Kent, S., Atkinson, R.: Security Architecture for the Internet Protocol. RFC 2401, standards track, the Internet Society (1998)
A Attacks on Some Current Rate-1 MACs
Here, we give the attacks on the unpredictability of other existing rate-1 MACs instantiated with a special kind of rk-up block cipher E′, as defined in Section 3. Our attacks can be seen as extensions of An and Bellare's attack on the basic CBC-MAC [18], and they show that none of the existing rate-1 MACs is guaranteed to retain its unpredictability when its underlying block cipher is only rk-up secure; however, these attacks do not necessarily imply the non-existence of secure rate-1 MACs based on only unpredictable block ciphers. Due to page limitations, we describe the attacks without introducing the corresponding MAC algorithms.
Attack on RMAC [6]. Adversary A proceeds as follows: 1) queries RMAC_E′(·) with 0^n || 10^{n−2}, and obtains the tag T = t1 t2 t3 t4 and a random value R; 2) makes a forgery (R, M′, T), where M′ = 0^n || (0^{3n/4} || (t4 ⊕ 0^{n/4−1}1)) || 10^{n−2}. This attack also applies to EMAC [5].
Attack on GCBC1 [11]. Adversary A proceeds as follows: 1) queries GCBC1_E′(·) with 0^n || 10^{n−1}, and obtains the tag T = t1 t2 t3 t4; 2) makes a forgery (M′, T), where M′ = 0^n || (0^{3n/4} || x) || 10^{n−1} and x = lsb_1(t3) || msb_{n/4−1}(t4).
Attack on GCBC2 [11]. Adversary A proceeds as follows: 1) queries GCBC2_E′(·) with 0^n || 10^{n−1}, and obtains the tag T1 = t11 t12 t13 t14; 2) queries GCBC2_E′(·) with (01^{n−4}000) || 10^{n−1}, and obtains the tag T2 = t21 t22 t23 t24; 3) makes a forgery (M′, T2), where M′ = 0^n || ((0^{3n/4} || x1) ⊕ (0^{3n/4} || x2) ⊕ 10^{n−1}), x1 = lsb_1(t13) || msb_{n/4−1}(t14) and x2 = lsb_1(t23) || msb_{n/4−1}(t24).
Attack on f9 [12]. If msb_1(N) = 0 (N = COUNT||FRESH), adversary A proceeds as follows: 1) queries f9_E′(·) with M = 10^{n−2}, and obtains the tag T = t1 t2 t3 t4 and a nonce N = n1 n2 n3 n4; 2) makes a forgery (N, M′, T), where M′ = M1 || M1 || 10^{n−2} and M1 = 0^{3n/4} || (n4 ⊕ n3 ⊕ t3). If msb_1(N) = 1, we can define another block cipher E′′ as follows to attack f9:
E′′(K, M) = m1 || m2 || m3 || c, if msb_1(m1) = 1,
E′′(K, M) = m1 || c || m4 || m3, if msb_1(m1) = 0,
where c is the same as in E′.
Attack on PMAC [13]. Adversary A proceeds as follows: 1) queries PMAC_E′(·) with 0^n, and obtains the tag T1 = t11 t12 t13 t14; 2) queries PMAC_E′(·) with 10^{n−1}, and obtains the tag T2 = t21 t22 t23 t24; 3) makes a forgery (M′, T′), where
M′ = t11 t12 t13 t24, if msb_1(t11) = 0,
M′ = t21 t22 t23 t14, if msb_1(t11) = 1,
T′ = M′ · x.
B Proof for the Independence of Gs1 and Gs2, Where s1 = (s2 + 3) mod 6
Proof. If there exists an adversary A who can attack the independence of Gs1[E] and Gs2[E], this implies that A is able to attack the mac security of either Gs1[E] or Gs2[E]. We now show that the success probability of A doing the latter is upper bounded, since E is rk-up secure by assumption. The following proof is very similar to that for Theorem 1. We define two Games in Fig. 4, where an adversary B simulates A's oracles Gs1[E] and Gs2[E] with its own oracle OB(·, ·) = E(·, ·), combining the definitions of Gs1 and Gs2; finally, B will attack the rk-up security of E.
Game 0 / Game 1
Range ← {T_{s1,0}, T_{s2,0}}; Collision_w ← False, for w ≥ 1; z ← 1.
when A makes a query M_s^j to G_s, where s ∈ {s1, s2} and Pad(M_s^j) = M^j_{s,1} M^j_{s,2} · · · M^j_{s,l_{s,j}}, j = 1, 2, · · · , q_s
01. for i = 1 to l_{s,j} do
02.   renew KM, IB with (M^j_{s,i}, T^j_{s,i−1}, M^j_{s,i} ⊕ T^j_{s,i−1}, Cst) by the definition of g_s;
03.   T^j_{s,i} = OB(K ⊕ KM, IB);
04.   if T^j_{s,i} ∈ Range and ∄ j1 < j s.t. M^{j1}_{s,1} M^{j1}_{s,2} · · · M^{j1}_{s,i−1} = M^j_{s,1} M^j_{s,2} · · · M^j_{s,i−1}
05.   then { Collision_z ← True; Stop. }
06.   end if
07.   Range ← Range ∪ {T^j_{s,i}}; z ← z + 1; return T^j_{s,i} to A;
08. end for
10. when A makes a forgery (M′, T′) to G_s, where s ∈ {s1, s2} and Pad(M′) = M′_1 M′_2 · · · M′_{l′}
11. for i = 1 to l′ − 1 do
12.   renew KM, IB with (M′_i, T′_{i−1}, M′_i ⊕ T′_{i−1}, Cst) by the definition of g_s;
13.   T′_i = OB(K ⊕ KM, IB);
14.   if T′_i ∈ Range and ∄ j1 ∈ {1, 2, · · · , q_s} s.t. M^{j1}_{s,1} M^{j1}_{s,2} · · · M^{j1}_{s,i−1} = M′_1 M′_2 · · · M′_{i−1}
15.   then { Collision_z ← True; Stop. }
16.   end if
17.   Range ← Range ∪ {T′_i}; z ← z + 1; return T′_i to A;
18. end for
19. renew KM, IB with (M′_{l′}, T′_{l′−1}, M′_{l′} ⊕ T′_{l′−1}, Cst) by the definition of g_s;
20. if T′ = OB(K ⊕ KM, IB) return 1 else return 0 end if
Fig. 4. Definitions for Game 0 (excluding the boxed codes) and Game 1 (including the boxed codes), in which adversary B simulates adversary A's oracles Gs1[E] and Gs2[E] with its own oracle OB(·, ·) = E(·, ·), combining the definitions of Gs1 and Gs2, where s1 = (s2 + 3) mod 6.
In either Game 0 or Game 1, A can make any prefix-free queries and gets not only the corresponding tags but also the chaining values; at the end, it is asked to make a forgery against either Gs1[E](·) or Gs2[E](·). Also, the forgery message should not be a prefix of a queried message. Unlike that in Fig. 2, the Range in Fig. 4 is defined as the set containing the outputs of E when dealing with both Gs1[E](·) and Gs2[E](·). By similar discussions as in the proof for Theorem 1, we get

Pr[A breaks the independence of Gs1[E] and Gs2[E]]
  ≤ max{Adv^mac_{Gs1[E]}(A), Adv^mac_{Gs2[E]}(A)}
  ≤ Pr[A forges in Game 0]
  ≤ Pr[Coll] + Pr[A forges in Game 1]
  ≤ 2 Pr[Coll] + Pr[A forges in Game 1 | ¬Coll]
  ≤ 2 · (σ′(σ′ − 1)/2) · Adv^{rk-up}_E(t_{σ′−2}, σ′ − 2, μ_{σ′−2}) + Adv^{rk-up}_E(t_{σ′−1}, σ′ − 1, μ_{σ′−1})
  ≤ (σ′^2 − σ′ + 1) · Adv^{rk-up}_E(t_{σ′−1}, σ′ − 1, μ_{σ′−1}),

where the event Coll is the same as that in the proof for Theorem 1, and σ′ is the total block length of all queried messages (to both Gs1[E] and Gs2[E]) plus the block length of the forgery message (to either Gs1[E] or Gs2[E]). Thus, any adversary A in fact has only a sufficiently small probability of making a forgery against either Gs1[E](·) or Gs2[E](·) after having queried Gs1[E](·) and Gs2[E](·) for some time (measured by the total block length σ′). Finally, we conclude that Gs1[E](·) and Gs2[E](·) are independent of each other.
Higher Order Differential Attack on Step-Reduced Variants of Luffa v1

Dai Watanabe1, Yasuo Hatano1, Tsuyoshi Yamada2, and Toshinobu Kaneko2

1 Systems Development Laboratory, Hitachi, Ltd., 292 Yoshida-cho, Totsuka-ku, Yokohama, 244-0817, Japan
2 Science University of Tokyo, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan
[email protected]
Abstract. In this paper, a higher order differential attack on the hash function Luffa v1 is discussed. We confirm, both theoretically and experimentally, that the algebraic degree of the permutation Qj, an important non-linear component of Luffa, grows more slowly than in the ideal case. According to our estimate, we can construct a distinguisher for step-reduced variants of Luffa v1 up to 7 out of 8 steps using a one-block message. The attack on 7 steps requires 2^216 messages. As far as we know, this is the first report that investigates the algebraic properties of Luffa v1. Note that this attack does not pose any threat to the security of the full-step Luffa v1, nor to Luffa v2.
Keywords: Hash function, Luffa, Higher order differential attack, Non-randomness.
1 Introduction
A cryptographic hash function has many applications, such as digital signatures and message authentication codes. Recently, several important breakthroughs have been made in the cryptanalysis of hash functions, and they imply that most of the currently used standard hash functions are vulnerable to new attacks. In these circumstances, the National Institute of Standards and Technology (NIST) decided to organize the Cryptographic Hash Algorithm Competition (the SHA-3 competition) [13] and started to call for algorithms. Luffa [6] is a family of hash functions submitted to the SHA-3 competition and was selected as one of the second-round candidates. Luffa modified its algorithm at the beginning of the second round, and the current algorithm is called Luffa v2. Throughout this document, we discuss the algorithm submitted to Round 1 (Luffa v1) and denote it Luffa. The self-security evaluations in the supporting document for Round 1 [7] mainly discuss generic attacks and differential cryptanalysis; analyses based on algebraic approaches are not discussed in depth in that document. In this paper, we investigate the algebraic properties of step-reduced variants of Luffa by a higher order differential attack.
The application of higher order differences to cryptanalysis was suggested by Lai [11], and Knudsen first presented a higher order differential attack on a block cipher [10]. The higher order differential attack is a tool to analyze the algebraic properties of the target function, especially its algebraic degree. An application to stream ciphers was proposed by Dinur and Shamir [9], and Aumasson et al. proposed the cube tester [1,2], which aims to detect non-randomness of the target function. The cube tester has been applied not only to stream ciphers, but also to several hash functions submitted to the SHA-3 competition, such as MD6 and Hamsi. Recently, Aumasson and Meier proposed the zero-sum attack, which is an application of the higher order differential attack [3]. In this paper, we first confirm, both by a theoretical estimate and by experiments, that the algebraic degree of Qj grows more slowly than in the ideal case. According to our estimate, we can construct a distinguisher for step-reduced Luffa up to 7 out of 8 steps using a one-block message. The attack on 7 steps requires 2^216 messages. As far as we know, this is the first report that investigates the algebraic properties of Luffa v1. Note that this attack does not pose any threat to the security of the full-step Luffa v1, nor to Luffa v2. The rest of this paper is organized as follows. First, the specification of Luffa is briefly introduced in Section 2. Second, the definition of the higher order difference and its basic properties are introduced in Section 3. The increase of the algebraic degree over the iterations of the step function is investigated in Section 4. Then the higher order differential attack on step-reduced variants of the permutation Qj and its extension to the hash function are given in Section 5. We conclude the discussion in Section 6.
2 Specification of Luffa
In this section, we introduce the part of the specification of Luffa that is needed to describe the attack. Please refer to [6] for the details of the specification.
2.1 Chaining
The chaining of Luffa is a variant of a sponge function [4,5]. Figure 1 shows the basic structure of the chaining. The chaining of the hash function consists of iterations of a round function. The message is padded with 10...0 so that the padded message length is divisible by 256.
Round Function. The round function is a composition of a message injection function MI and w permutations Qj with 256-bit inputs (see Figure 1). Let the input of the i-th round be (H_0^{(i−1)}, . . . , H_{w−1}^{(i−1)}); then the output of the i-th round is given by

H_j^{(i)} = Q_j(X_j), 0 ≤ j < w,
X_0 || · · · || X_{w−1} = MI(H_0^{(i−1)}, . . . , H_{w−1}^{(i−1)}, M^{(i)}),

where H_j^{(0)} = V_j.
Fig. 1. The Luffa construction
In the specification of Luffa, the input length of the sub-permutations Qj is fixed to nb = 256 bits, and the number of sub-permutations w is 3, 4 and 5 for the hash lengths 256, 384 and 512 bits, respectively. The message injection functions can be represented by a matrix over the ring GF(2^8)^{32}. The map from an 8-word value (a_0, . . . , a_7) to an element of the ring GF(2^8)^{32} is defined by (Σ_{0≤k<8} a_{k,l} x^k)_{0≤l<32}. Note that the least significant word a_7 is the coefficient of the leading term x^7 in the polynomial representation.
Finalization. The finalization consists of iterations of an output function OF and a round function with a fixed message 0x00...0. If the number of (padded) message blocks is more than one, a blank round with a fixed message block 0x00...0 is applied at the beginning of the finalization. The output function OF XORs all block values and outputs the resulting 256-bit value. Let the output at the i-th iteration be Z_i; then the output function is defined by

Z_i = ⊕_{j=0}^{w−1} H_j^{(N+i′)},

where i′ = i if N = 1 and i′ = i + 1 otherwise. If the hash length is 256 bits, the output is Z_0. For longer hash lengths, the outputs of more than one round are used to generate the hash value.
2.2 Non-linear Permutation
The permutation Qj is defined as a composition of an input tweak and iterations of a step function Step. The number of iterations of the step function is 8, and the tweak is applied only once per permutation.
Fig. 2. The step function
At the beginning of the step function process, the 256 bits of data stored in 8 32-bit registers are denoted by a_k^{(r)} for 0 ≤ k < 8. The data before applying the permutation Qj is denoted by b_k, and the data after the tweak is denoted by a_k^{(0)}. The step function consists of the following three functions: SubCrumb, MixWord, and AddConstant. The pseudocode for Qj is given by

Permute(a[8], j){    //Permutation Q_j
  Tweak(a);
  for (r = 0; r < 8; r++){
    SubCrumb(a[0],a[1],a[2],a[3]);
    SubCrumb(a[4],a[5],a[6],a[7]);
    for (k = 0; k < 4; k++)
      MixWord(a[k],a[k+4]);
    AddConstant(a, j, r);
  }
}

Each function is described below in turn, and the tweaks are described in Section 2.2. We omit the description of AddConstant because it is not needed in this paper.
Substitution. SubCrumb substitutes the l-th bits of a_0, a_1, a_2, a_3 (or a_4, a_5, a_6, a_7) by an Sbox S defined by

S[16] = {7, 13, 11, 10, 12, 4, 8, 3, 5, 15, 6, 0, 9, 1, 2, 14}.

Let the output of SubCrumb be x_0, x_1, x_2, x_3 (or x_4, x_5, x_6, x_7). Then the substitution by SubCrumb is given by

x_{3,l} || x_{2,l} || x_{1,l} || x_{0,l} = S[a_{3,l} || a_{2,l} || a_{1,l} || a_{0,l}],  0 ≤ l < 32,
x_{7,l} || x_{6,l} || x_{5,l} || x_{4,l} = S[a_{7,l} || a_{6,l} || a_{5,l} || a_{4,l}],  0 ≤ l < 32.
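To make the bit-sliced substitution concrete, the following Python sketch applies the S-box column-wise to four 32-bit words as described above. It is our own illustration (function and variable names are not from the specification), and it uses a straightforward per-bit loop rather than the optimized Boolean form used in real implementations.

S = [7, 13, 11, 10, 12, 4, 8, 3, 5, 15, 6, 0, 9, 1, 2, 14]

def subcrumb(a0, a1, a2, a3):
    """Apply the 4-bit S-box S to each bit position l of four 32-bit words.

    The l-th input nibble is a3,l || a2,l || a1,l || a0,l and the l-th output
    nibble is x3,l || x2,l || x1,l || x0,l, as in the SubCrumb description.
    """
    x0 = x1 = x2 = x3 = 0
    for l in range(32):
        nibble = (((a3 >> l) & 1) << 3) | (((a2 >> l) & 1) << 2) \
               | (((a1 >> l) & 1) << 1) | ((a0 >> l) & 1)
        y = S[nibble]
        x0 |= (y & 1) << l
        x1 |= ((y >> 1) & 1) << l
        x2 |= ((y >> 2) & 1) << l
        x3 |= ((y >> 3) & 1) << l
    return x0, x1, x2, x3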
Linear Diffusion. MixWord is a linear permutation of two words. Let the output words be y_k and y_{k+4}, where 0 ≤ k < 4. Then MixWord is given by the following sequence of operations:

y_{k+4} = x_{k+4} ⊕ x_k,   y_k = x_k ≪ σ_1,
y_k = y_k ⊕ y_{k+4},       y_{k+4} = y_{k+4} ≪ σ_2,
y_{k+4} = y_{k+4} ⊕ y_k,   y_k = y_k ≪ σ_3,
y_k = y_k ⊕ y_{k+4},       y_{k+4} = y_{k+4} ≪ σ_4.

The parameters σ_i are given by σ_1 = 2, σ_2 = 14, σ_3 = 10, σ_4 = 1.
Tweaks. For each permutation Q_j, the least significant four words of a 256-bit input are rotated by j bits to the left in their 32-bit registers. Let the j-th block, k-th word of the input be b_{j,k}, and let the tweaked word (namely the input to the first step function) be a_{j,k}^{(0)}; then the tweak is defined by

a_{j,k,l}^{(0)} = b_{j,k,l},              0 ≤ k < 4,
a_{j,k,l}^{(0)} = b_{j,k,(l−j mod 32)},   4 ≤ k < 8.
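As a concrete illustration, the following sketch implements MixWord on 32-bit words exactly following the sequence of operations and rotation amounts listed above; rotl is a helper we introduce and is not part of the specification.

MASK32 = 0xFFFFFFFF

def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def mixword(xk, xk4):
    """MixWord on the word pair (x_k, x_{k+4}); returns (y_k, y_{k+4})."""
    yk4 = xk4 ^ xk           # y_{k+4} = x_{k+4} xor x_k
    yk = rotl(xk, 2)         # y_k = x_k <<< sigma_1
    yk ^= yk4                # y_k = y_k xor y_{k+4}
    yk4 = rotl(yk4, 14)      # y_{k+4} = y_{k+4} <<< sigma_2
    yk4 ^= yk                # y_{k+4} = y_{k+4} xor y_k
    yk = rotl(yk, 10)        # y_k = y_k <<< sigma_3
    yk ^= yk4                # y_k = y_k xor y_{k+4}
    yk4 = rotl(yk4, 1)       # y_{k+4} = y_{k+4} <<< sigma_4
    return yk, yk4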
3 Higher Order Differential Attack
An application of higher order differences to cryptanalysis was suggested by Lai [11], and Knudsen first presented a higher order differential attack on a block cipher [10]. The higher order differential attack is a tool to analyze the algebraic properties of the target function, especially its algebraic degree. In this section, we give a definition of the higher order difference. In addition, the meaning of a distinguishing attack on a hash function is discussed.
3.1 Higher Order Difference
Let Y = f(X) be a function, where X ∈ GF(2)^n and Y ∈ GF(2)^m. Let {A_1, . . . , A_i} be a set of linearly independent vectors in GF(2)^n, and let V^(i) be the sub-space spanned by these vectors. The i-th order difference is defined by

Δ_{V^(i)} f(X) = ⊕_{A ∈ V^(i)} f(X + A).
In the following, Δ^(i) denotes Δ_{V^(i)} if the choice of V^(i) does not matter in the discussion. The basic fact of the higher order difference is that Δ^(D+1) f(X) = 0 if the algebraic degree of f with respect to X is D. Therefore the higher
order difference is used as a tool to evaluate the algebraic degree of the target function. In addition to the original definition of the higher order difference, we import some terms and notations from the Square attack. The Square attack was proposed by Daemen et al. in 1997 as a dedicated attack on the block cipher Square [8]. A Λ-set is a set consisting of 16 states such that their values in some crumbs (4-bit inputs to an S-box) are all different (these crumbs are called active) and their values are all equal in the other crumbs (called passive). The basic idea of the Square attack is that a permutation preserves the active or passive status. In the higher order differential attack, this observation is useful for choosing V^(i). If V^(i) consists of active and passive crumbs, the increase of the algebraic degree at the first Sbox can be ignored by replacing the inputs of the Sboxes by the corresponding outputs.
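To illustrate the definition, the following sketch (our own toy example, not from the paper) computes Δ_{V^(i)} f over the subspace spanned by chosen basis vectors and checks the basic fact that a (D+1)-th order difference of a degree-D function vanishes.

from itertools import combinations

def higher_order_diff(f, x, basis):
    """i-th order difference of f at x over the subspace spanned by `basis`.

    Values are integers interpreted as bit vectors over GF(2); the subspace
    elements are the XORs of all subsets of the (linearly independent) basis.
    """
    acc = 0
    for r in range(len(basis) + 1):
        for subset in combinations(basis, r):
            a = 0
            for b in subset:
                a ^= b
            acc ^= f(x ^ a)
    return acc

# Toy Boolean function of algebraic degree 3 on 8 input bits.
def f(x):
    b = [(x >> i) & 1 for i in range(8)]
    return (b[0] & b[1] & b[2]) ^ (b[3] & b[4]) ^ b[5]

# A 4th-order difference of a degree-3 function is always 0 ...
print(higher_order_diff(f, 0x00, [1, 2, 4, 8]))   # -> 0
# ... while the 3rd-order difference over x0, x1, x2 recovers the top coefficient.
print(higher_order_diff(f, 0x00, [1, 2, 4]))      # -> 1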
3.2 Distinguishing Attack on a Hash Function
We first clarify what the term distinguisher means in this paper. A distinguisher for a family of functions F, within the set of all functions mapping {0, 1}^m to {0, 1}^n, is defined as a program that, given a function f, determines whether f belongs to F. Therefore, a discussion of a distinguishing attack makes sense only if the target function has a parameter. Moreover, the naive definition of a collision-resistant hash function does not involve a secret key. Therefore, the practical applications of a distinguishing attack are limited to keyed settings such as HMAC. Treating the IVs as a parameter (as in the discussion of security proofs) is another possible setting. Note that a distinguisher on a hash function (family) only detects a kind of non-randomness of the target; it does not violate collision resistance, second preimage resistance, or preimage resistance. Even though distinguishing attacks reveal only non-randomness, we believe that they can be a first step in analyzing the target function. By definition, it is possible to calculate the higher order differences of arbitrary functions, including hash functions. Let f be a randomly chosen function whose input length is n bits. Then the algebraic degree of f is expected to be n − 1, so that the i-th order difference Δ^(i) f is rarely zero if i is not much less than n. We use this property as a distinguisher and claim that the attack is successful if such events are detected.
4 Algebraic Degree of the Non-linear Permutation Qj
It is pointed out in [7] that the Boolean polynomial representations of the Sbox of Luffa are sparse, especially at the highest degree. The first step of the theoretical estimate is to observe how this property affects the increase of the algebraic degree throughout the iterations of the step function. In the following, r iterations of the step function are denoted by Q_j^{(r)}. The original permutation of Luffa is given by Q_j = Q_j^{(8)}.
4.1 Boolean Representations of the Sbox
Let the inputs and outputs of the Sbox be x_{0,l}, x_{1,l}, x_{2,l}, x_{3,l} and y_{0,l}, y_{1,l}, y_{2,l}, y_{3,l}. Then the polynomial representations of the relations between the input and output bits are given by

y_{0,l} = 1 + x_{2,l} + x_{0,l} x_{1,l} + x_{1,l} x_{3,l} + x_{2,l} x_{3,l} + x_{0,l} x_{1,l} x_{3,l},
y_{1,l} = 1 + x_{0,l} + x_{2,l} + x_{0,l} x_{1,l} + x_{0,l} x_{2,l} + x_{3,l} + x_{1,l} x_{3,l} + x_{2,l} x_{3,l} + x_{0,l} x_{1,l} x_{3,l},
y_{2,l} = 1 + x_{1,l} + x_{1,l} x_{3,l} + x_{2,l} x_{3,l} + x_{0,l} x_{1,l} x_{3,l},
y_{3,l} = x_{0,l} + x_{1,l} + x_{2,l} + x_{0,l} x_{1,l} + x_{1,l} x_{2,l} + x_{0,l} x_{1,l} x_{2,l} + x_{1,l} x_{3,l}.
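These representations can be derived mechanically from the S-box table. The following Python sketch (our own helper, not from the paper) computes the algebraic normal form of each output bit of a 4-bit S-box via the Möbius transform; it can be used to check the equations above and the degrees of the individual terms.

S = [7, 13, 11, 10, 12, 4, 8, 3, 5, 15, 6, 0, 9, 1, 2, 14]

def anf_of_output_bit(sbox, bit):
    """Return the ANF of output bit `bit` as a dict {monomial mask: 1}.

    Bit i of the monomial mask corresponds to input variable x_i
    (x_0 is the least significant input bit of the S-box index).
    """
    coeff = [(sbox[v] >> bit) & 1 for v in range(16)]
    for i in range(4):                      # fast Moebius transform
        for v in range(16):
            if v & (1 << i):
                coeff[v] ^= coeff[v ^ (1 << i)]
    return {m: c for m, c in enumerate(coeff) if c}

for bit in range(4):
    monomials = anf_of_output_bit(S, bit)
    degree = max(bin(m).count("1") for m in monomials)
    print(f"y{bit}: degree {degree}, monomial masks {sorted(monomials)}")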
4.2 Basic Facts
From simple observation of the Boolean representations of the Sbox, it is clear that the terms of degree greater than one which contain the monomial x_{3,l} are the same in y_{0,l}, y_{1,l} and y_{2,l}. Let η_l · x_{3,l} be this common part of y_{0,l}, y_{1,l}, y_{2,l}, and let ξ_{k,l} be the remainders (the strict definitions of η_l and ξ_{k,l} are given in Section 4.3). Then the product of y_{k,l} and y_{k′,l} for k ≠ k′ is given by

y_{k,l} · y_{k′,l} = (ξ_{k,l} + η_l x_{3,l})(ξ_{k′,l} + η_l x_{3,l}) = ξ_{k,l} ξ_{k′,l} + (ξ_{k,l} + ξ_{k′,l} + 1) η_l x_{3,l}.   (1)

Therefore, we get deg(y_{k,l} · y_{k′,l}) < deg y_{k,l} + deg y_{k′,l}. This indicates that the designers' estimate of the algebraic degree of Q_j^{(r)} is too optimistic, and we should estimate it more carefully. On the other hand, MixWord() is the function which sums up y_{k,l} over the subscript l: z_{k,l} = MixWord(y_k, y_{k+4})_{k,l} = Σ_{ι∈Ω_l} y_{k,ι} + Σ_{ι∈Ω′_l} y_{k+4,ι}. Then the z_{k,l} are given by

z_{k,l} = Σ_{ι∈Ω_l} ξ_{k,ι} + Σ_{ι∈Ω′_l} ξ_{k+4,ι} + Σ_{ι∈Ω_l} η_ι x_{3,ι} + Σ_{ι∈Ω′_l} η′_ι x_{7,ι}   (2)

for k = 0, 1, 2, where η′_l is calculated in the same manner as η_l but differs in the choice of the variables: η′_l uses x_{4,l}, x_{5,l}, x_{6,l}, x_{7,l} instead of x_{0,l}, x_{1,l}, x_{2,l}, x_{3,l}. The property that the higher-degree terms of y_{0,l}, y_{1,l}, y_{2,l} are the same is therefore preserved by MixWord(). AddConstant() has no influence on this property.

4.3 Recurrence Relations about Algebraic Degree

The observations in Section 4.2 indicate that only SubCrumb() contributes to the increase of the algebraic degree. In the following, we identify the iterations of the Sboxes (SubCrumb()) with the iterations of the step functions, for simplicity of the discussion. Let us denote the inputs to the l-th Sbox in the r-th step function by (x_{0,l}^{(r−1)}, x_{1,l}^{(r−1)}, x_{2,l}^{(r−1)}, x_{3,l}^{(r−1)}), and define η_l^{(r)}, ξ_{k,l}^{(r)} by

η_l^{(r)}   = η_l(x_{0,l}^{(r)}, x_{1,l}^{(r)}, x_{2,l}^{(r)})   = x_{1,l}^{(r)} + x_{2,l}^{(r)} + x_{0,l}^{(r)} x_{1,l}^{(r)},
ξ_{0,l}^{(r)} = ξ_{0,l}(x_{0,l}^{(r)}, x_{1,l}^{(r)}, x_{2,l}^{(r)}) = 1 + x_{2,l}^{(r)} + x_{0,l}^{(r)} x_{1,l}^{(r)},
ξ_{1,l}^{(r)} = ξ_{1,l}(x_{0,l}^{(r)}, x_{1,l}^{(r)}, x_{2,l}^{(r)}) = 1 + x_{0,l}^{(r)} + x_{2,l}^{(r)} + x_{0,l}^{(r)} x_{1,l}^{(r)} + x_{0,l}^{(r)} x_{2,l}^{(r)},
ξ_{2,l}^{(r)} = ξ_{2,l}(x_{0,l}^{(r)}, x_{1,l}^{(r)}, x_{2,l}^{(r)}) = 1 + x_{1,l}^{(r)},
ξ_{3,l}^{(r)} = ξ_{3,l}(x_{0,l}^{(r)}, x_{1,l}^{(r)}, x_{2,l}^{(r)}) = x_{0,l}^{(r)} + x_{1,l}^{(r)} + x_{2,l}^{(r)} + x_{0,l}^{(r)} x_{1,l}^{(r)} + x_{1,l}^{(r)} x_{2,l}^{(r)} + x_{0,l}^{(r)} x_{1,l}^{(r)} x_{2,l}^{(r)}.

In other words, η_l · x_{3,l} denotes the common terms of the polynomial representations, and ξ_{k,l} denotes the remaining terms, which do not contain the variable x_{3,l}. In addition, we denote the terms of degree d in η_l^{(r)}, ξ_{k,l}^{(r)} by η_{l,d}^{(r)}, ξ_{k,l,d}^{(r)}, respectively.

Now we estimate the algebraic degrees of x_{k,l}^{(r)}, η_l^{(r)}, ξ_{k,l}^{(r)} by recurrence relations. We approximate the relations in order to simplify their representations, and Equation (1) is applied once for each variable in the estimation. Let us denote δ_l^{(r)} = deg η_l^{(r−1)} + deg x_{3,l}^{(r−1)} and ε_{k,k′,l}^{(r)} = deg ξ_{k,l}^{(r)} + deg ξ_{k′,l}^{(r)}. Then we have the following relations:

deg η_l^{(r)}     ∼ max( ε_{0,1,l}^{(r−1)}, max(deg ξ_{0,l}^{(r−1)}, deg ξ_{1,l}^{(r−1)}) + δ_l^{(r)} ),   (3)
deg ξ_{0,l}^{(r)} ∼ deg η_l^{(r)},   (4)
deg ξ_{1,l}^{(r)} ∼ max(deg ξ_{1,l}^{(r−1)}, deg ξ_{2,l}^{(r−1)}) + max(deg ξ_{0,l}^{(r−1)}, δ_l^{(r)}),   (5)
deg ξ_{2,l}^{(r)} ∼ max(deg ξ_{1,l}^{(r−1)}, δ_l^{(r)}),   (6)
deg ξ_{3,l,2}^{(r)} = max(deg ξ_{0,l}^{(r−1)}, deg ξ_{2,l}^{(r−1)}) + max(deg ξ_{1,l}^{(r−1)}, δ_l^{(r)}),   (7)
deg ξ_{3,l,3}^{(r)} ∼ max( deg ξ_{0,l}^{(r−1)} + deg ξ_{1,l}^{(r−1)} + deg ξ_{2,l}^{(r−1)}, max(ε_{0,1,l}^{(r−1)}, ε_{0,2,l}^{(r−1)}, ε_{1,2,l}^{(r−1)}) + δ_l^{(r)} ),   (8)
deg x_{0,l}^{(r)} ∼ max( ε_{0,1,l}^{(r−2)}, δ_l^{(r−1)}, δ_l^{(r)} ),   (9)
deg x_{1,l}^{(r)} ∼ max( ε_{0,1,l}^{(r−2)}, ε_{0,2,l}^{(r−2)}, δ_l^{(r−1)}, δ_l^{(r)} ),   (10)
deg x_{2,l}^{(r)} ∼ δ_l^{(r)},   (11)
deg x_{3,l}^{(r)} ∼ max( deg ξ_{0,l}^{(r−2)} + deg ξ_{1,l}^{(r−2)} + deg ξ_{2,l}^{(r−2)}, max(ε_{0,1,l}^{(r−2)}, ε_{0,2,l}^{(r−2)}, ε_{1,2,l}^{(r−2)}, deg x_{3,l}^{(r−2)}) + δ_l^{(r−1)} ),   (12)

where deg ξ_{3,l}^{(r)} is the maximum of (7) and (8). The detailed calculations leading to these relations are given in Appendix A.

4.4 Theoretical Estimate of Algebraic Degrees

Table 1 shows the pace of increase of the algebraic degrees of the variables x_{k,l}, ξ_{k,l}, η_l obtained from the recurrence relations (3) to (12) and the initial values at r = 0, 1. The input/output length of the non-linear permutation Q_j^{(r)} is 256 bits, so the algebraic degrees are at most 256. However, we list the estimated degrees as they are, even when they exceed 256, in order to clarify the pace of increase.

Table 1. Pace of increase of algebraic degrees (Theoretical estimate)

r    x0,l   x1,l   x2,l   x3,l   ξ0,l   ξ1,l   ξ2,l   ξ3,l    ηl
0       1      1      1      1      2      2      1      2      2
1       3      3      3      3      5      5      3      7      5
2       8      8      8      7     13     13      8     18     13
3      20     20     20     18     33     33     20     46     33
4      51     51     51     46     84     84     51    117     84
5     130    130    130    117    214    214    130    298    214
6     331    331    331    298    545    545    331    759    545
7     843    843    843    759  1,388  1,388    843  1,933  1,388
8   2,147  2,147  2,147  1,933  3,535  3,535  2,147  4,923  3,535
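For reference, the following Python sketch iterates the recurrence relations (3)-(12) from the initial values at r = 0, 1 and reproduces the degrees listed in Table 1. It is our own illustration; the function and variable names are not part of the paper.

def degree_table(steps=8):
    """Iterate the degree recurrences (3)-(12); rows r = 0, 1 are initial values."""
    rows = [
        {"x": [1, 1, 1, 1], "xi": [2, 2, 1, 2], "eta": 2},
        {"x": [3, 3, 3, 3], "xi": [5, 5, 3, 7], "eta": 5},
    ]
    for r in range(2, steps + 1):
        p, pp = rows[r - 1], rows[r - 2]           # rows r-1 and r-2
        delta_r = p["eta"] + p["x"][3]             # delta^(r)
        delta_r1 = pp["eta"] + pp["x"][3]          # delta^(r-1)
        eps = lambda row, a, b: row["xi"][a] + row["xi"][b]
        eta = max(eps(p, 0, 1), max(p["xi"][0], p["xi"][1]) + delta_r)        # (3)
        xi0 = eta                                                             # (4)
        xi1 = max(p["xi"][1], p["xi"][2]) + max(p["xi"][0], delta_r)          # (5)
        xi2 = max(p["xi"][1], delta_r)                                        # (6)
        xi3_2 = max(p["xi"][0], p["xi"][2]) + max(p["xi"][1], delta_r)        # (7)
        xi3_3 = max(sum(p["xi"][:3]),
                    max(eps(p, 0, 1), eps(p, 0, 2), eps(p, 1, 2)) + delta_r)  # (8)
        x0 = max(eps(pp, 0, 1), delta_r1, delta_r)                            # (9)
        x1 = max(eps(pp, 0, 1), eps(pp, 0, 2), delta_r1, delta_r)             # (10)
        x2 = delta_r                                                          # (11)
        x3 = max(sum(pp["xi"][:3]),
                 max(eps(pp, 0, 1), eps(pp, 0, 2), eps(pp, 1, 2), pp["x"][3]) + delta_r1)  # (12)
        rows.append({"x": [x0, x1, x2, x3],
                     "xi": [xi0, xi1, xi2, max(xi3_2, xi3_3)],
                     "eta": eta})
    return rows

for r, row in enumerate(degree_table()):
    print(r, row["x"], row["xi"], row["eta"])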
5 Higher Order Differential Attack on Luffa
The designers of Luffa expected the algebraic degree of the permutation Qj is given by 3r [7]. However, as shown in the previous section, the degree increases (r−1) (r−1) slower than the ideal case. In addition, the high order part ηl · x3,l of (r)
the variables xk,l are common for k = 0, 1, 2. We use this property to construct (r)
a distinguisher for the permutation Qj . Then we extend the attack to 7-step Luffa. 5.1
Theoretical Estimate (r)
(r−1)
Remind the definitions of ξk,l and ηl that xk,l = ξk,l order part
(r−1) (r−1) ηl · x3,l
of the variables
(r) xk,l
(r−1) (r−1) x3,l .
+ ηl
The high
are common for k = 0, 1, 2, so that (r)
(r)
it can be eliminated by the addition (on the binary field) xk,l + xk ,l . We propose (r) (r) xk +xk
to use a 32-bit value as the higher order differential distinguisher. Now (r−1) (r−1) (r−1) it is clear that the important variables in this attack are ξ0,l , ξ1,l , ξ2,l , (r)
not xk,l . (r)
These observations indicate that the number of steps r for which Qj be attacked can be estimated by the maximum degree (5)
can
(r−2) of ξk,l . In Table 1, (6) Qj , which calculates
maxk deg ξk,l is 214 so that there is a distinguisher on 214-th order difference. This distinguisher for 6 steps does not depend on the choice of the input space V (i) . By the careful choice of the input space V (i) , we can extend this distinguisher to 7 steps. There are two known techniques to skip the increase of the algebraic degree by applying SubCrumb() in the first step. The first one is to choose the input space V (i) in which the inputs to the Sboxes are active or passive. For example, if the V (i) takes all values in xk,l for 0 ≤ k < 8 and 0 ≤ l < t, we can ignore the effect of SubCrumb() at the first step. The second technique is to
Higher Order Differential Attack on Step-Reduced Variants of Luffa v1
279
vary only a bit per an Sbox. This technique is applicable only if the algebraic degree of the target function is small. Let us denote m1 -th bit to m2 -th bit of the variable x by x[m1 ..m2 ]. The distinguisher for 7 steps takes xk [0..26] for all k as variables. In other words, all possible values of xk,l for 0 ≤ k < 8 and 0 ≤ l < 27 (8) appear once. This distinguisher requires 2216 messages. On the other hand, Qj (6)
is not expected to be distinguishable because max deg ξk,l = 545 > 256. 5.2
Experimental Inspection
By performing experiments, we check if the theoretical estimates summarized in 1 are reliable. We applied the “a bit per an Sbox” technique which is one of the two techniques to ignore the effect of the first step, as mentioned in the previous section. We did not apply the other technique. If we did, the active Sboxes are relatively sparse, so that it would be possible to skip the SubCrumb() in the second step by choosing a good alignment of the active Sboxes. However, our purpose is not to optimize the attack, but to check if the theoretical estimates summarized in Table 1 is reliable so that this kind of “unexpected” skip is not desired. Therefore, we calculated t-th order differences by varying the least (0) significant t bits of the 32-bit variable x0 for 1 ≤ t ≤ 32. We calculated each higher order difference for 100 times by randomly generating the initial states. The experimental results are summarized in Table 2 where the numerical (r) (r) (r) (r) (r) values show the ratio that one of the equations x0 = x1 , x0 = x2 , x1 = (r) (r) (r) (r) (r) (r) (r) x2 , x4 = x5 , x4 = x6 , x5 = x6 holds, where r means the number of steps. In other words, the values mean the ratio of the distinguishing attack being successful. Table 3 shows the comparison between the theoretical estimates (See Table 1) and the experimental results (See Table 2). (r) We calculated the algebraic degree of Qj from the experimental results by (r)
the order. Let t be the lowest number such that the t-th order differential of xk (r) is equal to zero with probability one. The degree of Qj is formally estimated at (r−2)
t − 1. This may cause the contradictions in Table 3 such that the degree of ξk,l (r−1)
is larger than that of xk,l for r = 1, 2. In other cases, the Table 3 indicates that the theoretical estimates in Table 1 are very close to the experimental results in Table 21 . 5.3
Higher Order Differential Attack on the Hash Function
The higher order differential attack on a hash function does not violate the central three requirements for a hash function, namely collision resistance, second preimage resistance, preimage resistance. On the other hand, the distinguishing attacks are useful to check whether or not the target function has pseudo-randomness 1
We append a note that the t − 1-th order differentials are rarely constants, so that (r) it might be better to estimate the degree of Qj by t.
280
D. Watanabe et al. (r)
Table 2. The success rates of the distinguishing attacks on the permutation Qj (Experimental results)
Order 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
Number of steps 1 2 3 4 5 1.00 0.39 0.00 0.00 0.00 1.00 1.00 0.12 0.00 0.00 1.00 1.00 0.56 0.00 0.00 1.00 1.00 0.93 0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.01 0.00 1.00 1.00 1.00 0.04 0.00 1.00 1.00 1.00 0.16 0.00 1.00 1.00 1.00 0.45 0.00 1.00 1.00 1.00 0.83 0.00 1.00 1.00 1.00 0.97 0.00 1.00 1.00 1.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00 1.00 1.00 1.00 1.00 0.01 1.00 1.00 1.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00 1.00 1.00 1.00 1.00 0.03 1.00 1.00 1.00 1.00 0.04 1.00 1.00 1.00 1.00 0.13 1.00 1.00 1.00 1.00 0.21 1.00 1.00 1.00 1.00 0.38 1.00 1.00 1.00 1.00 0.46 1.00 1.00 1.00 1.00 0.71 1.00 1.00 1.00 1.00 0.83 1.00 1.00 1.00 1.00 0.85 1.00 1.00 1.00 1.00 0.93 1.00 1.00 1.00 1.00 0.99
Table 3. The summary of the algebraic degrees Number of steps 1 Algebraic degree Theoretical estimate 1 (r−1) (max0≤k≤2,l xk,l ) Experimental result 1 Distinguisher’s degree Theoretical estimate – (r−2) (max0≤k≤2,l ξk,l ) Experimental result –
2 3 1 2 2
3 8 7 5 5
4 5 6 7 8 20 51 130 – – 18 – – – – 13 33 84 214 – 12 ≥ 32 – – –
Higher Order Differential Attack on Step-Reduced Variants of Luffa v1
281
M( 1 ) V0
Q0
H ( 1 0)
2 V1
Q1
H ( 1 1)
Z0
2 V2
Q2 2
H ( 1 2)
MI
Fig. 3. Luffa for a block message (w = 3)
which is also required to a hash function. Here we consider the higher order differential attack on the 7-step Luffa hash function. The first point of Luffa is that there is no blank round if the message length is less than 256 bits. In this case, the message is mixed by the message injection function M I, permuted by non-linear permutation Qj , then the XORed 256-bit value is output. Therefore, it might be possible to construct a distinguisher based on a higher order difference if the algebraic degree of Qj is smaller than 256 for all j. Because the only non-linear components in Luffa are Qj s which differ only in their tweaks and their step constants. In order to extend the distinguisher for (7) Qj to the one for the 7-step Luffa, we consider the influences by M I and the tweaks. In the following, we show that neither M I nor the tweaks has influence, which is not difficult. Firstly, the message injection function M I consists of the constant multi32 32 plication over GF(28 ) . This map stabilizes subspaces of GF(28 ) given by a t natural injection of GF(28 ) where t ≤ 32. Therefore there is no influence of the message injection function M I if the input space is the direct product of the Λ-set. Secondly, the tweaks rotate the lower 4 words a4 , a5 , a6 , a7 by j bits to the left in a word. Obviously, the tweaks preserve the properties active and passive. Therefore the input space V (i) which cancels the influence of the message injection function also cancels that of tweaks. (7) These two facts indicate that the distinguisher for Qj is also applicable to the reduced step hash function as it is. 5.4
Probabilistic Distinguisher
Table 2 shows that the behavior of the distinguisher is probabilistic if the order is less than the expected algebraic degree. Here we discuss how to reduce the complexity of the attack. If the target function is sufficiently random, the probability to eventually find a local collision xk = xk for any k, k is given by 6 · 2−(32+1)/2 ∼ 2−14 and it
282
D. Watanabe et al.
is small2 . Therefore xk + xk can be used as a distinguisher even if the event is probabilistic. For example, Table 2 shows that 3 of 100 trials successfully found the partial collision with the 22-th order difference for 5 steps. In this case, the computational complexity is 222 × 100 ∼ 228.6 , which is smaller than the complexity of the attack with the deterministic distinguisher 233 . On the other hand, we have no idea to theoretically estimate the frequency of this probabilistic event so that it is not clear how much the computational complexity can be reduced in the case of larger number of steps. In addition, the expected degree of the distinguisher for 8 steps is much larger than 256 so that the distinguisher is expected to include many high order terms. We expect that it is difficult to apply the higher order differential distinguisher to 8 steps if the probabilistic event can be observed more often than the deterministic event. 5.5
Zero-Sum Attack
Zero-sum attack was recently proposed by Aumasson and Meier [3] and it is an application of higher order differential attack. The basic idea of the zero-sum attack is to choose the Λ-set as the intermediate variables and estimate the increases of the algebraic degrees at the input and output of the target function. If the algebraic degrees (to both sides) are low, there is an certain set of inputs such that (a) their xoring is zero, and (b) the xoring of their corresponding outputs is also zero. This is a property which an ideal permutation does not have, and the zero-sum attack uses this property as the distinguisher. By intuition the zero-sum attack enables to attack double more rounds than the original higher order differential attack. They claimed that the attack on the permutation Qj of Luffa requires 281 inputs. It is obvious that the number of required inputs can be reduced to 233 due to our evaluation result (See Table 1). However, as they mentioned, it is not obvious problem to find an adequate set of messages which satisfies zero-sum property for all Qj . 5.6
The Higher Order Differential Attack on Luffa v2
Luffa changed its algorithm at the beginning of the Round 2 and it is called Luffa v2. We do not describe the changes in detail, but the most significant change of Luffa v2 in terms of higher order differential attack is that a blank round in the finalization process is applied even if the message length is less than 256 bits. Therefore 16 step functions are always applied for any message block so that their algebraic degree is not likely to be less than 256.
6
Conclusion
In this paper, a higher order differential attack on the hash function Luffa is discussed. We confirmed that the algebraic degree of the underlying non-linear 2
In [11] Lai pointed out that Prob(δVi f (a) = b) is either 0 or at least 2i−n where f : GF(2n ) → GF(2n ). But this is not our case because the domain of our distinguisher is larger than the range. The probabilistic behavior of our distinguisher may be (r) caused by the terms of high degree of Qj being sparsely distributed.
Higher Order Differential Attack on Step-Reduced Variants of Luffa v1
283
permutation Qj increases slower than expected both by the theoretical estimate and the experiments. According to our estimate, we can construct a distinguisher for reduced step Luffa up to 7 out of 8 steps by using a block message. The attack for 7 steps requires 2216 messages. As far as we know, this is the first report which investigates the algebraic property of Luffa v1. Besides, this attack does not pose any threat to the security of the full-step of Luffa v1 nor Luffa v2.
Acknowledgements The authors would like to thank Hirotaka Yoshida and the anonymous reviewers of FSE 2010 for their helpful comments and suggestions.
References 1. Aumasson, J.-P., Dinur, I., Meier, W., Shamir, A.: Cube Testers and Key Recovery Attacks On Reduced-Round MD6 and Trivium. In: Dunkelman, O. (ed.) FSE 2009. LNCS, vol. 5665, pp. 1–22. Springer, Heidelberg (2009) 2. Aumasson, J.-P., Dinur, I., Henzen, L., Meier, W., Shamir, A.: Efficient FPGA Implementations of High-Dimensional Cube Testers on the Stream Cipher Grain128. In: Special-purpose Hardware for Attacking Cryptographic Systems, SHARCS 2009 (2009) 3. Aumasson, J.P., Meier, W.: Zero-sum distinguishers for reduced Keccak-f and for the core functions of Luffa and Hamsi (2009), http://www.131002.net/data/papers/AM09.pdf 4. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: Sponge Functions. In: Ecrypt Hash Workshop (2007) 5. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: On the Indifferentiability of the Sponge Construction. In: Smart, N.P. (ed.) EUROCRYPT 2008. LNCS, vol. 4965, pp. 181–197. Springer, Heidelberg (2008) 6. De Canni`ere, C., Sato, H., Watanabe, D.: Hash Function Luffa: Specification. Submission to NIST SHA-3 Competition (2008), http://www.sdl.hitachi.co.jp/crypto/luffa/ 7. De Canni`ere, C., Sato, H., Watanabe, D.: Hash Function Luffa: Supporting Document. Submission to NIST SHA-3 Competition (2008), http://www.sdl.hitachi.co.jp/crypto/luffa/ 8. Daemen, J., Knudsen, L., Rijmen, V.: The Block Cipher Square. In: Biham, E. (ed.) FSE 1997. LNCS, vol. 1267, pp. 149–165. Springer, Heidelberg (1997) 9. Dinur, I., Shamir, A.: Cube Attacks on Tweakable Black Box Polynomials. Cryptology ePrint Archive, Report 2008/385 10. Knudsen, L.R.: Truncated and Higher Order Differentials. In: Preneel, B. (ed.) FSE 1994. LNCS, vol. 1008, pp. 196–211. Springer, Heidelberg (1995) 11. Lai, X.: Higher order derivatives and differential cryptanalysis. In: Proc. Symposium on Communication, Coding and Cryptography, pp. 227–233. Kluwer Academic Publishers, Dordrecht (1994) 12. National Institute of Standards and Technology, Secure Hash Standard (SHS), FIPS 180-2 (2002) 13. National Institute of Standards and Technology, Cryptographic hash project, http://csrc.nist.gov/groups/ST/hash/index.html
284
A
D. Watanabe et al.
Recurrence Relations
The symbol “∼” means the simplification of the expression which (is considered) preserves the algebraic degree. A.1
(r)
Recurrence Relation of ηl (r)
ηl
(r)
(r)
(r) (r)
= x1,l + x2,l + x0,l x1,l (r) (r)
∼ x0,l x1,l (r−1)
(r−1) (r−1) (r−1) x3,l )(ξ1,l
= (ξ0,l
+ ηl
(r−1) (r−1) ξ1,l
+ (ξ0,l
(r−1) (r−1) ξ1,l
+ (ξ0,l
= ξ0,l ∼ ξ0,l
A.2
(r−1)
+ ξ1,l
(r−1)
+ ξ1,l
(r−1) (r−1) x3,l )
+ ηl
(r−1) (r−1)
(r−1) (r−1) x3,l
+ 1)ηl
(r−1) (r−1) x3,l .
)ηl
(13)
(r)
Recurrence Relation of ξk,l (r)
(r)
(r)
(r)
(r)
(r) (r)
(r)
(14)
(r−1) (r−1) x3,l ).
(15)
ξ0,l = ξ0,0 + ξ0,1 + ξ0,2 ∼ ξ0,2 = x0,l x1,l = ηl . (r)
(r)
(r)
(r)
ξ1,l = ξ1,0 + ξ1,1 + ξ1,2 (r)
∼ ξ1,2
(r) (r)
(r) (r)
= x0,l x1,l + x0,l x2,l (r−1)
= (ξ1,l (r)
(r)
(r)
(r−1)
+ ξ2,l
(r−1)
)(ξ0,l
+ ηl
(r)
(r−1)
(r)
ξ2,l = ξ2,0 + ξ2,1 ∼ ξ2,1 = x1,l = ξ1,l (r)
(r)
(r)
(r−1) (r−1) x3,l .
+ ηl
(16)
(r)
ξ3,l,2 = (x0,l + x2,l )x1,l (r−1)
= (ξ0,l (r)
(r−1)
+ ξ2,l
(r−1)
)(ξ1,l
(r−1) (r−1) x3,l ).
+ ηl
(17)
(r) (r) (r)
ξ3,l,3 = x0,l x1,l x2,l (r−1)
= (ξ0,l
(r−1) (r−1) (r−1) x3,l )(ξ1,l
+ ηl
(r−1) (r−1) (r−1) x3,l )(ξ2,l
+ ηl
(r−1) (r−1) x3,l )
+ ηl
(r−1) (r−1) (r−1) ξ1,l ξ2,l
∼ ξ0,l
(r−1) (r−1) ξ1,l
+(ξ0,l
(r−1) (r−1) ξ2,l
+ ξ0,l
(r−1) (r−1) (r−1) (r−1) ξ2,l )ηl x3,l .
+ ξ1,l
(18)
Higher Order Differential Attack on Step-Reduced Variants of Luffa v1
A.3
(r)
Recurrence Relation of xk,l (r)
(r−1)
+ ξ0,1
(r−1)
+ ηl
x0,l = ξ0,0 ∼ ξ0,2
(r−1)
(r−1) (r−1) x1,l (r−2)
= (ξ0,l
(r−1)
+ ξ0,2
(r−1) (r−1) x3,l
+ ηl
(r−2) (r−2) (r−2) x3,l )(ξ1,l
(r−2) (r−2) ξ1,l
(r)
(r−1) (r−1) x3,l
+ ηl
(r−1) (r−1) x3,l .
(r−1)
+ ξ1,1
(r−1)
(r−1)
+ ηl
(r−1) (r−1) x1,l
= x0,l
(r−2)
∼ ξ0,l
(r−1)
x2,l = ξ2,0
(r−1)
+ ξ1,2
(r−1) (r−1) x3,l
+ ηl
(r−1) (r−1) x2,l
+ x0,l
(r−2)
+ ξ2,l
(r−1)
+ ηl
(ξ1,l
+ ξ2,1
(r−1)
(19)
(r−1) (r−1) x3,l
∼ ξ1,2
(r)
+ ηl
+ ηl
x1,l = ξ1,0
(r−1)
(r−2) (r−2) x3,l )
+ ηl
∼ ξ0,l
x3,l = ξ3,1
(r−1) (r−1) x3,l
+ ηl
(r−1) (r−1) x3,l
= x0,l
(r)
285
(r−1)
(r−2)
(r−1) (r−1) x3,l
+ ηl
(r−1) (r−1) x3,l .
) + ηl
(r−1) (r−1) x3,l
(20)
(r−1) (r−1) x3,l .
∼ ηl
(21)
(r−1) (r−1) x3,l
+ ξ3,l,2 + ξ3,l,3 + x1
(r−1)
(r−1) (r−1) x3,l
∼ ξ3,l,3 + x1,l
(r−1) (r−1) (r−1) x1,l x2,l
= x0,l
(r−2)
= (ξ0,l
(r−2) (r−2) (r−2) x3,l )(ξ1,l
+ ηl
(r−2)
+(ξ1,l
(r−1) (r−1) x3,l
+ x1,l
(r−2) (r−2) (r−2) x3,l )(ξ2,l
+ ηl
(r−2) (r−2) (r−2) x3,l )(ξ3,l,2
+ ηl
(r−2)
(r−2) (r−2) x3,l )
+ ηl
(r−2) (r−2) x3,l )
+ ξ3,l,3 + x1,l
(r−2) (r−2) (r−2) ξ1,l ξ2,l
∼ ξ0,l
(r−2) (r−2) ξ1,l
+(ξ0,l
(r−2)
ξ1,l
(r−2)
(r−2) (r−2) ξ2,l
+ ξ0,l
(r−2) (r−2) x3,l ).
(ξ3,l,3 + x1,l
(r−2) (r−2) ξ2,l
+ ξ1,l
(r−2)
(r−2) (r−2) x3,l
+ ξ3,l,3 )ηl
(22)
Rebound Attack on Reduced-Round Versions of JH Vincent Rijmen1,2 , Deniz Toz1 , and Kerem Varıcı1, 1
2
Katholieke Universiteit Leuven Department of Electronical Engineering ESAT SCD-COSIC, and Interdisciplinary Institute for BroadBand Technology (IBBT) Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium Institute for Applied Information Processing and Communications (IAIK) Graz University of Technology, Inffeldgasse 16a, A-8010 Graz, Austria {vincent.rijmen,deniz.toz,kerem.varici}@esat.kuleuven.be
Abstract. JH, designed by Wu, is one of the 14 second-round candidates in the NIST Hash Competition. This paper presents the first analysis results of JH by using rebound attack. We first investigate a variant of the JH hash function family for d = 4 and describe how the attack works. Then, we apply the attack for d = 8, which is the version submitted to the competition. As a result, we obtain a semi-free-start collision for 16 rounds (out of 35.5) of JH for all hash sizes with 2179.24 compression function calls. We then extend our attack to 19 (and 22) rounds and present a 1008-bit (and 896-bit) semi-free-start near-collision on the JH compression function with 2156.77 (2156.56 ) compression function calls, 2152.28 memory access and 2143.70 -bytes of memory.
1
Introduction
Recent years witnessed the continuous works on analysis of hash functions which reveal that most of them are not as secure as claimed. Wang et al. presented collisions on the MD family [1, 2, 3] using an attack technique on hash functions which is based on differential cryptanalysis. This idea was further developed and used in the analysis of other famous and widely used hash functions SHA-1 and SHA-2 [4, 5, 6]. In response, the National Institute of Standards and Technology (NIST) announced a public competition for designing a new hash function which will be chosen as the hash function standard: Secure Hash Algorithm 3 (SHA3) [7]. JH [8] is the submission of Hongjun Wu to the NIST Hash Competition and it is one of the 14 second-round candidates. According to the designer, the hash
This work was sponsored by the Research Fund K.U.Leuven, by the IAP Programme P6/26 BCRYPT of the Belgian State (Belgian Science Policy) and by the European Commission through the ICT Programme under Contract ICT-2007-216676 (ECRYPT II). The information in this paper is provided as is, and no warranty is given or implied that the information is fit for any particular purpose. The user thereof uses the information at its sole risk and liability.
S. Hong and T. Iwata (Eds.): FSE 2010, LNCS 6147, pp. 286–303, 2010. c International Association for Cryptologic Research 2010
Rebound Attack on Reduced-Round Versions of JH
287
Table 1. Summary of results for JH (CFC = Compression Function Call) Function Rnds
Time Complexity
Memory Complexity
Collision Type
Sec.
Hash Comp.
16 19
2178.24 CFC 2156.77 CFC
2101.12 byte 2143.70 byte
§3.2 §4.2
Comp.
22
2156.56 CFC
2143.70 byte
semi-free-start collision semi-free-start near-collision (1008 bits) semi-free-start near-collision (768 bits)
§4.2
algorithm is very simple and efficient in both software and hardware. JH supports four different hash sizes (224, 256, 384 and 512-bit), and is among the fastest contestants. The rebound attack [9], a new technique for cryptanalysis of hash functions, has been introduced by Mendel et al. in FSE 2009. It is applicable to both block cipher based and permutation based hash constructions. Later, it has been improved by Mendel et al. [10] and Matusiewicz et al. [11]. In this work, we bring all these ideas together to analyze the JH hash function. First, we implement the rebound attack to the small scale variant of JH by choosing d = 4. Then, we adapt the method to the submitted version of JH (d = 8). In order to improve the complexity of the attack, we use three inbound phases rather than one, which provides us to use freedom more efficiently. The results can be seen in Table 1. This paper is organized as follows: In Sec. 2, we give a brief description of the JH hash function, its properties and an overview of the rebound attack. In Sec. 3, we first describe the main idea of our attack on small scale version of JH and then give the results on submitted version of JH. In Sec. 4, we follow the same outline for the improved version of the rebound attack. Finally, we conclude this paper and summarize our results in Sec. 5.
2
Preliminaries
2.1
Notation
Throughout this paper, we will use the following notation: word 4-bit mi,j the j th word of the ith round value d JH-X || ×
dimension of a block of bits i.e. a d-dimensional block of bits consists of 2d words the member of the family whose message digest is X bits concatenation operation cross-product: an operation on two arrays that results in another array whose elements are obtained by combining each element in the first array with every element in the second array
288
V. Rijmen, D. Toz, and K. Varıcı H M
(i−1)
(i)
Ed
M
(i)
H
(i)
Fig. 1. The compression function Fd
2.2
The JH Hash Function
The hash function JH is an iterative hash function that accepts message blocks of 512 bits and produces a hash value of 224, 256, 384 and 512 bits. The message is padded to be a multiple of 512 bits. The bit ‘1’ is appended to the end of the message, followed by 384 − 1 + (−l mod 512) zero bits. Finally, a 128-bit block is appended which is the length of the message, l, represented in big endian form. Note that this scheme guarantees that at least 512 additional bits are padded. In each iteration, the compression function Fd , given in Fig.1, is used to update the 2d+2 bits as follows: Hi = Fd (Hi−1 , Mi ) where Hi−1 is the previous chaining value and Mi is the current message block. The compression function Fd is defined as follows: d+1
Fd (Hi−1 , Mi ) = Ed (Hi−1 ⊕ (Mi ||02
d+1
)) ⊕ (02
||Mi )
Here, Ed is a permutation and is composed of an initial grouping of bits followed by 5(d − 1) rounds, plus an additional S-Box layer and a final degrouping of bits. The grouping operation arranges bits in a way that the input to each S-Box has two bits from the message and two bits from the chaining value. In each round, the input is divided into 2d words and then each word passes through an S-Box. JH uses two 4-bit-to-4-bit S-Boxes (S0 and S1) and every round constant bit selects which S-Boxes are used. Then two consecutive words pass through the linear transformation L, which is based on a [4, 2, 3] Maximum Distance Separable (MDS) code over GF (24 ). Finally all words are permuted by the permutation Pd . After the degrouping operation each bit returns to its original position. The round function for d = 4 is shown in Fig. 2 and d = 8 is the submitted version. For a more detailed information we refer to the specification of JH [8].
Rebound Attack on Reduced-Round Versions of JH
289
m i−1,0 m i−1,1 m i−1,2 m i−1,3 mi−1,4 m i−1,5 m i−1,6 m i−1,7 m i−1,8 m i−1,9 m i−1,10 m i−1,11 mi−1,12 m i−1,13 m i−1,14 m i−1,15 S
S L
S
S L
S
S L
S
S L
S
S L
S
S L
S
S L
S
S L
m i,0 m i,1 m i,2 m i,3 m i,4 m i,5 m i,6 m i,7 m i,8 m i,9 m i,10 m i,11 m i,12 m i,13 m i,14 m i,15 Fig. 2. Round Function for d = 4
The initial hash value H0 is set depending on the message digest size. The first two bytes of H−1 are set as the message digest size, and the rest of the bytes of H−1 are set as 0. Then, H0 = Fd (H−1 , 0). Finally, the message digest is generated by truncating HN where N is the number of blocks in the padded message, i.e, the last X bits of HN are given as the message digest of JH-X for X = 224, 256, 384, 512. 2.3
Properties of the Linear Transformation L
Since the linear transformation L implements a (4, 2, 3) MDS matrix, any difference in one of the words of the input (output) will result in a difference in two words of the output (input). For a fixed L transformation, if one tries all possible 216 pairs, the number of pairs satisfying the condition 2 → 1 or 1 → 2 is 3840, which gives a probability of 3840/65536 ≈ 2−4.09 . Note that, if the words are arranged in a way that they will be both active this probability increases to 3840/57600 ≈ 2−3.91 . For the latter case, if both words remain active (2 → 2), the probability is 49920/57600 ≈ 2−0.21 . 2.4
Observations on the Compression Function
The grouping of bits at the beginning of the compression function assures that the input of every first layer S-Box is xor-ed with two message bits. Similarly, the output of each S-Box is xor-ed with two message bits. Therefore, for a random non-zero 4-bit difference, the probability that this difference is related to a message is 3/15 ≈ 2−2.32 . The bit-slice implementation of Fd uses d − 1 different round function descriptions. The main difference between these round functions is the permutation function. In each round permutation, the odd bits are swapped by 2r mod (d − 1) where r is the round number. Therefore, for the same input passing through multiple rounds, the output is identical to the output of the original round function for the α · (d − 1)-th round where α is any integer. Three rounds of the bit-sliced representation can be seen in Fig. 3 between rounds 1 and 4.
2.5 The Rebound Attack
The rebound attack was introduced by Mendel et al. [9]. The two main steps of the attack are called the inbound phase and the outbound phase. In the inbound phase, the available freedom is used to connect the middle rounds by using the match-in-the-middle technique, and in the outbound phase the connected truncated differentials are calculated in both forward and backward directions. This attack was first used for the cryptanalysis of reduced versions of Whirlpool and Grøstl, and then extended to obtain distinguishers for the full Whirlpool compression function [12]. Later, linearized match-in-the-middle and start-from-the-middle techniques were introduced by Mendel et al. [10] to improve the rebound attack. Moreover, a sparse truncated differential path and state was recently used in the attack on LANE by Matusiewicz et al. [11], rather than an all-active state in the matching part of the attack. In our work, we first apply the start-from-the-middle technique with an all-active state, and then we improve our results by using three inbound phases with partially active states rather than a single all-active one. This allows us to decrease the complexity requirements of the attack.
3 The Start-From-The-Middle Attack on JH
We use the available freedom in the middle by a start-from-the-middle technique. We begin by guessing the middle values and then proceed forwards and backwards, using the filtering conditions to reduce the number of active S-Boxes in each round. In this section, we describe the steps of our attack on JH in detail. We first describe the attack on a smaller version of JH, i.e., d = 4, in detail, and then give the algorithm and analysis for d = 8.
3.1 Attack on 8 Rounds of JH for d = 4
Attack Procedure: The inbound phase of the attack described in this section is composed of 8 rounds, and the number of active S-Boxes in each round is: 1 ← 2 ← 4 ← 8 ← 16 → 8 → 4 → 2 → 1
The bit-slice implementation allows us to analyze the algorithm easily. The truncated differential path is given in Fig. 3. The attack can be summarized as follows:
• Step 1: Try all possible 2^{16} values for the middle values m_{4,j}||m_{4,j+1} and m'_{4,j}||m'_{4,j+1} in round 4 for each of the four sets (shown with colors and different shapes in Fig. 3), and keep only those that satisfy the desired pattern (2 ← 4 → 2). Therefore, the expected number of remaining pairs is 2^{16} · 2^{16} / [(2^{4.09})^2 · (2^{3.91})^2] = 2^{16} for each set.
• Step 2: Compute the cross-products of the corresponding sets (pairwise, as indicated in Fig. 3) and check whether the differences satisfy 2 → 1 when they pass through the inverse L transform, L^{-1}. For the pairs that satisfy the filtering condition, store
only the values in the active words and the middle values for each of the two sets. After this step, the number of pairs in each set is approximately 2^{16} · 2^{16} / (2^{3.91})^2 = 2^{24.18}.
• Step 3: Compute the cross-product of the two remaining sets and check whether the remaining 10 filtering conditions (marked in Fig. 3) are satisfied or not. This check can be done by calculating L ◦ S or S^{-1} ◦ L^{-1} for only the active words and does not require the use of the entire round function, hence it is very efficient. The total number of remaining pairs that pass the inbound phase is 2^{24.18} · 2^{24.18} / (2^{3.91})^{10} = 2^{9.26}.
Fig. 3. Inbound Phase of JH for d = 4 (bit-slice implementation)
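The expected pair counts in Steps 1–3 can be reproduced with a few lines of log2 bookkeeping, using the filtering probabilities of Sect. 2.3:

```python
# Bookkeeping (in log2) for the expected number of pairs in Steps 1-3 above.
f_all, f_act = 4.09, 3.91              # -log2 of the 2->1 filters (Sect. 2.3)

step1 = 16 + 16 - 2 * f_all - 2 * f_act    # pairs per set after Step 1
step2 = step1 + step1 - 2 * f_act          # pairs per set after Step 2
step3 = step2 + step2 - 10 * f_act         # pairs passing the inbound phase

print(round(step1, 2))   # 16.0
print(round(step2, 2))   # 24.18
print(round(step3, 2))   # 9.26
```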
Note that, due to the symmetry, the actual number of remaining pairs is 2^{8.26}, and the duplication could be avoided in the earlier steps of the algorithm, but for simplicity the attack is described as above. The attack algorithm only stores the middle values for the pairs that follow the desired differential path; the values in the n-th round can be computed by calling the round (or inverse round) function. The active S-Boxes in the input and output of the compression function must satisfy the desired property, so out of 2^{8.26} pairs only 2^{8.26} · (2^{-2.32})^2 = 2^{4.32} of them remain. In order to obtain a collision, the differences in both S-Boxes also need to match, which happens with a probability of 1/3. Therefore, for 2^{4.32} · 1/3 ≈ 2^{2.74} pairs we obtain a semi-free-start collision for the hash function.
Complexity of the Attack: The inbound phase is the part of the attack where most of the calculations are done. Let U = L ◦ S and U^{-1} = S^{-1} ◦ L^{-1}. Then, a round function consists of 8 U-functions; similarly, an inverse round function consists of 8 U^{-1}-functions.
For each of the four sets in Step 1 of the algorithm, we try all possible 2^{16} pairs and apply the filtering conditions. Although we have 2 U-functions in the forward direction and 2 U^{-1}-functions in the backward direction, we only check a condition if the previous one is satisfied, so the total number of calls n1 is:

n1 = 2^{32} + 2^{32}/2^{4.09} + 2^{32}/(2^{4.09})^2 + 2^{32}/[(2^{4.09})^2 · 2^{3.91}] = 2^{32.09},

which is approximately 2^{29.1} calls to the round function (2^{32.09} · 1/8 = 2^{29.09}). This step can be done with 2^{32} table look-ups if precomputation is used. For each of the two sets in Step 2 of the algorithm, since L is a linear transformation, it is sufficient to check whether the differences satisfy the desired property, i.e., L^{-1}(Δc, Δd) = (Δa, 0) or (0, Δb). The total number of calls in this step is:

n2 = (2^{16})^2 · (1 + 2^{-3.91}) = 2^{32.09}.

In the final step of the inbound phase, 3 of the 10 conditions to be checked are again linear transformations and the remaining 7 require the use of the U-function. The complexity of the attack is dominated by this step, and the total number of operations performed is:

n3,bck = (2^{24.18})^2 · [1 + 2^{-3.91} + (2^{-3.91})^2] ≈ 2^{48.46},
n3,fwd = (2^{24.18})^2 · (2^{-3.91})^3 · Σ_{i=0}^{6} (2^{-3.91})^i ≈ 2^{36.72},
where fwd and bck denote the forward and backward directions, respectively. In the outbound phase, for each of the 2^{8.26} remaining pairs, starting from the middle values, we call the round and inverse round functions to obtain the input and output values.
Results: The above algorithm has been implemented in C for d = 4, and we observed that the results are compatible with the precomputed values. Thus, we obtain a semi-free-start collision for 8-round JH-16. An example is given in Table 5.
3.2 The Attack on 16 Rounds of JH with d = 8
In this section, we first present an outline of the start-from-the-middle attack on a reduced-round version of JH for all hash sizes, and then give the calculations for the complexity analysis of the attack.
Attack Procedure: For the compression function E8, the attack is composed of 16 rounds and the number of active S-Boxes is:

1 ← 2 ← 4 ← 8 ← 16 ← 32 ← 64 ← 128 ← 256 → 128 → 64 → 32 → 16 → 8 → 4 → 2 → 1
Table 2. Overview of the inbound phase of the attack on 16 rounds of JH (d = 8)

Step  Size (bits)  Sets  Filtering Conditions  Pairs Remain  Time Complexity  Direction
 0        8        128            1             2^{11.75}      2^{15.84}      fwd
 1       16         64            1             2^{19.59}      2^{23.50}      bck
 2       32         32            4             2^{23.54}      2^{39.18}      fwd
 3       64         16            4             2^{31.44}      2^{47.08}      fwd
 4      128          8            4             2^{47.24}      2^{62.88}      fwd
 5      256          4            8             2^{63.20}      2^{94.48}      fwd
 6      512          2            8             2^{95.12}      2^{124.40}     fwd
 7     1024          1           46             2^{10.38}      2^{190.24}     fwd + bck
The algorithm is similar to the one for E4. We again start from the middle and then propagate outwards by computing the cross-products of the sets and using the filtering conditions. However, instead of trying all 2^{16} possible pairs, we start with 2^{7.92} values for each middle value. The number of sets, the bit length of the middle values (size) of each set, and the number of filtering conditions, followed by the number of pairs in each set, are given in Table 2. Similarly, we only store the values in the active bytes for the outermost rounds and the middle round for each set, i.e., no other intermediate value is stored.
Complexity of the Attack: The time complexity of the attack for d = 8 is calculated in a manner similar to that of d = 4. Instead of giving all equations explicitly, we summarize the results in terms of function calls and their direction for each step in Table 2. The time complexity of the given attack is 2^{190.24} U-function calls (equivalent to 2^{190.24} · 2^{-7} · (1/16) = 2^{179.24} 16-round compression function calls).
Results: We may lose up to half of the remaining pairs due to symmetry. In addition, similar to the case d = 4, the active S-Boxes in the input and output of the compression function should correspond only to the message bits and then match each other in order to obtain a collision. Therefore, out of 2^{10.38} pairs only 2^{10.38} · 1/2 · (2^{-2.32})^2 · 1/3 ≈ 2^{3.15} pairs remain. Suppose that we intend to attack a block Mi with i < N. Since we obtain a zero difference in the chaining value, it is guaranteed that the outputs of the compression function will be the same, provided that both messages have the same length. As mentioned earlier, the same compression function is used for all hash sizes, and the message digest is generated by truncating HN, where N is the number of blocks in the padded message. Therefore, we have a semi-free-start collision for all hash sizes of 16-round JH.
4 The Rebound Attack on JH Compression Function
The attack on the hash function given in Sec. 3.2 can easily be converted to an attack on 19 rounds of the compression function for the pairs that satisfy the inbound phase by using the following differential trails in the outbound phases:
2 ← 1 ← Inbound Phase → 1 → 2 → 4

The complexity of the attack remains the same (i.e., 2^{190.24} U-function calls) and we obtain a semi-free-start near-collision for 1008 bits. In this section, we improve these results by using three inbound phases. Once again, we first describe the steps of our attack for d = 4 in detail, and then give the algorithm and complexity analysis for d = 8.
4.1 The Improved Rebound Attack on JH with d = 4
The inbound phase of the attack described in this section is composed of 8 rounds, using the following trail:

2 ← 4 → 2 ← 4 ← 8 → 4 ← 8 → 4 → 2    (1)

It is perhaps interesting to make some observations on the number of active S-Boxes in the trail. Similar to the AES, the linear diffusion layer of JH imposes a lower bound on the number of active S-Boxes: if d ≥ 2, then there are at least 3^2 = 9 active S-Boxes in every sequence of 4 rounds. The conjectured bound on the number of active S-Boxes over 2d + 1 rounds [8], as well as the trail (1), demonstrate that the higher dimension of the JH diffusion layer allows for relatively long and narrow trails. We decompose the inbound phase into a sequence of three smaller inbound phases [12], which cover 3, 2 and 3 rounds, respectively. The number of active S-Boxes in each of the steps is:

2 ← 4 → 2
2 ← 4 ← 8 → 4
4 ← 8 → 4 → 2

We use the bit-sliced representation to analyze the algorithm. We first calculate the results of the first and the third inbound phases, and then match them with the second inbound phase. The truncated differential path is given in Fig. 4. The attack can be summarized as follows:
First Inbound Phase:
• Try all possible 2^8 values for each of the middle values m_{1,j}||m_{1,j+1} and m'_{1,j}||m'_{1,j+1} in round 1 for each of the two sets, and keep only those which satisfy the desired pattern. Therefore, the expected number of remaining pairs is 2^8 · 2^8 / 2^{4.09} = 2^{11.91} for each set.
• Compute the cross-product of the two sets and check whether the differences satisfy 2 → 1 when they pass L^{-1}. For the pairs that satisfy the filtering condition, store only the values in the active words and the middle values. After this step, the number of pairs is approximately 2^{11.91} · 2^{11.91} / (2^{3.91})^2 = 2^{16}.
• Check whether the remaining pairs satisfy the desired input difference, and store these values in list L1. Therefore, the size of L1 is 2^{16} · (2^{-2.32})^2 = 2^{11.36}.
Fig. 4. Inbound Phase of JH (d = 4)
Second Inbound Phase:
• Try all possible 2^8 values for each of the middle values m_{4,j}||m_{4,j+1} and m'_{4,j}||m'_{4,j+1} in round 4 for each of the four sets, and keep only those that satisfy the desired pattern. The expected number of remaining pairs is again 2^8 · 2^8 / 2^{4.09} = 2^{11.91} for each set.
• Compute the cross-products of the sets having the same pattern and check whether the differences satisfy 2 → 1 when they pass L^{-1}. For the pairs that satisfy the filtering condition, store the values for each of the two sets. The expected number of pairs in each set is approximately (2^{11.91})^2 / (2^{3.91})^2 = 2^{16}.
• Compute the cross-product of the two sets and check whether the differences satisfy the filtering condition when they pass L^{-1}. The expected number of pairs that pass the second inbound phase is (2^{16})^2 / (2^{3.91})^2 = 2^{24.18}.
Third Inbound Phase:
• Try all possible 2^8 values for each of the middle values m_{6,j}||m_{6,j+1} and m'_{6,j}||m'_{6,j+1} in round 6 for each of the four sets, and keep only those that satisfy the desired pattern. The expected number of remaining pairs is again 2^{11.91} for each set.
• Compute the cross-products of the sets having the same pattern and check whether the differences satisfy 2 → 1 when they pass the inverse L transform. For the pairs that satisfy the filtering condition, store the values for each of the two sets. The expected number of pairs in each set is approximately (2^{11.91})^2 / (2^{3.91})^2 = 2^{16}.
• Compute the cross-product of the two sets to check the final filtering conditions in round 7. The expected number of pairs that pass the third phase is (2^{16})^2 / (2^{3.91})^2 = 2^{24.18}. Store these values in list L3.
Merging Inbound Phases: The three previous inbound phases overlap in the 2 and 4 active words (denoted in black) in rounds 2 and 5, respectively. Since we have to match these active words in both values, we get a condition on 16 + 32 = 48 bits in total. These conditions are checked as soon as we have a remaining pair from the second inbound phase, by using the lists L1 and L3. As a result, we expect to find (2^{11.36} · 2^{24.18} · 2^{-16}) · 2^{24.18} · 2^{-32} · 2^{-2} ≈ 2^{9.72} solutions for the inbound phase. The last 2^{-2} factor in the calculation follows from symmetry.
Outbound Phase: For the pairs that satisfy the inbound phase, we expect to see the following differential trail in the outbound phase:

2 ← Inbound Phase → 2 → 4 → 2

Therefore, for the compression function E4, we have the 10-round differential path shown in Fig. 5. Note that there are two filtering conditions in the last round of the outbound phase. Thus, out of 2^{9.72} solutions, only 2^{9.72} · (2^{-3.91})^2 ≈ 2^{1.9} pass the last round. After the degrouping operation, the message is XORed into the rightmost 32 bits of the output, and for the compression function of JH with d = 4 we have a near-collision for 52 bits.
Complexity of the Attack: The time complexity of the attack is determined by the first and third inbound phases, which cost about 2^{32.09} each; hence the total time complexity is 2^{32.09} + 2^{32.09} = 2^{33.09} U-function calls, equivalent to 2^{33.09} · 2^{-3} · (1/10) = 2^{26.77} compression function calls. The memory complexity is determined by the third inbound phase, which is 2^{24.18}.
Fig. 5. Outbound Phase of JH (d = 4)
Results: We obtain a 52-bit free-start near-collision for 10 rounds of the JH compression function. These results are still not better than the theoretical bounds for JH with d = 4, but they allow us to implement the attack and to extend it to the submitted version of JH.
4.2 The Improved Rebound Attack on JH with d = 8
In this section, the attack in Sec. 4.1 on JH with d = 4 is extended to JH with d = 8, using more rounds (hence the larger number of steps and the increased complexity) for each of the inbound and outbound phases. The attack is applicable to 19 rounds of the compression function. We first explain the attack in detail and then give the calculations for the complexity analysis.
Inbound Phase: For the compression function E8, the inbound phase of the attack covers 16 rounds and is composed of two parts. In the first part, we apply the start-from-the-middle technique three times, for rounds 0−3, 3−10 and 10−16. In the second part, we connect the resulting active bytes (hence the corresponding state values) by a match-in-the-middle step. The number of active S-Boxes in each of the sets is:

2 ← 4 ← 8 → 4
4 ← 8 ← 16 ← 32 ← 64 ← 128 → 64 → 32
32 ← 64 → 32 → 16 → 8 → 4 → 2

For a detailed sketch of the inbound phase we refer to Fig. 7. The algorithm for each set is similar to the one for E4. We again start from the middle and then propagate outwards by computing the cross-products of the sets and using the filtering conditions. For each list, we try all possible 2^{16} pairs in Step 0. The number of sets, the bit length of the middle values (size) of each list, and the number of filtering conditions, followed by the number of pairs in each set, are given in Table 3.
Merging Inbound Phases: Connecting the three lists is performed as follows: whenever a pair is obtained from the second inbound phase, we check whether it exists in L1 or not. If it does, another check is done against L3. We have 2^{19.54} and 2^{138.70} elements in lists 1 and 3 respectively, 2^{152.28} pairs passing the second inbound phase, and 32-bit and 256-bit conditions for the matches. The total expected number of remaining pairs is therefore (2^{19.54} · 2^{152.28} · 2^{-32}) · 2^{138.70} · 2^{-256} · 2^{-2} = 2^{20.52}. We generate more pairs than in the previous attack to compensate for the additional filtering conditions in the outbound phase, in order to obtain a near-collision (as explained in the following part) for 19 rounds of the compression function.
Outbound Phase: The outbound phase of the attack is composed of 3 rounds in the forward direction. For the pairs that satisfy the inbound phase, we expect to see the following differential trail in the outbound phase:

Inbound Phase → 2 → 4 → 8 → 4
Table 3. Overview of inbound phases of the attack on 19 rounds of JH

            Step  Size  Sets  Filtering Conditions  Pairs Remain   Complexity Backwards  Complexity Forwards
Inbound 1     0     8     4          1              2^{11.91}       −                     2^{16}
              1    16     2          2              2^{16}          2^{23.91}             −
              2    32     1          2              2^{24.18}       2^{32.09}             −
              3    32     1          2 (a)          2^{19.54}       −                     −
Inbound 2     0     8    64          1              2^{11.91}       −                     2^{16}
              1    16    32          2              2^{16}          2^{23.91}             −
              2    32    16          2              2^{24.18}       −                     2^{32.09}
              3    64     8          4              2^{32.72}       2^{48.46}             −
              4   128     4          4              2^{49.80}       2^{65.54}             −
              5   256     2          4              2^{83.96}       2^{99.70}             −
              6   512     1          4              2^{152.28}      2^{168.02}            −
Inbound 3     0     8    32          1              2^{11.91}       −                     2^{16}
              1    16    16          2              2^{16}          2^{23.91}             −
              2    32     8          2              2^{24.18}       −                     2^{32.09}
              3    64     4          2              2^{40.54}       −                     2^{48.45}
              4   128     2          2              2^{73.26}       −                     2^{81.17}
              5   256     2          2              2^{138.70}      −                     2^{146.61}

(a) Check whether the pairs satisfy the desired input difference
Note that in the last step of the outbound phase we have four filtering conditions. We had 2^{20.52} remaining pairs from the inbound phase; thus, we expect 2^{20.52} · (2^{-3.91})^4 = 2^{4.88} pairs to satisfy the above path. A detailed schema of this trail is shown in Fig. 7. The final step of the compression function is XORing the message bits, after the degrouping operation, into the output of the compression function. We have 4 active words in the output and a 4-bit difference in the message, two of which collide in bit positions 512 and 768. Thus, it is possible to cancel them with a probability of 2^{-2}, and the number of pairs reduces to 2^{2.88}. To sum up, we have a difference in (4 · 4 − 2) + 2 = 16 bits in total.
Complexity of the Attack: For the inbound phase, the complexity of the attack for d = 8 is calculated in a similar manner to that of d = 4. The results for preparing the lists are summarized for each step in Table 3. The time complexity of the attack is dominated by the second set, L2, which requires about 2^{168.02} U-function calls (equivalent to 2^{168.02} · 2^{-7} · (1/19) = 2^{156.77} 19-round compression function calls). The memory requirements are determined by the largest list, which is L3 with a size of 2^{138.70} 256-bit entries.
Results: Note that in this attack the complexity requirements are reduced significantly compared to the initial idea that uses only one inbound phase. For 19 rounds of the JH compression function, we obtain a semi-free-start near-collision
Table 4. Complexity of the generic attack for near-collisions

#Rounds  #Bits Near-Collision  Generic Attack Complexity  Our Results
  19            1008                 2^{454.21}            2^{156.77}
  20             992                 2^{411.18}            2^{156.70}
  21             960                 2^{341.45}            2^{156.63}
  22             896                 2^{236.06}            2^{156.56}
  23             768                 2^{99.18}             2^{156.50}
for 1008 bits. We can simply increase the number of rounds by proceeding forwards in the outbound phase. Our attack still works in this case with the same complexity (U -function calls). The number of bits for near-collision and the generic attack complexities are given in Table 4. As a result, our attack is better than generic attacks up to 22 rounds.
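One plausible way to reproduce the generic complexities of Table 4 (our assumption about how they were derived, not stated in the text) is a birthday bound of sqrt(2^{1024} / C(1024, d)), where d = 1024 − x and x is the number of colliding bits:

```python
# Sketch reproducing the "Generic Attack Complexity" column of Table 4 under
# the assumption that the generic attack is a birthday search in which a pair
# of 1024-bit outputs differing in exactly d positions counts as a hit.
from math import comb, log2

for rounds, nc_bits in [(19, 1008), (20, 992), (21, 960), (22, 896), (23, 768)]:
    d = 1024 - nc_bits
    print(rounds, nc_bits, round((1024 - log2(comb(1024, d))) / 2, 2))
# The printed exponents match Table 4 up to rounding.
```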
5 Conclusion
In this paper, we presented the first cryptanalysis results on JH using rebound attack techniques. We first explained our attack on 8 rounds of JH (d = 4) in detail and then showed how this attack can be used to attack 16 rounds of the JH hash function. We further improved our findings by using three inbound phases instead of one and obtained a 1008-bit semi-free-start near-collision for 19 rounds of the JH compression function. The required memory for the attack is 2^{143.70} bytes of data. The time complexity is reduced to 2^{156.77} compression function calls, and the attack requires 2^{152.28} memory accesses. We also presented semi-free-start near-collision results for 20–22 rounds of the JH compression function with the same memory requirement and time complexity. Our findings are summarized in Table 1.
Acknowledgements. The authors would like to thank Hongjun Wu for discussing our results and for his remarks and corrections. We would also like to thank the anonymous reviewers of FSE 2010 for their comments, and Meltem Sönmez Turan and Onur Özen for reviewing previous versions of the paper.
References
1. Wang, X., Lai, X., Feng, D., Chen, H., Yu, X.: Cryptanalysis of the Hash Functions MD4 and RIPEMD. In: [13], pp. 1–18
2. Wang, X., Yu, H.: How to Break MD5 and Other Hash Functions. In: [13], pp. 19–35
3. Wang, X., Yin, Y.L., Yu, H.: Finding Collisions in the Full SHA-1. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 17–36. Springer, Heidelberg (2005)
4. De Cannière, C., Rechberger, C.: Finding SHA-1 Characteristics: General Results and Applications. In: Lai, X., Chen, K. (eds.) ASIACRYPT 2006. LNCS, vol. 4284, pp. 1–20. Springer, Heidelberg (2006)
5. De Cannière, C., Mendel, F., Rechberger, C.: Collisions for 70-Step SHA-1: On the Full Cost of Collision Search. In: Adams, C., Miri, A., Wiener, M. (eds.) SAC 2007. LNCS, vol. 4876, pp. 56–73. Springer, Heidelberg (2007)
6. Stevens, M., Sotirov, A., Appelbaum, J., Lenstra, A.K., Molnar, D., Osvik, D.A., de Weger, B.: Short Chosen-Prefix Collisions for MD5 and the Creation of a Rogue CA Certificate. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 55–69. Springer, Heidelberg (2009)
7. NIST: Cryptographic Hash Competition, http://www.nist.gov/hash-competition
8. Wu, H.: The Hash Function JH. Submission to NIST (2008), http://icsd.i2r.a-star.edu.sg/staff/hongjun/jh/jh_round2.pdf
9. Mendel, F., Rechberger, C., Schläffer, M., Thomsen, S.S.: The Rebound Attack: Cryptanalysis of Reduced Whirlpool and Grøstl. In: Dunkelman, O. (ed.) FSE 2009. LNCS, vol. 5665, pp. 260–276. Springer, Heidelberg (2009)
10. Mendel, F., Peyrin, T., Rechberger, C., Schläffer, M.: Improved Cryptanalysis of the Reduced Grøstl Compression Function, ECHO Permutation and AES Block Cipher. In: Jacobson Jr., M.J., Rijmen, V., Safavi-Naini, R. (eds.) SAC 2009. LNCS, vol. 5867, pp. 16–35. Springer, Heidelberg (2009)
11. Matusiewicz, K., Naya-Plasencia, M., Nikolic, I., Sasaki, Y., Schläffer, M.: Rebound Attack on the Full Lane Compression Function. In: [14], pp. 106–125
12. Lamberger, M., Mendel, F., Rechberger, C., Rijmen, V., Schläffer, M.: Rebound Distinguishers: Results on the Full Whirlpool Compression Function. In: [14], pp. 126–143
13. Cramer, R. (ed.): EUROCRYPT 2005. LNCS, vol. 3494. Springer, Heidelberg (2005)
14. Matsui, M. (ed.): ASIACRYPT 2009. LNCS, vol. 5912. Springer, Heidelberg (2009)
A Sample Data
We performed several experiments for different S-Box setups, but the implementation results shown below are obtained by using the same S-Box (S0) in each round for simplicity.
Table 5. Example for the rebound attack with one inbound phase (d = 4)

Bit-Slice Results
Round  P1                P2                Difference
  0    6ddf8804acec67ef  addf8804acec67ef  c000000000000000
  1    53d4792d4304231c  03d4792d4104231c  5000000002000000
  2    c7b01dd79c2227e3  e7ba1dd7cc2927e3  200a0000500b0000
  3    55c5a1f5a7f8a0bc  3595acfe6708a9b2  60500d0bc0f0090e
  4    2b3ead8b7712b433  a9f04e08299bd357  82cee3835e896764
  5    bd2e0491b4824e41  d62ef09101827441  6b00f400b5003a00
  6    7c47294041d9f1c6  46472940b0d9f1c6  3a000000f1000000
  7    89100fddca5632e0  a5100fddca5632e0  2c00000000000000
  8    29184f8fc8fc9af1  19184f8fc8fc9af1  3000000000000000

Reference Results
Round  P1                P2                Difference
  0    6dacdfec886704ef  adacdfec886704ef  c000000000000000
  1    53d4792d4304231c  03d4792d4104231c  5000000002000000
  2    c71d9c27b0d722e3  e71dcc27bad729e3  200050000a000b00
  3    55a7c5f8a1a0f5bc  35679508aca9feb2  60c050f00d090b0e
  4    2b3ead8b7712b433  a9f04e08299bd357  82cee3835e896764
  5    bd04b44e2e918241  d6f001742e918241  6bf4b53a00000000
  6    7c4147d929f140c6  46b047d929f140c6  3af1000000000000
  7    89100fddca5632e0  a5100fddca5632e0  2c00000000000000
  8    294fc89a188ffcf1  194fc89a188ffcf1  3000000000000000
Fig. 6. Differential characteristic for 16 rounds of JH Hash Function (bit-slice representation)
Fig. 7. Inbound and Outbound Phases of JH compression function (bit-slice representation)
Pseudo-cryptanalysis of the Original Blue Midnight Wish

Søren S. Thomsen

DTU Mathematics, Technical University of Denmark
[email protected]
Abstract. The hash function Blue Midnight Wish (BMW) is a candidate in the SHA-3 competition organized by the U.S. National Institute of Standards and Technology (NIST). BMW was selected for the second round of the competition, but the algorithm was tweaked in a number of ways. In this paper we describe cryptanalysis of the original version of BMW, as submitted to the SHA-3 competition in October 2008. The attacks described are (near-)collision, preimage and second preimage attacks on the BMW compression function. These attacks can also be described as pseudo-attacks on the full hash function, i.e., as attacks in which the adversary is allowed to choose the initial value of the hash function. The complexities of the attacks are about 2^{14} for the near-collision attack, about 2^{3n/8+1} for the pseudo-collision attack, and about 2^{3n/4+1} for the pseudo-(second) preimage attack, where n is the output length of the hash function. Memory requirements are negligible. Moreover, the attacks are not (or only moderately) affected by the choice of security parameter for BMW.
Keywords: hash function cryptanalysis, SHA-3 competition, Blue Midnight Wish, pseudo-attacks.
1 Introduction
On October 31, 2008, the “SHA-3 competition”, organized by the National Institute of Standards and Technology (NIST), was launched [1]. 64 algorithms were submitted, and 51 of these were accepted for the first round of the competition. On July 24, 2009, 14 candidates were chosen by NIST to advance to the second round of the competition. One of the candidates that made it to the second round is called Blue Midnight Wish [2], or BMW for short. BMW was tweaked for the second round of the competition. Throughout this paper, unless explicitly stated otherwise, we always refer to the original version of the hash function, i.e., the version submitted to the competition before the October 31, 2008 deadline. In this paper we describe some weaknesses in BMW. We show how to easily find near-collisions in the compression function of BMW. By near-collisions we
The author is supported by a grant from the Villum Kann Rasmussen Foundation.
mean a pair of inputs to the compression function for which the outputs only differ in a few (pre-specified) bit positions. We also show how to find collisions, preimages, and second preimages in the compression function, faster than what is possible for an ideal compression function. This can be done by controlling 128, respectively 256, bits of the output of the compression function of BMW-256, respectively BMW-512. By controlling we mean that the adversary can give these bits any value he wishes with negligible effort. The complexity of these attacks corresponds to the complexity for a 192-bit, respectively 384-bit, ideal hash function in the case of BMW-256, respectively BMW-512. Hence, for instance, pseudo-collisions can be found in BMW-512 in time about 2^{192}, which is to be compared to the expected 2^{256} for an ideal 512-bit hash function. Memory requirements of all attacks are negligible. We point out that the attacks described in this paper do not seem to apply to the tweaked version of Blue Midnight Wish. In Section 5, we briefly describe the tweaks and make some preliminary comments on these.
2 A Description of Blue Midnight Wish
Blue Midnight Wish is in fact a collection of four hash functions returning digests of four different sizes: 224 bits, 256 bits, 384 bits, and 512 bits. The two shortest digests are computed in the same way, except in the final step, which is a truncation; likewise for the two longest digests. The word size, denoted by w, is 32 bits for the short variants and 64 bits for the long variants. Apart from the word size, all four variants are very similar. A little-endian byte ordering is assumed. Blue Midnight Wish applies only four different types of operations: additions modulo 2^w, exclusive-ors (XORs), and bitwise shifts and rotations. In the following, all additions of words in the description of Blue Midnight Wish are to be taken modulo 2^w. Blue Midnight Wish maintains a state of 16 words during the processing of a message; only in the end are the 16 words truncated down to 6, 7, or 8, depending on the digest size (truncation is done by throwing away the first 10, 9, or 8 words, respectively). Message blocks are also 16 words in length, and Blue Midnight Wish operates with a compression function mapping 2 × 16 words to 16 words. The compression function is iterated in a standard fashion [3,4]. Hence, the message m of bit length μ = |m| must be padded to a length that is a multiple of 16w bits, which is done by first appending a ‘1’ bit, then appending z = (−μ − 65) mod 16w ‘0’ bits (this part of the padding will be called “10. . . ” padding in the following), and finally appending a 64-bit representation of the message length μ (length padding). We now turn to a description of the Blue Midnight Wish compression function.
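Before doing so, here is a minimal sketch of the padding computation just described; the function name is illustrative.

```python
# Padding lengths for BMW; mu is the message length in bits, w the word size.
def bmw_pad_lengths(mu, w=32):
    z = (-mu - 65) % (16 * w)        # number of '0' bits after the single '1' bit
    total = 1 + z + 64               # '1' bit + z '0' bits + 64-bit length field
    assert (mu + total) % (16 * w) == 0
    return z, total

print(bmw_pad_lengths(447))          # (0, 65): a 447-bit message fills one block
```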
2.1 The Blue Midnight Wish Compression Function
The Blue Midnight Wish compression function takes two 16-word inputs and returns a single 16-word output. It applies three different sub-functions, which
are called P, f1, and f2. P is an efficiently invertible permutation¹. f1 is a so-called multi-permutation taking two inputs, meaning that by fixing one of the inputs, the function is a permutation on the other input. Finally, f2 compresses three inputs of 16 words to a single 16-word output, which is also the output of the compression function. The two 16-word inputs to the compression function will be called H and M, H being the chaining variable and M being the message block. A single word of one of the inputs will be denoted by Hi or Mi, meaning word number i, where counting starts from 0. Hence, (e.g.) M = M0 M1 · · · M15. The input to the permutation P is H ⊕ M. The output of P is denoted Y = Y0 Y1 · · · Y15. The inputs to f1 are Y and M. The output of f1 is denoted Z = Z0 Z1 · · · Z15. The inputs to f2 are Y, Z, and M, and the output, which is also the output of the compression function, is denoted H∗. See also Fig. 1.
Fig. 1. The Blue Midnight Wish compression function
Given Y and M , a matching H can easily be computed as P −1 (Y ) ⊕ M , since P is efficiently invertible. Moreover, H is not used as input to any other subfunction than P . Hence, in attacks on the compression function, the details of P are irrelevant, and therefore we do not describe these in this paper; see instead [2]. We now describe the two other components of the compression function. A Description of f1 . As mentioned, the inputs to f1 are M and Y , and the output is denoted Z. Let Q = Y Z be a 32-word vector, and note that when f1 is called, Q contains 16 already computed words, and 16 “null” words. Then, f1 can be described as a shift register, that computes one word of Q at the time as a function of the previous 16 words of Q. There are two variants of the step function that computes each new word of Q: a simple step function, and a more complex one. Since 16 words are computed in f1 , there are always 16 rounds, but the number of complex and simple rounds 1
In the Blue Midnight Wish specification, a mapping f0 : {0, 1}16w × {0, 1}16w → {0, 1}16w is defined. Since f0 is a permutation on the XOR of its two inputs, we choose to focus on this permutation and denote it P , i.e., f0 (H, M ) = P (H ⊕ M ).
depends on a tunable security parameter. By default, there are first two complex rounds and then 14 simple rounds, and we shall generally assume that this is the distribution of complex and simple rounds. However, all our attacks apply to BMW using any value of the security parameter (in the case of the near-collision attack, some modifications are required). Both complex and simple rounds use a number of invertible sub-functions s0, . . . , s5 and r1, . . . , r7, whose descriptions are postponed to Appendix A. Both types of rounds also use the same “message schedule”: consider M to be a 16-element column vector in Z_{2^w}, and define the matrix B as the circulant matrix whose first row contains the 16 elements [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, −1, 0, 0, 0, 0, 0]. Row i + 1 of B is row i cyclically shifted one position to the right. Note that B is invertible for both word sizes; the inverses can be found in Appendix B. Define W = B · M mod 2^w, with Wi referring to the ith word of W. We note that this means that Wi = Mi + Mi+3 − Mi+10 mod 2^w, where the indices are to be reduced modulo 16. In round i of f1, Wi is involved in the computation of Zi = Qi+16. Furthermore, 16 constants Ki are defined as 2^w/3 · i. We can now describe a complex round as

Qi+16 ← s1(Qi) + s2(Qi+1) + s3(Qi+2) + s0(Qi+3)
       + s1(Qi+4) + s2(Qi+5) + s3(Qi+6) + s0(Qi+7)
       + s1(Qi+8) + s2(Qi+9) + s3(Qi+10) + s0(Qi+11)
       + s1(Qi+12) + s2(Qi+13) + s3(Qi+14) + s0(Qi+15)
       + Wi + Ki,

for increasing i; for the default choice of the tunable security parameter, i increases from 0 to 1. A simple round, covering the range of remaining values of i up to and including 15, is described as

Qi+16 ← Qi + r1(Qi+1) + Qi+2 + r2(Qi+3)
       + Qi+4 + r3(Qi+5) + Qi+6 + r4(Qi+7)
       + Qi+8 + r5(Qi+9) + Qi+10 + r6(Qi+11)
       + Qi+12 + r7(Qi+13) + s5(Qi+14) + s4(Qi+15)
       + Wi + Ki.

Note that given M and 16 consecutive words of Q, the remaining 16 words of Q can be computed; in particular, given M and Z, Y can be computed. Likewise, given Y and Z (i.e., all of Q), M can be computed via W as M = B^{-1} · W mod 2^w.
A Description of f2. The sub-function f2 takes as input M, Y, and Z. Let

XL = Z0 ⊕ Z1 ⊕ · · · ⊕ Z7   and   XH = Z0 ⊕ Z1 ⊕ · · · ⊕ Z15.
The output words H0∗, . . . , H15∗ are computed as follows.

H0∗  = (XH<<5  ⊕ Z0>>5  ⊕ M0)  + (XL ⊕ Z8  ⊕ Y0)
H1∗  = (XH>>7  ⊕ Z1<<8  ⊕ M1)  + (XL ⊕ Z9  ⊕ Y1)
H2∗  = (XH>>5  ⊕ Z2<<5  ⊕ M2)  + (XL ⊕ Z10 ⊕ Y2)
H3∗  = (XH>>1  ⊕ Z3<<5  ⊕ M3)  + (XL ⊕ Z11 ⊕ Y3)
H4∗  = (XH>>3  ⊕ Z4     ⊕ M4)  + (XL ⊕ Z12 ⊕ Y4)
H5∗  = (XH<<6  ⊕ Z5>>6  ⊕ M5)  + (XL ⊕ Z13 ⊕ Y5)
H6∗  = (XH>>4  ⊕ Z6<<6  ⊕ M6)  + (XL ⊕ Z14 ⊕ Y6)
H7∗  = (XH>>11 ⊕ Z7<<2  ⊕ M7)  + (XL ⊕ Z15 ⊕ Y7)
H8∗  = (H4∗)≪9  + (XH ⊕ Z8  ⊕ M8)  + (XL<<8 ⊕ Z7 ⊕ Y8)
H9∗  = (H5∗)≪10 + (XH ⊕ Z9  ⊕ M9)  + (XL>>6 ⊕ Z0 ⊕ Y9)
H10∗ = (H6∗)≪11 + (XH ⊕ Z10 ⊕ M10) + (XL<<6 ⊕ Z1 ⊕ Y10)
H11∗ = (H7∗)≪12 + (XH ⊕ Z11 ⊕ M11) + (XL<<4 ⊕ Z2 ⊕ Y11)
H12∗ = (H0∗)≪13 + (XH ⊕ Z12 ⊕ M12) + (XL>>3 ⊕ Z3 ⊕ Y12)
H13∗ = (H1∗)≪14 + (XH ⊕ Z13 ⊕ M13) + (XL>>4 ⊕ Z4 ⊕ Y13)
H14∗ = (H2∗)≪15 + (XH ⊕ Z14 ⊕ M14) + (XL>>7 ⊕ Z5 ⊕ Y14)
H15∗ = (H3∗)≪16 + (XH ⊕ Z15 ⊕ M15) + (XL>>2 ⊕ Z6 ⊕ Y15)
Here, x<<s and x>>s mean x shifted left, respectively right, by s bit positions. Similarly, x≪s means x rotated left by s bit positions.
3 Near-Collisions in the Compression Function
Attacks on the compression function of Blue Midnight Wish are not affected by the permutation P , since this permutation can be inverted, and thereby the chaining input can be computed. One may also observe that by choosing the same (XOR) differences in H and M , there is no input difference in P , and therefore also no output difference. By ensuring that only the last few words of the expanded message W contain a difference, we see that no difference is involved in a large part of f1 . Combined with the fact that diffusion is not very effective in f2 , this observation leads to near-collisions in the compression function. Hence, a strategy to find the best (lowest weight) near-collision in the compression function is to search for difference patterns of the last few words of W , such that these differences do not spread too much in the last few rounds of f1 and in f2 . Note that the inverse message schedule must be applied to W in order to be able to compute f2 , and this message schedule will cause some
diffusion of differences in W; however, differences in the most significant bits (MSBs) will remain in the MSB positions after the inverse message schedule is applied. Therefore, an obvious choice is to search for difference patterns in W that only affect the MSBs of words of W.
3.1 An Example
In the case of both BMW-256 and BMW-512, the search mentioned above showed that a good difference pattern in W has differences in the MSBs of W13, W14, and W15 only. The inverse message schedule causes differences in the MSBs of M0, M1, M3, M4, M7, M9, and M13. Hence, there are 7 bit differences in M, which are introduced in the function f2. A difference in W13 is propagated directly to Z13 in the 13th round of f1. Hence, Z13 obtains the difference 100. . . 0 (in binary). In round 14, the function s4 is applied to Z13, yielding the difference 1100. . . 0 (see Appendix A), and a difference in the MSB of W14 is also introduced. The end result is the difference 0100. . . 0 in Z14 with probability 1/2. Finally, in round 15, the function s5 is applied to Z13, yielding the difference 10100. . . 0, the function s4 is applied to Z14, yielding the difference 01100. . . 0, and a difference in the MSB of W15 is also introduced. Optimally, these differences result in the difference 1100. . . 0 in Z15, since then ΔZ13 ⊕ ΔZ14 ⊕ ΔZ15 = 0 (ΔZi denoting the difference on Zi), which means that in f2 the variables XL and XH will contain no difference. The total probability of this characteristic is about 2^{-3}. See Table 1.

Table 1. The desired binary differences in the last three words of Z

Word   Desired XOR difference (binary)
Z13    100. . . 0
Z14    010. . . 0
Z15    110. . . 0
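Two of the concrete claims above can be checked with a few lines of Python; the sketch below uses the sub-function definitions of Appendix A (s4(x) = (x>>1) ⊕ x, s5(x) = (x>>2) ⊕ x) and the first row of B^{-1} mod 2^32 listed in Appendix B. It is only a sanity check, not part of the attack.

```python
# (1) MSB differences in W13, W14, W15 map, via M = B^{-1} * W, to MSB
#     differences in M0, M1, M3, M4, M7, M9, M13 only;
# (2) s4 and s5 map the difference 100...0 to 1100...0 and 10100...0.
MASK = 0xFFFFFFFF
MSB = 1 << 31

BINV_ROW0 = [0xabababac, 0xc6c6c6c7, 0xbdbdbdbe, 0xc0c0c0c1,
             0x15151515, 0x4e4e4e4e, 0x90909090, 0xcfcfcfd0,
             0xbabababb, 0x6c6c6c6d, 0xdbdbdbdc, 0x0c0c0c0c,
             0x51515151, 0xe4e4e4e5, 0x09090909, 0xfcfcfcfd]

def binv_times(w):
    # circulant: row i of B^{-1} is the first row rotated i positions right
    return [sum(BINV_ROW0[(j - i) % 16] * w[j] for j in range(16)) & MASK
            for i in range(16)]

dW = [0] * 13 + [MSB, MSB, MSB]          # MSB differences in W13, W14, W15
dM = binv_times(dW)                      # an MSB XOR difference is additive 2^31
print([i for i, x in enumerate(dM) if x])       # [0, 1, 3, 4, 7, 9, 13]
print(all(x in (0, MSB) for x in dM))           # True: MSB positions only

s4 = lambda x: ((x >> 1) ^ x) & MASK
s5 = lambda x: ((x >> 2) ^ x) & MASK
print(format(s4(MSB), "032b"))           # 1100...0
print(format(s5(MSB), "032b"))           # 10100...0
```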
In f2, as mentioned, the desired bit differences in Z yield no difference in XL and XH. Hence, in the output words H0∗, H1∗, H3∗, H4∗, there are only differences in the MSBs, and these come from the message M. There is no difference in H2∗. In H5∗, the MSB difference in Z13 is inherited, and there are no other differences. In H6∗, the difference 0100. . . 0 in Z14 is inherited and, with probability 1/2, does not propagate. In H7∗, the MSB difference in M7 cancels the MSB difference in Z15, and the resulting difference is 0100. . . 0, which does not propagate with probability 1/2. One may investigate in a similar way the effects on the words H8∗, . . . , H15∗. This shows that as few as 17 bit differences remain in the best case, and this has a total probability of around 2^{-14}. See Table 2. Note that in a pseudo-attack, only the last 6, 7, or 8 words are part of the output.
Table 2. Output differences in the near-collision attack on the BMW compression function. Applies to all variants.
Word   XOR difference (binary)
H0∗    100 . . . 00000000000000000
H1∗    100 . . . 00000000000000000
H2∗    000 . . . 00000000000000000
H3∗    100 . . . 00000000000000000
H4∗    100 . . . 00000000000000000
H5∗    100 . . . 00000000000000000
H6∗    010 . . . 00000000000000000
H7∗    010 . . . 00000000000000000
H8∗    000 . . . 00000000100000000
H9∗    100 . . . 00000001000000000
H10∗   000 . . . 00000001000000000
H11∗   000 . . . 00000010000000000
H12∗   000 . . . 00001000000000000
H13∗   000 . . . 00010000000000000
H14∗   010 . . . 00000000000000000
H15∗   010 . . . 01000000000000000
3.2 Other Difference Patterns
We note that the difference in Z may be slightly different and still give the same results as those described. For instance, the difference patterns of Z14 and Z15 may be swapped. Moreover, there are in fact slightly better message difference patterns than the one described above. As an example, a difference in the MSB of W13 and in the second-most significant bit of W15 yields, with high probability, a near-collision in all but 14 bits of the compression function output. However, the corresponding message difference in M has a higher Hamming weight, and specifically there are differences in the words M14 and M15, which (in a pseudo-attack on the hash function) are reserved for padding. We did not find simple difference patterns with differences only in the last few words of W that lead to full collisions with high probability. With a value of the security parameter above 13, the above characteristic has a low (if not zero) probability. However, even with a value of 16, a high-probability characteristic exists, producing near-collisions of total Hamming weight as low as 24 for the 16 output words of the compression function (see [5]).
3.3 A Pseudo-near-collision in BMW-256
In the attack described above, there are no differences in M14 and M15 , which in BMW-256 are the words containing length padding. This means we can extend the near-collision attack on the compression function to a pseudo-near-collision attack on the BMW-256 hash function. Moreover, one of the colliding messages may start from the correct initial value of BMW-256; the other initial value will be different from the correct one in the same 7 bit positions as those which contain differences in M . As an example, the bit sequence of length 447 bits starting with the three bytes f3 8b 01 and ending with (423) ‘0’-bits follows the characteristic described above (with chaining input equal to the BMW-256 initial value). Further details can be found in the full version of this paper [6].
4 Pseudo-attacks
A second observation on the BMW compression function leads to improved pseudo-collision, -preimage, and -second preimage attacks: if Zi = 0 for all i, 0 ≤ i < 16, then we get the following greatly simplified description of f2.

H0∗  = M0 + Y0
H1∗  = M1 + Y1
H2∗  = M2 + Y2
H3∗  = M3 + Y3
H4∗  = M4 + Y4
H5∗  = M5 + Y5
H6∗  = M6 + Y6
H7∗  = M7 + Y7
H8∗  = (M4 + Y4)≪9  + M8  + Y8
H9∗  = (M5 + Y5)≪10 + M9  + Y9
H10∗ = (M6 + Y6)≪11 + M10 + Y10
H11∗ = (M7 + Y7)≪12 + M11 + Y11
H12∗ = (M0 + Y0)≪13 + M12 + Y12
H13∗ = (M1 + Y1)≪14 + M13 + Y13
H14∗ = (M2 + Y2)≪15 + M14 + Y14
H15∗ = (M3 + Y3)≪16 + M15 + Y15
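A direct transcription of these simplified equations into Python (32-bit words, as in BMW-224/256; the function name is ours):

```python
# f2 in the special case Z = 0; all additions are modulo 2^32.
MASK = 0xFFFFFFFF

def rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK

def f2_when_z_is_zero(M, Y):
    H = [0] * 16
    for i in range(8):                    # H0*..H7*   =  Mi + Yi
        H[i] = (M[i] + Y[i]) & MASK
    for i in range(8, 16):                # H8*..H15*  reuse H[(i+4) mod 8] rotated
        H[i] = (rotl(H[(i + 4) % 8], i + 1) + M[i] + Y[i]) & MASK
    return H
```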
4.1 Controlling Output Words – A First Example
Some output words of the compression function can be controlled by an attacker after fixing Z = 0. The idea is to fix some words of M and some words of Y in such a way that a number of output words obtain an arbitrary value chosen by the attacker, and such that f1 can be computed backwards, i.e., one may
compute Y15 from Z, then Y14, etc. Words of M can be fixed directly, since they are part of the input to the compression function. Words of Y can be fixed indirectly via words of W, which, as mentioned, depend on M. There is enough freedom to fix some words of M and some words of W at the same time. More details follow. Note that this attack is independent of the value of the security parameter, since both simple and complex rounds are invertible.
Consider as an example the “new” definition of H11∗ when Z = 0:

H11∗ = (M7 + Y7)≪12 + M11 + Y11.

By fixing M7, Y7, M11, and Y11, one has effectively controlled H11∗. Message words are part of the input to the compression function. Words of Y can be controlled to some extent via words of W; after having fixed Z, we are able to compute words of Y in the backward direction, i.e., we compute Y15 first, then Y14, etc., all the way down to Y0. Alternatively, we can compute the value of Wi needed to get some desired value of Yi, for any i such that Yj is already computed for all j > i. Thereby, we indirectly control Yi.
As a simple example for BMW-256, assume we want H11∗ to obtain the value α. To do this, we may choose (e.g.) M7 = Y7 = Y11 = 0 and M11 = α. We obtain Y7 = Y11 = 0 by controlling W7 and W11. Note that once Y and M are fixed, we compute H as described in Section 2.1. How to control words of M and words of W at the same time is now described. Sticking to the example, assume we want to be able to control M7, M11, W7, and W11. Compute W = B · M with (initially) all words of M as free parameters. As an example, one gets W15 = M2 − M9 + M15. Now, make W15 free by replacing everywhere M15 by W15 − M2 + M9. Now W15 is freed, but M15 is no longer free. Since W14 = M1 − M8 + M14, we can make W14 free by replacing everywhere M14 by W14 − M1 + M8. We may continue like this, freeing all Wi down to i = 7 (inclusive), without making M7 or M11 dependent. Since M13, M14, and M15 contain padding, we might want to keep these three words of M free as well. This way, one obtains (e.g.)
W0 = −M1 + M3 + 2M7 − M13 − W7 + W13
W1 = 2M1 − M7 − M11 + M13 + W7 − W10
W2 = 2M1 − M3 + 3M11 + 3M14 + M15 − 2W8 − W9 − W11 − 2W14 − W15
W3 = −M1 + 2M3 − M11 − M13 − M14 + W8 + W9 − W12 + W14 + W15
W4 = M1 + M13 − M14 + W7 − W10
W5 = M1 + M11 + 2M14 − M15 − W11 − W14
W6 = M3 − M7 + M13 + M15 + W9 − W12 − W13

All words appearing on the right-hand sides are free, and all other words are dependent. By computing the words Yi in the backward direction, or choosing Yi and computing the required Wi for i from 15 down to 7, we can control all the words Y15, Y14, . . . , Y7. In particular, we can make sure that Y7 = Y11 = 0. Since M7
and M11 are free, we can also choose these two message words as we want; in particular, we can choose M7 = 0 and M11 = α, so that we obtain H11∗ = α. Since we indirectly also control H7∗, we can obtain H7∗ = β for any β of our choice via a proper choice of, say, M7. Note that in order to compute Y1 and Y0, s1 must be inverted (see Section 2.1). This is slightly more complicated in practice than computing s1 in the forward direction, but it can also be done efficiently (with some additional memory requirements) by pre-computing and storing all inverses. The reason for choosing to control H11∗ is that Y7 is involved in its computation. This means we have to make only a few words of W free (W15 down to W7), and there is still a large amount of freedom in the choice of a number of words of M. This will be useful in extensions of the attack.
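The substitution step described above is easy to illustrate numerically; the sketch below frees W15 exactly as in the text (the target value is an arbitrary illustration):

```python
# After replacing M15 by W15 - M2 + M9, the schedule word W15 can be set to
# any desired value while M2 and M9 remain free (32-bit words).
import random
MASK = 0xFFFFFFFF

def schedule(M):
    # W_i = M_i + M_{i+3} - M_{i+10} mod 2^32 (indices mod 16)
    return [(M[i] + M[(i + 3) % 16] - M[(i + 10) % 16]) & MASK for i in range(16)]

M = [random.getrandbits(32) for _ in range(16)]
target_w15 = 0xdeadbeef                    # illustrative target value
M[15] = (target_w15 - M[2] + M[9]) & MASK  # W15 = M15 + M2 - M9  =>  solve for M15
print(hex(schedule(M)[15]))                # 0xdeadbeef
```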
4.2 Controlling Additional Output Words
There are many degrees of freedom left. These can be used to control additional output words. For instance, we may control H6∗ and H10∗ via M6, W6, M10, and W10. We again keep M14 and M15 free as above, but M13 is not free. We shall obtain correct “10. . . ” padding in M13 probabilistically; the probability is about 1/2 if we assume only a single bit of “10. . . ” padding (hence, the message length is 512 − 65 = 447 bits). We set M6 = M7 = M10 = M11 = 0 (for the sake of simplicity), and now we want to free all Wi for i from 15 down to 6, since we need to be able to control Y6. Using the same method as in the previous examples, we obtain

W0 = 2M14 + M15 − W6 − 2W7 − 2W8 − W9 + W12 − 2W14 − 2W15
W1 = −M14 − M15 + W6 + W8 − W10 + W13 + W14 + W15
W2 = 2M14 + M15 − W7 − W8 − W11 − W12 − W14
W3 = 2M14 + M15 − W6 − 2W7 − 2W8 − W9 + W12 − W13 − 2W14 − 2W15
W4 = −2M14 − M15 + W6 + W7 + W8 − W10 + W13 + W14 + W15
W5 = 2M14 − M15 − W7 − W11 − W14

We now control the four output words H6∗, H7∗, H10∗, and H11∗ via W6, W7, W10, and W11. The time complexity of this attack is about 2, since we need correct “10. . . ” padding in M13, but we have no (direct) control over this message word.
4.3 Other Variants of BMW
The same technique as described above for BMW-256 can be applied to BMW-512. In fact, for BMW-512, length padding takes up only one message word, and therefore we have enough freedom to ensure correct “10. . . ” padding with probability 1. Obviously, the attacks also apply to BMW-224 and BMW-384, since these differ from BMW-256 and BMW-512 (respectively) only in the initial value and the final truncation.
4.4 Applications
After truncation, two out of eight (or out of six or seven in the case of BMW-384 and BMW-224, respectively) output words can be given any value chosen by the attacker. This control can be used to carry out pseudo-attacks, i.e., attacks in which the attacker is free to choose the initial value of the hash function. Example pseudo-attacks are pseudo-collision, pseudo-preimage, and pseudo-second preimage attacks. The time complexities of these attacks on BMW correspond to brute force attacks on 3/4 of the output bits (or 2/3 or 5/7 in the case of BMW-384 and BMW-224, respectively). Hence, the time complexity is reduced compared to an ideal hash function. Table 3 summarizes the attack complexities for the three types of attack on the four variants of Blue Midnight Wish. Memory requirements are negligible. As mentioned, pseudo-attacks are attacks in which the attacker is free to choose the initial value of the hash function. In the case of pseudo-collision and pseudo-second preimage attacks, the two colliding messages will generally assume two different initial values.

Table 3. Pseudo-attack complexities on the four Blue Midnight Wish variants (in brackets, birthday/brute force complexities)
Variant    Pseudo-collision       Pseudo-(second) preimage
BMW-224    2^{81}  (2^{112})      2^{161} (2^{224})
BMW-256    2^{97}  (2^{128})      2^{193} (2^{256})
BMW-384    2^{128} (2^{192})      2^{256} (2^{384})
BMW-512    2^{192} (2^{256})      2^{384} (2^{512})
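The entries of Table 3 can be reproduced from the discussion above under the following reading (our interpretation, stated here as an assumption): two output words are controlled for free, the remaining digest bits are attacked generically, and the 32-bit variants pay an extra factor 2 for the padding condition of Sect. 4.2:

```python
# Sketch reproducing Table 3 under the interpretation described above.
for name, w, words in [("BMW-224", 32, 7), ("BMW-256", 32, 8),
                       ("BMW-384", 64, 6), ("BMW-512", 64, 8)]:
    free_bits = (words - 2) * w           # output bits not controlled by the attacker
    pad = 1 if w == 32 else 0             # prob. 1/2 padding condition (Sect. 4.2/4.3)
    n = words * w                         # digest size in bits
    print(name,
          f"collision 2^{free_bits // 2 + pad} (generic 2^{n // 2}),",
          f"preimage 2^{free_bits + pad} (generic 2^{n})")
```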
4.5 Available Degrees of Freedom
Clearly, in these attacks we do not have to choose Z to be all-zero; we can choose it to be anything we want. Also, we have lots of freedom in the choice of M6, M7, M10, M11, H6∗, and H7∗ to get the desired values of H10∗ and H11∗. The choices we made in the examples above were only to simplify expressions. The available degrees of freedom may be useful in extensions of the attacks; however, so far we did not succeed in doing this.
4.6 Examples
For examples demonstrating the attack, see the full version of this paper [6].
5 The Tweaked Blue Midnight Wish
For the second round of the SHA-3 competition, Blue Midnight Wish was tweaked in three ways [7]:
1. The function f0(H, M), which in the original BMW was defined as P(H ⊕ M), is now defined as P(H ⊕ M) + H≪w mod 2^w.
2. The message expansion in f1 is changed. The terms Mi + Mi+3 − Mi+10 + Ki, as they appeared in the original BMW, are replaced by (Mi≪(1+i) + Mi+3≪(1+((i+3) mod 16)) − Mi+10≪(1+((i+10) mod 16)) + Ki mod 2^w) ⊕ Hi+7.
3. In the tweaked version, after the processing of the message, the compression function is invoked again, using a constant value for H and using the intermediate hash of the message as M.

The first tweak means that it is still easy to compute M given H and Y, but now it appears to be hard to compute H given M and Y (as needed in the attacks described above). The second tweak seems to imply that one needs to choose H before f1 can be computed, in contrast with the original case, where H did not have to be chosen until the complete attack on the compression function had been carried out. The last tweak makes it harder to turn some compression function attacks into pseudo-attacks. Some preliminary thoughts with respect to cryptanalysis of the tweaked BMW follow.
– Collisions in f0 with constant M may now exist, since it is no longer guaranteed that f0 is a multi-permutation.
– Since each output word of f0 depends on only six words of H, one can find an input H (for arbitrary M) such that eight out of 16 output words of f0 have any desired value. The complexity of this “attack” is 1, and it allows the attacker to choose eight words of H arbitrarily.
– f1 is still a multi-permutation, i.e., given two out of three inputs and the output, the last input can be (efficiently) computed. It may be of particular interest that given any M, Y, Z, a matching H in f1 is easy to compute.

Aumasson [8] found a distinguisher for the tweaked Blue Midnight Wish compression function requiring 2^{19} unknown input pairs with a fixed difference. The distinguisher detects a strong bias in the least significant bits of the output word H0∗. Similarly, Guo and Thomsen [9] provided input differences to the tweaked BMW compression function such that, with a limited amount of message modification, a single or a few output bits contain a difference with probability 0 or 1 (depending on the difference chosen). These distinguishers do not threaten the security of the hash function.
6 Conclusion
We have described a number of weaknesses in the original version of the Blue Midnight Wish hash function. The weaknesses lead to attacks in which the adversary is allowed to choose the initial value of the hash function. It is by no means straightforward to extend the attacks to full-blown attacks using the given initial values of the BMW variants. Meet-in-the-middle attacks also do not seem possible since BMW uses an internal state that is at least twice as large as the output of the hash function.
The attacks, as they are described in this paper, apparently do not apply to the tweaked version of BMW. Acknowledgments. I would like to thank Guo Jian and my colleagues at DTU Mathematics for useful feedback and encouragement, and the anonymous reviewers for helpful comments.
References
1. National Institute of Standards and Technology: The SHA-3 competition website, http://csrc.nist.gov/groups/ST/hash/sha-3/index.html (2009/08/26)
2. Gligoroski, D., Klíma, V., Knapskog, S.J., El-Hadedy, M., Amundsen, J., Mjølsnes, S.F.: Cryptographic Hash Function Blue Midnight Wish. SHA-3 Algorithm Submission (October 2008), http://csrc.nist.gov/groups/ST/hash/sha-3/Round1/documents/Blue Midnight Wish.zip (2009/11/09)
3. Damgård, I.: A Design Principle for Hash Functions. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 416–427. Springer, Heidelberg (1990)
4. Merkle, R.C.: One Way Hash Functions and DES. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 428–446. Springer, Heidelberg (1990)
5. Thomsen, S.S.: A near-collision attack on the Blue Midnight Wish compression function. Manuscript (November 2008), http://www.mat.dtu.dk/people/S.Thomsen/bmw/nc-compress.pdf (2009/09/09)
6. Thomsen, S.S.: Pseudo-cryptanalysis of the Original Blue Midnight Wish. Cryptology ePrint Archive, Report 2009/478 (2009), http://eprint.iacr.org/
7. Gligoroski, D., Klíma, V.: A Document describing all modifications made on the Blue Midnight Wish cryptographic hash function before entering the Second Round of SHA-3 hash competition (September 2009), http://people.item.ntnu.no/∼danilog/Hash/BMW-SecondRound/Supporting Documentation/Round2Mods.pdf (2009/11/09)
8. Aumasson, J.P.: Practical distinguisher for the compression function of Blue Midnight Wish. Manuscript, http://131002.net/data/papers/Aum10.pdf (2010/03/10)
9. Guo, J., Thomsen, S.S.: Distinguishers for the Compression Function of Blue Midnight Wish with Probability 1. Manuscript (March 2010), http://www.mat.dtu.dk/people/S.Thomsen/bmw/bmw-distinguishers.pdf (2010/03/31)
A Sub-functions Used in f1
The sub-functions si, 0 ≤ i ≤ 5, and ri, 1 ≤ i ≤ 7, used in f1 are defined as follows.

BMW-224 and BMW-256:
s0(x) = x>>1 ⊕ x<<3 ⊕ x≪4 ⊕ x≪19
s1(x) = x>>1 ⊕ x<<2 ⊕ x≪8 ⊕ x≪23
s2(x) = x>>2 ⊕ x<<1 ⊕ x≪12 ⊕ x≪25
s3(x) = x>>2 ⊕ x<<2 ⊕ x≪15 ⊕ x≪29
s4(x) = x>>1 ⊕ x
s5(x) = x>>2 ⊕ x
r1(x) = x≪3, r2(x) = x≪7, r3(x) = x≪13, r4(x) = x≪16, r5(x) = x≪19, r6(x) = x≪23, r7(x) = x≪27

BMW-384 and BMW-512:
s0(x) = x>>1 ⊕ x<<3 ⊕ x≪4 ⊕ x≪37
s1(x) = x>>1 ⊕ x<<2 ⊕ x≪13 ⊕ x≪43
s2(x) = x>>2 ⊕ x<<1 ⊕ x≪19 ⊕ x≪53
s3(x) = x>>2 ⊕ x<<2 ⊕ x≪28 ⊕ x≪59
s4(x) = x>>1 ⊕ x
s5(x) = x>>2 ⊕ x
r1(x) = x≪5, r2(x) = x≪11, r3(x) = x≪27, r4(x) = x≪32, r5(x) = x≪37, r6(x) = x≪43, r7(x) = x≪53

B Inverses of the Matrix B Used in f1
The matrix B is introduced in Section 2.1. This matrix is circulant, meaning that each row is equal to the row above rotated one position to the right. The inverses modulo 232 and 264 are also circulant. The first row of B−1 mod 232 (in hexadecimal) is [abababac, c6c6c6c7, bdbdbdbe, c0c0c0c1, 15151515, 4e4e4e4e, 90909090, cfcfcfd0, babababb, 6c6c6c6d, dbdbdbdc, 0c0c0c0c, 51515151, e4e4e4e5, 09090909, fcfcfcfd]. The first row of B−1 mod 264 is [abababababababac, c6c6c6c6c6c6c6c7, bdbdbdbdbdbdbdbe, c0c0c0c0c0c0c0c1, 1515151515151515, 4e4e4e4e4e4e4e4e, 9090909090909090, cfcfcfcfcfcfcfd0, babababababababb, 6c6c6c6c6c6c6c6d, dbdbdbdbdbdbdbdc, 0c0c0c0c0c0c0c0c, 5151515151515151, e4e4e4e4e4e4e4e5, 0909090909090909, fcfcfcfcfcfcfcfd].
Differential and Invertibility Properties of BLAKE Jean-Philippe Aumasson1, , Jian Guo2, , Simon Knellwolf3,† , Krystian Matusiewicz4,‡ , and Willi Meier3,§ 1
Nagravision SA, Cheseaux, Switzerland Nanyang Technological University, Singapore 3 FHNW, Windisch, Switzerland 4 Technical University of Denmark, Denmark
2
Abstract. BLAKE is a hash function selected by NIST as one of the 14 second round candidates for the SHA-3 Competition. In this paper, we follow a bottom-up approach to exhibit properties of BLAKE and of its building blocks: based on differential properties of the internal function G, we show that a round of BLAKE is a permutation on the message space, and present an efficient inversion algorithm. For 1.5 rounds we present an algorithm that finds preimages faster than in previous attacks. Discovered properties lead us to describe large classes of impossible differentials for two rounds of BLAKE’s internal permutation, and particular impossible differentials for five and six rounds, respectively for BLAKE-32 and BLAKE-64. Then, using a linear and rotation-free model, we describe near-collisions for four rounds of the compression function. Keywords: cryptanalysis, hash functions, SHA-3.
1
Introduction
BLAKE [2] is one of the 14 designs selected for the second round of the SHA-3 Competition organized by the U.S. National Institute of Standards and Technology. BLAKE uses HAIFA as [3] operation mode, with some simplifications. Its compression function is based on a keyed permutation that reuses internals of † ‡ §
Supported in part by European Commission through the ICT programme under contract ICT-2007-216676 ECRYPT II. A full version of this article appears in [1]. Work done while this author was with FHNW, Switzerland, and supported by the Swiss National Science Fundation under project no. 113329. The paper was partly done during the author’s visit to Technical University of Denmark and was partly supported by a DCAMM grant there. Supported by Hasler Foundation http://www.haslerfoundation.ch under project number 08065. Supported by the grant from the Danish Research Council for Technology and Production Sciences number 274-07-0246. ¨ STIFTUNG, project no. GRS-069/07. Supported by GEBERT RUF
S. Hong and T. Iwata (Eds.): FSE 2010, LNCS 6147, pp. 318–332, 2010. c International Association for Cryptologic Research 2010
Differential and Invertibility Properties of BLAKE
319
the stream cipher ChaCha [4]. Wordwise operations are integer addition, XOR, and rotation (AXR). Depending on the output length BLAKE works on 32-bit or 64-bit words. If necessary we refer to the specific instances by BLAKE-32 and BLAKE-64 respectively. In a previous work, Ji and Liangyu [5] presented a preimage attack on roundreduced versions of BLAKE-32 and BLAKE-64 with up to 2.5 rounds (out of 10 and 14 respectively). In particular they described a method with complexity 2192 to find preimages of BLAKE-32 reduced to 1.5 rounds. Contribution of this paper. We establish differential properties of the permutation used in the compression function of BLAKE and investigate invertibility of one and more rounds. Following a bottom-up approach, we first state differential properties of the core function G. We exploit them to show injectivity of one round of the permutation with respect to the message space. We derive explicit input-output equations for G, which yield an efficient algorithm to invert one round and an improved algorithm to find a preimage of 1.5 rounds (in 2128 for BLAKE-32). Then we exploit differential properties of G to find large classes of impossible differentials for one and two rounds, and specific impossible differentials for five and six rounds of BLAKE-32 and BLAKE-64 respectively. Using a linear and rotation-free model of G we find near-collisions for the compression function reduced to four rounds.
2
Preliminaries
This section describes the compression function of BLAKE and then fixes notations used in the rest of this paper. A complete specification of BLAKE can be found in [2]. 2.1
The Compression Function of BLAKE
The compression function of BLAKE processes a 4×4 state of 16 words v0 , . . . , v15 . This state is initialized by a chaining value h0 , . . . , h7 , a salt s0 , . . . , s3 , a counter t0 , t1 , and constants k0 , . . . , k7 as depicted below: ⎛ ⎞ ⎞ ⎛ v0 v1 v2 v3 h0 h1 h2 h3 ⎜v4 v5 v6 v7 ⎟ ⎜ h4 h5 h6 h7 ⎟ ⎜ ⎟ ⎟ ⎜ ⎝v8 v9 v10 v11 ⎠ ← ⎝s0 ⊕ k0 s1 ⊕ k1 s2 ⊕ k2 s3 ⊕ k3 ⎠ v12 v13 v14 v15 t0 ⊕ k4 t0 ⊕ k5 t1 ⊕ k6 t1 ⊕ k7 The initial state is processed by 10 or 14 rounds for BLAKE-32 and BLAKE-64 respectively. A round is composed of a column step: G0 (v0 , v4 , v8 , v12 ) G1 (v1 , v5 , v9 , v13 ) G2 (v2 , v6 , v10 , v14 ) G3 (v3 , v7 , v11 , v15 ) followed by a diagonal step: G4 (v0 , v5 , v10 , v15 ) G5 (v1 , v6 , v11 , v12 ) G6 (v2 , v7 , v8 , v13 ) G7 (v3 , v4 , v9 , v14 ).
320
J.-P. Aumasson et al.
The G function depends on a position index s ∈ {0, . . . , 7} (indicated as subscript), a round index r ≥ 0, a message block m0 , . . . , m15 , and constants k0 , . . . , k15 . At round r of BLAKE-32, Gs (a, b, c, d) computes 1 : a ← (a + b) + (mi ⊕ kj ) 2 : d ← (d ⊕ a) ≫ 16 3 : c ← (c + d) 4 : b ← (b ⊕ c) ≫ 12
5 : a ← (a + b) + (mj ⊕ ki ) 6 : d ← (d ⊕ a) ≫ 8 7 : c ← (c + d) 8 : b ← (b ⊕ c) ≫ 7
with i = σr (2s) and j = σr (2s + 1), where {σr } is a family of permutations of {0, . . . , 15}. In BLAKE-64, the only differences—besides the word size—are the rotation constants, respectively set to 32, 25, 16, and 11. For a fixed message block m, G is invertible and so a series of rounds is a permutation of the state. One may view the permutation as a block cipher with key m. After the 10 or 14 rounds the new chaining value h0 , . . . , h7 is computed as h0 ← h0 ⊕ s0 ⊕ v0 ⊕ v8 h4 ← h4 ⊕ s0 ⊕ v4 ⊕ v12 h1 ← h1 ⊕ s1 ⊕ v1 ⊕ v9 h5 ← h5 ⊕ s1 ⊕ v5 ⊕ v13 . h2 ← h2 ⊕ s2 ⊕ v2 ⊕ v10 h6 ← h6 ⊕ s2 ⊕ v6 ⊕ v14 h3 ← h3 ⊕ s3 ⊕ v3 ⊕ v11 h7 ← h7 ⊕ s3 ⊕ v7 ⊕ v15 Observe that in the definition of G, we write the first line as “a ← (a+b)+(mi ⊕ kj )”, instead of “a ← a+b+(mi ⊕kj )”. This is to avoid ordering ambiguities when computing probabilities of differential characteristics. For instance, a difference in mi propagates through one addition in the former case, and through two additions in the latter, when interpreted as “a ← a + (b + (mi ⊕ kj ))”, idem for the fifth line. Clearly, one can simultaneously use different characteristics in this model as being equivalent to a single characteristic in a model that does not make any assumption on the order of the operations. 2.2
Notations
The symbols ∧ and ∨ denote logical AND and OR. Numbers in hexadecimal basis are written in typewriter (for example, ABCDEF01). A difference Δ always means a difference with respect to XOR, that is, two words m and m have the difference Δ if m ⊕ Δ = m . The Hamming weight of word m is denoted |m|, the Hamming weight of (m∧7FF · · · FF), that is, the Hamming weight of m excluding the most significant bit (MSB), is denoted m. A differential characteristic (DC) for BLAKE is the sequence of differences followed through application of addition, XOR, and rotation. In contrast a differential only consists in a pair of input and output differences. When analyzing the differential behavior of the G function, we use the following notation: Δa : Δˆ a: Δa : Δi :
initial difference in a difference in the intermediate value of a set at line 1 final difference in a difference in mi
Differential and Invertibility Properties of BLAKE
321
Analogous notations are used for differences in b, c, d, and mj . For instance, if Δa = Δi = 0 and Δb = 80 · · · 00, then Δˆ a = 80 · · · 00.
3
Differential Properties of the G Function
This section enumerates properties of the G function. We first consider the case of differences in mi and mj only, and then consider the general case with input differences in the state. Finally we briefly look at the inverse of G. 3.1
Differences in the Message Words Only
All statements below assume zero input difference in the state words, that is, Δa = Δb = Δc = Δd = 0. Proposition 1. If (Δi = 0) ∧ (Δj = 0), then (Δa = 0) ∧ (Δb = 0) ∧ (Δc = 0) ∧ (Δd = 0). Proof. If there is no difference in mi then there is no difference in a, b, c, and d after the first four lines of G. Thus a difference Δ in mj always gives a nonzero difference Δ in a. Then, d always has a difference (Δ ≫ 8), which propagates to a nonzero difference Δ to c, and finally b has difference (Δ ≫ 7).
Proposition 2. If Δi = 0, then (Δa = 0) ⇒ (Δd = 0) (Δb = 0) ⇒ (Δc = 0)
(Δc = 0) ⇒ (Δb = 0) ∧ (Δd = 0) (Δd = 0) ⇒ (Δa = 0) ∧ (Δc = 0)
Proof. We show that in the output, a and d cannot be both free of difference, idem for d and c, and for b and c. By a similar argument as in the proof of Proposition 1, after the first four lines of G the four state words have nonzero differences. In particular, the state has differences (Δ , Δ ≫ 12, Δ , Δ ≫ 16), for some nonzero Δ and Δ . Suppose that we obtain Δa = 0. Then we must have Δd = (Δ ≫ 24). Hence a and d cannot be both free of difference. Similarly, cancelling the difference Δ in c requires a difference in d, thus c and d cannot be both free of difference. Finally, to cancel the difference in b, c must have a difference, thus b and c cannot be both free of difference.
Two corollaries immediately follow from Proposition 1 and Proposition 2. Corollary 1. If (Δi ∨ Δj ) = 0, then there are differences in at least two output words. Corollary 2. All differentials with an output difference of one of the following forms are impossible: (Δ, 0, 0, 0) (0, 0, Δ, 0)
(0, Δ, 0, 0) (0, 0, 0, Δ)
(Δ, 0, 0, Δ ) (Δ, Δ , 0, 0)
for some nonzero Δ and Δ , and for any Δi and Δj .
(Δ, 0, Δ , 0) (0, Δ, Δ , 0)
322
J.-P. Aumasson et al.
Note that output differences of the form (0, Δ, 0, Δ ) are possible. For instance, if Δi = (Δi ≫ 4), then the output difference obtained by linearization is (0, Δi ≫ 3, 0, Δi ). For such a Δi , highest probability 2−28 is achieved for Δ = 88888888. A consequence of Corollary 2 is that a difference in at least one word of m7 , . . . , m15 gives differences in at least two output words after the first round. This yields the following upper bounds on the probabilities of DCs. Proposition 3. A DC with input difference Δi , Δj has probability at most 2−1 if (Δi = 0) ∧ (Δj = 0), at most 2−6 if (Δi = 0) ∧ (Δj = 0) and at most 2−5 if = 0) ∧ (Δj = 0). (Δi A proof is given in the full version of this article [1]. 3.2
General Case
Statements below no longer assume zero input difference in the state words. Proposition 4. If Δa = Δb = Δc = Δd = 0, then Δb = Δc = 0. Proof. First, when Δi = Δj = 0, collisions do not exist since G is a permutation for fixed mi and mj . So we must have differences in mi and/or mj . By Proposition 6, in G−1 a difference in mi and/or mj cannot affect b and c, hence a collision for G needs no difference in b and c.
In other words, a collision for G requires zero difference in the initial b and c. For instance, collisions can be obtained for certain differences Δa, Δi , and zero differences in the other input words. Indeed at line 1 of the description of G, Δa propagates to (a + b) with probability 2−Δa , Δi propagates to (mi ⊕ kj ) with probability one, and finally Δa eventually cancels Δi . Note that a collision for G with difference 88888888 in both m11 and a is used in §6 to find near-collisions for a modified version of BLAKE-32 with 4 rounds. The following result directly follows from Proposition 4. Corollary 3. The following classes of differentials for G are impossible: (Δ, Δ , Δ , Δ ) → (0, 0, 0, 0) → (0, 0, 0, 0) (Δ, 0, Δ , Δ ) (Δ, Δ , 0, Δ ) → (0, 0, 0, 0) for nonzero Δ and Δ , possibly zero Δ and Δ , and any Δi and Δj . =0 Many other classes of impossible differentials for G exist. For example, if Δa and Δb = Δc = Δd = 0, then Δb = 0. Proposition 5. The only DCs with probability one give Δa = Δb = Δc = Δd = 0 and have either – Δi = Δa = 800 · · · 00 and Δb = Δc = Δd = Δj = 0; – Δj = Δa = Δd = 800 · · · 00 and Δb = Δc = Δi = 0; – Δi = Δj = Δd = 800 · · · 00 and Δa = Δb = Δc = 0.
Differential and Invertibility Properties of BLAKE
323
Proof. The difference (800 · · · 00) is the only difference whose differential probability is one. Hence probability-1 DCs must only have differences active in additions. By enumerating all combinations of MSB differences in the input, one observes that the only valid ones have either MSB difference in Δi and Δa, in Δj and Δa and Δd, or in Δi and Δj and Δd. For constants ki equal to zero, more probability-1 differentials can be obtained using differences with respect to integer addition. 3.3
Properties of G−1
At round r, the inverse of Gs of BLAKE-32 computes 1 : b ← c ⊕ (b ≪ 7) 2: c← c−d 3 : d ← a ⊕ (d ≪ 8) 4 : a ← a − b − (mj ⊕ ki )
5 : b ← c ⊕ (b ≪ 12) 6: c← c−d , 7 : d ← a ⊕ (d ≪ 16) 8 : a ← a − b − (mi ⊕ kj )
where i = σr (2s) and j = σr (2s + 1). Unlike G, G−1 has low flow dependency: two consecutive lines can be computed simultaneously and independently, with concurrent access to one variable. Many properties of G−1 can be deduced from the properties of G. For example, probability-1 DCs for G−1 can be directly obtained from Proposition 5. We report two particular properties of G−1 . The first one follows directly from the description of G−1 . Proposition 6. In G−1 , the final values of b and c do not depend on the message words mi and mj . In particular, b depends only on the initial b, c, and d. That is, when inverting G, initial b and c depend only on the choice of the image (a, b, c, d), not on the message. The following property follows from the observation in Proposition 3. Proposition 7. There exists no DC that gives collisions with probability one. Properties of G−1 are exploited in §4 to find impossible differentials.
4
Impossible Differentials
An impossible differential (ID) is a pair of input and output differences that cannot occur. This section studies IDs for several rounds of the permutation of BLAKE. First we exploit properties of the G function to describe IDs for one and two rounds. Then we apply a miss-in-the-middle strategy to reach up to five and six rounds.
324
J.-P. Aumasson et al.
To illustrate IDs we use the following color code: absence of difference undetermined (possibly zero) difference undetermined or partially determined nonzero difference totally determined nonzero difference 4.1
Impossible Differentials for One Round
The following statement describes many IDs for one round of BLAKE’s permutation. Proposition 8. All differentials for one round (of any index) with no input difference in the initial state, any difference in the message block, and an output with difference in a single diagonal of one of the forms in Corollary 2, are impossible. Proof. We give a general proof for the central diagonal (v0 , v5 , v10 , v15 ); the proof directly generalizes to the other diagonals of the state. We distinguish two cases: 1. No differences are introduced in the column step: the result directly follows from Proposition 4 and Corollary 2. 2. Differences are introduced in the column step: recall that if Δb = 0 or Δc = 0, then one cannot obtain a collision for G (see Proposition 4); in particular, if there is a difference in one of the two middle rows of the state before the diagonal step, then the corresponding diagonal cannot be free of difference after. We reason ad absurdum: if a difference was introduced in the column step in the first or in the fourth column, then there must be a difference in the corresponding b or c (for output differences with Δb = Δc = 0 are impossible after the column step, see Corollary 2). That is, one diagonal distinct from the central diagonal must have differences. We deduce that any state after one round with difference only in the central diagonal must be derived from a state with differences only in the second or in the third column. In particular, when applying G to the central diagonal, we have Δa = Δd = 0. From Proposition 2, we must thus have Δa = 0, Δc = 0, and Δd = 0. In particular, the output differences in Corollary 2 cannot be reached. We have shown that after one round of BLAKE, differences in the message block cannot lead to a state with only differences in the central diagonal, such that the difference is one of the differences in Corollary 2. The proof directly extends to any of the three other diagonals.
To illustrate Proposition 8, which is quite general and covers a large set of differentials, Fig. 1 presents two examples corresponding to the two cases in the proof. Note that our finding of IDs with zero difference in the initial and in the final state is another way to prove Proposition 9.
Differential and Invertibility Properties of BLAKE
column step −−−−−−−−−→ prob.= 1
diagonal step −−−−−−−−−−→ prob.= 0
column step −−−−−−−−−→ prob.= 0
diagonal step ←−−−−−−−−−− prob.= 1
325
Fig. 1. Illustration of IDs after one round: when there is no difference introduced in the column step (top), and when there is one or more (bottom)
4.2
Extension to Two Rounds
We can directly extend the IDs identified above to two rounds, by prepending a probability-1 DC leading to a zero difference in the state after one round. For example, differences 800 · · · 00 in m0 and in v0 always lead to zero-difference state after the first round. By Proposition 8, a state with differences only in v0 and v10 cannot be reached after one round when starting from zero-difference states. Therefore, differences 800 · · · 00 in m0 and v0 cannot lead to differences only in v0 and v10 after two rounds. This example is illustrated in Fig. 2.
2 rounds −−−−−−−→ prob.= 0
2 rounds −−−−−−−→ prob.= 0 Fig. 2. Examples of IDs for two rounds: given difference 800 · · · 00 in m0 and v0 (top), or in m2 , m6 , v1 , v3 (bottom)
4.3
Miss in the Middle
The technique called miss-in-the-middle [6] was first applied to identify IDs in block ciphers (for instance, DEAL [7] and AES [8, 9]). Let Π = Π0 ◦ Π1 be a permutation. A miss-in-the-middle approach consists in finding a differential (α → β) of probability one for Π1 and a differential (γ → δ) of probability one for Π0−1 , such that β = δ. The differential (α → δ) thus has probability zero and so is an ID for Π. The technique can be generalized to truncated differentials,
326
J.-P. Aumasson et al.
that is, to differentials β and δ that only concern a subset of the state. Below we apply such a generalized miss-in-the-middle to the permutation of BLAKE. We expose separately the application to BLAKE-32 and to BLAKE-64. The strategy is similar for both: 1. Start with a probability-1 differential with difference in the state and in the message so that difference vanish until the second round. 2. Look for bits that are changed (or not) with probability one after a few more rounds, given this difference. 3. Do same as step 2 in the backwards direction, starting from the final difference. Good choices of differences are those that maximize the delay before the input of the first difference, more precisely, those such that the message word with the difference appears in the second position of a diagonal step forwards, and in the first position of a column step backwards. The goal is to minimize diffusion so as to maximize the chance of probability-1 truncated differentials.
2.5 rounds −−−−−−−−→ prob.= 1
=
2.5 rounds ←−−−−−−−− prob.= 1
Fig. 3. Miss-in-the-middle for BLAKE-32, given the input differences 80000000 in m2 and v1 . The two differences in dark gray are incompatible, thus the impossibility. In the forward direction, 2.5 rounds are two rounds plus a column step; backwards, 2 inverse rounds plus an inverse diagonal step.
Application to BLAKE-32. We consider a difference 80000000 in the initial state in v1 , and in the message block word m2 ; we have that – Forwards, differences in v1 and m2 cancel each other at the beginning of the column step and no difference is introduced until the diagonal step of the second round in which m2 appears as mj in G5 ; after the column step of the third round (that is, after 2.5 rounds), we observe that bits1 35, 355, 439, and 443 are always changed in the state. – Backwards, we start from a state free of difference, and m2 introduces a difference at the end of the first inverse round, as it appears as mi in the column step’s G2 ; after 2.5 inverse rounds, we observe that bits 35, 355, 439, and 433 are always unchanged. The probability-1 differentials reported above were first discovered empirically, and could be verified analytically by tracking differences, distinguishing bits with probability-1 (non-) difference, and other bits. 1
Here, bit 35 is the fourth most significant bit of the second state word v1 , bit 355 is the fourth most significant bit of v11 , etc.
Differential and Invertibility Properties of BLAKE
327
We deduce from the observations above that difference 80000000 in v1 and m2 cannot lead to a state free of difference after five rounds. We thus identified a 5round ID for the permutation of BLAKE-32. Fig. 3 gives a graphical description of the ID. Application to BLAKE-64. For BLAKE-64, we follow a similar approach as for BLAKE-32, with MSB difference in m2 and v1 . We could detect contradictious probability-1 differentials over three instead of 2.5 rounds, both forwards and backwards. For example, we detected probability-1 inconsistencies for bits 450, 453, 457, 462, and 463 of the state. We obtain an ID for six rounds of the permutation of BLAKE-64. Remarks 1. The probability-1 truncated differentials used above were empirically discovered, but one can easily verify them analytically. For instance, for bit 35 forward (fourth bit of v1 ), we observe that the state is free of difference until the input of m2 in the second round in G5 , which sets a difference Δ = 80000000 in v1 , and other differences in v6 , v11 , v12 . At the next (third) round, when computing G1 the only difference occurs in the MSB of v1 , which gives difference Δˆ a = Δ, Δdˆ = Δ ≫ 16, Δˆ c with no difference in the first 15 bits and a difference in the 16th, Δˆb with no difference in the first three bits and a difference in the fourth; thus we have Δa with no difference in the first three bits and a difference in the fourth, that is, the bit 35 of the state is always flipped after 2 rounds plus a column step. Similar verification can be realized for the backwards differentials. 2. The IDs presented in this section do not lead to IDs for the compression function. This is because a given difference in the output of the compression function can be caused by 2256 distinct differences in the final value of the permutation (for BLAKE-32).
5
Invertibility of a Round
Let f r be the function {0, 1}512 ×{0, 1}512 → {0, 1}512, that for initial state v and message block m outputs the state after r rounds of the permutation of BLAKE32. Non-integer round indices (for example r = 1.5) mean the application of r rounds and the following column step. We write fvr = f r (v, ·) when considering r when the message block is fixed. f r for a fixed initial state and respectively fm r As noted above, fm is a permutation for any message block m and any r ≥ 0. In this section we use the differential properties of G to show that fv1 is also a permutation for any initial state v. Then we derive an efficient algorithm for the inverse of fv1 and an algorithm with complexity 2128 to compute a preimage of fv1.5 for BLAKE-32 (a similar method applies to BLAKE-64 in 2256 ). This improves the round-reduced preimage attack presented in [5] (whose complexity was respectively 2192 and 2384 for BLAKE-32 and BLAKE-64).
328
5.1
J.-P. Aumasson et al.
A Round Is a Permutation on the Message Space
Proposition 9. For any fixed state v, one round of BLAKE (for any index of the round) is a permutation on the message space. In particular, fv1 is a permutation. Proof. We show that if there is no difference in the state, any difference in the message block implies a difference in the state after one round of BLAKE. Suppose that there is a difference in at least one message word. We distinguish two cases: 1. No differences are introduced in the column step: there is thus no difference in the state after the column step. At least one of the message words used in the diagonal step has a difference; from Corollary 1, there will be differences in at least two words of the state after the diagonal step. 2. Differences are introduced in the column step: from Corollary 2, output differences of the form (0, 0, 0, 0), (Δ, 0, 0, 0), (0, 0, 0, Δ), or (Δ, 0, 0, Δ ) are impossible. Thus, after the first column step, there will be a difference in at least one word of the two middle rows (that is, in v4 , . . . , v11 ). These words are exactly the words used as b and c in the calls to G in the diagonal step; from Proposition 4, we deduce that differences will exist in the state after the diagonal step, since Δb = Δc = 0 is a necessary condition to make differences vanish (see Proposition 4). We conclude that whenever a difference is set in the message, there is a difference in the state after one round.
The fact that a round is a permutation with respect to the message block indicates that no information of the message is lost through a round and thus can be considered a strength of the algorithm. The same property also holds for AES-128. Note that Proposition 9 says nothing about the injectivity of fvr for r = 1. 5.2
Inverting One Round and More
Without loss of generality, we assume the constants equal to zero, that is, ki = 0 for i = 0, . . . , 7 in the description of G. We use explicit input-output equations of G to derive our algorithms. Input–output equations for G. Consider the function Gs operating at round r on a column or diagonal of the state respectively. Let (a, b, c, d) be the initial state words and (a , b , c , d ) the corresponding output state words. For shorter ˆ = a+b+mi be the intermediate notation let i = σr (2s) and j = σr (2s+1). Let a value of a set at line 1 of the description of G. From line 2 we get a ˆ = (dˆ ≪ 16) ⊕ d, where dˆ is the intermediate value of d set at line 2. From line 7 we get dˆ = (d ≪ 8) ⊕ a and derive a = (((d ≪ 8) ⊕ a ) ≪ 16) ⊕ d − b − mi .
(1)
Differential and Invertibility Properties of BLAKE
329
Below we use the following equations that can be derived in a similar way: a = (((((((b ≪ 7) ⊕ c ) ≪ 12) ⊕ b) − c) ≪ 16) ⊕ d) − mi − b
(2)
= a − ((b ≪ 7) ⊕ c ) − mj − b − mi b = (((b ≪ 7) ⊕ c ) ≪ 12) ⊕ (c − d )
(3) (4)
c = c − d − ((d ≪ 8) ⊕ a ) = c − d − ((d ⊕ (a + b + mi )) ≫ 16)
(5) (6)
d = (((d ≪ 8) ⊕ a ) ≪ 16) ⊕ (a − ((b ≪ 7) ⊕ c ) − mj ) (7) a = (((((((b ≪ 7) ⊕ c ) ≪ 12) ⊕ b) − c) ≪ 16) ⊕ d) + ((b ≪ 7) ⊕ c ) + mj (8) b = ((((b ⊕ (c − d )) ≫ 12) ⊕ c ) ≫ 7) d = c − c − ((d ⊕ (a + b + mi )) ≫ 16)
(9) (10)
Observe that (1), (2) and (8) allow to determine mi and mj from (a, b, c, d) and (a , b , c , d ). Further, (4) and (5) imply Proposition 6. We now apply these equations to invert fv1 and to find a preimage of fv1.5 (m) i for arbitrary m and v. Denote v i = v0i , . . . , v15 the internal state after i rounds. Again, non-integer round indices refer to intermediate states after a column step but before the corresponding diagonal step. The state v r is the output of fvr0 . Inverting fv1 . Given v 0 and v 1 , the message block m = (m0 , . . . , m15 ) with fv10 (m) = v 1 can be determined as follows: 1. 2. 3. 4.
Determine Determine Determine Determine
0.5 v40.5 , . . . , v70.5 using (4) and v80.5 , . . . , v11 using (5). m0 , . . . , m7 using (2), (8), and (10). 0.5 0.5 v00.5 , . . . , v30.5 , v12 , . . . , v15 using G0 , . . . , G3 . m8 , . . . , m15 using (2), (8), and (10).
This algorithm always succeeds, as it is deterministic. Although slightly more complex than the forward computation of fv1 , it can be executed efficiently. Preimage of fv1.5 (m). Given some v 0 , and v 1.5 in the codomain of fv1.5 0 (thus, 1.5 can be detera preimage of v 1.5 exists), a message block m with fv1.5 0 (m) = v mined as follows: 0.5 1. Guess m8 , m10 , m11 and v10 . 1 1 1 1 1 2. Determine v4 , . . . , v7 using (4) and v81 , . . . , v11 using (5), v12 , v13 using (7). 0.5 0.5 1 0.5 0.5 0.5 0.5 3. Determine v6 , v7 using (4), m4 (2), v1 (2), v14 (6), v1 (3), v11 (5), v12 (2). 1 0.5 4. Determine v20.5 (5), m5 (8), m6 (2), v15 (7), v15 (6), v50.5 (4), v01 (5), m9 (8), m14 (2). 1 5. Determine v30.5 (5), m7 (8), v00.5 (2), v80.5 (5), m0 (1), v21 (5), v14 (2), m15 (8). 0.5 0.5 1 0.5 6. Determine v4 (9), m1 (8), v9 (6), v3 (8), m13 (2), m2 (2), m3 (8), v13 (7), m12 (2). 1.5 7. If fv1.5 output m, otherwise make a new guess. 0 (m) = v
330
J.-P. Aumasson et al.
This algorithm yields a preimage of fv1.5 (m) for BLAKE-32 after 2128 guesses in the worst case. It directly applies to find a preimage of the compression function of BLAKE reduced to 1.5 rounds and thus greatly improves the roundreduced preimage attack of [5] which has complexity 2192 . The method also applies to BLAKE-64, giving an algorithm of complexity 2256 , improving on [5]’s 2384 algorithm. There are other possibilities to guess words of m and the intermediate states. But exhaustive search showed that at least four words are necessary to determine the full message block m by explicit input-output equations.
6
Near Collisions
In this section, we exploit linearization of the G function, that is, approximation of addition by XOR. This enables us to find near collisions for a variant of BLAKE-32 with four rounds. 6.1
Linearizing G
Observe that in G, the number of bits rotated are 16, 12, 8 and 7. Only 7 is not a multiple of 4. The idea of our attack is to use differences that are invariant by rotation of 4 bits (and thus by any rotation multiple of 4), as 88888888, and try to avoid differences pass through the rotation by 7. We model the compression function in GF(2), where a 1 denotes a difference in the register and 0 means no difference. We linearize the G function by replacing addition with XOR. Further we remove the rotations as the differences we choose are rotation invariant. 6.2
Differential Characteristic for Near Collisions
In our linearized model, we have 16 bits of message and 16 bits of chaining values, hence the search space is 232 , which can be explored exhaustively. We can further reduce the search space by the condition that no difference passes through rotation by 7 over four rounds of the compression function. As the model is linear, the whole compression function can be expressed by a bit vector consisting of message and chaining value multiplied by a matrix. We used the program MAGMA to efficiently reduce the search space to 24 for a 4-round reduced compression function. Linearizing a difference pattern 8888888 costs 27 for each addition. We aim to find those configurations which linearize the addition operation as little as possible. Note that by choosing proper chaining values and messages, we can get the first 1.5 rounds “for free”. We did the search, and the configuration with differences in m11 and v0 , v1 , v5 , v6 , v7 , v7 , v10 , v11 , v13 , v15 and starting point at round 6 gives count 6 only. This gives complexity 242 , with no memory requirements. This configuration gives after feedforward final differences in h1 , h2 , h5 , and h6 (assuming no difference in salts).
Differential and Invertibility Properties of BLAKE
331
We thus obtain a near collision on (256 − 40) = 216 bits (note h5 contains 16 bits of difference). Figure 4 shows how differences propagate from round 6 to 9. We expect similar methods to apply to any sequence of four rounds, though with different complexities.
Fig. 4. Tracing the differences for near collisions on rounds 6 to 9
6.3
On the Extension to More Rounds
Consider the linearized model of G, in which we approximate addition by xor, and use the special difference 88888888 (so that differences do not propagate to the final b). Consider a linearized round, as in §6.1. Since there are 16 chaining variables and 16 message words, hence we have 216+16 different configurations. When we restrict “no difference in output b of G”, the number of good configurations is reduced by a factor 2 when passing each G. Each round function has eight G’s. Hence each round reduces the “good configurations” by a factor 28 . Thus, N rounds reduce the number of good configurations to 232 /28N ≥ 1. Hence four seems to be the maximum possible number of rounds for which our method applies, which was verified by our program. This is also why we need to seek non-linear connectors to give collisions for more rounds. The 4-round near collision applies to almost all 4-round (i.e. start with any round), however they give different complexity due to different counts for number of linearization, the round 6-9 gives lowest count 6.
7
Conclusion
We studied differential properties of the SHA-3 candidate BLAKE, and our main findings are – Differential properties of BLAKE’s permutation and of its core function G. – Inversion algorithms for one and 1.5 rounds of BLAKE’s round function for a fixed initial value. – Impossible differentials for five (resp. six) rounds of BLAKE-32’s (resp. BLAKE-64’s) permutation. – Near-collisions on four rounds of the compression function of BLAKE-32. None of our observations seems to be a threat to the security of BLAKE.
332
J.-P. Aumasson et al.
Future work may address properties related to additive differences, instead of XOR differences. Our results may also assist cryptanalysis of the stream ciphers Salsa20 and ChaCha, on which BLAKE is based.
Acknowledgments Jian Guo is supported by the Singapore Ministry of Education under Research Grant T206B2204.
References 1. Aumasson, J.P., Guo, J., Knellwolf, S., Meier, W., Matusiewicz, K.: Differential and invertibility properties of BLAKE (full version). Cryptology ePrint Archive, Report 2010/043 (2010) 2. Aumasson, J.P., Henzen, L., Meier, W., Phan, R.C.W.: SHA-3 proposal BLAKE. Submission to the SHA-3 Competition (2008) 3. Biham, E., Dunkelman, O.: A framework for iterative hash functions - HAIFA. Cryptology ePrint Archive, Report 2007/278 (2007) 4. Bernstein, D.J.: ChaCha, a variant of Salsa20, http://cr.yp.to/chacha.html 5. Ji, L., Liangyu, X.: Attacks on round-reduced BLAKE. Cryptology ePrint Archive, Report 2009/238 (2009) 6. Biham, E., Biryukov, A., Shamir, A.: Miss in the middle attacks on IDEA and Khufu. In: Knudsen, L.R. (ed.) FSE 1999. LNCS, vol. 1636, pp. 124–138. Springer, Heidelberg (1999) 7. Knudsen, L.R.: DEAL - a 128-bit block cipher. Technical Report 151, University of Bergen (1998); Submitted as an AES candidate 8. Jakimoski, G., Desmedt, Y.: Related-key differential cryptanalysis of 192-bit key AES variants. In: Matsui, M., Zuccherato, R.J. (eds.) SAC 2003. LNCS, vol. 3006, pp. 208–221. Springer, Heidelberg (2004) 9. Biham, E., Dunkelman, O., Keller, N.: Related-key impossible differential attacks on 8-round aes-192. In: Pointcheval, D. (ed.) CT-RSA 2006. LNCS, vol. 3860, pp. 21–33. Springer, Heidelberg (2006)
Rotational Cryptanalysis of ARX Dmitry Khovratovich and Ivica Nikoli´c University of Luxembourg
[email protected],
[email protected]
Abstract. In this paper we analyze the security of systems based on modular additions, rotations, and XORs (ARX systems). We provide both theoretical support for their security and practical cryptanalysis of real ARX primitives. We use a technique called rotational cryptanalysis, that is universal for the ARX systems and is quite efficient. We illustrate the method with the best known attack on reduced versions of the block cipher Threefish (the core of Skein). Additionally, we prove that ARX with constants are functionally complete, i.e. any function can be realized with these operations. Keywords: ARX, cryptanalysis, rotational cryptanalysis.
1
Introduction
A huge number of symmetric primitives using modular additions, bitwise XORs, and intraword rotations have appeared in the last 20 years. The most famous are the hash functions from MD-family (MD4, MD5) and their descendants SHA-x. While modular addition is often approximated with XOR, for random inputs these operations are quite different. Addition provides diffusion and nonlinearity, while XOR does not. Although the diffusion is relatively slow, it is compensated by a low price of addition in both software and hardware, so primitives with relatively high number of additions (tens per byte) are still fast. The intraword rotation removes disbalance between left and right bits (introduced by the addition) and speeds up the diffusion. Many recently design primitives use only XOR, addition, and rotation so they are grouped into a single family ARX (Addition-Rotation-XOR). Among them are SHA-3 competitors Skein [14], BLAKE [3], CubeHash [5], and the stream ciphers Salsa20 [4]. It is a common belief that the mixture of these operations gives a good primitive, if the number of rounds is sufficient. However, to the best of our knowledge, there is no formal theory whether all three operations are necessary and sufficient for this task. We investigate this problem from different points of view. Certainly, the most interesting question is how secure the ARX(-C) systems are. So far, the most of the analysis of ARX systems was made in the framework of differential cryptanalysis [23,9,8], with a few exceptions where symmetric states were considered [1]. We investigate the security of ARX systems with a technique that we S. Hong and T. Iwata (Eds.): FSE 2010, LNCS 6147, pp. 333–346, 2010. c International Association for Cryptologic Research 2010
334
D. Khovratovich and I. Nikoli´c
call rotational cryptanalysis, where we study the propagation of a rotational pair (X, X ≫r ) throughout the primitive1 . Operations XOR and rotation both preserve the rotational pair with probability 1, while the modular addition does it with probability up to 38 , depending only on the rotation amount r. Therefore, a rotational pair of inputs is converted to a rotational pair of outputs with a probability depending only on the number of additions in the scheme. Hence, we get a universal upper bound on the security of the ARX primitives. The use of constants, which may not form a rotational pair, does not restrain our analysis, but makes it more sophisticated. We show how reduced versions of Threefish (the block cipher used in the hash function Skein) can be analyzed with rotational cryptanalysis, and our results basically are the best known cryptanalysis of this design. We also prove that the ARX operations with a constant are functionally complete in the set of functions over Z2n . In other words, any function can be realized with modular addition, XOR, rotation, and a single constant (ARX-C ). We also show that the AR systems, that do not use XOR, are theoretically equivalent to ARX systems. However, we prove that they are less secure with the same number of operations, because of the linear mod 2n − 1 approximation. It is also easy to prove that omitting addition or rotation is devastating, and such systems (XR and AX) can always be broken. This paper is structured as follows. We survey related works in Section 2. We describe rotational cryptanalysis in Section 3 and then apply it to Threefish (Section 4). Then we prove the completeness of AR and ARX operations in Section 5. We conclude our paper with generic cryptanalysis of the AR systems (Section 6).
2
Related Work
It is hard to survey all the research done on ARX systems, so we point out only the most important. Relation between modular addition and XOR was studied in the PhD thesis of Daum [12]. Dedicated approaches were applied in many works, among them on MD5, SHA-1 [23,9]. The impact of rotations was independently studied in the cryptanalysis of block ciphers. In the pioneering work on related keys by Biham [6] a rotational pair of keys was considered. This approach was extended by Kelsey et al. in several related-key attacks on block ciphers [15]. In these attacks the adversary tries to find pairs of plaintexts of form (P, F (P )), where F is the round transformation, so this is not a pure rotational cryptanalysis. An AR system RC5P was attacked with mod n cryptanalysis in [16], and the property (1) was introduced in the same paper. The internal states were, however, computed modulo 3, and the computation modulo 2n − 1 was only briefly pointed out. 1
The technique of rotational cryptanalysis has been known and applied before, see Section 2.
Rotational Cryptanalysis of ARX
335
A common approach to cryptanalysis of ARX systems is linearization ([10,9], see also the systematic treatment in [8]), when modular additions are approximated by XOR, and the resulting function is linear. In the differential cryptanalysis there is no need to approximate the whole addition by XOR, it is sufficient to assume that difference propagates in a linear way. The linearization approach works worse if the diffusion is good, so that even in a linear model differentials involve too many active bits. As a result, the probability of the approximation becomes very low even for a single modular addition. A systematic treatment of AX-systems can be found in the work by Paul and Preneel [20]. It was demonstrated that all such systems can be broken with a low complexity, since one can work with bits from rightmost to leftmost. Related-key attacks were introduced by Biham [6] and were used in the practical break of WEP [22]. The key relations used in the attacks vary from fixed differences [17] to non-trivial subkey relations [7]. A rotational pair of inputs (though it was not named so) was used in the attack on the compression function of Shabal [18]. However, it was traced only through bitwise operations, and not through additions. Bernstein [4] explicitly prevented from use of rotational pairs in Salsa20 by fixing non-symmetric constants in the input of the permutation. However, he did not provide any complexity or probability estimates for this kind of attack. The designers of the block cipher SEA [21] described the technique of rotational cryptanalysis in 2006 and defended against it with non-linear key-schedule and pseudo-random constants. A modified version of block cipher Serpent, with key schedule constants removed, is vulnerable to rotation cryptanalysis due to the bit-slice nature of the S-boxes [13]. Also, the rotational pairs were tested for Threefish [19], but this did not result into a full attack.
3
Review of Rotational Cryptanalysis
In this section we describe a generic method for the analysis of ARX systems. The main idea is to consider pair of words where one is the rotation of the other one. We denote the intraword rotation operations by ≪r and ≫r . A rotated ← − → − → − variable is then denoted by X and X , respectively. Now, let X be the rotation → − of X by r bits to the right. We call (X, X ) a rotational pair [with a rotation amount r]. It is easy to prove that a rotational pair is preserved by any bitwise transformation, particularly by the bitwise XOR and by any rotation: −−−−→ − → X ⊕Y =→ x ⊕− y,
−−−→ − → x ≫r = x≫r .
Now consider addition modulo 2n . The probability that the rotational pair comes out of the addition is given by the following lemma.
336
D. Khovratovich and I. Nikoli´c
−−−→ → − Lemma 1 (Daum, [12]). P(x + y = − x +→ y ) = 14 (1 + 2r−n + 2−r + 2−n ). For large n and small r we get the following table: r
pr
log2 (pr )
1 0.375 −1.415 2 0.313 −1.676 3 0.281 −1.831 For r = n/2 the probability is close to 1/4. The same holds for rotations to the left. Now consider an arbitrary scheme S with additions, rotations, and XORs over n-bit words. Then the following theorem holds under independency assumptions. Theorem 1. Let q be the number of addition operations in an ARX scheme S. −−→ → − → − Let I be the input I of scheme S rotated to the right by r bits. Then S( I ) = S(I) with probability (pr )q . Proof. It can be proved by induction on the scheme size.
We would like to stress that in order the rotational analysis to work, all inputs to the ARX scheme should compose rotational pairs. −−→ → − For a random function P that maps to Z2t the probability that P( I ) = P(I) for random I is 2−t . Therefore, we can detect nonrandomness if a function can be implemented with q additions, and (pr )q > 2−t . For example, when r = 1 we get that any ARX scheme that can be implemented with less than t/1.415 additions, is vulnerable to rotational cryptanalysis. 3.1
Dealing with Constants
In contrast to random schemes, iterative schemes with identical rounds suffer from slide attacks and their modifications. The use of different round constants is typical countermeasure. As a result, many designs explicitly use constants, and we have to adapt our method to work with them. → − Let us introduce the notion a rotation error E . In the further text r is a fixed rotation amount. We define → − E(X, Y ) = X ⊕ Y. → − Clearly, E(X, X ) = 0. An addition of a constant may generate a rotation error (the exact probability depends on r, the constant value, and which type of addition is used — modular or XOR). On the other hand, a modular addition of variables also may generate an error, and with some probability these errors compensate each other: → − − → − → E(X + Y + Z + C, X + Y + Z + C) = 0.
Rotational Cryptanalysis of ARX
337
The probability is higher if the constant has low Hamming weight and its ones are concentrated close to the positions where addition errors appear. It may also → − happen that a constant is added by XOR and is invariant of the rotation: C = C . Then the rotation property passes the addition of a constant for free. The subkey indices in Threefish are examples of low-weight constants. However, they are not compensated by a single addition, only by two previous additions, which leads to an error in the adjacent key addition. We have to introduce additional corrections in the key to cancel the impact of constants (see Section 4 for more details).
4
Rotational Cryptanalysis of Threefish
In this section we attack the block cipher Threefish with rotational cryptanalysis. We demonstrate that a rotational pair of Threefish ciphertexts can be obtained faster than for a random permutation, which provides both a distinguisher and a key recovery attack. 4.1
Specification of Threefish
Threefish is a family of block ciphers underlying the compression function of Skein[14]. Threefish supports three different versions: 1) Threefish-256 — 256bit block cipher with 256-bit key, 2) Threefish-512 — 512-bit block and key, and 3) Threefish-1024 — 1024-bit block and key. Both the internal state I and the key K consist of Nw (Nw = 4, 8, 16 for Threefish-256,-512,-1024, respectively) 64-bit words. The Nw words of the s-th subkey K s are defined as follows: Kjs = K(s+j) mod (Nw +1) , s KN w −3 s KNw −2 s KN w −1
0 ≤ j ≤ Nw − 4;
= K(s+Nw −3) mod (Nw +1) + ts mod 3 ; = K(s+Nw −2) mod (Nw +1) + t(s+1) mod 3 ; = K(s+Nw −1) mod (Nw +1) + s,
where s is a round counter, t0 and t1 are tweak words, and t2 = t0 + t1 ,
KNw = 2 /3 ⊕ 64
N w −1
Kj .
j=0
Further in our analysis we fix the tweaks t0 = t1 = 0. The formal description of internal rounds is as follows. Let Nr be the number of rounds. Then for every 1 ≤ d ≤ Nr d/4
– If d mod 4 = 1 add a subkey by setting Ij ← Ij + Kj ; – For 0 ≤ j < Nw /2 set (I2j , I2j+1 ) ← MIX((I2j , I2j+1 )); – Apply the permutation π on the state words.
338
D. Khovratovich and I. Nikoli´c Table 1. Summary of the attacks on Threefish Rounds
Type
Reference
Threefish-256 (72 rounds) 24
Related-key differential
[14]
39
Related-key rotational
Sec. 4
Threefish-512 (72 rounds) 25
Related-key differential
[14]
32
Related-key boomerang
[2]
33
Related-key boomerang
[11]
42
Related-key rotational
Sec. 4
35
Known-related-key distinguisher
[2]
Threefish-1024 (80 rounds) 26
Related-key differential
[14]
43.5
Related-key rotational
Sec. 4
In the end a subkey K Nr /4 is added. The operation MIX has two inputs x0 , x1 and produces two outputs y0 , y1 with the following ARX transformation: y0 = x0 + x1 y1 = (x1 ≪R(d mod 8)+1,j ) ⊕ y0 The exact values of the rotation constants Ri,j as well the permutations π (which are different for each version of Threefish) can be found in [14]. The best known analysis [2] of Threefish-512 is 33-round attack in the relatedkey model, and 35-round attack in the known-related-key model (Tbl. 1). The designers of Threefish have changed the MIX rotation constants in the latest tweak to get better diffusion. We note that our attack is independent of these constants. 4.2
Attacks on Simplified Versions of Threefish
There are two places in the key schedule of Threefish where we encounter constants: 1) KNw is obtained with a XOR of all key words and the constant s C5 = 264 /3, and 2) the last subkey word KN has a modular addition w −1 of the round counter s. Hence, in addition to the original Threefish, we can obtain three simplified versions by discarding these constant XOR and counter additions. Our attacks are in the related-key scenario, where all the key and plaintext words compose rotational pairs, i.e. if the first key and the plaintext have the values (k0 , . . . , kNw ), (p0 , . . . , pNw −1 ) then the second (related) key and → − −−→ → −−→ the plaintext have the values (k0 , . . . , kNw ), (− p0 , . . . , − p− Nw −1 ).
Rotational Cryptanalysis of ARX
339
The simplest version of Threefish is without the XOR of C5 and the additions of the round counters. We can fix the rotation amount in the rotational pair to 1 in order to get the best probability — 2−1.415 per addition. A simple MIX has only one addition, hence a round of Threefish-256 has only two additions. The 59round version of Threefish-256 has 2·59 = 118 additions in the MIX of the rounds and 4·15 = 60 additions of the subkey words, so the probability that a rotational pair of key/plaintext (with a rotation equal to 1) will produce a rotational pair of ciphertexts is 2−1.415·(118+60) = 2−252 , which is higher than for a random permutation. Every right pair also provides information on leftmost key bits of each key word, so we get a valid key recovery attack with a complexity of about 2252 encryptions. The same reasoning is applicable for 59-round distinguishers for Threefish-512 which has a complexity of 2504 and to Threefish-1024 and 21008 , because these ciphers differ from Threefish-256 only in the size of the state and the key. When the XOR of C5 is present, then the only difference is that we cannot use the rotation amount 1 because C5 ≪ 1 = C5 , i.e. the constant C5 is not invariant of rotation 1. Instead we can use rotation 2, and get attacks on 50 rounds. The complexity of the attack on Threefish-256 is 21.67·(2·50+4·13) = 2253.8 . For Threefish-512 and Threefish-1024 they are 2507.6 and 21015.2 . For the version of Threefish without the constant C5 and the round counters, we get much better results if we consider a weak key class, for which it is unlikely to get errors during the modular addition. Let the three leftmost bits of each key word be zero, and consider rotation to the left by one bit. Then the probability ←−−−− ← − ← − that X + K = X + K is equal to 2−0.28 for a random X, and so the total probability for the full 72-round Threefish-256, the version without C5 and round counters, is 2−1.415·2·72−0.28·4·18 = 2−224 . The size of the weak key class that we attack is 261·4 = 2244 , so we get a valid attack on a very large key class. Analogously, we can attack a weak key class with 2488 keys of Threefish-512 with complexity 2448 , and Threefish-1024 with a complexity 2950 (the complexity is slightly higher because Threefish-1024 has 80 rounds). 4.3
Attacks on the Original Threefish
Let us try to apply rotational analysis to the original version of Threefish. This means we have to deal with the round counters – low weight constants. In order to bypass them we introduce corrections in the key pair. Let K be the first secret key. Then the second key K is defined as follows: ← − Ki = Ki ⊕ ei The use of rotational pairs with errors is illustrated in Fig. 1. We have found experimentally, that the values of the corrections ei should not be larger than 16 (otherwise they do not cancel the round counters). For Threefish-256 and Threefish-512 it is feasible to find by brute force the exact
340
D. Khovratovich and I. Nikoli´c
values for the corrections that cancel the counters with maximal probability. For Threefish-1024 we took the values that were good in Threefish-512. The corrections forbid to obtain clear formula for the probability of addition of a rotational pair. Hence, we have found these probabilities empirically. We have grouped two rounds with a subkey addition (round – subkey addition – round), and by Monte Carlo method found the probability that a rotational pair of states at the input of these two rounds and a rotational pair of subkeys with corrections will produce a rotational pair of states at the output. Based on these values, we have produced the probabilities of the best round-reduced rotational pairs. The explicit round-by-round values of the probabilities are given in Tbl. 4 in the Appendix. The results are given for the original versions as well as for the versions without the C5 (except in the case for Threefish-1024 where the probability of the version without C5 is lower than for the original Threefish1024). We can break 39, 42, and 43.52 rounds of the original versions of Threefish256,-512, and -1024 with complexity of 2252.4 , 2507 , 21014.5 encryptions respectively. The attacks procedures follow the same algorithm: 1. Generate a random plaintext P and encrypt it on K; 2. Compute P and encrypt on K ; 3. Check whether (EK (P ), EK (P )) is a rotational pair. A rotational pair discloses information about leftmost key bits of every key word. The plaintext P is computed by the following rule: ← − Pi = Pi ⊕ di . The plaintext and the key corrections are defined separately for all the three versions of Threefish in Tbl. 2. The correction values for the version without C5 are given in Appendix. Table 2. Corrections in the plaintext pairs (di ) and the key pairs (ei ) in Threefish 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Threefish-256 di 3 10 3 15 ei 6 10 6 15 Threefish-512 di 0 6 3 6 3 6 3 6 ei 5 6 6 6 6 6 6 6 Threefish-1024 di 0 6 3 6 3 6 3 6 3 6 3 ei 5 6 6 6 6 6 6 6 6 6 6 2
.5 means without the last subkey addition.
6 6
3 6
6 6
3 6
6 6
Rotational Cryptanalysis of ARX
341
>>>
Round d Key
Key
Key addition Round d + 1
Round index
>>>
Fig. 1. Rotational errors in the key addition layer of Threefish. Dashed lines contain rotational pairs with errors.
For the versions with counters but without C5 , again we only change the rotation amount to 1, and obtain attacks on 44 and 51.5 rounds of Threefish256, -512 with a complexity of 2252 and 2506.7 encryptions.
5
Completeness of ARX
In this section we investigate which primitives can be obtained in the ARX framework. We show that any function can be implemented with the ARX operations and a constant 1. Therefore, there is no generic property that holds for the ARX-C systems with probability 1, and any cryptanalysis method fails for a reasonably large system. However, rotational cryptanalysis (Section 3) shows that such a system should be much larger than expected. Let us now analyze the ARX operations and start with a system of notations. In the further text + always stands for modular addition. Let an integer n be the word length, and denote an n − bit word by W . Denote also by F the set of all functions to W : F = {f : W m → W | m ∈ N}. We say that a set Q is a basis in F if any function from F can be realized by a scheme with elements from Q. Theorem 2. The set of functions {+, ⊕, ≫1 , 1} is a basis in F . Proof. We show how to realize an arbitrary function f ∈ F. 1. Realize si (X) = 00 · · · 0xi , where X = x1 x2 · · · xn . First we rotate X by n − i − 1 bits to the right using n − i − 1 simple rotations. Then we add it to itself n − 1 times thus getting xi 00 · · · 0. Finally, we rotate the result to 00 · · · 0xi .
342
D. Khovratovich and I. Nikoli´c
Table 3. Attack complexities for different versions of Threefish; the weak key class on full-round Threefish Cipher Threefish-256
(weak key of 2244 ) Threefish-512
(weak key of 2488 ) Threefish-1024 (weak key of 2976 )
Round index no no yes yes no no no yes yes no no no yes no
Constant 264 /3 no yes no yes no no yes no yes no no yes yes no
Rounds
Complexity
59 50 44 39 72 59 50 51.5 42 72 59 50 43.5 80
2252 2253.8 2251.4 2254.1 2224 2504 2507.6 2505.5 2507 2448 21008 21015.2 21014.5 2950
2. Realize all functions Mk (X, Y ) = 00 · · · 0(xk yk ). 00 · · · 01, if X = C; 3. Realize all functions jC (X) = as follows. Let C = 0, else. (c1 , c2 , . . . , cnm ). Then jC (X) = (xi ⊕ ci ⊕ 1), which can be computed with functions {Mk (X, Y )} and the constant 1. C2 , if X = C1 ; 4. Realize all functions JC1 ,C2 (X) = as C2 · jC1 (X). 0, else. 5. Realize f as X∈W m JX,f (X) . Theorem 3. The set of functions {+, ≫1 , 1} is a basis in F . Proof. We only have to prove that ⊕ can be realized with AR operations. Indeed, we realize si (X) = xi 00 · · · 0 in the same way as in Theorem 2. Then we add si (X) to si (Y ) and get xi ⊕ yi in the leftmost bit. Then we rotate this to the position of xi and get the function Si (X, Y ) = 00 · · · 0 (xi ⊕ yi ) 00 · · · 0 . Finally i−1
n−i
we note that ⊕(X, Y ) ≡ S1 (X, Y ) + S2 (X, Y ) + · · · + Sn (X, Y ). This concludes the proof.
6
Cryptanalysis of Generic AR Systems
In this section we consider the AR schemes, which involve only modular additions and rotations (constants are admitted). Although they are theoretically equivalent to the ARX systems, they are more vulnerable with the same number of operations. We show that the resulting function can be approximated with
Rotational Cryptanalysis of ARX
343
a simple formula, and this approximation becomes invalid only after a large number of additions. The first important instrument is the following observation made in [16]: X≪r ≡ 2r · X
(mod 2n − 1).
(1)
Let us note that addition modulo 2n is equivalent to the addition modulo 2n −1 if the sum is smaller than 2n −1 (which happens with probability 1/2). The main idea is to replace in a scheme S all the modular additions with additions modulo 2n − 1. With probability 2−q , where q is the number of additions in the scheme, a new scheme S produces the same output as S. Finally, we replace rotations with multiplications using (1). As a result, we get a scheme with only additions and multiplications modulo 2n − 1. Since we multiply variables only by constants, the resulting function is linear in its inputs. Corollary 1. An AR scheme can be approximated by a linear function with probability 2−q where q is the number of additions in the scheme. 6.1
Applications
Any AR scheme claiming n bits of security and having less than n additions is potentially vulnerable to our attack. Consider, for example, an n-bit AR hash function H with q < n additions. Given H(M ) we substitute it with a linear formula and solve a linear equation. If there is enough input freedom, we find a second preimage with complexity 2q . An n-bit AR block cipher is also vulnerable to this kind of attack. Let us replace all XORs in Threefish with additions (the constants remain). Then there are Nw additions per round and per subkey addition, so we can break a 50-round version, which would have 504 additions.
7
Conclusion
We have investigated security of ARX systems from both theoretical and practical points of view. We described a technique — rotational cryptanalysis — that is very efficient for ARX systems. The complexity of the rotational attack depends only on the number of modular additions, and does not depend on the number of XORs and rotations, nor on the rotation amounts. The rotational attacks on Threefish-256, -512, -1024 (39/42/43.5 rounds out of 72/72/80 rounds) are the best attacks on this primitive. On the other hand, we proved that actually any function can be implemented with ARX operations and constants. As a result, there is no attack that works for any ARX system, though our rotational cryptanalysis demonstrates that secure systems must be large enough. Roughly, a primitive claiming n-bit security must have at least 0.7n addition operations in any implementation. Use of constants, however, makes rotational cryptanalysis more complicated. We also showed that though AR systems (not using XOR) are theoretically equivalent to ARX, they are vulnerable to a linear approximation attack regardless of constants used in the primitive.
344
D. Khovratovich and I. Nikoli´c
Acknowledgements The authors thank Christian Rechberger and Ralf-Philipp Weinmann for fruitful discussions and anonymous reviewers for their helpful comments. Dmitry Khovratovich is supported by PRP ”Security & Trust” grant of the University of Luxembourg. Ivica Nikoli´c is supported by the BFR grant 07/031 of the FNR Luxembourg.
References 1. Aumasson, J.-P., Brier, E., Meier, W., Naya-Plasencia, M., Peyrin, T.: Inside the hypercube. In: Boyd, C., Gonz´ alez Nieto, J. (eds.) ACISP 2009. LNCS, vol. 5594, pp. 202–213. Springer, Heidelberg (2009) ¨ 2. Aumasson, J.-P., C ¸ alik, C ¸ ., Meier, W., Ozen, O., Phan, R.C.-W., Varici, K.: Improved cryptanalysis of Skein. In: Matsui, M. (ed.) ASIACRYPT 2009. LNCS, vol. 5912, pp. 542–559. Springer, Heidelberg (2009) 3. Aumasson, J.-P., Henzen, L., Meier, W., Phan, R.C.-W.: SHA-3 proposal BLAKE. Submission to NIST (2008) 4. Bernstein, D.J.: Salsa20. Technical Report 2005/025. eSTREAM, ECRYPT Stream Cipher Project (2005), http://cr.yp.to/snuffle.html 5. Bernstein, D.J.: CubeHash specification (2.b.1). Submission to NIST (2008) 6. Biham, E.: New types of cryptanalytic attacks using related keys. J. Cryptology 7(4), 229–246 (1994) 7. Biryukov, A., Khovratovich, D.: Related-key cryptanalysis of the full AES-192 and AES-256. In: Matsui, M. (ed.) ASIACRYPT 2009. LNCS, vol. 5912, pp. 1–18. Springer, Heidelberg (2009) 8. Brier, E., Khazaei, S., Meier, W., Peyrin, T.: Linearization framework for collision attacks: Application to CubeHash and MD6. In: Matsui, M. (ed.) ASIACRYPT 2009. LNCS, vol. 5912, pp. 560–577. Springer, Heidelberg (2009) 9. Canni`ere, C.D., Rechberger, C.: Finding SHA-1 characteristics: General results and applications. In: Lai, X., Chen, K. (eds.) ASIACRYPT 2006. LNCS, vol. 4284, pp. 1–20. Springer, Heidelberg (2006) 10. Chabaud, F., Joux, A.: Differential collisions in SHA-0. In: Krawczyk, H. (ed.) CRYPTO 1998. LNCS, vol. 1462, pp. 56–71. Springer, Heidelberg (1998) 11. Chen, J., Jia, K.: Improved related-key boomerang attacks on round-reduced Threefish-512. Cryptology ePrint Archive, Report 2009/526 (2009) 12. Daum, M.: Cryptanalysis of Hash Functions of the MD4-Family. PhD thesis, RuhrUniversit¨ at Bochum (May 2005) 13. Dunkelman, O., Indesteege, S., Keller, N.: A differential-linear attack on 12-round Serpent. In: Chowdhury, D.R., Rijmen, V., Das, A. (eds.) INDOCRYPT 2008. LNCS, vol. 5365, pp. 308–321. Springer, Heidelberg (2008) 14. Ferguson, N., Lucks, S., Schneier, B., Whiting, D., Bellare, M., Kohno, T., Callas, J., Walker, J.: The Skein hash function family. Submitted to SHA-3 Competition (2008) 15. Kelsey, J., Schneier, B., Wagner, D.: Related-key cryptanalysis of 3-WAY, BihamDES, CAST, DES-X, NewDES, RC2, and TEA. In: Han, Y., Quing, S. (eds.) ICICS 1997. LNCS, vol. 1334, pp. 233–246. Springer, Heidelberg (1997) 16. Kelsey, J., Schneier, B., Wagner, D.: Mod n cryptanalysis, with applications against RC5P and M6. In: Knudsen, L.R. (ed.) FSE 1999. LNCS, vol. 1636, pp. 139–155. Springer, Heidelberg (1999)
Rotational Cryptanalysis of ARX
345
17. Kim, J., Hong, S., Preneel, B.: Related-key rectangle attacks on reduced AES-192 and AES-256. In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 225–241. Springer, Heidelberg (2007) 18. Knudsen, L.R., Matusiewicz, K., Thomsen, S.S.: Observations on the Shabal keyed permutation (2009) 19. Leander, G.: Private communication (2009) 20. Paul, S., Preneel, B.: Solving systems of differential equations of addition. In: Boyd, C., Gonz´ alez Nieto, J.M. (eds.) ACISP 2005. LNCS, vol. 3574, pp. 75–88. Springer, Heidelberg (2005) 21. Standaert, F.-X., Piret, G., Gershenfeld, N., Quisquater, J.-J.: SEA: A scalable encryption algorithm for small embedded applications. In: Domingo-Ferrer, J., Posegga, J., Schreckling, D. (eds.) CARDIS 2006. LNCS, vol. 3928, pp. 222–236. Springer, Heidelberg (2006) 22. Tews, E., Weinmann, R.-P., Pyshkin, A.: Breaking 104 bit WEP in less than 60 seconds. In: Kim, S., Yung, M., Lee, H.-W. (eds.) WISA 2007. LNCS, vol. 4867, pp. 188–202. Springer, Heidelberg (2008) 23. Wang, X., Yu, H.: How to break MD5 and other hash functions. In: Cramer, R. (ed.) EUROCRYPT 2005. LNCS, vol. 3494, pp. 19–35. Springer, Heidelberg (2005)
A
On the Probabilities of Rotational Characteristics for Threefish
One part of the probabilities of the trails presented at Tbl. 4 is computed theoretically, and one part practically. When the rotational pair does not have corrections, then we use the probability of addition defined by Lemma 1. In the original versions (with C5 ) the rotation amount is 2, so the probability of addition is 2−1.676 . When C5 is absent then we use a rotation amount 1, and the probability of addition becomes 2−1.415 . Two consecutive rounds of Threefish-256 have 4 MIX and each MIX has one addition. Hence, two rounds with no subkey additions have a probability of 2−6.6 . Analogously, for Threefish-512 and Threefish-1024 we get 2−13.3 and 2−26.6 , respectively. These numbers translate into 2−5.7 , 2−11.3 , 2−22.6 for the versions without C5 . When there are corrections in the rotational pair, we find the probabilities of two rounds (one round + subkey addition + one round) experimentally. The probabilities for the round 1 (key addition + one regular round) are also computed experimentally. We have used to following corrections: – for Threefish-256 without C5 in the key: 7, 2, 2, 6; in the plaintext: 2,2,7,6; – for Threefish-512 without C5 in the key: 2, 1, 3, 1, 7, 1, 7, 3; in the plaintext: 7,1,6,1,2,1,2,3
346
D. Khovratovich and I. Nikoli´c
Table 4. Probabilities for the rotational pairs of different versions of Threefish Rounds
Threefish-256 Threefish-512 Threefish-1024 original without C5 original without C5 original
1 2−3 4−5 6−7 8−9 10 − 11 12 − 13 14 − 15 16 − 17 18 − 19 20 − 21 22 − 23 24 − 25 26 − 27 28 − 29 30 − 31 32 − 33 34 − 35 36 − 37 38 − 39 40 − 41 42 − 43 44 − 45 46 − 47 48 − 49 50 − 51 52
−13.1 −6.6 −17.57 −6.6 −15.33 −6.6 −15.60 −6.6 −21.08 −6.6 −18.46 −6.6 −21.47 −6.6 −21.55 −6.6 −21.74 −6.6 −22.96 −6.6
−10.5 −5.7 −12.56 −5.7 −13.95 −5.7 −12.05 −5.7 −14.17 −5.7 −14.6 −5.7 −17.41 −5.7 −13.44 −5.7 −16.64 −5.7 −17 −5.7 −17.74 −5.7 −18.89 −5.7 −2.8
−22.8 −13.3 −29.47 −13.3 −31.33 −13.3 −29.73 −13.3 −34.35 −13.3 −37.25 −13.3 −34.38 −13.3 −36 −13.3 −37.63 −13.3 −38.17 −13.3 −36.24 −6.6
−21.6 −11.3 −25.92 −11.3 −22.68 −11.3 −27.99 −11.3 −25.81 −11.3 −29.43 −11.3 −26.89 −11.3 −25.61 −11.3 −26.74 −11.3 −25.12 −11.3 −31.34 −11.3 −30.60 −11.3 −33.19 −11.3 −5.7
−45.6 −26.7 −61.48 −26.7 −63.32 −26.7 −61.68 −26.7 −66.44 −26.7 −68.82 −26.7 −66.34 −26.7 −67.31 −26.7 −69.28 −26.7 −67.79 −26.7 −69.64 −26.7 −13.3
Total rounds
39
44
42
51.5
43.5
−251.4
−507
−505.5
−1014.5
Total −254.1 probability
Another Look at Complementation Properties Charles Bouillaguet1 , Orr Dunkelman1,2 , Ga¨etan Leurent1 , and Pierre-Alain Fouque1 1
D´epartement d’Informatique ´ Ecole normale sup´erieure 45 Rue D’Ulm, 75320 Paris, France {charles.bouillaguet,gaetan.leurent,pierre-alain.fouque}@ens.fr 2 Faculty of Mathematics and Computer Science Weizmann Institute of Science P.O. Box 26, Rehovot 76100, Israel
[email protected]
Abstract. In this paper we present a collection of attacks based on generalisations of the complementation property of DES. We find symmetry relations in the key schedule and in the actual rounds, and we use these symmetries to build distinguishers for any number of rounds when the relation is deterministic. This can be seen as a generalisation of the complementation property of DES or of slide/related-key attacks, using different kinds of relations. We further explore these properties, and show that if the relations have easily found fixed points, a new kind of attacks can be applied. Our main result is a self-similarity property on the SHA-3 candidate Lesamnta, which gives a very surprising result on its compression function. Despite the use of round constants which were designed to thwart any such attack, we show a distinguisher on the full compression function which needs only one query, and works for any number of rounds. We also show how to use this self-similarity property to find collisions on the full compression function of Lesamnta much faster than generic attacks. The main reason for this is the structure found in these round constants, which introduce an interesting and unexpected symmetry relation. This casts some doubt on the use of highly structured constants, as it is the case in many designs, including the AES and several SHA-3 candidates. Our second main contribution is a new related-key differential attack on round-reduced versions of the XTEA block-cipher. We exploit the weakness of the key-schedule to suggest an iterative related-key differential. It can be used to recover the secret key faster than exhaustive search using two related keys on 37 rounds. We then isolate a big class of weak keys for which we can attack 51 rounds out of the cipher’s 64 rounds. We also apply our techniques to ESSENCE and PURE.
1
Introduction
In this paper with study the existence of simple relations that can go trough the rounds of a cipher with a very high probability. For example, in DES if all S. Hong and T. Iwata (Eds.): FSE 2010, LNCS 6147, pp. 347–364, 2010. c International Association for Cryptologic Research 2010
348
C. Bouillaguet et al.
the key bits are flipped, then all the subkeys are flipped as well. Moreover, if we also negate the plaintext, then all the F functions receive the original input, and the ciphertext is also negated: DESK (P ) = (DESK (P )). This is known as the complementation property of DES. A similar property is present in one round of AES [14]. If one rotates the columns of an AES state, this rotates the column of the state after SubBytes, ShiftRows, and MixColumns. A study of similarity relations of the AES round operations is done in [14]. The authors show that the rotations by 1, 2 or 3 columns are the only byte-permutation to commute with the AES round. The AES key-schedule is responsible for breaking those symmetry relations and one should be very careful when using the AES round in a new construction. Another well-known example is based on related-key attacks [3,4,12] and slide attacks [5]. In the latter, two plaintexts such that one is the encryption of the other by one round are used. If this is the case, then the ciphertexts are also separated by one round of encryption. Hence, a slid pair (or a related-key plaintext pair) suggests two equations for the round function, which in many cases is sufficient to retrieve the secret key. We note that slide attacks were also adapted for hash functions, where the slide property is used for several attack scenarios [7]. In this paper we show new kinds of self-similarity properties in block ciphers and hash functions. The new ideas generalize the previous attacks, by treating a wider set of relations. Moreover, some of the similarity relations we use have fixed points, which is not the case for the complementation property. The keys which are mapped to themselves by the similarity relation can be treated as weak-keys, and they even allow to mount various attacks when the cipher is used to build a hash function. Deterministic self-similarity properties can usually be detected with a very small number of queries (one or two), over any number of rounds of a cipher, making them very interesting properties to study. However, it should be noted that most attacks involving self-similarity properties are expected to be in the related-key setting and/or will only affect classes of weak keys. This restriction is less problematic in the context of a hash function: there is no secret involved, and the adversary has a greater control over the inputs to the primitive (depending on how exactly the hash function is built, and the attack model). We also stress that our distinguishers are very simple and efficient, so that they can be practical if the block cipher is used in an unusual setting. For example, a well-known self-similarity property of TEA is that each key has four equivalent keys [11].1 This does not seem to be a practical threat for the block cipher (up to a loss of two bits of security in exhaustive search). However, Microsoft used TEA as a hash function in Davies-Meyer mode to enforce limitations on the Xbox, and this weakness of TEA has been used in practice to bypass the security limitations [20]. We show an example of self-similarity properties in Lesamnta, and discuss a probabilistic self-similarity for ESSENCE . Using this approach we have identified 1
We note that this property is commonly described as a related-key differential with probability 1, or a complementation property.
Another Look at Complementation Properties
349
an iterative related-key differential for XTEA. We also find a class of weak keys in the block-cipher PURE. One of the interesting outcomes of this research, is a new way to tackle round constants. While differing round constants seem to thwart slide attacks, they are not necessarily sufficient to protect against our more generalized approach, as we present in the attack on the compression function of Lesamnta. 1.1
Road-Map
First we formally define the notion of self-similarity in Section 2, and we show some ways to exploit this property. In particular we discuss new attacks based on keys which are fixed under the similarity relation. Then we show concrete example of self-similarity relation: we study Lesamnta in Section 3, ESSENCE in Section 4, PURE in Section 5, and round-reduced XTEA in Section 6. We conclude the paper in Section 7.
2
Self-similarity
Following [2], we define self-similarity as follows: Definition 1 (Self-similarity relation in a block cipher). A block cipher E encrypts the plaintext P under the key K to EK (P ). A self-similarity relation is given by invertible and easy to compute transformations2 φ, ψ and θ such that: ∀K, P :
θ(EK (P )) = Eψ(K) (φ(P ))
If such relations exists then the cipher is self-dual according to the terminology of [2]. Similarly, we can define self-similarity for compression functions and for stream ciphers: Definition 2 (Self-similarity relation in a compression function). A compression function H maps a chaining value X and a message M to a new chaining value H(X, M ). A self-similarity relation is given by invertible and easy to compute transformations φ, ψ and θ such that: ∀X, M :
θ(H(X, M )) = H(φ(X), ψ(M ))
Definition 3 (Self-similarity relation in a stream cipher). A stream cipher G generates the key-stream GK (I) with the key K and the IV I. A selfsimilarity relation is given by invertible and easy to compute transformations φ, ψ and θ such that: ∀K, I : 2
θ(GK (I)) = Gψ(K) (φ(I))
In this context, and for the reminder of the paper, we require that at least one of the three transformations φ, ψ, and θ is not the identity.
350
C. Bouillaguet et al.
We say that a set of transformation is a weak self-similarity if this relation only holds with a probability which is less than 1 (but higher than for a random permutation). These definitions are wide enough to include several types of known attacks such as the complementation property of DES, for which φ = ψ = θ, and φ(x) = x. Other known results also fit our framework of self-similarity properties: – The complementation property of LOKI [13]: ψ(K) = K ⊕ Δ
φ(P ) = P ⊕ Δ
θ(C) = C ⊕ Δ
For several values of Δ. – The equivalent keys of TEA [11] ψ(K) = K ⊕ Δ
φ(P ) = P
θ(C) = C
For several values of Δ. – The recently found weakness in the compression function of the SHA-3 candidate CHI [1]: ψ(K) = K
φ(P ) = P
θ(C) = C
It is possible to consider high probability differentials as (weak) self-similarity properties. For example, in Section 6 we present some iterative related-key differential for XTEA that was found using the self-similarity approach. In this paper, we will consider iterated self-similarity properties. Let E be a cipher defined by the iteration of a round function F , with the subkeys RK i derived for the master key K by a function G: RK i = G(K, i), i.e., the cipher can be described as: RK i = G(K, i)
X0 = P
Xi+1 = F (Xi , RK i )
EK (P ) = Xr .
We look for a self-similarity property of the round function F , i.e., two transformations Θ, Ψ such that Θ(F (X, RK )) = F (Θ(X), Ψ (RK )). Then assuming we can find (or construct) K, K such that G(K , i) = Ψ (G(K, i)), we have Ek (θ(P )) = θ(Ek (P )). Note that the relation we are using on the subkeys is defined as RK i = Ψ (RK i ): each subkey of the second cipher is related to the corresponding subkey of the original cipher. This is in contrast with related-key attacks where the relation is RK i+1 = RK i . 2.1
Attacks Based on Self-similarity
Self-similarity properties obviously offer an efficient distinguisher in the relatedkey setting. If one is given access to an oracle EK ∗ and an oracle Eψ(K ∗ ) for an unknown K ∗ , he can distinguish the block cipher E from an ideal cipher by querying EK (P ) and Eψ(K) (φ(P )) for a random P . Moreover, a self-similarity property can be used to speed up the exhaustive search of the key by a small factor, as explained in [3]. Let n be the size of the longest cycle of the permutation ψ. In most cases, n will be quite small (if n is big
Another Look at Complementation Properties
351
then the attack will actually be more efficient). Then, query Ci = EK ∗ (φ(i) (P )) i = θ(−i) (Ci ). Now, compute EK (P ) for a for i ∈ 0, 1, . . . , n − 1, and compute C i . If there is a match, then set of keys K and look for a match with one of the C ψ (i) (K) is likely to be the key: i ⇐⇒ EK (P ) = θ(−i) (Ci ) EK (P ) = C ⇐⇒ Eψ(i) (K) (φ(i) (P )) = Ci = EK ∗ (φ(i) (P )). The idea of the attack is that the set of tested keys has to include only one key per each cycle of ψ. Each time a new EK (P ) is computed, this allows to test all the key candidates in the cycle of K, by applying θ iteratively to the obtained ciphertext, and comparing the resulting ciphertext with the respective pre-computed ciphertext. Hence, if the evaluation of θ is faster than Eψ(K) , one can reduce the time of exhaustive search. 2.2
Attacks Based on Fixed points of Self-similarity Relations
Some of the φ and ψ relations that we show have a large number of fixed points. These points can be used in several different attack scenarios. The fixed points of ψ will be weak key, so we consider the fixed points of φ as weak plaintexts. First, let us show that the set of the fixed points of ψ is a weak-key class. If a weak plaintext is encrypted under a weak key, the ciphertext will also be a fixed point of the similarly relation, i.e., θ(C) = θ(EK (P )) = Eψ(K) (φ(P )) = EK (P ) = C. This allows to distinguish the weak keys with a single query using a weak plaintext. In the context of hash functions, fixed points in the similarity relations allow to find collisions in the compression function more efficiently than exhaustive search. One just has to evaluate the compression function on weak inputs, and the output would lie in the set of fixed point of θ. Note that the attack becomes more efficient when the number of fixed-points in θ becomes smaller. In Section 3.1 we show how to use the self-similarity property of Lesamnta to reduce its security in several usage scenarios. These attacks are based on the idea that one can randomly reach a fixed point of φ, and from there it is easy to reach any fixed point of θ. In this case, the attacks become more efficient when the number of fixed points of φ grows.
3
Application to Lesamnta
Our most interesting result is a self-similarity property of Lesamnta. This selfsimilarity property is based on swapping the two halves of the state and XOR-ing them with a constant. The most surprising part about this property is that it can actually deal with the round constants, which are supposed to break all symmetry relations.
352
C. Bouillaguet et al. Ki
Ki+1
Ki+2
Ki+3
Xi
Xi+1
Xi+3
Xi+3
Xi+3
Xi+4
Ri+3 ⊕ G ⊕
Ki+1
Ki+2
⊕ F ⊕
Ki+3
Ki+4
Xi+1
Xi+2
Fig. 1. The Round Function of Lesamnta. This Round Function is Iterated 32 Times.
A Short Description of Lesamnta. Lesamnta is a hash function proposal by Hirose, Kuwakado, and Yoshida as a candidate in the SHA-3 competition [8]. It is based on a 32-round unbalanced Feistel scheme with four registers used in MMO mode. The key schedule is also based on a similar Feistel scheme. The round function is described by Figure 1 and can be written as: Xi+4 = Xi ⊕ F (Xi+1 ⊕ Ki+3 ) Ki+4 = Ki ⊕ G (Ki+1 ⊕ Ri+3 ) where R0 , . . . , R31 are round constants, the state register X is initialized with the message in X−3 , X−2 , X−1 , X0 , and the key register is initialized with the chaining value in K−3 , K−2 , K−1 , K0 . The output of the compression function is X−3 ⊕ X29 , X−2 ⊕ X30 , X−1 ⊕ X31 , X0 ⊕ X32 . Lesamnta has two main variants: Lesamnta-256 with 64-bit registers (hence, a message block and chaining value of 256 bits), and Lesamnta-512 with 128-bit registers (hence, a message block and chaining value of 512 bits). The round functions F and G are inspired by the AES round function. F uses four round of transformations similar to SubBytes, ShiftRows and MixColumns, while G uses only one round of similar transforms. The transformations used in F and G are different, even though they are both heavily inspired by the AES. In Lesamnta-256 a 64-bit register is represented by a 2-by-4 byte matrix in F , and by a 4-by-2 matrix in G. In Lesamnta-512, a 128-bit register is seen as a 4-by-4 matrix. The round constants are defined by a simple counter: Ri = 2i+(2i+1)·232 for Lesamnta-256 and Ri = 2i + (2i + 1) · 264 for Lesamnta-512. For more details, we refer the reader to the full specification [8]. The Self-Similarity Relation of Lesamnta . The round functions F and G are very similar to the AES round function, and they have the same selfsimilarity property: if the two halves of the input are swapped, then the output is also swapped. Indeed, it is easy to see that SubBytes, ShiftRows and MixColumns do have this property. However, the key-schedule of Lesamnta includes round constants Ri to avoid symmetry-based attacks. Luckily, these constants are word-symmetric up to the least significant bit. More precisely, if Ri = (Ri Ri⊥ ), where the top and bottom parts of Ri are
Another Look at Complementation Properties
353
32-bit (64-bit in Lesamnta-512), then the only difference between Ri and Ri⊥ is in the least significant bit. Let us introduce a few notations. We say that two words x and y are “halfswapped” if the top half of x (denoted by x ) is the bottom half of y (denoted by y ⊥ ), and vice-versa. We denote the half-swapped value of x = (x x⊥ ), → i.e., (x⊥ x ), by ← x . We also formalize the structural property of the round constants: we define x to be (x⊥ ⊕ 1x ⊕ 1). Then, the following property of i . We extend these two relations to vectors of rounds constants holds: Ri = R , y ) if x = x and y = y . words in the natural way, namely, (x, y) = (x Our idea is to combine the swapping of halves of the state with flipping the least significant bit of each half to compensate the difference in the constants. The swapping commutes with the round functions F and G, and the masking is canceled by the Feistel structure. For this purpose, let us mention the following useful properties: ←−→ i) x ⊕ y = x ⊕ y → ii) x ⊕← y = x ⊕y i.e., by definiHence, consider a master keys K of 4 words, and let K = K, tion, K−3 = K−3 , . . . , K0 = K0 . The first property implies in particular that ←−−−−−→ K−2 ⊕ R0 = K−2 ⊕ R0 . Therefore, when computing K1 and K1 , the two values that enter G are half-swapped. This property goes through all the successive operations in G (SubWords, KeyLinear, and ByteTranspose3 ). The output of G is therefore half-swapped as well. Then, thanks to the second property, and =K thanks to the fact that K−3 −3 , we find that K1 = K1 . This argument can be iterated, and shows that if the master keys are related, then all the other i for all i). subkeys are related (we have Ki = K Now, it is easy to see that as the subkeys in the two concurrent hash processes are related and at the same time so are the “plaintexts”, then the same relation will be maintained through the rounds. Specifically, let the plaintexts be P and =X P = P . By definition again, we have X−3 −3 , . . . , X0 = X0 . The argument above repeats: we have X−2 = X−2 , K0 = K0 , and thanks to the first property, the input of F is half-swapped. This property goes trough F , and since X−3 = X , the second property grants us X = X . This argument can be iterated −3 1 1 i , for all i. again, and shows that Xi = X Lastly, there is a feed-forward operation: the output of the compression function is Y0 = X−3 ⊕ X29 , . . . , Y3 = X0 ⊕ X32 . Thanks to the first property again, ← → we find that Y = Y . This yields: ←−−−−−−→ M ) = CF (K, M ) CF (K, Self-Similarity of Lesamnta . Finally, we note that if we pick a chaining value h which is weak (i.e., h = h), and a message block m which is also weak (i.e., m = m), then CF (h, m) is weak as well, but in a different manner (we have 3
We note that in the submission document [8] the term “ByteTranspos” is used.
354
C. Bouillaguet et al.
←−−−−−→ CF (h, m) = CF (h, m)), and this can be easily identified (the top and bottom halves of each output word are the same). Hence, it is possible to distinguish the compression function of Lesamnta using one single query. For example, in Lesamnta-256 (00000000, 00000001), (00000000, 00000001), h=m= (00000000, 00000001), (00000000, 00000001) leads to
CF (h, m) =
and in Lesamnta-512
(52afa888, 52afa888), (61c0aebc, 61c0aebc), (1c9d4d3a, 1c9d4d3a), (95f45a98, 95f45a98)
⎞ (0000000000000000, 0000000000000001), ⎜ (0000000000000000, 0000000000000001), ⎟ ⎟ h=m=⎜ ⎝ (0000000000000000, 0000000000000001), ⎠ (0000000000000000, 0000000000000001)
leads to
⎛
⎛
⎞ (b0421baf4899c67e, b0421baf4899c67e), ⎜ (e6b528589fadd0ce, e6b528589fadd0ce), ⎟ ⎟ CF (h, m) = ⎜ ⎝ (3547c4021eb4c7ee, 3547c4021eb4c7ee), ⎠ (a8188b26052d044d, a8188b26052d044d)
Following our findings, the designers of Lesamnta decided to tweak the algorithm by changing the round constants. For the tweaked version we refer the reader to [19]. 3.1
Using These Properties on the Full Lesamnta
Faster Collisions in the Compression Function. It is possible to use the above property to find collisions in the compression function faster than exhaustive search. We pointed out that if we pick weak inputs, h = h and m = m, then each of the four output words has the same top and bottom halves. In other words, the output of the compression function is restricted to a subspace of size 2n/2 for Lesamnta-n. Hence, by taking 2n/4 pairs of (chaining values, message blocks) which are weak, we expect to find a collision in the output of the compression function. Second Preimage Attack on Weak Messages. The self-similarity property of the compression function induces a set of weak messages. In such messages, one of the chaining values hi (which is the output of the compression function) ← → encountered during the hash process is such that hi = hi , i.e., it is of the form hi = (S||S), (U ||U ), (W ||W ), (Y ||Y ). A random (r + 1)-block long message is in the weak set with probability r · 2−n/2 . We now discuss how to find a second
Another Look at Complementation Properties
355
preimage of a weak message in time 2n/2 and negligible memory. Finding a new message of (i − 1) blocks yielding the chaining value hi immediately yields a second preimage on the full hash function. To do so, an adversary may follow these steps: 1. Choose an arbitrary prefix of (i − 2) message blocks. 2. Find a message block which yields a weak chaining value hi−1 . Since there are 2n/2 such chaining values, it is expected that 2n/2 random trials are sufficient. 3. Find a weak message block m such that CF (hi−1 , m) = hi . On average, for each starting chaining value, there is one such message block, amongst 2n/2 possible choices. Thus, about 2n/2 random trials are expected to be necessary. 4. Concatenating all parts with the suffix of the original message starting from the i-th block yields a second preimage. The total complexity of the process is 2n/2 . Herding Attack on Lesamnta . The self-similarity property of the compression can be used to propose a better herding attack on Lesamnta than the one of [10]. In the herding attack, the adversary commits to a hash value H ∗ , and is given a challenge message M afterwards. His goal is to find a suffix S such that H(M S) = H ∗ . The generic herding attack on hash functions of n bits presented in [10] requires an offline time complexity of 2(n+)/2+2 online time complexity of 2n− and memory of 2+1 . We now present a customized herding attack using self-similarity for Lesamnta with complexity 2n/2 and negligible precomputation and memory. 1. Choose a weak chaining value h∗ , and choose an upper-bound L on the number of message blocks of the prefix M . Then, compute the finalization function on h∗ , assuming a message of L + 2 full blocks. This yields the committed value H ∗ (of a message of length L + 2 blocks). 2. Receive the challenge M , and append random suffixes S of L+1−|M | blocks, until hitting a weak chaining value h. It is expected that 2n/2 random trials are sufficient. 3. Try random weak message blocks S . We know that H(M SS ) is weak, ←−−−−−−−→ with H(M SS ) = H(M SS ). Therefore, we expect to reach h∗ after only 2n/2 trials. 4. Output M SS as the answer to the challenge. Distinguishing-H Attack on HMAC. The self-similarity property can also be used to distinguish HMAC-Lesamnta from HMAC instantiated with another PRF. The distinguisher follows the ideas of Wang et. al in [21], and the complexity would be 23n/4 . We note that this line of research may not be harmful, as distinguishing HMAC from PRF (for any hash function) has a complexity of 2n/2 . We also note that the claimed security level of HMAC-Lesamnta is only 2n/2 .
356
4
C. Bouillaguet et al.
Application to ESSENCE
Description of ESSENCE . ESSENCE is a hash function proposal by Martin as a candidate in the SHA-3 competition [16]. The design is based on two shift registers with 8 words each. One shift register is used to expand the message by generating subkeys, while the other processes the chaining value with the subkeys. Ri+8 = Ri ⊕ F (Ri+1 , Ri+2 , . . . , Ri+7 ) ⊕ L(Ri+7 ) ⊕ Ki Ki+8 = Ki ⊕ F (Ki+1 , Ki+2 , . . . , Ki+7 ) ⊕ L(Ki+7 ) The feedback function is designed with two functions: a non-linear bit-wise function F , and a linear function L that mixes the bits inside the words. The linear function is based on clocking an LFSR a fixed number of times. The chaining value is loaded into R−7 , R−6 , . . . , R0 , while the message block is loaded into K−7 , K−6 , . . . , K0 . After 32 rounds, the output is computed as R−7 ⊕ R25 , R−6 ⊕ R26 ,. . . , R0 ⊕ R32 . ESSENCE has two main variants: ESSENCE -256 with 32-bit words (hence a message block and a chaining value of 256 bits each), and ESSENCE -512 with 64-bit words (hence a message block and chaining value of 512 bits). The best known attack on ESSENCE is a collision attack with complexity 268 [18]. The design of ESSENCE uses no constants. Therefore, it is possible to build a distinguisher against ESSENCE using a slide attack [17]. The Self-Similarity Relation. Most of the components used in ESSENCE are bitwise, and the L is the only part responsible of mixing the bits of the words. Moreover, it is based on an LFSR, and LFSRs have a good behavior with regards to rotations. More precisely we have Pr(L(x≪1 ) = L(x)≪1 ) = 1/4 as shown in Figure 3. Therefore, we can build a new self-similarity attack based on rotating the message and the chaining value. This gives subkeys such that Ki = Ki≪1 as opposed to Ki = Ki+1 in the slide attack. More precisely, we consider the following relation: Ri = Ri≪1 Ki = Ki≪1 ≪1 Constructing a Good Pair. Let us rotate the master key: K−7 = K−7 , ≪1 ≪1 K−6 = K−6 , . . . , K0 = K0 . For each round, there is a probability 1/4 that
Ri
Ri+1 Ri+2 Ri+3 Ri+4 Ri+5 Ri+6 Ri+7
⊕
Ki
Ki+1 Ki+2 Ki+3 Ki+4 Ki+5 Ki+6 Ki+7
F
L
F
L
⊕
⊕
⊕
⊕
Fig. 2. ESSENCE Round Function (iterated 32 times)
Another Look at Complementation Properties x0 , x1
xn−1
=x
xn−1 , xn
x1
357
if x0 = xn x =
xn−1 , x0
x1
t − 1 step
t − 1 step
xt+n−1
xt
xt , xt+1
= F (x) F (x ) =
xt+1
xt+n−1 xt+n−1 , xt+n
if xt = xt+n Fig. 3. Symmetry relation on an LFSR-based function. L is a linear function defined by clocking an LFSR a fixed number of times. If the initial state of the LFSR is rotated by 1 place, there is a probability 2−2 that the output state is rotated as well.
the new subkey Ki is equal to Ki≪1 , because we only need the LSFR-based function to commute with a rotation of 1 bit. The full compression function uses the subkeys K−7 to K24 , so a random message K and its related message K will give related subkeys with probability 2−48 . To find a suitable chaining value we will use something similar to message modification techniques to get 8 rounds for free. We start from round 31 (one before the last round) and make a random choice of R24 , R25 , . . . , R31 such that each of these value satisfy L(Ri≪1 ) = L(Ri )≪1 . We use the related state ≪1 ≪1 ≪1 = R24 , R25 = R25 ,. . . , R31 = R31 . We first compute round 32 forward, R24 and R32 follows the relation with probability 1 because the non bitwise part is ≪1 L(R31 ) and R31 was chosen so that L(R31 ) = L(R31 )≪1 . Then we compute the remaining round backwards and check that the new values still satisfy Ri = Ri≪1 : Ri = Ri+8 ⊕ F (Ri+1 , Ri+2 , . . . , Ri+7 ) ⊕ L(Ri+7 ) ⊕ Ki . In rounds 23, 22, . . . , 17, the non-bitwise part is respectively L(R30 ), L(R29 ), . . . , L(R24 ) so they go through with probability 1. Then, we compute rounds 16 to −7, and each of cost a probability 1/4. In total we have a probability 2−48 that a random choice of R25 , R26 , . . . , R32 gives a correct chaining value. Hence, with complexity 248 , we can construct a pair of messages and chaining values such that: K = K ≪1
R = R≪1
G(K , R ) = G(K, R)≪1
It might be possible to use advanced message modifications to further improve this complexity.
358
5
C. Bouillaguet et al.
Application to PURE
PURE is a block cipher introduced by Jakobsen and Knudsen to demonstrate the interpolation attack [9]. It is designed as a block cipher with a very strong algebraic structure, and good resistance to differential and linear cryptanalysis. However, it is weak against algebraic attacks, which was the point of the article. Later, Buchmann et al. defined Flurry, which can be seen as a generalized version of PURE with a key schedule [6]. Again, the point of Flurry was to show that a block cipher can be secure against differential and linear cryptanalysis, but weak against algebraic attack (Gr¨obner basis techniques in the case of Flurry). PURE and Flurry are not to be used as real ciphers, but serve as demonstration that algebraic attacks can be useful against ciphers with a very strong algebraic structure. By applying a self-similarity attack to PURE, we intend to show that the algebraic structure of a block cipher can be used to mount a simple self-similarity attack that does not need complex polynomial computation (as opposed to the interpolation attack or Gr¨ obner basis techniques). Moreover, we build a distinguisher with only one query as well as a class of weak keys, both for any number of round, whereas the previous algebraic attacks could only break a limited number of rounds. Description of PURE . PURE is a simple Feistel cipher with a monomial S-Box. All the operations are carried in the finite field F2m ;4 the plaintext is composed of two field elements, and the key is given as r field elements, where r is the number of rounds: L0 = PL Li+1 = Ri
R0 = PR Ri+1 = Li ⊕ (Ri ⊕ Ki )3 .
There is no key schedule in PURE, the key is given as r field elements K0 , K1 , . . . , Kr−1 which are used as round subkeys. The Self-Similarity Relation. Because of the strong algebraic structure of PURE, it is natural to look for algebraic relations. In particular we can use the Frobenius mapping (x
→ x2 ) in the field F2m : it commutes with any monomial S-Box, and it is linear. It is straightforward to check that the following is a self-similarity relation for PURE: θ(z) = ψ(z) = φ(z) = z 2 More precisely, if Ki = Ki2 for all rounds and PL = PL2 , PR = PR2 , then we will have Li = L2i and Ri = Ri2 for all rounds. This gives a very efficient distinguisher for PURE in the related-key setting. However, we note that since PURE has no key-schedule, it is trivial to make a related-key attack. Nevertheless, a slight generalization of this initial observation leads to the discovery of a class of weak keys for PURE. 4
In the original description of PURE , m was in fact 32, which yields a 64-bit cipher.
Another Look at Complementation Properties
359
A Class of Weak Keys. Given some α ∈ F2m , and 0 < k < m, we now consider the following self-similarity relation: k
k
θ(z) = ψ(z) = φ(z) = z 2 ⊕ α ⊕ α2 Li
Again, it is easy to check that this is actually a self-similarity. If Ki = ψ(Ki ), = φ(Li ) and Ri = φ(Ri ), then we have: Ri+1 = Li ⊕ (Ri ⊕ Ki )3
3 k k 2k = L2i ⊕ α ⊕ α2 ⊕ (Ri ⊕ Ki ) 2k k 3 = Li ⊕ (Ri ⊕ Ki ) ⊕ α ⊕ α2 = φ(Ri+1 )
It must be noted that φ cannot be the identity, thus it can actually be used to distinguish PURE used with a weak key from a random function. The weak k keys are the fixed points of ψ. Now, if k divides m, then x
→ x2 admits all the subfield F2k as fixed points, which means that when Ki = α + xi , with xi ∈ F2k , then the key is weak. It is possible to check this with only one query: just encrypt (α, α), and test the ciphertext for self-similarity. Testing all the possible α’s requires 2m queries. With k = m/2, this yields 2m+rm/2 weak keys out of 2mr . Bad Key-Schedules. A consequence of the previous observation is that there many bad key-schedules for PURE. For example, let us consider the following key schedule based on a Feistel scheme with the same monomial S-Box: K0 = KL , K1 = KR Ki+2 = Ki ⊕ (Ki+1 ⊕ Ci )3
(if i < r − 1)
The Ci ’s are round constants. Their purpose is to break the regularity and avoid slide attacks. However, we now know that if the round constants can be written Ci = α + xi , with xi in a subfield, then weak keys exist for the full scheme. For example, Ci = α and Cr−1 = α + 1 avoids slide properties, but is still a very bad choice since it exhibit weak keys.
6
Application to XTEA
XTEA is a block cipher designed by Needham and Wheeler. It is a Feistel network that encrypts 64-bit plaintexts with a a 128-bit key. The round function is pretty simple, and the security relies more on the high number of times it is iterated: 64 rounds are the recommended setting.
360
C. Bouillaguet et al.
Description of XTEA. The Feistel structure of XTEA can be described as: Li RKi
Ri L0 = PL
4 ⊕ ⊕ 5
Li+1
R0 = PR Li+1 = Ri Ri+1
Ri+1 = Li (F (Ri ) ⊕ RK i ) F (x) = ((x 4) ⊕ (x 5)) x
The key schedule is rather simple: the master key K is made of four 32-bit words, and the 32-bit round key RKi is generated from the master key: RK2i = (i · δ) K((i·δ)11) mod 4 RK2i+1 = ((i + 1) · δ) K((i+1)·δ) mod 4 where δ = 0x9E3779B9 is a constant derived from the golden ratio. Currently, the best known attack on XTEA is the one from [15], which can break up to 36 rounds of XTEA in the related-key model (with four keys) using 265 chosen plaintexts and 2126.44 time (or 264 chosen plaintexts and 2104 time for a weak-key class of 2111 weak keys). The Self-Similarity Relation. The idea of our attack on XTEA is inspired by the complementation property of the DES. In a complementation property, the difference in the plaintext and the difference in the key cancel each other before entering the F function. Such attacks are possible if the key schedule allows a fixed difference in all the subkeys. The key schedule of XTEA is weak enough to do so, but the key is added after the F function, to prevent the complementation property. However, if there is a good differential characteristic α → β in the F function, we can put a difference α in the plaintext, and a difference β in the key to cancel it after the F function. This gives an iterative related-key differential characteristic. Li ⊕ M ⊕
Li+1 ⊕ M
Ri ⊕ M Ki ⊕ M F
⊕
Li ⊕ α Ki ⊕ β
Ri+1 ⊕ M
Complementation property
Li+1 ⊕ α
⊕
Ri ⊕ α F
Ri+1 ⊕ α
RK iterative differential on XTEA
Another Look at Complementation Properties
361
More precisely, in the case of XTEA, we use the following weak self-similarity: ψ(Ki ) = Ki 231 226 φ(Li , Ri ) = θ(Li , Ri ) = Li 231 , Ri 231 Because of the key-schedule, we have that if K = ψ(K), then RKi = ψ(RKi ). We also use the fact that: F (x) 231 226 or F (x 231 ) = F (x) 231 226 Therefore, the differences will cancel out (and the self-similarity will hold for the round function) as long as the carries in F (x) 231 ± 226 are the same as the carries in Ki 231 226 . For a given key K, we can compute the carries in each round, and deduce the probability of the differential characteristic (which will be the probability that the self-similarity relation holds). For example if there is no carry in the key, then there is a probability 1/2 that there will also be no carry in the F function. If there is a one-bit carry in the key, there is a probability 1/4 that there will be exactly a one-bit carry in the F function. The probability of the differential path is therefore quite dependent on the key. However, we know exactly the subsets of the key space that correspond to a given set of carries. Therefore, we can count the number of keys giving a specific probability for the self-similarity relation (Figure 4 shows the number of keys yielding a specific number of carries). We discuss two possible attack scenarios: attacking all the keys, and attacking a class of weak keys. Note that due to the irregular key schedule, we may obtain better attacks by selecting a subset of the rounds that do not start at round 0. Attacking All the Keys. If we consider rounds 20 to 50, the self-similarity relation holds with probability greater than 2−60 for about half of the keys. If we consider 262 message pairs, there is a good probability to have a right pair. If so, we can use it to recover the key, with 5 extra rounds by guessing K2 and K3 (which are the two keys words affecting the subkeys of these rounds). Now assume that the self-similarity holds with probability greater than 2−60 for a given key, then we expect at least 4 right pairs. A wrong subkey guess is expected to lead to about 2−2 = 1/4 pairs which allegedly satisfy the selfsimilarity property. Hence, if we consider only key proposals for K2 and K3 which suggest two or more pairs, we expect to deal with 264 · 2−5.23 < 259 remaining wrong subkeys (while the probability of discarding the wrong value is only 9.2%). Hence, exhaustive search over the remaining possibilities would take less time than required for the partial decryption. Moreover, it is possible to discard wrong guesses for K2 and K3 , as all the pairs must offer the same number of carries in round 20 and round 50. If this is not the case, then the subkey guess is necessarily wrong. Finally, we note that if the above attack fails, the adversary learns with very good probability that the key is in the subset with more than 60 carries, which
362
C. Bouillaguet et al.
125 120 115 110 105 100 95 90 30
40
50
60
70
80
90
100
Fig. 4. Distribution of the number of carries in rounds 20–50 of XTEA. log2 of the number of keys with a given number of carries. 48% of the keys have less than 60 carries, and 52% have less than 61 carries.
reduces the search space by 1/2. This gives an attack on 36 rounds (20 to 55), with data complexity 263 , and time complexity 2125 on average. This attack can be extended to more rounds if we allow for a smaller set of weak keys. Attacking a Class of Weak Keys. If we consider rounds 10 to 55, we have a weak key class with only 60 carries, which contains 2107.5 keys out of 2128 . We can attack a key in this weak key class using 262 message pairs. Once a good pair exist, we can use it to recover the key, with 4 extra rounds by guessing K2 and K3 . This gives an attack on 50 rounds (10 to 59), with data complexity of 263 chosen plaintexts, and time complexity 2123 on average, for a weak key class of size 2107.5 . A similar attack can be applied to XXTEA by changing ψ(Ki ) = Ki 226 28 2 . However, as the difference may cause more complex carry chains whose probabilities are lower, this attack can be applied only to a significantly reduced version.
7
Conclusion
This work shows one more time that symmetry in the building blocks of a cryptographic primitive can be dangerous. We have shown new ways to build symmetry relations based on rotations or algebraic expressions. We have also described new attacks based on these relations when there exists key and/or plaintexts which are fixed under the similarity relations. The common way to avoid such symmetries is to include round constants (either in the key schedule or in the actual rounds). However, our results on the hash function Lesamnta show that if the
Another Look at Complementation Properties
363
constants suggest some self-similarity property, it may interact with other components to suggest a self-similarity property of the entire primitive. Hence, we conclude that some round constants can be weaker than others, and that highly structured round constants should be avoided.
Acknowledgements We would like to thank the Lesamnta team, and especially to Hirotaka Yoshida for the fruitful discussions. Part of this work is supported by the European Commission through ECRYPT, and by the French government through the Saphir RNRT project.
References 1. Aumasson, J.P., Bjørstad, T.E., Meier, W., Mendel, F.: Observation on the PREMIXING step of CHI-256 and CHI-224. Official Comment (2009) 2. Barkan, E., Biham, E.: In How Many Ways Can You Write Rijndael? In: Zheng, Y. (ed.) ASIACRYPT 2002. LNCS, vol. 2501, pp. 160–175. Springer, Heidelberg (2002) 3. Biham, E.: New Types of Cryptoanalytic Attacks Using related Keys (Extended Abstract). In: Helleseth, T. (ed.) EUROCRYPT 1993. LNCS, vol. 765, pp. 398–409. Springer, Heidelberg (1994) 4. Biham, E.: New Types of Cryptanalytic Attacks Using Related Keys. J. Cryptology 7(4), 229–246 (1994) 5. Biryukov, A., Wagner, D.: Slide Attacks. In: Knudsen, L.R. (ed.) FSE 1999. LNCS, vol. 1636, pp. 245–259. Springer, Heidelberg (1999) 6. Buchmann, J., Pyshkin, A., Weinmann, R.P.: Block Ciphers Sensitive to Gr¨obner Basis Attacks. In: Pointcheval, D. (ed.) CT-RSA 2006. LNCS, vol. 3860, pp. 313– 331. Springer, Heidelberg (2006) 7. Gorski, M., Lucks, S., Peyrin, T.: Slide Attacks on a Class of Hash Functions. In: Pieprzyk, J. (ed.) ASIACRYPT 2008. LNCS, vol. 5350, pp. 143–160. Springer, Heidelberg (2008) 8. Hirose, S., Kuwakado, H., Yoshida, H.: SHA-3 Proposal: Lesamnta. Submission to NIST (2008) 9. Jakobsen, T., Knudsen, L.R.: The Interpolation Attack on Block Ciphers. In: Biham, E. (ed.) FSE 1997. LNCS, vol. 1267, pp. 28–40. Springer, Heidelberg (1997) 10. Kelsey, J., Kohno, T.: Herding Hash Functions and the Nostradamus Attack. In: Vaudenay, S. (ed.) EUROCRYPT 2006. LNCS, vol. 4004, pp. 183–200. Springer, Heidelberg (2006) 11. Kelsey, J., Schneier, B., Wagner, D.: Key-Schedule Cryptoanalysis of IDEA, GDES, GOST, SAFER, and Triple-DES. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp. 237–251. Springer, Heidelberg (1996) 12. Knudsen, L.R.: Cryptanalysis of LOKI91. In: Zheng, Y., Seberry, J. (eds.) AUSCRYPT 1992. LNCS, vol. 718, pp. 196–208. Springer, Heidelberg (1993) 13. Kwan, M., Pieprzyk, J.: A General Purpose Technique for Locating Key Scheduling Weakness in DES-like Cryptosystems (Extended Abstract). In: Matsumoto, T., Imai, H., Rivest, R.L. (eds.) ASIACRYPT 1991. LNCS, vol. 739, pp. 237–246. Springer, Heidelberg (1993)
364
C. Bouillaguet et al.
14. Le, T.V., Sparr, R., Wernsdorf, R., Desmedt, Y.: Complementation-Like and Cyclic Properties of AES Round Functions. In: Dobbertin, H., Rijmen, V., Sowa, A. (eds.) AES 2005. LNCS, vol. 3373, pp. 128–141. Springer, Heidelberg (2005) 15. Lu, J.: Related-key rectangle attack on 36 rounds of the XTEA block cipher. Int. J. Inf. Sec. 8(1), 1–11 (2009) 16. Martin, J.W.: ESSENCE: A Candidate Hashing Algorithm for the NIST Competition. Submission to NIST (2008) 17. Mouha, N., Thomsen, S.S., Turan, M.S.: Observations of non-randomness in the ESSENCE compression function. Available online (2009) 18. Naya-Plasencia, M., R¨ ock, A., Aumasson, J.P., Laigle-Chapuy, Y., Leurent, G., Meier, W., Peyrin, T.: Cryptanalysis of ESSENCE. Cryptology ePrint Archive, Report 2009/302 (2009), http://eprint.iacr.org/ 19. Hirose, S., Kuwakado, H., Yoshida, H.: Security Analysis of the Compression Function of Lesamnta and its Impact. Available online (2009) 20. Steil, M.: 17 Mistakes Microsoft Made in the Xbox Security System. In: CCC22 (2005) 21. Wang, X., Yu, H., Wang, W., Zhang, H., Zhan, T.: Cryptanalysis on HMAC/NMAC-MD5 and MD5-MAC. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 121–133. Springer, Heidelberg (2009)
Super-Sbox Cryptanalysis: Improved Attacks for AES-Like Permutations Henri Gilbert1 and Thomas Peyrin2 1
Orange Labs, France
[email protected] 2 Ingenico, France
[email protected]
Abstract. In this paper, we improve the recent rebound and start-fromthe-middle attacks on AES-like permutations. Our new cryptanalysis technique uses the fact that one can view two rounds of such permutations as a layer of big Sboxes preceded and followed by simple affine transformations. The big Sboxes encountered in this alternative representation are named Super-Sboxes. We apply this method to two second-round SHA-3 candidates Grøstl and ECHO, and obtain improvements over the previous cryptanalysis results for these two schemes. Moreover, we improve the best distinguisher for the AES block cipher in the known-key setting, reaching 8 rounds for the 128-bit version. Keywords: hash function, cryptanalysis, AES, Grøstl and ECHO.
1
Introduction
Hash functions are among the most important and widely spread primitives in cryptography. Informally a hash function H is a function that takes an arbitrarily long message as input and outputs a fixed-length hash value of size n bits. The classical security requirements for such a function are collision resistance and (second)-preimage resistance. Namely, it should be impossible for an adversary to find a collision (two different messages that lead to the same hash value) in less than 2n/2 hash computations, or a (second)-preimage (a message hashing to a given challenge) in less than 2n hash computations. Recently, most of the standardized hash functions [29, 35] have suffered from major improvements in hash function cryptanalysis [38, 39]. As a response, the NIST organized the SHA-3 competition [31] and 51 candidates were accepted to the first round. In July 2009, 14 of them have been selected to the second round. Among them, several hash proposals like Grøstl [14] or ECHO [2] use parts of the standardized block cipher AES [9, 30] as internal primitives or mimick the structure of this cipher. The separation between block ciphers and hash functions has always been blurry as many constructions [6, 34] are known that turn the former into the latter. For example, the Davies-Meyer mode converts a secure block cipher into a secure compression function and is incorporated in a large majority of the currently known hash functions. A major difference between the cryptanalysis S. Hong and T. Iwata (Eds.): FSE 2010, LNCS 6147, pp. 365–383, 2010. c International Association for Cryptologic Research 2010
366
H. Gilbert and T. Peyrin
A major difference between the cryptanalysis of block ciphers and hash functions is that the attacker can fully control the inner behavior of a hash function. In other words, the attacker can make a more efficient use of the freedom degrees available on the input (i.e., the number of independent binary variables he has to determine). A new security model for block ciphers, the so-called known-key model [21], was recently proposed in order to fill the gap between these two situations. In this model, the secret key is known to the adversary and his goal is to distinguish the behavior of a random instance of the block cipher from that of a random permutation by constructing a set of (plaintext, ciphertext) pairs satisfying an evasive property. Such a property is easy to check but impossible to achieve with the same complexity and a non-negligible probability using oracle accesses to a random permutation and its inverse. In particular, reduced versions of the AES have been studied in this setting [21, 28].¹ An even more demanding requirement for block ciphers, also introduced to fill the gap between block ciphers and hash functions, is to behave as an ideal cipher, i.e., a family of independent random permutations indexed by the key space, even when the key values can be chosen by an adversary. It has recently been shown that the full AES-256 does not behave as an ideal cipher, due to the existence of a so-called chosen-key distinguisher [4].

Cryptanalysis of AES-based hash functions began with the hash family proposal Grindahl [20], for which collision attacks have been found [19, 32]. This showed that truncated differentials [22] are useful when cryptanalyzing a byte-oriented primitive such as the AES. Later on, the rebound attack [27] was shown to lead to substantial efficiency improvements in the attacker's use of the freedom degrees [23, 25, 40]. The idea is to build a differential path and use the available freedom degrees in the "most expensive" part of the path. The "cheaper" parts are then covered in an inside-out manner in both the forward and backward directions. More recently, improved variants of the initial rebound attack, such as the so-called "start-from-the-middle" attack, were also introduced [26].

Our contribution. In this paper, we further improve the rebound and start-from-the-middle attacks for AES-like permutations. The idea is to view two consecutive rounds of an AES-like permutation as the application of a so-called Super-Sbox [10, 11, 16]: this allows a more efficient use of the freedom degrees.² Instead of dealing with the classical 8-bit AES Sboxes, one considers 32-bit Sboxes, each composed of two AES Sbox layers surrounding one MixColumns and one AddConstant function. The resulting attack method, which we propose to name Super-Sbox cryptanalysis, is not only simpler than the previous ones, it also yields better cryptanalytic performance, since we are now able to find deviations from the behavior of a random permutation for up to 8 rounds. We also provide an analysis of the freedom degrees available during the attack. We apply this technique to (1) AES, (2) the internal permutations P and Q of Grøstl, and (3) the internal permutation PE of ECHO. The link between the compression functions of Grøstl and ECHO and the underlying internal permutations is illustrated in Figure 1. Our results for AES and the internal permutations of Grøstl and ECHO are summarized in the first part of Table 1. We obtain an 8-round distinguisher in the known-key model for all versions of the AES block cipher. This is the first published "attack" against the 8-round version of AES-128, for which the full number of rounds is 10. We also present a distinguisher for the 8-round reduced internal permutations of Grøstl-256 (the full number of rounds is 10) and for the full number of rounds of the internal permutation of ECHO-256 (8 rounds). In the case of Grøstl-256, our distinguishers for round-reduced versions of the internal permutation can be immediately converted into distinguishers for reduced versions of the compression function with the same number of rounds, or even semi-free-start collisions in some cases.³ We outline in the second part of Table 1 the results obtained for reduced versions of the Grøstl-256 compression function.

Fig. 1. The compression functions of Grøstl, ECHO-256, and ECHO-512 illustrated from the top down

¹ It was noticed in [7] that for any cipher whose key space is smaller than the plaintext space, it is possible to construct a (plaintext, ciphertext) pair satisfying an evasive relation by encrypting the key under itself. However, known-key distinguishers such as those of [21, 28] distinguish round-reduced versions of AES from a random permutation in a less contrived manner than such a generic evasive relation.
² Note that a similar technique, independently discovered by Lamberger et al. [23], has been applied by these authors to the Whirlpool hash function. However, this method uses the incoming round subkeys as additional freedom degrees and would not work as is for fixed-key AES-like permutations such as the ones studied in this article (for ECHO or Grøstl). Moreover, the known-key distinguishers would not apply anymore for AES, as the key would have to be chosen by the attacker.
³ Similar results were presented at CT-RSA 2010 [13] while the final version of this paper was in preparation.
In the case of ECHO and its reduced versions, our distinguishers for the internal permutation cannot be converted into distinguishers for the compression function due to the extra protection provided by the final shrinking stage of the compression function, namely its convolution effect on the output distribution of the permutation.

Table 1. The first table gives a summary of results for AES and the internal permutations used in Grøstl-256 and ECHO. The second table shows the results for the compression functions of Grøstl-256 and ECHO. Some structural observations [1, 18] for Grøstl have not been included in the tables.

target                     | rounds | computational complexity | memory requirements | type                      | source
AES                        | 7      | 2^{24}                   | 2^{16}              | known-key distinguisher   | see [26]
AES                        | 8      | 2^{48}                   | 2^{32}              | known-key distinguisher   | this paper
Grøstl-256 permutation     | 7      | 2^{55}                   | –                   | distinguisher             | see [26]
Grøstl-256 permutation     | 8      | 2^{112}                  | 2^{64}              | distinguisher             | this paper
ECHO internal permutation  | 7      | 2^{384}                  | 2^{64}              | distinguisher             | see [26]
ECHO internal permutation  | 8      | 2^{768}                  | 2^{512}             | distinguisher             | this paper

target                     | rounds | computational complexity | memory requirements | type                      | source
Grøstl-256 comp. function  | 6      | 2^{120}                  | 2^{64}              | semi-free-start collision | see [27]
Grøstl-256 comp. function  | 6      | 2^{64}                   | 2^{64}              | semi-free-start collision | see [26]
Grøstl-256 comp. function  | 7      | 2^{120}                  | 2^{64}              | semi-free-start collision | this paper
Grøstl-256 comp. function  | 7      | 2^{55}                   | –                   | distinguisher             | see [26]
Grøstl-256 comp. function  | 8      | 2^{112}                  | 2^{64}              | distinguisher             | this paper
ECHO comp. function        | none   | –                        | –                   | none                      | none

2   Description of the Analyzed Schemes
We give in this section a generic description of an AES-like permutation and we then provide the parameters in this generic model for AES, Grøstl and ECHO. We refer to the corresponding specifications [2, 9, 14, 30] for a detailed description of these schemes. A generic n-bit AES-like permutation has an internal state that can be viewed as a square matrix of c-bit cells with r columns and r rows. A cell will be denoted by C_{i,j}, where i is its row position and j its column position in the matrix, counting from 0. The permutation is composed of R rounds and each round has four layers. The first layer (AddConstant or AC) is a constant/key addition function. More precisely, for each cell of the internal state, we XOR a c-bit constant. The second layer (SubBytes or SB) is a non-linear function defined by the application of an Sbox S: for each cell C_{i,j} of the internal state, we compute C'_{i,j} = S[C_{i,j}]. The third layer (ShiftRows or ShR) permutes the position of each cell within its own row: for each cell C_{i,j} of the internal state, we compute C'_{i,j} = C_{i,Sub_i(j)}, where Sub_i(j) is parametrized by the row i.
Finally, the last layer (MixColumns or MC) is a linear function that mixes all the columns of the internal state separately. The round function on an internal state C can thus be defined as MixColumns ◦ ShiftRows ◦ SubBytes ◦ AddConstant(C).
Fig. 2. The generic AES-like permutation
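To make the generic description above concrete, the following minimal Python sketch applies one round to an r × r state of c-bit cells. It is only an illustration: the S-box, the round constants and the column mixing are placeholders, and in particular the toy mixing below is not the MDS MixColumns of AES or Grøstl; only the layer structure AC → SB → ShR → MC follows the text.

```python
# Minimal sketch of one generic AES-like round (AddConstant, SubBytes,
# ShiftRows, MixColumns) on an r x r state of c-bit cells.
import random

r, c = 4, 8                        # state geometry: r x r cells of c bits
rng = random.Random(1)

SBOX = list(range(1 << c))
rng.shuffle(SBOX)                  # a random bijective c-bit S-box (placeholder)
CONST = [[rng.randrange(1 << c) for _ in range(r)] for _ in range(r)]

def add_constant(state):
    # XOR a c-bit constant into every cell
    return [[state[i][j] ^ CONST[i][j] for j in range(r)] for i in range(r)]

def sub_bytes(state):
    # apply the S-box to every cell: C'_{i,j} = S[C_{i,j}]
    return [[SBOX[state[i][j]] for j in range(r)] for i in range(r)]

def shift_rows(state):
    # C'_{i,j} = C_{i, Sub_i(j)} with Sub_i(j) = (j - i) mod r, as in the text
    return [[state[i][(j - i) % r] for j in range(r)] for i in range(r)]

def mix_columns(state):
    # mix each column independently; a toy linear map, NOT the real MDS matrix
    out = [[0] * r for _ in range(r)]
    for j in range(r):
        col = [state[i][j] for i in range(r)]
        for i in range(r):
            out[i][j] = col[i] ^ col[(i + 1) % r]
    return out

def round_function(state):
    # MixColumns o ShiftRows o SubBytes o AddConstant, as defined above
    return mix_columns(shift_rows(sub_bytes(add_constant(state))))

state = [[rng.randrange(1 << c) for _ in range(r)] for _ in range(r)]
print(round_function(state))
```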
Even though the MixColumns functions used in AES, Grøstl, and ECHO are distinct (we do not provide their detailed specification here), we can stick to this generic description since we are only interested in the differential properties of this layer, which remain essentially the same.

Extended known-key model for the considered schemes. In what follows, we will introduce distinguishers for permutations and compression functions based upon specific examples of the above generic AES-like permutation. We therefore have to explain what we mean by "distinguisher" in this context. Our aim is to capture structural properties that distinguish the behavior of permutations or functions belonging to a family of permutations (resp. functions) indexed by a parameter from that of a random permutation (resp. a random function). In the case of permutations, these parameters can be viewed as a known key and our notion of distinguisher entirely coincides with the notion of known-key distinguisher. In order to also cover the case of compression functions, we introduce a natural extension of the notion of known-key distinguisher to a family F = {f_i} of functions indexed by a parameter i ∈ I. We (informally) define a distinguisher for F as a procedure allowing an adversary, when given a randomly drawn parameter value i of a function of F, to construct a tuple of (input, output) pairs for f_i satisfying (with a non-negligible probability) an evasive property independent of i. An evasive property means in this context a property impossible to achieve with the same complexity and a non-negligible probability using oracle accesses to a random function.⁴
⁴ Considering a family of functions rather than one single function allows us to express a structural property common to many individual functions. Moreover, it avoids the considerable difficulties one would encounter in the case of one single function f when expressing the requirement that the evasive property used to distinguish f must be "independent" of f, so as to exclude for instance the evasive property trivially provided by each (input, output) pair of f; in addition, if the size of I is larger than the input size of F, considering a family of functions instead of one single function allows us to avoid the already mentioned paradox of [7].
We propose to view the permutations and compression functions based upon the generic AES construction as families of permutations (resp. functions) indexed by the parameter set I = C × SB, equipped with the uniform probability distribution. For a given family, C is defined as the set of possible values for the constants involved in the various AddConstant layers, and SB represents the set of tuples of r^2 R permutations involved in the various SubBytes layers (R represents, as before, the number of rounds of the considered instances of the generic AES construction, and r the number of rows and columns of the matrices representing their states). This allows us to define structural distinguishers for these permutations and these compression functions, using the extended known-key model introduced above.

2.1   AES
Following our generic description, AES [30] is an n = 128-bit block cipher that can handle 128-, 192- or 256-bit keys, and these variants have 10, 12 and 14 rounds, respectively. The internal state is viewed as a 4 × 4 matrix of bytes and SB is an 8-bit Sbox. The ShiftRows transformation is simply defined by Sub_i(j) = (j − i) mod 4. Finally, we note that in AES the MixColumns transformation of the last round is not applied and that the last round includes an extra AddConstant transformation. Since we analyze AES in the known-key attacker model, the key schedule and the key additions can be replaced by the AddConstant function. In the known-key distinguishers for R-round versions of AES considered in the sequel, the set C consists of all (R+1)-tuples of 128-bit constants equipped with the uniform law, and the set SB can be either defined as the singleton containing the actual SubBytes layers or as the set of all 16R-tuples of bijections over {0, 1}^8. Our distinguishers are equally applicable in both settings, and can be immediately converted into known-key distinguishers for R-round versions of AES-128, AES-192, and AES-256.

2.2   Grøstl
Grøstl [14] is a double-pipe hash function whose compression function is built upon two AES-like permutations P and Q (that only differ by the constants used during the AddConstant layer). In the case of Grøstl-256, the internal state of those permutations can be viewed as an 8 × 8 matrix of bytes and their number of rounds is 10. The ShiftRows transformation is defined by Sub_i(j) = (j − i) mod 8. In the case of Grøstl-512, the internal state of those permutations can be viewed as an 8 × 16 matrix of bytes, thus not fitting in our generic model. Finally, as already shown in Figure 1, the compression function takes a message input M and a chaining variable input CV and outputs a new chaining variable CV' with CV' = P(CV ⊕ M) ⊕ Q(M) ⊕ CV. In the subsequent analysis of the security of R-round versions of the Grøstl-256 permutations and the associated compression function, we either define the
set C as the singleton containing the actual constants used in the AddConstant layers or as the set of all possible R-tuples of 512-bit constants (this will not make any difference for our distinguishers), and we define SB as the set of all 64·R-tuples of bijections over {0, 1}^8.
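As a small illustration of the compression mode just recalled, the sketch below evaluates CV' = P(CV ⊕ M) ⊕ Q(M) ⊕ CV with 512-bit words modelled as Python integers; the stand-in permutations toyP and toyQ are hypothetical placeholders, not the real P and Q.

```python
# Sketch of the Grøstl-256 compression structure CV' = P(CV ^ M) ^ Q(M) ^ CV,
# with 512-bit words represented as Python integers. P and Q are abstract
# placeholders here; the real permutations are the 10-round AES-like ones.
WIDTH = 512
MASK = (1 << WIDTH) - 1

def compress(cv, m, P, Q):
    return (P((cv ^ m) & MASK) ^ Q(m & MASK) ^ cv) & MASK

# Toy usage with hypothetical stand-in bijections (NOT the real P and Q).
toyP = lambda x: (x * 0x9E3779B97F4A7C15 + 1) & MASK
toyQ = lambda x: (x * 0xC2B2AE3D27D4EB4F + 7) & MASK
print(hex(compress(0x1234, 0x5678, toyP, toyQ)))
```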
2.3   ECHO
ECHO [2] is also a double-pipe hash function. It uses a compression function built upon a 2048-bit AES-like permutation whose i-round version is denoted by PE^i. The internal state of this permutation can be viewed as a 4 × 4 matrix of 128-bit words. The Sbox layer on a 128-bit cell is composed of two AES rounds with a fixed key. The AddConstant layer is not present (or, equivalently, is present with constant values equal to zero) and, in order to avoid trivial vulnerabilities that would result from an entirely symmetric round function, each Sbox in ECHO is made distinct thanks to different key additions in each invocation of the 2-round AES. As for the AES, the ShiftRows transformation is simply defined by Sub_i(j) = (j − i) mod 4. In the case of the ECHO-256 compression function, 8 rounds of the permutation are applied and a shrinking transformation is performed after the final feedforward. This transformation (denoted here by shrink_256) consists in XORing all four 512-bit columns together. Finally, as already shown in Figure 1, the compression function takes a message input M and a chaining variable input CV and outputs a new chaining variable CV' with CV' = shrink_256(PE^8(CV || M) ⊕ (CV || M)). In the case of the ECHO-512 compression function, 10 rounds of the permutation are applied and a shrinking transformation is applied after the final feedforward. This transformation (denoted by shrink_512) consists in XORing the first two and the last two 512-bit columns together: CV' = shrink_512(PE^10(CV || M) ⊕ (CV || M)). Since ECHO is a nested design of AES-like permutations, we will use the prefix "BIG" when referring to one of the three layers of the 2048-bit permutation. When not using this prefix, we will refer to the layers of the 2-round AES permutation inside the BIG-Sbox of ECHO. In the subsequent analysis of the security of R-round versions of the ECHO permutation, we define the set C as the singleton containing the R-tuple of AddConstant layers associated with the constant zero, and SB as the set of all 16·R-tuples of bijections over {0, 1}^{128}.
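The shrinking stages described above are plain column-wise XORs of the 2048-bit output; the sketch below models the four 512-bit columns as Python integers. How CV || M is laid out over the columns is simplified here, and toyPE8 is a hypothetical stand-in for PE^8.

```python
# shrink_256 XORs the four 512-bit columns of the 2048-bit state together;
# shrink_512 XORs the first two and the last two columns, respectively.
# Columns are modelled as 512-bit Python integers.
def shrink256(cols):
    assert len(cols) == 4
    return cols[0] ^ cols[1] ^ cols[2] ^ cols[3]       # 512-bit output

def shrink512(cols):
    assert len(cols) == 4
    return (cols[0] ^ cols[1], cols[2] ^ cols[3])       # 2 x 512-bit output

def echo256_compress(cv, m, PE8):
    # CV' = shrink_256(PE^8(CV || M) xor (CV || M)); cv is a 512-bit word and
    # m three 512-bit words here, a simplified view of the 2048-bit input.
    inp = [cv] + list(m)
    out = PE8(inp)                                       # abstract permutation
    return shrink256([o ^ i for o, i in zip(out, inp)])

# Toy usage with a hypothetical stand-in for PE^8 (NOT the real permutation).
toyPE8 = lambda cols: [((x << 1) | (x >> 511)) & ((1 << 512) - 1) for x in cols]
print(hex(echo256_compress(3, (5, 7, 11), toyPE8)))
```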
3   The Super-Sbox Cryptanalysis Technique
In this section, we introduce the Super-Sbox view for two rounds of an AES-like permutation [9, 10, 11, 16]. Based on this observation, we describe a new cryptanalysis technique in the generic framework of Section 2, and we will apply it to the specific cases of AES, Grøstl and ECHO in the following section.
3.1   The Generic Differential Paths
In the following attacks, we will consider two distinct generic truncated differential paths for AES-like permutations. The only difference considered between two words A and A' is the XOR difference, that is δ = A ⊕ A'. The first path is 7 rounds long and the second one is 8 rounds long. Both are depicted in Figure 3. A white cell denotes a c-bit word without difference (inactive word) and a dark cell represents a truncated c-bit difference (active word), that is, a non-zero difference whose actual value is not considered by the attacker.
Fig. 3. 7-round and 8-round differential paths for AES-like permutations
When dealing with truncated differentials for AES-like permutations, one can easily check that only the MixColumns transformations do not behave deterministically. Indeed, while AddConstant has no effect on the difference of a cell and ShiftRows just permutes the array of c-bit differences, the SubBytes transformation will impact the value of a difference, but it will not affect the truncated difference. The matrix multiplication underlying the MixColumns transformation has the interesting property of being a Maximum-Distance Separable (MDS) mapping: the number of active input and output cells is always greater than or equal to r + 1 (unless there is no active input or output cell at all). The probability of a truncated differential transition through the restriction of the MixColumns transformation to one column that meets the MDS constraints is determined by the number of active cells in the output column. More precisely, if such a differential transition contains k > 0 active cells in the output column, its probability of success is closely approximated by 2^{-c(r-k)}. For example, a 4 → 1 transition for one column of the AES MixColumns layer has a success probability of approximately 2^{-24}. Note that the same reasoning applies when dealing with the inverse function of the MixColumns layer as well.
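These transition probabilities are used repeatedly in the sequel; the small helper below evaluates the approximation 2^{-c(r-k)} (in log2) and checks it against the 4 → 1 AES example above and the 8 → 1 Grøstl-256 case used later. It is a sketch of the formula only, not a simulation of MixColumns itself.

```python
# log2 probability of a truncated-differential transition through one column of
# MixColumns, with a active input cells and k active output cells, following
# the approximation 2^(-c(r-k)) given in the text. Transitions violating the
# MDS bound a + k >= r + 1 (with a, k not both zero) are impossible.
def log2_transition_prob(a, k, r, c):
    if a == 0 and k == 0:
        return 0.0                    # no difference at all: always holds
    if a + k < r + 1:
        return float("-inf")          # forbidden by the MDS property
    return -c * (r - k)

# AES column (r = 4, c = 8): a 4 -> 1 transition succeeds with prob ~2^-24.
assert log2_transition_prob(4, 1, r=4, c=8) == -24
# Grøstl-256 column (r = 8, c = 8): an 8 -> 1 transition succeeds with prob ~2^-56.
assert log2_transition_prob(8, 1, r=8, c=8) == -56
print("truncated MixColumns transition probabilities check out")
```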
3.2   Previous Start-from-the-Middle Attacks
By observing the two previous differential paths, one can easily be convinced that the most costly part is located in the middle rounds, where the full internal
state is active. Therefore, the classical early-round use of the freedom degrees available to the attacker is not successful in this case. It is more efficient to actually utilize the freedom degrees during the middle rounds and then let the rest of the differential trail be verified backward and forward in a probabilistic way.

3.3   The Super-Sbox View
In [9, 10, 11, 16], Daemen and Rijmen introduced the super box representation for two rounds of the AES in order to study its differential properties. The underlying idea is simple: by considering (r × c)-bit permutations (named here Super-Sboxes) instead of the usual c-bit Sboxes, two rounds of an AES-like permutation can be represented using only one non-linear layer. More precisely, the application of two AES-like permutation rounds on an internal state C, MC ◦ ShR ◦ SB ◦ AC ◦ MC ◦ ShR ◦ SB ◦ AC(C), can be rewritten as MC ◦ ShR ◦ SB ◦ AC ◦ MC ◦ SB ◦ ShR ◦ AC(C), since two adjacent SubBytes and ShiftRows transformations commute. The middle part Super-SB = SB ◦ AC ◦ MC ◦ SB of the latter composition represents a layer of column-wise applications of r (r × c)-bit Super-Sboxes. The transformation associated with two consecutive rounds can thus be rewritten as MC ◦ ShR ◦ Super-SB ◦ ShR ◦ AC(C). This is depicted in Figure 4.
Fig. 4. Three equivalent views of 2 rounds of an AES-like permutation
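The regrouping only relies on the fact that a cell-wise Sbox layer commutes with the cell permutation ShiftRows. The self-contained toy check below (a single random S-box, random constants, a toy column mixing standing in for MixColumns, r = 4 and c = 8, all placeholders) verifies that the round-by-round and Super-Sbox compositions define the same two-round mapping.

```python
# Toy check that two AES-like rounds, MC.ShR.SB.AC2 . MC.ShR.SB.AC1, equal the
# regrouped form MC.ShR.Super-SB.ShR.AC1 with Super-SB = SB.AC2.MC.SB.
import random

r, c = 4, 8
rng = random.Random(2010)
SBOX = list(range(1 << c)); rng.shuffle(SBOX)                        # one c-bit S-box
K1 = [[rng.randrange(1 << c) for _ in range(r)] for _ in range(r)]   # round-1 constants
K2 = [[rng.randrange(1 << c) for _ in range(r)] for _ in range(r)]   # round-2 constants

def AC(S, K): return [[S[i][j] ^ K[i][j] for j in range(r)] for i in range(r)]
def SB(S):    return [[SBOX[S[i][j]] for j in range(r)] for i in range(r)]
def ShR(S):   return [[S[i][(j - i) % r] for j in range(r)] for i in range(r)]

def MC(S):
    # toy column-wise linear mixing (NOT the real MDS MixColumns)
    out = [[0] * r for _ in range(r)]
    for j in range(r):
        col = [S[i][j] for i in range(r)]
        for i in range(r):
            out[i][j] = col[i] ^ col[(i + 1) % r]
    return out

def two_rounds(C):
    # round 1 then round 2, each written as MC . ShR . SB . AC
    X = MC(ShR(SB(AC(C, K1))))
    return MC(ShR(SB(AC(X, K2))))

def super_sb(X):
    # Super-SB = SB . AC . MC . SB; column-wise since MC, AC and SB all are
    return SB(AC(MC(SB(X)), K2))

def regrouped(C):
    # MC . ShR . Super-SB . ShR . AC, the Super-Sbox view of the two rounds
    return MC(ShR(super_sb(ShR(AC(C, K1)))))

C = [[rng.randrange(1 << c) for _ in range(r)] for _ in range(r)]
assert two_rounds(C) == regrouped(C)
print("round-by-round and Super-Sbox views agree on a random state")
```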
3.4   The Super-Sbox Cryptanalysis
Our overall strategy is the same as in the previous rebound or start-from-the-middle attacks: we will try to find a pair of internal state values in the middle of a well-chosen truncated differential path (where the full internal state is active) such that the path is verified for as many rounds as possible backward and forward. We call this part the controlled rounds; the rest of the path in both directions will be fulfilled probabilistically. In order to describe our attack, we use the 8-round path from Figure 3. With a restricted number of operations on average, we will find a pair of internal state values such that the path is verified for the three middle rounds: from the beginning of round 2 until the end of round 4. A more detailed description of the three controlled rounds is given in Figure 5.
Fig. 5. A detailed Super-Sbox view (see Figure 4) of the controlled rounds (from round 2 to round 4) of the 8-round differential path from Figure 3
Before describing the attack, we have to make an assumption: we require that the number of potential distinct differences at the start of the controlled rounds (S_start) or at the end of the controlled rounds (S_end) be at least (2^c − 1)^r. In other words, we must have at least r active cells in S_start or in S_end. Note that this assumption holds for all the types of controlled rounds that will be considered in the two differential paths of Figure 3. Without loss of generality, we consider for the rest of the description that this assumption is fulfilled for the ending difference mask.

The controlled rounds. The initial state of the controlled rounds is S_start at the beginning of round 2 (see Figure 5). Since the AddConstant and SubBytes layers of this round have no effect on truncated differentials, we can directly begin at S'_start, i.e. just after those two transformations. Thus, we select a random difference value (we do not use truncated differences in the subsequent procedure) for all the active cells of the internal state S'_start at the output of the Sbox layer of round 2. Since the difference mask for the entire internal state is now specified, one can apply the ShiftRows and MixColumns transformations and enter round 3 with an updated difference mask. We then easily deduce the input difference mask Δ = (Δ_1, ..., Δ_r) in S_in for the r-tuple of Super-Sboxes of Super-SB. Now we perform the following local precomputation: for each of the r Super-Sboxes, knowing its input difference mask Δ_i, we go through all the 2^{rc−1} pairs of input values differing by Δ_i and compute the Super-Sbox forward. This provides 2^{rc−1} output difference values (distinct or not). For each Super-Sbox output difference reached, the attacker stores the appropriate pair(s) of input
that led to it. We name this storage Tables T_i, one for each Super-Sbox i, and note that this precomputation phase requires about 2^{rc} operations and memory. We now go backward by starting from the end of round 4: we pick a random difference for all the active cells of the internal state S_end at the output of the MixColumns transformation of round 4. We can invert this MixColumns layer and get the differences on its input. By also inverting the ShiftRows function of round 4, we get the aimed output difference mask Δ' = (Δ'_1, ..., Δ'_r) in S_out for the r Super-Sboxes of Super-SB. We only have to check whether, for each of the r Super-Sboxes (numbered i = 1 to r), the output difference Δ'_i is present in table T_i. If this is the case, we can efficiently enumerate all the pairs with input difference Δ in S_in leading to an output difference Δ' in S_out. It is easy to see that if Δ' = (Δ'_1, ..., Δ'_r) were a fully random r-tuple of output differences for the r Super-Sboxes, the average value over Δ' of the number n(Δ') of distinct pairs with input difference Δ resulting in an output difference Δ' would be exactly 1/2 (this actually holds for any permutation, independently of the fact that this permutation is an r-tuple of Super-Sboxes).⁵ We make the (natural) heuristic assumption that though Δ' is not selected from the whole set D of all 2^{r^2 c} possible difference values, but from a smaller subset D' of (2^c − 1)^r ≈ 2^{rc} difference values, the average value of n(Δ') over D' remains extremely close to 1/2. This assumption is supported by the fact that D' consists of difference values with r^2 active cells, and this output difference pattern meets the MDS constraints of the r Super-Sboxes for any input difference pattern. Thus, by going through all the potential difference candidates for S_end, we expect on average about one distinct solution for every two candidates. Since we previously assumed that we have at least 2^{rc} potential difference candidates for S_end, the average complexity to find one solution is only two operations as soon as we wish to obtain at least 2^{rc−1} solutions. In other words, the 2^{rc} cost for building the tables T_i has been absorbed by the fact that this allows us to test about 2^{rc} distinct output difference values at a time. Once all the possible output differences in S_end have been exhausted, we can pick a new input difference candidate in S'_start and build new tables T_i. To conclude, in order to find k distinct solutions for the controlled rounds, the overall complexity is max{2^{rc}, k} operations and 2^{rc} memory.

The uncontrolled rounds. The rest of the path (the uncontrolled rounds) is fulfilled probabilistically. More precisely, we managed so far to get valid candidates from round 2 to round 4, but we have no control over the difference values
⁵ The exact distribution of the number n(Δ') of pairs is complex to derive, but one can at least notice that if n(Δ') ≠ 0, i.e. the numbers n_1, ..., n_r of input pairs returned by tables T_1, ..., T_r are all distinct from zero, then n(Δ') = n_1 × ... × n_r × 2^{r−1}, since each tuple of pairs provides 2^{r−1} pairs of complete blocks (as a matter of fact, the values of each pair of inputs to one Super-Sbox can be swapped, but each of the 2^r pairs of blocks one obtains this way is repeated two times). Though this is not essential for the estimate of the attack complexity, we can expect the two most probable values of n(Δ') to be n(Δ') = 0 (with probability about 1 − 2^{−r}) and n(Δ') = 2^{r−1} (with probability about 2^{−r}).
in S_start (since we selected random differences in S'_start and going through an Sbox layer impacts the difference values). Similarly, we know the differences in S_end, but the beginning of the next round is a SubBytes transformation that does not allow us to control the behavior of the MixColumns function of round 5. The study of the MixColumns differential properties indicates that the path will be fulfilled with probability about 2^{−c(r−1)} at round 1 and at round 5, since we are aiming for an r → 1 active-cells differential transition through both of those two MixColumns layers. Note that for round 0, round 6 and round 7, the probability of success is equal to one. Finally, the attacker must find 2^{2c(r−1)} distinct solutions for the controlled rounds, providing a single valid pair for the whole 8-round differential path with a total complexity of 2^{2c(r−1)} operations and 2^{rc} memory. Of course, the same technique applies to the 7-round differential characteristic of Figure 3 as well, up to the fact that, since the condition on the number of distinct difference values is fulfilled at the input of the Super-Sboxes and not at the output, one has to fix a random value for the active cell of the ending difference (at the input of round 5) and work backward instead of forward. Since only one round of the uncontrolled part (namely round 1) now has to be fulfilled probabilistically, the attacker must find 2^{c(r−1)} distinct solutions for the controlled rounds, providing a valid pair for the whole 7-round differential path with a total complexity of about 2^{rc} operations and memory. If one wants to find k solutions for the whole 7-round path, then the computational complexity is max{2^{rc}, k × 2^{c(r−1)}}.
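For concreteness, here is a schematic sketch of the per-Super-Sbox table T_i built in the controlled rounds above, with the Super-Sbox abstracted as an arbitrary (rc)-bit permutation; the toy parameters (rc = 12 and the chosen input difference) are illustrative only and much smaller than the rc = 32 of AES.

```python
# For each of the 2^(rc-1) unordered input pairs {x, x ^ delta_in}, store the
# pair under the output difference it produces, so that any candidate output
# difference (obtained later by inverting MixColumns and ShiftRows of round 4)
# can be looked up directly.
from collections import defaultdict
import random

def build_table(super_sbox, delta_in, rc):
    """Return T: output difference -> list of input pairs (x, x ^ delta_in)."""
    T = defaultdict(list)
    seen = set()
    for x in range(1 << rc):
        y = x ^ delta_in
        if y in seen:                 # each unordered pair is enumerated once
            continue
        seen.add(x)
        T[super_sbox(x) ^ super_sbox(y)].append((x, y))
    return T

# Toy usage on a hypothetical 12-bit Super-Sbox so the loop runs instantly;
# in the attack rc = 32, costing about 2^32 time and memory per table.
rng = random.Random(0)
perm = list(range(1 << 12)); rng.shuffle(perm)
T = build_table(lambda v: perm[v], delta_in=0x5a3, rc=12)
wanted = rng.randrange(1, 1 << 12)    # stand-in for a backward-phase target
print(len(T.get(wanted, [])), "matching input pair(s) stored for this output difference")
```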
3.5   Considering Freedom Degrees
Before moving forward into the study of the various applications of Super-Sbox cryptanalysis, we need to evaluate the freedom degrees available to the attacker. Indeed, we have to be sure that our attacks will find, with good probability, a valid pair for the whole differential path considered. That is, we want to be sure that enough valid candidates for the controlled rounds exist so that we have a good probability that one of them will fulfill the entire characteristic. Moreover, in some of our attacks, our goal will not only be to find one valid pair, but to find many of them. In the case of some round-reduced versions of Grøstl, we will even use a birthday paradox technique on the set of valid candidates in order to find a semi-free-start collision for the compression function. For this reason, we need to evaluate how many valid pairs one can find for a specified differential path, and how many distinct differential paths can be considered. A simple counting argument shows that one can generate only about 2^{2c−1} pairs that verify the entire characteristic. Let us first consider the 8-round path of Figure 3: the controlled rounds allow us to produce about 2^{2rc−1} valid pairs for rounds 2 to 4, out of which about 2^{2rc−1} × 2^{−2(r−1)c} = 2^{2c−1} pairs fulfill the entire condition resulting from the differential transitions at rounds 1 and 5. In the case of the 7-round path from Figure 3, the controlled rounds allow us to produce about
2^{(r+1)c−1} valid pairs for rounds 2 to 4, out of which about 2^{(r+1)c−1} × 2^{−(r−1)c} = 2^{2c−1} fulfill the entire condition resulting from the differential transition at round 1. We also have to count how many differential paths such as the ones from Figure 3 can be generated. When the internal state contains only one active cell, there are clearly r^2 possible positions for the location of this cell in the matrix. Since this situation happens in the forward and in the backward direction, we get r^4 distinct differential paths. To conclude, we can generate r^4 distinct differential paths, each potentially producing 2^{2c−1} distinct valid pairs.
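These counts are easy to sanity-check numerically; the snippet below redoes the log2 arithmetic with the Grøstl-256 parameters (r = 8, c = 8) that reappear in Section 4.3.

```python
# log2 sanity check of the counting argument above, for r = 8, c = 8 (Grøstl-256).
import math

r, c = 8, 8
pairs_8round = (2 * r * c - 1) - 2 * (r - 1) * c   # 2^(2rc-1) candidates, two r->1 filters
pairs_7round = ((r + 1) * c - 1) - (r - 1) * c     # 2^((r+1)c-1) candidates, one r->1 filter
assert pairs_8round == pairs_7round == 2 * c - 1 == 15
paths = 4 * math.log2(r)                            # r^4 positions for the single active cells
assert paths == 12
print("each path yields about 2^15 valid pairs; 2^12 distinct paths are available")
```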
4   Applications
When trying to obtain distinguishing attacks for 7 rounds of an AES-like permutation, the Super-Sbox cryptanalysis will generally not provide any complexity improvement over existing techniques. However, our method allows the attacker to carry out an attack on a number of rounds that was unreachable before. For example, we provide a new known-key distinguishing attack for 8-round reduced AES and the first distinguishers for the 8-round versions of the reduced Grøstl internal permutation, the ECHO internal permutation, and the reduced Grøstl compression function.

4.1   Limited-Birthday Distinguishers
Before moving to applications of the Super-Sbox cryptanalysis, we have to describe the distinguishers we will build. One of our goals is to distinguish an AES-like permutation from an ideal permutation in the known-key setting. The kind of distinguishers we consider consists in deriving pairs of plaintext/ciphertext couples with a zero difference value at i prescribed input bit positions and a zero difference value at j prescribed output bit positions (and arbitrary difference values for the other r^2 c − i input bit positions and the other r^2 c − j output bit positions). What is the generic attack complexity in the case of an ideal (random) permutation? More generally, we can study the problem of mapping an i-bit difference mask, not necessarily equal to the all-zero word, to a j-bit difference mask through an ideal permutation. A rough analysis might suggest that, due to the birthday paradox, a generic attack requiring 2^{min{i/2, j/2}} operations exists. However, this is not always the case, since we can find ourselves in the situation where not enough difference positions are available in order to take full advantage of the birthday attack. In other words, we do not always have the k/2 unconstrained difference bits required to mount a 2^{k/2} collision attack on k bits. Since we handle a permutation, the attacker can choose to study the function or its inverse. Without loss of generality, let us assume that i ≥ j. Due to the birthday paradox, each structure of 2^{r^2 c−i} input values obtained by fixing the value of those i bits where a zero input difference is required allows us to achieve a zero output difference on up to 2(r^2 c − i) prescribed output bit positions.
– if j ≤ 2(r^2 c − i), then one can select 2^{j/2} input values from one single structure and this suffices to achieve a collision on the j target positions. The attack complexity is about 2^{j/2}.
– if j > 2(r^2 c − i), then about 2^{j−2(r^2 c−i)} structures have to be used to obtain a collision on the j prescribed positions. Overall, the complexity of the attack is about 2^{r^2 c−i} × 2^{j−2(r^2 c−i)} = 2^{i+j−r^2 c}.

The same reasoning holds when applying the birthday paradox over the r^2 c − j free difference bits on the output and attacking the inverse function.

– if i ≤ 2(r^2 c − j), then the attack complexity is about 2^{i/2}.
– if i > 2(r^2 c − j), then the attack complexity is about 2^{r^2 c−j} × 2^{i−2(r^2 c−j)} = 2^{i+j−r^2 c}.

It can be shown that, overall, the attack complexity is max{2^{j/2}, 2^{i+j−r^2 c}}.

We want to be able to distinguish AES-like permutation-based compression functions as well. Studying the generic attack for an ideal compression function is almost the same as previously. The only difference is that we cannot consider the inverse function anymore, and we have to take into account both the message and the chaining variable as inputs. Thus, we study the problem of mapping an i-bit zero difference mask on the input chaining variable and the message (with t denoting the total number of input bits) to a j-bit zero difference mask on the output through an ideal compression function. By applying the birthday paradox, each structure of 2^{t−i} input values obtained by fixing the input values at the i positions of the input mask bits allows us to achieve a collision on up to 2(t − i) prescribed output bit positions.

– if j ≤ 2(t − i), then the attack complexity is 2^{j/2}.
– if j > 2(t − i), then 2^{j−2(t−i)} structures have to be used to obtain a collision on the j prescribed positions. Overall, the complexity of the attack is 2^{t−i} × 2^{j−2(t−i)} = 2^{i+j−t} (see the short sketch after this list).
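The sketch announced above evaluates both generic bounds (in log2) and checks them against figures quoted later in the paper: 2^{64} for the AES case of Section 4.2 (i = j = 96 out of n = 128) and 2^{384} and 2^{320} for the Grøstl-256 cases of Section 4.3.

```python
# Generic (ideal-case) complexities, in log2, for the limited-birthday problems
# above: a zero difference on i input bits and j output bits of an n-bit
# permutation, or of a compression function with t input bits and j output bits.
def log2_ideal_permutation(i, j, n):
    s = min(i, j)                      # the attacker picks the cheaper direction
    return max(s / 2, i + j - n)

def log2_ideal_compression(i, j, t):
    return max(j / 2, i + j - t)       # the inverse direction is not available

# Figures reused later in the paper:
assert log2_ideal_permutation(96, 96, 128) == 64       # AES, Section 4.2
assert log2_ideal_permutation(448, 448, 512) == 384    # Grøstl-256 permutation, Section 4.3
assert log2_ideal_compression(960, 384, 1024) == 320   # Grøstl-256 compression, Section 4.3
print("generic limited-birthday bounds reproduce the figures quoted in Section 4")
```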
4.2   AES
Our first application is a known-key distinguishing attack against the AES block cipher. We will focus on the application of this attack to AES-128. Previously known attacks on round-reduced versions of AES-128 reach up to 7 rounds, and for 7 rounds we get no improvement, due to the minimal cost 2^{rc} of the Super-Sbox technique. However, we describe here the first known-key distinguishing attack against an 8-round reduced version of AES-128 (a recent announcement regarding an unpublished work [5] describes an 8-round chosen-key distinguisher for AES-128). We recall that in the case of AES, the last MixColumns transformation is not applied. We will use the 8-round differential path from Figure 3, and we already showed that one can get a pair of inputs fulfilling this path with a computational complexity of 2^{2c(r−1)} = 2^{48} operations and 2^{32} memory. The amount of freedom degrees is not an issue here since we only need to find one candidate verifying
the whole differential path. This gives us a plaintext/ciphertext pair with 4 active cells in the input and 4 active cells in the output, with undetermined non-zero differences. In the previous section, we gave some evidence that in the case of a perfect random permutation this should require 2^{64} operations, and we can conclude that 8-round reduced AES-128 can be distinguished from an ideal cipher in the known-key model with 2^{48} computations and 2^{32} memory. Note that our distinguishers work even if the last-round MixColumns transformation is applied.
4.3   Grøstl
For Grøstl, finding a valid pair following the generic 8-round path from Figure 3 requires 2^{2c(r−1)} = 2^{112} computations and 2^{64} memory. The obtained pairs have the distinctive property that i = 512 − 64 = 448 predetermined bits of the input difference and j = 512 − 64 = 448 predetermined bits of the preimage of the output difference by the linear transformation MixColumns are equal to zero. We thus obtain a distinguisher for the 8-round reduced Grøstl-256 internal permutation, since the ideal cipher case should require 2^{i+j−r^2 c} = 2^{384} computations. This immediately provides a distinguisher for the 8-round reduced Grøstl-256 compression function as well, as can be seen in Figure 1, by using the differential path for the P permutation and inserting no difference in the message (no difference will occur in Q), or alternatively with the differential path for the permutation Q only (and no difference in P). In both cases, the input difference of the compression function belongs to a predetermined vector space of {0, 1}^{1024} of dimension 8 × 8 = 64 and the output difference belongs to the sum of two predetermined vector spaces of {0, 1}^{512} of dimension 64 each, i.e. a predetermined vector space of dimension at most 128 (by analogy, i = 1024 − 64 = 960 and j = 512 − 128 = 384). In the ideal compression function case, this should require 2^{i+j−t} = 2^{320} computations. For completeness, we give in Figure 6 the differential paths for the Grøstl parameters. Note that in both the 7-round and 8-round cases, we can generate 2^{2×8−1} = 2^{15} distinct valid pairs for each characteristic and one can build 8^4 = 2^{12} distinct differential paths.
Fig. 6. 7-round and 8-round differential paths for Grøstl-256
We can now try to compute semi-free-start collisions in the same manner as in [27]: we use a birthday paradox technique between the solutions found for the P and Q branches in order to find colliding difference values for the active cells of the input and the output. If one expects x active cells in the input and y in the output of the differential path, then one can find colliding values by computing 2^{c(x+y)/2} valid candidates for both P and Q. Note that, since it is linear, the very last MixColumns function can be ignored and only the number of active cells before this layer needs to be considered. Assuming that we have 2^{12} × 2^{15} = 2^{27} freedom degrees available in order to apply this birthday attack would be incorrect: one can apply the birthday paradox only among solutions following the same differential path. Thus, we have 2^{15} freedom degrees for each birthday attack, and we can repeat this step 2^{12} times. Overall, we can make the input and output difference values collide on only log_2((2^{15})^2 × 2^{12}) = 42 bits.
Fig. 7. 6-round and 7-round differential paths for Grøstl-256
For this reason, one cannot generate collisions for the 7-round and 8-round reduced compression function with the paths from Figure 3. However, by using the slightly different paths depicted in Figure 7, one can find semi-free-start collisions for the 6-round reduced Grøstl-256 compression function with 2^{64} operations and memory, and for the 7-round reduced case with 2^{56+64} = 2^{120} operations and 2^{64} memory. The particularity of those paths is that enough freedom degrees are now left to the attacker in order to complete the final birthday attack. The drawback is that one more r → 1 transition is uncontrolled for the same total number of rounds. Thus, this type of path is more costly than the one from Figure 6.
4.4   ECHO
Since its structure mimics the AES, our results regarding the internal permutation of ECHO are very similar, but the complexity has to be adapted to the ECHO parameters. By using the 8-round path from Figure 3, we can distinguish 8 rounds of the ECHO internal permutation from an ideal 2048-bit permutation with 2^{768} computations and 2^{512} memory (the ideal permutation case would require 2^{1024} computations).
Note that this distinguisher does not apply to the ECHO compression function because of the shrink operation applied after the internal permutation and the feedforward. As a matter of fact, the convolution effect of this operation on the output distribution of the permutation makes it considerably more difficult to mount a distinguishing attack on the compression function of ECHO than on its underlying permutation. Moreover, since the Super-Sbox cryptanalysis of the ECHO permutation presented above requires at least 2^{512} computations and memory, it is not a well-suited starting point for trying to mount a distinguisher or a collision search attack against one of the compression functions of ECHO-256 or ECHO-512 (or one of their single-pipe variants).
5   Conclusion
In this paper, we introduced Super-Sbox cryptanalysis, which very often improves upon the classical rebound or start-from-the-middle attacks both in terms of efficiency and simplicity. This technique leads to improved cryptanalytic results for both Grøstl and ECHO, two SHA-3 candidates, and to the best known-key distinguisher so far for the AES-128 block cipher.
Acknowledgments. The authors would like to thank Matt Robshaw for insightful discussions and the anonymous referees for their helpful comments.
References

1. Barreto, P.S.L.M.: An observation on Grøstl. Comment submitted to the NIST hash function mailing list, [email protected], http://www.larc.usp.br/~pbarreto/Grizzly.pdf
2. Benadjila, R., Billet, O., Gilbert, H., Macario-Rat, G., Peyrin, T., Robshaw, M., Seurin, Y.: SHA-3 Proposal: ECHO. Submission to NIST (2008), http://crypto.rd.francetelecom.com/echo/
3. Biryukov, A. (ed.): FSE 2007. LNCS, vol. 4593. Springer, Heidelberg (2007)
4. Biryukov, A., Khovratovich, D., Nikolic, I.: Distinguisher and Related-Key Attack on the Full AES-256. In: Halevi, S. (ed.) [15], pp. 231–249
5. Biryukov, A., Nikolic, I.: A New Security Analysis of AES-128. In: CRYPTO 2009 rump session (2009), http://rump2009.cr.yp.to/b6f3cb038135799a7ea398f99faf4a55.pdf
6. Black, J.R., Rogaway, P., Shrimpton, T.: Black-Box Analysis of the Block-Cipher-Based Hash-Function Constructions from PGV. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 320–335. Springer, Heidelberg (2002)
7. Canetti, R., Goldreich, O., Halevi, S.: The random oracle methodology, revisited. J. ACM 51(4), 557 (2004)
8. Cramer, R. (ed.): EUROCRYPT 2005. LNCS, vol. 3494. Springer, Heidelberg (2005)
9. Daemen, J., Rijmen, V.: The Design of Rijndael. In: Information Security and Cryptography. Springer, Heidelberg (2002), ISBN 3-540-42580-2
10. Daemen, J., Rijmen, V.: Two-Round AES Differentials. Cryptology ePrint Archive, Report 2006/039 (2006), http://eprint.iacr.org/
11. Daemen, J., Rijmen, V.: Understanding Two-Round Differentials in AES. In: De Prisco, R., Yung, M. (eds.) SCN 2006. LNCS, vol. 4116, pp. 78–94. Springer, Heidelberg (2006)
12. Dunkelman, O. (ed.): FSE 2009. LNCS, vol. 5665. Springer, Heidelberg (2009)
13. Mendel, F., Rechberger, C., Schläffer, M., Thomsen, S.S.: Rebound Attacks on the Reduced Grøstl Hash Function. In: Pieprzyk, J. (ed.) CT-RSA 2010. LNCS. Springer, Heidelberg (to appear, 2010)
14. Gauravaram, P., Knudsen, L.R., Matusiewicz, K., Mendel, F., Rechberger, C., Schläffer, M., Thomsen, S.S.: Grøstl – a SHA-3 candidate. Submission to NIST (2008), http://www.groestl.info
15. Halevi, S. (ed.): CRYPTO 2009. LNCS, vol. 5677. Springer, Heidelberg (2009)
16. Daemen, J., Rijmen, V.: Plateau Characteristics. Information Security, IET 1(1), 11–17 (2007)
17. Jacobson Jr., M.J., Rijmen, V., Safavi-Naini, R. (eds.): SAC 2009. LNCS, vol. 5867. Springer, Heidelberg (2009)
18. Kelsey, J.: Some notes on Grøstl. Comment submitted to the NIST hash function mailing list, [email protected], http://ehash.iaik.tugraz.at/uploads/d/d0/Grostl-comment-april28.pdf
19. Khovratovich, D.: Cryptanalysis of Hash Functions with Structures. In: Jacobson Jr., M.J., et al. (eds.) [17], pp. 108–125
20. Knudsen, L.R., Rechberger, C., Thomsen, S.S.: The Grindahl Hash Functions. In: Biryukov, A. (ed.) [3], pp. 39–57
21. Knudsen, L.R., Rijmen, V.: Known-Key Distinguishers for Some Block Ciphers. In: Kurosawa, K. (ed.) ASIACRYPT 2007. LNCS, vol. 4833, pp. 315–324. Springer, Heidelberg (2007)
22. Knudsen, L.R.: Truncated and Higher Order Differentials. In: Preneel, B. (ed.) FSE 1994. LNCS, vol. 1008, pp. 196–211. Springer, Heidelberg (1995)
23. Lamberger, M., Mendel, F., Rechberger, C., Rijmen, V., Schläffer, M.: Rebound Distinguishers: Results on the Full Whirlpool Compression Function. In: Matsui, M. (ed.) [24], pp. 126–143
24. Matsui, M. (ed.): ASIACRYPT 2009. LNCS, vol. 5912. Springer, Heidelberg (2009)
25. Matusiewicz, K., Naya-Plasencia, M., Nikolic, I., Sasaki, Y., Schläffer, M.: Rebound Attack on the Full Lane Compression Function. In: Matsui, M. (ed.) [24], pp. 106–125
26. Mendel, F., Peyrin, T., Rechberger, C., Schläffer, M.: Improved Cryptanalysis of the Reduced Grøstl Compression Function, ECHO Permutation and AES Block Cipher. In: Jacobson Jr., M.J., et al. (eds.) [17], pp. 16–35
27. Mendel, F., Rechberger, C., Schläffer, M., Thomsen, S.S.: The Rebound Attack: Cryptanalysis of Reduced Whirlpool and Grøstl. In: Dunkelman, O. (ed.) [12], pp. 260–276
28. Minier, M., Phan, R.C.-W., Pousse, B.: Distinguishers for Ciphers and Known Key Attack against Rijndael with Large Blocks. In: Preneel, B. (ed.) [33], pp. 60–76
29. National Institute of Standards and Technology. FIPS 180-1: Secure Hash Standard (April 1995), http://csrc.nist.gov
30. National Institute of Standards and Technology. FIPS PUB 197, Advanced Encryption Standard (AES). Federal Information Processing Standards Publication 197, U.S. Department of Commerce (November 2001)
31. National Institute of Standards and Technology. Announcing Request for Candidate Algorithm Nominations for a New Cryptographic Hash Algorithm (SHA-3) Family. Federal Register 27(212), 62212–62220 (2007)
32. Peyrin, T.: Cryptanalysis of Grindahl. In: Kurosawa, K. (ed.) ASIACRYPT 2007. LNCS, vol. 4833, pp. 551–567. Springer, Heidelberg (2007)
33. Preneel, B. (ed.): AFRICACRYPT 2009. LNCS, vol. 5580. Springer, Heidelberg (2009)
34. Preneel, B., Govaerts, R., Vandewalle, J.: Hash Functions Based on Block Ciphers: A Synthetic Approach. In: Stinson, D.R. (ed.) CRYPTO 1993. LNCS, vol. 773, pp. 368–378. Springer, Heidelberg (1994)
35. Rivest, R.L.: RFC 1321: The MD5 Message-Digest Algorithm (April 1992), http://www.ietf.org/rfc/rfc1321.txt
36. Shoup, V. (ed.): CRYPTO 2005. LNCS, vol. 3621. Springer, Heidelberg (2005)
37. Rijmen, V.: Cryptanalysis and design of iterated block ciphers. Ph.D. thesis, KU Leuven (1997)
38. Wang, X., Yin, Y.L., Yu, H.: Finding Collisions in the Full SHA-1. In: Shoup, V. (ed.) [36], pp. 17–36
39. Wang, X., Yu, H.: How to Break MD5 and Other Hash Functions. In: Cramer, R. (ed.) [8], pp. 19–35
40. Wu, S., Feng, D., Wu, W.: Cryptanalysis of the LANE Hash Function. In: Jacobson Jr., M.J., et al. (eds.) [17], pp. 126–140
Author Index
Aumasson, Jean-Philippe
134, 318
Bhattacharyya, Rishiraj 168 Billet, Olivier 55 Bos, Joppe W. 75 Bouillaguet, Charles 347
Nikoli´c, Ivica Nohl, Karsten
333 1
Osvik, Dag Arne ¨ Ozen, Onur 94 Peyrin, Thomas
Canright, David Cid, Carlos 40
75
Dunkelman, Orr
347
Etrog, Jonathan
55
Fouque, Pierre-Alain
134, 365
Reyhanitabar, Mohammad Reza Rijmen, Vincent 286 R¨ ock, Andrea 134 Rønjom, Sondre 40 347
Gilbert, Henri 55, 365 Guo, Jian 318 Hatano, Yasuo
75
270
Sasaki, Yu 116 Shrimpton, Thomas 94 Stam, Martijn 94 Stefan, Deian 75 Susilo, Willy 192 Suzaki, Tomoyasu 19
Kaneko, Toshinobu 270 Khovratovich, Dmitry 333 Knellwolf, Simon 318
Tews, Erik 1 Thomsen, Søren S. Toz, Deniz 286
Laigle-Chapuy, Yann 134 Leurent, Ga¨etan 134, 347 Liang, Bo 250
Varıcı, Kerem
Mandal, Avradip 168 Matusiewicz, Krystian 318 Meier, Willi 134, 318 Minematsu, Kazuhiko 19, 230 Mironov, Ilya 153 Mu, Yi 192 Nandi, Mridul 168, 212 Naya-Plasencia, Mar´ıa 134
304
286
Wang, Lei 116 Wang, Peng 250 Watanabe, Dai 270 Weinmann, Ralf-Philipp Wu, Shuang 250 Wu, Wenling 250 Yamada, Tsuyoshi Zhang, Lei 250 Zhang, Liting 250
270
1
192