Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Massachusetts Institute of Technology, MA, USA Demetri Terzopoulos New York University, NY, USA Doug Tygar University of California, Berkeley, CA, USA Moshe Y. Vardi Rice University, Houston, TX, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany
3017
Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo
Bimal Roy Willi Meier (Eds.)
Fast Software Encryption 11th International Workshop, FSE 2004 Delhi, India, February 5-7, 2004 Revised Papers
Volume Editors Bimal Roy Applied Statistics Unit, Indian Statistical Institute 203, B.T. Road, Calcutta 700 108, India E-mail:
[email protected] Willi Meier FH Aargau, 5210 Windisch, P.O. Box , Switzerland E-mail:
[email protected]
Library of Congress Control Number: 2004107501 CR Subject Classification (1998): E.3, F.2.1, E.4, G.2, G.4 ISSN 0302-9743 ISBN 3-540-22171-9 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable to prosecution under the German Copyright Law. Springer-Verlag is a part of Springer Science+Business Media springeronline.com c Springer-Verlag Berlin Heidelberg 2004
Printed in Germany
Typesetting: Camera-ready by author, data conversion by DA-TeX Gerd Blumenstein Printed on acid-free paper SPIN: 11011880 06/3142 543210
Preface
Fast Software Encryption is now an eleven-year-old workshop on symmetric cryptography, including the design and analysis of block ciphers and stream ciphers as well as hash functions and message authentication codes. The FSE workshop was first held in Cambridge in 1993, followed by Leuven in 1994, Cambridge in 1996, Haifa in 1997, Paris in 1998, Rome in 1999, New York in 2000, Yokohama in 2001, Leuven in 2002, and Lund in 2003. This Fast Software Encryption workshop, FSE 2004, was held February 5–7, 2004 in Delhi, India. The workshop was sponsored by IACR (the International Association for Cryptologic Research) and organized in cooperation with the Indian Statistical Institute, Delhi, and the Cryptology Research Society of India (CRSI). This year a total of 75 papers were submitted to FSE 2004. After a seven-week reviewing process, 28 papers were accepted for presentation at the workshop. In addition, we were fortunate to have in the program two invited talks, by Adi Shamir and David Wagner. During the workshop a rump session was held. Seven presentations were made, and all the presenters were given the option of submitting their work for possible inclusion in the proceedings. Only one paper was submitted; it was refereed, accepted, and appears at the end of these proceedings. We would like to thank the following people. First, Springer-Verlag, for publishing the proceedings in the Lecture Notes in Computer Science series. Next, the submitting authors, the committee members, the external reviewers, the general co-chairs Subhamoy Maitra and R.L. Karandikar, and the local organizing committee, for their hard work. Bart Preneel, for letting us use COSIC’s Webreview software in the review process, and Thomas Herlea for his support. We are indebted to Lund University, especially Thomas Johansson, Bijit Roy and Sugata Gangopadhyay, for hosting the Webreview site.
Additionally, we would like to thank Partha Mukhopadhyay, Sourav Mukhopadhyay, Malapati Raja Sekhar, and Chandan Biswas for handling all submissions, and Madhusudan Karan for putting together the pre-proceedings. We would also like to thank the sponsors: Infosys Technology Ltd., Honeywell Corporation and Via Technology. The organizing committee consisted of Sanjay Burman (CAIR, Bangalore), Ramendra S. Baoni (Bisecure Technologies Pvt. Ltd., Delhi), Hiranmoy Ghosh (Tata Infotech Ltd., Delhi), Abdul Sakib Mondal (Infosys Technologies Ltd., Bangalore), Arup Pal (ISI, Delhi), N.R. Pillai (SAG, Delhi), P.K. Saxena (SAG, Delhi), and Amitabha Sinha (ISI, Kolkata), who served as treasurer. We thank them all.

February 2004
Bimal Roy Willi Meier
Fast Software Encryption 2004 February 5–7, 2004, Delhi, India Sponsored by the International Association for Cryptologic Research in cooperation with Indian Statistical Institute, Delhi and Cryptology Research Society of India General Co-Chairs Subhamoy Maitra, Indian Statistical Institute, Kolkata and R.L. Karandikar, Indian Statistical Institute, Delhi Program Co-Chairs Bimal Roy, Indian Statistical Institute, Kolkata and Willi Meier, Fachhochschule Aargau, Switzerland
Program Committee

Eli Biham (Technion, Israel)
Claude Carlet (Inria, France)
Don Coppersmith (IBM, USA)
Cunsheng Ding (Hong Kong University of Science and Technology)
Helena Handschuh (Gemplus, France)
Thomas Johansson (Lund University, Sweden)
Charanjit S. Jutla (IBM Research, USA)
Lars R. Knudsen (Technical University of Denmark)
Kaoru Kurosawa (Ibaraki University, Japan)
Kaisa Nyberg (Nokia, Finland)
C. Pandu Rangan (Indian Institute of Technology, Chennai)
Dingyi Pei (Chinese Academy of Sciences)
Bart Preneel (K.U. Leuven, Belgium)
Matt Robshaw (Royal Holloway, University of London, U.K.)
Serge Vaudenay (EPFL, Switzerland)
C.E. Veni Madhavan (Indian Institute of Science, Bangalore)
Xian-Mo Zhang (Macquarie University, Australia)
External Reviewers

Gildas Avoine, Thomas Baignères, Elad Barkan, Thierry Berger, Julien Brouchier, Qingjun Cai, Carlos Cid, Christophe Clavier, Nicolas Courtois, Christophe De Cannière, Orr Dunkelman, Eric Filiol, Henri Gilbert, Marc Girault, Philippe Guillot, Louis Guillou, Shoichi Hirose, Tetsu Iwata, Thomas Jakobsen, Jakob Johnsson, Fredrik Jönsson, Pascal Junod, Ju-Sung Kang, Joseph Lano, Tania Lange, Yong Li, Yi Lu, Miodrag Mihaljevic, Marine Minier, Jean Monnerat, Sean Murphy, Enes Pasalic, Souradyuti Paul, Michael Quisquater, Haavard Raddum, Palash Sarkar, Takeshi Shimoyama, Xiaojian Tian, Xuesong Wang, Guohua Xiong, Akihiro Yamamura
New Cryptographic Primitives Based on Multiword T-Functions

Alexander Klimov and Adi Shamir
Computer Science Department, The Weizmann Institute of Science, Rehovot 76100, Israel
{ask,shamir}@weizmann.ac.il
Abstract. A T-function is a mapping from n-bit words to n-bit words in which for each 0 ≤ i < n bit i of the output can depend only on bits 0, 1, . . . , i of the input. All the boolean operations and most of the numeric operations in modern processors are T-functions, and their compositions are also T-functions. In earlier papers we considered ‘crazy’ T-functions such as f(x) = x + (x^2 ∨ 5), proved that they are invertible mappings which contain all the 2^n possible states on a single cycle for any word size n, and proposed to use them as primitive building blocks in a new class of software-oriented cryptographic schemes. The main practical drawback of this approach is that most processors have either 32 or 64 bit words, and thus even a maximal length cycle (of size 2^32 or 2^64) may be too short. In this paper we develop new ways to construct invertible T-functions on multiword states whose iteration is guaranteed to yield a single cycle of arbitrary length (say, 2^256). Such mappings can lead to stream ciphers whose software implementation on a standard Pentium 4 processor can encrypt more than 5 gigabits of data per second, which is an order of magnitude faster than previous designs such as RC4.
1  Introduction
There are two basic approaches to the design of secret key cryptographic schemes, which we can call ‘tame’ and ‘wild’. In the tame approach we try to use only simple primitives (such as linear feedback shift registers) with well understood behaviour, and try to prove mathematical theorems about their cryptographic properties. Unfortunately, the clean mathematical structure of such schemes can also help the cryptanalyst in his attempt to find an attack which is faster than exhaustive search. In the wild approach we use crazy compositions of operations (which mix a variety of domains in a nonlinear and nonalgebraic way), hoping that neither the designer nor the attacker will be able to analyse the mathematical behaviour of the scheme. The first approach is typically preferred in textbooks and toy schemes, but real world designs often use the second approach. In several papers published in the last few years [5, 6], we tried to bridge this gap by considering ‘semi-wild’ constructions which look like crazy combinations of boolean and arithmetic operations, but have many analyzable mathematical properties. In particular, we defined the class of T-functions which contains arbitrary compositions of plus, minus, times, or, and, xor operations on n-bit words,

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 1–15, 2004.
© International Association for Cryptologic Research 2004
and showed that it is easy to analyse their invertibility and cycle structure for arbitrary word sizes. Such constructions can replace LFSRs and linear congruential mappings (which are vulnerable to correlation and algebraic attacks) in a new class of stream ciphers and pseudo-random generators. The paper is organized in the following way. In section 2 we recall the basic definitions from [5] for single word mappings, and consider several ways in which they can be extended to the multiword case. In section 3 we extend our bit-slice technique to analyse the invertibility of multiword T-functions. In section 4 we extend our technique from [6] to analyse the cycle structure of multiword T-functions. Finally, in section 5 we provide experimental data on the speed of several possible implementations of our functions on a PC.
2  Multiword T-Functions
Invertible mappings with a single cycle have many cryptographic applications. The main context in which we study them in this paper is pseudo-random generation and stream ciphers. Modern microprocessors can directly operate on up to 64-bit words in a single clock cycle, and thus a univariate mapping can go through at most 2^64 different states before entering a cycle. In some cryptographic applications this cycle length may be too short, and in addition the cryptanalyst can guess a 64-bit state in a feasible computation. A common way to increase the size of the state and extend the period of a generator is to run in parallel and combine the outputs of several generators with different periods. The overall period is determined by the least common multiple of their individual periods. This works well with LFSRs, whose periods 2^{n_1} − 1, 2^{n_2} − 1, . . . can be relatively prime, and thus the overall period can be their product. However, our univariate mappings have periods of 2^{n_1}, 2^{n_2}, . . . whose least common multiple is just 2^{max(n_1,n_2,...)}. A partial solution to this problem is to cyclically use a large number of different state update functions, starting from a secret state and a secret index. For example, we can use 64-bit words and 2^16 − 1 different constants C_k to get a guaranteed cycle length of almost 2^80 from the following simple generator:

Theorem 1. Consider the sequence {(x_i, k_i)} defined by iterating

x_{i+1} = x_i + (x_i^2 ∨ C_{k_i}) mod 2^n,    k_{i+1} = k_i + 1 mod m,

where each x is an n-bit word and C_k is some n-bit constant for each k = 0, . . . , m − 1. Then the sequence of pairs (x_i, k_i) has a maximal period (of size m·2^n) if and only if m is odd, [C_k]_0 = 1 for all k, and ⊕_{k=0}^{m−1} [C_k]_2 = 1.

A special case of this theorem for m = 1 is that the function f(x) = x + (x^2 ∨ C) is invertible with a single cycle if and only if both the least significant bit and the third least significant bit in C are 1, and the smallest such C is 5.
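Theorem 1 is easy to check by brute force for toy parameters. The sketch below is ours, not the paper's (the helper name `period` is an assumption); it simply iterates the generator until the pair (x, k) returns to its starting point.

```python
# Sketch (ours): iterate x_{i+1} = x_i + (x_i^2 | C_{k_i}) mod 2^n,
# k_{i+1} = k_i + 1 mod m, and count the steps until (x, k) returns to (0, 0).

def period(n, C):
    m, mask = len(C), (1 << n) - 1
    x, k, steps = 0, 0, 0
    while True:
        x = (x + ((x * x) | C[k])) & mask
        k = (k + 1) % m
        steps += 1
        if (x, k) == (0, 0):
            return steps

# m = 3 is odd, each C_k has bit 0 set, and the bit-2 values 1, 1, 1 XOR to 1,
# so the theorem predicts the maximal period m * 2^n.
assert period(6, [5, 7, 13]) == 3 * 2**6

# The m = 1 special case: x + (x^2 | 5) is a single cycle modulo 2^n.
assert period(8, [5]) == 2**8
```

Since a maximal-period mapping visits every state, the cycle length through (0, 0) equals the full state count m·2^n whenever the three conditions of the theorem hold.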
Unfortunately, the cyclic change of state update functions is inconvenient, and it cannot yield really large cycles (e.g., of 2^256 possible states). We can try to solve the problem by using a single high precision variable x (say, with 256 bits), but the multiplication of such long variables can become prohibitively expensive. What we would like to do is to define the mapping by operating separately on the various input words, without trying to interpret the result as a natural mathematical operation on multi-precision words. Let us first review the definitions from [5] in the case of univariate mappings. Let x be an n-bit word. We can view x as a vector of bits denoted by ([x]_{n−1}, . . . , [x]_0), where the least significant bit has number 0. In this bit notation, the univariate function f(x) = x + 1 (mod 2^n) can be expressed in the following way:

[f(x)]_0     = f_0([x]_0)                                = [x]_0 ⊕ 1
[f(x)]_1     = f_1([x]_1; [x]_0)                         = [x]_1 ⊕ α_1([x]_0)
[f(x)]_2     = f_2([x]_2; [x]_1, [x]_0)                  = [x]_2 ⊕ α_2([x]_1, [x]_0)        (1)
  ...
[f(x)]_{n−1} = f_{n−1}([x]_{n−1}; [x]_{n−2}, . . . , [x]_0) = [x]_{n−1} ⊕ α_{n−1}([x]_{n−2}, . . . , [x]_0),

where each α_i denotes one of the carry bits. Note that for any bit position i, [f(x)]_i depends only on [x]_i, . . . , [x]_0 and does not depend on [x]_{n−1}, . . . , [x]_{i+1}. We call any univariate function f which has this property a T-function (where ‘T’ is short for triangular). Note further that each carry bit α_i depends only on strictly earlier input bits [x]_{i−1}, . . . , [x]_0 but not on [x]_i. This is a special type of a T-function, which we call a parameter. To provide some intuition from the theory of linear transformations on n-dimensional spaces, we can say that T-functions roughly correspond to lower triangular matrices, parameters roughly correspond to lower triangular matrices with zeroes on the diagonal, and a T-function can be roughly represented as a diagonal matrix plus a parameter. Let us now define these notions for functions which map several input words into one output word. The natural extension of the notion of a T-function in this case is to allow bit i of the output to depend only on bits 0 to i of each one of the inputs. The observation which makes this notion interesting is that all the boolean operations and most of the arithmetic operations available on modern processors are T-functions. In particular, addition (‘+’), subtraction (‘binary −’), negation (‘unary −’), multiplication (‘∗’), or (‘∨’), and (‘∧’), exclusive or (‘⊕’), and complementation (‘¬’) (where the boolean operations are performed on all the n bits in parallel and the arithmetic operations are performed modulo 2^n) are T-functions with one or two inputs. We call these eight functions primitive operations. Note that circular rotations and right shifts are not T-functions, but left shifts can be expressed as multiplication by a power of 2 and thus they are T-functions.
Since the composition of T-functions is also a T-function, any ‘crazy’ function which contains arbitrarily many primitive operations is always a T-function.
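This closure property is easy to confirm empirically. The following check is our own sketch (not from the paper); it verifies the defining property for the running example x + (x^2 ∨ 5): the low i + 1 bits of the output are determined by the low i + 1 bits of the input.

```python
# Check (ours): f is a T-function iff f(x) mod 2^(i+1) depends only on
# x mod 2^(i+1), for every bit position i.

def f(x, n):
    mask = (1 << n) - 1
    return (x + ((x * x) | 5)) & mask      # the running example

n = 8
for i in range(n):
    mod = 1 << (i + 1)
    low = {}
    for x in range(1 << n):
        # inputs that agree on their low i+1 bits must agree on the
        # low i+1 bits of the output
        assert low.setdefault(x % mod, f(x, n) % mod) == f(x, n) % mod
```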
4
Alexander Klimov and Adi Shamir
In order to define multiword mappings f which can be iterated, we have to further extend the notion to functions with the same number m of input and output words. We can represent the multiword input as the following n × m bit matrix B^{n×m}:

x = ( x_0     )   ( [x]_{0,n−1}    . . .  [x]_{0,1}    [x]_{0,0}   )
    ( x_1     ) = ( [x]_{1,n−1}    . . .  [x]_{1,1}    [x]_{1,0}   )        (2)
    (  . . .  )   (    . . .               . . .          . . .    )
    ( x_{m−1} )   ( [x]_{m−1,n−1}  . . .  [x]_{m−1,1}  [x]_{m−1,0} )

We can now consider the columns of the bit matrix as parallel bit slices with no internal bit order, and say that a multiword mapping is a T-function if all the bits in column i of the matrix of output words can depend only on bits in columns 0 to i of the matrix of input words. In this interpretation it is still true that any composition of primitive operations is a multiword T-function, but some of the proven properties of univariate T-functions (e.g., that all the cycle lengths are powers of 2) are no longer true. An alternative definition of multiword T-functions is to concatenate all the input words into one long word, to concatenate all the output words into one long word, and then to use the standard univariate definition of a T-function in order to limit which input bits can affect which output bits. If we denote the l input words by x_u, x_v, . . ., then we define the single logical variable x by

x = (x_u, . . . , x_w) = ([x]_{n(l−1)+(n−1)}, . . . , [x]_{n(l−1)}, . . . , [x]_{n−1}, . . . , [x]_0).        (3)
Note that in this interpretation f(x) = (f_u, f_v) = (x_u + x_v, x_v) is a T-function, but the very similar f(x) = (f_u, f_v) = (x_u, x_u + x_v) is not a T-function, and thus we cannot compose primitive operations in an arbitrary way. On the other hand, we can obtain many new types of T-functions in which low-order words can be manipulated by non-primitive operations (such as cyclic rotation) before we use them to compute higher order output words. Our actual definition of multiword T-functions combines and generalizes these two possible interpretations. Let x be an nl × m bit matrix (B^{nl×m}):

( [x]_{0,n(l−1)+(n−1)}    . . .  [x]_{0,n(l−1)+1}    [x]_{0,n(l−1)}    . . .  [x]_{0,n−1}    . . .  [x]_{0,0}   )
( [x]_{1,n(l−1)+(n−1)}    . . .  [x]_{1,n(l−1)+1}    [x]_{1,n(l−1)}    . . .  [x]_{1,n−1}    . . .  [x]_{1,0}   )        (4)
(        . . .                       . . .                 . . .                 . . .                 . . .    )
( [x]_{m−1,n(l−1)+(n−1)}  . . .  [x]_{m−1,n(l−1)+1}  [x]_{m−1,n(l−1)}  . . .  [x]_{m−1,n−1}  . . .  [x]_{m−1,0} )

We consider it as an m × l matrix of n-bit words:

x = ( x_{0,u}    . . .  x_{0,w}   )
    (  . . .     . . .   . . .    )
    ( x_{m−1,u}  . . .  x_{m−1,w} )
We concatenate the l words in each row into a single logical variable, and then consider the collection of the m long variables as the inputs to the T-function.
Finally, we allow the bits in column i of the output matrix to depend only on bits in columns 0, . . . , i in the input matrix. To demonstrate this notion, consider the following mapping over 4-tuples of words:

f(x) = ( x_{0,u} + x_{1,u}                   x_{0,v} + x_{1,v} )
       ( x_{0,u} − x_{1,u}(x_{1,v} ≫ 1)      x_{0,v} ⊕ x_{1,v} )

This is a valid T-function under our general multiword definition even though it contains the non-primitive right shift operation ≫ 1.
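To see how the general definition admits non-primitive operations on low-order words, here is a toy check; the code and the particular update functions are our own illustration under the concatenated-variable interpretation, not an example from the paper. The two words x_u (high) and x_v (low) form one 2N-bit variable; the low word may feed into the high word through a right shift, and the result is still triangular in the concatenated variable.

```python
# Toy multiword T-function (ours): the low word is updated by a T-function of
# itself, while the high word may use arbitrary bits of the low word (here a
# right shift), which is allowed because the whole low word sits in lower
# columns of the concatenated state.

N = 5
MASK = (1 << N) - 1

def F(z):
    xu, xv = z >> N, z & MASK
    fv = (xv + ((xv * xv) | 5)) & MASK     # T-function of x_v alone
    fu = (xu + (xv >> 3)) & MASK           # x_v >> 3 is fine for the high word
    return (fu << N) | fv

# Triangularity of the concatenated 2N-bit variable: the low i+1 output bits
# are determined by the low i+1 input bits.
for i in range(2 * N):
    mod = 1 << (i + 1)
    low = {}
    for z in range(1 << (2 * N)):
        assert low.setdefault(z % mod, F(z) % mod) == F(z) % mod
```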
3  Bit Slice Analysis and Invertibility
The main tool we use in order to study the invertibility of T-functions is bit slice analysis. Its basic idea is to define the mapping from [x]_i to [f(x)]_i by abstracting out the complicated dependency on [x]_{0...i−1} via the notion of parameters. For example, the size of the explicit description of the mapping [f(x)]_i = φ([x]_0, . . . , [x]_i) in the function f(x) = x + (x^2 ∨ 5) grows exponentially with i, but it can be written as [f(x)]_i = [x]_i ⊕ α_i, where α_i is some function of [x]_0, . . . , [x]_{i−1} (that is, a parameter). By using this parametric representation we can easily prove the invertibility of the mapping by induction on i, since if we already know bits 0 to i − 1 of the input x and bit i of the output f(x), we can (in principle) calculate the value of α_i and thus derive in a unique way bit i of the input. Intuitively, this is the same technique we use in order to solve a triangular system of linear equations, except that in our case the explicit description of α_i can be extremely complicated, and thus we do not use this technique as a real inversion algorithm for f(x), but only in order to prove that this inverse is uniquely defined. The main observation in [5] was that such an abstract parametric representation can be easily derived for any composition of primitive operations by the following recursive definition, in which i can be any bit position except zero:

[xy]_0     = [x]_0 ∧ [y]_0
[x ± y]_0  = [x]_0 ⊕ [y]_0
[x ∘ y]_0  = [x]_0 ∘ [y]_0        for ∘ ∈ {⊕, ∧, ∨}
[xy]_i     = [x]_i [y]_0 ⊕ [x]_0 [y]_i ⊕ α_{xy}                                (5)
[x ± y]_i  = [x]_i ⊕ [y]_i ⊕ α_{x±y}
[x ∘ y]_i  = [x]_i ∘ [y]_i        for ∘ ∈ {⊕, ∧, ∨}
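The multiplication rule in (5) already shows why x^2 is a parameter in every bit slice above the lowest: for x·y with y = x the cross terms [x]_i [x]_0 and [x]_0 [x]_i cancel. A small script (ours, not the paper's) confirms this empirically.

```python
# Empirical check (ours): for i >= 1, bit i of x^2 does not depend on bit i
# of x, since the cross terms in the multiplication rule of (5) cancel;
# hence x^2 acts as a parameter in every bit slice above the lowest.

def bit(v, i):
    return (v >> i) & 1

n = 10
for x in range(1 << n):
    for i in range(1, n):
        y = x ^ (1 << i)                   # flip bit i of the input
        assert bit(x * x, i) == bit(y * y, i)
```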
To demonstrate this technique, consider our running example:

[x + (x^2 ∨ 5)]_0 = [x]_0 ⊕ [x^2 ∨ 5]_0 = [x]_0 ⊕ 1

and, for i > 0,

[x + (x^2 ∨ 5)]_i = [x]_i ⊕ ([x^2]_i ∨ [5]_i) ⊕ α_{x+(x^2∨5)}
                  = [x]_i ⊕ (([x]_i [x]_0 ⊕ [x]_0 [x]_i ⊕ α_{x^2}) ∨ [5]_i) ⊕ α_{x+(x^2∨5)} = [x]_i ⊕ α.

This invertibility test can be easily generalized to the multivariate case (2). Let us show an example of such a construction. We start from an arbitrary non-singular matrix which denotes a possible bit slice mapping, such as:

( 1  α )
( 0  1 )
6
Alexander Klimov and Adi Shamir
We can add to this linear mapping an affine part (where α, β, and γ are arbitrary parameters) and get the following bit slice structure:

( [x_0]_i )  →  ( [x_0]_i ⊕ α [x_1]_i ⊕ β )
( [x_1]_i )     ( [x_1]_i ⊕ γ             )                                (6)

It is easy to check that for i > 0 the i-th bit slice of the following mapping matches (6):

( x_0 )  →  ( x_0 + (x_0^2 ∧ x_1) )
( x_1 )     ( x_1 + x_0^2         )

Unfortunately, the least significant bit slice of this mapping is not invertible:

( [x_0]_0 )  →  ( [x_0]_0 ⊕ [x_0]_0 [x_1]_0 )
( [x_1]_0 )     ( [x_1]_0 ⊕ [x_0]_0         )

So we have to apply a little tweak to fix it:

( x_0 )  →  ( x_0 + ((x_0^2 ∧ x_1) ∨ 1) )
( x_1 )     ( x_1 + x_0^2               )

The reader may get the impression that the bit slice mappings of invertible functions are always linear. From (5) it is easy to see that every expression which uses only ⊕, +, − and × has a linear i-th bit slice, but in general this is not true.
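For small word sizes the effect of the tweak can be verified exhaustively. The following brute-force sketch is ours (not from the paper): it counts distinct outputs over all input pairs, so a full-size image certifies a bijection.

```python
# Brute force (ours): compare the image sizes of the untweaked and tweaked
# mappings over all pairs of n-bit words; a bijection has a full-size image.

def image_size(f0, f1, n):
    mask = (1 << n) - 1
    outs = {(f0(x0, x1) & mask, f1(x0, x1) & mask)
            for x0 in range(1 << n) for x1 in range(1 << n)}
    return len(outs)

n = 5
full = (1 << n) ** 2
assert image_size(lambda a, b: a + ((a * a & b) | 1),
                  lambda a, b: b + a * a, n) == full       # tweaked: invertible
assert image_size(lambda a, b: a + (a * a & b),
                  lambda a, b: b + a * a, n) < full        # untweaked: not
```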
4  The Single Cycle Property
A T-function has the single cycle property if its repeated application to any initial state goes through all the possible states. Let us recall the basic results from [6] in the univariate case. Invertibility is a prerequisite of the single cycle property. If a T-function has a single cycle modulo 2^k then it has a single cycle modulo 2^{k−1}. If a T-function has a cycle of length l modulo 2^{k−1} then modulo 2^k it has either a cycle of length 2l or two cycles of length l. Taking into account the fact that modulo 2^1 a function has either one cycle of length two or two cycles of length one, we can conclude that the size of any cycle of a T-function is always a power of 2. In the univariate case a bit slice of an invertible T-function has the form [f(x)]_i = [x]_i ⊕ α. From (5) it follows that f(x) has one of the following forms: f_1(x) = x ⊕ r_1(x), f_2(x) = x + r_2(x) or f_3(x) = x·r_3(x), where the r_i are parameters (in the case of multiplication we additionally need [r_3]_0 = 1). It is easy to see that [f_3(x)]_0 = [x]_0 [r_3(x)]_0 = [x]_0, that is, it has two cycles modulo 2 and so it cannot form a single cycle modulo 2^n. So, a single cycle function has either^1 the first or the second form. In order to analyse the cycle structure of these forms the following definitions of even and odd parameters

^1 Note that there is no exclusive or here since every function can be represented in both forms, for example x + 1 = x ⊕ (x ⊕ (x + 1)).
were introduced. Suppose that r(x) is a parameter, that is, r(x) = r(x + 2^{n−1}) (mod 2^n). So, r(x) = r(x + 2^{n−1}) + 2^n b(x) (mod 2^{n+1}). Consider

B[r, n] = 2^{−n} Σ_{i=0}^{2^{n−1}−1} (r(i + 2^{n−1}) − r(i))  (mod 2)  =  Σ_{i=0}^{2^{n−1}−1} b(i)  (mod 2).        (7)
The parameter is called even if B[r, n] is always zero, and odd if B[r, n] is always one.^2 Let us give several examples of even parameters:

– r(x) = C, where C is an arbitrary constant (r(x) = r(x + 2^{n−1}) and so b = 0 and B = 0);
– r(x) = 2x (r(x + 2^{n−1}) = r(x) + 2^n (mod 2^{n+1}), so b(x) = 1 and B is even as long as 2^{n−1} is even, that is, for n ≥ 2);
– r(x) = x^2 (r(x + 2^{n−1}) = r(x) + 2^n x + 2^{2(n−1)}, so b(x) = [x]_0 and B is even for n ≥ 3);
– r(x) = 4g(x), where g(x) is an arbitrary T-function (r(x + 2^{n−1}) − r(x) = 4(g(x + 2^{n−1}) − g(x)) = 0 (mod 2^n));
– r(x) = r′(x) ± r″(x) or r′(x) ⊕ r″(x), where r′ and r″ are simultaneously even or odd parameters (B = B′ ⊕ B″);
– r(x) = r′(x) ∨ C, where C is an arbitrary constant and r′(x) is an even parameter (if [C]_i = 0 then [r(x)]_i = [r′(x)]_i, and if [C]_i = 1 then [r(x)]_i = [C]_i, so in both cases [r(x)]_i is the same as for some even parameter).

The following theorem was proved in [6]:

Theorem 2. Let N_0 be such that x → x + r(x) mod 2^{N_0} defines a single cycle and for n > N_0 the function r(x) is an even parameter. Then the mapping x → x + r(x) mod 2^n defines a single cycle for all n.

We can use our running example of f(x) = x + (x^2 ∨ C) to demonstrate this theorem. If the binary form of C ends with 1, an arbitrary bit, and then 1, then C = 5, 7 (mod 8), and x^2 ∨ C = C (mod 8) is an odd constant modulo 2^3, so x + C has a single cycle modulo 2^3. In addition, x^2 is an even parameter for n ≥ 3, and this is not affected by ‘or’ing it with an arbitrary constant. In [6] it was shown that x → x + (x^2 ∨ C) is the smallest nonlinear expression which defines a single cycle; in other words, there is no nonlinear expression which defines a single cycle and consists of less than three operations. Another important class of single cycle mappings is f(x) = 1 + x + 4g(x) for an arbitrary T-function g(x). It turns out that x86 microprocessors have an instruction which allows us to calculate 1 + x + 4y with a single instruction,^3 and thus the single cycle mapping f(x) = 1 + x + 4x^2 can be calculated using only two instructions.

^2 Note that in the general case B is a function of n, and thus the parameter can be neither even nor odd. We often relax these definitions by allowing exceptions for small n such as 1 or 2.
^3 The lea (load effective address) instruction makes it possible to calculate any expression of the form C + R1 + kR2, where C is a constant, R1 and R2 are registers and k is a power of two. Its original purpose was to simplify the calculation of addresses in arrays.

Odd parameters are less common and harder to construct. Their main application is in mappings of the form x ⊕ r(x):

Theorem 3. Let N_0 be such that x → x ⊕ r(x) mod 2^{N_0} defines a single cycle and for n > N_0 the function r(x) is an odd parameter. Then the mapping x → x ⊕ r(x) mod 2^n defines a single cycle for all n.

Proof. Let us prove this by induction: suppose that the mapping defines a single cycle modulo 2^n; we are going to prove that it defines a single cycle modulo 2^{n+1}. Since there are only two possible cycle structures modulo 2^{n+1} (a single cycle of size 2^{n+1} or two cycles of size 2^n) we will prove that the second case is impossible, that is, [x^{(0)}]_n ≠ [x^{(2^n)}]_n for at least one x. Recall that [x]_n is the most significant bit of x modulo 2^{n+1}. Let x^{(0)} = 0. Since r is a parameter it follows that r(i) = r(i + 2^n) (mod 2^{n+1}), and so

[x^{(2^n)}]_n = [r(x^{(0)}) ⊕ . . . ⊕ r(x^{(2^n−1)})]_n = ⊕_{i=0}^{2^n−1} [r(i)]_n =
⊕_{i=0}^{2^{n−1}−1} ([r(i + 2^{n−1})]_n ⊕ [r(i)]_n) = ⊕_{i=0}^{2^{n−1}−1} [r(i + 2^{n−1}) − r(i)]_n = 1.
Here we use the fact that [r(i + 2^{n−1})]_n − [r(i)]_n = [r(i + 2^{n−1}) − r(i)]_n.

Recently the related notions of measure preservation and ergodicity of compatible functions over p-adic numbers were independently studied by Anashin [1].^4 His motivation was mathematical rather than cryptographic, and he used different techniques. In order to study if a T-function is invertible (respectively, has the single cycle property) he tried to represent it as f(x) = d + cx + p·v(x) (respectively, f(x) = c + rx + p·(v(x + 1) − v(x))), or to represent it as a Mahler interpolation series, or to use the notion of uniform differentiability. The first characterization is the most general (that is, v(x) can be any T-function) and complete (he proved that every invertible function (respectively, every single cycle function) can be represented in this form). For example, it follows that there exists v_f(x) such that f(x) = x + (x^2 ∨ 5) = 1 + x + 2(v_f(x + 1) − v_f(x)), and in order to prove that f(x) defines a single cycle it is enough to find such a function v_f(x). This example shows that this criterion is not that good in practice for checking properties of a given function, but it allows us to construct arbitrarily complex functions with the needed properties. For the second approach, any function f can be represented as a Mahler interpolation series Σ_{i=0}^{∞} a_i x(x−1)···(x−i+1)/i!. It turns out, for example, that a T-function is invertible if and only if ‖a_1‖_2 = 1 and ‖a_i‖_2 ≤ 2^{−⌊log_2 i⌋−1} for i = 2, 3, . . .. This is used to prove theoretical results similar to the previous one, but once again for practical purposes it is usually hard to represent a given function as a Mahler series. The uniformly differentiable^5 functions allow us to use Hensel lifting. However, there are several obstacles to the application of this theorem in practice. First of all, it is not always easy to tell if a given function is uniformly differentiable. Consider, for example, x + (x^2 ∨ 5): ∨ is not a differentiable operation, but according to [2] in this particular case (x + (x^2 ∨ 5))′ = (x + x^2 + 5 − (x^2 ∧ 5))′ = 1 + 2x + 2x·(u ∧ 5)′|_{u=x^2}, and (u ∧ 5)′ = 0 for ‖h‖_2 ≤ 1/8, so the whole expression is uniformly differentiable. Unfortunately, this trick does not work for the general case x + (x^2 ∨ C). Moreover, there are invertible mappings which are not uniformly differentiable. The second obstacle is how to find N_0. In the very restricted case of polynomials (expressions which use only +, − and ×) it is possible to calculate this number in advance, but this is not possible even if we add only ⊕: we found a family of functions f_i(x) (which use only +, −, and ⊕) such that for every N there is N_0 ≥ N and i_{N_0} such that f_{i_{N_0}} has a single cycle modulo 2^{N_0} but not modulo 2^{N_0+1}. In particular, if f_0 = x − 1 and f_i = ((f_{i−1}(x) ⊕ x) + x) ⊕ (x + x), then N_0 = i + 2 for i = 2^r. Both Anashin’s techniques and our techniques can be used to completely characterize all the univariate polynomial mappings modulo 2^n which are invertible with a single cycle:

^4 This could be translated to our terminology as follows: measure preservation — invertibility, ergodicity — the single cycle property, compatible — T-function. To simplify reading we will continue to use our terminology, but an interested reader who refers to his paper should keep this “dictionary” in mind.
^5 It is exactly the same concept as the usual notion of uniform differentiability of real functions, but with respect to the p-adic distance.

Theorem 4. A polynomial P(x) = Σ_{i=0}^{d} a_i x^i is invertible modulo any 2^n if and only if it is invertible modulo 4, and it has a single cycle modulo any 2^n if and only if it has a single cycle modulo 8.^6

Proof. From (5) it follows that [P(x)]_0 = [a_0]_0 ⊕ [a_1 + · · · + a_d]_0 [x]_0 and, for i > 0, [P(x)]_i = [a_1]_0 [x]_i ⊕ (⊕_{odd k≥3} [a_k]_0) [x]_0 [x]_i ⊕ α. Since a T-function is invertible if and only if each bit slice is invertible, the following conditions are necessary and sufficient for the invertibility of a polynomial: [a_1 + · · · + a_d]_0 = 1, [a_1]_0 = 1 and ⊕_{odd k≥3} [a_k]_0 = 0.
In order to prove the single cycle property let us represent the polynomial in the form P(x) = x + P̃(x). Since in order to generate a single cycle P(x) should be invertible, it follows that the coefficients ã_i of P̃ satisfy [ã_1 + · · · + ã_d]_0 = 0, [ã_1]_0 = 0 and ⊕_{odd k≥3} [ã_k]_0 = 0. Now we need to prove that P̃(x) is an even parameter, that is, that Σ_{i=0}^{2^{n−1}−1} b(i) = 2^{−n} Σ_{i=0}^{2^{n−1}−1} (P̃(i + 2^{n−1}) − P̃(i)) (mod 2) = 0. Let us do it separately for the different parts of P̃. It is easy to show that ã_0, that ã_1 x with even ã_1, and that ã_k x^k for even k are even parameters for n ≥ 3. Let us prove that the sum of odd powers is also an even parameter if ⊕_k [ã_k]_0 = 0 and n ≥ 3:

2^{−n} Σ_{i=0}^{2^{n−1}−1} Σ_{odd k≥3} (ã_k (i + 2^{n−1})^k − ã_k i^k)
= 2^{−n} Σ_i Σ_k ã_k (k i^{k−1} 2^{n−1} + (k(k−1)/2) i^{k−2} 2^{2(n−1)} + · · ·) = 0 (mod 2).

So, if P(x) defines a single cycle modulo 8 then it defines a single cycle modulo any 2^n. On the other hand, if P(x) does not define a single cycle modulo 8 then it does not define a single cycle modulo any 2^n ≥ 8.
It is easy to verify that exactly 1/8 of all the polynomials are invertible and 1/64 of all the polynomials have a single cycle, and thus it is easy to pick random polynomials with these properties. Typical examples of quadratic single cycle polynomials are f(x) = (x + 1)(2x + 1) and f(x) = 6x^2 − x + 1.
⁶ See [5] for details.
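Theorem 4 suggests a cheap recipe: test a candidate polynomial modulo 8 and trust the lifting. A brute-force sketch of this check (exhaustive cycle search, so it only scales to small n):

```python
def cycle_len(g, n, start=0):
    """Length of the cycle of g modulo 2^n containing `start`."""
    mask = (1 << n) - 1
    x, steps = g(start) & mask, 1
    while x != start:
        x, steps = g(x) & mask, steps + 1
    return steps

# The two quadratic examples from the text: single cycle modulo 8,
# and indeed a single cycle modulo every 2^n we test.
f1 = lambda x: (x + 1) * (2 * x + 1)
f2 = lambda x: 6 * x * x - x + 1
for n in range(1, 11):
    assert cycle_len(f1, n) == 2**n
    assert cycle_len(f2, n) == 2**n

# A polynomial that fails the modulo-8 test fails everywhere:
# g(x) = x + 2x^2 fixes 0, so its cycle through 0 never grows.
g = lambda x: x + 2 * x * x
assert cycle_len(g, 3) == 1 and cycle_len(g, 10) == 1
```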
10
Alexander Klimov and Adi Shamir
Our goal now is to construct invertible multiword T-functions whose iteration defines a single cycle. Let us start with T-functions of type (2), which map m parallel input words to m parallel output words. We would like to construct a mapping

(x_0, …, x_{m−1}) → (f_0(x_0, …, x_{m−1}), …, f_{m−1}(x_0, …, x_{m−1}))
which is invertible and has the maximum possible period of 2^{mn}. Several simple constructions can easily be shown to be impossible. For example:

Theorem 5. No T-mapping of the form (x_0, x_1) → (f_0(x_0), f_1(x_0, x_1)) can have a period of 2^{2n}.

Proof. Suppose there is a mapping (x_0, x_1) → (f_0(x_0), f_1(x_0, x_1)), where f_0 and f_1 are T-functions, such that it has a period of size P = 2^{2i} modulo 2^i. This means that, for example, f^{(P)}(0, 0) = (0, 0) and f^{(p)}(0, 0) ≠ (0, 0) for all p < P. Let p = 2^{2(i−1)}. Since the period of the mapping modulo 2^{i−1} is at most 2^{2(i−1)}, it follows that f^{(p)}(0, 0) = (0, 0) if and only if the same holds for the most significant bits. But since f_0 depends only on x_0, the period of f_0 modulo 2^i is at most 2^i, and so (for sufficiently large i) the most significant bit of f_0^{(p)} = f_0^{(2p)} = f_0^{(3p)}; and since the most significant bit of f_1 can assume only two possible values, it follows that either f^{(p)} = f^{(2p)} or f^{(p)} = f^{(3p)}, which is a contradiction.

Also it can be shown that it is impossible to obtain a single cycle function of this type from slice-linear mappings with m > 2, which are T-functions of the following form: [f_j]_i = ⊕_{k=0}^{m−1} [C_{j,k}]_i [x_k]_i ⊕ α_{j,i}, where (C_{j,k}) is a constant invertible matrix.

Note that the construction is trivial in the two extreme cases of m = 1 and n = 1. In the univariate case (m = 1) the answer is given by Theorem 3: x → x ⊕ α, where α is an odd parameter. The case of n = 1 is also simple because every function is then a T-function, and it is easy to define a counting transformation such as f_i = x_i ⊕ (x_0 ∧ ··· ∧ x_{i−1}) which goes through all the states in the natural order 0…00 → 0…01 → 0…10 → ··· → 1…11 → 0…00. Let us combine these two cases:

f_i(x_0, …, x_{m−1}) = x_i ⊕ (α_i(x_0, …, x_{m−1}) ∧ x_0 ∧ ··· ∧ x_{i−1}),    (8)

where each α_i is an odd parameter, that is, for every n

⊕_{(x_0,…,x_{m−1})=(0,…,0)}^{(2^n−1,…,2^n−1)} [α_i(x_0, …, x_{m−1})]_n = 1

and [α_i]_0 = 1. We can now prove:

Theorem 6. The mapping defined by (8) defines a single cycle of length 2^{mn}.
New Cryptographic Primitives Based on Multiword T-Functions
11
Proof. Let us prove it by induction. For n = 1 we know that it is true. Suppose that the mapping defines a single cycle modulo 2^n and let us prove the same modulo 2^{n+1}. Suppose without loss of generality that (x_0, …, x_{m−1})^{(0)} = (0, …, 0). From the induction hypothesis it follows that {(x_0, …, x_{m−1})^{(i)} mod 2^n}_{i=0}^{2^{mn}} contains all the possible tuples, so [x_0^{(2^{mn})}]_n = [x_0^{(0)}]_n ⊕ 1 (since the XOR of [α_0]_n over all tuples equals 1), and more generally [x_0^{(l+2^{mn})}]_n = [x_0^{(l)}]_n ⊕ 1, so

⊕_{(x_0,…,x_{m−1})=(0,…,0)}^{(2^{n+1}−1, 2^n−1, …, 2^n−1)} [α_1(x_0, …, x_{m−1})]_n ∧ [x_0]_n = ⊕_{(x_0,…,x_{m−1})=(0,…,0)}^{(2^n−1, 2^n−1, …, 2^n−1)} [α_1(x_0, …, x_{m−1})]_n.

So, if we consider the next variable x_1: [x_1^{(2^{mn+1})}]_n = [x_1^{(0)} ⊕ (α_1 ∧ x_0)]_n = [x_1^{(0)}]_n ⊕ 1. Using similar arguments, we can prove that [x_{m−1}^{(2^{mn+(m−1)})}]_n = [x_{m−1}^{(0)}]_n ⊕ 1, that is, modulo 2^{n+1} the period is 2^{mn+m} = 2^{m(n+1)}.
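The n = 1 base case of this induction is just the m-bit binary counter f_i = x_i ⊕ (x_0 ∧ ··· ∧ x_{i−1}); a quick sketch confirming that it walks through all 2^m states in the natural order:

```python
def step(bits):
    """One application of the counting transformation to a bit tuple
    (x_0, ..., x_{m-1}), x_0 being the least significant bit."""
    carry = 1  # the empty conjunction for i = 0
    out = []
    for b in bits:
        out.append(b ^ carry)
        carry &= b  # AND of x_0 ... x_i for the next position
    return tuple(out)

m = 5
state = (0,) * m
seen = []
for _ in range(2**m):
    seen.append(sum(b << i for i, b in enumerate(state)))
    state = step(state)
assert seen == list(range(2**m))  # natural counting order
assert state == (0,) * m          # wraps around to 0...00
```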
In order to use this theorem we need an odd parameter. We know that f(x) = x + r(x) defines a single cycle if r(x) is an even parameter. We also know that every such function can be represented as f(x) = x ⊕ s(x), where s(x) is an odd parameter. So, for any even parameter r(x) the expression s(x) = (x + r(x)) ⊕ x is an odd parameter. For example, if r(x) = x^2 ∨ 5, then s(x) = (x + (x^2 ∨ 5)) ⊕ x is an odd parameter. Note that if f(x) is invertible then [r(x)]_0 = 1 and so [s(x)]_0 = [r(x)]_0 = 1. To construct an odd parameter with m variables we need the following lemma:

Lemma 1.

⊕_{t=0}^{2^n−1} α(t) = ⊕_{(x_0,…,x_{m−1})=(0,…,0)}^{(2^n−1,…,2^n−1)} α(x_0 ∧ ··· ∧ x_{m−1})

Proof. In order to prove the lemma it is sufficient to prove that for every t there is an odd number of tuples (x_0, …, x_{m−1}) such that x_0 ∧ ··· ∧ x_{m−1} = t. In fact, we can directly calculate this number: if t contains k zeros then there are (2^m − 1)^k such tuples, which is an odd number.

For example, the following mapping defines a single cycle for any m and n:

f_i(x_0, …, x_{m−1}) = x_i ⊕ (s(x_0 ∧ ··· ∧ x_{m−1}) ∧ x_0 ∧ ··· ∧ x_{i−1}),    (9)
where s(t) = (t + (t^2 ∨ 5)) ⊕ t. Note that to evaluate this mapping for each particular (x_0, …, x_{m−1}) we need one squaring, one addition and 3m bitwise operations, or 3m + 2 operations in total. Unfortunately, this mapping has the property that during an update each bit of x_{m−1} is changed with probability 2^{−(m+1)} instead of 1/2 as expected for a random mapping. To avoid this we can add any even parameter⁷ with zero in the least significant bit, simplify s(t), and obtain, for example,

⁷ Note that in the multivariate case every parameter which does not use all the variables is even.
the following mapping:

(x_0, x_1, x_2, x_3) → (x_0 ⊕ s ⊕ (x_1^2 ∧ M), x_1 ⊕ (s ∧ a_0) ⊕ (x_2^2 ∧ M), x_2 ⊕ (s ∧ a_1) ⊕ (x_3^2 ∧ M), x_3 ⊕ (s ∧ a_2) ⊕ (x_0^2 ∧ M)),    (10)

where a_0 = x_0, a_1 = a_0 ∧ x_1, a_2 = a_1 ∧ x_2, a_3 = a_2 ∧ x_3, s = (a_3 + C) ⊕ a_3, C is any odd constant and M = 1…1110_2. Or we can enhance the inter-variable mixing with the following mapping:

(x_0, x_1, x_2, x_3) → (x_0 ⊕ s ⊕ 2x_1x_2, x_1 ⊕ (s ∧ a_0) ⊕ 2x_2x_3, x_2 ⊕ (s ∧ a_1) ⊕ 2x_3x_0, x_3 ⊕ (s ∧ a_2) ⊕ 2x_0x_1)    (11)

During the iteration of these mappings each bit is changed in approximately one half of the cases. Each mapping requires 24 operations to obtain a cycle of size 2^{256}.

We next analyse mappings of type (3), which consider multiple words as concatenated parts of a single multi-precision logical variable x, for example, x = 2^{3n}x_a + 2^{2n}x_b + 2^n x_c + x_d. In this case our running example x → x + (x^2 ∨ C) can be represented as (x_a, x_b, x_c, x_d) → (f_a(x_a; x_b, x_c, x_d), …, f_d(x_d)) with appropriate f_a, f_b, f_c and f_d, but the squaring of a 4n-bit word would be rather inefficient on an n-bit processor. Note that there is no requirement that the whole mapping x → f(x) has a simple interpretation as a mapping of a 4n-bit word, although we need such an interpretation to prove the single cycle property. Let us start with the simplest single cycle mapping x → x + 1. It can be implemented as follows: (x_a, x_b, x_c, x_d) → (x_a + κ_b, x_b + κ_c, x_c + κ_d, x_d + 1), where κ_d is the carry (overflow) from x_d, κ_c is the carry from x_c, et cetera. Many contemporary processors (including x86 and SPARC) have an operation adc (addition with carry) (x, y, c) → (z, c′), where z = (x + y + c) mod 2^n and c′ = (x + y + c ≥ 2^n), so the addition of κ can be done at no additional cost⁸ if the mapping has the following form: (x_a, x_b, x_c, x_d) → (x_a + f_a + κ_b, x_b + f_b + κ_c, x_c + f_c + κ_d, x_d + f_d + 1). The only restriction on each f is that it should be an even parameter. For example,

(x_d, x_c, x_b, x_a) → (x_d + (s_d^2 ∨ C_d), x_c + (s_c^2 ∨ C_c) + κ_d, x_b + (s_b^2 ∨ C_b) + κ_c, x_a + (s_a^2 ∨ C_a) + κ_b),    (12)
⁸ This is not always true. For example, on the Intel Pentium 4 processor the ordinary add instruction has a latency (the number of clock cycles required for the execution core to complete the execution of all of the µops that form an IA-32 instruction) of 0.5 and a throughput (the number of clock cycles required to wait before the issue ports are free to accept the same instruction again) of 0.5, but adc has a latency of 8 and a throughput of 3 [4].
where s_d = x_d, s_c = s_d ⊕ x_c, s_b = s_c + x_b, s_a = s_b ⊕ x_a; C_a, C_b, C_c are odd constants⁹ and C_d ends with 1{0,1}1_2 (i.e., C_d ≡ 5 or 7 (mod 8)). This mapping uses 15 operations to obtain a cycle of size 2^{256}, so it can be much faster than the previous example, but only on processors that have the adc instruction. If this instruction has to be emulated,¹⁰ this mapping can be slower. Note that almost all high-level programming languages lack such an instruction, so it has to be emulated or written in assembly language. The number of operations is not usually a good predictor of the speed of an implementation, since on modern microprocessors several operations can be done in parallel, different operations use different numbers of clock cycles, etc.

We should take into account that an update function is not the complete generator. There is also an output function which produces the output bits given the internal state. Since the least significant bits of the state are repeated in small cycles, the simplest output function is the one that gives away the most significant bits of the state. It seems that it is easier to implement such a function for the second mapping, since x_a and x_b form the most significant part of the state, but their least significant bits are also weak, since they depend only on the least significant bits of the x_i and a single-bit carry.¹¹ One solution is to give away the most significant part of x_a (32 bits), or to use a more sophisticated output function and give away more bits, thus raising the ratio of output bits per operation. For example, O_1 = ((x_a ≫ n/2) ⊕ x_b)(((x_c ≫ n/2) ⊕ x_d) ∨ 1), where the shift ≫ could be substituted with a circular rotation if the target microprocessor supports it, uses only five operations and doubles the size of the output. It can be shown that O_1 has maximal period and produces each possible value with the same probability (since it is an invertible mapping of x_b for fixed x_a, x_c and x_d).
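The bijectivity claim for O_1 is easy to check exhaustively at a toy word size (a sketch with n = 8; the particular values of x_a, x_c, x_d are arbitrary): XORing x_b with a constant and multiplying by an odd constant modulo 2^n are both invertible.

```python
n = 8
mask = (1 << n) - 1

def O1(xa, xb, xc, xd):
    # ((x_a >> n/2) XOR x_b) * (((x_c >> n/2) XOR x_d) OR 1) mod 2^n
    return ((((xa >> (n // 2)) ^ xb) * (((xc >> (n // 2)) ^ xd) | 1))
            & mask)

for (xa, xc, xd) in [(0x12, 0x34, 0x56), (0xFF, 0x00, 0x01)]:
    outputs = {O1(xa, xb, xc, xd) for xb in range(2**n)}
    assert len(outputs) == 2**n  # every value hit exactly once
```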
Note that we give here only examples of the possible mappings and output functions, but in order to construct a secure stream cipher they have to be subjected to a lengthy and thorough cryptanalysis.
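The single cycle claim for a mapping of the shape of (11) can also be checked exhaustively at toy word sizes (a sketch, not the proposed n = 64 cipher; C = 1 is an arbitrary odd constant):

```python
def step(state, n, C=1):
    """One update of a mapping of the shape of (11) on four n-bit words."""
    mask = (1 << n) - 1
    x0, x1, x2, x3 = state
    a0 = x0
    a1 = a0 & x1
    a2 = a1 & x2
    a3 = a2 & x3
    s = ((a3 + C) & mask) ^ a3
    y0 = x0 ^ s ^ ((2 * x1 * x2) & mask)
    y1 = x1 ^ (s & a0) ^ ((2 * x2 * x3) & mask)
    y2 = x2 ^ (s & a1) ^ ((2 * x3 * x0) & mask)
    y3 = x3 ^ (s & a2) ^ ((2 * x0 * x1) & mask)
    return (y0 & mask, y1 & mask, y2 & mask, y3 & mask)

def period(n):
    """Cycle length through the all-zero state."""
    start = (0, 0, 0, 0)
    state, steps = step(start, n), 1
    while state != start:
        state, steps = step(state, n), steps + 1
    return steps

for n in (3, 4):
    assert period(n) == 2**(4 * n)  # one cycle through all 2^{mn} states
```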
5 Experimental Results
We tested the actual execution speed of our mappings on an IA-32 machine. We used a standard PC with a 1.7 GHz Pentium 4 processor with 256 KB of cache. In each experiment we encrypted one gigabyte of data by encrypting 10^4 times a buffer which is 10^5 bytes long (this ensured that the data was prefetched into the L2 cache). The tests were done on an otherwise idle Linux system. To calculate

⁹ Note that the only purpose of C_a, C_b and C_c is to make parameters out of the x^2 terms, that is, to mask the least significant bit. The evenness of the parameter is guaranteed since n from (7) is larger than 3.
¹⁰ There are basically two ways to emulate it: check if a + b ≥ a or a + b ≥ b for unsigned a and b, or, usually faster since conditions break the pipeline, to set n = 63 and use the most significant bit of the sum as κ. The second approach adds two operations for each addition of κ and gives 21 operations overall with a reduced word size.
¹¹ It is possible to change the definitions of s_a, s_b, s_c, s_d to incorporate right shifts or rotations, but this will also increase the calculation time.
the overhead produced by memory access we "encrypted" the buffer by XORing it with a 32-bit constant. This took 0.42 seconds, and we did not subtract it from our actual results.

In this paper we propose a general approach rather than a concrete stream cipher, and thus in our performance tests we experimented with many different state sizes, update mappings and output functions. As a typical example, we used 256-bit states defined as four words of length n = 64 and updated them by (11). We wanted to use a simple output function which is just the top half of each x_i, but we discovered that this variant is vulnerable to a (theoretical) attack in which the attacker guesses the 17 least significant bits of each x_i (i.e., a total of 68 guessed bits) and searches in 2^{34} bytes of data for places where two of the x_i end with 17 zeros and thus their product ends with 34 zeros. To protect against this attack, we slightly modified the state update function to:

(x_0, x_1, x_2, x_3) → (x_0 ⊕ s ⊕ 2(x_1 ∨ C_1)x_2, x_1 ⊕ (s ∧ a_0) ⊕ 2x_2(x_3 ∨ C_3), x_2 ⊕ (s ∧ a_1) ⊕ 2(x_3 ∨ C_3)x_0, x_3 ⊕ (s ∧ a_2) ⊕ 2x_0(x_1 ∨ C_1)),    (13)

where C_1 and C_3 are constants with several ones in the least significant half, for example, C_1 = 12481248_{16} and C_3 = 48124812_{16}. To take advantage of the SSE2 instruction set of Pentium 4 processors (which contains 128-bit integer instructions that allow us to operate on two 64-bit integers simultaneously), we run two generators ((x_0, x_1, x_2, x_3) and (x_0′, x_1′, x_2′, x_3′)) in parallel, and the output is generated by (x_0 ≫ 32) ⊕ x_1, (x_2 ≫ 32) ⊕ x_3, (x_0′ ≫ 32) ⊕ x_1′ and (x_2′ ≫ 32) ⊕ x_3′. Since each command operates on a pair (x_i, x_i′) we can run two generators instead of one at no additional cost. With this optimization, the complete encryption operation required only 1.56 seconds per gigabyte, and thus the experimentally verified encryption speed was approximately 5.13 gigabits/second.
To put our results in perspective, the reader should consider the RC4 stream cipher, which is one of the fastest software-oriented ciphers available today. The web site of Crypto++ contains benchmarks for highly optimized implementations of many well known cryptographic algorithms [3]. For RC4 it quotes a speed of 110 megabytes per second on a 2.1 GHz Pentium 4 processor. This can be scaled to 1000/110 × 2.1/1.7 ≈ 11.2 seconds required to encrypt a gigabyte of data on a 1.7 GHz processor, which is almost an order of magnitude slower than the speed we obtain with our approach.
References

[1] V. Anashin, "Uniformly Distributed Sequences of p-adic Integers, II". Available from http://www.arxiv.org/ps/math.NT/0209407.
[2] V. Anashin, private communication.
[3] Crypto++ 5.1 Benchmarks: http://www.eskimo.com/~weidai/benchmarks.html.
[4] "IA-32 Intel Architecture Optimization Reference Manual". Available from http://www.intel.com/design/pentium4/manuals/248966.htm.
[5] A. Klimov and A. Shamir, "A New Class of Invertible Mappings", CHES 2002.
[6] A. Klimov and A. Shamir, "Cryptographic Applications of T-functions", SAC 2003.
Towards a Unifying View of Block Cipher Cryptanalysis

David Wagner
University of California, Berkeley
Abstract. We introduce commutative diagram cryptanalysis, a framework for expressing certain kinds of attacks on product ciphers. We show that many familiar attacks, including linear cryptanalysis, differential cryptanalysis, differential-linear cryptanalysis, mod n attacks, truncated differential cryptanalysis, impossible differential cryptanalysis, higher-order differential cryptanalysis, and interpolation attacks can be expressed within this framework. Thus, we show that commutative diagram attacks provide a unifying view into the field of block cipher cryptanalysis. Then, we use the language of commutative diagram cryptanalysis to compare the power of many previously known attacks. Finally, we introduce two new attacks, generalized truncated differential cryptanalysis and bivariate interpolation, and we show how these new techniques generalize and unify many previous attack methods.
1 Introduction
How do we tell if a block cipher is secure? How do we design good ciphers? These two questions are central to the study of block ciphers, and yet, after decades of research, definitive answers remain elusive. For the moment, the art of cipher evaluation boils down to two key tasks: we strive to identify as many novel cryptanalytic attacks on block ciphers as we can, and we evaluate new designs by how well they resist known attacks. The research community has been very successful at this task. We have accumulated a large variety of different attack techniques: differential cryptanalysis, linear cryptanalysis, differential-linear attacks, truncated differential cryptanalysis, higher-order differentials, impossible differentials, mod n attacks, integrals, boomerangs, sliding, interpolation, the yo-yo game, and so on. The list continues to grow. Yet, how do we make sense of this list? Are there any common threads tying these different attacks together?

In this paper, we seek unifying themes that can put these attacks on a common foundation. We have by no means accomplished such an ambitious goal; rather, this paper is intended as a first step in that direction. In this paper, we show how a small set of ideas can be used to generate many of today's known attacks. Then, we show how this viewpoint allows us to compare the strength of different types of attacks, and possibly to discover new attack techniques. We hope this perspective will be of some interest, if only to see a different way to think about the known cryptanalytic attacks on block ciphers.

* This work is supported by NSF ITR CNS-0113941.

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 16–33, 2004.
© International Association for Cryptologic Research 2004

[Fig. 1: a commutative square with sets A, B, X, Y and maps f : A → B, g : A → X, h : B → Y, i : X → Y]
Fig. 1. An example of a commutative diagram. The intended meaning of this diagram is that h ∘ f = i ∘ g
2 Background
What is a block cipher? A block cipher is a map E : K × M → M such that E_k is invertible for every key k ∈ K, and both E_k and E_k^{−1} can be efficiently computed. The set M is the space of texts; for instance, for AES it is M = {0, 1}^{128}.

When is a block cipher secure? A block cipher is secure if it behaves as a pseudorandom permutation. In other words, it must be secure against distinguishing attacks: no efficient algorithm A given interactive access to encryption and decryption black boxes should be able to distinguish the real cipher (i.e., E_k and E_k^{−1}) from a truly random permutation (i.e., π and π^{−1}, where π is uniformly distributed on the set of all permutations on M) with non-negligible advantage. The distinguishing advantage of an attack A is given by Adv A = Pr[A^{E_k, E_k^{−1}} = 1] − Pr[A^{π, π^{−1}} = 1]. In this paper, we focus exclusively on distinguishing attacks. Usually, once a distinguishing attack is found, a key-recovery attack soon follows; the hard part is finding a distinguishing attack in the first place, or building a cipher secure against distinguishing attacks.

How are block ciphers built? Most block ciphers are product ciphers. In other words, the cipher is built as the composition of individual round transformations: we choose a round function f : M → M, compute a sequence of round keys k_1, …, k_n as a function of the key k, and set E_k = f_{k_n} ∘ ··· ∘ f_{k_1}. The function f computes one round of the cipher.

Commutative Diagrams. In the discussion to follow, it will be useful to introduce a concept from abstract algebra: that of commutative diagrams. Commutative diagrams are a concise notation for expressing functional composition properties. An example of a commutative diagram can be found in Fig. 1. In this example, the symbols A, B, X, Y represent sets, and the symbols f, g, h, i are functions with signatures f : A → B, g : A → X, h : B → Y, and i : X → Y. We say that the diagram "commutes" if h ∘ f = i ∘ g, or in other words, if
h(f(a)) = i(g(a)) for all a ∈ A. Notice how paths correspond to functions, obtained by composing the maps associated with each edge in the path. In this diagram, there are two paths from A to Y, corresponding to two functions with signature A → Y. Informally, the idea is that it doesn't matter which path we follow from A to Y; we will obtain the same map either way. More complicated diagrams can be used to express more complex relationships, and identifying the set of implied identities is merely a matter of chasing arrows through the diagram.

Markov Processes. Also, we recall the notion of Markov processes. A Markov process is a pair of random variables I, J, and to it we associate a transition matrix M given by M_{i,j} = Pr[J = j | I = i]. We call the sequence of random variables I − J − K a Markov chain if K is conditionally independent of I given J. We can associate transition matrices M, M′, M″ to the Markov processes I − J, J − K, and I − K, respectively, and if I − J − K forms a Markov chain, we will have M″ = M · M′. In other words, composition of Markov processes corresponds to multiplication of their associated transition matrices.

The maximum advantage of an adversary at distinguishing one Markov process from another can be calculated using decorrelation theory [14]. Let ‖M‖_∞ denote the ∞-norm of the matrix M, i.e., ‖M‖_∞ = max_i Σ_j |M_{i,j}|. We can consider an adversary A who is allowed to choose a single input, feed it through the Markov process, and observe the corresponding output. The maximum advantage of any such adversary at distinguishing M from M′ will then be exactly ½‖M − M′‖_∞. If U denotes the uniform m × n transition matrix, i.e., U_{i,j} = 1/n for all i, j, then ‖M_1M_2 − U‖_∞ ≤ ‖M_1 − U‖_∞ · ‖M_2 − U‖_∞.

The above calculations can be extended to calculate the advantage Adv A = Pr[A^M = 1] − Pr[A^{M′} = 1] of an adversary A who can interact repeatedly with the Markov process.
First, if ε = ½‖M − M′‖_∞ denotes the advantage of a single-query adversary, then an adversary making q queries has advantage at most q · ε. In practice, when ε is small, the advantage of a q-query adversary often scales roughly as ∼ √q · ε. Hence, as a rough rule of thumb, Θ(1/ε²) queries often are necessary and sufficient to distinguish M from M′ with nontrivial probability [3]. We emphasize, though, that this heuristic is not always valid; there are many important exceptions.

The advantage of a q-query adversary can be computed more precisely. If M is an m × n matrix, let [M]^q denote the m^q × n^q matrix given by ([M]^q)_{i,j} = M_{i_1,j_1} × ··· × M_{i_q,j_q}. Define the matrix norm ‖M‖_a = max_{i_1} Σ_{j_1} ··· max_{i_q} Σ_{j_q} |M_{i,j}|. Then, in an adaptive attack, the maximum advantage of any q-query adversary is exactly max_A Adv A = ½‖[M]^q − [M′]^q‖_a. In a non-adaptive attack, the maximum advantage of any q-query non-adaptive adversary is exactly ½‖[M]^q − [M′]^q‖_∞.

Organization. The rest of this paper studies cryptanalysis of product ciphers. First, we describe commutative diagrams and their relevance to cryptanalysis.
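The matrix composition and the single-query advantage formula can be illustrated on a toy chain (the chain and its probabilities here are made up for the example):

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inf_dist(A, B):
    """Infinity-norm of A - B: maximum absolute row sum."""
    return max(sum(abs(a - b) for a, b in zip(ra, rb))
               for ra, rb in zip(A, B))

M1 = [[0.9, 0.1], [0.1, 0.9]]   # I - J: flip a bit with prob. 0.1
M2 = [[0.8, 0.2], [0.2, 0.8]]   # J - K: flip with prob. 0.2
M12 = mat_mul(M1, M2)           # I - K for the chain I - J - K

U = [[0.5, 0.5], [0.5, 0.5]]    # uniform transition matrix

# ||M1 M2 - U|| <= ||M1 - U|| * ||M2 - U||, with equality for these
# doubly stochastic matrices:
assert abs(inf_dist(M12, U) - inf_dist(M1, U) * inf_dist(M2, U)) < 1e-12

# Maximum single-query advantage at distinguishing M1 from uniform:
assert abs(0.5 * inf_dist(M1, U) - 0.4) < 1e-12
```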
[Fig. 2: a commutative square with ρ : M → Y on top, f_k : M → M on the left, g : Y → Y′ on the right, and ρ′ : M → Y′ on the bottom]
Fig. 2. A local property of the round function f_k
Then, we explore statistical attacks, a probabilistic generalization of commutative diagram attacks, and then we further generalize by introducing the notion of higher-order attacks. Finally, we explore algebraic attacks.
3 Commutative Diagram Attacks
The basic recipe for analyzing a product cipher is simple:
1. Identify local properties of the cipher's round functions.
2. Piece these together to obtain a global property of the cipher as a whole.
In this way, we seek to exploit the structure of a product cipher, namely its construction as a composition of round functions, to simplify the cryptanalyst's task. How do we identify local properties of the round function that are both non-trivial and can be spliced together suitably? This is where commutative diagrams can help.

Let f_k : M → M denote a round function. Suppose we can find a property of the input that is preserved by the round function; then this would suffice as a local property. If there is some partial information about x that allows us to predict part of the value of f_k(x), this indicates a pattern of non-randomness in the round function that might be exploitable. One way to formalize this is using projections. A projection is a function ρ : M → Y from the text space to a smaller set Y. If we have two projections ρ, ρ′ so that ρ′(f_k(x)) can be predicted from ρ(x), then we have a local property of the round function. To make this more precise, we look for projections ρ : M → Y, ρ′ : M → Y′ and a function g : Y → Y′ so that ρ′ ∘ f_k = g ∘ ρ for all k ∈ K, or in other words, so that the diagram in Fig. 2 commutes for all k. A commutative diagram is trivial if it remains satisfied when we replace f_k by any random permutation π. Each non-trivial commutative diagram for f is an interesting local property of the round function.

Such local properties can be pieced together to obtain global properties by exploiting the compositional behavior of commutative diagrams. Refer to Fig. 3. If both small squares commute (i.e., if ρ′ ∘ f_{k_1} = g ∘ ρ and ρ″ ∘ f_{k_2} = g′ ∘ ρ′), then the whole diagram commutes (e.g., ρ″ ∘ f_{k_2} ∘ f_{k_1} = g′ ∘ g ∘ ρ). In other words, if ρ, ρ′ form a local property of the first round f_{k_1} and if ρ′, ρ″ form a local
[Fig. 3: two stacked commutative squares: ρ : M → Y on top, f_{k_1} and g on the sides, ρ′ : M → Y′ in the middle, f_{k_2} and g′ on the sides, and ρ″ : M → Y″ on the bottom]
Fig. 3. Splicing together a local property for f_{k_1} and a local property for f_{k_2} to obtain a property of their composition
property of the second round f_{k_2}, then ρ, ρ″ form a global property of the first two rounds f_{k_2} ∘ f_{k_1}. Note that the requirement is that we have a local property of each round, and that the local properties match up appropriately. If ρ, ρ′ is a local property of the first round and ϕ, ϕ′ is a local property of the second round, these two can only be composed if ρ′ = ϕ. Thus there is a compatibility requirement that must be satisfied before two local properties can be composed.

The same kind of reasoning can be extended inductively to obtain a global property of the cipher as a whole. The requirement is that we obtain local properties for the rounds that are compatible. See Fig. 4. If each local property is non-trivial, then the global property so obtained will be non-trivial.

Any non-trivial global property for the cipher as a whole immediately leads to a distinguishing attack. Suppose we have ρ, ρ′, g so that ρ′ ∘ E_k = g ∘ ρ holds for all k. Then our distinguishing attack is straightforward: we obtain a few known plaintext/ciphertext pairs (x_i, y_i) and we check whether ρ′(y_i) = g(ρ(x_i)) holds for all of them. When the known texts are obtained from the real cipher (E_k), these equalities will always hold. However, since our property is non-trivial, the equalities will not always hold if the pairs (x_i, y_i) were obtained from an ideal cipher (π, a random permutation). The distinguishing advantage of such an attack depends on the details of the projections chosen, but we can typically expect to obtain a significant attack.

Example: Madryga. As a concrete example of a commutative diagram attack, let us examine Madryga, an early cipher design. Eli Biham discovered that the Madryga round function preserves the parity of its input. Hence, we may choose the parity function as our projection ρ : {0, 1}^{64} → {0, 1}, i.e., ρ(x_1, …, x_{64}) = x_1 ⊕ ··· ⊕ x_{64}. When taken with the identity function, we obtain a global property for Madryga, as depicted in Fig. 5.
[Fig. 4: a tower of commutative squares: ρ_0 : M → Y_0 on top, rounds f_{k_1}, …, f_{k_n} down one side, maps g_1, …, g_n down the other, and ρ_n : M → Y_n on the bottom]
Fig. 4. Splicing together local properties for each round to obtain a global property for E_k, the cipher as a whole

[Fig. 5: a commutative square with ρ : {0, 1}^{64} → {0, 1} on top and bottom, E_k on the left, and id : {0, 1} → {0, 1} on the right]
Fig. 5. A global property of the Madryga block cipher. Here ρ is the parity function

This yields a distinguishing attack on Madryga with advantage 1/2, as Pr[ρ(E_k(x)) = ρ(x)] = 1 yet Pr[ρ(π(x)) = ρ(x)] = 1/2.

Commutative diagrams are mathematically elegant. However, they are not, on their own, powerful enough to successfully attack many ciphers; another idea is needed. We shall describe next how these ideas may be extended to model statistical attacks, which turn out to be significantly more powerful.
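The Madryga pattern can be mimicked with a toy cipher (a hypothetical 8-bit stand-in, not the real Madryga round function): any composition of parity-preserving rounds satisfies ρ(E_k(x)) = ρ(x) with probability 1, while a random permutation does so only about half the time.

```python
import random

def rho(x):                # parity projection {0,1}^8 -> {0,1}
    return bin(x).count("1") % 2

def round_fn(x, k):        # XOR with an even-parity round key, then
    x ^= k                 # rotate left: both steps preserve parity
    return ((x << 1) | (x >> 7)) & 0xFF

random.seed(1)
keys = [random.choice([k for k in range(256) if rho(k) == 0])
        for _ in range(8)]

def E(x):                  # the toy "product cipher"
    for k in keys:
        x = round_fn(x, k)
    return x

assert all(rho(E(x)) == rho(x) for x in range(256))  # probability 1

pi = list(range(256))
random.shuffle(pi)         # an "ideal cipher"
matches = sum(rho(pi[x]) == rho(x) for x in range(256))
assert 0.3 < matches / 256 < 0.7   # roughly 1/2, so advantage ~1/2
```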
4 Statistical Attacks
We now turn our attention to statistical attacks. The natural idea is to look at diagrams that only commute with some probability. A reasonable first attempt might be to introduce the notion of probabilistic commutative diagrams. See Fig. 6, which is intended to show a diagram that commutes with probability p. In other words, though the relation ρ′ ∘ f_k = g ∘ ρ
[Fig. 6: a commutative square with ρ : M → Y on top, f_k on the left, g on the right, and ρ′ : M → Y′ on the bottom, marked "prob. p"]
Fig. 6. A local property that holds with probability p
does not hold identically, we do have

Pr_{X←M}[ρ′(f_k(X)) = g(ρ(X))] = p

for all k ∈ K, where the probability is taken with respect to the choice of X uniformly at random from M. Written informally: ρ′ ∘ f_k = g ∘ ρ holds with probability p. (We could easily imagine many variants of this definition, for instance, by taking the probability over both the choice of X and k; however, we will not pursue such possibilities here.)

Probabilistic commutative diagrams share many useful properties with their deterministic cousins. First, probabilistic commutative diagrams can be composed. Suppose projections ρ, ρ′ form a local property with probability p for the first round, and ρ′, ρ″ form a local property with probability p′ for the second round. Then ρ, ρ″ form a property for the composition of the first two rounds, and assuming our cipher is a Markov cipher [9], the composed property holds with probability at least p · p′. Second, probabilistic commutative diagrams for the whole cipher usually lead to distinguishing attacks. Suppose ρ′ ∘ E_k = g ∘ ρ holds with probability p, and ρ′ ∘ π = g ∘ ρ holds with probability q. Then there is a simple distinguishing attack that uses one known-plaintext/ciphertext pair (x, y) and has advantage |p − q|: we simply check whether ρ′(y) = g(ρ(x)).

Probabilistic commutative diagrams may appear fairly natural at first glance, but on further inspection, they seem to be lacking in some important respects. For our purposes, it will be useful to introduce a more general notion, which we term stochastic commutative diagrams. If we let the random variable X be uniformly distributed on M, the maps ρ, ρ′, E_k induce a Markov process on the pair of random variables ρ(X), ρ′(E_k(X)). The associated transition matrix M is given by

M_{i,j} = Pr_{X←M}[ρ′(E_k(X)) = j | ρ(X) = i].

Note that there is an implicit dependence on k, but for simplicity in this paper we will only consider the case where each key k ∈ K yields the same transition matrix M. (This corresponds to assuming that the Hypothesis of Stochastic Equivalence holds.) An example stochastic commutative diagram is shown pictorially in Fig. 7.

Stochastic commutative diagrams yield distinguishing attacks. Let M′ be the transition matrix induced by the Markov process ρ(X), ρ′(π(X)), where π is
[Fig. 7: a square diagram with ρ : M → Y on top, E_k on the left, the transition matrix M (marked "stochastic") on the right, and ρ′ : M → Y′ on the bottom]
Fig. 7. A stochastic commutative diagram for E_k. Here M is the transition matrix of the Markov process ρ(X) − ρ′(E_k(X))

[Fig. 8: a square diagram with ρ : {0, 1}^ℓ → {0, 1} on top, E_k on the left, M (marked "stochastic") on the right, and ρ′ : {0, 1}^ℓ → {0, 1} on the bottom]
Fig. 8. A formulation of linear cryptanalysis as a stochastic commutative diagram attack. Here ρ and ρ′ are linear maps
a random permutation. Then our stochastic commutative diagram yields a distinguishing attack that uses one chosen plaintext query and achieves advantage ½‖M − M′‖_∞.

Linear Cryptanalysis. Linear cryptanalysis may now be recognized as a special case of a stochastic commutative diagram attack. Suppose we have an ℓ-bit block cipher E_k : {0, 1}^ℓ → {0, 1}^ℓ. In Matsui's linear cryptanalysis [10], the codebreaker somehow selects a pair of linear maps ρ, ρ′ : {0, 1}^ℓ → {0, 1}, and then we use the stochastic commutative diagram shown in Fig. 8. For instance, the linear characteristic Γ → Γ′ corresponds to the projections ρ(x) = Γ · x, ρ′(x) = Γ′ · x. In a linear attack, we obtain a 2 × 2 transition matrix M of the form

M = ( 1/2 + ε/2   1/2 − ε/2
      1/2 − ε/2   1/2 + ε/2 ).    (1)

The transition matrix associated to a random permutation π is U, the 2 × 2 matrix where all entries are 1/2. Therefore, the distinguishing advantage of a linear cryptanalysis attack using one known text is ½‖M − U‖_∞ = ε/2. It is not hard to verify that Θ(1/ε²) known texts suffice to obtain an attack with distinguishing advantage 1/2 (say). Compare to Matsui's rule of thumb, which says that 8/(ε/2)² = 32/ε² texts suffice.

Matsui's piling-up lemma can also be re-derived within this framework. Let M_1, M_2 be transition matrices for the first and second round, respectively,
David Wagner
taking the form shown in Equation (1) albeit with ε replaced by ε1, ε2. It is not hard to verify that M1M2 also takes the form shown in Equation (1), but with ε replaced by ε1ε2. Hence ||M1M2 − U||∞ = ||M1 − U||∞ × ||M2 − U||∞. This is exactly the piling-up lemma for computing the bias of a multi-round linear characteristic given the bias of the characteristic for each round.

After seeing this formulation of linear cryptanalysis, our use of the name "projection" to describe the maps ρ, ρ′ can be justified as follows. Consider the vector subspace V = {0, Γ} of {0, 1}^ℓ. We obtain a canonical isomorphism of vector spaces V ≅ {0, 1}. Then we can view ρ : {0, 1}^ℓ → V as taking the form of a projection onto the subspace V. In other words, we write {0, 1}^ℓ as the direct sum {0, 1}^ℓ = V ⊕ V^⊥, write each x ∈ {0, 1}^ℓ as a sum x = y ⊕ z for y ∈ V and z ∈ V^⊥, and then let ρ(x) = y be the projection of x onto V.

Mod n Cryptanalysis. Notice that mod n attacks also fall within this framework. If M = Z/2^ℓZ, we can use the projection ρ : Z/2^ℓZ → Z/nZ given by ρ(x) = x mod n. In this way we recover the mod n attack. In general, if M is any abelian group with subgroup S ⊆ M, we may consider projections of the form ρ : M → M/S given by ρ(x) = x mod S. Linear cryptanalysis is simply the special case where M = ({0, 1}^ℓ, ⊕) and S has index 2, and mod n cryptanalysis is the special case where M = (Z, +) and S = nZ.

Linear Cryptanalysis with Multiple Approximations. Linear cryptanalysis with multiple approximations also falls naturally within this framework. Suppose we have a list of masks Γ1, Γ2, . . . , Γd ∈ {0, 1}^ℓ, and assume that these masks are linearly independent as vectors in {0, 1}^ℓ. Let V be the vector subspace of dimension d spanned by Γ1, . . . , Γd, i.e., V = {0, Γ1, Γ2, Γ1 ⊕ Γ2, . . . }. Choose the canonical isomorphism V ≅ {0, 1}^d, i.e., Σi ciΓi ↦ (c1, . . . , cd). We can define the projection ρ : {0, 1}^ℓ → {0, 1}^d by ρ(x) = (Γ1 · x, . . .
, Γd · x), or equivalently, as the projection ρ : {0, 1}^ℓ → V from {0, 1}^ℓ onto the subspace V. We can build ρ′ from Γ′1, . . . , Γ′d similarly. Then, we consider the Markov process M induced by Ek, ρ, ρ′. The distinguishing advantage of this attack is given by ½||M − M′||∞, as before, except that now we are working with 2^d × 2^d matrices. Notice that these d-bit projections ρ, ρ′ simultaneously capture all 2^{2d} linear approximations of the form Γ → Γ′ for some Γ ∈ V, Γ′ ∈ V′, so this is a fairly powerful attack.
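As a numeric check on the piling-up computation above (a sketch not taken from the paper), transition matrices of the form (1) multiply to give a matrix of the same form with ε = ε1ε2, and the ∞-norm distance from the uniform matrix U is multiplicative:

```python
# Sketch: matrix form of the piling-up lemma. M(eps) is the 2x2
# transition matrix of Equation (1); U is the uniform matrix of a
# random permutation.

def M(eps):
    return [[0.5 + eps / 2, 0.5 - eps / 2],
            [0.5 - eps / 2, 0.5 + eps / 2]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inf_norm(A, B):
    # ||A - B||_inf: maximum absolute row sum of the difference
    return max(sum(abs(a - b) for a, b in zip(ra, rb))
               for ra, rb in zip(A, B))

U = [[0.5, 0.5], [0.5, 0.5]]
e1, e2 = 0.25, 0.125
P = matmul(M(e1), M(e2))

# M(e1) M(e2) has the form of Equation (1) with eps = e1 * e2 ...
assert all(abs(P[i][j] - M(e1 * e2)[i][j]) < 1e-12
           for i in range(2) for j in range(2))
# ... so the biases multiply: ||M1 M2 - U|| = ||M1 - U|| * ||M2 - U||
assert abs(inf_norm(P, U) - inf_norm(M(e1), U) * inf_norm(M(e2), U)) < 1e-12
```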
5 Higher-Order Attacks
The next idea is to examine plaintexts two (or more) at a time. If f : M → M is any function, let f̂ : M² → M² be defined by f̂(x, x′) = (f(x), f(x′)). More generally, we can take f̂ : M^d → M^d and f̂(x1, . . . , xd) = (f(x1), . . . , f(xd)) for any fixed d; this is known as a d-th order attack. Then, to distinguish Ek from π, the idea is to use stochastic commutative diagrams that separate Êk from π̂.
Fig. 9. The basis of differential-style attacks, as a commutative diagram. We use the projection ρ(x, x′) = x − x′
Complementation Properties. Complementation attacks form a simple instance of a higher-order attack. Suppose there is some ∆ so that Ek(x ⊕ ∆) = Ek(x) ⊕ ∆. Then we can define ρ : M × M → M by ρ(x, x′) = x − x′, and we obtain the diagram shown in Fig. 9. Note that in this case the existence of the complementation property implies that M∆,∆ = 1. If M′ denotes the transition matrix induced by an ideal cipher (namely, π), then M′∆,j = 1/(|M| − 1) for each j ≠ 0, and so ||M − M′||∞ ≥ 1 − 1/(|M| − 1) + (|M| − 2)/(|M| − 1) = 2 − 2/(|M| − 1). In other words, there is an attack using only 2 chosen plaintexts and achieving distinguishing advantage 1 − 1/(|M| − 1).

Differential Cryptanalysis. A natural extension is to generalize the above attack by looking for some matrix element M∆,∆′ with surprisingly large probability, rather than looking for a matrix element with probability 1. Indeed, such a modification yields exactly Biham & Shamir's differential cryptanalysis [2], and any large matrix element M∆,∆′ gives us a differential ∆ → ∆′ with probability p = M∆,∆′. Notice that when p ≫ 1/(|M| − 1), we have ½||M − M′||∞ ≥ p − 1/(|M| − 1), hence with 2 chosen plaintexts we obtain an attack with advantage ≈ p. One can readily verify that with 2m chosen plaintexts, the iterated attack has advantage (1 − 1/(|M| − 1))^m − (1 − p)^m, which is ≈ 1 − 1/e for m = 1/p. Thus, 2/p chosen plaintexts suffice to distinguish with good advantage. Also, our framework easily models differential cryptanalysis with respect to other groups. For instance, when using additive differentials in the group M = (Z/2^64Z, +), we can choose the projection ρ : M × M → M given by ρ(x, x′) = x − x′ mod 2^64.

Impossible Differential Cryptanalysis. Alternatively, we could look for matrix elements in M that are surprisingly small. If we look for entries that have probability 0, say M∆,∆′ = 0, then we obtain the impossible differential attack.
In this case the differential ∆ → ∆′ will be impossible (it can never happen for the real cipher Ek). This yields an attack that can distinguish with good advantage once about |M| texts are available to the attacker.
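To make the matrix entries M∆,∆′ concrete, the following sketch computes the exact difference-transition matrix of a small 4-bit permutation (an arbitrary illustrative S-box, not one discussed in the paper) and extracts its best differential:

```python
# Sketch: M[d][dp] = Pr[ S(x) xor S(x xor d) = dp ] over uniform x.
# A large entry is a differential d -> dp; an ideal map would spread
# each row roughly uniformly, ~1/(n-1) on nonzero output differences.

S = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
     0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]   # a 4-bit permutation
n = 16

M = [[0.0] * n for _ in range(n)]
for d in range(1, n):
    for x in range(n):
        M[d][S[x] ^ S[x ^ d]] += 1 / n

for d in range(1, n):                 # each row is a distribution
    assert abs(sum(M[d]) - 1.0) < 1e-9

prob, d, dp = max((M[d][dp], d, dp) for d in range(1, n) for dp in range(1, n))
assert prob >= 2 / n                  # pairs (x, x ^ d) always come in twos
print("best differential: %x -> %x with probability %.3f" % (d, dp, prob))
```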
Differential-Linear Cryptanalysis. Write Ek = f′ ∘ f. In a differential-linear attack [5], one covers the first half of the cipher (f) with a differential characteristic and approximates the second half of the cipher (f′) with a linear characteristic. This can be modeled within our framework as follows. Our development so far suggests we should use the projections ρ, ρ′ : M² → M, ρ(x, x′) = ρ′(x, x′) = x − x′ to cover f and projections η, η′ : M → {0, 1} to cover f′. However, in this case, we will not be able to match η up with ρ′, because neither their domains nor their ranges agree. The solution is to introduce functions η̂, η̂′ : M² → {0, 1} given by η̂(x, x′) = η(x − x′) and η̂′(x, x′) = η′(x − x′). Suppose our differential characteristic has probability p ≫ 1/|M|, or in other words, ρ, ρ′ commute with f̂ with probability p. Then ρ, η̂ will usually commute with f̂ with non-trivial probability (heuristically, about 1/2 + p/2, though this is not guaranteed). Likewise, suppose our linear characteristic holds with probability 1/2 ± ε/2, or in other words, η, η′ commute with f′ with probability 1/2 ± ε/2. Then η̂, η̂′ will form a linear approximation for f̂′ with probability 1/2 ± ε²/2. These two properties can be composed to obtain a property ρ, η̂′ for the whole cipher, typically with probability 1/2 ± pε²/2. Hence a differential-linear attack with two chosen texts will typically have distinguishing advantage roughly pε²/2. Consequently, our framework predicts that such a cipher can be broken with Θ(1/(p²ε⁴)) chosen texts. This corresponds closely to the classical estimate [1].
Higher-Order Differential Cryptanalysis. Higher-order differentials [7] can also be modeled within our framework. Let us give a simple example. If f(X) is a polynomial of degree 2, then f(X + ∆0 + ∆1) − f(X + ∆0) − f(X + ∆1) + f(X) is a constant polynomial, giving a way to distinguish f from random with 4 chosen plaintexts. This corresponds to choosing 4th order projections ρ(w, x, y, z) = (x − w, y − w, z − x − y + w) and ρ′(w, x, y, z) = z − x − y + w, deriving a transition matrix M, and noticing that we have a matrix entry M(∆0,∆1,0),∆′ whose value is 1 for the real cipher but much smaller for a random permutation. More generally, if f is a polynomial of degree d, then the d-th order differential of f is a constant, and the (d + 1)-th order differential is zero. Such higher-order differential attacks can likewise be expressed as a higher-order commutative diagram attack.

Truncated Differential Cryptanalysis. Truncated differential attacks [8] also fit within our framework. Given a block cipher Ek : {0, 1}^ℓ → {0, 1}^ℓ, we choose projections of the form ρ : {0, 1}^ℓ × {0, 1}^ℓ → {0, 1}^m, where ρ(x, x′) = ϕ(x − x′) and ϕ : {0, 1}^ℓ → {0, 1}^m is an appropriately chosen linear map. Then we can look for an entry M∆,∆′ in the transition matrix so obtained that has surprisingly large probability, and this will correspond to a truncated differential ∆ → ∆′ of the same probability. This truncated differential corresponds to the class of 2^{2ℓ−2m} conventional differentials δ → δ′, where δ ∈ ϕ⁻¹(∆) and δ′ ∈ ϕ⁻¹(∆′). Alternately, we can look for an entry with surprisingly low
probability, and this yields an impossible (or improbable) truncated differential that can be used in an attack. In most truncated differential attacks, the linear map ϕ simply ignores part of the block. For instance, the truncated difference (a, 0, 0, b) (where a, b are arbitrary differences) might correspond to the linear map ϕ(w, x, y, z) = (x, y) and the projected value ∆ = (0, 0). However, for maximum generality, we allow ϕ to be chosen as any linear map whatsoever.

The above account of truncated differentials is slightly naive. It leads to very large matrices, because a truncated difference of the form (say) (a, 1, 2, b) is distinguished from the truncated difference (a, 1, 3, b). However, in practice it is more common for cryptanalysts to care only about distinguishing between zero and non-zero words, with little reason to make any distinction between the different non-zero values. Fortunately, our treatment can be amended to better incorporate typical cryptanalytic practice, as follows. Consider, as a concrete example, truncated differential attacks on Skipjack, where the block is M = {0, 1}^64 and where attacks typically look at which of the four 16-bit words of the difference are zero or not. Consider the following 67 vector subspaces of {0, 1}^64: {0}, {(a, 0, 0, 0) : a ∈ {0, 1}^16}, {(0, a, 0, 0) : a ∈ {0, 1}^16}, . . . , {(a, a, 0, 0) : a ∈ {0, 1}^16}, {(a, 0, a, 0) : a ∈ {0, 1}^16}, . . . , {(a, b, 0, 0) : a, b ∈ {0, 1}^16}, {(a, 0, b, 0) : a, b ∈ {0, 1}^16}, . . . , {(a, b, c, d) : a, b, c, d ∈ {0, 1}^16}. These can be put into one-to-one correspondence with the 67 vector subspaces of {0, 1}^4 in a natural way. Moreover, to any block x ∈ {0, 1}^64 we can associate its characteristic vector (χ1, . . . , χ67), where χi is 1 if x is in the i-th subspace and 0 otherwise. This induces an equivalence relation ∼ on {0, 1}^64, where two blocks are considered equivalent if they have the same characteristic vector.
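As a sanity check (not in the paper), the count of 67 is precisely the total number of vector subspaces of GF(2)^4, which can be computed from Gaussian binomial coefficients:

```python
# Sketch: the number of k-dimensional subspaces of GF(q)^n is the
# Gaussian binomial [n choose k]_q; summing over k counts all subspaces.

def gaussian_binomial(n, k, q=2):
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den

counts = [gaussian_binomial(4, k) for k in range(5)]
assert counts == [1, 15, 35, 15, 1]   # subspaces of dimension 0..4
assert sum(counts) == 67              # the 67 word-pattern subspaces above
```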
We can now consider the projection ρ : {0, 1}^64 → {0, 1}^64/∼ that maps x to its equivalence class under ∼. In this way we obtain a 67 × 67 transition matrix M that captures the probability of all 67² word-wise truncated differentials for Skipjack [11]. A similar construction can be used for ciphers of other word and block lengths. This leads to smaller transition matrices and a more satisfying theory of truncated differential cryptanalysis.

Generalized Truncated Differential Cryptanalysis. Armed with these ideas, we can now propose a new attack not previously seen in the literature. We retain the basic set-up from truncated differential attacks, but we replace the probabilistic commutative diagrams with stochastic commutative diagrams. In other words, instead of looking for a single entry in the transition matrix with unusually large (or small) probability, we use the matrix norm ½||M − M′||∞. This approach allows us to exploit many small biases spread throughout the matrix M, rather than being confined to only taking advantage of one bias and ignoring the rest. This may look like a very small tweak; however, it contributes considerable power to the attack. As we shall see, it subsumes all the previous attacks as special cases of generalized truncated differentials. The ability to unify many existing
attacks, and to generate new attacks, using such a simple and natural-looking extension to prior work is one of the most striking features of our framework.

Generalized truncated differentials generalize conventional differential and impossible differential attacks. If there is a single entry in M with unusually large (or small) probability, then the matrix norm ½||M − M′||∞ will also be large. Hence, the existence of a conventional differential or impossible differential attack with advantage ε implies the existence of a generalized truncated attack with the same advantage ε. Likewise, linear cryptanalysis can also be viewed as a special case of generalized truncated differentials. As we argued before, if η, η′ : {0, 1}^ℓ → {0, 1} form a linear characteristic with probability 1/2 ± ε/2, then η̂, η̂′ given by η̂(x, x′) = η(x − x′), etc., has probability 1/2 ± ε²/2. It may appear that we have diluted the power of the attack, because the distinguishing advantage has decreased from ε/2 (for one known text in a linear attack) to ε²/2 (for one pair of texts in a generalized truncated attack). However, this is offset by an increase in the number of pairs of texts available in a generalized truncated attack: given a pool of n known texts, one can form n² pairs of texts. These two factors turn out to counterbalance each other. If k out of n texts follow η, η′, then k² + (n − k)² out of n² pairs follow η̂, η̂′. Note that h(k) = k² + (n − k)² is a strictly increasing function of k, for k ≥ n/2. Hence for any threshold T in a linear attack, there is a corresponding threshold h(T) that makes the generalized truncated differential attack work with the same number of known texts and roughly the same distinguishing advantage. Consequently, the existence of a linear attack implies the existence of a generalized truncated attack with about the same advantage. Differential-linear attacks are also subsumed by generalized truncated differentials.
Because both a differential and a linear characteristic can be viewed as a generalized truncated differential, they can be concatenated. In a differential-linear attack, the transition is abrupt and binary, but generalized truncated attacks allow us to consider other attacks, for instance with a gradual transition between differential- and linear-style analysis, or with hybrids partway between differential and linear attacks. Intuitively, the power of generalized truncated differential attacks comes from the extra degrees of freedom available to the cryptanalyst. In a differential attack, the cryptanalyst can freely choose which matrix entry M∆,∆′ to focus on, but has little control over ρ, ρ′. In a linear attack, the cryptanalyst can freely choose ρ, ρ′ in some clever way, but has no choice over the matrix M. A generalized truncated differential attack allows the cryptanalyst to control both aspects of the attack at the same time.
6 Algebraic Attacks
One noticeable trend over the past few decades is that more and more block cipher designs have come to incorporate algebraic structure. For instance, the AES S-box is based on inversion in the field GF(2^8). Yet this brings an opportunity
Fig. 10. The basis of an interpolation attack. Here each qki (X) is a polynomial over the field F
for attacks that exploit this structure, and a rich variety of algebraic cryptanalytic methods have been devised: interpolation attacks, rational interpolation, probabilistic interpolation, and so on.

Interpolation Attacks. The basic interpolation attack [7] is easy to understand. We express each round fki as a polynomial qki(X) ∈ F[X] over some field F, and in this way we obtain the commutative diagram shown in Fig. 10. Notice that composition of commutative diagrams allows us to express the whole cipher as a polynomial qk(X) = qkn(· · · (qk2(qk1(X))) · · · ). The polynomial qk(X) may depend on the key k in some possibly complex way, hence the attacker usually will not know qk(X) a priori. Consequently, a distinguishing attack based on this property must work a little differently. The standard interpolation attack exploits the fact that d + 1 points suffice to uniquely determine a polynomial of degree d. Given deg qk(X) + 1 known plaintext/ciphertext pairs for Ek, we can reconstruct the polynomial qk(X) using Lagrange interpolation (for instance), and then check one or two additional known texts for consistency with the recovered polynomial. This allows us to distinguish any cipher with this property from a random permutation. The idea can be generalized in many ways. We need not restrict ourselves to univariate polynomials; we can generalize to multivariate polynomials q(X), where X = (X1, . . . , Xm) represents a vector of m unknowns, and where q(X) = (q1(X), . . . , qm(X)) represents a vector of m multivariate polynomials. In this case, the number of texts needed corresponds to the number of coefficients of qk not known to be zero.
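The interpolation step can be sketched over a prime field GF(p) (a simplified stand-in for the binary fields used in practice; the "cipher" below is a hypothetical degree-3 keyed polynomial introduced only for illustration):

```python
# Sketch: Lagrange interpolation recovers a cipher that is secretly a
# low-degree polynomial over GF(p); extra known texts confirm the fit,
# which a random permutation would fail with overwhelming probability.

p = 257

def E(x, key=(3, 7, 11)):
    a, b, c = key                        # toy keyed polynomial "cipher"
    return (x**3 + a * x**2 + b * x + c) % p

def lagrange_eval(points, x):
    """Evaluate the interpolating polynomial through `points` at x, mod p."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

pairs = [(x, E(x)) for x in range(4)]    # deg q + 1 = 4 known texts
# consistency check on fresh known texts: the distinguisher
assert all(lagrange_eval(pairs, x) == E(x) for x in range(10, 20))
```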
Fig. 11. The basis of a bivariate interpolation attack. Here bk(X, Y) represents a bivariate polynomial over F, and the intended interpretation of the diagram is that bk(x, Ek(x)) = 0 for all x ∈ F
Also, one can naturally derive statistical versions of interpolation cryptanalysis by replacing the commutative diagram in Fig. 10 with a probabilistic commutative diagram with some probability p. Then noisy polynomial reconstruction techniques (e.g., list decoding of Reed-Solomon codes) will allow us to mount a distinguishing attack. We can also use meet-in-the-middle techniques, using the polynomial qk to cover the first half of the cipher and q′k to cover the remaining rounds. Notice how these generalizations come naturally under the commutative diagram framework.

Rational Interpolation Attacks. Another generalization is the notion of rational interpolation attacks. If q(X), q′(X) are any two polynomials with no common factor and where q′(X) is not the zero polynomial, then r(X) = q(X)/q′(X) is called a rational polynomial. It is then natural to consider the variant on the commutative diagram in Fig. 10 where each polynomial qki is replaced by a rational polynomial rki. Note that rational polynomials are closed under composition, hence we can derive a global approximation for the whole cipher from rational approximations of the individual round functions. If we can express the cipher as a rational polynomial Ek(x) = qk(x)/q′k(x), and if we have a supply of known plaintext/ciphertext pairs (xi, yi), then we obtain the equations yi · q′k(xi) = qk(xi). Linear algebra reveals the rational polynomial qk(X)/q′k(X), which gives us a distinguishing attack on the cipher.

Bivariate Interpolation. The notion of interpolation and rational interpolation can be generalized to obtain what might be called a bivariate interpolation attack. The idea is to seek a family of bivariate polynomials bk(X, Y) ∈ F[X, Y] so that bk(x, fk(x)) = 0 for all x ∈ M and all k ∈ K. This gives a local property of the round function f. Local bivariate properties can be composed to obtain a global property for the whole cipher.
Suppose the first round satisfies a bivariate relation b(x, f (x)) = 0
and the second round satisfies b′(y, f′(y)) = 0. Define b″(X, Z) = ResY(b(X, Y), b′(Y, Z)). Then the composition of the two rounds will satisfy the relation b″(x, f′(f(x))) = 0 for all x ∈ F. The latter follows from a property of the resultant: given f(Y), g(Y) ∈ R[Y], the resultant ResY(f(Y), g(Y)) is a value in R, and if f, g share a common root over R, then the resultant will be zero. Letting R = F[X, Z], f(Y) = b(X, Y), and g(Y) = b′(Y, Z) verifies the claimed result about b″(X, Z). In this way, we can compose bivariate relations for each round to obtain a bivariate relation for the whole cipher. Once we have a bivariate relation bk(x, Ek(x)) = 0 for the whole cipher, we can use polynomial interpolation to reconstruct bk given a sufficient quantity of known plaintext/ciphertext pairs. Unfortunately, in general the degree of the bivariate polynomial for the whole cipher can grow rapidly as the number of rounds increases.

Notice that interpolation attacks fall out as a special case of bivariate interpolation. If the first and second rounds of the cipher can be expressed as polynomials q(X), q′(Y), this induces bivariate relations b(X, Y) = q(X) − Y and b′(Y, Z) = q′(Y) − Z. Taking the resultant yields b″(X, Z) = ResY(b(X, Y), b′(Y, Z)) = ResY(q(X) − Y, q′(Y) − Z) = q′(q(X)) − Z, which is nothing more than a round-about derivation of the obvious fact that the composition of the first and second rounds may be expressed by the polynomial q′(q(X)). Likewise, rational interpolation attacks are a special case of bivariate interpolation. Suppose the first and second rounds can be expressed as rational polynomials p(X)/p′(X) and q(Y)/q′(Y). We obtain the bivariate relations b(X, Y) = p′(X) · Y − p(X) and b′(Y, Z) = q′(Y) · Z − q(Y). Taking the resultant yields a bivariate relation for the composition of the first two rounds.
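A small numeric check of this resultant composition, over GF(p) with hypothetical round polynomials; because b(X, Y) = q(X) − Y is linear in Y, the resultant collapses to evaluating b′ at the unique root Y = q(X):

```python
# Sketch: Res_Y(q(X) - Y, b'(Y, Z)) = lc(b)^deg(b') * b'(q(X), Z),
# here -b'(q(X), Z). The composed relation b'' vanishes exactly on
# pairs (x, q'(q(x))), i.e. on the real two-round composition.

p = 251

def q(x):  return (x ** 2 + 1) % p        # illustrative first-round polynomial
def qp(y): return (y ** 3 + 2 * y) % p    # illustrative second-round polynomial

def bp(y, z): return (qp(y) - z) % p      # b'(Y, Z) = q'(Y) - Z

def bpp(x, z):
    return (-bp(q(x), z)) % p             # resultant, for b linear in Y

assert all(bpp(x, qp(q(x))) == 0 for x in range(p))   # holds for the composition
assert bpp(2, 0) != 0                                 # fails at a generic point
```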
Probabilistic bivariate attacks have actually been suggested before by Jakobsen [6] and applied by others to DES [12], but it was not previously explained how to compose local approximations to obtain global approximations, nor was it noticed that bivariate attacks generalize and unify interpolation and rational interpolation.
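Returning to the linear-algebra step of the rational interpolation attack described earlier, here is a sketch over GF(p); the rational map and its coefficients are illustrative assumptions, and q is normalized to be monic so that the system is square:

```python
# Sketch: recover E(x) = q(x)/q'(x) over GF(p) from the linear equations
# y_i * q'(x_i) = q(x_i), with q(x) = x^2 + q1*x + q0 (monic, an assumed
# normalization) and q'(x) = a1*x + a0. Unknowns: (q0, q1, a0, a1).

p = 101

def E(x):                                  # secret map: (x^2+3x+7)/(2x+5)
    return (x * x + 3 * x + 7) * pow(2 * x + 5, -1, p) % p

def solve_mod_p(A, b, p):
    """Gauss-Jordan elimination mod p for a square system A z = b."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, p)
        M[col] = [v * inv % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [(v - M[r][col] * w) % p for v, w in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

xs = [1, 2, 3, 4]
# rearranged equations: q0 + q1*x - a0*y - a1*x*y = -x^2  (mod p)
A = [[1, x, -E(x) % p, -x * E(x) % p] for x in xs]
b = [(-x * x) % p for x in xs]
q0, q1, a0, a1 = solve_mod_p(A, b, p)
assert (q0, q1, a0, a1) == (7, 3, 5, 2)    # the hidden coefficients
```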
7 Discussion
Closure Properties. The common theme here seems to be that closure properties enable cryptanalysis. For instance, differential and linear attacks exploit the fact that the set of linear functions is closed under composition: g ◦ f is linear if f, g are. Likewise for the set of polynomials, of rational polynomials, and so on. More generally, we may form a norm N (·) on functions that grows slowly under composition and that corresponds somehow to the cost of an attack. Consider interpolation attacks: letting N (q) = deg q for polynomials q(X), we find
N(g ∘ f) ≤ N(f) × N(g), hence if we can find low-degree properties for individual round functions, the corresponding global property for the whole cipher will have not-too-large degree. Perhaps other ways to place a metric space structure on the set of bijective functions f : M → M will lead to other cryptanalytic advances in the future.

Related Work. Commutative diagram cryptanalysis draws heavily on ideas found in previous frameworks, most notably Vaudenay's chi-squared cryptanalysis [13] and Harpes' partitioning cryptanalysis [4]. Vaudenay's work used linear projections ρ, ρ′ : {0, 1}^ℓ → {0, 1}^m, and then applied the χ² statistical test to the pair (ρ(X), ρ′(Ek(X))). It turns out that the power of the χ² test is closely related to the L2 norm ||M − M′||2, hence chi-squared cryptanalysis can be viewed as a variant of stochastic commutative diagrams where a different matrix norm is used. Partitioning cryptanalysis generalized this to allow arbitrary (not necessarily linear) projections ρ, ρ′. We borrowed methods from Vaudenay's decorrelation theory [14] to calculate the distinguishing advantage of our statistical attacks. Also, Vaudenay shows how to build ciphers with provable resistance against all non-adaptive d-limited attacks, which corresponds to security against d-th order commutative diagram attacks. This work builds on an enormous quantity of work in the block cipher literature; it is a synthesis of many ideas that have previously appeared elsewhere. Due to space limitations, we have been forced to omit mention of a great deal of relevant prior work, and we apologize for all omissions.
8 Conclusion
We have introduced commutative diagram cryptanalysis and shown how it provides a new perspective on many prior attacks in the block cipher literature. We also described two new attack methods, generalized truncated differential cryptanalysis and bivariate interpolation, and demonstrated how they generalize and unify many previous attacks. It is an interesting open problem to extend this framework to incorporate more attacks, to discover more new attacks, or to build fast ciphers that are provably secure against commutative diagram cryptanalysis.
References

[1] E. Biham, O. Dunkelman, N. Keller, "Enhancing Differential-Linear Cryptanalysis," ASIACRYPT 2002, Springer-Verlag, LNCS 2501, pp. 254–266.
[2] E. Biham, A. Shamir, Differential Cryptanalysis of the Data Encryption Standard, Springer-Verlag, 1993.
[3] D. Coppersmith, S. Halevi, C. Jutla, "Cryptanalysis of Stream Ciphers with Linear Masking," CRYPTO 2002, Springer-Verlag, LNCS 2442, pp. 515–532.
[4] C. Harpes, J. L. Massey, "Partitioning Cryptanalysis," FSE '97, Springer-Verlag, LNCS 1267, pp. 13–27.
[5] M. E. Hellman, S. K. Langford, "Differential-linear cryptanalysis," CRYPTO '94, Springer-Verlag, LNCS 839, pp. 26–39.
[6] T. Jakobsen, "Cryptanalysis of Block Ciphers with Probabilistic Non-Linear Relations of Low Degree," CRYPTO '98, Springer-Verlag, LNCS 1462, pp. 212–222.
[7] T. Jakobsen, L. R. Knudsen, "Attacks on Block Ciphers of Low Algebraic Degree," J. Cryptology, 14(3):197–210, 2001.
[8] L. Knudsen, "Truncated and Higher Order Differentials," FSE '94, Springer-Verlag, LNCS 1008, pp. 196–211.
[9] X. Lai, J. Massey, S. Murphy, "Markov Ciphers and Differential Cryptanalysis," EUROCRYPT '91, Springer-Verlag, LNCS 547, pp. 17–38.
[10] M. Matsui, "Linear Cryptanalysis Method for DES Cipher," EUROCRYPT '93, Springer-Verlag, LNCS 765, pp. 386–397.
[11] B. Reichardt, D. Wagner, "Markov truncated differential cryptanalysis of Skipjack," SAC 2002, Springer-Verlag, LNCS 2595, pp. 110–128.
[12] T. Shimoyama, T. Kaneko, "Quadratic Relation of S-box and Its Application to the Linear Attack of Full Round DES," CRYPTO '98, Springer-Verlag, LNCS 1462, pp. 200–211.
[13] S. Vaudenay, "An Experiment on DES: Statistical Cryptanalysis," ACM CCS '96, ACM Press, pp. 139–147.
[14] S. Vaudenay, "Decorrelation: A Theory for Block Cipher Security," J. Cryptology, 16(4):249–286, Sept. 2003.
Algebraic Attacks on Summation Generators

Dong Hoon Lee, Jaeheon Kim, Jin Hong, Jae Woo Han, and Dukjae Moon

National Security Research Institute, 161 Gajeong-dong, Yuseong-gu, Daejeon, 305-350, Korea
{dlee,jaeheon,jinhong,jwhan,djmoon}@etri.re.kr
Abstract. We apply the algebraic attacks on stream ciphers with memories to the summation generator. For a summation generator that uses n LFSRs, an algebraic equation relating the key stream bits and LFSR output bits can be made to be of degree less than or equal to 2^⌈log2 n⌉, using ⌈log2 n⌉ + 1 consecutive key stream bits. This is much lower than the upper bound given by previous general results. We also show that the techniques of [6, 2] can be applied to summation generators using 2^k LFSRs to reduce the effective degree of the algebraic equation.

Keywords: stream ciphers, algebraic attacks, summation generators
1 Introduction
Among recent developments on stream ciphers, the algebraic attack has gathered much attention. In this attack, an algebraic equation relating the initial key bits and the output key stream bits is set up and solved through linearization techniques. The algebraic attack was first applied to block ciphers and public key cryptosystems [8, 9, 3], and its first successful application to a stream cipher was on Toyocrypt [4]. As the method was soon extended to LILI-128 [7], it gathered much attention. Stream ciphers that utilize memory were at first thought to be much more resistant to these attacks, but it was soon shown that even these cases were subject to algebraic attacks [1, 5]. The summation generator proposed by Rueppel [12] is a nonlinear combiner with memory. It is known that the generator produces sequences whose period and correlation immunity are maximum, and whose linear complexity is conjectured to be close to the period. Hence it serves as a good building block for stream ciphers. However, a correlation attack on the summation generator that uses two linear feedback shift registers (LFSRs) was presented in [11], even though it is also stated in [11] that this attack is not plausible if there are more than two LFSRs in use. Another well-known attack on the summation generator, given by [10], points in the opposite direction: it uses feedback with carry shift registers (FCSRs) to simulate the summation generator, and indicates that for a fixed initial key size, breaking the key into too many LFSRs will add to its weakness.
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 34–48, 2004. © International Association for Cryptologic Research 2004
Fig. 1. Structure of summation generators. The n LFSR outputs x_1^t, . . . , x_n^t and the carry c^t form the total T^t = Σ_{j=1}^n x_j^t + c^t; the output is z^t = T^t mod 2 and the next carry is c^{t+1} = ⌊T^t/2⌋
In this work, we study the summation generator with the algebraic attack in mind. We show that for a summation generator that uses n LFSRs, an algebraic equation relating the key stream bits and LFSR output bits can be made to be of degree less than or equal to 2^⌈log2 n⌉, using ⌈log2 n⌉ + 1 consecutive key stream bits. This is much lower than the upper bound on the degree of algebraic equations that is guaranteed by the general works [1, 5]. We also show that the techniques of [6, 2] can be applied to summation generators using 2^k LFSRs to reduce the effective degree of the equation further. The reader may refer to Tables 3 and 4, appearing in later sections, for a quick look at the improvements we have given to understanding the actual strength of algebraic attacks on summation generators. Actually, Table 3 should be a good reference in view of algebraic attacks for anyone considering the use of a summation generator. The rest of this paper is organized as follows: We recall the definition of a summation generator and introduce elementary symmetric boolean functions in Section 2. We derive an algebraic equation that is satisfied by the summation generator on 4 LFSRs in Section 3. Induction is used in extending this to the summation generator on 2^k LFSRs in the following section. Section 5 deals with the general n case. We show that the degree of the equation can be reduced further in the case n = 2^k using the technique of [6] in Section 6, and present our conclusion in Section 7.
2 Preliminaries

2.1 Summation Generators
The summation generator proposed by Rueppel [12] is a nonlinear combiner with memory. We consider a summation generator that uses n binary LFSRs (Figure 1). The output of the j-th LFSR at time t is denoted by x_j^t ∈ {0, 1}. Since we are dealing with binary values, we will not be using powers of these terms, hence the superscript t should not cause any confusion.
The current carry value from the previous stage is denoted by c^t. Note that the carry can be expressed in k = ⌈log2 n⌉ bits. We shall let c^t = (c_{k−1}^t, . . . , c_1^t, c_0^t) be the binary expression of the carry value. The binary output z^t and the carry value for the next stage of the summation generator are given by

z^t = x_1^t ⊕ x_2^t ⊕ · · · ⊕ x_n^t ⊕ c_0^t,   (1)
c^{t+1} = ⌊(x_1^t + x_2^t + · · · + x_n^t + c^t)/2⌋.   (2)
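A minimal executable sketch of update rules (1) and (2); the LFSRs themselves are abstracted into fixed example bit streams, a simplification made only for illustration:

```python
# Sketch of the summation combiner defined by (1) and (2). For brevity
# the n LFSRs are replaced by fixed example bit streams; only the
# combining logic is modeled.

def step(x_bits, carry):
    """One clock: z = T mod 2 and next carry = floor(T/2), T = sum + carry."""
    total = sum(x_bits) + carry
    return total & 1, total >> 1

streams = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [1, 1, 1, 0]]  # n = 4
carry, keystream = 0, []
for t in range(4):
    z, carry = step([s[t] for s in streams], carry)
    keystream.append(z)

assert keystream == [0, 0, 0, 1]
assert carry == 1          # the carry genuinely exceeds a single bit en route
```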
2.2 Symmetric Polynomials
Let us denote by S_i^t the i-th elementary symmetric polynomial in the variables {x_1^t, . . . , x_n^t}. We view them as boolean functions rather than as polynomials. Explicitly, they are

S_0^t = 1,
S_1^t = ⊕_{j=1}^n x_j^t,
S_2^t = ⊕_{1≤j1<j2≤n} x_{j1}^t x_{j2}^t,
. . .
S_n^t = x_1^t x_2^t · · · x_n^t.

We shall also call these by the name elementary symmetric boolean functions. For any fixed 0 ≤ b ≤ n, consider the condition Σ_j x_j^t = b. This condition states that b of the n variables x_j^t are equal to 1 and that the others are equal to 0. Since S_a^t is symmetric for any 0 ≤ a ≤ n, it makes sense to evaluate S_a^t under this condition. Let m_{a,b} denote this value. It is also clear that the values (m_{a,b})_{b=0}^n completely determine the value of S_a^t at an arbitrary input. To calculate m_{a,b}, one has only to count how many of the monomials contained in S_a^t are nonzero. Hence we have

m_{a,b} ≡ C(b, a)   (mod 2),   (3)

where C(b, a) denotes the binomial coefficient. Now, consider the (n + 1) × (n + 1) matrix M = (m_{a,b}). The n = 4 case is given in Table 1, as an example. Denote by M̄ the n × n matrix obtained by removing the last row and column from M. When we want to make explicit the number of variables used in defining M and M̄, we shall write M(n) and M̄(n).

Lemma 1. For powers of 2, the matrix M̄ satisfies

M̄(2^{k+1}) = ( M̄(2^k)   M̄(2^k) )
              ( 0         M̄(2^k) ).
Algebraic Attacks on Summation Generators
Table 1. The matrix M for n = 4

    Σ_j x_j^t    0  1  2  3  4
    S_0^t        1  1  1  1  1
    S_1^t        0  1  0  1  0
    S_2^t        0  0  1  1  0
    S_3^t        0  0  0  1  0
    S_4^t        0  0  0  0  1
Proof. Let us first divide M̄ into four parts and write

    M̄ = ( II   I  )
         ( III  IV ).

It is clear from (3) that the second quadrant, part II, is M̄(2^k), as claimed. The equation also shows that the lower triangular part of M̄ is zero. Hence the third quadrant is filled with zeros, as claimed. We now show that I, II, and IV are identical. Once more, referring to (3), it suffices to show

    C(b, a) ≡ C(2^k + b, a) ≡ C(2^k + b, 2^k + a)  (mod 2)         (4)

for all 0 ≤ a, b < 2^k. To this end, we may easily check that

    (1 + x)^{2^k + b} = (1 + x)^{2^k} (1 + x)^b
                      ≡ (1 + x^{2^k})(1 + x)^b   (mod 2)
                      = (1 + x)^b + x^{2^k} (1 + x)^b.

The coefficient of x^a that would appear in the expansion of the left hand side (1 + x)^{2^k + b} is the middle term of (4). Since a < 2^k, the x^a term in the right hand side may appear only in (1 + x)^b and is equal to the first term of (4). This shows the first equality. Similarly, comparison of the coefficients of x^{2^k + a} shows the equality between the first and the last terms of (4). This completes the proof.

This lemma allows one to write the matrix M̄(2^k) explicitly for any given k. Then, owing to (3), the matrix M(n) for any n may be obtained as a submatrix of a big enough M̄(2^k). For example, Table 1 is a submatrix of M̄(8). The basic theory of symmetric polynomials tells us that the set consisting of products of elementary symmetric polynomials forms a basis for the space of symmetric polynomials. In the case of symmetric boolean functions, the following lemma can easily be seen to be true.

Lemma 2. Any boolean function that is symmetric in its variables may be written as a linear combination of the elementary symmetric boolean functions.
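Equation (3) and the block structure of Lemma 1 are easy to check numerically. A small sketch (our own verification code):

```python
from math import comb

# Entries of M are binomial coefficients mod 2 (equation (3)); M-bar drops
# the last row and column.  We verify Table 1 and Lemma 1 for k = 2.
def M(n):
    return [[comb(b, a) % 2 for b in range(n + 1)] for a in range(n + 1)]

def Mbar(n):
    return [row[:n] for row in M(n)[:n]]

assert M(4)[1] == [0, 1, 0, 1, 0]      # the S_1 row of Table 1

B, big = Mbar(4), Mbar(8)
for a in range(4):
    for b in range(4):
        assert big[a][b] == B[a][b]            # quadrant II
        assert big[a][4 + b] == B[a][b]        # quadrant I
        assert big[4 + a][4 + b] == B[a][b]    # quadrant IV
        assert big[4 + a][b] == 0              # quadrant III
print("Lemma 1 verified for k = 2")
```

The three identical quadrants are exactly the statement of (4), which is Lucas' theorem in disguise.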
Table 2. The next carry bits in relation to the current carry value c^t = (c_1^t, c_0^t)

    c^t:         (0,0)    (0,1)            (1,0)            (1,1)
    c_0^{t+1}:   S_2^t    S_1^t ⊕ S_2^t    S_0^t ⊕ S_2^t    S_0^t ⊕ S_1^t ⊕ S_2^t
    c_1^{t+1}:   S_4^t    S_3^t ⊕ S_4^t    S_2^t ⊕ S_4^t    S_1^t ⊕ S_2^t ⊕ S_3^t ⊕ S_4^t

3  Summation Generator on 4 LFSRs
We fix n = 4 throughout this section and derive an algebraic equation of degree 4 that is satisfied by the summation generator. The variables of the final equation will consist of the LFSR output bits and the key stream bits, but will not contain any carry bits.

Since we are dealing with 4 = 2^2 LFSRs, we have k = 2, and it suffices to use 2 bits in expressing the carry value. It is clear that the carry value should be symmetric with respect to the order of the LFSRs in use. Recalling Lemma 2, this implies that, when the current carry value c^t is fixed, the carry bits c_0^{t+1} and c_1^{t+1} may be expressed as linear combinations of the elementary symmetric boolean functions S_i^t. In the general case, when the current carry bits are not fixed, they will be linear combinations of the S_i^t with boolean functions of the carry bits c_0^t and c_1^t used as coefficients. For each possible value of c^t and the LFSR inputs, we explicitly calculated c_0^{t+1} and c_1^{t+1}. We then used Table 1 to express them using the elementary symmetric boolean functions. The result is given in Table 2.

Lemma 3. For a summation generator on 4 LFSRs, the following expresses the next stage carry bits as functions of the current LFSR output bits and current carry bits.

    c_0^{t+1} = S_2^t ⊕ c_0^t S_1^t ⊕ c_1^t,                            (5)
    c_1^{t+1} = S_4^t ⊕ c_0^t S_3^t ⊕ c_1^t S_2^t ⊕ c_0^t c_1^t S_1^t.   (6)

Proof. Using Table 2, we may write

    c_0^{t+1} = {(1 ⊕ c_0^t)(1 ⊕ c_1^t) S_2^t} ⊕ {c_0^t (1 ⊕ c_1^t)(S_1^t ⊕ S_2^t)}
                ⊕ {(1 ⊕ c_0^t) c_1^t (S_0^t ⊕ S_2^t)} ⊕ {c_0^t c_1^t (S_0^t ⊕ S_1^t ⊕ S_2^t)}

and

    c_1^{t+1} = {(1 ⊕ c_0^t)(1 ⊕ c_1^t) S_4^t} ⊕ {c_0^t (1 ⊕ c_1^t)(S_3^t ⊕ S_4^t)}
                ⊕ {(1 ⊕ c_0^t) c_1^t (S_2^t ⊕ S_4^t)} ⊕ {c_0^t c_1^t (S_1^t ⊕ S_2^t ⊕ S_3^t ⊕ S_4^t)}.

Simplification of these equations gives the claimed statements.
Lemma 4. For n = 4, the following expresses the carry bits of the summation generator as polynomials in the LFSR output bits and the key stream bits.

    c_0^t = S_1^t ⊕ z^t,                                     (7)
    c_1^t = S_2^t ⊕ (1 ⊕ z^t) S_1^t ⊕ S_1^{t+1} ⊕ z^{t+1}.   (8)

The first carry bit c_0^t is linear and the second carry bit c_1^t is of degree 2 in the LFSR output bits.

Proof. The first equation follows immediately from (1). It is a linear function in the variables x_j^t. Substituting this and its shift c_0^{t+1} into (5) gives

    c_1^t = S_2^t ⊕ (S_1^t ⊕ z^t) S_1^t ⊕ S_1^{t+1} ⊕ z^{t+1}.

Now, since we are dealing with boolean functions, we have (S_1^t)^2 = S_1^t, and the second equation follows.

Finally, with this lemma, we may remove all occurrences of the carry bits from (6) to obtain the following proposition.

Proposition 1. The following algebraic equation holds true for a summation generator on 4 LFSRs.

    0 = S_4^t ⊕ (1 ⊕ z^t) S_3^t ⊕ S_2^t S_1^{t+1}
        ⊕ (1 ⊕ z^{t+1}) S_2^t ⊕ S_2^{t+1} ⊕ (1 ⊕ z^t) S_1^t S_1^{t+1}
        ⊕ (1 ⊕ z^t)(1 ⊕ z^{t+1}) S_1^t ⊕ (1 ⊕ z^{t+1}) S_1^{t+1} ⊕ S_1^{t+2} ⊕ z^{t+2}.

It is of degree 4 in the output bits of the LFSRs and uses 3 consecutive key stream bits. While simplifying the substitution of (7), (8), and the shift of (8) into (6), we have used the equalities S_1^t S_2^t = S_3^t and S_1^t S_3^t = S_3^t of boolean functions.
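Lemma 4 and Proposition 1 can be checked empirically by simulating the generator. The harness below is our own illustration (random bits stand in for real LFSR outputs, which does not affect the identities):

```python
from itertools import combinations
from random import randint, seed

# Simulate a 4-input summation generator and verify (7), (8) and
# Proposition 1 at every clock.
def S(bits, i):
    """Elementary symmetric boolean function S_i on the given bits."""
    return sum(all(c) for c in combinations(bits, i)) % 2 if i else 1

seed(1)
X = [[randint(0, 1) for _ in range(4)] for _ in range(50)]   # x^t streams
z, c0, c1, carry = [], [], [], 0
for x in X:                                  # equations (1) and (2)
    c0.append(carry & 1)
    c1.append((carry >> 1) & 1)
    total = sum(x) + carry
    z.append(total & 1)
    carry = total >> 1

for t in range(48):
    S1, S2, S3, S4 = (S(X[t], i) for i in (1, 2, 3, 4))
    S1n, S2n = S(X[t + 1], 1), S(X[t + 1], 2)
    S1nn = S(X[t + 2], 1)
    assert c0[t] == S1 ^ z[t]                                        # (7)
    assert c1[t] == S2 ^ ((1 ^ z[t]) & S1) ^ S1n ^ z[t + 1]          # (8)
    assert 0 == (S4 ^ ((1 ^ z[t]) & S3) ^ (S2 & S1n)                 # Prop. 1
                 ^ ((1 ^ z[t + 1]) & S2) ^ S2n ^ ((1 ^ z[t]) & S1 & S1n)
                 ^ ((1 ^ z[t]) & (1 ^ z[t + 1]) & S1)
                 ^ ((1 ^ z[t + 1]) & S1n) ^ S1nn ^ z[t + 2])
print("Lemma 4 and Proposition 1 hold on a random run")
```

Because the relations are identities in the generator's state, the assertions hold for any choice of input bits, not just this seed.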
4  Summation Generator on n = 2^k LFSRs
Let us denote by F_i^{t+1}(n) the (symmetric) polynomial that expresses the next stage carry bit c_i^{t+1} in terms of the LFSR outputs x_j^t and the current carry bits c_j^t. Since we shall be dealing with polynomials in different numbers of variables, we shall write S_j^t(n) to denote the elementary symmetric boolean function on n variables. As an example, we saw in the previous section that

    F_0^{t+1}(2^2) = S_2^t(2^2) ⊕ c_0^t S_1^t(2^2) ⊕ c_1^t,                                       (9)
    F_1^{t+1}(2^2) = S_4^t(2^2) ⊕ c_0^t S_3^t(2^2) ⊕ c_1^t S_2^t(2^2) ⊕ c_0^t c_1^t S_1^t(2^2).   (10)
Let us suppose that for some n = 2^k, the polynomials F_i^{t+1}(2^k) are given by

    F_i^{t+1}(2^k) = ⊕_j f_{i,j} S_j^t(2^k).                        (11)

Here, each coefficient f_{i,j} is a boolean function defined on the current carry bits c_0^t, c_1^t, ..., c_{k-1}^t. For example, we see from (10) that

    f_{1,4} = 1,   f_{1,3} = c_0^t,   f_{1,2} = c_1^t,   f_{1,1} = c_0^t c_1^t,   f_{1,0} = 0

for k = 2. The following proposition will allow us to inductively calculate all F_i^{t+1}(2^k) for any i and k.

Proposition 2. Suppose equation (11) holds for some k. Then we have

    F_i^{t+1}(2^{k+1})     = ⊕_j f_{i,j} S_j^t(2^{k+1}),   for i < k − 1,                                  (12)
    F_{k−1}^{t+1}(2^{k+1}) = ⊕_j f_{k−1,j} S_j^t(2^{k+1}) ⊕ c_k^t,                                          (13)
    F_k^{t+1}(2^{k+1})     = ⊕_j f_{k−1,j} S_{j+2^k}^t(2^{k+1}) ⊕ c_k^t (⊕_j f_{k−1,j} S_j^t(2^{k+1})).     (14)
Proof. See the appendices.

As an immediate application of this proposition to (9) and (10), we may write

    F_0^{t+1}(2^3) = S_2^t(2^3) ⊕ c_0^t S_1^t(2^3) ⊕ c_1^t,                                                (15)
    F_1^{t+1}(2^3) = S_4^t(2^3) ⊕ c_0^t S_3^t(2^3) ⊕ c_1^t S_2^t(2^3) ⊕ c_0^t c_1^t S_1^t(2^3) ⊕ c_2^t,    (16)
    F_2^{t+1}(2^3) = S_8^t(2^3) ⊕ c_0^t S_7^t(2^3) ⊕ c_1^t S_6^t(2^3) ⊕ c_0^t c_1^t S_5^t(2^3)
                     ⊕ c_2^t S_4^t(2^3) ⊕ c_0^t c_2^t S_3^t(2^3) ⊕ c_1^t c_2^t S_2^t(2^3) ⊕ c_0^t c_1^t c_2^t S_1^t(2^3).   (17)
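Proposition 2 can be run as a computation. In the sketch below (our own encoding, not the paper's), F_i(2^k) is a dict mapping the index j to the set of carry-monomials making up f_{i,j}; a monomial is a frozenset of carry-bit indices, and XOR of boolean sums is symmetric difference of sets:

```python
ONE = frozenset()            # the empty carry-monomial, i.e. the constant 1

def double(F, k):
    """Proposition 2, (12)-(14): carry polynomials for 2^(k+1) LFSRs from 2^k."""
    G = [dict(f) for f in F[:k - 1]]                      # (12): i < k-1 unchanged
    top = F[k - 1]
    g = dict(top)                                         # (13): XOR in the term c_k
    g[0] = g.get(0, frozenset()) ^ frozenset({frozenset({k})})
    G.append(g)
    h = {j + 2 ** k: mons for j, mons in top.items()}     # (14): shift by 2^k ...
    for j, mons in top.items():                           # ... plus c_k times the original
        h[j] = h.get(j, frozenset()) ^ frozenset(m | {k} for m in mons)
    G.append(h)
    return G

# Base case n = 2: the single carry bit satisfies c^{t+1} = S_2 ⊕ c_0 S_1.
F = [{2: frozenset({ONE}), 1: frozenset({frozenset({0})})}]

F = double(F, 1)     # n = 4: reproduces (9) and (10)
assert F[1] == {4: frozenset({ONE}), 3: frozenset({frozenset({0})}),
                2: frozenset({frozenset({1})}), 1: frozenset({frozenset({0, 1})})}

F = double(F, 2)     # n = 8: F[2] reproduces the eight terms of (17)
print(sorted(F[2]))  # → [1, 2, 3, 4, 5, 6, 7, 8]
```

Two doublings of the n = 2 base case reproduce (9)–(10) and (15)–(17) exactly.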
Now, let us briefly recall the process we went through in Section 3 in obtaining the degree 4 equation of Proposition 1. We started out with three equations.

    z^t       = S_1^t ⊕ c_0^t.                                        (18)
    c_0^{t+1} = S_2^t ⊕ c_0^t S_1^t ⊕ c_1^t.                          (19)
    c_1^{t+1} = S_4^t ⊕ c_0^t S_3^t ⊕ c_1^t (S_2^t ⊕ c_0^t S_1^t).    (20)

We used (18) to write c_0^t as a degree 1 equation that involves just the key stream bit and the LFSR outputs. This was substituted into the next equation (19) to write c_1^t as a degree 2 equation of the same kind. Finally, these expressions for c_0^t and c_1^t were substituted into the last equation to obtain the degree 4 equation that involves only key stream bits and LFSR output bits.

What would happen if we wanted to do the same for n = 2^3? We would start with the following set of equations.

    z^t       = S_1^t ⊕ c_0^t.                                                                              (21)
    c_0^{t+1} = S_2^t ⊕ c_0^t S_1^t ⊕ c_1^t.                                                                (22)
    c_1^{t+1} = S_4^t ⊕ c_0^t S_3^t ⊕ c_1^t S_2^t ⊕ c_0^t c_1^t S_1^t ⊕ c_2^t.                              (23)
    c_2^{t+1} = S_8^t ⊕ c_0^t S_7^t ⊕ c_1^t S_6^t ⊕ c_0^t c_1^t S_5^t ⊕ c_2^t (S_4^t ⊕ c_0^t S_3^t ⊕ c_1^t S_2^t ⊕ c_0^t c_1^t S_1^t).   (24)
Notice that the first two equations here are identical to (18) and (19), as stated by (12) of Proposition 2. Hence, as before, c_0^t and c_1^t will be written as polynomials of degree 1 and 2. The main part of (23) is identical to (20), as stated by (13). They only differ in that c_2^t appears at the end of (23). Since (20) is of degree 4, equation (23) gives a degree 4 expression for c_2^t. Now, as given by (14), the right hand side of (24) may be broken into two big terms of degree (less than or equal to) 8. The first term is a degree 4 equation shifted by degree 4 and the second term is a product of two degree 4 equations. Finally, the left hand side of (24) is of degree 4. Hence, substitution of c_0^t, c_1^t, and c_2^t into (24) gives a degree 8 polynomial connecting various LFSR output bits and key stream bits.

One can easily see that the above argument is general enough to be seen as the induction step needed in proving the following theorem.

Theorem 1. Consider a summation generator on n = 2^k LFSRs. There exists an algebraic equation connecting LFSR output bits and k + 1 consecutive key stream bits in such a way that it is of degree 2^k in the LFSR output bits.
5  The General Case

The following is an easy corollary to Theorem 1.

Theorem 2. Consider a summation generator on n LFSRs. We shall let k = ⌈log2 n⌉. There exists an algebraic equation connecting LFSR output bits and k + 1 consecutive key stream bits in such a way that it is of degree less than or equal to 2^k in the LFSR output bits.

Proof. We may model a summation generator on n LFSRs as a summation generator on 2^k LFSRs with (2^k − n) of the LFSRs set to zero. Hence, our claim follows from Theorem 1.

For small n, we explicitly calculated the algebraic equations. Table 3 compares the upper bounds on the degree of the algebraic equation claimed by various methods. We believe this table is big enough to cover any practically usable summation generator and should serve as a good reference for anyone implementing a summation generator and considering its immunity to algebraic attacks.
Table 3. Degree bounds on algebraic equations for summation generators

    n                2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
    [1, 5]           2   5   6   10  12  14  16  23  25  28  30  33  35  38  40
    Thm 2            2   4   4   8   8   8   8   16  16  16  16  16  16  16  16
    explicit calc.   2   3   4   6   6   7   8   12  12  13  14  14  14  15  16
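The first two rows of Table 3 follow simple formulas. The [1, 5] row coincides with the general combiners-with-memory bound ⌈n(k+1)/2⌉ for k = ⌈log2 n⌉ memory bits — that reading is our inference from the values, so treat it as an assumption — while Theorem 2 gives 2^k:

```python
from math import ceil, log2

# Cross-check of Table 3: the general bound (assumed to be ceil(n*(k+1)/2))
# versus the bound of Theorem 2, for n = 2..16.
def bound_general(n):
    k = ceil(log2(n))
    return ceil(n * (k + 1) / 2)

def bound_theorem2(n):
    return 2 ** ceil(log2(n))

assert [bound_general(n) for n in range(2, 17)] == \
       [2, 5, 6, 10, 12, 14, 16, 23, 25, 28, 30, 33, 35, 38, 40]
assert [bound_theorem2(n) for n in range(2, 17)] == \
       [2, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16]
print("both rows of Table 3 reproduced")
```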
6  Reducing the Degree Further for the n = 2^k Case
For the case when n = 2^k, we may reduce the degree of the algebraic equation a little further, assuming that we have access to consecutive key stream bits. We will apply the fast algebraic attack, which was introduced by Courtois [6] and improved by Armknecht [2].

Following the notation of [3, 6], we classify multivariate equations that relate key bits k_i and output bits z_j into types given by their degrees in k_i and z_j. We say that a polynomial is of type k^d z^f if all of its monomial terms are of the form k_{i1} ··· k_{id} z_{j1} ··· z_{jf}. Capital letters will be used in a similar manner to denote types of equations that may also contain lower degree monomials. For example, K^2 = k^2 ∪ k ∪ 1 and KZ = kz ∪ k ∪ z ∪ 1. In [6], double-decker equations (DDE) of degree (d, e, f) are defined to be any multivariate equation of type K^d ∪ K^e Z^f. Cases where d > e are of interest from the attacker's point of view. The following theorem states that the equations describing the summation generator on 2^k input LFSRs, given by Theorem 1, are DDEs.

Theorem 3. A double-decker equation of degree (2^k, 2^k − 1, 2^{k-1}) that relates initial key bits and output stream bits exists for the summation generator on 2^k LFSRs.

Proof. Let us, once more, recall the n = 2^2 case. The following is a very simple illustration of the process we went through in obtaining the degree 4 equation.

    (18)       ⇒  Z^1 = K^1 ⊕ c_0^t
               ⇒  c_0^t is of type K^1 ∪ Z^1                                              (25)

    (19), (25) ⇒  K^1 ∪ Z^1 = K^2 ∪ (K^1 ∪ Z^1)K^1 ⊕ c_1^t
               ⇒  c_1^t is of type K^2 ∪ K^1 Z^1                                          (26)

    (20), (26) ⇒  K^2 ∪ K^1 Z^1 = (K^{2+2} ∪ K^{1+2} Z^1) ∪ (K^2 ∪ K^1 Z^1)(K^2 ∪ K^1 Z^1)
               ⇒  relation of type K^4 ∪ K^3 Z^2                                          (27)
Let us next consider the n = 2^3 case also. Since changing the number of variables, i.e., replacing S_j^t(2^2) with S_j^t(2^3), does not change the degree of these equations, the degree 2^3 equation is obtained as follows.

    (18) = (21), (25)  ⇒  c_0^t is of type K^1 ∪ Z^1                                      (28)
    (19) = (22), (26)  ⇒  c_1^t is of type K^2 ∪ K^1 Z^1                                  (29)
    (20) ∼ (23), (27)  ⇒  c_2^t is of type K^4 ∪ K^3 Z^2                                  (30)

    (24), (30)  ⇒  K^4 ∪ K^3 Z^2 = (K^{4+4} ∪ K^{3+4} Z^2) ∪ (K^4 ∪ K^3 Z^2)(K^4 ∪ K^3 Z^2)
                ⇒  relation of type K^8 ∪ K^7 Z^4                                         (31)
This shows that the general case may be proved by induction. To prove the induction step, it suffices to show that

    K^{2^k} ∪ K^{2^k−1} Z^{2^{k-1}} = (K^{2^k+2^k} ∪ K^{2^k−1+2^k} Z^{2^{k-1}})
                                      ∪ (K^{2^k} ∪ K^{2^k−1} Z^{2^{k-1}})(K^{2^k} ∪ K^{2^k−1} Z^{2^{k-1}})

gives a DDE of type K^{2^{k+1}} ∪ K^{2^{k+1}−1} Z^{2^k}. Checking the validity of this statement is trivial. And this completes the proof.

We may assume that all periods of the LFSRs used in the summation generator are relatively prime. Then the summation generator satisfies the requirement of the attack described in [6], if we have access to consecutive key stream bits. The following steps may be taken to reduce the complexity of the algebraic attack.

1. Compute an algebraic equation explicitly using the formulae given in Proposition 2.
2. Write the resulting DDE in the form L_t(k) = R_t(k, z), where L_t(k) is the sum of all monomials of type K^d appearing in the equation and R_t(k, z) is the sum of all other monomials of type K^e Z^f.
3. Fix an arbitrary nontrivial initial key k′ and compute the value L_t(k′) for a sequence of length 2·C(m, d).
4. Using the Berlekamp-Massey algorithm, find a linear relation α = (α_t)_t such that Σ_t α_t L_t(k′) = 0. We note that steps 1, 2, 3, and 4 are independent of the initial key k, hence we can pre-compute the relation α.
5. We have obtained an algebraic equation of degree e. It is given by Σ_t α_t R_t(k, z) = 0.
6. Apply the general algebraic attack given in [7] to the above equation.

Example 1. The DDE for n = 2^2 is given as follows:

    S_4^t ⊕ S_3^t ⊕ S_1^{t+1} S_2^t ⊕ S_2^{t+1} ⊕ S_2^t ⊕ S_1^t S_1^{t+1} ⊕ S_1^{t+2} ⊕ S_1^{t+1} ⊕ S_1^t
      = z^t S_3^t ⊕ z^{t+1} S_2^t ⊕ z^t S_1^t S_1^{t+1} ⊕ z^{t+1} S_1^{t+1} ⊕ z^t z^{t+1} S_1^t
        ⊕ z^{t+1} S_1^t ⊕ z^t S_1^t ⊕ z^{t+2}.

We take 4 LFSRs defined by the following characteristic polynomials.

    L1: x^3 + x + 1,   L2: x^5 + x^2 + 1,   L3: x^7 + x + 1,   L4: x^11 + x^2 + 1.
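Step 4 above relies on the Berlekamp-Massey algorithm over GF(2). A compact textbook implementation (not the authors' code) that returns the linear complexity of a bit sequence:

```python
def berlekamp_massey(s):
    """Length of the shortest LFSR (linear complexity) generating bit list s."""
    C, B = [1], [1]       # connection polynomial and its previous copy
    L, m = 0, 1           # current complexity and gap since the last length change
    for n in range(len(s)):
        # discrepancy between s[n] and the prediction of the current LFSR
        d = s[n] ^ (sum(C[i] & s[n - i] for i in range(1, L + 1)) & 1)
        if d:                                  # fix C with a shifted copy of B
            T = C[:]
            C = C + [0] * (len(B) + m - len(C))
            for i, b in enumerate(B):
                C[i + m] ^= b
            if 2 * L <= n:
                L, B, m = n + 1 - L, T, 1
            else:
                m += 1
        else:
            m += 1
    return L

# e.g. the period-7 m-sequence with recurrence s_n = s_{n-1} XOR s_{n-3}
print(berlekamp_massey([1, 0, 0, 1, 1, 1, 0] * 3))   # → 3
```

In the attack the same routine is run on the precomputed sequence L_t(k′) to extract the relation α.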
Since C(26, 4) = 14,950 ≈ 15,000, we compute 30,000 bits from the left hand side of the above equation for some arbitrary nontrivial key bits k′. After applying the Berlekamp-Massey algorithm, we found that the sequence has a linear relation of length 3892 < C(26, 4). With a consecutive key stream of length of the order of 6520, one will be able to find the 26-bit initial key k.

Table 4 gives a simple comparison of the data and computational complexities needed for attacks on summation generators. Let w be the Gaussian elimination exponent. We shall use w = log2 7, as given by the Strassen algorithm. Let the summation generator with 2^k input LFSRs use an m-bit initial key. We remark that for [10] and Theorem 3, the key stream needs to be consecutive. For [1, 5] and Theorem 1, the key stream need only be partially consecutive, i.e., we need groups of k + 1 consecutive bits, but these groups may be far apart from each other. Hence, a straightforward comparison of data complexity might not be fair. Also, the values for [10] have been calculated assuming that the LFSRs in use have been chosen well, so that their 2-adic span is maximal.

Table 4. Complexity comparison of attacks on summation generators

    generator size (m, 2^k)    (128, 2^2)   (256, 2^2)   (256, 2^3)
    [1, 5]   data              2^32.3       2^38.4       2^83.1
             computation       2^90.8       2^107.9      2^233.2
    [10]     data              2^35         2^36         2^67
             computation       2^77.5       2^79.5       2^142.7
    Thm 1    data              2^23.3       2^27.4       2^48.5
             computation       2^65.5       2^76.9       2^136.3
    Thm 3    data              2^18.4       2^21.4       2^43.6
             computation       2^51.6       2^60.1       2^122.3
7  Conclusion

We have applied the general results of [1, 5] and [6] on stream ciphers with memory to the summation generator. Our results show that the degree of the algebraic equation obtainable and the complexity of the applicable attack are much lower than given by the general results. For a summation generator that uses n LFSRs, the algebraic equation relating the key stream bits and LFSR output bits can be made to be of degree less than or equal to 2^{⌈log2 n⌉}, using ⌈log2 n⌉ + 1 consecutive key stream bits. Under certain conditions, for the n = 2^k case, the effective degree may further be reduced by 1. And for small n we have summarized the degrees of the explicit equations in Table 3. The table should be taken into account by anyone using a summation generator.
References

[1] F. Armknecht and M. Krause, Algebraic attacks on combiners with memory, Advances in Cryptology – Crypto 2003, LNCS 2729, Springer-Verlag, pp. 162–175, 2003.
[2] F. Armknecht, Improving fast algebraic attacks, these proceedings, 2004.
[3] N. Courtois, The security of Hidden Field Equations (HFE), CT-RSA 2001, LNCS 2020, Springer-Verlag, pp. 266–281, 2001.
[4] N. Courtois, Higher order correlation attacks, XL algorithm and cryptanalysis of Toyocrypt, ICISC 2002, LNCS 2587, Springer-Verlag, pp. 182–199, 2002.
[5] N. Courtois, Algebraic attacks on combiners with memory and several outputs, E-print archive, 2003/125.
[6] N. Courtois, Fast algebraic attacks on stream ciphers with linear feedback, Advances in Cryptology – Crypto 2003, LNCS 2729, Springer-Verlag, pp. 176–194, 2003.
[7] N. Courtois and W. Meier, Algebraic attacks on stream ciphers with linear feedback, Advances in Cryptology – Eurocrypt 2003, LNCS 2656, Springer-Verlag, pp. 345–359, 2003.
[8] N. Courtois and J. Pieprzyk, Cryptanalysis of block ciphers with overdefined systems of equations, Asiacrypt 2002, LNCS 2501, Springer-Verlag, pp. 267–287, 2002.
[9] A. Kipnis and A. Shamir, Cryptanalysis of the HFE public key cryptosystem by relinearization, Advances in Cryptology – Crypto ’99, LNCS 1666, Springer-Verlag, pp. 19–30, 1999.
[10] A. Klapper and M. Goresky, Cryptanalysis based on 2-adic rational approximation, Advances in Cryptology – Crypto ’95, LNCS 963, Springer-Verlag, pp. 262–273, 1995.
[11] W. Meier and O. Staffelbach, Correlation properties of combiners with memory in stream ciphers, Journal of Cryptology, vol. 5, pp. 67–86, 1992.
[12] R. A. Rueppel, Correlation immunity and the summation generator, Advances in Cryptology – Crypto ’85, LNCS 219, Springer-Verlag, pp. 260–272, 1985.
A  Proof of Proposition 2, Equation (12)

To prove (12), we shall evaluate its right hand side at some arbitrary input value and show that it equals the next carry bit c_i^{t+1}. But before we do this, let us make some observations. Note that we may take

    c_i^{t+1} ≡ ⌊(Σ_j x_j^t + c^t)/2^{i+1}⌋  (mod 2)                (32)

as the definition of the carry bits. We may evaluate the right hand side of (11) at Σ_j x_j^t = 0 and equate it with the evaluation of (32) at the same point to show c_{i+1}^t = f_{i,0}. Now, from the discussions in Section 2 on the matrix M, we know that at Σ_j x_j^t = 2^k, we have S_0^t(2^k) = S_{2^k}^t(2^k) = 1 and all other S_j^t(2^k) = 0. Once more, evaluating (11) and (32) at Σ_j x_j^t = 2^k with this in mind shows c_{i+1}^t = f_{i,0} ⊕ f_{i,2^k}. Since we already know c_{i+1}^t = f_{i,0}, this implies f_{i,2^k} = 0, or equivalently, that the term S_{2^k}^t does not appear in the linear sum (11).

We shall now evaluate the right hand side of (12) at some fixed LFSR output values (x_1^t, ..., x_{2^{k+1}}^t) and carry value c^t. Set

    x  = Σ_j x_j^t,
    x′ = remainder of x divided by 2^k,
    c′ = remainder of c^t divided by 2^k.

From the above observation, we know that the right hand side of (12) contains some of the terms S_0^t(2^{k+1}), ..., S_{2^k−1}^t(2^{k+1}), but does not contain any of the terms S_j^t(2^{k+1}) for j ≥ 2^k. We also know from the discussions of Section 2 that the evaluation of each S_j^t(2^{k+1}) at x, for j < 2^k, is equal to its evaluation at x′. Note also that the term c_k^t does not appear as input to any of the coefficients f_{i,j} in the right hand side of (12). Hence,

    RHS of (12) at x and c^t
      = RHS of (12) at x′ and c′
      = RHS of (11) at x′ and c′
      = ⌊(x′ + c′)/2^{i+1}⌋  (mod 2)
      = ⌊(x + c^t)/2^{i+1}⌋  (mod 2)
      = c_i^{t+1} at x and c^t.

The condition i ≤ k − 2 has been used in the fourth equality. And the last equality is just (32). We have completed the proof that (12) is a valid expression for the next carry bits.
B  Proof of Proposition 2, Equation (13)

The proof of (13) is very similar to that of (12), hence we shall be very brief. We ask the reader to read Appendix A before reading this section. The carry bit is given by

    c_{k−1}^{t+1} ≡ ⌊(Σ_j x_j^t + c^t)/2^k⌋  (mod 2).

Evaluation of (11) at Σ_j x_j^t = 0 and 2^k shows that f_{k−1,0} = 0 and f_{k−1,2^k} = 1, hence we now always have S_{2^k}^t as a linear term, with coefficient equal to 1, in the sums (11) and (13). The temporary values x, x′, and c′ may be defined as before. From the discussions of Section 2, one may write

    S_{2^k}^t(2^{k+1}) at x = ⌊x/2^k⌋  (mod 2)
                            = (S_{2^k}^t(2^{k+1}) at x′) ⊕ ⌊x/2^k⌋

and

    S_j^t(2^{k+1}) at x = S_j^t(2^{k+1}) at x′

for j < 2^k. Also note that none of the terms S_j^t(2^{k+1}), for j > 2^k, appears in the sum (13), and that the term c_k^t visible in (13) is its only use in (13). Hence,

    RHS of (13) at x and c^t
      = (RHS of (13) at x and c′) ⊕ c_k^t
      = (RHS of (13) at x′ and c′) ⊕ c_k^t ⊕ ⌊x/2^k⌋
      = (RHS of (11) at x′ and c′) ⊕ c_k^t ⊕ ⌊x/2^k⌋
      = ⌊(x′ + c′)/2^k⌋ ⊕ c_k^t ⊕ ⌊x/2^k⌋  (mod 2)
      = ⌊(x + c^t)/2^k⌋  (mod 2)
      = c_{k−1}^{t+1} at x and c^t.

This completes the proof.
C  Proof of Proposition 2, Equation (14)

Define x = Σ_j x_j^t. Careful reading of Appendices A and B shows that the first term of (14) simplifies to

    ⊕_j f_{k−1,j} S_{j+2^k}^t(2^{k+1}) = 0                               for 0 ≤ x ≤ 2^k,
                                       = ⌊(x − 2^k + c^t)/2^k⌋ ⊕ c_k^t   for 2^k < x ≤ 2^{k+1},   (33)

and that the second term satisfies

    c_k^t (⊕_j f_{k−1,j} S_j^t(2^{k+1})) = c_k^t (⌊(x + c^t)/2^k⌋ ⊕ c_k^t).                        (34)

It now suffices to compare the sum of these two values with

    c_k^{t+1} ≡ ⌊(x + c^t)/2^{k+1}⌋  (mod 2).

Define x′ = x − 2^k when x > 2^k, and define c′ = c^t − 2^k when c_k^t = 1.

Case 1) x ≤ 2^k, c_k^t = 0. This is the easiest case:

    (33) ⊕ (34) = 0 = ⌊(x + c^t)/2^{k+1}⌋.

Case 2) x ≤ 2^k, c_k^t = 1.

    (33) ⊕ (34) = ⌊(x + c^t)/2^k⌋ ⊕ 1
                = ⌊(x + c′)/2^k⌋
                = ⌊{(x + c′) + 2^k}/2^{k+1}⌋                                                        (35)
                = ⌊(x + c^t)/2^{k+1}⌋.

Case 3) x > 2^k, c_k^t = 0.

    (33) ⊕ (34) = ⌊(x′ + c^t)/2^k⌋
                = ⌊{(x′ + c^t) + 2^k}/2^{k+1}⌋
                = ⌊(x + c^t)/2^{k+1}⌋.

Case 4) x > 2^k, c_k^t = 1.

    (33) ⊕ (34) = ⌊(x′ + c^t)/2^k⌋ ⊕ ⌊(x + c^t)/2^k⌋
                = ⌊(x′ + c^t)/2^k⌋ ⊕ (⌊(x′ + c^t)/2^k⌋ ⊕ 1)
                = 1
                = ⌊(x + c^t)/2^{k+1}⌋.

This completes the proof.
Algebraic Attacks on SOBER-t32 and SOBER-t16 without Stuttering

Joo Yeon Cho and Josef Pieprzyk

Center for Advanced Computing – Algorithms and Cryptography
Department of Computing, Macquarie University, NSW, Australia, 2109
{jcho,josef}@ics.mq.edu.au
Abstract. This paper presents algebraic attacks on SOBER-t32 and SOBER-t16 without stuttering. For unstuttered SOBER-t32, two different attacks are implemented. In the first attack, we obtain multivariate equations of degree 10. Then, an algebraic attack is developed using a collection of output bits whose relation to the initial state of the LFSR can be described by low-degree equations. The resulting system of equations contains 2^69 equations and monomials, which can be solved using Gaussian elimination with a complexity of 2^196.5. For the second attack, we build a multivariate equation of degree 14. We focus on the property of the equation that the monomials which are combined with the output bit are linear. By applying the Berlekamp-Massey algorithm, we can obtain a system of linear equations and the initial state of the LFSR can be recovered. The complexity of the attack is around O(2^100) with 2^92 keystream observations. The second algebraic attack is applicable to SOBER-t16 without stuttering. The attack takes around O(2^85) CPU clocks with 2^78 keystream observations.

Keywords: Algebraic attack, stream ciphers, linearization, NESSIE, SOBER-t32, SOBER-t16, modular addition, multivariate equations
1  Introduction

Stream ciphers are an important class of encryption algorithms. They encrypt individual characters of a plaintext message one at a time, using a stream of pseudorandom bits. Stream ciphers generally offer a better performance compared with block ciphers. They are also more suitable for implementations where computing resources are limited (mobile phones) or when characters must be individually processed (reducing the delay) [4]. Recently, there were two international calls for cryptographic primitives: NESSIE is a European initiative [2] and CRYPTREC [1] is driven by Japan. Many stream ciphers have been submitted and evaluated by the international cryptographic community. NESSIE announced its final decision in February 2003, and none of the stream cipher candidates was selected in the final report.
Supported by ARC Discovery grant DP0451484.
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 49–64, 2004. © International Association for Cryptologic Research 2004
According to the final NESSIE security report [3], there were four stream cipher primitives which were considered during phase II: BMGL [15], SNOW [12], SOBER-t16 and SOBER-t32 [17]. The security analysis of these stream ciphers is mainly focused on distinguishing and guess-and-determine attacks. Note that ciphers for which such attacks exist are excluded from the contest (even if those attacks do not allow the secret elements of the cipher to be recovered). For SOBER-t16 and SOBER-t32, there are distinguishing attacks that are faster than the key exhaustive attack. SOBER-t16 is distinguishable from a truly random keystream with a work factor of 2^92 for the version without stuttering and with a work factor of 2^111 for the version with stuttering [13]. The same technique is used to construct a distinguisher for SOBER-t32 with complexity 2^86.5 for the non-stuttering version [13]. For the version with stuttering [14], the distinguisher has complexity 2^153. Distinguishing attacks are the weakest form of attack and normally identify a potential weakness that may lead to a full attack that allows the secret elements (such as the initial state) of the cipher to be determined. The recent development of algebraic attacks on stream ciphers has already resulted in a dramatic cull of potential candidates for the stream cipher standards. The casualties of the algebraic attacks include Toyocrypt, submitted to CRYPTREC [7], and LILI-128, submitted to NESSIE [10]. In this paper, we present algebraic attacks on SOBER-t32 and SOBER-t16 without stuttering. For unstuttered SOBER-t32, two different attacks are implemented. In the first attack, we apply indirectly the algebraic attack on combiners with memory, even though SOBER-t32 does not include an internal memory state. We extract a part of the most significant bits of the addition modulo 2^32, and the carry generated from the part of less significant bits is regarded as the internal memory state (which is unknown).
Our attack can recover the initial state of the LFSR with a workload of 2^196.5 using 2^69 keystream observations, which is faster than exhaustive search of the 256-bit key. For the second attack, we build a multivariate equation of degree 14. This equation has the property that the monomials which are combined with the output bit are linear. By applying the Berlekamp-Massey algorithm, we can obtain a system of simple linear equations and recover the initial state of the LFSR. The attack takes O(2^100) CPU clocks with around 2^92 keystream observations. We apply the second algebraic attack to SOBER-t16 without stuttering. Our attack can recover the initial state of the LFSR with a workload of O(2^85) using 2^78 keystream observations for unstuttered SOBER-t16. This paper is organized as follows. In Section 2, the algebraic attack method is briefly described. The structure of SOBER-t32 is given in Section 3. In Section 4, the first algebraic attack on SOBER-t32 is presented. The second algebraic attack on SOBER-t32 is presented in Section 5. In Section 6, an algebraic attack on SOBER-t16 is presented. Section 7 concludes the paper.
2  Algebraic Attacks

2.1  Previous Works
The first application of the algebraic approach to the analysis of stream ciphers can be found in Courtois' work [7]. The idea behind it is to find a relation between the initial state of the LFSR and the output bits that is expressible by a polynomial of low degree. By accumulating enough observations (and corresponding equations), the attacker is able to create a system of equations of low algebraic degree. The number of monomials in these equations is relatively small (as the degree of each equation is low). We treat monomials as independent variables and solve the system by Gaussian elimination. For Toyocrypt, the algebraic attack shown in [7] is probabilistic and requires a workload of 2^92 with 2^19 keystream observations. In [10], the authors showed that Toyocrypt is breakable in 2^49 CPU clocks with 20 KBytes of keystream. They also analyzed the NESSIE submission LILI-128, showing that it is breakable within 2^57 CPU clocks with 762 GBytes of memory. Recently, the algebraic approach has been extended to the analysis of combiners with memory [5, 6, 8]. Very recently, a method that allows a substantial reduction of the complexity of all these attacks was presented in [9].

2.2  General Description of Algebraic Attacks
Let S_0 = (s_0, ..., s_{n-1}) be the initial state of the linear feedback shift register at clock 0. The state variables are input to the nonlinear block, producing the output v_0 = NB(S_0), where NB is a nonlinear function transforming the state of the LFSR into the output. At each clock t, the state is updated according to the relation S_t = L(s_t, ..., s_{n-1+t}), with L being a multivariate linear transformation. The output is v_t = NB(S_t). The general algebraic attack on such stream ciphers works as follows. For details see [7] or [10].

– Find a multivariate relation Q of a low degree d between the state bits and the bits of the output. Assume that the relation is Q(S_0, v_0) = 0 for clock 0.
– The same relation holds for all consecutive clocks t, so Q(S_t, v_t) = Q(L^t(S_0), v_t) = 0, where the state S_t at clock t is a linear transformation of the initial state S_0, or S_t = L^t(S_0). Note that all relations are of the same degree d.
– Given consecutive keystream bits v_0, ..., v_{M−1}, we obtain a system of M equations with monomials of degree at most d. If we collect enough observations so that the number of linearly independent equations is at least as large as the number T of monomials of degree at most d, then the system has a unique solution revealing the initial state S_0. We can apply the Gaussian reduction algorithm, which requires 7 · T^{log2 7} operations [18].
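The size of the linearized system is easy to estimate. A back-of-the-envelope sketch (our arithmetic, with illustrative parameters):

```python
from math import comb, log2

# With n state bits and relations of degree at most d, linearization treats
# each of the T = sum_i C(n, i) monomials as a fresh variable, and Gaussian
# reduction costs about 7 * T^(log2 7) operations.
def linearization_cost(n, d):
    T = sum(comb(n, i) for i in range(d + 1))
    return T, 7 * T ** log2(7)

T, ops = linearization_cost(128, 4)    # e.g. 128 state bits, degree-4 relations
print(f"monomials: 2^{log2(T):.1f}, work: 2^{log2(ops):.1f}")
# → monomials: 2^23.4, work: 2^68.5
```

This is why lowering the degree d of the relation Q is the whole game: T, and hence the workload, grows roughly as C(n, d).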
Finding relations amongst the input and output variables is a crucial task in each algebraic attack. Assume that we have a nonlinear block with n binary inputs and m binary outputs, or simply an n × m S-box over GF(2). The truth table of the box consists of 2^n rows. The columns point out all input and output variables (monomials of degree 1). We can add columns for all terms (monomials) of degree 2; there are C(n+m, 2) such terms. We can continue adding columns for higher degree monomials until

    2^n < Σ_{i=1}^{d} C(n+m, i),

where d is the highest degree of the monomials. Informally, the extended truth table can be seen as a matrix having more columns than rows, so there are some columns (monomials) that can be expressed as linear combinations of other columns, establishing the required relations amongst monomials of the S-box.
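The counting condition above pins down the smallest degree at which relations must exist. A small search (our illustration):

```python
from math import comb

# Smallest d with 2^n < sum_{i=1}^d C(n+m, i): at this degree the extended
# truth table of an n x m S-box has more columns than rows, so linear
# dependencies (relations) among its monomials are guaranteed.
def min_relation_degree(n, m):
    total, d = 0, 0
    while total <= 2 ** n:
        d += 1
        total += comb(n + m, d)
    return d

# An 8-input, 32-output box (the shape of the S-box inside SOBER-t32's f
# function) must already admit relations of degree 2:
print(min_relation_degree(8, 32))   # → 2
```

Existence is guaranteed by counting alone; finding relations that are actually useful to an attacker is the harder part the paper addresses.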
3  Brief Description of SOBER-t32

3.1  Notation
All variables operate on 32-bit words. Refer to Figure 1.

– ⊕ : addition in GF(2^32); ⊞ : addition modulo 2^32 (the glyph for modular addition was lost in extraction and is rendered here as ⊞).
– s_{i,j} : the j-th bit of the state register s_i.
– s_{i,j→k} : the consecutive bit stream from the j-th bit to the k-th bit of s_i.
– x = s_0 ⊞ s_16 ; x_i is the i-th bit of x.
– α : the output of the first S-box; α_i is the i-th bit of α.

3.2  SOBER-t32
SOBER-t32 is a word-oriented synchronous stream cipher. It operates on 32-bit words and has a secret key of 256 bits (or 8 words). SOBER-t32 consists of a linear feedback shift register (LFSR) of 17 words (or 544 bits), a nonlinear filter (NLF) and a form of irregular decimation called stuttering. The LFSR produces a stream of words s_t using operations over GF(2^32). The vector S_t = (s_t, ..., s_{t+16}) is known as the state of the LFSR at time t, and the state S_0 = (s_0, ..., s_16) is called the initial state. The initial state and a 32-bit key-dependent constant called K are initialized from the secret key by the key loading procedure. The nonlinear filter takes some of the state words at time t as inputs and produces an output v_t. Each output word is obtained as v_t = NLF(S_t) = F(s_t, s_{t+1}, s_{t+6}, s_{t+13}, s_{t+16}, K). The function F is described in the following subsection. The stuttering decimates the stream produced by the NLF and outputs the key stream. A detailed description is given in [17].
Algebraic Attacks on SOBER-t32 and SOBER-t16 without Stuttering
Fig. 1. The non-linear filter of SOBER-t32 without stuttering
The Linear Feedback Shift Register. SOBER-t32 uses an LFSR of length 17 over GF(2^32). Each register element contains one 32-bit word. The contents of the LFSR at time t are denoted by s_t, ..., s_{t+16}. The new state of the LFSR is generated by shifting the previous state one step (operations are performed on words) and updating the most significant word according to the linear equation

  s_{t+17} = s_{t+15} ⊕ s_{t+4} ⊕ β · s_t

where β = 0xc2db2aa3.

The Nonlinear Filter. At time t, the nonlinear filter takes five words from the LFSR state, s_t, s_{t+1}, s_{t+6}, s_{t+13}, s_{t+16}, and the constant K as input and produces the output v_t. The nonlinear filter consists of the function f, three adders modulo 2^32 and an XOR addition. The value K is a 32-bit key-dependent constant that is determined during the initialization of the LFSR; K is kept constant throughout the entire session. The function f translates a single word input into a word output. The output v_t of the nonlinear filter is equal to

  v_t = ((f(s_t ⊞ s_{t+16}) ⊞ s_{t+1} ⊞ s_{t+6}) ⊕ K) ⊞ s_{t+13}

where the function f(x) is illustrated in Figure 1. The function uses an S-box with 8-bit input and 32-bit output. The input x to the function f(x) is split into two strings, x_H and x_L; the first string x_H is transformed by the S-box, which gives α.
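The register update above can be sketched in code. This is a structural sketch only: the multiplication by β must be carried out in GF(2^32) as defined in the SOBER-t32 specification [17], which constructs the field as a degree-4 extension of GF(2^8); the reduction polynomial used below is a made-up placeholder, so gf32_mul (our name) does not reproduce the real cipher's arithmetic.

```python
BETA, MASK32 = 0xC2DB2AA3, 0xFFFFFFFF

# Placeholder reduction polynomial, NOT the one from the SOBER-t32 spec.
POLY = (1 << 32) | 0x8D

def gf32_mul(a, b):
    """Carry-less multiplication of two 32-bit values modulo POLY."""
    r = 0
    for i in range(32):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(62, 31, -1):     # reduce degrees 62 down to 32
        if (r >> i) & 1:
            r ^= POLY << (i - 32)
    return r & MASK32

def lfsr_clock(state):
    """One clock: s[t+17] = s[t+15] ^ s[t+4] ^ beta * s[t], words shift down."""
    new = state[15] ^ state[4] ^ gf32_mul(BETA, state[0])
    return state[1:] + [new]
```

Calling lfsr_clock on a list of 17 words returns the shifted state with the new most significant word appended; substituting the specification's field arithmetic for gf32_mul would give the actual SOBER-t32 update.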
The Stuttering. The output of the stream cipher is obtained by decimating the output of the NLF in an irregular way, which makes correlation attacks harder. In this paper, however, we ignore the stuttering phase and assume that the attacker is able to observe the output v_t directly.
4 The First Attack on SOBER-t32 without Stuttering

4.1 Building Multivariate Equations for the Non-linear Filter
If we look at the structure of the non-linear filter, the following equations relating the state S_0 = (s_0, ..., s_16) of the LFSR and the intermediate values x, α with the output keystream v are established:

  x_0 = s_{0,0} ⊕ s_{16,0}
  α_0 = x_0 ⊕ s_{1,0} ⊕ s_{6,0} ⊕ K_0 ⊕ s_{13,0} ⊕ v_0
  x_1 = s_{0,1} ⊕ s_{16,1} ⊕ s_{0,0} s_{16,0}                                   (1)
  α_1 = x_1 ⊕ s_{1,1} ⊕ s_{6,1} ⊕ K_1 ⊕ s_{13,1} ⊕ v_1 ⊕ K_0 s_{13,0} ⊕
        (x_0 ⊕ α_0)(s_{1,0} ⊕ s_{6,0} ⊕ s_{13,0}) ⊕ s_{1,0}(s_{6,0} ⊕ s_{13,0}) ⊕ s_{6,0} s_{13,0}
  ...

Lemma 1. The variables x_i and α_i can be expressed as equations of degree i + 1 over the bits of the state variables.

Let us consider the first modular addition x = s_0 ⊞ s_16. If we denote the carry into the 24-th bit by ca, the addition modulo 2^32 can be divided into two independent additions: an addition modulo 2^8 for the most significant 8 bits and an addition modulo 2^24 for the remaining 24 bits. Then

  x_{24→31} = s_{0,24→31} ⊞ s_{16,24→31} ⊞ ca   (modulo 2^8)
  x_{0→23} = s_{0,0→23} ⊞ s_{16,0→23}           (modulo 2^24)
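The 24/8 carry split is easy to confirm numerically; a minimal sketch (function names are ours):

```python
import random

def split_add(a, b):
    """Compute a + b mod 2^32 via the 24/8 split with a carry into bit 24."""
    lo = (a & 0xFFFFFF) + (b & 0xFFFFFF)
    c24 = lo >> 24                               # the carry into the 24-th bit
    hi = ((a >> 24) + (b >> 24) + c24) & 0xFF    # addition modulo 2^8
    return (hi << 24) | (lo & 0xFFFFFF)          # low part: modulo 2^24

random.seed(1)
for _ in range(1000):
    a, b = random.getrandbits(32), random.getrandbits(32)
    assert split_add(a, b) == (a + b) & 0xFFFFFFFF
```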
Lemma 2. If the carry into the 24-th bit position is regarded as an unknown, the degree of the equations related to each x_i (24 ≤ i ≤ 31) is reduced to (i − 23).

Now we reconstruct a partial block of the non-linear filter: the addition modulo 2^32 and the S-box. These two blocks can be considered as a single S-box. This opens up the possibility of reducing the degree of the relations derived for the complex S-box. Furthermore, we consider the carry into the 24-th bit of the addition modulo 2^32 as another input variable that is unknown (so we avoid using it in the relation and treat it as an unknown state [6, 8]). The structure of the new block is shown in Figure 2. The addition is modified to add two 8-bit strings (s_{0,24→31} and s_{16,24→31}) with the carry c_24, which is unknown; thus this part is an addition modulo 2^8. The output is fed to the
Fig. 2. A combined structure of the partial addition modulo 2^8 and the S-box
S-box, which has a 32-bit output; amongst the output bits, the least significant two are α_1, α_0. If we regard the carry of an addition modulo 2^n as an internal memory state, an algebraic attack on a combiner with memory can be applied to the new block, since its structure is very similar to the model analyzed in [6, 8].

Lemma 3. Let A be the combined block of the modular addition and the S-box with inputs s_0, s_16 and output α. If the carry into the 24-th bit, c_24, is considered as an unknown, there is a multivariate equation that relates s_{0,24→31}, s_{16,24→31} and the output bits α_0, α_1 without the carry bit c_24.

Proof. Let us create the following matrix.
– Rows are all the possibilities for s_{16,24→31}, s_{0,24→31} and the carry bit c_24. There are 2^17 rows enumerating all possible values for the input bits.
– Columns are all the monomials of degree up to 9 over the input bits s_{16,24→31}, s_{0,24→31} and the output bits α_1, α_0. The number of columns is

  Σ_{i=0}^{9} C(18, i) = 2^17 + (1/2) C(18, 9) ≈ 2^17 + 2^14.5.

If we apply Gaussian elimination to this matrix, we can certainly obtain equations of degree up to 9, because the number of columns is greater than the number of rows.

Let us denote the equation derived above by F(s_{0,24→31}, s_{16,24→31}, α_0, α_1), of degree 9. If α_0 and α_1 are replaced using Equation (1), the degree of F becomes at most 10, and F then involves only state bits of the LFSR, K and output bits of the NLF. In Appendix A, we present a toy example illustrating the idea of the attack.
4.2 Complexity
We can apply the general algebraic attack to unstuttered SOBER-t32 by using the equation F. The number T of monomials of degree up to 10 chosen from 544 unknowns is

  T = Σ_{i=0}^{10} C(544, i) ≈ 2^69.

From [10], the attack requires roughly 7 · T^{log_2 7} ≈ 2^196.5 CPU clocks with 2^69 keystream observations.

At CRYPTO 2003, a method was presented that can solve the multivariate equations more efficiently by precomputation [9]. Even though the constant factor of the complexity is not precisely estimated, this method seems to allow a further reduction of the complexity of our attack. According to [9], the main workload of the algebraic attack is distributed between a precomputation stage and a Gaussian reduction stage. The precomputation is performed by an LFSR synthesis method, namely the Berlekamp-Massey algorithm. This step takes O(S log S + Sn) steps if an asymptotically fast version of the Berlekamp-Massey algorithm is used, where S is the size of the smallest linear dependency and n is the number of state bits. Then Gaussian elimination is applied to the monomials of reduced degree. For our attack, the precomputation needs about O(2^78.3) steps; Gaussian reduction can then be applied to the monomials of degree up to 9, taking about O(2^180) steps with 2^126.4 bits of memory.
5 The Second Attack on SOBER-t32 without Stuttering

5.1 An Observation of Modular Addition
Let us consider the modular addition c = a ⊞ b, where a = (a_31, ..., a_0), b = (b_31, ..., b_0) and c = (c_31, ..., c_0).

Lemma 4. Let c_i be the i-th output bit of the modular addition. Then c_0 = a_0 ⊕ b_0, c_1 = a_1 ⊕ b_1 ⊕ a_0 b_0, and for 2 ≤ i ≤ 31,

  c_i = a_i ⊕ b_i ⊕ a_{i−1} b_{i−1} ⊕ Σ_{t=0}^{i−2} a_t b_t { Π_{r=t+1}^{i−1} (a_r ⊕ b_r) }.

Each c_i is expressed as a function of the input bits of degree i + 1.

Theorem 1. Let c_i, 24 ≤ i ≤ 31, be the i-th output bit of the modular addition c = a ⊞ b. If c_i is multiplied by (1 ⊕ a_23 ⊕ b_23), then the degree of c_i · (1 ⊕ a_23 ⊕ b_23) is reduced to (i − 22).

Proof. For 24 ≤ i ≤ 31, c_i can be separated into two parts: one part includes (a_23 ⊕ b_23) and the remaining part does not. Therefore

  c_i = p(a_{23→i}, b_{23→i}) ⊕ (a_23 ⊕ b_23) · q(a_{0→i}, b_{0→i})
where p and q are functions of the input bits. Then

  c_i · (1 ⊕ a_23 ⊕ b_23) = p(a_{23→i}, b_{23→i}) · (1 ⊕ a_23 ⊕ b_23).

From Lemma 4, the degree of c_i · (1 ⊕ a_23 ⊕ b_23) is (i − 22).
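Lemma 4's algebraic normal form of the sum bits can be checked exhaustively on narrower words, since the formula is width-independent; a sketch (names are ours):

```python
W = 6                      # check on 6-bit words; the ANF is width-independent

def bit(x, i):
    return (x >> i) & 1

def c_formula(a, b, i):
    """Lemma 4: algebraic normal form of output bit i of a + b mod 2^W."""
    if i == 0:
        return bit(a, 0) ^ bit(b, 0)
    v = bit(a, i) ^ bit(b, i) ^ (bit(a, i - 1) & bit(b, i - 1))
    for t in range(i - 1):                 # t = 0 .. i-2
        term = bit(a, t) & bit(b, t)
        for r in range(t + 1, i):          # r = t+1 .. i-1
            term &= bit(a, r) ^ bit(b, r)
        v ^= term
    return v

for a in range(1 << W):
    for b in range(1 << W):
        s = (a + b) & ((1 << W) - 1)
        for i in range(W):
            assert c_formula(a, b, i) == bit(s, i)
```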
5.2 Building a System Equation for the Non-linear Filter
If we look at the structure of the non-linear filter, we can easily see that the following equation holds (see Figure 1):

  α_0 = s_{0,0} ⊕ s_{16,0} ⊕ s_{1,0} ⊕ s_{6,0} ⊕ s_{13,0} ⊕ v_0 ⊕ K_0   (2)
An Equation for α_0. The bit α_0 is the least significant output bit of the S-box, and it can be represented by a non-linear equation whose variables are only the input bits of the S-box. Let us construct the following matrix.
– Rows are all the possibilities for (x_31, ..., x_24), so there are 2^8 rows enumerating all possible values for the input bits.
– Columns are all the monomials of degree up to 8 over the input bits (x_31, ..., x_24), plus the least significant output bit α_0. The number of columns is 2^8 + 1.

If we apply Gaussian elimination to this matrix, we can obtain a non-linear equation because the number of columns is greater than the number of rows. Simulation shows that the degree of the equation for α_0 is 6 (see Appendix B).

In the next step, let us take a look at the first modular addition of the non-linear filter, x = s_0 ⊞ s_16. By Theorem 1, for 24 ≤ i ≤ 31, x_i · (1 ⊕ s_{0,23} ⊕ s_{16,23}) becomes

  x_i · (1 ⊕ s_{0,23} ⊕ s_{16,23}) = g(s_{0,23→i}, s_{16,23→i})

where g is a multivariate equation of degree up to (i − 22). Let A_i be a monomial built over variables from the set {x_24, ..., x_31}. Then

  A_i · (1 ⊕ s_{0,23} ⊕ s_{16,23}) = G_i(s_{0,23→31}, s_{16,23→31})

where G_i is a non-linear equation of degree d_{G_i}.

Lemma 5. The degree of A_i · (1 ⊕ s_{0,23} ⊕ s_{16,23}) is at most 16.

The number of variables affecting the degree d_{G_i} is at most 18, namely the set of variables {s_{0,23→31}, s_{16,23→31}}. However, not all monomials over these variables can occur. For example, the monomial s_{0,31} · s_{0,30} · · · s_{0,23} · s_{16,31} · · · s_{16,23}, which is of degree 18, cannot appear. By careful inspection, we see that the degree of the monomials is at most 16.

As shown in Appendix B, α_0 = Σ_i A_i. If α_0 is multiplied by (1 ⊕ s_{0,23} ⊕ s_{16,23}), the monomials which include (s_{0,23} ⊕ s_{16,23}) vanish.

Lemma 6. The degree of α_0 · (1 ⊕ s_{0,23} ⊕ s_{16,23}) is at most 14. (The degree of the remaining monomials is expected to be at most 16; however, computer simulation shows that it is not bigger than 14.)
The Degree of Equation (2). If we multiply Equation (2) by (1 ⊕ s_{0,23} ⊕ s_{16,23}), then we get

  α_0 · (1 ⊕ s_{0,23} ⊕ s_{16,23}) = (s_{0,0} ⊕ s_{16,0} ⊕ s_{1,0} ⊕ s_{6,0} ⊕ s_{13,0} ⊕ v_0 ⊕ K_0) · (1 ⊕ s_{0,23} ⊕ s_{16,23})   (3)
Let us consider the left-hand side of the equation. The bit α_0 plays the major role in determining the degree of the equation. We know that α_0 · (1 ⊕ s_{0,23} ⊕ s_{16,23}) has monomials of maximum degree 14. The right-hand side becomes quadratic after the multiplication. Therefore, we obtain a multivariate equation of degree 14.

5.3
Applying an Algebraic Attack
Let us recall the recent algebraic attack introduced in [9]. Let S_t^d denote the monomials of the state variables of degree up to d, and V_t^d the monomials of the output variables of degree up to d, at clock t. Then the final equation of degree 14 can be written in the following way:

  S_t^14 ⊕ S_t V_t = 0   (4)
If we put all the monomials that do not include the output variables on the left side, then Left(S_t) = S_t^14 and Right(S_t, V_t) = S_t V_t. We can see that L^t(S_0) = S_t, where S_0 is the initial state of the state variables and L is a connection function which is linear over GF(2). If we collect N > Σ_{i=0}^{14} C(544, i) consecutive equations, a linear dependency γ = (γ_0, ..., γ_{N−1}) for the left-side equations must exist, with

  Σ_{t=0}^{N−1} γ_t · Left(L^t(S_0)) = 0,   γ_t ∈ GF(2).
Let us recover γ from the given sequence (see [9]). We choose a non-zero random key S_0 and compute 2T output bits c_t of the left-side equations:

  c_t = Left(L^t(S_0)),   for t = 0, ..., 2T − 1,

where c_t ∈ GF(2). Then we apply the well-known Berlekamp-Massey algorithm to find the smallest connection polynomial that generates the sequence c = (c_0, ..., c_{2T−1}). If we find γ successfully, the same linear dependency holds for the right-hand side, so

  0 = Σ_{t=i}^{N+i−1} γ_{t−i} · Right(L^t(S_0), V_t),   i = 0, 1, ...   (5)
We can see that Equation (5) is linear. If we collect and solve these equations for consecutive keystreams, we can recover the initial state bits of the LFSR with small complexity.
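The recovery of γ rests on the standard Berlekamp-Massey algorithm over GF(2). A minimal sketch (not the asymptotically fast variant mentioned in [9, 11]), checked on a toy LFSR sequence:

```python
def berlekamp_massey(s):
    """Berlekamp-Massey over GF(2). Returns (L, C): the length of the
    shortest LFSR generating s, and its connection polynomial as a bitmask
    (bit i = coefficient of x^i, constant term always 1)."""
    C, B = 1, 1          # current and previous connection polynomials
    L, m = 0, 1          # current LFSR length, steps since last length change
    for n in range(len(s)):
        d = s[n]         # discrepancy: s[n] + sum_{i=1}^{L} C_i * s[n-i]
        for i in range(1, L + 1):
            d ^= ((C >> i) & 1) & s[n - i]
        if d:
            T = C
            C ^= B << m
            if 2 * L <= n:
                L, B, m = n + 1 - L, T, 1
            else:
                m += 1
        else:
            m += 1
    return L, C

# A sequence satisfying s[n] = s[n-3] ^ s[n-4] (an m-sequence of period 15).
seq = [1, 0, 0, 0]
for n in range(4, 20):
    seq.append(seq[n - 3] ^ seq[n - 4])

L, C = berlekamp_massey(seq)
assert L == 4                         # recovered LFSR length
for n in range(L, len(seq)):          # the recovered LFSR reproduces seq
    pred = 0
    for i in range(1, L + 1):
        if (C >> i) & 1:
            pred ^= seq[n - i]
    assert pred == seq[n]
```

In the attack, the input sequence would be the 2T bits c_t, and the connection polynomial found is exactly the linear dependency γ.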
5.4 The Complexity of the Algebraic Attack
Let T denote the number of monomials of degree up to 14 chosen from n = 544 unknowns:

  T = Σ_{i=0}^{14} C(544, i) ≈ 2^91.

Recovering the linear dependency γ dominates the complexity of the computation. It is estimated to take O(T log(T) + Tn) steps using improved versions of the Berlekamp-Massey algorithm [9, 11]. Therefore, our attack is estimated to take around O(2^100) CPU clocks with around 2^92 keystream observations. As for memory, we need to store at most T bits for the linear dependency γ. We also need some memory for Equation (5), but that is much smaller than T. Therefore, we expect that our attack needs around 2^91 bits of memory.
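The monomial counts used in these complexity estimates are quick to reproduce; a sketch (the paper rounds the results to 2^69, 2^91 and 2^76.5):

```python
from math import comb, log2

def log2_monomials(n, d):
    """log2 of the number of monomials of degree <= d in n variables."""
    return log2(sum(comb(n, i) for i in range(d + 1)))

t32_first  = log2_monomials(544, 10)   # first attack  on SOBER-t32
t32_second = log2_monomials(544, 14)   # second attack on SOBER-t32
t16        = log2_monomials(272, 14)   # attack on SOBER-t16

print(f"{t32_first:.1f} {t32_second:.1f} {t16:.1f}")
```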
6 Algebraic Attack on SOBER-t16 without Stuttering
The structure of SOBER-t16 is very similar to that of SOBER-t32. The major differences from SOBER-t32 are:
– operations based on 16-bit words,
– the linear recurrence equation,
– an S-box with 8-bit input and 16-bit output.

For a detailed description of SOBER-t16, see [16]. We can apply an algebraic attack very similar to the one presented in Section 5 to unstuttered SOBER-t16. Consider Equation (2), which holds for SOBER-t16 as well. If we multiply Equation (2) by (1 ⊕ s_{0,7} ⊕ s_{16,7}), then we get

  α_0 · (1 ⊕ s_{0,7} ⊕ s_{16,7}) = (s_{0,0} ⊕ s_{16,0} ⊕ s_{1,0} ⊕ s_{6,0} ⊕ s_{13,0} ⊕ v_0 ⊕ K_0) · (1 ⊕ s_{0,7} ⊕ s_{16,7})   (6)
Let us consider the left-hand side of the equation. The bit α_0 plays the major role in determining the degree of the equation. Computer simulation shows that α_0 · (1 ⊕ s_{0,7} ⊕ s_{16,7}) has monomials of maximum degree 14. The right-hand side becomes quadratic after the multiplication. Therefore, we obtain a multivariate equation of degree 14. The remaining steps of the attack follow Section 5.3.

The Complexity. Let T denote the number of monomials of degree up to 14 chosen from n = 272 unknowns:

  T = Σ_{i=0}^{14} C(272, i) ≈ 2^76.5.

Therefore, our attack is estimated to take around O(2^85) CPU clocks with 2^78 keystream observations. As for memory, we expect that our attack needs around 2^76.5 bits of memory.
7 Conclusion
In this paper we presented two algebraic attacks on SOBER-t32 without stuttering. For the first attack, we built multivariate equations of degree 10, regarding the carry at a specific bit position of the addition modulo 2^n as an internal memory state. From a subset of the keystream observations, we can derive sufficiently many multivariate equations for the algebraic attack. By solving these equations, we are able to recover the initial state of the LFSR and the constant K with roughly 2^196.5 CPU clocks and 2^69 keystream observations. Furthermore, a fast algebraic attack with precomputation allows a further reduction of the complexity of our attack. For the second attack, we derived a multivariate equation of degree 14 over the non-linear filter. The equation is obtained by multiplying the initial equation by a carefully chosen polynomial. A new algebraic attack method is then applied to recover the initial state bits of the LFSR. The attack is estimated to take O(T log(T) + Tn) ≈ O(2^100) CPU clocks with 2^92 keystream observations. Owing to the similarity of the structures, we can apply the second algebraic attack to SOBER-t16 without stuttering; this attack takes around O(2^85) CPU clocks using 2^78 keystream observations.
8 Acknowledgement
We are grateful to Greg Rose, Frederik Armknecht and the anonymous referees for their very helpful comments.
References

[1] Cryptrec. http://www.ipa.go.jp/security/enc/CRYPTREC/index-e.html.
[2] NESSIE: New European schemes for signatures, integrity, and encryption. https://www.cryptonessie.org.
[3] NESSIE security report. Technical Report V2.0, Feb. 2003.
[4] A. Menezes, P. van Oorschot, S. Vanstone. Handbook of Applied Cryptography. CRC Press, fifth edition, October 1996.
[5] F. Armknecht. A linearization attack on the Bluetooth key stream generator. Cryptology ePrint Archive, Report 2002/191, 2002. http://eprint.iacr.org/.
[6] F. Armknecht and M. Krause. Algebraic attacks on combiners with memory. In Advances in Cryptology - CRYPTO 2003, LNCS 2729, pages 162-175. Springer-Verlag, October 2003.
[7] N. Courtois. Higher order correlation attacks, XL algorithm and cryptanalysis of Toyocrypt. Cryptology ePrint Archive, Report 2002/087, 2002. http://eprint.iacr.org/.
[8] N. Courtois. Algebraic attacks on combiners with memory and several outputs. Cryptology ePrint Archive, Report 2003/125, 2003. http://eprint.iacr.org/.
[9] N. Courtois. Fast algebraic attacks on stream ciphers with linear feedback. In Advances in Cryptology - CRYPTO 2003, LNCS 2729, pages 176-194. Springer-Verlag, October 2003.
[10] N. Courtois and W. Meier. Algebraic attacks on stream ciphers with linear feedback. In E. Biham, editor, Advances in Cryptology - EUROCRYPT 2003, LNCS 2656, pages 345-359. Springer-Verlag, January 2003.
[11] J. Dornstetter. On the equivalence between Berlekamp's and Euclid's algorithms. IEEE Transactions on Information Theory, IT-33(3):428-431, May 1987.
[12] P. Ekdahl and T. Johansson. SNOW. Primitive submitted to NESSIE, Sep. 2000.
[13] P. Ekdahl and T. Johansson. Distinguishing attacks on SOBER-t16 and t32. In J. Daemen and V. Rijmen, editors, Fast Software Encryption, LNCS 2365, pages 210-224. Springer-Verlag, 2002.
[14] P. Ekdahl and T. Johansson. Distinguishing attacks on SOBER-t16 and t32. In Proceedings of the Third NESSIE Workshop, 2002.
[15] J. Hastad and M. Naslund. BMGL: Synchronous key-stream generator with provable security. Primitive submitted to NESSIE, Sep. 2000.
[16] P. Hawkes and G. Rose. Primitive specification and supporting documentation for SOBER-t16 submission to NESSIE. In Proceedings of the First NESSIE Workshop, Belgium, Sep. 2000.
[17] P. Hawkes and G. Rose. Primitive specification and supporting documentation for SOBER-t32 submission to NESSIE. In Proceedings of the First NESSIE Workshop, Belgium, Sep. 2000.
[18] V. Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13:354-356, 1969.
A A Toy Example to Build Equations by Reconstructing the Block
This appendix gives a small example of building a low-degree equation by reconstructing the non-linear block of SOBER-t32. Figure 3 shows the structure of the newly built block. The input and output are 4 bits wide. The most significant two bits of each input are added modulo 2^2 together with a carry, and the result becomes the input of the S-box. The S-box is a 2 × 4 substitution defined by the look-up table {7, 10, 9, 4}. Let us use the following notation:
– s_{0,H} = {s_{0,3}, s_{0,2}} and s_{0,L} = {s_{0,1}, s_{0,0}},
– s_{16,H} = {s_{16,3}, s_{16,2}} and s_{16,L} = {s_{16,1}, s_{16,0}},
– b_H = {b_3, b_2} and b_L = {b_1, b_0}.

First, we construct a matrix containing the inputs of the modular addition and the S-box output.
– Rows are the possibilities for {carry, s_{0,3}, s_{0,2}, s_{16,3}, s_{16,2}}.
– Columns are all monomials of degree up to 3 over {s_{0,3}, s_{0,2}, s_{16,3}, s_{16,2}, b_1, b_0}.

Applying the Gaussian elimination, we can build at least 10 multivariate equations. The matrix below shows a part of the combination of monomials. Note
Fig. 3. A reconstructed block for the modular addition and S-box
that the numbers are represented in hexadecimal format. One of the equations built by the Gaussian elimination is

  s_{0,2} s_{16,2} b_1 ⊕ s_{0,2} s_{0,3} s_{16,2} ⊕ s_{0,2} s_{16,2} s_{16,3} = 0.

Let us denote this equation by Q. The verification table (with entries in hexadecimal) shows that Q is always zero for all possible input values.
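The toy relation can also be verified mechanically by enumerating all inputs of the reconstructed block, including the free carry bit; a sketch (names are ours):

```python
SBOX = [7, 10, 9, 4]       # the 2x4 S-box of the toy example

def bit(x, i):
    return (x >> i) & 1

# Exhaust all 4-bit inputs s0, s16 and both values of the 1-bit carry that
# feeds the modulo-2^2 addition of the high halves.
for s0 in range(16):
    for s16 in range(16):
        for carry in range(2):
            hi = ((s0 >> 2) + (s16 >> 2) + carry) & 3   # addition modulo 2^2
            b = SBOX[hi]
            q = (bit(s0, 2) & bit(s16, 2) & bit(b, 1)) \
                ^ (bit(s0, 2) & bit(s0, 3) & bit(s16, 2)) \
                ^ (bit(s0, 2) & bit(s16, 2) & bit(s16, 3))
            assert q == 0   # Q vanishes identically, carry included
```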
B Algebraic Equations of the S-Box in SOBER-t32 and SOBER-t16
For the S-box of SOBER-t32, we denote the 8-bit input stream of S-box as (a31 , a30 , . . . , a24 ) and the 32-bit output stream as (b31 , b30 , . . . , b0 ). Each bi can be also represented by the combination of monomials which are composed of only the input stream. In particular, the least significant output bit, b0 is described as the combination of monomials of degree up to 6 as follows. b0 = 1 ⊕ a24 ⊕ a25 ⊕ a24 a25 ⊕ a26 ⊕ a25 a26 ⊕ a25 a27 ⊕ a24 a25 a27 ⊕ a26 a27 ⊕ a24 a26 a27 ⊕ a25 a26 a27 ⊕ a25 a28 ⊕ a24 a25 a28 ⊕ a25 a26 a28 ⊕ a27 a28 ⊕ a24 a27 a28 ⊕ a24 a25 a27 a28 ⊕ a26 a27 a28 ⊕ a25 a26 a27 a28 ⊕ a29 ⊕ a24 a29 ⊕ a24 a25 a29 ⊕ a26 a29 ⊕ a24 a25 a26 a29 ⊕ a27 a29 ⊕ a24 a25 a27 a29 ⊕ a24 a26 a27 a29 ⊕ a25 a26 a27 a29 ⊕ a24 a28 a29 ⊕ a25 a28 a29 ⊕ a24 a27 a28 a29 ⊕ a25 a27 a28 a29 ⊕ a24 a25 a27 a28 a29 ⊕ a25 a26 a27 a28 a29 ⊕ a24 a25 a26 a27 a28 a29 ⊕ a25 a26 a30 ⊕ a24 a25 a26 a30 ⊕ a24 a27 a30 ⊕ a25 a27 a30 ⊕ a26 a27 a30 ⊕ a24 a26 a27 a30 ⊕ a28 a30 ⊕ a24 a28 a30 ⊕ a26 a28 a30 ⊕ a24 a25 a26 a28 a30 ⊕ a27 a28 a30 ⊕ a24 a27 a28 a30 ⊕ a25 a26 a27 a28 a30 ⊕ a29 a30 ⊕ a24 a29 a30 ⊕ a25 a29 a30 ⊕ a24 a25 a26 a29 a30 ⊕ a27 a29 a30 ⊕ a24 a27 a29 a30 ⊕ a24 a25 a27 a29 a30 ⊕ a25 a26 a27 a29 a30 ⊕ a24 a28 a29 a30 ⊕ a25 a28 a29 a30 ⊕ a24 a25 a28 a29 a30 ⊕ a26 a28 a29 a30 ⊕ a25 a26 a28 a29 a30 ⊕ a27 a28 a29 a30 ⊕ a24 a27 a28 a29 a30 ⊕ a25 a27 a28 a29 a30 ⊕ a24 a25 a27 a28 a29 a30 ⊕ a26 a27 a28 a29 a30 ⊕ a24 a26 a27 a28 a29 a30 ⊕ a25 a26 a27 a28 a29 a30 ⊕ a31 ⊕ a26 a31 ⊕ a25 a26 a31 ⊕ a24 a25 a26 a31 ⊕ a24 a27 a31 ⊕ a25 a26 a27 a31 ⊕ a24 a25 a26 a27 a31 ⊕ a25 a28 a31 ⊕ a24 a25 a28 a31 ⊕ a26 a28 a31 ⊕ a24 a26 a28 a31 ⊕ a24 a25 a26 a28 a31 ⊕ a24 a27 a28 a31 ⊕ a25 a27 a28 a31 ⊕ a24 a25 a27 a28 a31 ⊕ a24 a25 a26 a27 a28 a31 ⊕ a25 a29 a31 ⊕ a24 a25 a29 a31 ⊕ a24 a26 a29 a31 ⊕ a25 a26 a29 a31 ⊕ a24 a25 a26 a29 a31 ⊕ a27 a29 a31 ⊕ a24 a27 a29 a31 ⊕ a25 a27 a29 a31 ⊕ a24 a25 a27 a29 a31 ⊕ a26 a27 a29 a31 ⊕ a25 a26 a27 a29 a31 ⊕ a28 a29 a31 ⊕ a24 a28 a29 a31 ⊕ a25 a28 a29 a31 ⊕ 
a26 a28 a29 a31 ⊕ a25 a27 a28 a29 a31 ⊕ a26 a27 a28 a29 a31 ⊕ a25 a26 a27 a28 a29 a31 ⊕ a30 a31 ⊕ a24 a30 a31 ⊕ a24 a25 a30 a31 ⊕ a26 a30 a31 ⊕ a24 a26 a30 a31 ⊕ a24 a25 a26 a30 a31 ⊕ a24 a27 a30 a31 ⊕ a25 a26 a27 a30 a31 ⊕ a24 a28 a30 a31 ⊕ a24 a25 a28 a30 a31 ⊕ a25 a26 a28 a30 a31 ⊕ a27 a28 a30 a31 ⊕ a25 a27 a28 a30 a31 ⊕ a24 a25 a27 a28 a30 a31 ⊕ a24 a26 a27 a28 a30 a31 ⊕ a29 a30 a31 ⊕ a25 a26 a29 a30 a31 ⊕ a27 a29 a30 a31 ⊕ a24 a27 a29 a30 a31 ⊕ a25 a27 a29 a30 a31 ⊕ a26 a27 a29 a30 a31 ⊕ a28 a29 a30 a31 ⊕ a24 a28 a29 a30 a31 ⊕ a24 a25 a28 a29 a30 a31 ⊕ a26 a28 a29 a30 a31 ⊕ a25 a27 a28 a29 a30 a31
For S-box of SOBER-t16, we denote the 8-bit input stream of S-box as (a15 , a14 , . . . , a8 ) and the 16-bit output stream as (b15 , b14 , . . . , b0 ). Then, the least significant output bit, b0 is described as the combination of monomials of degree up to 6 as follows. b0 = 1 ⊕ a8 ⊕ a9 ⊕ a8 a9 ⊕ a10 ⊕ a8 a10 ⊕ a8 a9 a10 ⊕ a11 ⊕ a9 a11 ⊕ a8 a9 a11 ⊕ a8 a10 a11 ⊕ a8 a9 a12 ⊕ a8 a10 a12 ⊕ a11 a12 ⊕ a9 a11 a12 ⊕ a8 a9 a11 a12 ⊕ a9 a10 a11 a12 ⊕ a9 a10 a13 ⊕ a11 a13 ⊕ a8 a11 a13 ⊕ a9 a11 a13 ⊕ a8 a9 a11 a13 ⊕ a10 a11 a13 ⊕ a9 a10 a11 a13 ⊕ a12 a13 ⊕ a9 a12 a13 ⊕ a8 a9 a12 a13 ⊕ a9 a10 a12 a13 ⊕ a11 a12 a13 ⊕ a8 a11 a12 a13 ⊕ a9 a11 a12 a13 ⊕ a10 a11 a12 a13 ⊕ a8 a10 a11 a12 a13 ⊕ a9 a10 a11 a12 a13 ⊕ a9 a14 ⊕ a10 a14 ⊕ a8 a10 a14 ⊕ a9 a10 a14 ⊕ a8 a9 a10 a14 ⊕ a11 a14 ⊕ a8 a11 a14 ⊕ a9 a11 a14 ⊕ a8 a9 a11 a14 ⊕ a9 a10 a11 a14 ⊕ a8 a9 a10 a11 a14 ⊕ a8 a12 a14 ⊕ a9 a12 a14 ⊕ a8 a9 a12 a14 ⊕ a10 a12 a14 ⊕
a8 a10 a12 a14 ⊕ a9 a10 a12 a14 ⊕ a8 a9 a10 a12 a14 ⊕ a9 a11 a12 a14 ⊕ a8 a10 a11 a12 a14 ⊕ a8 a9 a13 a14 ⊕ a10 a13 a14 ⊕ a8 a10 a13 a14 ⊕ a8 a9 a10 a13 a14 ⊕ a11 a13 a14 ⊕ a8 a9 a11 a13 a14 ⊕ a8 a10 a11 a13 a14 ⊕ a9 a10 a11 a13 a14 ⊕ a8 a9 a10 a11 a13 a14 ⊕ a12 a13 a14 ⊕ a8 a12 a13 a14 ⊕ a9 a12 a13 a14 ⊕ a8 a9 a12 a13 a14 ⊕ a10 a12 a13 a14 ⊕ a8 a10 a12 a13 a14 ⊕ a9 a10 a12 a13 a14 ⊕ a8 a9 a10 a12 a13 a14 ⊕ a9 a11 a12 a13 a14 ⊕ a8 a10 a11 a12 a13 a14 ⊕ a9 a10 a11 a12 a13 a14 ⊕ a8 a15 ⊕ a9 a15 ⊕ a8 a9 a10 a15 ⊕ a8 a10 a11 a15 ⊕ a8 a9 a10 a11 a15 ⊕ a8 a9 a12 a15 ⊕ a8 a10 a12 a15 ⊕ a9 a10 a12 a15 ⊕ a8 a9 a10 a12 a15 ⊕ a11 a12 a15 ⊕ a8 a11 a12 a15 ⊕ a8 a9 a11 a12 a15 ⊕ a8 a9 a10 a11 a12 a15 ⊕ a13 a15 ⊕ a8 a9 a13 a15 ⊕ a8 a10 a13 a15 ⊕ a11 a13 a15 ⊕ a10 a11 a13 a15 ⊕ a9 a10 a11 a13 a15 ⊕ a12 a13 a15 ⊕ a9 a12 a13 a15 ⊕ a8 a9 a12 a13 a15 ⊕ a10 a12 a13 a15 ⊕ a8 a10 a12 a13 a15 ⊕ a9 a10 a12 a13 a15 ⊕ a10 a11 a12 a13 a15 ⊕ a8 a10 a11 a12 a13 a15 ⊕ a9 a10 a11 a12 a13 a15 ⊕ a14 a15 ⊕ a8 a14 a15 ⊕ a8 a10 a14 a15 ⊕ a8 a9 a10 a14 a15 ⊕ a11 a14 a15 ⊕ a8 a10 a11 a14 a15 ⊕ a9 a10 a11 a14 a15 ⊕ a8 a9 a10 a11 a14 a15 ⊕ a12 a14 a15 ⊕ a9 a12 a14 a15 ⊕ a10 a12 a14 a15 ⊕ a8 a10 a12 a14 a15 ⊕ a8 a9 a10 a12 a14 a15 ⊕ a11 a12 a14 a15 ⊕ a9 a11 a12 a14 a15 ⊕ a10 a11 a12 a14 a15 ⊕ a8 a10 a11 a12 a14 a15 ⊕ a9 a10 a11 a12 a14 a15 ⊕ a13 a14 a15 ⊕ a8 a13 a14 a15 ⊕ a9 a10 a13 a14 a15 ⊕ a8 a9 a10 a13 a14 a15 ⊕ a11 a13 a14 a15 ⊕ a9 a11 a13 a14 a15 ⊕ a8 a10 a11 a13 a14 a15 ⊕ a9 a10 a11 a13 a14 a15 ⊕ a8 a12 a13 a14 a15 ⊕ a9 a12 a13 a14 a15 ⊕ a9 a10 a12 a13 a14 a15 ⊕ a9 a11 a12 a13 a14 a15
Improving Fast Algebraic Attacks

Frederik Armknecht
Theoretische Informatik, Universität Mannheim, 68131 Mannheim, Germany
[email protected]
Abstract. An algebraic attack is a method for cryptanalysis which is based on finding and solving a system of nonlinear equations. Recently, algebraic attacks were found helpful in cryptanalysing LFSR-based stream ciphers. The efficiency of these attacks greatly depends on the degree of the nonlinear equations. At Crypto 2003, Courtois [8] proposed fast algebraic attacks. His main idea is to decrease the degree of the equations using a precomputation algorithm. Unfortunately, the correctness of the precomputation step was neither proven, nor was it obvious. The three main results of this paper are the following. First, we prove that Courtois' precomputation step is applicable for cryptographically reasonable LFSR-based stream ciphers. Second, we present an improved precomputation algorithm. Our new precomputation algorithm is parallelisable, in contrast to Courtois' algorithm, and it is more efficient even when running sequentially. Third, we demonstrate the improved efficiency of our new algorithm by applying it to the key stream generator E0 from the Bluetooth standard. In this case, we get a theoretical speed-up by a factor of about 8, even without any parallelism. This improves the fastest attack known. Practical tests confirm the advantage of our new precomputation algorithm for the test cases considered. Keywords: Algebraic attacks, stream ciphers, linear feedback shift registers, Bluetooth
1 Introduction
Stream ciphers are designed for online encryption of secret plaintext bit streams M = (m_1, m_2, ...), m_i ∈ GF(2), which have to pass an insecure channel. Depending on a secret key K, the stream cipher produces a regularly clocked key stream Z = (z_1, z_2, ...), z_i ∈ GF(2), and encrypts M by adding both streams termwise over GF(2). The legal receiver, who uses the same stream cipher and the same key K, decrypts the received message by applying the same procedure. Many popular stream ciphers are LFSR-based. They consist of some linear feedback shift registers (LFSRs) and an additional device, called the nonlinear
This work has been supported by grant 620307 of the DFG (German Research Foundation).
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 65–82, 2004. c International Association for Cryptologic Research 2004
combiner. An LFSR produces a sequence over GF(2) depending on its initial state. LFSRs can be constructed very efficiently in hardware and can be chosen such that the produced sequence has a high period and good statistical properties. For these reasons, LFSR-based stream ciphers are widely used in cryptography. A number of different nontrivial approaches to the cryptanalysis of LFSR-based stream ciphers have been discussed in the literature, e.g. fast correlation attacks (Meier, Staffelbach [18], Chepyzhov, Smeets [5], Johansson, Joensson [12, 13]), backtracking attacks (Golic [10], Zenner, Krause, Lucks [24], Fluhrer, Lucks [9], Zenner [23]), time-space tradeoffs (Biryukov, Shamir [4]), BDD-based attacks (Krause [15]), etc. For such stream ciphers many corresponding design criteria (correlation immunity, large period and linear complexity, good local statistics, etc.) were developed, e.g. by Rueppel [20]. Recently, a new kind of attack was proposed: the algebraic attack. For some ciphers, algebraic attacks outmatched all previously known attacks (e.g. Courtois, Meier [7], Armknecht, Krause [1], Courtois [6]). An algebraic attack consists of two steps. First, find a system of equations in the bits of the secret key K and the output bits z_t. If the combiner is memoryless, the methods of [7] can be used; a general method, which applies to combiners with memory too, was presented in [1]. If enough low-degree equations and known key stream bits are given, the secret key K can be recovered by solving this system of equations in a second step. For this purpose, several methods (linearization, XL, XSL, Groebner bases, ...) exist. Most of them run faster if the degree of the equations is low. Hence, the search for systems of low-degree equations is a desirable goal in algebraic attacks. With this in mind, fast algebraic attacks were introduced by Courtois at Crypto 2003 [8], using a very clever idea.
Before solving the system of equations, the degree of the system is decreased in a precomputation step. The trick is to eliminate all high-degree monomials independent of the key stream bits, which has to be done only once. For this purpose, an algorithm A was proposed, using the well-known Berlekamp-Massey algorithm [17, 19]. Unfortunately, the correctness of A was neither proven, nor is it obvious. Therefore, the applicability of fast algebraic attacks to LFSR-based stream ciphers in general was still an open question. In this paper, we finally give a positive answer to this question. We present a non-trivial proof that fast algebraic attacks work under rather weak assumptions which are satisfied by all LFSR-based stream ciphers we know. In particular, it is not necessary that the periods are pairwise co-prime, which is not true in general (e.g., see the cipher used in the Bluetooth standard). To obtain the result, we prove some statements about minimal polynomials of linear recurring sequences. To the best of our knowledge, these statements have not been published elsewhere in the open literature; thus, our results may be of independent interest. In general, an attacker has knowledge of the whole cipher except the secret key. Hence, it seems logical to exploit this information to perform the precomputation step more directly, reducing the number of operations. This was
Improving Fast Algebraic Attacks
the motivation for developing a new algorithm B. Contrary to A, which runs strictly sequentially, B can be performed partly in parallel. Thus, our new method improves the efficiency of fast algebraic attacks significantly. We demonstrate this by applying B to the E0 key stream generator used in the Bluetooth standard for wireless communication. A theoretical examination shows that our algorithm achieves a speed-up factor of about 8, even without any parallelism. This improves the fastest known attack against the E0 cipher. Practical tests on reduced versions of E0 confirm the advantage of B for the test cases considered. The paper is organized as follows: in Section 2 we describe fast algebraic attacks. In Section 3 we give the missing correctness proof for fast algebraic attacks. In Section 4 we improve fast algebraic attacks by introducing a new precomputation step. In Section 5 we demonstrate the higher efficiency of our new method in theory and practice. Section 6 concludes the paper.
2 Fast Algebraic Attacks
Let $Z := (z_t) := (z_t)_{t=0}^{\infty}$ be the key stream produced by an LFSR-based key stream generator, using a secret key $K \in \{0,1\}^n$. K is defined as the initial states of the used LFSRs. At each clock t, the internal state of the cipher consists of $L^t(K)$ and some additional memory bits¹, where $L : \{0,1\}^n \to \{0,1\}^n$ is a linear boolean function known to the attacker. An algebraic attack works as follows: The first step is to find a boolean function $\hat F \neq 0$ such that for an integer $\delta \geq 0$ the equation

$$\hat F(L^t(K), \ldots, L^{t+\delta}(K), z_t, \ldots, z_{t+\delta}) = 0 \qquad (1)$$
is true for all clocks t. If the cipher is memoryless, the methods described in [7] can be used. Unfortunately, these methods neither work with ciphers using memory, nor are they guaranteed to find the equations with the lowest degree. Therefore, in general the method proposed in [1] is the better choice. It can be applied to every cipher and finds with certainty the equations with the lowest degree. The second step is to use (1) to get a system of equations describing K in dependence of the observed key stream Z. For each run $z_t, \ldots, z_{t+\delta}$ of output bits known to the attacker, insert these values into (1). The result is a valid equation in the bits of the secret key K. The third and final step is to recover the secret key K by solving this system of equations. One possibility to do so is the linearization method. Due to the linearity of L, all equations (1) have a degree $\leq \deg \hat F$. Therefore, the number m of different monomials occurring is limited. If the attacker has enough known key stream bits at his disposal, the number of linearly independent equations equals m. By substituting each monomial by a new variable, the attacker gets
¹ Where zero memory bits are possible.
Frederik Armknecht
a linear system of equations in m unknowns, which can be solved by Gaussian elimination or more refined methods like the one by Strassen [22]. In general, m will be about $\binom{n}{d}$. Hence, the lower the degree d, the more efficient the attack. Therefore, an attacker using an algebraic attack will always try to find a system of low-degree equations. A very clever approach to decrease the degree of a given system of equations, called the "fast algebraic attack", was presented in [8]. We will discuss this method now. Suppose that (1) can be rewritten as

$$0 = \hat F(L^t(K), \ldots, L^{t+\delta}(K), z_t, \ldots, z_{t+\delta})$$
$$= F(L^t(K), \ldots, L^{t+\delta}(K)) + G(L^t(K), \ldots, L^{t+\delta}(K), z_t, \ldots, z_{t+\delta})$$
$$=: F_t(K) + G_t(K, Z) \qquad (2)$$
where the degree e of G in K is lower than the degree d of $\hat F$.² Furthermore, we assume that the attacker knows coefficients $c_0, \ldots, c_{T-1} \in \{0,1\}$ such that

$$\sum_{i=0}^{T-1} c_i \cdot F_{t+i}(K) = 0 \qquad \forall t, K. \qquad (3)$$
Using (2) and (3), we get by $\sum_{i=0}^{T-1} c_i \cdot G_{t+i}(K, Z) = 0$ an equation in K and Z with a lower degree e < d. Therefore, the attacker can reduce the system of equations given by (1) with degree d to a new system of equations of degree e < d, where all equations are of the type $\sum_i c_i G_{t+i}$. Note that this improves the third step enormously, but requires more known key stream bits $z_t$ to perform step two. Of course, it is vital for the whole approach that such coefficients $c_0, \ldots, c_{T-1}$ can be found efficiently. In [8], the following algorithm A was proposed:

1. Choose a reasonable³ key $\hat K$ and compute $\hat z_t := F_t(\hat K)$ for $t = 1, \ldots, 2T$.
2. Apply the Berlekamp-Massey algorithm to find $c_0, \ldots, c_{T-1}$ with

$$\sum_{i=0}^{T-1} c_i \cdot F_{t+i}(\hat K) = 0 \qquad \forall t. \qquad (4)$$
It is known that the Berlekamp-Massey algorithm finds coefficients with the smallest value of T fulfilling (4). This needs about $O(T^2)$ basic operations. Together with the first step, algorithm A has to perform about $O(T^2 + 2T|K|)$ steps. In general, the exact value of T is unknown, but an upper bound is the maximum number of different monomials occurring. The result of algorithm A is correct if (4) implies (3), which has not been proven in [8]. The only correctness argument indicated there was based on the

² For example, this assumption is true for the three ciphers E0, Toyocrypt and LILI-128.
³ In [8], $\hat K$ can be any value in $\{0,1\}^n$. But if one of the LFSRs is initialised with the all-zero state, the algorithm returns a wrong result in most cases. Therefore, $\hat K$ has to be chosen such that the initial states are all non-zero.
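The second step of algorithm A is the only nontrivial one. A textbook Berlekamp-Massey implementation over GF(2) — a sketch of the standard algorithm, not the paper's own code — finds the shortest connection polynomial of a precomputed bit sequence:

```python
def berlekamp_massey(s):
    """Return the connection polynomial C(x) = 1 + c_1 x + ... + c_L x^L
    (as a coefficient list) of the shortest LFSR generating the bit list s."""
    n = len(s)
    C, B = [1] + [0] * n, [1] + [0] * n
    L, m = 0, 1
    for t in range(n):
        # discrepancy between s_t and the prediction of the current LFSR
        d = s[t]
        for i in range(1, L + 1):
            d ^= C[i] & s[t - i]
        if d:
            T = C[:]
            for i in range(n + 1 - m):
                C[i + m] ^= B[i]      # C(x) <- C(x) + x^m B(x)
            if 2 * L <= t:
                L, B, m = t + 1 - L, T, 0
        m += 1
    return C[:L + 1]

# The period-7 sequence of the LFSR with feedback x^3 + x + 1 is
# recovered with a recurrence of length 3, as expected.
seq = [0, 0, 1, 0, 1, 1, 1] * 3
assert berlekamp_massey(seq) == [1, 0, 1, 1]   # C(x) = 1 + x^2 + x^3
```

With 2T known bits the algorithm is guaranteed to find the minimal recurrence of length at most T, which is exactly how algorithm A uses it.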
assumption that the sequences produced by the LFSRs have pairwise co-prime periods. But this is not true in general. A counter-example is the key stream generator E0 used in the Bluetooth standard for wireless communication: the two periods $2^{33}-1$ and $2^{39}-1$ share the common factor 7. On the other hand, algorithm A does not work correctly in general without any preconditions. An example is given in Appendix D. This raises the question which preconditions are necessary for the correctness of algorithm A and whether they are fulfilled by E0. One of the major achievements of this paper is to show the correctness of algorithm A under weaker assumptions. As we are not aware of any LFSR-based stream cipher for which these conditions are not true (including E0), we assume that fast algebraic attacks as described in [8] can be mounted against most LFSR-based stream ciphers discussed in public.
3 Proof of Correctness
In this section, we prove the correctness of algorithm A under cryptographically reasonable assumptions. First, we repeat some known facts about linear recurring sequences.

Theorem 1. (Lidl, Niederreiter [16]) A sequence $Z = (z_t)$ over GF(2) is called a linear recurring sequence if coefficients $c_0, \ldots, c_{T-1} \in \{0,1\}$ (not all zero) exist such that $\sum_i c_i z_{t+i} = 0$ is true for all values $t \geq 1$. In this case, $\sum_i c_i x^i \in GF(2)[x]$ is called a characteristic polynomial of the sequence Z. Amongst all characteristic polynomials of Z there exists one unique polynomial min(Z) of lowest degree. We will call it the minimal polynomial of Z. A polynomial $f(x) \in GF(2)[x]$ is a characteristic polynomial of Z if and only if min(Z) divides f(x).

From now on, a sequence Z will always stand for a linear recurring sequence. Furthermore, we will denote by $\overline{GF(2)}$ the algebraic closure of the field GF(2).⁴ By the roots of $f(x) \in GF(2)[x]$, we will always mean the roots in $\overline{GF(2)}$.
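The divisibility criterion of Theorem 1 can be observed directly on a toy sequence (the polynomials below are arbitrary small examples):

```python
def is_char_poly(coeffs, seq):
    """Check whether sum_i c_i * z_{t+i} = 0 over GF(2) for all admissible t,
    i.e. whether the polynomial with coefficient list coeffs (c_0 first)
    is a characteristic polynomial of the sequence seq."""
    T = len(coeffs) - 1
    return all(
        sum(c & seq[t + i] for i, c in enumerate(coeffs)) % 2 == 0
        for t in range(len(seq) - T)
    )

# Sequence with minimal polynomial m(x) = x^3 + x + 1 (z_{t+3} = z_{t+1} + z_t).
seq = [0, 0, 1]
for t in range(40):
    seq.append(seq[t + 1] ^ seq[t])

assert is_char_poly([1, 1, 0, 1], seq)      # m(x) itself annihilates seq
assert is_char_poly([1, 0, 1, 1, 1], seq)   # (x+1)*m(x) = x^4 + x^3 + x^2 + 1 does too
assert not is_char_poly([1, 0, 1, 1], seq)  # x^3 + x^2 + 1 is not a multiple of m(x)
```

Exactly the multiples of min(Z) pass the check, as the theorem states.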
Definition 1. Let $R_1, \ldots, R_\kappa \subseteq \overline{GF(2)}$ be pairwise disjoint, $R := R_1 \cup \ldots \cup R_\kappa$. We say that a pair of vectors $(\alpha_1, \ldots, \alpha_n) \in R^n$, $(\beta_1, \ldots, \beta_m) \in R^m$ factorizes uniquely over $R_1, \ldots, R_\kappa$ if the following holds:

$$\alpha_1 \cdot \ldots \cdot \alpha_n = \beta_1 \cdot \ldots \cdot \beta_m \;\Rightarrow\; \prod_{\alpha_i \in R_l} \alpha_i \cdot \Big(\prod_{\beta_j \in R_l} \beta_j\Big)^{-1} = 1, \qquad 1 \leq l \leq \kappa.$$

For a monomial $\mu = \prod_{j=1}^{k} x_{i_j} \in GF(2)[x_1, \ldots, x_n]$ with $\{i_1, \ldots, i_k\} \subseteq \{1, \ldots, n\}$ and $\alpha = (\alpha_1, \ldots, \alpha_n) \in R^n$, we define the vector $\vec{\mu}(\alpha) := (\alpha_{i_1}, \ldots, \alpha_{i_k}) \in R^k$.
Example 1. Set $R_1 := \{\alpha, \alpha\beta\}$ and $R_2 := \{\beta\}$ with $\beta \neq 1$. The pair of vectors $(\alpha, \beta)$ and $(\alpha\beta)$ does not factorize uniquely over $R_1, R_2$ because of $\alpha \cdot \beta = \alpha\beta$ but $\alpha \cdot (\alpha\beta)^{-1} = \beta^{-1} \neq 1$.

⁴ I.e., $\overline{GF(2)}$ is the smallest field such that $GF(2) \subset \overline{GF(2)}$ and each polynomial $f(x) \in GF(2)[x]$ has at least one root in $\overline{GF(2)}$.
The motivation for this definition is that in our main theorem we need that certain products of roots of minimal polynomials are unique in the sense above (see Appendix A). The main theorem of our paper is:

Theorem 2. Let $Z_1 = (z_t^{(1)}), \ldots, Z_\kappa = (z_t^{(\kappa)})$ be sequences with pairwise co-prime minimal polynomials which have only non-zero roots. Let $R_i$ denote the set of roots of $min(Z_i)$ in $\overline{GF(2)}$, let $F : GF(2)^n \to GF(2)$ be an arbitrary boolean function, and let $I := (i_1, \ldots, i_n) \in \{1, \ldots, \kappa\}^n$ and $\delta := (\delta_1, \ldots, \delta_n) \in \mathbb{N}^n$ be two vectors. We set $R := R_{i_1} \times \ldots \times R_{i_n}$ and divide $F = \sum \mu_i$ into a sum of monomials. Furthermore, for arbitrary $d := (d_1, \ldots, d_\kappa) \in \mathbb{N}^\kappa$ the sequences $Z := (z_t)$ and $Z^{(d)} := (z_t^{(d)})$ are defined by

$$z_t := F(z_{t+\delta_1}^{(i_1)}, \ldots, z_{t+\delta_n}^{(i_n)}), \qquad z_t^{(d)} := F(z_{t+\delta_1+d_{i_1}}^{(i_1)}, \ldots, z_{t+\delta_n+d_{i_n}}^{(i_n)}).$$

If all pairs of vectors $\vec{\mu}_i(\alpha)$, $\vec{\mu}_j(\alpha')$ with $\alpha, \alpha' \in R$ factorize uniquely over $R_1, \ldots, R_\kappa$, then $min(Z) = min(Z^{(d)})$.

What is the connection to algorithm A? From the theory of LFSRs, it can easily be argued that the sequence $\hat Z = (\hat z_t)$ from algorithm A is a linear recurring sequence and that $\hat c_0, \ldots, \hat c_{T-1}$ correspond to its minimal polynomial m. $\hat Z$ is produced in the way sequence Z is described in Theorem 2, which assures that the minimal polynomial m remains unchanged if we shift each of the sequences produced by the LFSRs individually. As in general the produced sequences have maximal period, the minimal polynomial found by algorithm A is the same for each possible key K. In Appendix B we show that the conditions of Theorem 2 are satisfied automatically for a large class of LFSR-based ciphers, independent of F, I and $\delta$. Before we can prove Theorem 2, we need some statements about minimal polynomials.

Theorem 3. ([16], Theorem 6.21) Let $Z = (z_t)$ be a sequence with characteristic polynomial $f(x) = \prod_{i=1}^{n}(x - \alpha_i)$, where the roots lie in $\overline{GF(2)}$. If the roots $\alpha_1, \ldots, \alpha_n$ are all distinct, i.e. each root has multiplicity one, then for each t, $z_t$ can be expressed in the following way:

$$z_t = \sum_{i=1}^{n} A_i \alpha_i^t$$

where $A_1, \ldots, A_n \in \overline{GF(2)}$ are uniquely determined by the initial values of the sequence Z.
Theorem 4. Let $Z = (z_t)$ be a sequence with $z_t = \sum_{i=1}^{n} A_i \alpha_i^t$ with pairwise distinct elements $\alpha_i \in \overline{GF(2)}$ and non-zero coefficients $A_i$. Let $m(x) \in GF(2)[x]$ be the polynomial of lowest degree such that $m(\alpha_i) = 0$ for $1 \leq i \leq n$. Then m(x) is the minimal polynomial min(Z). In particular, each root of min(Z) has multiplicity one.
Proof. We show that $f(x) \in GF(2)[x]$ is a characteristic polynomial of Z if and only if $f(\alpha_i) = 0$ for all i. Thus, m(x) is the characteristic polynomial of lowest degree, which is the definition of min(Z). Let $f(x) = \sum_{k=0}^{r} c_k x^k$. Then for each t, we have

$$\sum_{k=0}^{r} c_k z_{t+k} = \sum_{k=0}^{r} c_k \sum_{i=1}^{n} A_i \alpha_i^{t+k} = \sum_{i=1}^{n} A_i \Big(\sum_{k=0}^{r} c_k \alpha_i^k\Big) \alpha_i^t = \sum_{i=1}^{n} (A_i f(\alpha_i))\, \alpha_i^t$$
For $1 \leq i \leq n$ and $0 \leq t \leq n-1$, let $M := (\alpha_i^t)$ be a Vandermonde matrix of size $n \times n$. As the elements $\alpha_i$ are pairwise distinct, M is regular. Thus, the expression above equals zero for each t if and only if $(A_1 f(\alpha_1), \ldots, A_n f(\alpha_n))$ is an element of the kernel of M, i.e. $A_i f(\alpha_i) = 0$ for all i. As the coefficients $A_i$ were assumed to be non-zero, this is equivalent to $f(\alpha_i) = 0$ for all i. □

Proof of Theorem 2. By Theorem 4, all roots in $R_i$ have multiplicity one. Therefore, by Theorem 3, each sequence $Z_i$ can be expressed by $z_t^{(i)} = \sum_{\alpha \in R_i} A_\alpha \alpha^t$ with unique coefficients $A_\alpha$. For each i it holds

$$z_{t+\delta_i}^{(i)} = \sum_{\alpha \in R_i} A_\alpha \alpha^{t+\delta_i} = \sum_{\alpha \in R_i} (A_\alpha \alpha^{\delta_i})\, \alpha^t$$

and therefore

$$z_t = F\Big(\sum_{\alpha \in R_{i_1}} (A_\alpha \alpha^{\delta_1})\, \alpha^t, \;\ldots,\; \sum_{\alpha \in R_{i_n}} (A_\alpha \alpha^{\delta_n})\, \alpha^t\Big).$$

We set $P := \{\mu_i(\alpha) \mid \alpha \in R,\; 1 \leq i \leq l\}$. The sequences Z and $Z^{(d)}$ can be expressed by

$$z_t = \sum_{\pi \in P} A_\pi \pi^t, \qquad z_t^{(d)} = \sum_{\pi \in P} A_\pi^{(d)} \pi^t$$

with unique coefficients $A_\pi$ and $A_\pi^{(d)}$. We show that $A_\pi$ is non-zero if and only if $A_\pi^{(d)}$ is non-zero. Then the equality of min(Z) and $min(Z^{(d)})$ follows by Theorem 4.

We express the coefficients $A_\pi$ and $A_\pi^{(d)}$ in dependence of the coefficients $A_\alpha$. For $\pi = \alpha_1 \cdot \ldots \cdot \alpha_m \in P$ with each $\alpha_i$ lying in some $R_l$, we define by $\pi^d$ the product

$$\pi^d := \Big(\prod_{\alpha_i \in R_1} \alpha_i\Big)^{d_1} \cdot \ldots \cdot \Big(\prod_{\alpha_i \in R_\kappa} \alpha_i\Big)^{d_\kappa}.$$

As all pairs of vectors $\vec{\pi}_1, \vec{\pi}_2$ factorize uniquely over $R_1, \ldots, R_\kappa$, this expression is independent of the factorization $\alpha_1 \cdot \ldots \cdot \alpha_m$. Analogously, for $\alpha = (\alpha_1, \ldots, \alpha_n) \in R$ we set $\alpha^d := (\alpha_1^{d_{i_1}}, \ldots, \alpha_n^{d_{i_n}})$. With these definitions, we get

$$z_t = \sum_{\pi \in P} \underbrace{\Big(\sum_{\mu_i} \sum_{\alpha \in R,\; \mu_i(\alpha) = \pi} \mu_i(A_\alpha)\Big)}_{=\, A_\pi} \pi^t$$
where $A_\alpha := (A_{\alpha_{i_1}}, \ldots, A_{\alpha_{i_n}})$. Therefore, the coefficients $A_\pi^{(d)}$ can be expressed by

$$A_\pi^{(d)} = \sum_{\mu_i} \sum_{\alpha \in R,\; \mu_i(\alpha) = \pi} \mu_i(A_\alpha \alpha^d) = \sum_{\mu_i} \sum_{\alpha \in R,\; \mu_i(\alpha) = \pi} \mu_i(A_\alpha)\, \mu_i(\alpha^d) = \sum_{\mu_i} \sum_{\alpha \in R,\; \mu_i(\alpha) = \pi} \mu_i(A_\alpha)\, \pi^d = A_\pi \cdot \pi^d.$$

As the roots of $m_i(x)$ are all non-zero by assumption, it is $\pi^d \neq 0$. Therefore, $A_\pi^{(d)} = 0$ iff $A_\pi = 0$. □

The proof shows why a precondition is necessary for the correctness of the precomputation step. Otherwise, it could happen that for some $\pi$ it is $A_\pi = 0$ but $A_\pi^{(d)} \neq 0$ (or vice versa). In these cases, the corresponding minimal polynomials could be different (see Appendix D).
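The shift-invariance established by Theorem 2 can be checked numerically on a toy combiner. The sketch below uses two small LFSRs with co-prime minimal polynomials of co-prime degrees (so case 2 of Appendix B applies) and the monomial F(a, b) = a·b; the Berlekamp-Massey routine is a textbook implementation used only as a measuring device:

```python
def berlekamp_massey(s):
    """Minimal (connection) polynomial of a bit sequence over GF(2)."""
    n = len(s)
    C, B, L, m = [1] + [0] * n, [1] + [0] * n, 0, 1
    for t in range(n):
        d = s[t]
        for i in range(1, L + 1):
            d ^= C[i] & s[t - i]
        if d:
            T = C[:]
            for i in range(n + 1 - m):
                C[i + m] ^= B[i]
            if 2 * L <= t:
                L, B, m = t + 1 - L, T, 0
        m += 1
    return C[:L + 1]

def lfsr_seq(coeffs, state, n):
    """First n bits of the LFSR sequence z_{t+k} = sum_i c_i z_{t+i} over GF(2)."""
    s = list(state)
    while len(s) < n:
        s.append(sum(c & z for c, z in zip(coeffs, s[-len(coeffs):])) % 2)
    return s[:n]

N = 200
a = lfsr_seq([1, 1, 0], [0, 0, 1], N + 10)  # minimal polynomial x^3 + x + 1
b = lfsr_seq([1, 1], [0, 1], N + 10)        # minimal polynomial x^2 + x + 1

z       = [a[t] & b[t] for t in range(N)]           # Z  = (F(a_t, b_t))
z_shift = [a[t + 5] & b[t + 2] for t in range(N)]   # Z^(d) with shifts d = (5, 2)

# Theorem 2: the minimal polynomials coincide (here of degree 3 * 2 = 6).
assert berlekamp_massey(z) == berlekamp_massey(z_shift)
assert len(berlekamp_massey(z)) - 1 == 6
```

The degree 6 matches the ⊗-product of the two minimal polynomials, anticipating Theorem 5 below.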
4 Improvements
In this section, we show how algorithm A can be improved. Let min(F) denote the (unique) minimal polynomial found by algorithm A. Until now, we made no use of the knowledge of the minimal polynomials $m_1(x), \ldots, m_\kappa(x)$. The idea is to compute min(F) and/or the parameter T more or less directly from the known minimal polynomials and F. For this purpose, we cite some statements about minimal polynomials.

Definition 2. Consider two co-prime polynomials $f(x) = \prod_{i=1}^{n}(x - \alpha_i)$ and $g(x) = \prod_{j=1}^{m}(x - \beta_j)$ with no multiple roots. Then we define

$$f(x) \otimes g(x) := \prod_{i,j} (x - \alpha_i \beta_j), \qquad f(x) \otimes f(x) := \prod_{1 \leq i < j \leq n} (x - \alpha_i \alpha_j) \cdot f(x).$$

Theorem 5. ([16], Th. 6.57 + 6.67) Let $Z_1 = (z_t^{(1)}), \ldots, Z_\kappa = (z_t^{(\kappa)})$ be sequences with pairwise co-prime $min(Z_i)$. Then

$$min(Z_1 + \ldots + Z_\kappa) = min(Z_1) \cdot \ldots \cdot min(Z_\kappa)$$
$$min(Z_i \cdot Z_j) = min(Z_i) \otimes min(Z_j), \qquad \forall i \neq j$$

where $Z_1 + \ldots + Z_\kappa := (z_t^{(1)} + \ldots + z_t^{(\kappa)})$ and $Z_i \cdot Z_j := (z_t^{(i)} \cdot z_t^{(j)})$.
Theorem 6. (Key [14], Th. 1) Let $Z = (z_t)$ be a sequence and $l := \deg(min(Z))$. If d is an integer with $1 \leq d < l$, then the sequence $(z_t \cdot z_{t+d})$ has the minimal polynomial $min(Z) \otimes min(Z)$ of degree $l(l+1)/2$.

Before we proceed further, we will take a closer look at the complexities of the operators $\cdot$ and $\otimes$.

Theorem 7. (Schoenhage [21]) Let two polynomials f(x) resp. g(x) of $GF(2)[x]$ be given, of degrees $\leq m$. Then the product $f(x) \cdot g(x)$ can be computed with an effort of $O(m \log m \log\log m)$.
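Theorem 6 is easy to confirm numerically: for an m-sequence whose minimal polynomial has degree l, the product sequence (z_t · z_{t+1}) should have linear complexity l(l+1)/2. A sketch, with a textbook Berlekamp-Massey routine as the measuring device and an arbitrary primitive degree-4 polynomial:

```python
def berlekamp_massey(s):
    """Minimal (connection) polynomial of a bit sequence over GF(2)."""
    n = len(s)
    C, B, L, m = [1] + [0] * n, [1] + [0] * n, 0, 1
    for t in range(n):
        d = s[t]
        for i in range(1, L + 1):
            d ^= C[i] & s[t - i]
        if d:
            T = C[:]
            for i in range(n + 1 - m):
                C[i + m] ^= B[i]
            if 2 * L <= t:
                L, B, m = t + 1 - L, T, 0
        m += 1
    return C[:L + 1]

z = [0, 0, 0, 1]               # LFSR with minimal polynomial x^4 + x + 1, l = 4
for t in range(120):
    z.append(z[t + 1] ^ z[t])  # z_{t+4} = z_{t+1} + z_t

w = [z[t] & z[t + 1] for t in range(100)]   # the sequence (z_t * z_{t+1})

# Theorem 6 predicts deg min(w) = l(l+1)/2 = 10.
assert len(berlekamp_massey(w)) - 1 == 10
```

The jump from degree 4 to degree 10 is exactly the growth that makes direct application of algorithm A expensive and motivates the ⊗-product shortcut below.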
Theorem 8. (Bostan, Flajolet, Salvy, Schost [3], Theorem 1) Let f(x) resp. g(x) be two co-prime polynomials of degree n resp. m with no multiple roots. Then the polynomial $f(x) \otimes g(x)$ can be computed directly within

$$O(nm \log^2(nm/2) \log\log(nm/2) + nm \log(nm) \log\log(nm)) =: T(nm)$$

operations in GF(2), without knowing the roots of f(x) or g(x).
This implies a kind of divide-and-conquer approach for computing the minimal polynomial from F. The trick is to split the function F into two or more functions $F_1, \ldots, F_l$ such that the corresponding minimal polynomials $min(F_i)$ are pairwise co-prime. Then by Theorem 5 it holds $min(F) = min(F_1) \cdot \ldots \cdot min(F_l)$. In some cases, such a partition can be hard to find or may not even exist. If the minimal polynomials $min(F_i)$ are not pairwise co-prime, the product $p(x) := min(F_1) \cdot \ldots \cdot min(F_l)$ is still a characteristic polynomial of each possible sequence $\hat Z$, i.e. the coefficients of p(x) also fulfill equation (3). Therefore, using p(x) makes a fast algebraic attack possible, though it may require more known key stream bits than really necessary. We compare the effort of this approach to that of algorithm A. For simplicity we assume l = 2, i.e. $min(F) = min(F_1) \cdot min(F_2)$. Let $T_1 := \deg(min(F_1)) \leq \deg(min(F_2)) =: T_2$. Then $\deg(min(F)) = T_1 + T_2$. As said before, algorithm A needs about $O(T_1^2 + T_2^2 + 2(T_1+T_2)|K| + 2T_1T_2)$ basic operations. Instead of using algorithm A, we can do the following: first compute $min(F_1)$ and $min(F_2)$. In general, this can be done with algorithm A or in some cases by using the $\otimes$-product (see details later). If we use algorithm A, the complexities of these operations are $O(T_1^2 + 2T_1|K|)$ resp. $O(T_2^2 + 2T_2|K|)$. Notice that both operations can be performed in parallel. Having computed $min(F_1)$ and $min(F_2)$, the second and final step consists of computing the product $min(F_1) \cdot min(F_2) = min(F)$. By Theorem 7, this has an effort of $O(T_2 \log T_2 \log\log T_2)$, which implies an overall effort of

$$O(T_1^2 + T_2^2 + 2(T_1+T_2)|K| + T_2 \log T_2 \log\log T_2).$$

In general, it is $\log T_2 \log\log T_2 \ll 2T_1$. Thus, our new approach has a lower runtime than algorithm A. The advantage increases if F can be divided into more than two parts. In some cases, the precomputation step can be improved even a bit further.
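The comparison above can be made concrete with a quick back-of-the-envelope computation; the values of T1, T2 and |K| below are hypothetical, chosen only for illustration:

```python
from math import log2

T1, T2, K = 2**20, 2**22, 128      # hypothetical degrees and key size

ops_A   = T1**2 + T2**2 + 2 * (T1 + T2) * K + 2 * T1 * T2
ops_new = T1**2 + T2**2 + 2 * (T1 + T2) * K + T2 * log2(T2) * log2(log2(T2))

# The 2*T1*T2 term of algorithm A dominates the T2 log T2 loglog T2 term
# of the fast product step, so the divide-and-conquer variant is cheaper.
assert ops_new < ops_A
print(f"estimated speed-up: {ops_A / ops_new:.2f}x")
```

These are asymptotic operation counts with the constants ignored, so the printed ratio is only indicative.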
Assume that at least one of the $F_i$ mentioned above can be written as a product $F_i = G_1 \cdot G_2$ such that $min(G_1)$ and $min(G_2)$ are co-prime. This is, for example, almost always the case when $F_i$ is a monomial. Then, by Theorem 5 it holds $min(F_i) = min(G_1) \otimes min(G_2)$, which implies a similar strategy. Let again $T_1 := \deg(min(G_1)) \leq \deg(min(G_2)) =: T_2$. Using algorithm A would need about $O(T_1^2 T_2^2 + 2T_1T_2|K|)$
operations. Instead, we can compute $min(G_1)$ and $min(G_2)$ with algorithm A in a first step. This takes $O(T_1^2 + T_2^2 + 2(T_1+T_2)|K|)$ operations. If $min(G_1)$ and/or $min(G_2)$ are already known, then this step can be omitted. This is for example the case if $F_i$ is the product of the outputs of two or several distinct LFSRs (see the E0 example in Appendix C). In the second step, we use the algorithm described in [3] to compute $min(G_1) \otimes min(G_2)$. The effort is $O(T(T_1T_2))$, which is in $O(T_1T_2 \log^2(T_1T_2) \log\log(T_1T_2))$. Altogether, this approach needs

$$O(T_1T_2 \log^2(T_1T_2) \log\log(T_1T_2) + T_1^2 + T_2^2 + 2(T_1+T_2)|K|)$$

operations. This shows the improvement. If we perform the operations of the first step in parallel, the time needed to get the result can be decreased further. In the following, we summarize our approaches by proposing the following new algorithm B:
Algorithm B

Given: Pairwise co-prime primitive polynomials $m_1(x), \ldots, m_\kappa(x)$, an arbitrary boolean function F as described in Section 2, and a partition $F = F_1 + \ldots + F_l$ such that the minimal polynomials $min(F_i)$ are pairwise co-prime.

Task: Find min(F).

Algorithm:
– Compute the minimal polynomials $min(F_i)$ by using algorithm A. This can be done in parallel. If $F_i = G_1 \cdot G_2$ with co-prime $min(G_1)$ and $min(G_2)$ (e.g., $F_i$ is a monomial), then the $\otimes$-product algorithm described in [3] can be used.
– Compute $min(F) = min(F_1) \cdot \ldots \cdot min(F_l)$ using the algorithm in [21].
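The last step of algorithm B — multiplying the min(F_i) — is plain polynomial multiplication over GF(2). A compact schoolbook sketch using Python integers as coefficient bit-vectors (the fast method of [21] would replace the inner loop):

```python
from functools import reduce

def poly_mul_gf2(f, g):
    """Multiply two polynomials over GF(2); bit i of each integer is the
    coefficient of x^i (schoolbook carry-less multiplication)."""
    result = 0
    while g:
        if g & 1:
            result ^= f
        f <<= 1
        g >>= 1
    return result

# (x^3 + x + 1) * (x^2 + x + 1) = x^5 + x^4 + 1
assert poly_mul_gf2(0b1011, 0b111) == 0b110001

# Step 2 of algorithm B on toy factors: min(F) = min(F_1) * ... * min(F_l)
min_F = reduce(poly_mul_gf2, [0b1011, 0b111, 0b11])
assert min_F == 0b1010011        # x^6 + x^4 + x + 1
```

Since addition over GF(2) is XOR, no carries occur, which is why an integer XOR-and-shift loop implements the product exactly.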
5 Application to the E0 Key Stream Generator
In this section, we demonstrate the efficiency of algorithm B on the E0 key stream generator, which is part of the Bluetooth standard for wireless communication [2]. It uses four different LFSRs with pairwise co-prime primitive minimal polynomials $m_1, m_2, m_3, m_4$ of degrees $T_1 = 25$, $T_2 = 31$, $T_3 = 33$ and $T_4 = 39$. The sequences produced by the LFSRs have the periods $2^{T_1}-1, \ldots, 2^{T_4}-1$. As the values $2^{T_3}-1$ and $2^{T_4}-1$ share the common factor 7, the assumption made in [8] is not satisfied. In [1], a boolean function $\hat F$ was developed fulfilling equation (1). $\hat F$ can be divided as shown in (2) into F + G with $\deg(F) = 4 > 3 = \deg(G)$. The function F is

$$F = \sum_{1 \leq i < j \leq 4,\; 1 \leq k < l \leq 4} z_t^{(i)} z_t^{(j)} z_{t+1}^{(k)} z_{t+1}^{(l)} + z_t^{(1)} z_t^{(2)} z_t^{(3)} z_t^{(4)}$$
Table 1. Fastest previous attacks against E0

Attack           | Data | Memory | Pre-computation | Attack Complexity
[1]              | 2^24 | 2^48   | —               | 2^68
[8]              | 2^24 | 2^37   | 2^46            | 2^49
new (sequential) | 2^24 | 2^37   | 2^43            | 2^49
new (parallel)   | 2^24 | 2^37   | 2^42            | 2^49
where $Z_i = (z_t^{(i)})$ is the sequence produced by LFSR i. Let $R_i$ be the set of roots of $m_i$. Inspired by the definition in Theorem 2, we define

$$P := \{\alpha_i \alpha_j \alpha_k \alpha_l \mid \alpha_s \in R_s,\; 1 \leq i < j \leq 4,\; 1 \leq k < l \leq 4\} \cup \{\alpha_1 \alpha_2 \alpha_3 \alpha_4 \mid \alpha_i \in R_i\}.$$

We have checked with Maple that for all pairs $\pi_1, \pi_2 \in P$ the pair of vectors $\vec{\pi}_1$ and $\vec{\pi}_2$ factorizes uniquely over the union $R_1 \cup R_2 \cup R_3 \cup R_4$. Hence, the weaker assumptions from Theorem 2 are fulfilled here and a unique minimal polynomial min(F) exists. Furthermore, it can be shown that F can be written as $F = F_1 + \ldots + F_{11}$ such that the minimal polynomials $min(F_i)$ are pairwise co-prime (see Appendix C). By Theorem 5, min(F) can be expressed by

$$min(F) = \prod_{i=1}^{11} min(F_i). \qquad (5)$$
The degree of min(F), and thus the parameter T, is upper bounded by

$$\sum_{1 \leq i < j \leq 4} \frac{T_i(T_i+1)\,T_j(T_j+1)}{4} \;+ \sum_{1 \leq i < j < k \leq 4} T_i T_j T_k \cdot \frac{T_i+T_j+T_k-1}{2} \;+\; T_1 T_2 T_3 T_4.$$

In [2], the degrees $T_1, T_2, T_3, T_4$ are defined as 25, 31, 33, 39 respectively. Thus, the degree of min(F) is $\leq 8{,}822{,}188 \approx 2^{23.07}$. The computation of min(F) using algorithm A would need at most about $2^{46.15}$ basic operations. Equation (5) suggests the use of algorithm B instead. In the first step we apply algorithm A to compute the minimal polynomials $min(F_i)$, $i = 1, \ldots, 11$. This takes an overall number of basic operations of $\approx 2^{43.37}$. Exploiting parallelism, we only have to wait the time needed to perform $\approx 2^{41.91}$ basic operations.⁵ The second step is the computation of the product of these minimal polynomials. Here, this takes altogether about $2^{28.25}$ basic operations. Performed sequentially, algorithm B needs about $2^{43.37}$ basic operations, which is almost 8 times faster than algorithm A. If we exploit the parallelism mentioned above, the number of basic operations we have to wait for is about $2^{41.91}$, which is more than 16 times faster than with algorithm A. This improves the fastest known attack against the E0 cipher. Table 1 sums up the fastest previous attacks known.
⁵ Of course, the number of operations remains unchanged.
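The upper bound and the quoted figure can be re-derived directly from T1, ..., T4:

```python
from itertools import combinations
from math import log2

T = [25, 31, 33, 39]   # LFSR degrees of E0

bound = (
    sum(ti * (ti + 1) * tj * (tj + 1) // 4
        for ti, tj in combinations(T, 2))
    + sum(ti * tj * tk * (ti + tj + tk - 1) // 2
          for ti, tj, tk in combinations(T, 3))
    + T[0] * T[1] * T[2] * T[3]
)

assert bound == 8822188                 # the degree bound quoted in the text
assert round(log2(bound), 2) == 23.07   # i.e. about 2^23.07
```

Both division steps are exact, since ti(ti+1)tj(tj+1) is always divisible by 4 and, with all four degrees odd, ti+tj+tk−1 is always even.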
Table 2. Comparison of algorithm A and B on reduced versions of E0

C1  | C2    | C3      | C4          | Algorithm A        | Algorithm B   | A/B
110 | 1100  | 10100   | 1000100     | 10 h 41 m 43 s     | 12 m 3 s      | 53.32
101 | 1001  | 10010   | 1000001     | 11 h 2 m 49 s      | 12 m 7 s      | 54.75
…   | …     | …       | …           | 10 h 50 m 0 s      | 11 m 59 s     | 54.30
…   | …     | …       | …           | 10 h 52 m 59 s     | 11 m 55 s     | 54.86
…   | …     | …       | …           | 10 h 53 m 31 s     | 11 m 58 s     | 54.65
110 | 1100  | 10100   | 10100010100 | 78 h 30 m 16 s     | 1 h 43 m 25 s | 45.55
101 | 10100 | 1100000 | 11100010000 | 18 d 18 h 26 m 0 s | 13 h 50 m 7 s | 32.56
An ad-hoc implementation in Maple (without using parallelism or the algorithm of [3]), applied to reduced versions of E0 with shorter LFSRs, confirmed the improved efficiency of our new algorithm. The results can be found in Table 2. In the first four columns, the coefficients of the four minimal polynomials are given. The next two columns show the running times of algorithms A and B respectively, which are compared in the last column. In all cases, our new algorithm B was significantly faster than algorithm A, even without using parallelism. The speed-up factor was much higher than predicted theoretically and depended on the chosen minimal polynomials and the initial states. In all cases, the degree of min(F) was equal to the upper bound estimated above. Hence, we expect that the upper bound is tight for the real E0 key stream generator as well.
6 Conclusion
In this paper, we discussed the fast algebraic attacks introduced by Courtois at Crypto 2003. Using a very clever idea, "traditional" algebraic attacks can be improved significantly in many cases by performing an efficient precomputation step. For this purpose, an algorithm A was proposed. Unfortunately, neither was a correctness proof given, nor was the correctness obvious. In this paper, we gave the missing proof based on cryptographically reasonable assumptions. To do so, it was necessary to prove some non-trivial statements about minimal polynomials of linear recurring sequences. To the best of our knowledge, these have not been published elsewhere in the open literature. In addition, we showed that knowledge of the minimal polynomials of the LFSRs used in the cipher can help to improve fast algebraic attacks. For this reason, we developed an algorithm B which is based on a deeper understanding of minimal polynomials of linear recurring sequences. Finally, we demonstrated the higher efficiency of algorithm B by applying it to the E0 key stream generator used in the Bluetooth standard for wireless communication. In this case, theoretical analysis yielded that algorithm B runs almost 8 times faster than algorithm A. This improves the fastest known attack
against E0. An ad-hoc implementation applied to reduced versions of E0 confirmed the advantage of our new algorithm for the test cases considered, even without using parallelism.
Acknowledgement. The author would like to thank Erik Zenner, Stefan Lucks, Matthias Krause and the anonymous referees for helpful comments and discussions.
References

[1] Frederik Armknecht, Matthias Krause: Algebraic Attacks on Combiners with Memory, Proceedings of Crypto 2003, LNCS 2729, pp. 162-176, Springer, 2003.
[2] Bluetooth SIG: Specification of the Bluetooth System, Version 1.1, February 22, 2001. Available at http://www.bluetooth.com/.
[3] Alin Bostan, Philippe Flajolet, Bruno Salvy, Eric Schost: Fast Computation With Two Algebraic Numbers, submitted, 2003.
[4] Alex Biryukov, Adi Shamir: Cryptanalytic Time/Memory/Data Tradeoffs for Stream Ciphers, Proceedings of Asiacrypt 2000, LNCS 1976, pp. 1-13, Springer, 2000.
[5] Vladimor V. Chepyzhov, Ben Smeets: On A Fast Correlation Attack on Certain Stream Ciphers, Proceedings of Eurocrypt 1991, LNCS 547, pp. 176-185, Springer, 1991.
[6] Nicolas Courtois: Higher Order Correlation Attacks, XL Algorithm and Cryptanalysis of Toyocrypt, ICISC 2002, LNCS 2587. An updated version (2002) is available at http://eprint.iacr.org/2002/087/.
[7] Nicolas Courtois, Willi Meier: Algebraic Attacks on Stream Ciphers with Linear Feedback, Eurocrypt 2003, Warsaw, Poland, LNCS 2656, pp. 345-359, Springer, 2003. An extended version is available at http://www.minrnak.org/toyolili.pdf.
[8] Nicolas Courtois: Fast Algebraic Attacks on Stream Ciphers with Linear Feedback, Proceedings of Crypto 2003, LNCS 2729, pp. 177-194, Springer, 2003.
[9] Scott R. Fluhrer, Stefan Lucks: Analysis of the E0 Encryption System, Proceedings of Selected Areas in Cryptography 2001, LNCS 2259, pp. 38-48, Springer, 2001.
[10] Jovan Dj. Golic: Cryptanalysis of Alleged A5 Stream Cipher, Proceedings of Eurocrypt 1997, LNCS 1233, pp. 239-255, Springer, 1997.
[11] Rainer Goettfert, Harald Niederreiter: On the Linear Complexity of Products of Shift-Register Sequences, Proceedings of Eurocrypt 1993, LNCS 765, pp. 151-158, Springer, 1994.
[12] Thomas Johansson, Fredrik Joensson: Fast Correlation Attacks Based on Turbo Code Techniques, Proceedings of Crypto 1999, LNCS 1666, pp. 181-197, Springer, 1999.
[13] Thomas Johansson, Fredrik Joensson: Improved Fast Correlation Attacks on Stream Ciphers via Convolutional Codes, Proceedings of Eurocrypt 1999, pp. 347-362, Springer, 1999.
[14] Edwin L. Key: An Analysis of the Structure and Complexity of Nonlinear Binary Sequence Generators, IEEE Transactions on Information Theory, Vol. IT-22, No. 6, November 1976.
[15] Matthias Krause: BDD-Based Cryptanalysis of Keystream Generators, Proceedings of Eurocrypt 2002, LNCS 2332, pp. 222-237, Springer, 2002.
[16] Rudolf Lidl, Harald Niederreiter: Introduction to Finite Fields and Their Applications, Cambridge University Press, 1994.
[17] J. L. Massey: Shift-Register Synthesis and BCH Decoding, IEEE Transactions on Information Theory, IT-15, pp. 122-127, 1969.
[18] Willi Meier, Othmar Staffelbach: Fast Correlation Attacks on Certain Stream Ciphers, Journal of Cryptology, pp. 159-176, 1989.
[19] Alfred J. Menezes, Paul C. van Oorschot, Scott A. Vanstone: Handbook of Applied Cryptography, Chapter 6, CRC Press.
[20] Rainer A. Rueppel: Stream Ciphers, in: Contemporary Cryptology: The Science of Information Integrity, G. Simmons (ed.), IEEE Press, New York, 1991.
[21] A. Schoenhage: Schnelle Multiplikation von Polynomen ueber Koerpern der Charakteristik 2, Acta Informatica 7, pp. 395-398, 1977.
[22] Volker Strassen: Gaussian Elimination is Not Optimal, Numerische Mathematik, Vol. 13, pp. 354-356, 1969.
[23] Erik Zenner: On the Efficiency of the Clock Control Guessing Attack, Proceedings of ICISC 2002, LNCS 2587, Springer, 2002.
[24] Erik Zenner, Matthias Krause, Stefan Lucks: Improved Cryptanalysis of the Self-Shrinking Generator, ACISP 2001, LNCS 2119, Springer, 2001.
A Motivation
In this section, we give a motivation for the somewhat technical Definition 1. For this purpose, we discuss a kind of "abstract example". Let $A = (a_t)$ resp. $B = (b_t)$ be two sequences produced by the co-prime minimal polynomials $m_a(x) = \prod_i (x - \alpha_i)$ and $m_b(x) = \prod_i (x - \beta_i)$. Then by Theorem 3, $a_t$ resp. $b_t$ can be expressed by

$$a_t = \sum_i c_{\alpha_i} \alpha_i^t, \qquad b_t = \sum_i c_{\beta_i} \beta_i^t.$$

Consequently, the shifted sequences can be expressed by

$$a_{t+d_a} = \sum_i c_{\alpha_i} \alpha_i^{t+d_a} = \sum_i (c_{\alpha_i} \alpha_i^{d_a})\, \alpha_i^t =: \sum_i \tilde c_{\alpha_i} \alpha_i^t$$
$$b_{t+d_b} = \sum_i c_{\beta_i} \beta_i^{t+d_b} = \sum_i (c_{\beta_i} \beta_i^{d_b})\, \beta_i^t =: \sum_i \tilde c_{\beta_i} \beta_i^t$$

Furthermore, we define the sequences

$$z_t = a_t \cdot a_t + b_t = \sum_{i,j} c_{\alpha_i} c_{\alpha_j} \alpha_i^t \alpha_j^t + \sum_i c_{\beta_i} \beta_i^t = \sum_\pi c_\pi \pi^t$$
$$\tilde z_t = a_{t+d_a} \cdot a_{t+d_a} + b_{t+d_b} = \sum_\pi \tilde c_\pi \pi^t$$

where $\pi \in P := \{\alpha_i \cdot \alpha_j\} \cup \{\beta_i\}$. If we assume that all roots $\alpha_i$ and $\beta_j$ are non-zero,⁶ this holds for the elements in P too. The goal is to find a necessary
⁶ This is for example true if $m_a(x)$ and $m_b(x)$ are irreducible and have a degree > 1.
precondition which guarantees that $min(z_t) = min(\tilde z_t)$ (regardless of the values of $d_a$ and $d_b$). By Theorem 4, we know that such a precondition is

$$c_\pi = 0 \;\Leftrightarrow\; \tilde c_\pi = 0. \qquad (6)$$

How can we be sure that this is true in our case? We show what can go wrong on an example. Let $\pi \in P$ be fixed. We distinguish between two different cases:

1. $\pi$ has the following two different representations in P: $\pi = \alpha_1 \cdot \alpha_2 = \alpha_3$
2. $\pi$ has the following two different representations in P: $\pi = \alpha_1 \cdot \alpha_2 = \alpha_3 \cdot \beta_1$ with $\beta_1 \neq 1$

In the first case, we can express $c_\pi$ and $\tilde c_\pi$ by

$$c_\pi = c_{\alpha_1} \cdot c_{\alpha_2} + c_{\alpha_3}$$
$$\tilde c_\pi = \tilde c_{\alpha_1} \cdot \tilde c_{\alpha_2} + \tilde c_{\alpha_3} = c_{\alpha_1} \alpha_1^{d_a} \cdot c_{\alpha_2} \alpha_2^{d_a} + c_{\alpha_3} \alpha_3^{d_a} = c_{\alpha_1} c_{\alpha_2} \underbrace{(\alpha_1 \alpha_2)^{d_a}}_{=\,\pi^{d_a}} + c_{\alpha_3} \underbrace{\alpha_3^{d_a}}_{=\,\pi^{d_a}} = (c_{\alpha_1} c_{\alpha_2} + c_{\alpha_3}) \cdot \pi^{d_a} = c_\pi \pi^{d_a}$$

As we said before, $\pi \in P$ is non-zero. Hence, it is $c_\pi = 0 \Leftrightarrow \tilde c_\pi = 0$, and condition (6) is fulfilled. Therefore, we can be sure that $min(\tilde Z)$ is the same for all choices of $d_a$ and $d_b$.

Now let us have a look at the second case. W.l.o.g., we assume $d_a < d_b$. Then

$$c_\pi = c_{\alpha_1} \cdot c_{\alpha_2} + c_{\alpha_3} \cdot c_{\beta_1}$$
$$\tilde c_\pi = \tilde c_{\alpha_1} \cdot \tilde c_{\alpha_2} + \tilde c_{\alpha_3} \cdot \tilde c_{\beta_1} = c_{\alpha_1} c_{\alpha_2} \cdot \pi^{d_a} + c_{\alpha_3} c_{\beta_1} \cdot \pi^{d_a} \beta_1^{d_b - d_a} = (c_{\alpha_1} c_{\alpha_2} + c_{\alpha_3} c_{\beta_1} \beta_1^{d_b - d_a}) \cdot \pi^{d_a}$$

In this case (depending on $c_{\alpha_1}, c_{\alpha_2}, c_{\alpha_3}, c_{\beta_1}, \beta_1, d_a$ and $d_b$) it could happen that $c_\pi = 0$ but $\tilde c_\pi \neq 0$ (or vice versa). The motivation for Definition 1 was to avoid cases like case 2. To match the case considered here with Definition 1, we set $R_1 := \{\alpha_i\}$ (the roots of $m_a(x)$), $R_2 := \{\beta_j\}$ and $R = R_1 \cup R_2$. Let $V = (v_1, \ldots, v_n) \in R^n$ and $W := (w_1, \ldots, w_m) \in R^m$. Definition 1 states that V and W factorize uniquely over $R_1, R_2$ if

$$v_1 \cdot \ldots \cdot v_n = w_1 \cdot \ldots \cdot w_m \;\Rightarrow\; \prod_{v_i \in R_1} v_i \cdot \Big(\prod_{w_j \in R_1} w_j\Big)^{-1} = 1 \;\text{ and }\; \prod_{v_i \in R_2} v_i \cdot \Big(\prod_{w_j \in R_2} w_j\Big)^{-1} = 1.$$

Now we check whether the definition is fulfilled for cases 1 and 2 for the representations of $\pi$:
Case 1: It is $V = (v_1, v_2) = (\alpha_1, \alpha_2) \in R_1 \times R_1$ and $W = (w_1) = (\alpha_3) \in R_1$. As all of them are elements of $R_1$, we only have to check $v_1 \cdot v_2 \cdot w_1^{-1} = \pi \cdot \pi^{-1} = 1$.

Case 2: It is $V = (v_1, v_2) = (\alpha_1, \alpha_2) \in R_1 \times R_1$ and $W = (w_1, w_2) = (\alpha_3, \beta_1) \in R_1 \times R_2$. We have to check whether $v_1 \cdot v_2 \cdot w_1^{-1} = 1$ and $w_2^{-1} = 1$. But the latter is not true, as by assumption $w_2 = \beta_1 \neq 1$.

In the proof of Theorem 2 it is shown that cases like the second one can be avoided if we require that the vectors $\vec{\pi}$ with $\pi \in P$ all factorize uniquely over $R_1, R_2$.
B On the Practical Relevance of Theorem 2
In this section we show that theorem 2 applies automatically to a large class of LFSR-based ciphers. Let m1(x), ..., mκ(x) ∈ F2[x] be the primitive minimal polynomials of the LFSRs used, such that the roots are all pairwise distinct and non-zero. For the following classes of ciphers, the assumptions of theorem 2 are always satisfied:

1. The cipher is a filter generator, i.e., κ = 1.
2. The degrees of the minimal polynomials are pairwise co-prime.

Set R := R1 ∪ ... ∪ Rκ. We define by P := {α1 · ... · αn | αi ∈ R, n ∈ ℕ} the set of all possible multiple products of elements in R. We show now that in both cases, all pairs of vectors α⃗, β⃗ with α, β ∈ P factorize uniquely over R1, ..., Rκ. As P is a superset of all possible sets {µ(α) | ...}, this proves that the conditions of theorem 2 are satisfied. Let from now on α⃗ = (α1, ..., αn) and β⃗ = (β1, ..., βm) denote two vectors such that α1 · ... · αn and β1 · ... · βm are elements in P.

Let us start with the first case. Here, since R = R1, we have

α1 · ... · αn = β1 · ... · βm  ⇔  ∏_{αi∈R1} αi · ∏_{βj∈R1} βj^(−1) = 1
This concludes the first case. For the second case we recall the fact that F_{2^n} ⊆ F_{2^m} iff n divides m. In particular, F_{2^n} ∩ F_{2^m} = F_{2^c} with c := gcd(n, m). We denote by Ti the degree of the minimal polynomial mi(x). Then the elements α and α^(−1) with α ∈ Ri, and all their multiple products, are elements of F_{2^Ti}. Let l be arbitrary with 1 ≤ l ≤ κ and set Sl := T1 · ... · T(l−1) · T(l+1) · ... · Tκ. Then α1 · ... · αn = β1 · ... · βm implies

γl := ∏_{αi∈Rl} αi · ∏_{βj∈Rl} βj^(−1) ∈ F_{2^Tl},   and also   γl = ∏_{αi∉Rl} αi^(−1) · ∏_{βj∉Rl} βj ∈ F_{2^Sl}
Therefore, γl ∈ F_{2^Tl} ∩ F_{2^Sl}. By assumption the values Ti are pairwise co-prime. Hence gcd(Tl, Sl) = 1 and γl ∈ F_{2^1} = F_2. As the roots are all non-zero, γl equals 1 for each choice of l. This concludes the second case.
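The subfield argument can be checked on the level of multiplicative orders: an element lying in both F_{2^a}* and F_{2^b}* has order dividing gcd(2^a − 1, 2^b − 1) = 2^gcd(a,b) − 1. A minimal sketch (plain Python, no field arithmetic needed):

```python
from math import gcd

# gcd(2^a - 1, 2^b - 1) = 2^gcd(a, b) - 1: the common elements of the
# multiplicative groups of F_{2^a} and F_{2^b} form exactly F_{2^gcd(a,b)}*.
for a, b in [(2, 3), (3, 5), (4, 6), (12, 18), (25, 31)]:
    assert gcd(2**a - 1, 2**b - 1) == 2**gcd(a, b) - 1

# For co-prime degrees Tl and Sl the intersection is F_2, so gamma_l,
# being a non-zero element of both fields, must equal 1:
assert gcd(2**25 - 1, 2**31 - 1) == 1
```

So whenever Tl and Sl are co-prime, the only non-zero element common to both fields is 1, which is exactly the conclusion drawn above.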
Improving Fast Algebraic Attacks
C
The E0 Key Stream Generator
In this section, we show that theorem 2 and algorithm B are applicable to the E0 cipher together with the following boolean function:

F = Σ_{1≤i<j≤4} Σ_{1≤k<l≤4} z_t^(i) z_t^(j) z_{t+1}^(k) z_{t+1}^(l) + z_t^(1) z_t^(2) z_t^(3) z_t^(4)    (7)
Let from now on i, j, k, l denote integers from the set {1, 2, 3, 4}. We define the following three sets of indices:

I2 := {(i, j) | i < j}
I3 := {(i, j, k) | i < j < k}
I4 := {(i, j; k, l) | i < j, k < l, {i, j} ∪ {k, l} = {1, 2, 3, 4}}

Then, F can be rewritten as

F =   Σ_{(i,j)∈I2} z_t^(i) z_{t+1}^(i) z_t^(j) z_{t+1}^(j)                                                        (8)
      (each summand denoted F_(i,j))
    + Σ_{(i,j,k)∈I3} ( f_ij · z_t^(k) z_{t+1}^(k) + f_ik · z_t^(j) z_{t+1}^(j) + f_jk · z_t^(i) z_{t+1}^(i) )     (9)
      (each summand denoted F_(i,j,k))
    + Σ_{(i,j;k,l)∈I4} z_t^(i) z_t^(j) z_{t+1}^(k) z_{t+1}^(l) + z_t^(1) z_t^(2) z_t^(3) z_t^(4)                  (10)
      (this last part denoted F̃)

where f_ij = z_t^(i) z_{t+1}^(j) + z_t^(j) z_{t+1}^(i). Let Ri be the set of roots of mi, the minimal polynomial of the ith LFSR, and Ri² := {α · α′ | α, α′ ∈ Ri, α ≠ α′}. We define the sets:

R_(i,j) := {αi αj | (i, j) ∈ I2, αs ∈ Rs ∪ Rs²}
R_(i,j,k) := {αi αj αk | (i, j, k) ∈ I3, αs ∈ Rs ∪ Rs²}
R̃ := {αi αj αk αl | (i, j; k, l) ∈ I4, αs ∈ Rs}

The first thing we observe is that the sets defined above are all pairwise disjoint. Furthermore, it is easy to see that the roots of min(F_(i,j)) are a subset of R_(i,j), and so on. Thus, the minimal polynomials are pairwise co-prime, and by theorem 5 we can write min(F) as the product of 11 different minimal polynomials:

min(F) = ∏_{(i,j)∈I2} min(F_(i,j)) · ∏_{(i,j,k)∈I3} min(F_(i,j,k)) · min(F̃)
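The regrouping of (7) into (8)–(10) sorts the pairs (i, j) and (k, l) by the size of their overlap, and can be verified exhaustively over GF(2). A small sketch (the names a and b for z_t^(·) and z_{t+1}^(·) are mine):

```python
from itertools import combinations, product

def F_direct(a, b):
    # equation (7): sum over all pairs i < j and k < l, plus the extra
    # term z_t^(1) z_t^(2) z_t^(3) z_t^(4); a = z_t^(.), b = z_{t+1}^(.)
    s = 0
    for i, j in combinations(range(4), 2):
        for k, l in combinations(range(4), 2):
            s ^= a[i] & a[j] & b[k] & b[l]
    return s ^ (a[0] & a[1] & a[2] & a[3])

def F_grouped(a, b):
    f = lambda i, j: (a[i] & b[j]) ^ (a[j] & b[i])        # f_ij
    s = 0
    for i, j in combinations(range(4), 2):                # (8): {i,j} = {k,l}
        s ^= a[i] & b[i] & a[j] & b[j]
    for i, j, k in combinations(range(4), 3):             # (9): one common index
        s ^= (f(i, j) & a[k] & b[k]) ^ (f(i, k) & a[j] & b[j]) ^ (f(j, k) & a[i] & b[i])
    for i, j in combinations(range(4), 2):                # (10): disjoint pairs
        k, l = (m for m in range(4) if m not in (i, j))
        s ^= a[i] & a[j] & b[k] & b[l]
    return s ^ (a[0] & a[1] & a[2] & a[3])

for bits in product((0, 1), repeat=8):
    assert F_direct(bits[:4], bits[4:]) == F_grouped(bits[:4], bits[4:])
```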
Actually, even more can be said. Using theorem 5 and the fact that the polynomials mi are pairwise co-prime, we have min(F(i,j) ) = (mi ⊗ mi ) ⊗ (mj ⊗ mj ) of degree Ti (Ti + 1)/2 · Tj (Tj + 1)/2. Before we can estimate min(F(i,j,k) ) and min(F˜ ), we need the following theorem:
Theorem 9. ([16], Th. 6.55) Given sequences Z1, ..., Zκ, the minimal polynomial min(Z1 + ... + Zκ) divides lcm(min(Z1), ..., min(Zκ)).

First, we consider min(F_(i,j,k)). By theorem 5, it can easily be seen that

min(z_t^(i) z_{t+1}^(j)) = min(z_t^(j) z_{t+1}^(i)) = mi ⊗ mj,

and by theorem 6, that min(z_t^(k) z_{t+1}^(k)) = mk ⊗ mk. By theorem 9, the minimal polynomial min(F_(i,j,k)) divides the least common multiple of the polynomials (mi ⊗ mi) ⊗ (mj ⊗ mk), (mj ⊗ mj) ⊗ (mi ⊗ mk) and (mk ⊗ mk) ⊗ (mi ⊗ mj). Notice that all three polynomials share the common factor mi ⊗ mj ⊗ mk.⁷ Thus, the degree of min(F_(i,j,k)) is upper bounded by

Ti Tj Tk + (Ti(Ti − 1)/2) Tj Tk + Ti (Tj(Tj − 1)/2) Tk + Ti Tj (Tk(Tk − 1)/2) = Ti Tj Tk · (Ti + Tj + Tk − 1)/2

The minimal polynomial min(F̃) can be found exactly. We observe that
min(z_t^(i) z_t^(j) z_{t+1}^(k) z_{t+1}^(l)) = min(z_t^(1) z_t^(2) z_t^(3) z_t^(4)) = m1 ⊗ m2 ⊗ m3 ⊗ m4 =: m

Thus, min(F̃) divides m. As m1, ..., m4 are irreducible, this holds for m too. Therefore, min(F̃) is equal to 1 or equal to m. But the first case would imply that the expression in (10) is the all-zero sequence, which is obviously wrong. Ergo, min(F̃) is equal to m, of degree T1 T2 T3 T4. Summing up, for the degree T of min(F) it holds that
T ≤ Σ_{(i,j)∈I2} Ti(Ti + 1) Tj(Tj + 1) / 4 + Σ_{(i,j,k)∈I3} Ti Tj Tk (Ti + Tj + Tk − 1) / 2 + T1 T2 T3 T4
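For E0 itself the bound can be evaluated numerically. The four LFSRs of E0 have lengths 25, 31, 33 and 39 — a fact about the Bluetooth cipher that is not restated in this appendix — which gives a degree bound of 8,822,188 ≈ 2^23.07:

```python
from itertools import combinations

T = [25, 31, 33, 39]  # LFSR lengths of E0 (assumed here, not stated above)

pairs = sum(ti * (ti + 1) * tj * (tj + 1) // 4
            for ti, tj in combinations(T, 2))
triples = sum(ti * tj * tk * (ti + tj + tk - 1) // 2
              for ti, tj, tk in combinations(T, 3))
quad = T[0] * T[1] * T[2] * T[3]

bound = pairs + triples + quad
assert bound == 8822188    # roughly 2^23.07
```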
D   Why Preconditions are Necessary
In this section, we show that algorithms A (and B) do not work properly without some preconditions. To do so, we give an example. We consider the case of two LFSRs La and Lb with the primitive minimal polynomials ma(x) = 1 + x + x^2 and mb(x) = 1 + x + x^4. Notice that the polynomials are co-prime but the corresponding periods are not. Let at resp. bt denote the outputs of LFSRs La and Lb, and define the sequence Z := (zt) by zt := at bt + at + bt + at a(t+1) + bt b(t+1). Let Ka = (a1, a2) be the initial state of LFSR La and Kb = (b1, b2, b3, b4) be the initial state of LFSR Lb. For the correctness of the pre-computation step, min(Z) should be the same for all non-zero choices of Ka and Kb. This is not the case. For Ka = (1, 0) and Kb = (1, 1, 1, 0) it is min(Z) = 1 + x^2 + x^3 + x^6 + x^7 + x^9. But for Ka = (0, 1) and Kb = (1, 1, 1, 1) it is min(Z) = 1 + x^15.

⁷ The reason is that f divides f ⊗ f and that (f · g) ⊗ h = (f ⊗ h) · (g ⊗ h).
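The counterexample can be reproduced numerically with the Berlekamp–Massey algorithm, which returns the minimal polynomial of a binary sequence. This sketch assumes the recurrence convention s(t+L) = XOR of s(t+j) over the coefficients m_j = 1 of m(x) below degree L — the reading under which it reproduces the two minimal polynomials quoted above:

```python
def lfsr(taps, L, state, length):
    # m(x) with coefficient set `taps` below degree L gives
    # s[t+L] = XOR of s[t+j] for j in taps
    s = list(state)
    while len(s) < length:
        t = len(s) - L
        s.append(0)
        for j in taps:
            s[-1] ^= s[t + j]
    return s

def berlekamp_massey(s):
    # standard GF(2) Berlekamp-Massey; returns (linear complexity,
    # connection polynomial coefficients c_0..c_L with c_0 = 1)
    C, B, L, m = [1], [1], 0, 1
    for n in range(len(s)):
        d = s[n]
        for i in range(1, L + 1):
            d ^= C[i] & s[n - i]
        if d == 0:
            m += 1
            continue
        T = C[:]
        C += [0] * max(0, len(B) + m - len(C))
        for i, bit in enumerate(B):
            C[i + m] ^= bit
        if 2 * L <= n:
            L, B, m = n + 1 - L, T, 1
        else:
            m += 1
        C += [0] * max(0, L + 1 - len(C))
    return L, C

def min_poly_of_Z(Ka, Kb):
    a = lfsr({0, 1}, 2, Ka, 41)        # ma(x) = 1 + x + x^2
    b = lfsr({0, 1}, 4, Kb, 41)        # mb(x) = 1 + x + x^4
    z = [(a[t] & b[t]) ^ a[t] ^ b[t] ^ (a[t] & a[t + 1]) ^ (b[t] & b[t + 1])
         for t in range(40)]
    return berlekamp_massey(z)

L1, C1 = min_poly_of_Z([1, 0], [1, 1, 1, 0])
L2, C2 = min_poly_of_Z([0, 1], [1, 1, 1, 1])
assert L1 == 9 and [i for i, c in enumerate(C1) if c] == [0, 2, 3, 6, 7, 9]
assert L2 == 15                        # min(Z) = 1 + x^15
assert L1 != L2                        # min(Z) depends on the initial states
```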
Resistance of S-Boxes against Algebraic Attacks

Jung Hee Cheon¹ and Dong Hoon Lee²

¹ Department of Mathematics, Seoul National University
[email protected]
² National Security Research Institute (NSRI)
[email protected]
Abstract. We develop several tools to derive linearly independent multivariate equations from algebraic S-boxes. By applying them to maximally nonlinear power functions with the inverse exponents, Gold exponents, or Kasami exponents, we estimate their resistance against algebraic attacks. As a result, we show that S-boxes with Gold exponents have very weak resistance, and S-boxes with Kasami exponents have slightly better resistance against algebraic attacks than those with the inverse exponents.

Keywords: Algebraic Attack, S-boxes, Boolean Functions, Nonlinearity, Differential Uniformity
1
Introduction
Recently, Courtois and Pieprzyk proposed an algebraic attack on block ciphers [4]. Their attack on AES [11] exploits algebraic properties of S-boxes: if we can obtain many equations with a small number of monomials from the S-boxes, a block cipher with these S-boxes can be represented by many equations in a small number of variables. By solving these multivariate equations with the so-called XSL algorithm, we may find the key of the block cipher. In the AES case, they introduce another viewpoint of the S-box, as a quadratic equation xy = 1 in x and y rather than as a higher degree equation y = 1/x in x, and obtain additional quadratic equations by multiplying appropriate monomials. More precisely, they obtain 23 quadratic equations with a total of 81 distinct terms from the S-box of AES and show that the equations are linearly independent by simulation.

In this paper, we give a theoretical approach to obtaining linearly independent multivariate equations from algebraic S-boxes. Multivariate equations are said to be linearly independent if they are linearly independent when every distinct monomial is considered as a new variable. We develop three tools to prove linear independence. The first tool is that if a vector Boolean function is nonlinear, its component functions are linearly independent as multivariate equations. We apply this to the n × n S-boxes x^(2^k+1) and the n × 2n S-boxes (x^(2^k+1), x^(2^(k+1)+1)) over F_{2^n}, which are known to be nonlinear when gcd(n, 2k) = 1 and |k − n/2| > 1, respectively [5]. The second one is that if for a vector Boolean function F(x, y) :

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 83–94, 2004.
© International Association for Cryptologic Research 2004
F_{2^n} × F_{2^n} → F_{2^m} and g : F_{2^n} → F_{2^n}, F(x, g(x)) has m linearly independent component functions, then so does F(x, y). The third one is that linear independence of multivariate functions is invariant under affine transformations of inputs and linear transformations of outputs. By applying these tools, we can prove that the 5n equations obtained from the inverse function xy = 1 in F_{2^n} (or its affine transformation) are linearly independent for any positive integer n. Further, we apply them to estimate the resistance of power functions with the well-known Gold exponents and Kasami exponents against algebraic attacks [7, 8]. Those S-boxes are the only power functions which are known to be maximally nonlinear (MN) and almost perfect nonlinear (APN) [6]. Note that 'MN' and 'APN' imply the best resistance against linear cryptanalysis and differential cryptanalysis, respectively [1, 2, 9]. Our analysis shows that the S-boxes with Gold exponents have very weak resistance and the S-boxes with Kasami exponents have better resistance against algebraic attacks, while all of them have similar resistance against differential and linear cryptanalysis. It would be an interesting problem to apply algebraic attacks to ciphers using Gold power functions as S-boxes, such as MISTY [10], which was selected as a standard block algorithm by NESSIE [12].

In Section 2, we introduce some preliminaries on nonlinearity, APN, and resistance against algebraic attacks. In Section 3, we propose some auxiliary lemmas used to show the linear independence of multivariate equations. In Section 4, we deal with the resistance of the above three families of S-boxes and compare them. We conclude in Section 5.
2
Preliminaries
In this section, we introduce the definitions of nonlinearity, APN, and resistance against algebraic attacks, and recall some useful results for algebraic S-boxes.

Definition 1. A function F : F_{2^n} → F_{2^n} is called almost perfect nonlinear (APN) if each equation

F(x + a) − F(x) = b,   for a ∈ F*_{2^n}, b ∈ F_{2^n},

has at most two solutions x ∈ F_{2^n}.

Note that APN functions have the best resistance against differential cryptanalysis. When n is odd, we have many classes of APN power functions. But when n is even, we have only two classes of APN power functions, namely Gold exponents and Kasami exponents [7, 8, 6]. The Hamming distance between two Boolean functions f : F_{2^n} → F_2 and g : F_{2^n} → F_2 is the weight of f + g. The minimal distance between f and any affine function from F_{2^n} into F_2 is the nonlinearity of f. Given a vector Boolean function F = (f1, ..., fm) : F_{2^n} → F_{2^m}, b·F denotes the Boolean function b1 f1 + b2 f2 + ··· + bm fm for each b = (b1, b2, ..., bm) ∈ F_{2^m}. Then the nonlinearity of F is defined as the minimal nonlinearity of its component functions, as follows:
Definition 2. The nonlinearity of F, N(F), is defined as

N(F) = min_{b ∈ F*_{2^m}} N(b·F) = min_{b ≠ 0, φ ∈ A} wt(b·F + φ)

where A is the set of all affine functions over F_{2^n}.

It is known that N(F) ≤ 2^(n−1) − 2^((n−1)/2). If n is odd, N(F) can be maximal; we call such functions maximally nonlinear (MN) functions. For even n, it is an open question to determine the maximal value. It is known that if n is odd and F is maximally nonlinear, then F is almost perfect nonlinear [6]. Now we define the resistance against algebraic attacks as in [4].

Definition 3. Given r equations of t monomials in F_2^n, we define Γ = ((t − r)/n)^((t−r)/n) as the resistance against algebraic attacks (RAA).

This quantity was introduced by Courtois and Pieprzyk [4]. They showed that the S-box of AES and the S-boxes of Serpent have Γ ≈ 2^22.9 and Γ ≈ 2^8.0, respectively. They claimed this can be a serious weakness of these ciphers and that Γ should be greater than 2^32 for secure ciphers. Note that this measure is not an exact measure of the XSL algorithm, and an improvement of algorithms for solving multivariate equations may result in different measures. However, the quantity does reflect the difficulty of solving multivariate equations in some sense. Thus we will use this quantity to measure the resistance against algebraic attacks in this paper.
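Definition 3 can be evaluated directly. As printed, the exponent is (t − r)/n; the quoted figures (2^22.9 for the AES S-box, and the values in Table 2 below) are reproduced when the exponent is rounded up to an integer, which is the reading assumed in this sketch:

```python
from math import ceil, log2

def raa(r: int, t: int, n: int) -> float:
    # Gamma = ((t - r)/n) ** ceil((t - r)/n); the ceiling on the exponent
    # is an assumption chosen to match the figures quoted in the paper.
    base = (t - r) / n
    return base ** ceil(base)

# AES S-box: 23 quadratic equations, 81 monomials, n = 8 => Gamma ~ 2^22.9
assert abs(log2(raa(23, 81, 8)) - 22.9) < 0.1
# Inverse exponent with all 5n equations (Theorem 1 below, n = 8): ~ 2^46.8
assert abs(log2(raa(5 * 8, 2 * 8**2 + 8 + 1, 8)) - 46.8) < 0.1
```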
3
Auxiliary Lemmas
Definition 4. Given Boolean functions f1, ..., fm from F_2^n to F_2, they are said to be linearly independent over F2 if they are linearly independent as multivariate polynomials, or equivalently if Σ_{i=1}^m ai fi(x) = 0 for all x ∈ F_2^n with a1, ..., am ∈ F2 implies a1 = ··· = am = 0.

Lemma 1. Consider two vector Boolean functions F(x, y) : F_{2^n} × F_{2^n} → F_{2^m} and g : F_{2^n} → F_{2^n}. If F(x, g(x)) has m linearly independent component functions, so does F(x, y) in F2[x1, ..., xn, y1, ..., yn].

Proof. Suppose that F(x, y) = (f1(x, y), ..., fm(x, y)) has m linearly dependent component functions, i.e., there are not-all-zero a1, ..., am ∈ F2 such that Σ_{i=1}^m ai fi(x, y) = 0. Then we have Σ_{i=1}^m ai fi(x, g(x)) = 0, which implies that the fi(x, g(x)) are linearly dependent. This contradicts that F(x, g(x)) has m linearly independent components. Therefore F(x, y) should have m linearly independent component functions.

Lemma 2. Any permutation F : F_{2^n} → F_{2^n} has n linearly independent component functions.
Proof. Suppose that there exist not-all-zero a1, ..., an ∈ F2 such that Σ_{i=1}^n ai fi(x) = 0 for F = (f1, ..., fn). Then the image of F is a subset of the hyperplane given by Σ_{i=1}^n ai fi(x) = 0. Since the hyperplane has dimension less than n, F cannot be a permutation. Therefore, if F is a permutation, its n component functions should be linearly independent.

Lemma 3. Consider a vector Boolean function F : F_{2^n} → F_{2^m}. If the nonlinearity of F is non-zero, F has m linearly independent component functions.

Proof. Suppose that there exist not-all-zero a1, ..., am ∈ F2 such that Σ_{i=1}^m ai fi(x) = 0 for F = (f1, ..., fm). If we take b = (a1, ..., am), we can see that b·F is a zero function and so has zero nonlinearity. Thus the nonlinearity of F, the minimum of the nonlinearities of the component functions, is also zero. Therefore any nonlinear function should have m linearly independent component functions.

For the nonlinearity of S-boxes, we have the following results [5]:

N(x^(2^k+1)) ≥ 2^(n−1) − 2^((n+gcd(n,2k))/2 − 1),          (1)
N(x^3, x^5, ..., x^(2k+1)) ≥ 2^(n−1) − k · 2^(n/2).        (2)

By applying these results to Lemma 3, we obtain the following corollary:

Corollary 1. Let k be a positive integer.
(1) If n does not divide 2k, x^(2^k+1) has n linearly independent component functions.
(2) If k ≤ 2^(n/2−1), F = (x^3, x^5, ..., x^(2k+1)) : F_{2^n} → F_{2^kn} has kn linearly independent component functions.

3.1   Invariants under Transformations
Now we show that linear independence is invariant under invertible transformations of inputs and invertible linear transformations of outputs.

Lemma 4. Let T : F_2^n → F_2^n be an invertible transformation and S : F_2^m → F_2^m an invertible linear transformation. A vector Boolean function F : F_2^n → F_2^m has m linearly independent component functions over F2 if and only if so does S ∘ F ∘ T.

Proof. Since we consider invertible transformations T and S, it suffices to show that F has m linearly independent component functions whenever either F ∘ T or S ∘ F does.

Let F(x) = (f1(x), ..., fm(x)) for x ∈ F_2^n. Assume a1, ..., am ∈ F2 satisfy Σ_{i=1}^m ai fi(x) = 0 for all x ∈ F_2^n. Since T is invertible, we have Σ_{i=1}^m ai fi(T y) = 0 for all y ∈ F_2^n. Since F ∘ T has m linearly independent component functions, we have a1 = ··· = am = 0, which implies the independence of the m component functions of F.
If we let S^(−1) = (pij) for pij ∈ F2 and S ∘ F = (g1, ..., gm), we have fi = Σ_{j=1}^m pij gj. If there are not-all-zero a1, ..., am ∈ F2 satisfying Σ_{i=1}^m ai fi(x) = 0, we have

Σ_{i=1}^m ai Σ_{j=1}^m pij gj(x) = Σ_{j=1}^m ( Σ_{i=1}^m ai pij ) gj(x) = 0.     (3)

Since g1, ..., gm are linearly independent, Σ_{i=1}^m ai pij = 0 for all j. We can see a1 = ··· = am = 0 from the invertibility of S^(−1) = (pij). Hence the m component functions of F should be linearly independent.

Remark that if S is an affine transformation, Lemma 4 does not hold. For example, F : F_2^2 → F_2^3 : (x1, x2) → (x1 + 1, x2 + 1, x1 + x2 + 1) has 3 linearly independent components, but after the affine transformation S : F_2^3 → F_2^3 : (x, y, z) → (x + 1, y + 1, z + 1) is applied to F, S ∘ F = (x1, x2, x1 + x2) is not linearly independent anymore. However, if we consider the constant term as one of the variables, we do have this invariance property. That is, if 1, f1, ..., fm are linearly independent, then S(f1, ..., fm) are linearly independent. Also, if none of the fi have constant terms, the independence property is preserved under an affine transformation S.
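The remark's example can be verified mechanically by comparing GF(2) ranks of truth tables (a sketch; each truth table is packed into an integer bitmask):

```python
from itertools import product

def rank_gf2(rows):
    # rank over GF(2) of bitmask row vectors, by Gaussian elimination
    rows, rank = [r for r in rows if r], 0
    while rows:
        pivot = rows.pop()
        rank += 1
        top = pivot.bit_length() - 1
        rows = [r ^ pivot if (r >> top) & 1 else r for r in rows]
        rows = [r for r in rows if r]
    return rank

def truth_table(f):
    tt = 0
    for idx, (x1, x2) in enumerate(product((0, 1), repeat=2)):
        tt |= f(x1, x2) << idx
    return tt

F = [lambda x1, x2: x1 ^ 1,            # F = (x1+1, x2+1, x1+x2+1)
     lambda x1, x2: x2 ^ 1,
     lambda x1, x2: x1 ^ x2 ^ 1]
SF = [lambda x1, x2: x1,               # S o F = (x1, x2, x1+x2)
      lambda x1, x2: x2,
      lambda x1, x2: x1 ^ x2]

assert rank_gf2([truth_table(f) for f in F]) == 3    # independent
assert rank_gf2([truth_table(g) for g in SF]) == 2   # affine S breaks it
```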
4
Independent Equations
From now on, we consider a polynomial over a finite field. If we fix a basis, this polynomial can be regarded as multivariate equations. When no confusion arises, we will treat a polynomial as multivariate equations without specifying a basis. Because equations of degree higher than two do not help from the viewpoint of algebraic attacks on S-boxes, our purpose is to get as many linearly independent equations of degree at most two as possible. When we are given m quadratic equations from F(x) = 0, we can consider the following methods to get more quadratic equations:

1. Multiplication by linear or quadratic equations.
2. Composition with quadratic equations.

Note that composition of a monomial with affine equations gives only dependent equations, and composition with equations of higher degree usually gives equations of higher degree. The first case is restricted by the following lemma.

Lemma 5. Suppose that n > 2 and k ≥ 1. Assume that the Hamming weight of d is at most 2. The product x^m of the two monomials x^(2^k+1) and x^d is linear or quadratic only in the following cases:

1. If d = 1, then m = 4 if k = 1 (linear), and m = 2^k + 2 if k ≠ 1 (quadratic).
2. If d = 2^k, then m = 1 + 2^(k+1). (Quadratic)
3. If d = 3, then m = 2^3 if k = 2 (linear), and m = 2^k + 2^2 if k ≠ 2 (quadratic).
4. If d = 2^k + 1, then m = 2^(k+1) + 2. (Quadratic)
5. If d = 2^(k+1) + 2^k, then m = 2^(k+2) + 1. (Quadratic)
Proof. It is sufficient to check the Hamming weight of m = 2^k + 1 + d mod (2^n − 1), since x^(2^n−1) = 1. Assume that w(d) = 1, i.e., d = 2^l for some l < n. Then m becomes 1 + 2^k + 2^l < 2^n − 1. Unless two of {0, k, l} are equal, x^m is cubic. This covers the first two cases of the lemma. Assume that w(d) = 2, i.e., d = 2^l + 2^s for some l < s < n. Then m becomes 1 + 2^k + 2^l + 2^s < 2^n − 1. If all of {0, k, l, s} are distinct, then x^m is quartic. Hence at least two of them are equal, in particular l = 0 or l = k, since 0 < k. If l = 0 then s should be 1 or k (cases 3 and 4). If l = k then s should be k + 1 (case 5). This completes the proof.
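Lemma 5 can be cross-checked by brute force for concrete parameters; for example, with n = 11 and k = 3 the qualifying exponents d are exactly those of the five cases (1, 3, 2^k = 8, 2^k + 1 = 9, and 2^(k+1) + 2^k = 24):

```python
def weight(x):
    return bin(x).count("1")

def quadratic_products(n, k):
    # all d with 1 <= weight(d) <= 2 such that x^(2^k+1) * x^d collapses
    # to a linear or quadratic monomial x^m, i.e. weight(m) <= 2 for
    # m = 2^k + 1 + d mod (2^n - 1)
    out = []
    for d in range(1, 2**n - 1):
        if weight(d) <= 2 and weight((2**k + 1 + d) % (2**n - 1)) <= 2:
            out.append(d)
    return out

# n = 11, k = 3: exactly the five cases of Lemma 5
assert quadratic_products(11, 3) == [1, 3, 8, 9, 24]
```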
4.1   Inverse Exponents
First we count the number of linearly independent equations from xy − 1 = 0. A composition of xy − 1 = 0 with any quadratic equation gives an equation of degree larger than two. In order to get further quadratic equations, we must multiply by linear or quadratic equations:

1. The original equation: F(x, y) = xy − 1
2. Multiplied by x: G0(x, y) = x^2 y − x
3. Multiplied by y: H0(x, y) = x y^2 − y
4. Multiplied by x^3: G1(x, y) = x^4 y − x^3
5. Multiplied by y^3: H1(x, y) = x y^4 − y^3
First, we must show that each of these equations has n linearly independent component functions. Using Lemma 1 and Lemma 2, we can easily see that F(x, y) has n linearly independent component functions, since F(x, y) = xy − 1 is a permutation in x for any nonzero y. Each component of G0 and H0 has a unique variable xi and yi respectively, hence they are linearly independent. Both G1 and H1 have n linearly independent components by Lemma 1, Lemma 4, and Corollary 1, using the following equations:

G1(x, a x^(2^n−2)) = (a − 1) x^3
H1(a y^(2^n−2), y) = (a − 1) y^3

since multiplication by any non-zero (a − 1) is an invertible linear transformation.

In order to show that all components produced by the above polynomials are linearly independent, it is better to look at the matrix form. Each row corresponds to the equations from G = (G0, G1), H = (H0, H1), and F:

( M1  0   M2  0  )   ( xi xj )
( 0   M3  M4  0  ) · ( yi yj ) = 0,
( 0   0   M5  M6 )   ( xi yj )
                     (   1   )
Table 1. The type and the number of distinct monomials

Eq.   Type                   #
F     xi yj, 1               n^2 + 1
G0    xi yj, xi              n^2 + n
H0    xi yj, yi              n^2 + n
G1    xi yj, xi xj, xi       3n(n+1)/2
H1    xi yj, yi yj, yi       3n(n+1)/2
where each Mi represents a nonzero matrix and each monomial in the column vector represents all monomials of similar forms (For example, xi xj represents all xi xj for 1 ≤ i, j ≤ n.). It is sufficient to show that the rank of the coefficient matrix is 5n. If we consider the coefficient matrix as a 3 × 3 block matrix, we can see that the rank is the sum of the ranks of M1 , M3 , and (M5 M6 ). Since F has n linearly independent components, we know that the rank of (M5 M6 ) is n. Lemma 6. Each of the ranks of M1 and M3 is 2n. Proof. We refine the monomials xi xj for 1 ≤ i, j ≤ n as xi and xi xk for 1 ≤ i < k ≤ n. Then M1 (xi xj ) is expressed as the following:
A B xi M1 (xi xj ) = . C D xi xk Since (A B) represents the term x in G0 , A is the identity matrix of size n and B = 0. Since (C D) represents the term −x3 in G1 , we can write −x3 = C(xi ) + D(xi xk ). Since C(xi ) is a linear function over Fn2 , the nonlinearity of D(xi xk ) is equal to that of x3 . Therefore D(xi xk ) has n linearly independent components by Lemma 3, hence the rank of D is n. This implies that the rank of M1 is 2n. We can show that the rank of M3 is also 2n by the similar argument. Now we are ready to measure the resistance of S-boxes with inverse exponents by Γ value. The type and the number of distinct monomials in the equations from F , G, and H is as the following table. From Table 1, we have the following theorem. Theorem 1. Consider xy = 1 in F2n . Let t be the number of monomials and r the number of linearly independent equations. Then we can have the following parameters (r, t, Γ ) for xy = 1:
1. (r, t, Γ) = ( n, n^2 + 1, ((n^2−n+1)/n)^((n^2−n+1)/n) ) for F
2. ( 2n, n^2 + n + 1, ((n^2−n+1)/n)^((n^2−n+1)/n) ) for F and {G0 or H0}
3. ( 3n, n^2 + 2n + 1, ((n^2−n+1)/n)^((n^2−n+1)/n) ) for F, G0, and H0
4. ( 4n, (3n+2)(n+1)/2, ((3n^2−3n+2)/(2n))^((3n^2−3n+2)/(2n)) ) for F, G0, H0 and {G1 or H1}
5. ( 5n, 2n^2 + n + 1, ((2n^2−4n+1)/n)^((2n^2−4n+1)/n) ) for all 5 polynomials
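Theorem 1's independence claim can be checked by brute force over a small field. The sketch below works in F_{2^4} (defined by x^4 + x + 1, my choice of modulus), builds the truth table of every bit of F, G0, H0, G1, H1, and computes the GF(2) rank, which should be 5n = 20:

```python
from itertools import product

N, POLY = 4, 0b10011                       # F_{2^4} via x^4 + x + 1

def gf_mul(a, b):
    # carry-less multiplication with on-the-fly reduction mod POLY
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N):
            a ^= POLY
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

EQS = [lambda x, y: gf_mul(x, y) ^ 1,                          # F  = xy - 1
       lambda x, y: gf_mul(gf_pow(x, 2), y) ^ x,               # G0 = x^2 y - x
       lambda x, y: gf_mul(x, gf_pow(y, 2)) ^ y,               # H0 = x y^2 - y
       lambda x, y: gf_mul(gf_pow(x, 4), y) ^ gf_pow(x, 3),    # G1 = x^4 y - x^3
       lambda x, y: gf_mul(x, gf_pow(y, 4)) ^ gf_pow(y, 3)]    # H1 = x y^4 - y^3

def rank_gf2(rows):
    rows, rank = [r for r in rows if r], 0
    while rows:
        pivot = rows.pop()
        rank += 1
        top = pivot.bit_length() - 1
        rows = [r ^ pivot if (r >> top) & 1 else r for r in rows]
        rows = [r for r in rows if r]
    return rank

rows = []
for eq in EQS:
    for bit in range(N):
        tt = 0
        for idx, (x, y) in enumerate(product(range(2**N), repeat=2)):
            tt |= ((eq(x, y) >> bit) & 1) << idx
        rows.append(tt)

assert rank_gf2(rows) == 5 * N             # 5n independent component equations
```

Since the ANF of a Boolean function is determined by its truth table, independence of the truth tables is exactly independence "as multivariate equations" in the sense of Definition 4.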
4.2   Gold Exponents
When gcd(k, n) = 1, 2^k + 1 is called a Gold exponent [7]. Note that any quadratic monomial can be changed into a monomial with a Gold exponent by an affine transformation. By multiplying monomials, we obtain:

1. The original equation: F1(x, y) = x^(2^k+1) − y
2. Multiplied by linear equations: F2(x, y) = x^(2^k+2) − xy and F3(x, y) = x^(2^(k+1)+1) − x^(2^k) y
3. Multiplied by x^d1 y^d2: F4(x, y) = x^4 y − x y^2, only for k = 1
4. Composition with x^d: F5(x, y) = x^9 − y^3, only for k = 1.
Since the original equation consists of x^(2^k+1) and y, we should multiply by monomials of type x^d or x^d1 y^d2. In the first case, x^d should be linear, so that we have d = 1 or d = 2^k by Lemma 5. In the second case, x^(2^k+1+d1), y^d2, x^d1, and y^(1+d2) should all be linear, so that (d1, d2) = (1, 1). For the composition case, if d is 2^s, the composition produces only equations dependent on the original equations. Thus the Hamming weight of d should be two. Then m = (2^k + 1)(1 + 2^l) = 1 + 2^l + 2^k + 2^(k+l). Only when l = k = 1 can x^m be quadratic.

F1 has n independent component functions since each component contains a distinct yi. We can see that F2(x, a x^(2^k+1)) = (1 − a) x^(2^k+2) and F3(x, a x^(2^k+1)) = (1 − a) x^(2^(k+1)+1). When k = 1, F2(x, a x^(2^k+1)) = (1 − a) x^4 and F3(x, a x^(2^k+1)) = (1 − a) x^5 are permutations unless n = 2, 4. Also, each of F4(x, a x^3) = (1 − a) x^7 and F5(x, a^(1/3) x^3) = (1 − a) x^9 has n linearly independent components if gcd(n, 3) = 1 and n ≠ 2, 4, respectively. Thus F4 and F5 have n linearly independent components by Lemma 1.

We show that all components produced by the above equations are linearly independent by a matrix argument similar to the inverse exponents case. At first, assume that k = 1. Each row corresponds to the equations from F1, F2, F3, F4 and F5:

( M1   M2   M3   0    0  )   ( xi    )
( M4   0    0    0    M5 )   ( yi    )
( M6   0    M7   0    M8 ) · ( xi xk ) = 0.
( 0    0    0    0    M9 )   ( yi yk )
( M10  M11  M12  M13  0  )   ( xi yj )

Each of M2, M4, M7, M9, and M13 represents −y, x^4, x^5, x^4 y − x y^2, and y^3, respectively. Since all of them have n linearly independent component functions, each of these matrices has rank n. Further, if we consider the coefficient matrix as a 5 × 5 block matrix, we can easily convert it to an upper triangular matrix with diagonal M2, M4, M7, M9, and M13 by elementary row operations. Thus it has rank 5n and all components of the equations are linearly independent.

Next, assume that k > 1. Each row corresponds to the equations from F1, F2, and F3:

( M1   M2   M3   0  )   ( xi    )
( M4   0    M5   M6 ) · ( yi    ) = 0.
( M7   0    M8   M9 )   ( xi xk )
                        ( xi yj )

Since M2 represents −y, it is invertible. Thus it suffices to show that all components of F2 and F3 are linearly independent. Let F(x, y) = (F2(x, y), F3(x, y)). We have F(x, a x^(2^k+1)) = ((1 − a) x^(2^k+2), (1 − a) x^(2^(k+1)+1)). By Corollary 1 we can see that (x^(2^(k−1)+1), x^(2^(k+1)+1)) and (x^(2^(n−k+1)+1), x^(2^(n−k−1)+1)) are nonlinear if k < n/2 − 1 and k > n/2 + 1, respectively. Note that both of them are affine transformations of F(x, a x^(2^k+1)). Thus, unless |k − n/2| ≤ 1, F(x, y) has 2n linearly independent component functions.
Theorem 2. Consider y = x^(2^k+1) with gcd(k, n) = 1 in F_{2^n}. Let t be the number of monomials and r the number of linearly independent equations. Then we can have the following parameters (r, t, Γ):

(1) If k = 1, we can obtain 5 linearly independent polynomials. Thus we get the following:
1. ( n, n(n+3)/2, ((n+1)/2)^((n+1)/2) ) for F1
2. ( 3n, n(3n+1)/2, ((3n−5)/2)^((3n−5)/2) ) for F2, F3, and F4 if n ≠ 2, 4 and gcd(n, 3) = 1
3. ( 4n, 3n(n+1)/2, ((3n−5)/2)^((3n−5)/2) ) for F1, F2, F3, and F4 if n ≠ 2, 4 and gcd(n, 3) = 1
4. ( 5n, n(2n + 1), (2n − 4)^(2n−4) ) for all polynomials if n ≠ 2, 4 and gcd(n, 3) = 1.

(2) Otherwise, we can obtain 3 linearly independent polynomials. Thus we get the following:
1. ( n, n(n+3)/2, ((n+1)/2)^((n+1)/2) ) for F1
2. ( 3n, 3n(n+1)/2, ((3n−3)/2)^((3n−3)/2) ) for F1, F2, and F3 if |k − n/2| > 1.
Table 2. Comparison of RAA for Almost Perfect Nonlinear Functions

Exponent                     Alg. Deg.  # of Eqns  # of Monomials  RAA Γ                            When n = 8
Inverse                      n − 1      3n         n^2 + 2n + 1    ((n^2−n+1)/n)^((n^2−n+1)/n)      Γ = 2^22.7
                                        5n         2n^2 + n + 1    ((2n^2−4n+1)/n)^((2n^2−4n+1)/n)  Γ = 2^46.8
Gold (k = 1)                 2          n          n(n+3)/2        ((n+1)/2)^((n+1)/2)              Γ = 2^10.8
                                        3n         n(3n+1)/2       ((3n−5)/2)^((3n−5)/2)            Γ = 2^32.5
                                        4n         3n(n+1)/2       ((3n−5)/2)^((3n−5)/2)            Γ = 2^32.5
                                        5n         n(2n + 1)       (2n − 4)^(2n−4)                  Γ = 2^43.0
Gold (k > 1, |k − n/2| > 1)  2          n          n(n+3)/2        ((n+1)/2)^((n+1)/2)              Γ = 2^10.8
                                        3n         3n(n+1)/2       ((3n−3)/2)^((3n−3)/2)            Γ = 2^37.3
Kasami                       k + 1      n          n^2 + n         n^n                              Γ = 2^24
4.3   Kasami Exponents
When gcd(n, k) = 1 and k > 1, 2^(2k) − 2^k + 1 is called a Kasami exponent [8]. A Kasami exponent has Hamming weight k + 1, but by composing with the power 2^k + 1 we obtain a quadratic equation F1 : y^(2^k+1) − x^(2^(3k)+1).

By multiplying x^d1 y^d2 to F1, we have x^d1 y^(d2+1+2^k) − x^(d1+1+2^(3k)) y^d2. Hence all of x^d1, y^d2, x^(d1+1+2^(3k)), and y^(d2+1+2^k) should be linear monomials. This contradicts Lemma 5. Thus F1 is the only quadratic equation we can obtain. F1 has monomials of the type xi xj and yi yj. The number of monomials is n^2 + n.

Theorem 3.¹ Consider y = x^(2^(2k)−2^k+1) with gcd(k, n) = 1 in F_{2^n}. We can obtain n linearly independent equations in n^2 + n monomials. Then the RAA is Γ = n^n.
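Both facts used here — the Hamming weight of a Kasami exponent and its collapse under composition with the power 2^k + 1 — are exact integer identities, so they can be checked directly (and then hold a fortiori modulo 2^n − 1):

```python
def weight(x):
    return bin(x).count("1")

for k in range(2, 9):
    K = 2**(2 * k) - 2**k + 1          # Kasami exponent
    assert weight(K) == k + 1          # Hamming weight k + 1
    # composing y = x^K with the power 2^k + 1 collapses the exponent:
    assert K * (2**k + 1) == 2**(3 * k) + 1
```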
4.4   Comparison
Table 2 shows the comparison of the resistance against algebraic attacks. Surprisingly, more equations give a larger RAA for each exponent. This is because the RAA increases as t − r increases, and additional equations require more new variables than new equations. From the table, we can see that the power functions with Kasami exponents have slightly better resistance against algebraic attacks, and the power functions with Gold exponents have very weak resistance against algebraic attacks.

¹ By substituting x = z^(2^k+1), we can obtain two independent quadratic equations x = z^(2^k+1) and y = z^(2^(3k)+1) with n(n + 5)/2 variables, which reduces the RAA significantly. This will be introduced in the full version of this paper [3].
5
Conclusion
In this paper, we developed several tools to prove linear independence of multivariate equations from algebraic S-boxes. By applying these tools to APN power functions, we learned that a power function with a Gold exponent is very weak against algebraic attacks, and a power function with a Kasami exponent has slightly stronger resistance against algebraic attacks. An open problem is to find S-boxes with Γ > 2^32, as indicated in [4]. Also, it is an interesting topic to apply algebraic attacks to block ciphers using a power function with a Gold exponent, such as MISTY, which was selected as a standard block algorithm by NESSIE [12].
Acknowledgement We are thankful to Hyun Soo Nam and Dae Sung Kwon for helpful discussions.
References

[1] E. Biham and A. Shamir, "Differential Cryptanalysis of DES-like Cryptosystems," Journal of Cryptology, vol. 4, pp. 3–72, 1991.
[2] E. Biham and A. Shamir, Differential Cryptanalysis of the Data Encryption Standard, Springer-Verlag, 1993.
[3] J. Cheon and D. Lee, "Almost Perfect Nonlinear Power Functions and Algebraic Attacks," Manuscript, 2004.
[4] N. Courtois and J. Pieprzyk, "Cryptanalysis of Block Ciphers with Overdefined Systems of Equations," Proc. of Asiacrypt 2002, LNCS 2501, Springer-Verlag, pp. 267–287, 2002.
[5] J. Cheon, S. Chee and C. Park, "S-boxes with Controllable Nonlinearity," Advances in Cryptology - Eurocrypt'99, Springer-Verlag, pp. 286–294, 1999.
[6] H. Dobbertin, "Almost Perfect Nonlinear Power Functions on GF(2^n): The Welch Case," IEEE Trans. Inform. Theory, Vol. 45, No. 4, pp. 1271–1275, 1999.
[7] R. Gold, "Maximal Recursive Sequences with 3-valued Recursive Cross-correlation Functions," IEEE Trans. Inform. Theory, vol. IT-14, pp. 154–156, 1968.
[8] T. Kasami, "The Weight Enumerators for Several Classes of Subcodes of the Second Order Binary Reed-Muller Codes," Infor. Contr., Vol. 18, pp. 369–394, 1971.
[9] M. Matsui, "Linear Cryptanalysis Method for DES Cipher," Advances in Cryptology - Eurocrypt'93, Springer-Verlag, pp. 386–397, 1993.
[10] M. Matsui, "New Block Encryption Algorithm MISTY," Proc. of FSE'97, LNCS 1267, Springer-Verlag, pp. 54–68, 1997.
[11] Advanced Encryption Standard. http://csrc.nist.gov/CryptoToolkit/aes/.
[12] New European Schemes for Signatures, Integrity, and Encryption. https://www.cosic.esat.kuleuven.ac.be/nessie/.
Differential Attacks against the Helix Stream Cipher

Frédéric Muller

DCSSI Crypto Lab
51 boulevard de la Tour-Maubourg, 75700 Paris - 07 SP, France
[email protected]
Abstract. In this paper, we analyze the security of the stream cipher Helix, recently proposed at FSE'03. Helix is a high-speed asynchronous stream cipher with a built-in MAC functionality. We analyze the differential properties of its keystream generator and describe two new attacks. The first attack requires 2^88 basic operations and processes only 2^12 words of chosen plaintext in order to recover the secret key for any length up to 256 bits. However, it assumes the attacker can force nonces to be used twice. Our second attack relies on weaker assumptions. It is a distinguishing attack that detects internal state collisions after 2^114 words of chosen plaintext.
1
Introduction
A stream cipher is a secret key cryptosystem that transforms a short random secret key K into a long pseudo-random sequence, also called keystream, which is XORed with the plaintext to produce the ciphertext. Although it is possible to obtain a similar primitive with a block cipher in a "pseudo-random number generator" mode (like OFB or CFB [6]), this is generally not considered to offer optimal speed performance. To meet such efficiency requirements, fast stream ciphers prove useful in real-life applications, especially those involving live data transmission. Many recent stream cipher proposals have been made in that direction, including SEAL [16], SNOW [2], Scream [10] and Sober-t32 [11]. However, the security of stream ciphers is still an issue (see [1, 3, 7]), especially when compared to the level of confidence in block cipher security. For instance, all stream cipher candidates for the NESSIE project [14] revealed various degrees of weakness allowing at least distinguishing attacks faster than exhaustive search, while no second-round block cipher was successfully attacked. As a consequence, NESSIE did not select any stream cipher for its final portfolio.

Thus the actual challenge is to design fast stream ciphers and provide better confidence in their security level. Several new ciphers aim at reaching these expectations. Helix was recently proposed at FSE'03 [5]. It is an asynchronous stream cipher based on a fast keystream generator. Its advantage over other new ciphers is to offer both confidentiality and integrity. Indeed, after encryption, Helix can

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 94–108, 2004.
© International Association for Cryptologic Research 2004
Differential Attacks against the Helix Stream Cipher
produce a tag that guarantees the integrity of the message for very little additional computation and without requiring a second pass. This functionality is very useful in many applications where encryption and authentication must function together on streaming data. Recently, several block cipher modes of operation also providing integrity "almost for free" (see [9, 12, 15]) have been proposed, but some of them appear to be patented, which is reportedly not the case for Helix. Moreover, the analysis of Helix is an interesting topic, since new mechanisms that will be included in the new 802.11i standard for wireless networks are apparently fairly close to Helix [4, 18]. The new standard will have to repair some cryptologic flaws from the previous 802.11b standard, which resulted from weaknesses in the RC4 key scheduling and from an improper use of initialization vectors [8]. In this paper, we analyze the security of Helix against chosen plaintext and chosen nonce attacks. We present two attacks which are both faster than exhaustive search. Our first attack recovers the secret key (for any length up to 256 bits) with time complexity of 2^88 basic operations, using 2^12 words of chosen plaintext. It assumes an attacker can force encryption of several messages using the same pair (key, nonce). Our second attack is based on internal state collisions and distinguishes Helix from random with data complexity of 2^114 blocks. This attack uses chosen nonces and chosen plaintext but never re-uses a pair (key, nonce). Our paper is organized as follows: first, we briefly describe Helix. Then, in Section 3, we show two weaknesses of the cipher which are further developed in Section 4. In Section 5, we describe two attacks based on the previous observations.
2 Description of Helix
Helix offers two main features: encryption of a plain message and production of a Message Authentication Code (MAC) to ensure integrity. Several modes of operation for Helix are proposed by its authors - encryption only, MAC only, PRNG, . . . Here, we describe briefly the mechanisms of Helix that are important in our attacks. More details about this design can be obtained in [5]. We mostly handle 32-bit values that we denote as words. Besides, ⊕ denotes bitwise addition (XOR) on these values and + addition modulo 2^32. ROTLn(x) is the circular rotation of the word x by n bits to the left. We also use the notations LSB and MSB to refer to the least and most significant bit of a word.

2.1 General Structure of the Cipher
Helix is an asynchronous stream cipher, based on an iterated block function applied to an internal state of 160 bits. The input consists of a secret key K of varying length, up to 256 bits, and a nonce N of 128 bits. The internal state before encryption of the i-th word of plaintext is represented as 5 words

  (Z0^(i), . . . , Z4^(i))
Frédéric Muller
Fig. 1. The general structure of Helix
which are initialized for i = 0 using K and N. Details of this initialization mechanism are irrelevant here. The general structure of the encryption algorithm is described in Figure 1. It basically uses a block function F to update the internal state in function of the plaintext P, the key K and the nonce N. More precisely, during the i-th round, the internal state is updated with F, using the i-th word of plaintext Pi and two words derived from K, N and i, denoted as Xi,0 and Xi,1. We refer to them as the "round key words". Hence,

  (Z0^(i+1), . . . , Z4^(i+1)) = F(Z0^(i), . . . , Z4^(i), Pi, Xi,0, Xi,1)

The i-th keystream word, also denoted as Si, is equal to Z0^(i). It is added to Pi to produce the i-th ciphertext word Ci. Thus,

  Si = Z0^(i)
  Ci = Si ⊕ Pi

This process is repeated until all words of the plaintext have been encrypted. Finally, a last step (described in [5]) can generate a tag of 128 bits that constitutes the MAC. More details on this general framework are given in the following sections.

2.2 The Block Function
The round function F of Helix mixes three types of basic operations on words: bitwise addition represented as ⊕, addition modulo 2^32 represented as +, and cyclic shifts represented as <<<. F relies on two consecutive applications of
[Figure 2, omitted: the half-round function G maps five input words 0-4 to five output words, mixing in the auxiliary inputs A and B; its first column of cyclic rotations uses the amounts 15, 25, 9, 10, 17 and its second column the amounts 30, 13, 20, 11, 5.]
Fig. 2. The half-round “helix” function G
a single "helix" function, which constitutes half of the round function. This "helix" function is denoted as G and is represented in Figure 2. G uses two auxiliary inputs (A, B). In the first half of the round function, (A, B) = (0, Xi,0) and in the second half, (A, B) = (Pi, Xi,1). Thus, the block function can be described by the following relations

  (Y0^(i), . . . , Y4^(i)) = G(Z0^(i), . . . , Z4^(i), 0, Xi,0)
  (Z0^(i+1), . . . , Z4^(i+1)) = G(Y0^(i), . . . , Y4^(i), Pi, Xi,1)

where (Y0^(i), . . . , Y4^(i)) is the internal state in the middle of the computation.

2.3 Role of K and N
To protect the cipher against related-key attacks, a first step is applied that computes a working key K from the actual secret key U. Independently of its length l(U), K is always 256 bits long and is used in all subsequent operations instead of U. The derivation of K is based on 8 rounds of a Feistel network. The result is also represented as 8 words: K0, . . . , K7. Besides, Helix uses a nonce N to obtain different keystream sequences with the same secret key. N is always 128 bits long and is generally represented as 4 words: N0, . . . , N3. An expansion phase turns it into a 256-bit value by creating 4
additional words N4, . . . , N7, defined as N_{k+4} := (k mod 4) − N_k for k = 0, . . . , 3. During the i-th round of encryption, the round key words Xi,0 and Xi,1 are computed as

  Xi,0 := K_{i mod 8}
  Xi,1 := K_{(i+4) mod 8} + N_{i mod 8} + X'_i + i + 8

where

  X'_i := ⌊(i + 8)/2^31⌋  if i mod 4 = 3
          4 · l(U)        if i mod 4 = 1
          0               otherwise

These values depend only on i, K and N_{i mod 4}. Besides, it is straightforward to reconstruct the secret key from these values for 4 consecutive rounds when the nonce is known.
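The derivation above can be sketched in a few lines of Python (a sketch based solely on the formulas of this section, not on the reference code of [5]; here l_U stands for l(U), and all arithmetic wraps modulo 2^32):

```python
MASK = 0xFFFFFFFF  # all word arithmetic is modulo 2^32

def expand_nonce(n):
    """Expand the nonce words N0..N3 into N0..N7 via N_{k+4} = (k mod 4) - N_k."""
    return list(n) + [((k % 4) - n[k]) & MASK for k in range(4)]

def round_key_words(K, N8, i, l_U):
    """Round key words (X_{i,0}, X_{i,1}) for round i.

    K   : working key K0..K7 (8 words)
    N8  : expanded nonce N0..N7 (8 words)
    l_U : the length l(U) of the original key U
    """
    if i % 4 == 3:
        x_i = (i + 8) >> 31          # floor((i + 8) / 2^31)
    elif i % 4 == 1:
        x_i = 4 * l_U
    else:
        x_i = 0
    x0 = K[i % 8]
    x1 = (K[(i + 4) % 8] + N8[i % 8] + x_i + i + 8) & MASK
    return x0, x1
```

Note that Xi,1 involves only N_{i mod 8}, and each expanded word N4..N7 is derived from a single word among N0..N3; this is the dependency on one nonce word per round that Section 3.2 exploits.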
3 Some Weaknesses of Helix
In this section, we describe two weaknesses of the block function. They respectively concern the role of the plaintext words and the nonce words at each round.

3.1 Influence of Each Plaintext Word
Since Helix requires a plaintext-dependent keystream, it is reasonable to analyze the round function assuming an attacker can control the plaintext introduced. In general, an attacker should not be able to recover any information about the secret key or the internal state of the cipher by observing the keystream corresponding to chosen plaintext. Using the notations of Section 2, Pi denotes the i-th word of plaintext. It is introduced inside the Helix internal state at the i-th advance. Then, at the beginning of the (i + 1)-th advance, a new keystream word Si+1 is produced. From the description of Helix, one sees that Pi is introduced only in the second half of the block function (as the input A of Figure 2). It is XORed to Y3^(i), then added to Y0^(i). The result is then modified only once before the end of the round - excepting cyclic shifts - through a XOR with some intermediate value (referred to as a). However, it is easy to verify that a is actually independent of Pi. Thus Si+1 can be computed as

  Si+1 = Z0^(i+1) = ROTL20(a ⊕ ROTL9(Y0^(i) + (Y3^(i) ⊕ Pi)))

If the plaintext word P'i = Pi ⊕ ∆ was introduced instead of Pi, then the next keystream word would be S'i+1, such that

  δ = Si+1 ⊕ S'i+1 = ROTL29((x + (y ⊕ Pi)) ⊕ (x + (y ⊕ Pi ⊕ ∆)))
where x and y respectively denote the intermediate words Y0^(i) and Y3^(i). Suppose that Pi = 0; then, for any difference ∆ on the plaintext,

  ∆' = ROTL3(δ) = (x + y) ⊕ (x + (y ⊕ ∆))    (1)
is the corresponding difference on the keystream. In Section 4, we will discuss how an attacker can take advantage of this differential property.

3.2 Influence of Each Nonce Word
Similar differential properties hold regarding each nonce word. Indeed, the nonce N serves two purposes in Helix:
– Fill the initial 160 bits of internal state.
– Derive the two words Xi,0 and Xi,1 introduced at round i.
Concerning this second task, it appears from Section 2.3 that the two "key words" introduced at each round do not depend on the full nonce. Actually, the round key words at round i depend only on N_{i mod 4}. Therefore, if we consider two distinct nonces N and N' where only one word changes, the round function will essentially apply the same mapping on the internal state, for 3 rounds out of 4. This property has consequences on the propagation of state collisions. Moreover, if only one nonce word Ni is modified to Ni + ∆ then, for rounds j such that j mod 4 ≠ i, both round key words remain unchanged. For the other positions, Xj,1 is changed to (Xj,1 ± ∆) while Xj,0 is unchanged. Since Xj,1 is introduced at the very end of the block function, we have a differential property, like in Section 3.1. When all other inputs are unchanged, the difference on the keystream words resulting from this difference ∆ on the nonce word Ni is

  ∆' = a ⊕ (a ± ∆)    (2)

for some unknown internal value a (see Figure 2).
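Both (1) and (2) become deterministic when the difference sits in the most significant bit, since no carry propagates out of the MSB in addition modulo 2^32. This can be checked quickly at random (an illustrative sketch of ours, not code from [5]):

```python
import random

MASK = 0xFFFFFFFF
MSB = 0x80000000  # the difference 10...0 used in Sections 4.2 and 5

random.seed(2004)
for _ in range(10000):
    x = random.getrandbits(32)
    y = random.getrandbits(32)
    # Relation (1) with Delta = MSB: the XOR difference survives the addition.
    assert ((x + y) & MASK) ^ ((x + (y ^ MSB)) & MASK) == MSB
    # Relation (2) with Delta = MSB: adding or subtracting 2^31 flips only the MSB.
    assert x ^ ((x + MSB) & MASK) == MSB
    assert x ^ ((x - MSB) & MASK) == MSB
print("Delta' = Delta always holds for the MSB difference")
```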
4 Differential Properties of Addition Modulo 2^32
We have seen that differential patterns on the plaintext or the nonce propagate to simple differential patterns on the keystream. More precisely, the differential property on the plaintext is related to a general problem concerning linear approximations of addition modulo 2^32 that can be summarized by relation (1). In this section, we will describe various ways to take advantage of this observation.

4.1 Related Problems
A well known problem (see [13]) is, given two fixed words x and y, to find a pair (∆, ∆') such that

  ∆' = (x + y) ⊕ (x + (y ⊕ ∆))    (3)
and such that ∆' is observed with high probability. This problem has been studied from a theoretical point of view in [17]. However, in the present situation, we are looking at things the other way around, since x and y are unknown to us but we might be able to choose ∆ and observe ∆'. More precisely, we want to
1. find statistical properties that can be easily detected in order to distinguish Helix from a random source.
2. recover some secret information about the internal state of Helix (the values of x and y for instance).

4.2 A "Dummy" Distinguisher
Suppose an attacker encrypts two messages that begin similarly but, at some point, differ on one word by

  ∆ = 0x80000000

Then, the difference on the next keystream word (called ∆') is such that ∆' = ∆, since there is no propagation from MSBs to LSBs during an addition. Using this relation, the block function of Helix can be distinguished from a random source with two chosen messages, but this requires using the same key and the same nonce twice. This attack scenario is discussed in Section 5. In the next section, we go further by trying to actually recover the two internal values x and y using relation (3).

4.3 Recovering x and y
In this section, we are interested in recovering the two intermediate values x and y involved in relation (3). Thus, we have to consider the following problem

Problem 1. Let x and y be two given constants of 32 bits. For any ∆,

  ∆' = (x + y) ⊕ (x + (y ⊕ ∆))    (4)

is given. How many (x, y) are possible solutions? Give an efficient algorithm to recover these solutions.

First, it is easy to see that the solution is not always unique. Indeed, if x = 0, then ∆' does not depend on y. However, on average, the number of candidates is small. In this section, we propose an efficient algorithm to recover the two unknown values x and y with a limited number of observations. The following notations are used: wj denotes the j-th bit of a word w. Besides, let cj denote the carry bit at position j in the addition of x and y ⊕ ∆. For all j, 0 ≤ j ≤ 31,

  (x + (y ⊕ ∆))_j = x_j ⊕ y_j ⊕ ∆_j ⊕ c_j

and initially c0 = 0. We also suppose that x ≠ 0.
Claim. Let t, 0 ≤ t ≤ 30, denote the position of the least significant bit '1' of x. Then, there are exactly 2^{t+3} valid pairs (x, y), solutions of the previous problem. Recovering these solutions can be done by testing at most 93 chosen values of ∆.

We use the following induction
– Assume all bits of x and y are known up to position (i − 1).
– If any xj = 1 with 0 ≤ j < i, then
  • By choosing an appropriate value of ∆k for j ≤ k < i, it is possible to obtain any value of ci (0 or 1), since everything is known up to position i.
  • In both cases, pick both values of ∆i (0 and 1) and set all other bits of ∆ to 0. The resulting value of ∆'_{i+1} depends only on the carry bit ci+1.
  • Recover xi and yi by comparing the different distributions (see Table 1).
– Otherwise
  • Necessarily, ci = 0.
  • Using Table 1, it is still possible to recover xi.
  • No information on yi is obtained.

Therefore, by induction, all bits of x can be recovered from position 0 to 30 (it is impossible to recover x31 because no observation can be made about position 32 of ∆'). Similarly, all bits of y from position (t + 1) to 30 can be recovered. The other t + 3 bits of x and y need to be guessed. When x = 0, our analysis remains valid by taking t = 30. In fact, 3 queries are enough to distinguish the distributions in Table 1. Thus, at most 3 × 31 = 93 queries are sufficient to recover a valid solution (x, y). Besides, it is easy to verify that flipping the bit yt implies flipping all bits xj and yj for t < j < 31 in order to obtain another valid solution, since all carry bits also get flipped. Therefore all solutions of the system can be expressed directly from a single solution, without any extra query. We performed some experiments using various values of x and y and always identified with success the expected number of 2^{t+3} solutions.

Table 1. Distribution of ∆'_{i+1} depending on xi and yi

  xi yi ci ∆i | ∆'_{i+1}     xi yi ci ∆i | ∆'_{i+1}
  1  1  0  0 | δ             0  1  0  0 | δ
  1  1  0  1 | δ⊕1           0  1  0  1 | δ
  1  1  1  0 | δ             0  1  1  0 | δ⊕1
  1  1  1  1 | δ             0  1  1  1 | δ
  1  0  0  0 | δ             0  0  0  0 | δ
  1  0  0  1 | δ⊕1           0  0  0  1 | δ
  1  0  1  0 | δ⊕1           0  0  1  0 | δ
  1  0  1  1 | δ⊕1           0  0  1  1 | δ⊕1
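The claim can also be tested by brute force on a reduced word size. The sketch below (our code, not the paper's, using 6-bit words instead of 32-bit ones) fixes (x, y), records the full map ∆ → ∆', and counts how many pairs produce an identical map; the count matches 2^(t+3):

```python
W = 6                      # reduced word size (the paper uses W = 32)
MASK = (1 << W) - 1

def ddiff(x, y, d):
    """Output difference of relation (4) for input difference d."""
    return ((x + y) & MASK) ^ ((x + (y ^ d)) & MASK)

def count_solutions(x, y):
    """Count pairs (x', y') giving the same map d -> ddiff(x, y, d) for all d."""
    target = [ddiff(x, y, d) for d in range(1 << W)]
    return sum(1
               for x2 in range(1 << W)
               for y2 in range(1 << W)
               if all(ddiff(x2, y2, d) == t for d, t in enumerate(target)))

for x, y in [(0b000001, 0b101101), (0b000100, 0b011011)]:
    t = (x & -x).bit_length() - 1   # position of the least significant '1' of x
    assert count_solutions(x, y) == 1 << (t + 3)
```

For W = 6 the unrecoverable positions are x5, y5 and y0, . . . , yt, mirroring the 32-bit case where x31, y31 and y0, . . . , yt remain free.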
5 Attacks against Helix
In this section, two attacks against Helix are developed. The first one is a distinguishing attack using chosen plaintext, which is extended to a key recovery attack requiring 2^88 basic operations and about 2^12 block encryptions. A second attack takes advantage of choosing similar nonces to detect internal state collisions.

5.1 A Distinguishing Attack
In Section 3.1, we have shown that the introduction of a chosen difference on the plaintext from a fixed internal state results in predictable patterns on the keystream. However, to turn these observations into an attack, it is necessary to consider the following scenario
– The attacker requests encryption of some random message P = (P1, . . . , Pn) under some pair (key, nonce) = (K, N). The resulting ciphertext is C = (C1, . . . , Cn).
– He requests encryption with (K, N) of another message where Pn−1 is replaced by P'n−1 = Pn−1 ⊕ ∆. This yields the ciphertext C' = (C'1, . . . , C'n).
– The attacker observes ∆' = Cn ⊕ C'n.
In this case, we have seen that a real Helix output can be distinguished from a random output by picking ∆ = 0x80000000 (then, necessarily, ∆' = ∆).

5.2 A Simple Key Recovery Attack
Now, we wish to extend the observations of Section 4.3. This technique allowed an attacker to retrieve up to 64 bits of intermediate values by observing the keystream corresponding to well chosen plaintexts. Actually, this information leakage is an important weakness, since it reduces the entropy of the internal state. Using an appropriate guessing technique, one may hope to turn it into a key recovery attack. Such an attack is generally called a guess-then-determine attack, since an attacker will first guess some internal state bits and then determine the correct guess using available information. First, let us consider round number i of the Helix encryption. We suppose an attacker has access to the keystream word Z0^(i) and to a few candidates for Y0^(i) and Y3^(i), as described in Section 4.3. These two intermediate words depend on the internal state at the input of round i, (Z0^(i), . . . , Z4^(i)), and on the first round key word Xi,0. This is represented in Figure 3, where each box is a 32-bit value and dashed boxes represent known values. An attacker may hope to use these conditions to reduce the number of possible internal states to

  2^128 × 2^32 × 2^{−64} = 2^96

Actually, this number can be reached by guessing Z2^(i), Z3^(i) and Xi,0. Then the attacker can retrieve Z1^(i) and Z4^(i) by looking precisely at the function G
Fig. 3. The framework of the simple attack
(see Figure 2). Thus, the attacker can indeed find 2^96 candidates for the internal state at the beginning of round i. To tell which candidate is correct, some of the previous rounds (say τ = 5 rounds) need to be inverted. This can be done without increasing the number of candidates, provided Y0^(i−j) and Y3^(i−j) are known, for 0 ≤ j < τ. For this purpose, the recovery technique of Section 4.3 needs to be applied τ times here. As long as it returns few solutions, an appropriate round inversion reduces the number of candidates - roughly by a factor 2^32. Thus, for τ = 5 we eventually obtain a unique candidate, and enough "round key words" to directly retrieve the complete secret key. To summarize, this simple attack requires guessing 96 bits of internal state and applying τ times the technique described in Section 4.3 to recover intermediate values. However, this technique does not provide a unique solution, which increases the time complexity of the attack. Actually, only round i is in the critical path and, with probability 1/2, the number of solutions here is only 8. In this "good" case, the complexity of the attack is 2^96 × 8 = 2^99 basic instructions. In "bad" cases, there are more than 8 solutions at position i, but the attacker may easily find another position i' where there are only 8 solutions. The data complexity corresponds to the encryption of τ × 93 pairs of messages of length at most τ = 5 words. Thus, the number of plaintext blocks encrypted is

  2 × 5 × 5 × 93 ≈ 2^12

5.3 An Improved Attack
A more subtle guessing technique can be applied using bitwise analysis of the block function. The "subtle" attack consists in guessing only 2 words, Z3^(i) and (Z1^(i) + Z4^(i)), plus the 17 LSBs of Z2^(i). Then, as in the "simple" attack, the attacker can obtain the 17 LSBs of ROTL25(Z4^(i)) and thus the 17 LSBs of ROTL25(Z1^(i)). Looking at the block function of round i − 1, the attacker knows two output
words, and has partial knowledge of the three other output words. Two relations can be written, involving one unknown intermediate word a

  Z3^(i) = ROTL21(Z1^(i)) + a
  Y3^(i−1) = ROTL28(Z1^(i)) ⊕ ROTL21(Z2^(i)) ⊕ ROTL26(Z4^(i)) ⊕ ROTL19(a)

From the first relation, one sees that guessing the 4 LSBs of a will give the attacker a candidate for the 21 LSBs of a (using partial knowledge of Z1^(i)). Then, using the second relation, a condition on bit number 13 of Y3^(i−1) is obtained. This condition eliminates half of the candidates. Then, each additional guessed bit of Z2^(i) provides one extra condition, which is immediately used to discard half of the guesses. This "early abort" technique results in a guessing complexity of

  2^32 × 2^32 × 2^17 × 2^4 = 2^85

The backtracking can be performed here exactly as before to complete the attack. The resulting time complexity is reduced to 8 × 2^85 = 2^88 guesses (each requiring a few boolean operations on 32-bit words). Furthermore, the existence of even better guessing techniques should be investigated.

5.4 Practical Impact
Previously, we have proposed a differential attack on Helix, using chosen plaintext. It requires obtaining twice the same internal state as input of the block function. Thus, the attacker needs to encrypt twice with the same key and the same nonce, and to introduce a difference in the plaintext at some point. However, in [5], it is specified that "the sender must ensure that each (K, N) is used at most once to encrypt a message", otherwise Helix "loses its security properties". According to the authors, this requirement is not restrictive, since it underlies many similar situations in cryptography. For instance, when using a synchronous stream cipher, if secret key and nonce are unchanged, the same pseudo-random sequence is generated twice, which breaks confidentiality. Similar problems may also be encountered when using a block cipher in OFB mode, for instance. In general, a distinguishing attack is always possible when nonces are re-used. We believe the situation is more worrying in the case of Helix, since we obtain key recovery attacks and not only distinguishing attacks. On the one hand, there are situations where the previous scenario is not realistic. Indeed, the secret key may be used to communicate in one direction only. In this case, it is straightforward for the sender never to re-use the same nonce (he can use counters, for instance). Apparently, this is true for wireless networks, where each pair of users has two separate secret keys, one for each direction. A differential attack cannot be applied there, unless the attacker gains physical access to the encryption machine and can force nonce repetition. This may be possible on some particular occasions, but in general it is a strong assumption.
On the other hand, in most situations, our differential attack scenario seems realistic. For instance, several users often need to share a secret key. Even if they split the nonce space properly, what happens if the same message is sent to multiple receivers? An attacker can sit in the middle and modify the ciphertext on one of the communication channels. Then, by comparing a "faulty" decryption with a correct decryption, he may obtain the kind of differential information he needs. To conclude, we think the security impact of our attacks will highly depend on the context, but in general, one should expect the block function of Helix to resist differential attacks better. Overall, the secrecy of the key cannot reasonably rely on the absence of nonce repetition.

5.5 A Chosen Nonce Attack
A weakness regarding the influence of each nonce word was identified in Section 3.2. Here, we propose an extension to a distinguishing attack against Helix. Its complexity is much bigger than that of the previous attack. However, it has the advantage of being based on weaker assumptions. Indeed, in this case, the attacker does not need to encrypt several messages with the same pair (key, nonce). Instead, we suppose that the same plaintext P is encrypted twice with the same secret key, but with two distinct nonces N and N' such that

  N = (N0, N1, N2, N3)
  N' = (N0, N1, N2, N3 + ∆)

Then, as argued in Section 3.2, the block function is essentially the same for any round i such that i mod 4 ≠ 3. If a state collision occurs on the input of such a round, it will also propagate to a state collision for the input of the next round. Thus state collisions on inputs of rounds i such that i mod 4 = 0 imply collisions on 4 consecutive blocks of keystream. Moreover, the difference on the 5-th block can be predicted exactly (by picking ∆ = 0x80000000 for instance). Thus, we obtain a detectable condition on 160 bits of keystream. This is sufficient to detect state collisions with good probability. Therefore, contrary to what is claimed in [5], state collisions in Helix can be detected. However, the length of messages is not allowed to exceed 2^62 blocks, so collisions are unlikely to be observed in practice.

5.6 Forcing the Collisions
In this section, we show that the previous attack can be extended into a distinguisher against Helix using only 2^114 encrypted blocks. This is an important result, since it constitutes a break of the cipher according to the definition given by the authors [5]. The general idea is to work on a large set of nonces that will preserve collisions during a few rounds. Then these collisions can be detected by observing the corresponding keystream blocks. More precisely, we build a message P of the
maximal authorized length 2^62 words by repeating 2^62 times the same word P0. Then, P is encrypted under a fixed unknown secret key K using different nonces of the form

  N^(δ,∆) = (N0 + δ, N1 + δ, N2 + δ, N3 + ∆)

with four fixed constants (N0, . . . , N3). δ is of the form 8 × x, where x spans all values from 0 to 2^20, and ∆ spans all 2^32 possible words. Therefore the number of blocks encrypted is

  2^62 × 2^32 × 2^20 = 2^114

As before, we consider any state collision that occurs between two different nonces N^(δ1,∆1) and N^(δ2,∆2), at two different positions in the encryption, respectively i1 and i2. We would like this state collision to be preserved for several rounds, in order to detect some properties on the keystream, as in the previous section. We are sure that the plaintext word introduced is always P0, by construction. Furthermore, we would like to have the same round key words for both encryptions. Hence, these positions should satisfy i1 mod 8 = i2 mod 8 = 0 in order to have X_{i1+j,0} = X_{i2+j,0} for all j. Besides, if

  δ1 + i1 = δ2 + i2 mod 2^32    (5)
then X_{i1+j,1} = X_{i2+j,1} when j mod 4 ≠ 3. In this case, the state collision is preserved during at least 3 rounds. Concerning rounds i1 + 3 and i2 + 3, we would like to preserve the collision as well, thus we need X_{i1+3,1} = X_{i2+3,1}, or

  ∆1 + i1 + X'_{i1+3} = ∆2 + i2 + X'_{i2+3} mod 2^32    (6)
With these three assumptions, the state collision is preserved at least until rounds i1 + 7 and i2 + 7, which results in collisions on 8 consecutive words of keystream. To mount an attack, we first store sequences of 8 consecutive keystream words, for each message and for each position i such that i mod 8 = 0. Then, we look for a collision among the 2^114/8 = 2^111 entries in this table. This can be achieved by sorting the table, with a complexity of 2^111 × 111 ≈ 2^118 basic instructions. Then, since we consider objects of 256 bits, the number of "fortuitous" collisions in the table is

  (2^111 × 2^111 / 2) × 2^{−256} ≈ 0

Besides, when a "true" state collision occurs, a collision is also observed on the entries of the table, provided the additional assumptions (5) and (6) hold. (5) holds with probability 2^{−29}, since all terms are multiples of 8, and (6) holds with probability 2^{−32}. Therefore, the number of "true" collisions observed in the table is on average

  (2^111 × 2^111 / 2) × 2^{−160} × 2^{−29} × 2^{−32} ≈ 1
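The counting argument above is easy to re-check in log2 arithmetic (our sketch, not from [5]):

```python
# Log2-scale restatement of the collision-count estimates above.
table_entries = 114 - 3            # 2^114 keystream words, one entry per 8 words
pairs = 2 * table_entries - 1      # log2 of the ~2^111 * 2^111 / 2 candidate pairs

fortuitous = pairs - 256           # entries are 256-bit objects
true_collisions = pairs - 160 - 29 - 32   # state collision, conditions (5) and (6)

assert table_entries == 111
assert fortuitous < -30            # fortuitous collisions: essentially none
assert true_collisions == 0        # 2^0 = 1 expected detectable collision
```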
Thus we have considered enough encrypted data to detect some particular state collisions that are preserved during a few rounds. We achieve this by observing patterns of 8 consecutive words of keystream. For a true Helix output, we expect to find a collision in the previous table, while this will not be the case for a random output. Actually, this distinguishing attack can be slightly improved if we take into account the case i mod 8 = 4. To conclude, we have proposed a distinguishing attack against Helix requiring the encryption of 2^114 words of plaintext under chosen nonces. This attack is faster than exhaustive search, processes less than 2^128 blocks of plaintext and respects the security requirements proposed in [5], since no pair (key, nonce) is ever re-used to encrypt different messages. Therefore, this attack constitutes a theoretical break of Helix.
6 Conclusion
This paper describes two attacks against the new stream cipher Helix. The first one recovers the secret key with a reasonably low complexity in time and data, so we think it should be considered an important threat. The assumptions we use are quite usual (chosen plaintext, chosen nonce), but they are outside the security model proposed by the authors of the cipher. However, we also propose a second attack, less efficient but relying on weaker assumptions. This distinguishing attack constitutes a break of Helix according to the definition given by the authors. Both attacks result from weak differential properties of the encryption function regarding the plaintext and the nonce. In general, our attacks illustrate the fact that one should be careful to protect new stream ciphers against differential-like attacks.
References
[1] D. Coppersmith, S. Halevi, and C. Jutla. Cryptanalysis of Stream Ciphers with Linear Masking. In M. Yung, editor, Advances in Cryptology – Crypto'02, volume 2442 of Lecture Notes in Computer Science, pages 515–532. Springer, 2002.
[2] P. Ekdahl and T. Johansson. SNOW – a New Stream Cipher. In First Open NESSIE Workshop, KU-Leuven, 2000. Submission to NESSIE. Available at http://www.it.lth.se/cryptology/snow/.
[3] P. Ekdahl and T. Johansson. Distinguishing Attacks on SOBER-t16 and t32. In J. Daemen and V. Rijmen, editors, Fast Software Encryption – 2002, volume 2365 of Lecture Notes in Computer Science, pages 210–224. Springer, 2002.
[4] N. Ferguson. Michael: an improved MIC for 802.11 WEP. Document 2-020. Available at http://grouper.ieee.org/groups/802/11/.
[5] N. Ferguson, D. Whiting, B. Schneier, J. Kelsey, S. Lucks, and T. Kohno. Helix, Fast Encryption and Authentication in a Single Cryptographic Primitive. In T. Johansson, editor, Fast Software Encryption – 2003, 2003. To appear.
[6] FIPS PUB 81. DES Modes of Operation, 1980.
[7] S. Fluhrer. Cryptanalysis of the SEAL 3.0 Pseudorandom Function Family. In M. Matsui, editor, Fast Software Encryption – 2001, volume 2355 of Lecture Notes in Computer Science, pages 135–143. Springer, 2001.
[8] S. Fluhrer, I. Mantin, and A. Shamir. Weaknesses in the Key Scheduling Algorithm of RC4. In S. Vaudenay and A. M. Youssef, editors, Selected Areas in Cryptography – 2001, volume 2259 of Lecture Notes in Computer Science, pages 1–24. Springer, 2001.
[9] V. D. Gligor and P. Donescu. Fast Encryption and Authentication: XCBC Encryption and XECB Authentication Modes. In M. Matsui, editor, Fast Software Encryption – 2001, volume 2355 of Lecture Notes in Computer Science, pages 92–108. Springer, 2001.
[10] S. Halevi, D. Coppersmith, and C. Jutla. Scream: a Software-efficient Stream Cipher. In L. Knudsen, editor, Fast Software Encryption – 2002, volume 2332 of Lecture Notes in Computer Science, pages 195–209. Springer, 2002.
[11] P. Hawkes and G. Rose. Primitive Specification and Supporting Documentation for SOBER-t32. In First Open NESSIE Workshop, 2000. Submission to NESSIE.
[12] C. Jutla. Encryption Modes with Almost Free Message Integrity. In B. Pfitzmann, editor, Advances in Cryptology – Eurocrypt'01, volume 2045 of Lecture Notes in Computer Science, pages 529–544. Springer, 2001.
[13] H. Lipmaa and S. Moriai. Efficient Algorithms for Computing Differential Properties of Addition. In M. Matsui, editor, Fast Software Encryption – 2001, volume 2355 of Lecture Notes in Computer Science, pages 336–350. Springer, 2001.
[14] NESSIE – New European Schemes for Signature, Integrity and Encryption. http://www.cryptonessie.org.
[15] P. Rogaway, M. Bellare, J. Black, and T. Krovetz. OCB: A Block-cipher Mode of Operation for Efficient Authenticated Encryption. In Eighth ACM Conference on Computer and Communications Security (CCS-8), pages 196–205. ACM Press, 2001.
[16] P. Rogaway and D. Coppersmith. A Software-optimized Encryption Algorithm. In R. Anderson, editor, Fast Software Encryption – 1994, volume 809 of Lecture Notes in Computer Science, pages 56–63. Springer-Verlag, 1994.
[17] J. Wallen. Linear Approximations of Addition Modulo 2^n. In T. Johansson, editor, Fast Software Encryption – 2003, 2003. To appear.
[18] IEEE P802.11, The Working Group for Wireless LANs. http://grouper.ieee.org/groups/802/11/.
Improved Linear Consistency Attack on Irregular Clocked Keystream Generators

Håvard Molland

The Selmer Center, Dept. of Informatics, University of Bergen, P.B. 7800, N-5020 Bergen, Norway
Abstract. In this paper we propose a new attack on a general model for irregular clocked keystream generators. The model consists of two feedback shift registers of lengths l1 and l2, where the first shift register produces a clock control sequence for the second. This model can be used to describe, among others, the shrinking generator, the step-1/step-2 generator and the stop-and-go generator. We prove that the maximum complexity for attacking such a model is only O(2^l1).

Keywords: Stream ciphers, irregular clocked generators, linear consistency test
1 Introduction
The goal in stream ciphers is to expand a short key into a long keystream z that is difficult to distinguish from a truly random bit stream. It should not be possible to reconstruct the short key from z. The message is then encrypted by mod-2 additions with the keystream. In this paper we analyze additive stream ciphers where the keystream is produced by an irregularly clocked linear feedback shift register (LFSR). This model produces bit streams with high linear complexity, which is an important criterion for pseudo-random sequences. The cipher model we attack is composed of two LFSRs, LFSR_s of length ls and LFSR_u of length lu. LFSR_s produces a bit stream s and LFSR_u produces a bit stream u. The bit stream s is sent through a function D(), which outputs the clock control sequence of integers, c, used to clock LFSR_u. See Fig. 1 for an illustration, and Sec. 2.1 for a full description of the model. The effect of the irregular clocking is that u is irregularly decimated; the result of the decimation is the keystream z. Thus, the positions of the bits in the original stream u are altered and the linearity of the stream is destroyed. This gives the keystream z high linear complexity. There have been several previous attacks on this scheme. One popular method is to use the constrained Levenshtein distance (CLD), also called edit distance,
This work was supported by the Norwegian Research Council under Grant 146874/420.
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 109–126, 2004. c International Association for Cryptologic Research 2004
[Figure: LFSR_s → D() → c → LFSR_u → z, with s the output of LFSR_s and u the output of LFSR_u]
Fig. 1. The general model for irregular clocked keystream generators
which is the number of deletions, insertions, or substitutions required to transform one sequence into another. In [10, 9] the authors find the optimal edit distance and present efficient algorithms for its computation. Another technique is to use the linear consistency test (LCT), see the Handbook of Applied Cryptography (HAC) [1] and [3]. Here the ls clock control initialization bits are guessed and used to restore the positions the keystream bits had in u. This gives the guess u* = (..., *, z_i, ..., z_j, ..., *, ..., z_k, ..., *, ...), where z_i, z_j, z_k are some keystream bits and the stars are the deleted bits. The LCT is then performed on u*, using the Gaussian algorithm on an equation set with lu unknowns derived from LFSR_u and u*. If the equation set is consistent, the guess is output as the correct initialization bits for LFSR_s. The Gaussian algorithm will use about (1/3)lu^3 calculations, and the total complexity for the attack is O(2^ls · lu^3). In [7, 8] and the recent paper [11] only a few of the clock control bits are guessed before the guess is rejected or accepted, using the Gaussian algorithm. If the guessed bits pass the test, an exhaustive search is done on the remaining key space. It is hard to estimate the running time for the attacks in [7, 8, 9, 10]. The attack in [11] is estimated to have an upper bound complexity O(L^3 · 2^{Lλ}), where λ = log A/(1 + log A), L = l1 + l2 and A is the number of different clocking numbers from the D() function. Most of the previous LCT attacks have in common that they try to find the initialization bits for both LFSR_s and LFSR_u at once. We take a much simpler and more algorithmic approach to the problem. The resulting algorithm is deterministic and has a lower and easily estimated running time, which is independent of the number of clock control behaviors A and of the length lu of LFSR_u. We will show that our attack has lower computational complexity than the previous LCT attacks.
We also do a test similar to the LCT, but our test is much more efficient since we do not use the Gaussian algorithm to reject or accept the initialization bits for LFSR_s. Our rejection test has constant complexity O(K), where K is only 2 parity check operations on average. Thus, the total complexity for the attack becomes O(2^ls). The basic idea for the test is simple. From the generator polynomial g_u(x) for LFSR_u we derive a low weight cyclic equation that will hold for all bit streams
generated by LFSR_u. In Appendix C.3 we describe a modified version of Wagner's generalized birthday algorithm [4] that finds the low weight cyclic equation. For each guess of c we generate the guess u* for u. Then we try the cyclic equation at a given number of entries in the u* stream. If the equation holds every time, we can conclude that the bits are generated by LFSR_u, and it is most likely that we have the correct guess for c. If the guess is wrong, we have to test the equation at only 2 entries on average before the guess is rejected. A naive implementation of this algorithm will, as shown in Section 3.2, have complexity O(2^ls · N), where N is the length of the guess u*. The reason for this is that we have to calculate a new u* for each guess for c. The real advantage in this paper is the new algorithm we present in Section 3.4. The algorithm is iterative, and except for the first iteration it calculates each guess u* using just a few operations. The idea is to go through the guesses for c cyclically. This way we can reuse most of u* from one guess to another. In the worst case our attack needs 2^ls iterations to succeed, and we have the complexity O(2^ls). Thus, by using the cyclic properties of feedback shift registers, we have got rid of the lu^3 factor of the LCT attacks in [1, 3, 7, 8, 11]. In Section 4 we present some simulations of the algorithm.
2 A General Model for Irregular Clocked Generators

2.1 Description
We will first give a general description of irregular clocked generators. Let g_u(x) be the feedback polynomial for the shift register LFSR_u of length lu, and let g_s(x) be the feedback polynomial for a shift register LFSR_s of length ls. LFSR_u is called the data generator, and LFSR_s is called the clock control generator. From g_s(x) we can calculate a clock control sequence c in the following way. Let

c_t = D(s_v, s_{v+1}, ..., s_{v+ls−1}) ∈ {a_1, a_2, ..., a_A}, a_j ≥ 0,

be a function where the input (s_v, s_{v+1}, ..., s_{v+ls−1}) is the inner state of LFSR_s after v feedback shifts and A is the number of values that c_t can take. Let p_j be the probability p_j = Prob(c_t = a_j). The way LFSR_s is clocked is defined by the specific generator. Often LFSR_s and c_t are synchronized, which means that v = t. LFSR_u produces the stream u = (u_0, u_1, ...). The clock c_t decides how many times LFSR_u is clocked before the output bit from LFSR_u is taken as keystream bit z_t. Thus the keystream bit z_t is produced by z_t = u_{k(t)}, where k(t) is the total sum of the clock at time t, that is, k(t) ← k(t−1) + c_t. Let u = (u_0, u_1, ..., u_{N−1}) be the bit stream produced by the shift register LFSR_u. The resulting sequence will then be z_t = u_{k(t)}, 0 ≤ t < M. This gives the following definition for the clocking of LFSR_u.

Definition 1. Given bit stream u and clock control sequence c, let z = Q(c, u) be the function that generates z of length M by

Q(c, u): z_t ← u_{k(t)}, 0 ≤ t < M,
where k(t) = Σ_{j=0}^{t} c_j − S, S ∈ {0, 1}.
The parameter S is only for synchronization, and most often S = 1. Finally we let s_I = (s_0, s_1, ..., s_{ls−1}) and u_I = (u_0, u_1, ..., u_{lu−1}) be the initialization states for LFSR_s and LFSR_u. Together, s_I and u_I define the secret key for the given cipher system. If a_j ≥ 1, 1 ≤ j ≤ A, the function Q(c, u) can be viewed as a deletion channel with input u and output z. The deletion rate is

P_d = 1 − 1 / (Σ_{j=1}^{A} p_j a_j).   (1)

Thus, given a stream z of length M, the expected length N of the stream u is

E(N) = M / (1 − P_d) = M · Σ_{j=1}^{A} p_j a_j.   (2)

2.2 Some Examples of Clock Control Generators
The Step-1/Step-2 Generator. The clocking function is defined by Q(c, u): z_t ← u_{k(t)}, 0 ≤ t < M, and D(s_t) = 1 + s_t. We see that the number of outputs is A = 2, with probabilities p_j = 1/2, 1 ≤ j ≤ 2. This gives P_d = 1 − 1/((1/2)·1 + (1/2)·2) = 1/3, and E(N) = (3/2)M. Since this generator is simple, we will use it in the examples in this paper.

Example 1. Assume we have an irregular clocked stream cipher as defined in Section 2.1, with g_s(x) = x^3 + x^2 + 1. We let s_I = (s_0, s_1, s_2) = (1, 0, 1) and we get c by c_t = D(s_t):

c = (2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, ...).

Let g_u(x) = x^4 + x^3 + 1 and let LFSR_u be initialized with u_I = (1, 1, 0, 0). We get the following bit stream:

u = (1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1).   (3)

Using c on u, the bits are discarded in this way:

u* = (∗, 1, 0, ∗, 1, 0, 0, ∗, 1, ∗, 1, ∗, 0, 1, ∗, 1, 1, 0, ∗, 1, ∗, 0, ∗, 1, 1, ∗, 1, 0, 1).   (4)

Finally the output from the cipher will be

z = Q(c, u) = (1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1).   (5)
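The full toy cipher of Example 1 can be reproduced with a short sketch. The recurrences s_{t+3} = s_{t+2} + s_t and u_{t+4} = u_{t+3} + u_t over GF(2) correspond to g_s(x) and g_u(x); the helper names below are ours, not from the paper.

```python
def lfsr(init, taps, n):
    """Output n bits of an LFSR: emit state[0], feed back the XOR of the tapped cells."""
    state, out = list(init), []
    for _ in range(n):
        out.append(state[0])
        state = state[1:] + [sum(state[j] for j in taps) % 2]
    return out

# Clock control LFSR_s: g_s(x) = x^3 + x^2 + 1, i.e. s_{t+3} = s_{t+2} + s_t
s = lfsr([1, 0, 1], (0, 2), 19)
c = [1 + st for st in s]              # step-1/step-2 clocking: c_t = D(s_t) = 1 + s_t

# Data generator LFSR_u: g_u(x) = x^4 + x^3 + 1, i.e. u_{t+4} = u_{t+3} + u_t
u = lfsr([1, 1, 0, 0], (0, 3), 29)

# Irregular decimation z_t = u_{k(t)} with k(t) = c_0 + ... + c_t - 1 (S = 1)
k, z = -1, []
for ct in c:
    k += ct
    z.append(u[k])

print(z)  # [1,0,1,0,0,1,1,0,1,1,1,0,1,0,1,1,1,0,1], the keystream (5)
```

Running this reproduces streams (3) and (5) exactly.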
The LILI-128 Clock Control Generator. The clock control generator, which is one of the building blocks in the LILI-128 cipher [12], is similar to the step-1/step-2 generator, but c has a larger range. The generator is defined by Q(c, u): z_t ← u_{k(t)}, 0 ≤ t < M, and c_t = D(s_{t+i_1}, s_{t+i_2}) = 1 + s_{t+i_1} + 2s_{t+i_2}. This gives A = 4, p_j = 1/4, 1 ≤ j ≤ 4, and P_d = 1 − 1/(Σ_{j=1}^{4} (1/4)·j) = 3/5, and the length of u is expected to be E(N) = (5/2)M.
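The deletion rates above follow directly from equation (1); a tiny helper (our naming, not the paper's) makes the arithmetic explicit:

```python
# Deletion rate (1): P_d = 1 - 1 / sum_j p_j * a_j, for a clock distribution {a_j: p_j}.
def deletion_rate(dist):
    return 1 - 1 / sum(p * a for a, p in dist.items())

step12 = {1: 0.5, 2: 0.5}                 # step-1/step-2: D(s_t) = 1 + s_t
lili = {a: 0.25 for a in (1, 2, 3, 4)}    # LILI clocking: four equiprobable clocks
print(deletion_rate(step12))  # 1/3, so E(N) = (3/2)M by equation (2)
print(deletion_rate(lili))    # 3/5, so E(N) = (5/2)M
```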
The Shrinking Generator. In the shrinking generator, the output bit u_k from LFSR_u is output as keystream bit z_t if the output s_k from LFSR_s equals one. If s_k = 0 then u_k is discarded. To be able to attack the generator with our algorithm we must have the clock control sequence c. The clock control sequence for the shrinking generator can be generated as follows. Let y − 1 be the number of consecutive zeros following s_v = 1, that is, (s_v, s_{v+1}, ..., s_{v+ls−1}) = (1, 0, ..., 0, ∗, ..., ∗) with y − 1 zeros. Then the clocking function is defined as D(s_v, s_{v+1}, ..., s_{v+ls−1}) = y. It follows from the definition of the shrinking generator that LFSR_s and LFSR_u are synchronized, so LFSR_s must be clocked c_t times before the next bit is output. Thus the clock control sequence is c_t = D(s_{k(t)}, s_{k(t)+1}, ..., s_{k(t)+ls−1}), where k(t) ← k(t−1) + c_t for each iteration. Q(c, u) is the same as for the generators above. If we analyze the clock control sequence, c_t ∈ {1, 2, ..., ls − 1}, where p_j = 1/2^j when ls is a large number (ls > 10). This gives A = ls − 1, P_d = 1 − 1/(Σ_{j=1}^{ls} j/2^j) ≈ 0.5 and E(N) = 2M, as intuitively expected.

3 A New Attack on Irregular Clocked Generators
The idea behind the attack is to guess the clock control sequence c, and reconstruct the original positions the keystream bits in z had in u using the reverse function Q*(c, z) defined below. From this we get a sequence û looking similar to (4). When this is done, we test if û is a sequence that could have been generated by LFSR_u, using some linear equations we know hold over any sequence generated by LFSR_u. If the test holds, we assume we have made the correct guess for c. Knowing the correct c, we can use the Gaussian algorithm as described in [3] to find the initialization bits for u.

3.1 The Basics
First we state a definition.

Definition 2. Given the clock control sequence c and keystream z, let the function u* = Q*(c, z) be the (not complete) reverse of Q, defined as

Q*(c, z): u*_{k(t)} ← z_t, 0 ≤ t < M,

where k(t) = Σ_{j=0}^{t} c_j − S, and u*_k = ∗ for the entries k in u* where u*_k is deleted. When this occurs we say that u*_k is not defined.
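As a sanity check, a direct transcription of Definition 2 (function name ours) reproduces stream (4) of Example 1 from the keystream (5) and the correct clocking c:

```python
def q_star(c, z):
    """Q*(c, z): place z_t at position k(t) = c_0 + ... + c_t - 1; '*' marks deleted bits."""
    u = ['*'] * sum(c[:len(z)])
    k = -1
    for t in range(len(z)):
        k += c[t]
        u[k] = z[t]
    return u

c = [2,1,2,1,1,2,2,2,1,2,1,1,2,2,2,1,2,1,1]   # clock control sequence of Example 1
z = [1,0,1,0,0,1,1,0,1,1,1,0,1,0,1,1,1,0,1]   # keystream (5)
print(''.join(map(str, q_star(c, z))))         # *10*100*1*1*01*110*1*0*11*101, i.e. stream (4)
```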
The length of u* will be N* = Σ_{j=0}^{M−1} c_j. Note that the only difference between this definition and Definition 1 is that u and z have changed sides. Thus Q*(c, z) is a reverse of Q(c, u). But since some bits are deleted, the reverse is not complete and we get the stream u*. As seen in Example 1 we can reverse the keystream (5) back to (4), but not completely back to the original stream (3), since the deleted bits are not known. The probability of a bit u*_k being defined is Prob(u*_k defined) = 1 − P_d. This happens when k = k(t) holds for some t, 0 ≤ t < M. It follows that the sum δ = u*_k + u*_{k+j_1} + ... + u*_{k+j_{w−1}} will be defined if and only if all of the bits in the sum are defined. Thus the sum δ will be defined for a given k in u* with probability

P_def = Prob(u*_k, u*_{k+j_1}, ..., u*_{k+j_{w−1}} all defined) = (1 − P_d)^w = 1 / (Σ_{j=1}^{A} p_j a_j)^w.   (6)

3.2 Naive Attack
Using Definition 2, we first present a naive, high complexity attack. In the next section we present a more advanced, low complexity version of the attack. Let s_I be the initial state of LFSR_s, and let L^v(s_I) be the inner state after v feedback shifts. Without loss of generality we assume S = 1 and that LFSR_s is clocked once for each output c_t. Thus v = t and c_t = D(L^t(s_I)) is the output from the clock generator after t feedback shifts. We are given a keystream z of length M generated by z = Q(c, u). Assume we have found an equation u_k + u_{k+j_1} + ... + u_{k+j_{w−1}} = 0 that holds over u. First we guess the initial state of LFSR_s and generate the corresponding guess ĉ for c using the D() function. Using Definition 2 we calculate u* = Q*(ĉ, z). Then we try to find m (typically m = ls + 10; we add 10 to prevent false alarms) entries in u* where the equation is defined. If the equation holds for every entry where it is defined, we assume we have found the correct guess for s_I. If not, we make a new guess and do the test again. The pseudo code for this algorithm is given below.

Input: The keystream z of length M.
1. Preprocessing: Find an equation of low weight that holds over the stream u of length N.
2. For all possible guesses ŝ_I do the following:
3. Generate the clock control sequence ĉ of length M by c_t = D(L^t(ŝ_I)).
4. Generate û* = Q*(ĉ, z) of average length N = M/(1 − P_d).
5. Find m entries (k_1, k_2, ..., k_m) in the stream û* where the equation is defined.
6. If the equation holds for all the m entries over û*, stop the search and output the guess ŝ_I as the key for LFSR_s.

The problem with this algorithm is that for each guess ŝ_I, we have to generate a new clock control stream of length M and generate û* = Q*(ĉ, z) of length N. In larger examples, N and M will be large numbers, say around 10^6. Since the
complexity is O(N · 2^ls), the run time for this algorithm will in many cases be worse than the algorithm in [3]. In the next section we present an idea that fixes this problem.

3.3 Final Idea
The problem in the previous section was that we had to generate M bits of the clock control stream for each guess for s_I. This can be avoided if we go through the guesses in a more natural way. We start with an initial guess ŝ_I = (1, 0, ..., 0), and let the i'th guess be the internal state of LFSR_s after i feedback shifts. Let c^i = (c^i_0, c^i_1, ..., c^i_{M−1}) be the i'th guess for the clock control sequence, defined by c^i_t = D(L^{i+t}(1, 0, ..., 0)), 0 ≤ t < M. Let u^i = Q*(c^i, z) be the corresponding guess for u* of length N_i = Σ_{t=0}^{M−1} c^i_t. We can now give an iterative method for generating u^{i+1} from u^i.

Lemma 1. We can transform u^i into u^{i+1} = Q*(c^{i+1}, z) using the following method: Delete the first c^i_0 entries (∗, ..., ∗, z_0) in u^i, append the c^{i+1}_{M−1} = c^i_M entries (∗, ..., ∗, z_M) at the end, and replace z_t with z_{t−1} for 1 ≤ t ≤ M.

Proof. See Appendix B.

Lemma 1 gives us a fast method for generating all possible guesses for u given a keystream z. See Table 1 for an intuitive example of how the lemma works. Next we prove a theorem that allows us to reuse the equation set defined for u^i.

Theorem 1. If the sum β_{u^i,k} = u_k + u_{k+k_1} + ... + u_{k+k_{w−1}} = z_t + z_{t+j_1} + ... + z_{t+j_{w−1}} = γ_{z,t} is defined over u^i, then the sum β_{u^{i+1},k−c^i_0} = z_{t−1} + z_{t+j_1−1} + ... + z_{t+j_{w−1}−1} = γ_{z,t−1} is defined over u^{i+1}.

Proof. See Appendix B.

The main result from this theorem is that the equation set that is defined over u^i will still be defined over u^{i+1} if we shift the equations c^i_0 entries to the left over u^{i+1}. This means that we can just shift the equations 1 entry to the left over z, and we will have a sum that is defined for the guess ŝ_I = L^{i+1}(1, 0, ..., 0). Thus, the theorem shows that we can avoid a lot of computations if we let the i'th guess for the inner state of LFSR_s be L^i(1, 0, ..., 0).
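Lemma 1 is easy to check mechanically. Below, u^i is stored as an array of pointers into z (the index t, or None for a deleted bit), so "replace z_t with z_{t−1}" is simply subtracting one; the clock sequence is the one used in Table 1, and the helper names are ours:

```python
def q_star_ptr(c, M):
    """Q*(c^i, z) as pointers: position k(t) = c_0 + ... + c_t - 1 holds t; deleted bits hold None."""
    u, k = [None] * sum(c[:M]), -1
    for t in range(M):
        k += c[t]
        u[k] = t
    return u

def lemma1_step(u, c0, cM, M):
    """Lemma 1: drop the first c^i_0 entries, append (*, ..., *, z_M), replace z_t by z_{t-1}."""
    v = u[c0:] + [None] * (cM - 1) + [M]
    return [None if t is None else t - 1 for t in v]

c = [2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2]    # c^0 and one extra clock; c^1 = (c_1, ..., c_11)
M = 11                                       # as in Table 1
u0 = q_star_ptr(c, M)                        # u^0 = Q*(c^0, z)
u1 = lemma1_step(u0, c[0], c[M], M)          # u^1 via the iterative update of Lemma 1
assert u1 == q_star_ptr(c[1:], M)            # ... equals Q*(c^1, z) computed from scratch
print("Lemma 1 update verified")
```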
Table 1. Example of a walk-through of the key. The bits in bold font show how the pattern of defined bits in u^i shifts to the left, while the actual key bits stay relatively put. Also notice how the entries z_i in the patterns are replaced with z_{i−1} after one iteration. For example, the sub-stream z7, z8, z9, ∗, z10 → z6, z7, z8, ∗, z9. This means that if the sum z7 + z8 + z9 + z10 is defined for c^i, then z6 + z7 + z8 + z9 will be defined for c^{i+1}.

Guessed clock sequence c^i → Resulting 'known' bits of u^i = Q*(c^i, z):
(2,1,1,2,2,2,1,2,1,1,2) → (∗, z0, z1, z2, ∗, z3, ∗, z4, ∗, z5, z6, ∗, z7, z8, z9, ∗, z10)
(1,1,2,2,2,1,2,1,1,2,2) → (z0, z1, ∗, z2, ∗, z3, ∗, z4, z5, ∗, z6, z7, z8, ∗, z9, ∗, z10)
(1,2,2,2,1,2,1,1,2,2,2) → (z0, ∗, z1, ∗, z2, ∗, z3, z4, ∗, z5, z6, z7, ∗, z8, ∗, z9, ∗, z10)
(2,2,2,1,2,1,1,2,2,2,1) → (∗, z0, ∗, z1, ∗, z2, z3, ∗, z4, z5, z6, ∗, z7, ∗, z8, ∗, z9, z10)
(2,2,1,2,1,1,2,2,2,1,2) → (∗, z0, ∗, z1, z2, ∗, z3, z4, z5, ∗, z6, ∗, z7, ∗, z8, z9, ∗, z10)

3.4 The Complete Attack
We will now present a new algorithm that makes use of the observations above. We start by analyzing LFSR_u (see Appendix C.3) to find an equation

λ: u_k + u_{k+j_1} + ... + u_{k+j_{w−1}} = 0

that holds over all u generated by LFSR_u for any k ≥ 0. Let the first guess for the initialization state for s be ŝ_I = (1, 0, 0, ..., 0), generate c^0 by c^0_t = D(L^t(1, 0, ..., 0)), t < M, and u^0 = Q*(c^0, z). Next we try to find m places (k_1, k_2, ..., k_m) in u^0 where the equation λ is defined. From this we get the equation set

u^0_{k_1} + u^0_{k_1+j_1} + ... + u^0_{k_1+j_{w−1}} = 0
u^0_{k_2} + u^0_{k_2+j_1} + ... + u^0_{k_2+j_{w−1}} = 0
...
u^0_{k_m} + u^0_{k_m+j_1} + ... + u^0_{k_m+j_{w−1}} = 0

Since every u^0_{k_x+j_y} in this equation set is defined in u^0, we can replace u^0_{k_x+j_y} with the corresponding bit z_t in the keystream z. Thus, u^0 is a sequence of pointers to z, and we can write the equations over z as the equation set Ω:

z_{t_{1,1}} + z_{t_{1,2}} + ... + z_{t_{1,w}} = 0
z_{t_{2,1}} + z_{t_{2,2}} + ... + z_{t_{2,w}} = 0
...
z_{t_{m,1}} + z_{t_{m,2}} + ... + z_{t_{m,w}} = 0   (7)

We are now finished with the precomputation. Next, we test the equation set to see if all the equations hold. If not, we iterate using the algorithm below, which outputs the correct s_I. Knowing s_I, it is easy to calculate u_I = (u_0, u_1, ..., u_{lu−1}) using the Gaussian algorithm once on an equation set derived from s_I and LFSR_u.
Input: The keystream z of length M, the equation λ, the equation set Ω, the pointer sequence u^0, and the states L^0(1, 0, ..., 0) and L^M(1, 0, ..., 0). Set i ← 0.
1. Calculate c^i_0 = D(L^i(1, 0, ..., 0)) and c^{i+1}_{M−1} = c^i_M = D(L^{M+i}(1, 0, ..., 0)).
2. Use Lemma 1 to generate u^{i+1} = Q*(c^{i+1}, z) and lower all indexes in the equation set Ω by one. Theorem 1 guarantees that the equations are defined over u^{i+1}.
3. If the first equation in Ω gets a negative index, remove the equation from Ω. Find a new index at the end of u^{i+1} where λ is defined, and add the new equation over z to Ω.
4. If the current equation set Ω holds, stop the algorithm and output s_I = L^{i+1}(1, 0, ..., 0) as the initialization state for LFSR_s.
5. If Ω does not hold, set i ← i + 1 and go to step 1.
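For intuition, here is the consistency test of steps 2-4 applied to the toy cipher of Example 1, rebuilding u* from scratch for clarity rather than iteratively (helper names are ours): a wrong guess for the clocking is rejected after only a couple of parity checks, while the correct one passes every defined equation.

```python
def defined_equations_hold(c, z, taps=(0, 6, 8)):
    """Place z along the guessed clocking c and test the parity check
    u_k + u_{k+6} + u_{k+8} = 0 at every entry where all its bits are defined."""
    u, k = {}, -1
    for t in range(len(z)):
        k += c[t]
        u[k] = z[t]
    for k in range(max(u) + 1):
        if all(k + j in u for j in taps):
            if sum(u[k + j] for j in taps) % 2:
                return False   # inconsistent parity check: reject this guess for LFSR_s
    return True

z = [1,0,1,0,0,1,1,0,1,1,1,0,1,0,1,1,1,0,1]          # keystream (5) of Example 1
c_wrong = [2,1,1,2,2,2,1,2,1,1,2,2,2,1,2,1,1,2,2]    # from the guess s_I = (1, 0, 0)
c_right = [2,1,2,1,1,2,2,2,1,2,1,1,2,2,2,1,2,1,1]    # from the true key s_I = (1, 0, 1)
print(defined_equations_hold(c_wrong, z))   # False (the check z_7 + z_11 + z_12 fails)
print(defined_equations_hold(c_right, z))   # True
```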
Note 1. To reach the desired complexity O(2^ls), a few details on the implementation of the algorithm are needed. These details are given in Appendix A. All changes during the iterations are done on u^i and the equation set Ω. Thus, each guess L^i(1, 0, ..., 0) for s_I results in a unique equation set Ω. The z stream is never altered.

Example 2. We continue with the generator from Example 1. We have found the equation u_k + u_{k+6} + u_{k+8} = 0, which corresponds to the multiple h(x) = 1 + x^6 + x^8. We have z of length 19, and want to find s_I. The length of u^i will be N ≈ (3/2) · 19 = 28.5. We set the first guess to s_{I0} = (1, 0, 0). From this we generate the clock control sequence using c^0_t = D(L^t(1, 0, 0)), 0 ≤ t ≤ M − 1, and we get

c^0 = (2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2).

Then we spread out the z stream according to c^0, that is, u^0 = Q*(c^0, z). From this we get the sequence

u^0 = (∗, z0, z1, z2, ∗, z3, ∗, z4, ∗, z5, z6, ∗, z7, z8, z9, ∗, z10, ∗, z11, ∗, z12, z13, ∗, z14, z15, z16, ∗, z17, ∗, z18).

We search through u^0 to find 4 entries where the equation u_k + u_{k+6} + u_{k+8} = 0 is defined. Since all the defined entries in u^0 point to bits in the z stream, we get the following set of equations Ω over z:

z0 + z4 + z5 = 0
z6 + z10 + z11 = 0
z7 + z11 + z12 = 0
z13 + z17 + z18 = 0
We test the equations to see if they all hold. If the set does not hold, we continue as follows. We shift LFSR_s once, and it will have s_{I1} = L^1(1, 0, 0) = (0, 0, 1) as inner state. We calculate c^1_{M−1} = c^0_M = D(L^M(1, 0, 0)). Then we use Lemma 1 to calculate u^1 from u^0. That is, we delete the c^0_0 = 2 entries (∗, z0), append the c^1_{18} = 2 entries (∗, z19) at the end, and finally replace the pointer z_t with z_{t−1} for 1 ≤ t ≤ M. We get this guess for u:

u^1 = (z0, z1, ∗, z2, ∗, z3, ∗, z4, z5, ∗, z6, z7, z8, ∗, z9, ∗, z10, ∗, z11, z12, ∗, z13, z14, z15, ∗, z16, ∗, z17, ∗, z18).

If an equation is defined for the entry t in z for the guess s_{I0}, it will now be defined for the entry t − 1 in z for the guess s_{I1}, as guaranteed by Theorem 1. From this Ω becomes:

z_{−1} + z3 + z4 = 0
z5 + z9 + z10 = 0
z6 + z10 + z11 = 0
z12 + z16 + z17 = 0

We remove the first equation from Ω since it has a negative index, and find a new index at the end of u^1 where λ is defined. We find the equation z13 + z17 + z18 = 0 and add it to Ω. We test the equations to see if they all hold. If the set does not hold, we continue the algorithm.

3.5 Complexity and Properties
Precomputation. If the generator polynomial g_u(x) for LFSR_u has sufficiently low weight, say ≤ 10, we can use it directly in our algorithm with w = weight(g_u) and h(x) = g_u(x). In such a case we do not need much precomputation; the only precomputation is to generate u^0 of length N, where N is calculated below. If g_u(x) has too high weight, we use a modified version of Wagner's algorithm for the generalized birthday problem [4] to find a multiple h(x) = a(x)g(x) of weight w = 2^r and degree l_h. The multiple h(x) gives a new recursion of low weight. The fast search algorithm is described in Appendix C.3. See Table 2 for some multiples found by the algorithm. When we have found a polynomial h(x) = 1 + x^{j_1} + ... + x^{j_{w−1}} with j_{w−1} = l_h, the corresponding equation λ over u is u_k + u_{k+j_1} + ... + u_{k+j_{w−1}} = 0. We want to find m places in the stream u where λ is defined. From equation (6) we have that an equation of weight w is defined at a random entry in u with probability P_def = (1 − P_d)^w. Thus we must test around m/(1 − P_d)^w entries to find m equations over z. To be able to do this, u must have length

N > l_h + m / (1 − P_d)^w.   (8)
Table 2. Some weight-4 multiples of different polynomials found using the algorithm in Appendix C.3. The algorithm used 1 hour and 15 minutes to find the multiple of the degree 80 polynomial, mostly due to heavy use of hard disk memory. The search for the multiple of the degree 60 polynomial took 14 seconds.

g(x) = x^40 + x^38 + x^35 + x^32 + x^28 + x^26 + x^22 + x^20 + x^17 + x^16 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1
h(x) = a(x)g(x) = x^24275 + x^6116 + x^1752 + 1

g(x) = x^60 + x^58 + x^56 + x^52 + x^51 + x^50 + x^49 + x^48 + x^47 + x^46 + x^44 + x^41 + x^40 + x^39 + x^36 + x^29 + x^28 + x^27 + x^26 + x^25 + x^23 + x^21 + x^20 + x^19 + x^16 + x^15 + x^11 + x^10 + x^9 + x^4 + x^2 + x + 1
h(x) = a(x)g(x) = x^2464041 + x^1580916 + x^131400 + 1

g(x) = x^80 + x^79 + x^78 + x^76 + x^75 + x^69 + x^68 + x^57 + x^56 + x^55 + x^54 + x^52 + x^49 + x^46 + x^45 + x^44 + x^42 + x^37 + x^36 + x^35 + x^32 + x^31 + x^30 + x^28 + x^27 + x^26 + x^24 + x^23 + x^21 + x^20 + x^19 + x^13 + x^12 + x^10 + x^8 + x^6 + x^4 + x^3 + 1
h(x) = a(x)g(x) = x^312578783 + x^309946371 + x^210261449 + 1
To avoid false keys, we choose m > ls. From the expectation (2) of N we have E(M) = N(1 − P_d) = (1 − P_d)l_h + m/(1 − P_d)^{w−1}, and we have proved the following proposition:

Proposition 1. Let an equation over u be defined by h(x) of weight w and degree l_h. To get an equation set Ω of m > ls equations over z, the length of the z stream must be

M > (1 − P_d)l_h + m / (1 − P_d)^{w−1},   (9)

where m ≈ ls + 10.

We see that the required keystream length M depends on the degree l_h of h(x) of weight w = 2^r. The degree l_h is in turn highly dependent on the search algorithm we use to find h(x). When we use the search algorithm in Appendix C.3 with the proposed parameters, we show in the appendix that l_h will be in the order of T_mem(lu, r) = 2^{(r+lu)/(r+1)}.

Decoding. If this algorithm is implemented properly (Appendix A) it will have worst case complexity O(2^ls) with a very small constant factor. On average the number of iterations will be in the order of 2^{ls−1}. At each iteration i we shift the sliding window c^i_0 to the right over u^i. Then we shift the equation set 1 to the left over z, and test it. If we have the wrong guess for s_I, each equation in the set will hold with probability 1/2. When we reach an equation that does not hold, we know that the guess for s_I is wrong and we break off the test. Thus the average number of equations we have to evaluate per guess is lim_{m→∞} Σ_{j=1}^{m} j/2^j = 2. This gives an average constant factor of 2 parity check tests for each of the 2^ls guesses. Thus the complexity is O(2 · 2^ls) = O(2^ls).
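Two of the numeric claims above are easy to check: the keystream bound (9) for the parameters used in Section 4 (LILI clocking, so P_d = 3/5, with the weight-4 multiples of Table 2 and m = 35), and the average of 2 parity checks per wrong guess. The function name is ours:

```python
# Keystream length bound (9): M > (1 - P_d) * l_h + m / (1 - P_d)^(w - 1)
def min_keystream(lh, Pd=0.6, w=4, m=35):
    return (1 - Pd) * lh + m / (1 - Pd) ** (w - 1)

print(round(min_keystream(24275)))     # ~10257 bits for the degree-40 g_u(x)
print(round(min_keystream(2464041)))   # ~986163 bits for the degree-60 g_u(x)

# A wrong guess fails each parity check with probability 1/2, so the expected
# number of checks before rejection is sum_{j>=1} j / 2^j = 2.
print(round(sum(j / 2**j for j in range(1, 64)), 9))   # 2.0
```

These match the keystream lengths of Table 3 after rounding.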
Table 3. The attacks were done in C code on a 2.2 GHz Pentium IV running under Linux. Note how the running time is exactly the same for lu = 40 and lu = 60. We have set the number of equations to m = 35. The polynomials g(x) and h(x) are from Table 2.

Degree ls of g_s(x) | Degree lu of g_u(x) | Degree l_h of h(x) | Number of iterations | Decoding time | Length M of z
25 | 40 | 24275 | 2^25 | 9 sec | 10000
26 | 40 | 24275 | 2^26 | 18 sec | 10000
25 | 60 | 2464041 | 2^25 | 9 sec | 1000000
26 | 60 | 2464041 | 2^26 | 18 sec | 1000000
Each time an equation gets a negative index, we must delete it and search for a new equation at the end of u^i. We expect to search through about 1/(1 − P_d)^{w−1} entries in u^i to find a new equation. This happens on average every (M − (1 − P_d)l_h)/m iterations, and will have little impact on the decoding complexity. When we after i iterations have found the initialization bits for LFSR_s, we use the Gaussian algorithm on the linear equation set derived from LFSR_u and u^i to find the initialization bits for LFSR_u. This has complexity O(lu^3) and will have little effect on the overall complexity of the algorithm.
4 Simulations

We have run the attack on 4 small cipher systems, defined with clock control generator polynomials of degree 25 and 26, and the data generator polynomials of degree 40 and 60 from Table 2. The clock function D() is the LILI clock function as described in Section 2.2. Note that we only attack the irregular clocking building block in LILI and not the complete LILI-128 cipher. In LILI-128 the stream is filtered through a Boolean function, and this is beyond the scope of this paper. We have used Proposition 1 and Equation (8) to calculate the length M of z and the length N of u (rounded up to the nearest thousand and hundred thousand). The number of parity check equations over z is set to m = 35 ≈ ls + 10. Recall that the number of parity check equations does not affect the complexity. Table 3 shows how the running time of the attack is unchanged when the degree of g_u(x) gets larger. The impact of a larger lu is that we need a longer keystream. Normally we would stop the search when we have found the correct key. But then the running time would be highly dependent on where the key is in the
keyspace. To avoid this we have gone through the whole key space, to be able to compare the different attacks in the table. In a real attack the average running time would be half the running times in Table 3. To compare with previous LCT attacks, the Gaussian factor (1/3)lu^3 would be around 72000 for lu = 60, and around 21333 for lu = 40. In our attack the constant factor is only 2 on average. Thus the same attacks presented in Table 3 would take several hours or even days using the previous LCT algorithms.
5 Conclusion

We have presented a new linear consistency attack, with lower complexity than previous attacks, on a general model for irregular clocked stream ciphers. We have tested the attack in software and confirmed that it has a very low running time that follows the expected complexity O(2^ls). Thus the run time complexity is independent of the degree lu of LFSR_u. Furthermore, if we modify the algorithm, it will work on systems where noise is added to the keystream z. Using a much higher m and giving each guess s_I a metric, we can perform a correlation attack with complexity O(m · 2^ls) on such systems. Initial tests seem very promising and we will come back to this matter in future work.
Acknowledgement. I would like to thank my supervisor Prof. Tor Helleseth and Dr. Matthew Parker for helpful discussions and for reading and helping me improve this paper.
References

[1] A. Menezes, P. van Oorschot, S. Vanstone, "Handbook of Applied Cryptography", CRC Press, 211-212, 1997.
[2] D. Gollmann and W. G. Chambers, "Clock-controlled shift registers: a review", IEEE Journal on Selected Areas in Communications, 7 (1989), 525-533.
[3] K. Zeng, C. Yang and Y. Rao, "On the linear consistency test (LCT) in cryptanalysis with applications", Advances in Cryptology - CRYPTO '89 (LNCS 435), 164-174, 1990.
[4] D. Wagner, "A Generalized Birthday Problem", Advances in Cryptology - CRYPTO '02 (LNCS 2442), 288-303, 2002.
[5] T. Johansson and F. Jönsson, "Fast Correlation Attacks on Stream Ciphers via Convolutional Codes", Advances in Cryptology - EUROCRYPT '99 (LNCS 1592), 347-362, 1999.
[6] P. Ekdahl, W. Meier, T. Johansson, "Predicting the Shrinking Generator with Fixed Connections", Advances in Cryptology - EUROCRYPT 2003 (LNCS), Springer-Verlag, 330-334.
[7] J. Dj. Golic, "Cryptanalysis of three mutually clock-controlled stop/go shift registers", IEEE Trans. Inf. Theory, 46(3), 525-533, 2000.
[8] E. Zenner, M. Krause, and S. Lucks, "Improved cryptanalysis of the self-shrinking generator", Proc. ACISP '01 (LNCS 2119), 21-35, 2001.
[9] J. Dj. Golic and M. J. Mihaljevic, "A Generalized Correlation Attack on a Class of Stream Ciphers Based on the Levenshtein Distance", Journal of Cryptology, 3, 201-212, 1991.
[10] J. Dj. Golic and S. V. Petrovic, "A Generalized Correlation Attack with a Probabilistic Constrained Edit Distance", (LNCS 658), 472, 1993.
[11] E. Zenner, "On the Efficiency of the Clock Control Guessing Attack", ICISC 2002 (LNCS 2587).
[12] L. Simpson, E. Dawson, J. Dj. Golic, and W. Millan, "LILI keystream generator", SAC 2000 (LNCS 2012), 248-261, 2000.
Appendix A
Implementation Details
To reach the desired complexity O(2^{l_s}), the implementation of the algorithm needs some care in the details:

1. In Lemma 1 we obtain u^{i+1} by, among other things, deleting the first c_0^i bits of u^i. This is done with a sliding window technique: instead of shifting the whole sequence to the left, we move the viewing window to the right, so the shift takes just one operation. To avoid heavy use of memory, we slide the window over an array of fixed length N, so that the entries that become free at the beginning of the array are reused. Thus, the left and right edges of the sliding window after i iterations are (left, right) = (i mod N, (i + N_i) mod N), where N > N_i for all i, 0 ≤ i < 2^{l_s}.

2. In Lemma 1 every reference z_{t+1} in u is replaced with z_t for every 0 ≤ t ≤ M, which would take M operations. If we skip the replacements, we note that after i iterations the entry z_t in u will have become z_{t+i}. It is also important to notice that when we write u = (..., z_0, ..., z_t, ..., z_M, ...), the entries z_0, ..., z_t, ..., z_M are pointers from u to z; they are not the actual key bits. Thus, in the implementation we do not replace z_t with z_{t−1}. Instead, when after i iterations the search for equations finds a defined equation u_k^i + u_{k+j_1}^i + ... + u_{k+j_{w−1}}^i = 0, we replace the corresponding z_{t_1} + z_{t_2} + ... + z_{t_w} with z_{t_1−i} + z_{t_2−i} + ... + z_{t_w−i} to compensate.

3. We do not have to keep the whole clock control sequence c^i in memory. We only need the two clocks c_0^i and c_{M−1}^{i+1}, since they are the ones used by Lemma 1 to generate u^{i+1}.
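The sliding-window trick in point 1 can be sketched as a small circular buffer. This is a minimal Python sketch, assuming illustrative class and method names that are not from the paper: deleting the front of the window and appending at the end are both O(1) index updates over a fixed array.

```python
class SlidingWindow:
    """Fixed-size circular buffer: "deleting" the first c entries and
    appending at the end are both O(1) index updates (point 1 above)."""

    def __init__(self, N):
        self.buf = [None] * N   # fixed storage; freed slots are reused
        self.N = N
        self.left = 0           # position of the window's first entry
        self.length = 0         # current window length N_i

    def append(self, x):
        # write at the slot just past the right edge, wrapping around
        self.buf[(self.left + self.length) % self.N] = x
        self.length += 1

    def delete_front(self, c):
        # delete the first c entries by moving the left edge only
        self.left = (self.left + c) % self.N
        self.length -= c

    def __getitem__(self, k):
        return self.buf[(self.left + k) % self.N]

win = SlidingWindow(8)
for x in range(5):
    win.append(x)
win.delete_front(2)             # window is now 2, 3, 4 in one operation
win.append(5)
win.append(6)
```

As long as N exceeds every window length N_i, the freed slots at the start of the array are transparently reused when the window wraps around.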
Appendix B
Proofs of Lemma 1 and Theorem 1

B.1 Proof of Lemma 1
Improved Linear Consistency Attack

Proof. Let c^i be the clocking integer sequence for a given i, 0 ≤ i < 2^{l_s}. We see that c_t^{i+1} = c_{t+1}^i for 0 ≤ t < M − 1, which means that the pattern of the defined bits in u^{i+1} is the same as the pattern in u^i shifted c_0^i places to the left. From this we deduce the following for given 1 ≤ t ≤ M and k = Σ_{j=0}^{t−1} c_j^i − 1: if u_k^i = z_t for a given k, then u_{k−c_0^i}^{i+1} = z_{t−1}. Thus, if we delete the first c_0^i bits of u^i, we will have that whenever u_k^i = z_t for a given k, then u_k^{i+1} = z_{t−1} holds for k = Σ_{j=0}^{t−2} c_j^{i+1} − 1. If we now replace every z_t in u with z_{t−1} for 0 < t < M, we see that u_k^i = u_k^{i+1} for 0 ≤ k < N_i − c_0^i. To finally transform u^i into u^{i+1}, we just have to append the c_{M−1}^{i+1} entries (∗, ..., z_{M−1}) at the end of u^{i+1}.
B.2
Proof of Theorem 1
Proof. Let

u^i = (..., z_0, ..., ∗, ..., z_t, ..., z_{t+j_1}, ..., ∗, ..., z_{t+j_{w−1}}, ..., z_{M−1})

be the stream of length N_i we get using u^i = Q^∗(c^i, z). The notation means that u^i_{c_0^i−1} = z_0, u^i_k = z_t, u^i_{k+k_1} = z_{t+j_1}, ..., u^i_{k+k_{w−1}} = z_{t+j_{w−1}}, and u^i_{N_i−1} = z_{M−1}. We see that the sum β_{u^i,k} = u^i_k + u^i_{k+k_1} + ... + u^i_{k+k_{w−1}} is defined over u^i. The corresponding sum over z is γ_{z,t} = z_t + z_{t+j_1} + ... + z_{t+j_{w−1}}. The clock control sequence we get from c_t^{i+1} = L^{i+1+t}(1, 0, ..., 0) is then

c^{i+1} = (c_1^i, ..., c_{M−1}^i, c_{M−1}^{i+1}) = (c_0^{i+1}, ..., c_{M−1}^{i+1}).

The main observation here is the following: we transform u^i into u^{i+1} by deleting the first c_0^i entries (∗, ..., z_0) of u^i, appending the c_M^{i+1} entries (∗, ..., ∗, z_M) at the end, and then replacing z_t with z_{t−1} for 1 ≤ t ≤ M, as explained in Lemma 1. From this we get the sequence

u^{i+1} = (..., z_0, ..., ∗, ..., z_{t−1}, ..., z_{t+j_1−1}, ..., ∗, ..., z_{t+j_{w−1}−1}, ..., z_{M−1}),   (10)

where z_0 now sits at position c_0^{i+1} − 1, the marked entries sit at positions k − c_0^i, k + k_1 − c_0^i, ..., k + k_{w−1} − c_0^i, and the last entry has index N_{i+1} − 1. We can easily see from (10) that the sum β_{u^{i+1},k−c_0^i} = u_{k−c_0^i} + u_{k+k_1−c_0^i} + ... + u_{k+k_{w−1}−c_0^i} is defined, since every entry in the sum is defined. The corresponding sum over z is γ_{z,t−1} = z_{t−1} + z_{t+j_1−1} + ... + z_{t+j_{w−1}−1}.
Appendix C
Searching for Parity Check Equations

C.1 The Generator Matrix
Let g(x) = 1 + g_{l−1} x + g_{l−2} x^2 + ... + g_1 x^{l−1} + x^l, with g_i ∈ F_2 and g_l = g_0 = 1, be the primitive feedback polynomial of degree l for a shift register that generates the sequence u = (u_0, u_1, ..., u_{N−1}). The corresponding recurrence is u_{t+l} = g_1 u_{t+l−1} + g_2 u_{t+l−2} + ... + g_l u_t. Let α be defined by g(α) = 0. From this we get the reduction rule α^l = g_1 α^{l−1} + g_2 α^{l−2} + ... + g_{l−1} α + 1. Then we can define the generator matrix for the sequence u_t, 0 ≤ t < N, as the l × N matrix

G = [α^0 α^1 α^2 ... α^{N−1}].   (11)

For each i ≥ l, using the reduction rule, α^i can be written as α^i = h_{l−1}^i α^{l−1} + ... + h_2^i α^2 + h_1^i α + h_0^i. We see that every column i ≥ l is a combination of the first l columns, and any column i in G can be represented by

g_i = [h_0^i, h_1^i, ..., h_{l−1}^i]^T.   (12)
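The reduction rule and the column representation (12) can be sketched in a few lines of Python. Polynomials over F_2 are stored as ints (bit j holds the coefficient of x^j); the example polynomial g(x) = x^4 + x + 1 is an illustrative primitive choice, not one taken from the paper.

```python
def gmatrix_column(i, g, l):
    """Column i of G, eq. (12): the coefficients (h_0, ..., h_{l-1}) of
    alpha^i after repeated reduction with g(alpha) = 0."""
    h = 1                          # alpha^0
    for _ in range(i):
        h <<= 1                    # multiply by alpha
        if h & (1 << l):           # degree l reached: apply reduction rule
            h ^= g                 # g has bit l set, so this clears it
    return [(h >> j) & 1 for j in range(l)]

# Illustrative primitive polynomial g(x) = x^4 + x + 1 of degree l = 4
cols = [gmatrix_column(i, 0b10011, 4) for i in range(16)]
```

Since g(x) is primitive, the first 2^l − 1 columns run through all nonzero states before the pattern repeats at α^{2^l−1} = 1.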
Now the sequence u of length N with initialization state u_I = (u_0, u_1, ..., u_{l−1}) can be generated by u = u_I G. The shift register has thus been turned into an (N, l) block code.

C.2
Equations
Let u be a sequence generated by the generator polynomial g(x) of degree l. It is well known that if we can find w > 2 columns in the generator matrix G that sum to zero,

(g_{j_0} + g_{j_1} + ... + g_{j_{w−1}})^T = (0, 0, ..., 0),   (13)

for l ≤ j_0, j_1, ..., j_{w−1} < N, we get an equation of the form

u_{j_0} + u_{j_1} + ... + u_{j_{w−1}} = 0.   (14)

Equation (13) can be formulated as α^{j_0} + α^{j_1} + ... + α^{j_{w−1}} = 0. Thus, if (13) holds, the equation α^t(α^{j_0} + α^{j_1} + ... + α^{j_{w−1}}) = α^{j_0+t} + α^{j_1+t} + ... + α^{j_{w−1}+t} = 0 also holds for 0 ≤ t < N − j_{w−1}. From this we can conclude that the equation is cyclic and can be written as

u_{t+j_0} + u_{t+j_1} + ... + u_{t+j_{w−1}} = 0,   (15)

for 0 ≤ t < N − j_{w−1}. We can also use the indices j_0, j_1, ..., j_{w−1} to form the polynomial h(x) = x^{j_0} + x^{j_1} + ... + x^{j_{w−1}}. If j_0, j_1, ... are found using the method above, we have the relationship h(x) = g(x)a(x) for some polynomial a(x). Thus h(x) is a multiple of g(x).

C.3
Fast Method for Finding a Multiple of Weight w = 2^r
A previous and naive search algorithm for finding a multiple h(x) of weight w and degree < n is as follows; it corresponds to searching for w columns in G that sum to zero mod 2. First sort the generator matrix G. Then, for every choice of the columns g_{j_0}, g_{j_1}, ..., g_{j_{w−2}} in G, search for the w'th column g_{j_{w−1}} that gives g_{j_{w−1}} = g_{j_0} + g_{j_1} + ... + g_{j_{w−2}}. This algorithm is not very efficient and has complexity O(n^{w−1} log n). By using hashing techniques we can get down to O(n^{w−1}). We can do better if we use the iterative method explained next. The algorithm is a modification of the generalized birthday algorithm in [4].

First we sort the n × l generator matrix G_1 = G with respect to the l − B_1 lowest entries in the columns, for a proper B_1. The columns that are equal in the lowest l − B_1 bits will now be beside each other; if we sum two such columns, the sum will be zero in the lowest l − B_1 bits. Next we go through the matrix, sum all the columns that are equal in the l − B_1 lowest entries, and store the sums in a new matrix G_2. If we find m_1 sums, the matrix G_2 will have size m_1 × B_1, since the m_1 sums we find will have 0 in the l − B_1 lowest entries. For each column i in G_2 we also store the indices of the two columns from G that were summed to give column i. Next we sort G_2 with respect to the B_1 − B_2 lowest bits, repeat the same procedure, and get a new matrix G_3 of size m_2 × B_2. We repeat the procedure until, in round r, we set B_r to be zero. After the r'th round we will hopefully have found 2^r columns in G that sum to zero; according to Section C.2 we will then have found a multiple of g(x). This algorithm is much faster than the naive algorithm, but it "misses" many possible multiples and needs a bigger matrix G.

Now we present some new properties of this algorithm. The first round is similar to the well-known search algorithm in [5] for finding equations of the type c_0 u_0 + c_1 u_1 + ... + c_{B−1} u_{B−1} = u_i + u_j. From that paper we have that the expected number of equations m_1 is given by m_1 = n(n − 1)/2^{l−B_1}. When n is large we can approximate m_1 by

E(m_1) = n^2 / 2^{l−B_1+1}.   (16)

Since the algorithm is iterative we can use (16) again for the next round, and we have E(m_2) = m_1^2 / 2^{B_1−B_2+1} = n^4 / 2^{2l−B_1−B_2+3}. Generally, for each round i we will have

E(m_i) = m_{i−1}^2 / 2^{B_{i−1}−B_i+1},   (17)

for B_0 = l and m_0 = n. The iterative search algorithm has complexity O(Σ_{i=0}^{r−1} m_i log m_i), since we have to sort the matrices G_1, G_2, ..., G_r. Thus it is not the complexity that limits the algorithm, but the memory. Given a polynomial g(x) of degree l, we now present a bound on the memory needed to find a multiple h(x) of weight w = 2^r. Assume that we have a computer with T_mem memory units and that one column of G_1 takes up one memory unit. Then it is natural to use a matrix G of maximum size T_mem × l. To use the memory most efficiently, we try to find around m_i = T_mem sums in each round i, so that G_i has size T_mem × B_{i−1}. Thus we can set n = m_1 = ... = m_{r−1} = T_mem. We only need to find one multiple, so m_r = 1. With these restrictions we can give a simple expression for how much memory is needed to find a weight w = 2^r multiple of a degree-l polynomial g(x).
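One round of the iterative pairing step might be sketched as follows. This is a simplified toy under stated assumptions: columns are stored as ints, each carrying the index set of the original columns of G that were combined, and sorting on the masked low bits brings colliding columns next to each other, exactly as in the description above.

```python
def birthday_round(cols, low_bits):
    """One round of the iterative search: sort on the low bits and XOR
    every pair of columns that collide there, so each output value is
    zero in those bits. Entries are (value, index set of original
    columns of G that were combined to produce it)."""
    mask = (1 << low_bits) - 1
    cols = sorted(cols, key=lambda c: c[0] & mask)
    out = []
    for i in range(len(cols)):
        j = i + 1
        # pair cols[i] with every later column that agrees on the low bits
        while j < len(cols) and (cols[j][0] & mask) == (cols[i][0] & mask):
            out.append((cols[i][0] ^ cols[j][0], cols[i][1] | cols[j][1]))
            j += 1
    return out

# Toy columns (l = 4, illustrative values): two collisions in the low 2 bits
cols = [(v, frozenset([k]))
        for k, v in enumerate([0b1101, 0b0101, 0b1110, 0b0110])]
r1 = birthday_round(cols, 2)       # sums that are zero in the low 2 bits
r2 = birthday_round(r1, 4)         # final round, B_r = 0: all bits zero
```

A value of 0 surviving the final round corresponds to 2^r columns of G that sum to zero, i.e. a weight-2^r multiple with exponents given by the accumulated index set.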
Theorem 2. Given a primitive polynomial g(x) of degree l such that r + 1 divides r + l, the expected amount of memory needed to find a weight w = 2^r multiple h(x) of g(x) using the iterative search algorithm is

T_mem(l, r) = 2^{(r+l)/(r+1)},   (18)

with B_i = i + l − i(r+l)/(r+1) for 1 ≤ i ≤ r − 1, and B_r = 0.
Proof. From equation (17) we have the following formulas for m_1, ..., m_r:

m_1 = n^2 / 2^{l−B_1+1},
m_2 = m_1^2 / 2^{B_1−B_2+1},
...
m_r = m_{r−1}^2 / 2^{B_{r−1}−B_r+1}.

We require that m_1 = m_2 = ... = m_{r−1} = n = T_mem and m_r = 1. We solve n = m_1 = n^2 / 2^{l−B_1+1} with respect to B_1 and get B_1 = 1 + l − log n. We use equation (17) and solve

n = n^2 / 2^{B_{i−1}−B_i+1}   (19)

with respect to B_i and get

B_i = B_{i−1} + 1 − log n.   (20)

Using (20) together with B_1, we get the expression

B_i = i + l − i log n.   (21)

Next we solve m_r = 1, that is, n^2 / 2^{B_{r−1}−B_r+1} = 1. Solving with respect to n, inserting (21) for i = r − 1 and setting B_r = 0, we get n = 2^{(r+l)/(r+1)}. The algorithm requires that all the B_i's are integers. This will only be the case when we can set n = 2^x for some integer x. If we want the expression to be exact, we get the requirement that x = (r+l)/(r+1) must be an integer; thus r + 1 must divide r + l.

The theorem does not guarantee that an equation is found; it only says that we expect to find one. Thus, in practical searches we may use around twice as many bits to assure success.
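The bound of Theorem 2 and the cutoffs B_i can be evaluated directly; this is a small sketch with an illustrative function name, and the parameter choices in the usage are assumptions picked so that (r + 1) divides (r + l).

```python
def memory_bound(l, r):
    """Evaluate Theorem 2: T_mem(l, r) = 2^((r+l)/(r+1)) and the sort
    cutoffs B_1, ..., B_{r-1}, B_r, assuming (r + 1) divides (r + l)."""
    assert (r + l) % (r + 1) == 0
    x = (r + l) // (r + 1)                           # log2 of needed memory
    B = [i + l - i * x for i in range(1, r)] + [0]   # B_1..B_{r-1}, B_r = 0
    return 2 ** x, B
```

For example, l = 21 and r = 4 (a weight-16 multiple) gives T_mem = 2^5 = 32 columns with cutoffs B = (17, 13, 9, 0); both parameter values here are illustrative.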
Correlation Attacks Using a New Class of Weak Feedback Polynomials

Håkan Englund, Martin Hell, and Thomas Johansson
Dept. of Information Technology, Lund University
P.O. Box 118, 221 00 Lund, Sweden
Abstract. In 1985 Siegenthaler introduced the concept of correlation attacks on LFSR based stream ciphers. A few years later Meier and Staffelbach demonstrated a special technique, usually referred to as fast correlation attacks, that is very effective if the feedback polynomial has a special form, namely, if its weight is very low. Due to this seminal result, it is a well known fact that one avoids low weight feedback polynomials in the design of LFSR based stream ciphers. This paper identifies a new class of such weak feedback polynomials, polynomials of the form f(x) = g_1(x) + g_2(x)x^{M_1} + ... + g_t(x)x^{M_{t−1}}, where g_1, g_2, ..., g_t are all polynomials of low degree. For such feedback polynomials, we identify an efficient correlation attack in the form of a distinguishing attack.
1
Introduction
Stream cipher design and cryptanalysis are topics that have received lots of attention recently, due to new interesting designs that are very fast in software, see e.g. [3, 8, 20, 10, 13] and many others. One way to design a stream cipher is to use a Linear Feedback Shift Register (LFSR) sequence as input to a nonlinear function. The shift register is initialized using a short random string and the output from the cipher is a much longer string that has many random-like properties. But the LFSR output is linear, and the linearity of the sequence must be destroyed by some nonlinear process. Different attacks can be used to attack the nonlinear part of a cipher. A popular topic lately has been algebraic attacks [5, 6]. These attacks can be mounted if the nonlinear function gives rise to a large system of equations containing equations of low degree. A different attack is to use the correlation between the input and the output of the nonlinear function. In 1985 Siegenthaler introduced the concept of correlation attacks on LFSR based stream ciphers. His basic idea was to perform a divide-and-conquer attack by exploiting the correlation between the output sequence of the generator and the output sequence of one individual LFSR (assuming a nonlinear combination generator). A few years later Meier and Staffelbach demonstrated a special technique, usually referred to as fast correlation attacks [17]. This attack is very effective if the feedback polynomial has a special form, namely, if its weight is very low.

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 127-142, 2004. © International Association for Cryptologic Research 2004
The ideas closely resemble so-called iterative decoding of error correcting codes, for example low density parity check codes. Since then, many ideas concerning fast correlation attacks have been presented, see e.g. [1, 2, 14, 15, 19, 11]. Due to this seminal result, it is a well known fact that one avoids low weight feedback polynomials in the design of LFSR based stream ciphers. This paper identifies a new class of such weak feedback polynomials, namely, polynomials of the form f(x) = g_1(x) + g_2(x)x^{M_1} + ... + g_t(x)x^{M_{t−1}}, where g_1, g_2, ..., g_t are all polynomials of low degree. For such feedback polynomials, we identify an efficient correlation attack in the form of a distinguishing attack. In a distinguishing attack the key is not recovered; instead one tries to distinguish an observed keystream from a truly random stream. This attack is not as powerful as a key recovery attack, in which one finds the key that corresponds to the plaintext and the ciphertext. Whereas most previous work on correlation attacks has focused on key recovery attacks, more recent work in cryptanalysis of stream ciphers has, to a large extent, been concerned with distinguishing attacks, see e.g. [9, 4]. It should also be noted that a distinguishing attack can sometimes be turned into a key recovery attack, in a similar way as for block ciphers. The results of the paper are as follows. For the new class of weak feedback polynomials given above, we present an algorithm for launching an efficient fast correlation attack. The applicability of the algorithm is twofold. Firstly, if the feedback polynomial is of the above form with a moderate number of polynomials g_i, the new algorithm will be much more powerful than applying any previously known (like Meier-Staffelbach) algorithm. This can be interpreted as saying that feedback polynomials of the above kind should be avoided when designing new LFSR based stream ciphers.
Secondly, for an arbitrary feedback polynomial, a standard approach is to search for low weight multiples of the feedback polynomial and then to apply the Meier-Staffelbach approach to fast correlation attacks using a low weight multiple. We can do the same and search for multiples of the feedback polynomial that have the above form. It turns out that this approach is in general less efficient than searching for low weight polynomials, but we can always find specific instances of feedback polynomials where we do get an improvement. The remaining parts of the paper are organized as follows. In Section 2 we give the basic preliminaries for the attack. In Section 3 we discuss how a basic distinguishing attack is mounted, and in Section 4 we extend this attack by using vectors of noise variables. In Section 5 we describe the consequences of tweaking the different parameters of the attack. Section 6 discusses the problem of finding a multiple of the characteristic polynomial. In Section 7 we compare our attack to the basic attack, and in Section 8 we give our conclusions.
Fig. 1. Model for a correlation attack: the LFSR with feedback polynomial g(x) produces the sequence s_n, which passes through a binary symmetric channel (BSC) with crossover probability p (and P(s_n = z_n) = 1 − p) to give the keystream z_n
2
Preliminaries
The model used for the attack is the standard model for a correlation attack, illustrated in Figure 1. For a more detailed description of this model we refer to, for example, [17]. The target stream cipher uses two different components, one linear and one nonlinear. The linear part is an LFSR and the nonlinear part can be modeled as a black box. One often illustrates the correlation attack scenario through the class of nonlinear combining generators, although it is applicable to the more general case described above; see for example [18]. It is common to view the problem of cryptanalysis as a decoding problem. The nonlinear function can then, through a linear approximation, be seen as a binary symmetric channel (BSC) with crossover probability p (correlation probability 1 − p), where p ≠ 0.5. The output bits from the LFSR are denoted s_n, n = 1, 2, ..., and the keystream bits are denoted z_n, n = 1, 2, .... From the BSC it follows immediately that P(s_n = z_n) = 1 − p ≠ 0.5. Assuming a known plaintext attack, our problem is the following: given the observed keystream sequence z = z_1, z_2, ..., z_n, we want to distinguish the keystream from a truly random process. If P_0 is the distribution induced by the cipher and P_1 is the uniform distribution, we try to determine whether the underlying distribution D of the samples that we observe (z_n, n = 1, 2, ...) is more likely to be P_0 or P_1.

Example: We here give an example in which such distinguishing attacks can be useful. Assume that the people in a small country are going to vote in a referendum. Alice is going to send her vote as a ciphertext c = m + z, where m is the vote and z is the keystream. Now assume that there are two possible ways to vote, either m = m^{(1)} or m = m^{(2)}, and that both of them include some large amount of data (e.g. they are both pictures). The attacker Eve, who can listen to the channel, receives the ciphertext c. She makes the guess that Alice voted m^{(1)}.
Eve then adds m^{(1)} to the received ciphertext and, depending on whether she made the right or wrong guess, she gets

ẑ = c + m^{(1)} = z (correct guess), or ẑ = c + m^{(1)} = m^{(1)} + m^{(2)} + z (wrong guess).
If the guess was incorrect, the result is a random sequence, assuming that m^{(1)} + m^{(2)} is random. Hence, by applying a distinguisher to the vector ẑ, Eve can determine how Alice voted.

2.1
Hypothesis Testing
In this section we give a brief introduction to binary hypothesis testing. The task of a binary hypothesis test is to decide which of two hypotheses, H_0 or H_1, is the explanation for an observed measurement. Statistics provides methods to determine how many output symbols are needed to make a correct decision, and also how to carry out the actual hypothesis test. These two parts are explained in the following. Assume that we have a sequence of m independent and identically distributed (i.i.d.) random variables X_1, X_2, ..., X_m over an alphabet X. Their common distribution is denoted D(x) = Pr(X_i = x), 1 ≤ i ≤ m, and the sample values obtained in an experiment are denoted x = x_1, x_2, ..., x_m. We have the two hypotheses H_0: D = P_0 and H_1: D = P_1, where P_0 and P_1 are two different distributions. To distinguish between the two hypotheses, one defines a decision function φ: X^m → {0, 1}, where φ(x) = 0 implies that H_0 is accepted and φ(x) = 1 implies that H_1 is accepted. Two probabilities of error are associated with the decision function,

α = P(φ(x) = 1 | H_0 is true),
β = P(φ(x) = 0 | H_1 is true).   (1)
Let H_0 be the hypothesis that the distribution D is induced by the cipher and let H_1 be the hypothesis that D is uniform. The overall probability of error, P_e, can be written as a weighted sum of α and β, i.e., P_e = π_0 α + π_1 β, where π_0 and π_1 are the a priori probabilities of the two hypotheses. An important asymptotic result is the following:

P_e ≈ 2^{−m C(P_0, P_1)},   (2)

when m is large. The quantity C(P_0, P_1) is the Chernoff information between the distributions P_0 and P_1, obtained through

C(P_0, P_1) = − min_{0≤λ≤1} log_2 Σ_{x∈X} (P_0(x))^λ (P_1(x))^{1−λ}.   (3)

It can be difficult to determine the exact value of λ, but by picking just any value, e.g. λ = 0.5, it is possible to obtain a lower bound on the Chernoff information and hence an upper bound on P_e. By using (2), the number of samples needed to distinguish P_0 and P_1 can be calculated for any error probability. We also need to know how to perform the hypothesis test. The Neyman-Pearson lemma tells us how to carry out the actual test when we have a sequence of samples.
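A minimal numerical sketch of (2) and (3), using a grid search over λ; the function names and grid resolution are illustrative assumptions, not part of the paper.

```python
from math import log2

def chernoff(P0, P1, steps=1000):
    """Grid-search approximation of eq. (3) over lambda in (0, 1).
    P0 and P1 are lists of probabilities over the same finite alphabet."""
    best = float("inf")
    for s in range(1, steps):
        lam = s / steps
        v = log2(sum(p0 ** lam * p1 ** (1 - lam) for p0, p1 in zip(P0, P1)))
        best = min(best, v)
    return -best

def samples_needed(P0, P1, pe):
    """From eq. (2), Pe ~ 2^(-m C): solve for m ~ log2(1/Pe) / C(P0, P1)."""
    return log2(1 / pe) / chernoff(P0, P1)
```

Identical distributions give Chernoff information 0 (no distinguishing possible); the further apart P_0 and P_1 are, the fewer samples (2) requires.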
Lemma 1. (Neyman-Pearson lemma) Let X_1, X_2, ..., X_m be drawn i.i.d. according to mass function D. Consider the decision problem corresponding to the hypotheses D = P_0 vs. D = P_1. For T ≥ 0 define the region

A_m(T) = { (x_1, x_2, ..., x_m) : P_0(x_1, x_2, ..., x_m) / P_1(x_1, x_2, ..., x_m) > T }.

Let α_m = P_0^m(A_m^c(T)) and β_m = P_1^m(A_m(T)) be the error probabilities corresponding to the decision region A_m. Let B_m be any other decision region with associated error probabilities α* and β*. If α* ≤ α, then β* ≥ β.

This tells us that the region A_m(T), determined by P_0(x)/P_1(x) > T, is the one that jointly minimizes α and β. In the hypothesis test we want α and β to be equal, and hence T = 1. With the assumption that all x_n are independent, we can rewrite the Neyman-Pearson test using 2-logarithms. This gives us the test

P_0(x_1, x_2, ..., x_m) / P_1(x_1, x_2, ..., x_m) > 1  ⇒  Σ_{n=1}^{m} log_2 ( P_0(x_n) / P_1(x_n) ) > 0.   (4)
The ratio in (4) is called a log-likelihood ratio, and the test is thus called a log-likelihood test. This was a very brief overview of some tools that will be useful for us. For a more thorough treatment of hypothesis testing, we refer to any textbook on the subject, e.g. [7].
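The log-likelihood test (4) can be sketched in a few lines. The distributions P0 and P1 below are illustrative toy values over a binary alphabet, not taken from the paper.

```python
from math import log2

def neyman_pearson(samples, P0, P1):
    """Log-likelihood test (4): accept H0 when the ratio exceeds T = 1,
    i.e. when the sum of 2-logarithms is positive."""
    llr = sum(log2(P0[x] / P1[x]) for x in samples)
    return llr > 0          # True means: samples look drawn from P0

# Illustrative distributions (not from the paper)
P0 = {0: 0.7, 1: 0.3}       # biased "cipher" distribution
P1 = {0: 0.5, 1: 0.5}       # uniform
```

A sample sequence dominated by the symbol that P_0 favors pushes the log-likelihood ratio positive; a sequence dominated by the disfavored symbol pushes it negative.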
3
A Basic Distinguishing Attack from a Low Weight Feedback Polynomial
We start our investigation by simplifying the Meier-Staffelbach approach and turning their original ideas into a distinguishing attack. Referring to the assumed model (Figure 1), the observed keystream output is considered as a noisy version of the sequence from the LFSR,

z_n = s_n + e_n,   (5)

where e_n, n = 1, 2, ..., are variables representing the noise introduced by the approximation. The noise has a biased distribution P(e_n = 0) = p = 1/2 + ε, where ε is usually rather small. The recursive computation of s_n is linear, and for an LFSR the computation of s_n will depend on the characteristic polynomial of the LFSR. The recurrence can be written as s_n = Σ_{j=1}^{L} c_j s_{n−j}, where L is the LFSR length and c_j, j = 1, 2, ..., are known constants. By introducing c_0 = 1 the recurrence can be put into the form

Σ_{j=0}^{L} c_j s_{n−j} = 0,   n ≥ L.
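A toy sketch of the recurrence: the coefficients c and the initial state below are illustrative choices, and the check simply confirms that the taps annihilate the LFSR sequence, i.e. that Σ_{j=0}^{L} c_j s_{n−j} = 0 for all n.

```python
def lfsr(c, state, n):
    """Generate n bits of s_t = c_1 s_{t-1} + ... + c_L s_{t-L} (mod 2)."""
    s = list(state)
    while len(s) < n:
        s.append(sum(cj * s[-j] for j, cj in enumerate(c, 1)) % 2)
    return s

# Illustrative taps: s_n = s_{n-1} + s_{n-4}, a weight-3 recurrence
c = (1, 0, 0, 1)
s = lfsr(c, [0, 1, 1, 0], 30)
```

Applying the same taps to the keystream z_n = s_n + e_n removes s_n entirely and leaves only the noise combination exploited in the next paragraph.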
By adding the corresponding positions in z we can get all s_n canceling out; what remains is just a sum of independent noise variables. In more detail, let us introduce

x_n = Σ_{j=0}^{L} c_j z_{n−j}.

Then x_n = Σ_{j=0}^{L} c_j z_{n−j} = Σ_{j=0}^{L} c_j s_{n−j} + Σ_{j=0}^{L} c_j e_{n−j} = Σ_{j=0}^{L} c_j e_{n−j}. Since the distribution of e_n is nonuniform, it is possible to distinguish the sample sequence x_n, n = 1, 2, ..., from a truly random sequence. If we assume the binary case (all variables are binary), the sum of the noise will have a bias which, according to the piling-up lemma [16], can be expressed as follows:

P( Σ_{j=0}^{L} c_j e_{n−j} = 0 ) = 1/2 + 2^{w−1} ε^w,   (6)

where w is the weight of (c_0, c_1, ..., c_L). We also know that the required number of samples is in the order of 1/(2^{w−1} ε^w)^2. In this paper we choose, however, to use the Chernoff information as a measure of the distance between two distributions, as described in Section 2.1. If we consider the binary case above, the expression for the Chernoff information in equation (3) with λ = 0.5 becomes

C(P_0, P_1) ≥ − log_2 ( √((1/2 + 2^{w−1} ε^w) · 1/2) + √((1/2 − 2^{w−1} ε^w) · 1/2) ).

In Table 1 in the appendix we give the number of samples needed for some different values of w, using ε = 0.1 and ε = 0.01. Note that the ideas behind this simple attack have appeared in many attack scenarios before, even if they may not have been described in exactly this context. We see that the weight of the characteristic polynomial is directly connected to the success of the attack.
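The piling-up bias (6) can be checked exactly by folding the noise bits together one at a time; this is a sketch with illustrative parameter values.

```python
def xor_bias(eps, w):
    """Exact P(e_1 + ... + e_w = 0) when each bit is 0 w.p. 1/2 + eps."""
    p0 = 1.0                               # XOR of zero bits is 0
    for _ in range(w):
        p = 0.5 + eps
        p0 = p0 * p + (1 - p0) * (1 - p)   # fold in one more noise bit
    return p0
```

The fold step is the two-variable piling-up rule: the XOR of two bits is 0 when both are 0 or both are 1, and iterating it reproduces the closed form 1/2 + 2^{w−1} ε^w exactly.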
4
A More General Distinguisher with Correlated Vectors
As we have seen in the previous section, low weight polynomials are easily attacked, but when the weight grows, so do the required keystream length and the complexity (exponentially). At some point we argue that the attack is no longer realistic, or it might require more than an exhaustive key search. In this section we describe a similar but more general approach that can be applied to another set of characteristic polynomials. Consider a length L LFSR with characteristic polynomial

f(x) = f_0 + f_1 x + f_2 x^2 + f_3 x^3 + ... + f_L x^L,
where f_i ∈ F_2. We try to find a multiple, a(x), of the characteristic polynomial such that it can be written as

a(x) = f(x)h(x) = g_1(x) + x^M g_2(x),   (7)
where g_1(x) and g_2(x) are polynomials of some small degree ≤ k. It is possible that f(x) is already of the form (7); then h(x) = 1. In the sequel we assume that such a polynomial a(x) of the above form is given. This corresponds to a shift register for which the taps are concentrated in two regions far away from each other. The linear recurrence relation can then be written as the two sums

Σ_{i=0}^{k} s_{n+i} a_i + Σ_{i=0}^{k} s_{n+M+i} a_{M+i} = 0,   (8)
where s_n is the nth output bit from the LFSR and a_i, i = 0, 1, ..., are the coefficients of the characteristic polynomial a(x). We now consider the standard model for a correlation attack, where the output of the cipher is considered as a noisy version of the LFSR sequence, z_n = s_n + e_n. The noise variables e_n are introduced by the approximation of the nonlinear part of the cipher. Furthermore, the biased noise has distribution P(e_n = 0) = 1/2 + ε, and the variables are pairwise independent, i.e., P(e_i, e_j) = P(e_i)P(e_j) for all i ≠ j. Let us introduce the notation Q_n for the sum

Q_n = Σ_{i=0}^{k} z_{n+i} a_i + Σ_{i=0}^{k} z_{n+M+i} a_{M+i} = Σ_{i=0}^{k} e_{n+i} a_i + Σ_{i=0}^{k} e_{n+M+i} a_{M+i}.   (9)

This can also be written as

Q_0 = e[0, k] · g_1 + e[M, M + k] · g_2,
Q_1 = e[1, k + 1] · g_1 + e[M + 1, M + k + 1] · g_2,
...
Q_{N−1} = e[N − 1, N + k − 1] · g_1 + e[M + N − 1, M + N + k − 1] · g_2,

if we introduce e[i, j] = (e_i, ..., e_j) for i ≤ j and g_1 = (g_{1,0}, g_{1,1}, ..., g_{1,k})^T, where g_{1,j}, j = 0, 1, ..., k, are the coefficients of the polynomial g_1(x). A corresponding notation is assumed for g_2. The noise variables e_n, n = 1, 2, ..., are independent, but Q_i values that are close to each other will in general not be independent, because several Q_i contain common noise variables. We can take advantage of this fact by moving to a vector representation of the noise as follows. Introduce the vectorial noise vector E_n of length N as

E_n = (Q_{N·n}, ..., Q_{N(n+1)−1}).   (10)
Alternatively, E_n can be expressed as

E_n = (e_{N·n}, ..., e_{N(n+1)+k−1}) · G_1 + (e_{N·n+M}, ..., e_{N(n+1)+M+k−1}) · G_2,   (11)

where G_1 and G_2 are the (N + k) × N matrices in which column j, 0 ≤ j < N, contains the coefficients g_{1,0}, ..., g_{1,k} (respectively g_{2,0}, ..., g_{2,k}) in rows j through j + k and zeros elsewhere, i.e., each column is a shifted copy of the coefficient vector of g_1(x) (respectively g_2(x)).
The remaining pieces needed to complete the distinguishing attack for the assumed polynomial are straightforward. To prepare the attack, we derive the distribution of the noise vector E_n given in (11). This can easily be done for small polynomial degrees, since we know the distribution of the noise. The calculation results in a distribution which we denote by P_0. Our aim is to distinguish this distribution from the truly random case, so by P_1 we denote the uniform distribution. When performing the attack, we collect a sample sequence Q_0, Q_1, Q_2, ..., Q_B by

Q_n = Σ_{i=0}^{k} z_{n+i} a_i + Σ_{i=0}^{k} z_{n+M+i} a_{M+i}.

This sample sequence is then transformed into a vectorial sample sequence E_0, E_1, ..., E_B by E_n = (Q_{N·n}, ..., Q_{N(n+1)−1}). The final step is to use optimal (Neyman-Pearson) hypothesis testing to decide whether E_0, E_1, ..., E_B is more likely to come from distribution P_0 or P_1. The proposed algorithm is summarized in Figure 2. The performance of the algorithm depends on the polynomials g_i that are used. Figure 3 shows an example of how the number of vectors required for a successful attack depends on the vector length for a certain combination of
1. Find a multiple a(x).
2. For t = 0 ... B: Q_t = Σ_{i=0}^{k} z_{t+i} a_i + Σ_{i=0}^{k} z_{t+M+i} a_{M+i}.
3. For n = 0 ... B: E_n = (Q_{N·n}, ..., Q_{N(n+1)−1}).
4. Calculate I = Σ_{n=0}^{B} log_2 ( P_0(E_n) / P_1(E_n) ). If I > 0 then output "cipher", else output "random".

Fig. 2. Summary of proposed algorithm
Fig. 3. The number of vectors needed as a function of the vector length N. In this example g_1(x) = 1 + x + x^5 + x^6 and g_2(x) = 1 + x + x^7 + x^8
two polynomials. N = 1 corresponds to the basic approach, and we see that increasing the vector length decreases the number of vectors needed. Note that g_1(x) and g_2(x) are just two examples of what the polynomials might look like; they do not represent a multiple of any specific primitive polynomial. Finally, we note that we may generalize our reasoning from two groups to multiples with arbitrarily many groups. Expression (7) for the multiple then becomes

a(x) = h(x)f(x) = g_1(x) + x^{M_1} g_2(x) + ... + x^{M_{t−1}} g_t(x),   (12)

where g_i(x) is a polynomial of some small degree ≤ k, and M_1 < M_2 < ... < M_{t−1}. It is clear that when t grows it becomes easier to find multiples of the characteristic polynomial with the desired properties, but it is also clear that the attack then becomes weaker. The distinguishing attack we propose may be mounted on ciphers using a shift register for which "good" multiples can easily be found. Ciphers using a polynomial where many taps are close together might be attacked directly, without finding any multiple. The attack may thus be viewed as a new design criterion: one should avoid LFSRs for which multiples of the form (7) are easily found.
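The steps of Figure 2 might be sketched as follows. The multiple a(x) = 1 + x + x^{10} + x^{12} (so g_1 = 1 + x, g_2 = 1 + x^2, M = 10), the initial state, and the toy distribution used afterwards are all illustrative assumptions; in the noiseless case every Q_t is zero, which the sketch uses as a sanity check.

```python
from math import log2

def samples(z, g1, g2, M, N):
    """Steps 2-3 of Figure 2: form the samples Q_t from the two tap
    groups of a(x), then pack them into length-N vectors E_n."""
    B = len(z) - (M + len(g2) - 1)          # number of usable samples
    Q = [(sum(a * z[t + i] for i, a in enumerate(g1)) +
          sum(a * z[t + M + i] for i, a in enumerate(g2))) % 2
         for t in range(B)]
    return [tuple(Q[N * n:N * (n + 1)]) for n in range(B // N)]

def llr_test(vectors, P0):
    """Step 4: log-likelihood test against the uniform distribution P1."""
    N = len(vectors[0])
    llr = sum(log2(P0[E] * 2 ** N) for E in vectors)   # P1(E) = 2^-N
    return llr > 0                                     # True: "cipher"

# Noiseless LFSR stream satisfying a(x) = 1 + x + x^10 + x^12
g1, g2, M = [1, 1], [1, 0, 1], 10
z = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1]
while len(z) < 200:
    z.append((z[-12] + z[-11] + z[-2]) % 2)
E = samples(z, g1, g2, M, N=4)
```

With real keystream the E_n would follow the biased distribution P_0 derived in Section 4 rather than being identically zero, and the same llr_test call would carry out step 4.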
5
Tweaking the Parameters in the Attack
We have described the new attack in the previous section. We now discuss how various algorithmic parameters affect the results.

5.1
How g_i(x) Affects the Results
Since the vectors E_n are correlated, it is intuitive that the forms of the g_i(x) will affect the strength of the attack. Some combinations of polynomials turn out to be much better than others. We have tested a number of different polynomials g_i(x) in order to find a set of rules describing how the form of the polynomials affects the result. A definite
rule to decide which polynomials are best suited for the attack is hard to find, but some basic properties that characterize a good polynomial can be identified. The parameters that determine how a polynomial affects the distribution of the noise vectors are the following.

– The weight of the polynomial. A small weight means that we have a small number of noise variables and hence a large bias; a large weight means many noise variables and therefore a more uniform noise distribution.
– The arrangement of the terms in the polynomial. This is the same as the arrangement of the taps in the LFSR. If there are many taps close together, the corresponding noise variables will occur more frequently in the noise vector. This significantly affects the distribution of the noise vectors.

Since M_i is typically large, all polynomials g_i(x) can be considered independent, and their properties have the same influence on the total result. The polynomials are combined to form the distribution; this combination is just a variant of the piling-up lemma, so it is clear that the total distribution is more uniform than the distributions of the individual polynomials. Depending on the form of the individual polynomials, the resulting distribution becomes more or less uniform, and some combinations are "better" than others. An example of this can be found in Figure 4 in the appendix.

5.2
Vector Length
The length of the vectors, denoted N, impacts the effectiveness of the algorithm. The idea of using N larger than one is that we can then exploit the correlation between the vectors. Every time we increase N by 1 we also increase the Chernoff information, and recalling (2), a larger Chernoff information means that fewer vectors are needed for our hypothesis test. The Chernoff information is, however, not a linear function of N: depending on the form of the polynomials, the increase can be much higher for some N than for others. The downside is that the computational complexity grows with N, since each vector can take 2^N different values. There is thus a trade-off between the number of samples needed and the complexity of the calculations. The largest gain in Chernoff information is usually achieved when going from N = i to N = i + 1 for small i. This phenomenon can be seen in Figure 5 in the appendix.

5.3 Increase the Number of Groups

As we will show in Section 6.2, it is much easier to find a multiple if we allow the multiple to have more than two groups. The drawback is that the distribution becomes more uniform when more groups are used. This is similar to the binary case, in which more taps in the multiple cause a less biased distribution. The difference is that there is no equivalent of equation (6) describing how much more uniform the distribution becomes with many groups, since this depends strongly on the polynomials used.
Correlation Attacks Using a New Class of Weak Feedback Polynomials

6 Finding Multiples of the Characteristic Polynomial of a Desired Form
We have repeatedly assumed that a multiple of the feedback polynomial has a certain form. Here we briefly look at the problem of finding multiples of a given form.

6.1 Finding Low Weight Multiples
According to the piling-up lemma (6), the distribution becomes more uniform if the polynomial has high weight. Therefore, the first step in a correlation attack is to find a multiple of low weight. The multiple produces the same sequence, so the linear relation describing the multiple is also satisfied by the original LFSR sequence. There exist easy and efficient ways of finding a multiple of a given weight. The number of bits needed to actually start the attack depends on the degree of the multiple, which in turn depends on its weight. If we want to find a multiple of weight w of a polynomial of degree L, it can be shown [12] that the degree M of the multiple will be approximately

M ≈ 2^(L/(w−1)).    (13)
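This estimate is a one-line computation; the sketch below (my own illustration, not code from the paper) evaluates it for the two LFSR lengths used in the appendix tables:

```python
def multiple_degree_estimate(L, w):
    """Estimated degree M of a weight-w multiple of a degree-L
    polynomial, M ~ 2^(L/(w-1)) -- equation (13)."""
    return 2 ** (L / (w - 1))

# Exponents log2(M) for the lengths used in Table 2:
for L in (100, 1000):
    print(L, [round(L / (w - 1), 1) for w in range(2, 17)])
```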
In Table 2 in the appendix the result of this equation is listed for an LFSR of length 100 and an LFSR of length 1000.

6.2 Finding Multiples with Groups
Say that we want to find a multiple of the form a(x) = h(x)f(x) = g1(x) + x^M g2(x), where f(x) is the characteristic polynomial of the cipher and deg f(x) = L. Such a multiple can be found by polynomial division. Assume that we have a g2(x) of degree smaller than k. We multiply this polynomial by x^i and divide the result by the original LFSR polynomial, giving a quotient q(x) and a remainder r(x):

x^i g2(x) = f(x) · q(x) + r(x).

There are 2^k different g2(x)-polynomials, and i can be chosen in M different ways. The remainder r(x) of the division is a polynomial with 0 ≤ deg(r(x)) < L; if r(x) has degree ≤ k, we have found an acceptable g1(x). The probability that the remainder has degree at most k is P(deg(r(x)) ≤ k) ≈ 2^(k−L). If we want it to be probable that we find at least one such polynomial, we need

2^k · M · 2^k / 2^L ≥ 1  ⇒  M ≥ 2^(L−2k).    (14)
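The division search just described can be sketched as follows, representing GF(2) polynomials as integer bitmasks (bit i is the coefficient of x^i). The example modulus f = 0x11D, i.e. x^8 + x^4 + x^3 + x^2 + 1, is an arbitrary small polynomial chosen for illustration, not one from the paper:

```python
def gf2_mod(a, m):
    # Remainder of polynomial a(x) modulo m(x) over GF(2),
    # with coefficients stored as the bits of Python ints.
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def find_two_group_multiple(f, k, max_shift):
    """Search for a multiple g1(x) + x^i * g2(x) of f(x) with
    deg(g1), deg(g2) <= k, by trying shifted g2's and keeping
    the remainder as g1 whenever it has low enough degree."""
    for g2 in range(1, 1 << (k + 1), 2):        # g2 with nonzero constant term
        for i in range(f.bit_length(), max_shift):
            r = gf2_mod(g2 << i, f)             # x^i * g2(x) mod f(x)
            if r and r.bit_length() - 1 <= k:   # remainder is a valid g1
                return r, i, g2                 # multiple is g1 + x^i * g2
    return None

print(find_two_group_multiple(0x11D, 3, 200))
```

Any triple returned is a genuine multiple: g1 was constructed as x^i g2 mod f, so g1 ⊕ (g2 << i) reduces to 0 modulo f.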
Examining the result in (14) we see that for modest values of k the length of the multiple becomes quite large. Therefore we extend the reasoning to the case of arbitrarily many groups, i.e., a multiple of the form a(x) = h(x)f(x) = g1(x) + x^M1 g2(x) + . . . + x^Mt−1 gt(x). The same reasoning as above gives the new expression

2^k·M1 · 2^k·M2 · . . . · 2^k·Mt−1 · 2^k / 2^L ≥ 1  ⇒  M ≥ 2^((L−tk)/(t−1)),    (15)

where it is assumed that M1, M2, . . . , Mt−1 ≤ M. This gives an upper bound on all Mi. In (15) we see that by using larger values of t, i.e., more groups, we can lower the degree of the multiple. One has to bear in mind, though, that a larger t, as stated in Section 5.3, will usually affect the Chernoff information in a negative way (from a cryptanalyst's point of view). In Table 3 in the appendix we list some values of M needed to find a multiple, for some values of k and t.
7 Comparing the Proposed Attack with a Basic Distinguishing Attack
The applicability of our algorithm is twofold. Firstly, the characteristic polynomial may itself be of the form f(x) = g1(x) + g2(x)x^M1 + . . . + gt(x)x^Mt−1. Applying the basic algorithm to such an LFSR, without finding any multiple first, is equivalent to applying our algorithm with N = 1. Since our algorithm can collect the noise variables into vectors (N > 1), it is a significant improvement over the basic algorithm. Using the basic algorithm without first finding a multiple is naive, but if the length L of the LFSR is large, the degree of the low weight multiple will also be large, see (13). So if f(x) is of high degree, our algorithm can be more effective. Secondly, our algorithm can be applied to arbitrary characteristic polynomials. The approach is then to first find a multiple of the form a(x) = g1(x) + g2(x)x^M1 + . . . + gt(x)x^Mt−1 and then apply the algorithm. Comparing the two equations (13) and (15), we see that it is not much harder to find a polynomial of some weight w than it is to find a polynomial with the same number of groups. Tables showing the corresponding M can be found in the appendix. Although our algorithm takes advantage of the fact that the taps are close together, this is in general not enough to compensate for the larger number of noise variables. In this case the proposed attack gives improvements only for certain specific instances of characteristic polynomials, e.g., those having a surprisingly weak multiple of the form f(x) = g1(x) + g2(x)x^M1 but no multiples of surprisingly low weight.
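The comparison between (13) and (15) amounts to comparing two exponents; a small sketch (my own illustration) for L = 100:

```python
def weight_w_exponent(L, w):
    # log2 of the expected degree of a weight-w multiple, equation (13)
    return L / (w - 1)

def group_exponent(L, k, t):
    # log2 of the degree bound for a multiple with t groups of
    # polynomials of degree at most k, equation (15)
    return (L - t * k) / (t - 1)

# Degrees needed for a low-weight multiple versus a grouped multiple:
print([round(weight_w_exponent(100, w)) for w in (2, 3, 4, 5)])
print([round(group_exponent(100, 3, t)) for t in (2, 3, 4, 5)])
```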
8 Conclusion and Future Work
Through a new correlation attack, we have identified a new class of weak feedback polynomials, namely polynomials of the form f(x) = g1(x) + g2(x)x^M1 + . . . + gt(x)x^Mt−1, where g1, g2, . . . , gt are all polynomials of low degree. The correlation attack has been described in the form of a distinguishing attack. This was done mainly for simplicity, since the theoretical performance is then easily calculated and we can compare with the basic attack based on low weight polynomials. The next step in this direction would be to examine the possibility of turning these ideas into a key recovery attack. This could be done in a manner similar to the Meier-Staffelbach approach: for example, one could derive many different relations (multiples) and apply some iterative decoding approach in vector form. The theoretical analysis of such an approach will probably be much more complicated.
References

[1] A. Canteaut, M. Trabbia. Improved fast correlation attacks using parity-check equations of weight 4 and 5. Advances in Cryptology - Eurocrypt 2000, volume 1807 of Lecture Notes in Computer Science, pages 573-588. Springer-Verlag, 2000.
[2] V. Chepyzhov, T. Johansson, B. Smeets. A simple algorithm for fast correlation attacks on stream ciphers. Fast Software Encryption 2000, volume 1978 of Lecture Notes in Computer Science, pages 181-195. Springer-Verlag, 2001.
[3] D. Coppersmith, S. Halevi and C. S. Jutla. SCREAM: a software efficient stream cipher. In J. Daemen and V. Rijmen, editors, Fast Software Encryption 2002, volume 2365 of Lecture Notes in Computer Science, pages 195-210. Springer-Verlag, 2002.
[4] D. Coppersmith, S. Halevi and C. S. Jutla. Cryptanalysis of stream ciphers with linear masking. In M. Yung, editor, Advances in Cryptology - Crypto 2002, volume 2442 of Lecture Notes in Computer Science, pages 515-532. Springer-Verlag, 2002.
[5] N. Courtois and J. Pieprzyk. Cryptanalysis of block ciphers with overdefined systems of equations. Advances in Cryptology - Asiacrypt 2002, volume 2501 of Lecture Notes in Computer Science, pages 267-287. Springer-Verlag, 2002.
[6] N. Courtois and W. Meier. Algebraic attacks on stream ciphers with linear feedback. Advances in Cryptology - Eurocrypt 2003, volume 2656 of Lecture Notes in Computer Science, pages 345-359. Springer-Verlag, 2003.
[7] T. Cover and J. A. Thomas. Elements of Information Theory. Wiley Series in Telecommunications. Wiley, 1991.
[8] P. Ekdahl and T. Johansson. A new version of the stream cipher SNOW. In K. Nyberg and H. Heys, editors, Selected Areas in Cryptography - SAC 2002, volume 2595 of Lecture Notes in Computer Science, pages 47-61. Springer-Verlag, 2003.
[9] P. Ekdahl and T. Johansson. Distinguishing attacks on SOBER-t16 and SOBER-t32. In J. Daemen and V. Rijmen, editors, Fast Software Encryption 2002, volume 2365 of Lecture Notes in Computer Science, pages 210-224. Springer-Verlag, 2002.
[10] N. Ferguson, D. Whiting, B. Schneier, J. Kelsey, S. Lucks and T. Kohno. Helix: fast encryption and authentication in a single cryptographic primitive. In T. Johansson, editor, Fast Software Encryption 2003, volume 2887 of Lecture Notes in Computer Science, pages 330-346. Springer-Verlag, 2003.
[11] J. D. Golić. Intrinsic statistical weakness of keystream generators. Advances in Cryptology - Asiacrypt'94, volume 917 of Lecture Notes in Computer Science, pages 91-103. Springer-Verlag, 1995.
[12] J. D. Golić. Computation of low-weight parity-check polynomials. Electronics Letters, 32(21):1981-1982, October 1996.
[13] P. Hawkes and G. Rose. Primitive specification and supporting documentation for SOBER-t32 submission to NESSIE. In Proceedings of the First Open NESSIE Workshop, 2000.
[14] T. Johansson, F. Jönsson. Improved fast correlation attacks on stream ciphers via convolutional codes. Advances in Cryptology - Eurocrypt'99, volume 1592 of Lecture Notes in Computer Science, pages 347-362. Springer-Verlag, 1999.
[15] T. Johansson, F. Jönsson. Fast correlation attacks based on turbo code techniques. Advances in Cryptology - Crypto'99, volume 1666 of Lecture Notes in Computer Science, pages 181-197. Springer-Verlag, 1999.
[16] M. Matsui. Linear cryptanalysis method for DES cipher. In T. Helleseth, editor, Advances in Cryptology - Eurocrypt'93, volume 765 of Lecture Notes in Computer Science, pages 386-397. Springer-Verlag, 1994.
[17] W. Meier and O. Staffelbach. Fast correlation attacks on stream ciphers. Advances in Cryptology - Eurocrypt'88, volume 330 of Lecture Notes in Computer Science, pages 310-314. Springer-Verlag, 1988.
[18] A. Menezes, P. van Oorschot and S. Vanstone. Handbook of Applied Cryptography. CRC Press, 1997.
[19] M. Mihaljević, M. Fossorier and H. Imai. A low-complexity and high-performance algorithm for the fast correlation attack. Fast Software Encryption 2000, volume 1978 of Lecture Notes in Computer Science, pages 196-212. Springer-Verlag, 2001.
[20] D. Watanabe, S. Furuya, H. Yoshida, K. Takaragi and B. Preneel. A new keystream generator MUGI. In J. Daemen and V. Rijmen, editors, Fast Software Encryption 2002, volume 2365 of Lecture Notes in Computer Science, pages 179-194. Springer-Verlag, 2002.
A Tables and Figures

The appendix contains tables and figures that can be used to compare the different parameters discussed in the paper. Table 1 shows the number of samples needed to distinguish a sequence from random in the basic binary case; the result is given for two different ε. Table 2 shows the expected degree of the multiple of f(x) in the basic distinguishing attack, where w is the weight of the polynomial to be found and L is the length of the LFSR. Table 3 shows the corresponding values for our attack, where t denotes the number of groups and k the maximum number of taps allowed in each polynomial. The degree of the multiple equals the amount of plaintext needed before the actual attack can start.
Fig. 4. The sequence length needed, log2(n), as a function of the vector length N, for two pairs of polynomials g1(x), g2(x). [Plots omitted.]

Fig. 5. The sequence length needed, log2(n), as a function of the vector length N, for g1(x) = 1 + x^4 + x^5 + x^7, g2(x) = 1 + x + x^3 + x^4 + x^5 + x^7 (top) and g1(x) = 1 + x + x^2 + x^4 + x^5, g2(x) = 1 + x + x^2 + x^3 + x^4 (bottom). [Plots omitted.]
Table 1. Number of samples needed for some different w in the binary case, for ε = 0.1 and ε = 0.01. The number of vectors needed is calculated as 1/C(P0, P1).

w          3       4       5       6       7       8       9       10      11      12      13      14      15      16
ε = 0.1    2^16.4  2^21    2^25.7  2^30.3  2^35    2^39.6  2^44.3  2^48.9  2^53.6  2^58.2  2^62.8  2^67.5  2^72.1  2^76.8
ε = 0.01   2^36.3  2^47.6  2^58.9  2^70.2  2^81.5  2^92.8  2^104   2^115   2^127   2^138   2^149   2^160   2^172   2^183
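The entries of Table 1 can be reproduced under a natural reading of the setup, which I sketch here as an assumption: the noise bit is a sum of w terms, each correct with probability 1/2 + ε, so the piling-up lemma gives total correlation 2^(w−1)·ε^w, and the number of samples is approximately 2 ln 2 divided by the square of that correlation (the small-bias approximation of the Chernoff information):

```python
import math

def samples_needed(w, eps):
    """Approximate keystream needed to distinguish the weight-w noise
    sum from uniform.  Assumes piling-up total correlation
    eps_tot = 2^(w-1) * eps^w and Chernoff information
    C(P0, P1) ~ eps_tot^2 / (2 ln 2) bits."""
    eps_tot = 2 ** (w - 1) * eps ** w
    return 2 * math.log(2) / eps_tot ** 2

for eps in (0.1, 0.01):
    print(eps, [round(math.log2(samples_needed(w, eps)), 1) for w in range(3, 9)])
```

With these assumptions the computed exponents match the table entries (e.g. 16.4 for w = 3, ε = 0.1).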
Table 2. The degree M of the multiple as a function of L and w.

w          2        3       4       5       6       7       8       9       10      11      12     13     14     15     16
L = 100    2^100    2^50    2^33    2^25    2^20    2^17    2^14    2^12.5  2^11    2^10    2^9    2^8    2^8    2^7    2^7
L = 1000   2^1000   2^500   2^333   2^250   2^200   2^167   2^143   2^125   2^111   2^100   2^91   2^83   2^77   2^71   2^67
Table 3. The degree M of the multiple as a function of k and t, for L = 100.

k \ t    2      3      4      5      6
3        2^94   2^47   2^31   2^24   2^19
4        2^92   2^46   2^31   2^23   2^18
5        2^90   2^45   2^30   2^23   2^18
6        2^88   2^44   2^29   2^22   2^18
7        2^86   2^43   2^29   2^22   2^17
8        2^84   2^42   2^28   2^21   2^17
Minimum Distance between Bent and 1-Resilient Boolean Functions

Soumen Maity 1 and Subhamoy Maitra 2

1 Department of Mathematics, Indian Institute of Technology Guwahati, Guwahati 781 039, Assam, India
[email protected]
2 Applied Statistics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700 108, India
[email protected]
Abstract. In this paper we study the minimum distance between the set of bent functions and the set of 1-resilient Boolean functions and present a lower bound on it. The bound is proved to be tight for functions of up to 10 input variables. As a consequence, we present a strategy to modify bent functions, by toggling some of their outputs, to obtain a large class of 1-resilient functions with very good nonlinearity and autocorrelation. In particular, the technique is applied to functions of up to 12 variables, and we show that the construction provides a large class of 1-resilient functions reaching the currently best known nonlinearity and achieving very low autocorrelation values that were not known earlier. The technique is powerful enough to theoretically resolve some of the mysteries of 8-variable, 1-resilient functions with maximum possible nonlinearity. However, the situation becomes complicated from 10 variables onwards, where we need to resort to involved combinatorial analysis with trial and error using computational facilities.
Keywords: Autocorrelation, Bent Function, Boolean Function, Nonlinearity, Resiliency
1 Introduction
Construction of resilient Boolean functions with very good parameters in terms of nonlinearity, algebraic degree and other cryptographic criteria has received a lot of attention in the literature [15, 16, 18, 19, 8, 21, 2, 3]. In [17, 7], it was shown how bent functions can be modified to construct highly nonlinear balanced Boolean functions. A recent construction method [12] modifies some output points of a bent function to construct a highly nonlinear 1-resilient function. A natural question that arises in this context is: at least how many bits in the output of a bent function need to be changed to construct a 1-resilient Boolean function? The answer to this question gives the minimum distance between the set of bent functions and the set of 1-resilient functions. We here try to answer this question and show that the minimum distance for n-variable functions satisfies

d_BRn(1) ≥ 2^(n/2−1) + 2 ⌈ ((r+1)(2^(n/2−1) − Σ_{i=0}^{r} C(n,i)) + Σ_{i=1}^{r} i·C(n,i)) / (n−r−1) ⌉,    (1)

where C(n,i) denotes the binomial coefficient and r is the integer such that Σ_{i=0}^{r} C(n,i) ≤ 2^(n/2−1) + 1 < Σ_{i=0}^{r+1} C(n,i) is satisfied. We also show that this result is tight for n ≤ 10. The immediate corollary is the construction of 1-resilient Boolean functions with nonlinearity ≥ 2^(n−1) − 2^(n/2−1) − d_BRn(1) and maximum absolute value of the autocorrelation spectrum ≤ 4·d_BRn(1). Interestingly, it is possible to get 1-resilient functions with better nonlinearity and autocorrelation than these bounds. In particular, we concentrate on the construction of 1-resilient Boolean functions of up to 12 variables with the best known nonlinearity and autocorrelation. Throughout the paper the number of input variables n is considered to be even.

The bent functions chosen in [12, Section 3] use the concept of perfect nonlinear functions, and one example function each for 8, 10 and 12 variables was presented. However, it is not clear how a generalized construction of such bent functions can be achieved in that manner. We here identify a large subclass of Maiorana-McFarland type bent functions which can be modified to get 1-resilient functions with the currently best known parameters. Further, our construction is superior to [12] in terms of the number of points that need to be toggled (we need fewer in the case of 10 and 12 variables), the nonlinearity (we get better nonlinearity for 12 variables) and the autocorrelation (we get 1-resilient functions with autocorrelation values that were not known earlier for 10 and 12 variables).

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 143-160, 2004. © International Association for Cryptologic Research 2004
1.1 Preliminaries
A Boolean function on n variables may be viewed as a mapping from {0,1}^n into {0,1}. A Boolean function f(x1, . . . , xn) is also interpreted as the output column of its truth table f, i.e., a binary string of length 2^n, f = [f(0,0,...,0), f(1,0,...,0), f(0,1,...,0), . . . , f(1,1,...,1)]. The Hamming distance between two binary strings S1, S2 is denoted by d(S1, S2), i.e., d(S1, S2) = #(S1 ≠ S2). The Hamming weight, or simply weight, of a binary string S is the number of ones in S, denoted wt(S). An n-variable function f is said to be balanced if its output column in the truth table contains an equal number of 0's and 1's (i.e., wt(f) = 2^(n−1)). Denote the addition operator over GF(2) by ⊕. An n-variable Boolean function f(x1, . . . , xn) can be considered a multivariate polynomial over GF(2). This polynomial can be expressed as a sum of products representation of all distinct k-th order products (0 ≤ k ≤ n) of the variables. More precisely, f(x1, . . . , xn) can be written as

a0 ⊕ Σ_{1≤i≤n} ai·xi ⊕ Σ_{1≤i<j≤n} aij·xi·xj ⊕ . . . ⊕ a12...n·x1·x2 · · · xn,

where the coefficients a0, ai, aij, . . . , a12...n ∈ {0,1} and the sums are over GF(2). This representation of f is called the algebraic normal form (ANF) of f. The number of variables in the highest order product term with nonzero coefficient is called the algebraic degree, or simply the degree, of f, denoted deg(f).

Functions of degree at most one are called affine functions. An affine function with constant term equal to zero is called a linear function. The set of all n-variable affine (respectively linear) functions is denoted by A(n) (respectively L(n)). The nonlinearity of an n-variable function f is

nl(f) = min_{g∈A(n)} d(f, g),

i.e., the distance from the set of all n-variable affine functions. Let x = (x1, . . . , xn) and ω = (ω1, . . . , ωn) both belong to {0,1}^n and let x·ω = x1ω1 ⊕ . . . ⊕ xnωn. Let f(x) be a Boolean function on n variables. Then the Walsh transform of f(x) is a real valued function over {0,1}^n defined as

W_f(ω) = Σ_{x∈{0,1}^n} (−1)^(f(x) ⊕ x·ω).
In terms of the Walsh spectrum, the nonlinearity of f is given by

nl(f) = 2^(n−1) − (1/2) · max_{ω∈{0,1}^n} |W_f(ω)|.
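These definitions can be evaluated by brute force for small n. A minimal sketch (my own illustration; the 4-variable bent example x1·x2 ⊕ x3·x4 is a standard one, not taken from this paper):

```python
from itertools import product

def walsh(f, n):
    """Full Walsh spectrum of an n-variable Boolean function given as
    a dict mapping input tuples to 0/1."""
    pts = list(product((0, 1), repeat=n))
    return {w: sum((-1) ** (f[x] ^ (sum(a * b for a, b in zip(w, x)) % 2))
                   for x in pts)
            for w in pts}

def nonlinearity(f, n):
    # nl(f) = 2^(n-1) - (1/2) * max |W_f(w)|
    return 2 ** (n - 1) - max(abs(v) for v in walsh(f, n).values()) // 2

# The bent function x1*x2 + x3*x4 reaches nl = 2^3 - 2^1 = 6 for n = 4:
pts = list(product((0, 1), repeat=4))
bent = {x: (x[0] & x[1]) ^ (x[2] & x[3]) for x in pts}
print(nonlinearity(bent, 4))
```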
For even n, the maximum nonlinearity of a Boolean function is 2^(n−1) − 2^(n/2−1), and the functions possessing this nonlinearity are called bent functions [14]. Further, for a bent function f on n variables, W_f(ω) = ±2^(n/2) for all ω. In [9], an important characterization of correlation immune and resilient functions has been presented, which we use as the definition here: a function f(x1, . . . , xn) is m-resilient (respectively m-th order correlation immune) iff its Walsh transform satisfies W_f(ω) = 0 for 0 ≤ wt(ω) ≤ m (respectively W_f(ω) = 0 for 1 ≤ wt(ω) ≤ m). Following the notation of [15, 16], by an (n, m, d, σ) function we denote an n-variable, m-resilient function with degree d and nonlinearity σ. We now define the restricted Walsh transform, which will be used frequently in this text. The restricted Walsh transform of f(x) on a subset S of {0,1}^n is a real valued function over {0,1}^n defined as

W_f(ω)|_S = Σ_{x∈S} (−1)^(f(x) ⊕ x·ω).
Now we present the following technical result. Proposition 1. Let S ⊂ {0, 1}n and b(x), f (x) be two n-variable Boolean functions such that f (x) = 1 ⊕ b(x) when x ∈ S and f (x) = b(x) otherwise. Then Wf (ω) = Wb (ω) − 2Wb (ω)|S .
Proof. Take ω ∈ {0,1}^n. Now

W_f(ω) = Σ_{x∈{0,1}^n} (−1)^(f(x)⊕ω·x)
       = Σ_{x∈{0,1}^n − S} (−1)^(f(x)⊕ω·x) + Σ_{x∈S} (−1)^(f(x)⊕ω·x)
       = Σ_{x∈{0,1}^n − S} (−1)^(b(x)⊕ω·x) − Σ_{x∈S} (−1)^(b(x)⊕ω·x)
         (since f and b agree for inputs ∉ S and are complementary for inputs ∈ S)
       = Σ_{x∈{0,1}^n} (−1)^(b(x)⊕ω·x) − 2 Σ_{x∈S} (−1)^(b(x)⊕ω·x)
       = W_b(ω) − 2 W_b(ω)|_S.

Propagation Characteristics (PC) and the Strict Avalanche Criterion (SAC) [13] are important properties of Boolean functions to be used in S-boxes. Further, Zhang and Zheng [22] identified related cryptographic measures called Global Avalanche Characteristics (GAC). Let α ∈ {0,1}^n and let f be an n-variable Boolean function. Define the autocorrelation value of f with respect to the vector α as

Δ_f(α) = Σ_{x∈{0,1}^n} (−1)^(f(x)⊕f(x⊕α)),

and the absolute indicator

Δ_f = max_{α∈{0,1}^n, α≠0} |Δ_f(α)|.
A function is said to satisfy PC(k) if Δ_f(α) = 0 for 1 ≤ wt(α) ≤ k. Note that, for a bent function f on n variables, Δ_f(α) = 0 for all nonzero α, i.e., Δ_f = 0. Analysis of the autocorrelation properties of correlation immune and resilient Boolean functions has gained substantial interest recently, as evident from [20, 23, 11, 4]. In [11, 4], it was identified that some well known constructions of resilient Boolean functions are not good in terms of autocorrelation properties. Since the present construction is a modification of bent functions, which possess the best possible autocorrelation properties, we get very good autocorrelation properties for the 1-resilient functions. We present a bound on the Δ_f value of the 1-resilient functions and further achieve the best known autocorrelation values for the cases n = 8, 10, 12.
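Both Proposition 1 and the autocorrelation definitions above can be checked by brute force for small n; a sketch (the choice n = 6 and the random b and S are mine, for illustration only — the identity holds for any b and S):

```python
from itertools import product
import random

n = 6
pts = list(product((0, 1), repeat=n))
rng = random.Random(1)
b = {x: rng.randint(0, 1) for x in pts}       # an arbitrary function b
S = set(rng.sample(pts, 10))                  # an arbitrary subset S
f = {x: b[x] ^ (x in S) for x in pts}         # f flips b exactly on S

def W(g, w, dom):
    # Walsh transform of g at w, restricted to the domain dom
    return sum((-1) ** (g[x] ^ (sum(a * c for a, c in zip(w, x)) % 2))
               for x in dom)

# Proposition 1: W_f(w) = W_b(w) - 2 * W_b(w)|_S for every w
assert all(W(f, w, pts) == W(b, w, pts) - 2 * W(b, w, S) for w in pts)

def autocorr(g, a):
    return sum((-1) ** (g[x] ^ g[tuple(xi ^ ai for xi, ai in zip(x, a))])
               for x in pts)

delta_f = max(abs(autocorr(f, a)) for a in pts if any(a))  # absolute indicator
print(delta_f)
```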
2 The Distance
Initially we start with a simple technical result.

Proposition 2. d_BRn(1) ≥ 2^(n/2−1).

Proof. For a bent function b on n variables, W_b(ω) = ±2^(n/2). Hence the minimum distance from a bent function to the set of balanced functions equals 2^(n/2−1). The 1-resilient functions are balanced by definition, and hence the result.
Now we present a restricted result. Let b(x) be an n-variable bent function with W_b(ω) = +2^(n/2) for wt(ω) ≤ 1. We denote by M_b(n,1) the minimum number of bits that must be modified in the output column of b(x) to construct an n-variable 1-resilient function from b(x).

Theorem 1. Let b(x) be an n-variable bent function with W_b(ω) = 2^(n/2) for 0 ≤ wt(ω) ≤ 1. Then

M_b(n,1) ≥ 2^(n/2−1) + 2 ⌈ ((r+1)(2^(n/2−1) − Σ_{i=0}^{r} C(n,i)) + Σ_{i=1}^{r} i·C(n,i)) / (n−r−1) ⌉,

where r is the integer such that Σ_{i=0}^{r} C(n,i) ≤ 2^(n/2−1) + 1 < Σ_{i=0}^{r+1} C(n,i) is satisfied.
Proof. Let S ⊂ {0,1}^n and let f(x) be the n-variable Boolean function obtained by modifying the values of b(x) for x ∈ S and keeping the other bits unchanged. Then from Proposition 1, W_f(ω) = W_b(ω) − 2W_b(ω)|_S for all ω, and in particular W_f(ω) = 2^(n/2) − 2W_b(ω)|_S for 0 ≤ wt(ω) ≤ 1. It is known that f is 1-resilient iff W_f(ω) = 0 for 0 ≤ wt(ω) ≤ 1, i.e., iff W_b(ω)|_S = 2^(n/2−1) for 0 ≤ wt(ω) ≤ 1. Thus, our problem is to find a lower bound on |S| = k under the constraint W_b(ω)|_S = 2^(n/2−1) for 0 ≤ wt(ω) ≤ 1.

Given S = {x_i1, x_i2, . . . , x_ik} ⊂ {0,1}^n, consider the matrices S_{k×n} = (x_i1, x_i2, . . . , x_ik)^T, b(S)_{k×1} = (b(x_i1), b(x_i2), . . . , b(x_ik))^T, and (S ⊕ b(S))_{k×n} = (x_i1 ⊕ b(x_i1), x_i2 ⊕ b(x_i2), . . . , x_ik ⊕ b(x_ik))^T. By A^T we mean the transpose of a matrix A. Also, by abuse of notation, x_ij ⊕ b(x_ij) means the GF(2) addition (XOR) of the bit b(x_ij) with each of the bits of x_ij.

Now W_b(ω)|_S = 2^(n/2−1) for 0 ≤ wt(ω) ≤ 1 implies that there are exactly k/2 − 2^(n/2−2) many 1's in b(S) and in each column of S ⊕ b(S). Since all the rows of S are distinct and b(S) contains k/2 + 2^(n/2−2) many 0's, S ⊕ b(S) must contain at least k/2 + 2^(n/2−2) distinct rows.

Consider one such matrix S ⊕ b(S). The number of 1's in the matrix is exactly n · (k/2 − 2^(n/2−2)), as each of the n columns contains exactly k/2 − 2^(n/2−2) many 1's. We know that there must be at least k/2 + 2^(n/2−2) many distinct rows, so the total number of 1's in these distinct rows must be ≤ n · (k/2 − 2^(n/2−2)). Note that the minimum number of 1's in k/2 + 2^(n/2−2) many distinct rows is at least

Σ_{i=1}^{r} i·C(n,i) + (r+1) · (k/2 + 2^(n/2−2) − Σ_{i=0}^{r} C(n,i))

(all the rows up to weight r and some of the rows of weight r+1). Hence,

Σ_{i=1}^{r} i·C(n,i) + (r+1) · (k/2 + 2^(n/2−2) − Σ_{i=0}^{r} C(n,i)) ≤ n · (k/2 − 2^(n/2−2)).
This gives

k ≥ 2 · ((n+r+1)·2^(n/2−2) + Σ_{i=1}^{r} i·C(n,i) − (r+1)·Σ_{i=0}^{r} C(n,i)) / (n−r−1).

Now we discuss how to choose this r. For this we need a simpler lower bound on k that does not depend on r itself. From Proposition 2, k ≥ 2^(n/2−1). We now show that k ≥ 2^(n/2−1) + 2. This is because, to construct a 1-resilient function from a bent function, the number of 1's in each column must be ≥ 1 (it cannot be 0, since then we would not be able to get enough distinct rows). As the number of 1's in each column is k/2 − 2^(n/2−2), we get k/2 − 2^(n/2−2) ≥ 1, and hence k ≥ 2^(n/2−1) + 2.

Since k/2 + 2^(n/2−2) distinct rows have to be filled, we need to find the r such that Σ_{i=0}^{r} C(n,i) ≤ k/2 + 2^(n/2−2) < Σ_{i=0}^{r+1} C(n,i). Putting in the minimum value of k, i.e., 2^(n/2−1) + 2, we get the r such that Σ_{i=0}^{r} C(n,i) ≤ 2^(n/2−1) + 1 < Σ_{i=0}^{r+1} C(n,i).

As an example, for n = 8, take r = 1: then Σ_{i=0}^{1} C(8,i) = 9 ≤ 2^3 + 1 = 9 < Σ_{i=0}^{2} C(8,i) is satisfied (the "=" of "≤" is attained here). For n = 10, take r = 1: then Σ_{i=0}^{1} C(10,i) = 11 ≤ 2^4 + 1 = 17 < Σ_{i=0}^{2} C(10,i) is satisfied (the "<" of "≤" holds here).

Theorem 2. Let b(x) be any n-variable bent function. Then

d_BRn(1) ≥ 2^(n/2−1) + 2 ⌈ ((r+1)(2^(n/2−1) − Σ_{i=0}^{r} C(n,i)) + Σ_{i=1}^{r} i·C(n,i)) / (n−r−1) ⌉,

where r is the integer such that Σ_{i=0}^{r} C(n,i) ≤ 2^(n/2−1) + 1 < Σ_{i=0}^{r+1} C(n,i) is satisfied.
Proof. Without loss of generality, assume that W_b(ω) = +2^(n/2) for wt(ω) = 0. Let G1 = {ω | wt(ω) = 1, W_b(ω) = +2^(n/2)} and G2 = {ω | wt(ω) = 1, W_b(ω) = −2^(n/2)}. Let S ⊂ {0,1}^n and let f(x) be the n-variable Boolean function obtained by modifying the values of b(x) for x ∈ S and keeping the other bits unchanged. Then from Proposition 1, W_f(ω) = W_b(ω) − 2W_b(ω)|_S for all ω; in particular, W_f(ω) = 2^(n/2) − 2W_b(ω)|_S for wt(ω) = 0 and ω ∈ G1, and W_f(ω) = −2^(n/2) − 2W_b(ω)|_S for ω ∈ G2. Given that f is 1-resilient, we need to find a lower bound on |S| = k under the constraints W_b(ω)|_S = 2^(n/2−1) for wt(ω) = 0 and ω ∈ G1, and W_b(ω)|_S = −2^(n/2−1) for ω ∈ G2. Let |G1| = λ.

Using the same argument as in the proof of Theorem 1, our problem is to find a k×n binary matrix S ⊕ b(S) with the minimum number of rows k such that there are λ columns with exactly k/2 − 2^(n/2−2) many 1's each, exactly k/2 + 2^(n/2−2) many 1's in each of the remaining n−λ columns, and at least k/2 + 2^(n/2−2) distinct rows.

Let M1 (of size k×λ) and M2 (of size k×(n−λ)) be binary matrices with exactly k/2 − 2^(n/2−2), respectively k/2 + 2^(n/2−2), many 1's in each column. Let J be the k×(n−λ) matrix with all elements 1. Then the problem of "finding a binary matrix (M1 : M2) with the minimum number of rows k such that there are at least k/2 + 2^(n/2−2) distinct
rows" is equivalent to "finding a binary matrix (M1 : J − M2) with the minimum number of rows k such that there are at least k/2 + 2^(n/2−2) distinct rows". Note that each column of (M1 : J − M2) contains exactly k/2 − 2^(n/2−2) many 1's. Thus the proof follows by the same argument as in Theorem 1.

For 8 ≤ n ≤ 16, it can be checked that Σ_{i=0}^{1} C(n,i) ≤ 2^(n/2−1) + 1 < Σ_{i=0}^{2} C(n,i) is satisfied. In these cases, the lower bound on k is attained for r = 1 itself. Thus we have the following result.

Corollary 1. For even n, 8 ≤ n ≤ 16, d_BRn(1) ≥ 2^(n/2−1) + 2 ⌈ (2^(n/2) − n − 2) / (n−2) ⌉.

Assume that one can construct a bent function b on n variables such that d_BRn(1) bits of the output column of b are changed to get an n-variable 1-resilient function f. It is clear that toggling a single bit can reduce the nonlinearity by at most 1 and increase the maximum absolute value of the autocorrelation spectrum (the absolute indicator) by at most 4. Thus we have the following result.
Theorem 3. nl(f) ≥ 2^(n−1) − 2^(n/2−1) − d_BRn(1) and Δ_f ≤ 4·d_BRn(1).

Proof. This follows from nl(f) ≥ nl(b) − d_BRn(1) and Δ_f ≤ Δ_b + 4·d_BRn(1), where b is a bent function.

However, for the actual constructions of functions on 8, 10 and 12 variables, we will show that we get better nonlinearity and autocorrelation values than these bounds. For n = 4, 6, we refer the reader to Appendix A.
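The Corollary 1 and Theorem 3 bounds reduce to simple arithmetic; a sketch (the helper names are mine, and the placement of the ceiling follows the reconstructed form of Corollary 1):

```python
import math

def d_lower_bound(n):
    """Corollary 1 lower bound on d_BRn(1) for even 8 <= n <= 16
    (the case r = 1)."""
    return 2 ** (n // 2 - 1) + 2 * math.ceil((2 ** (n // 2) - n - 2) / (n - 2))

def nl_bound(n):
    # Theorem 3: nl(f) >= 2^(n-1) - 2^(n/2-1) - d_BRn(1)
    return 2 ** (n - 1) - 2 ** (n // 2 - 1) - d_lower_bound(n)

print({n: (d_lower_bound(n), nl_bound(n)) for n in (8, 10, 12, 14, 16)})
```

For n = 8 this gives d_BR8(1) ≥ 10, matching the value quoted in Section 3 (the (8, 1, 6, 116) construction then beats the nl bound of 110).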
3 The 8-Variable 1-Resilient Functions
In the previous section we presented a lower bound on the minimum distance between the bent and the 1-resilient functions. However, Section 2 did not discuss how an actual construction is possible. Further, to achieve the currently best known parameters (or even better ones, if possible) we may need to consider some other issues. In this section we consider the construction of an (8, 1, 6, 116) function. The construction of this function was an important open question, and the function was first reported in [10] by interlinking a combinatorial technique with computer search. Later this function was also found by a metaheuristic search (simulated annealing) in [5]; further, the function found in [5] has Δ_f = 24, which is currently the best known value. We here follow a technique similar to the one used in [12]; in the course of the discussion it will become clear how our technique improves on [12]. We present a generalized construction method for (8, 1, 6, 116) functions by modifying Maiorana-McFarland type bent functions, and in specific cases these functions have a Δ_f value as low as 24, the best known one [5].

Construction 1. Take a bent function b(x) on 8 variables with the following properties: (1) b(x) = 0 for wt(x) ≤ 1 and b(x) = 1 for wt(x) = 8; (2) W_b(ω) = 16 for wt(ω) ≤ 1 and W_b(ω) = −16 for wt(ω) = 8. Define the set S = {x ∈ {0,1}^8 | wt(x) ∈ {0, 1, 8}}. Construct a function f(x) as:
f(x) = 1 ⊕ b(x), if x ∈ S,
f(x) = b(x), otherwise.

From Corollary 1 we get that d_BR8(1) ≥ 10, and here we choose and modify exactly 10 positions. It is important to point out that we start with bent functions having some specific properties; the reason for choosing such bent functions is to obtain an actual construction of a 1-resilient function with very high nonlinearity. Before presenting the theorem regarding the properties of f, let us enumerate the issues in which we improve over the work presented in [12].

1. There is a gap in the proof of [12, Theorem 3]. Note the conditions imposed on the bent function b above. In the statement of [12, Theorem 3], only the conditions of item (1) were considered; the conditions of item (2), as given in Construction 1, were not stated but were implicitly assumed in the proof of [12, Theorem 3]. Fortunately, the bent function considered in [12, Section 3] satisfies the conditions of item (2). However, it should be noted that there exist bent functions which satisfy the conditions of item (1) but not all the conditions of item (2), and in that case the proof of [12, Theorem 3] does not go through.

2. The bent function chosen in [12, Section 3] uses the concept of perfect nonlinear functions, and one example function satisfying the conditions of item (1) (and also the conditions of item (2)) was presented there. However, it is not clear how a generalized construction of such bent functions can be achieved in that manner. It should also be noted that the example functions presented in [12] are basically of Maiorana-McFarland type, even though they were designed in a different manner using the concept of perfect nonlinear functions. We here identify a subclass of Maiorana-McFarland type bent functions which satisfy the conditions of both items (1) and (2). This gives a large class of (8, 1, 6, 116) functions.
In fact we show that there are more than 2^46.297 distinct (up to complementation) (8, 1, 6, 116) functions f with ∆f ≤ 40.
3. The proof of Theorem 4 below is much simpler than the proof of [12, Theorem 3], and it presents a clear picture of the Walsh spectrum of the function f in terms of the spectrum of the function b.

Theorem 4. The function f(x) described in Construction 1 is an (8, 1, 6, 116) function.

Proof. Take ω ∈ {0,1}^8 with wt(ω) = i. Now

Wf(ω) = Σ_{x∈{0,1}^8} (−1)^{b(x)⊕ω·x} − 2 Σ_{x∈S} (−1)^{b(x)⊕ω·x}   (from Proposition 1)
      = Wb(ω) − 2 (8 − 2wt(ω) + 2(wt(ω) mod 2)).

We now explain how the last step is deduced. Note that b(x) = 0 when wt(x) = 0 and b(x) = 1 when wt(x) = 8. Thus,

Σ_{x∈{0,1}^8, wt(x)=0,8} (−1)^{b(x)⊕ω·x} = 0, when wt(ω) is even; = 2, when wt(ω) is odd.
Minimum Distance between Bent and 1-Resilient Boolean Functions
Table 1. Relationship between the Walsh spectra of f and b as described in Construction 1

wt(ω)            |  0  |  1  |  2 |  3 | 4 | 5 |  6 |  7 |  8
Wf(ω) = Wb(ω) +  | −16 | −16 | −8 | −8 | 0 | 0 | +8 | +8 | +16
Moreover,

Σ_{x∈{0,1}^8, wt(x)=1} (−1)^{b(x)⊕ω·x} = 8 − 2wt(ω),

as (i) b(x) = 0 when wt(x) = 1 and (ii) ω · x = 1 at exactly wt(ω) of these input points. Since

Σ_{x∈S} (−1)^{b(x)⊕ω·x} = Σ_{wt(x)=0,8} (−1)^{b(x)⊕ω·x} + Σ_{wt(x)=1} (−1)^{b(x)⊕ω·x},

we get Wf(ω) = Wb(ω) − 2 (8 − 2wt(ω) + 2(wt(ω) mod 2)). When wt(ω) ≤ 1, Wf(ω) = Wb(ω) − 16 = 16 − 16 = 0; thus the function is 1-resilient. Further, if wt(ω) = 8, then Wf(ω) = Wb(ω) + 16 = −16 + 16 = 0. For any other choice, i.e., for 2 ≤ wt(ω) ≤ 7, we have |8 − 2wt(ω) + 2(wt(ω) mod 2)| ≤ 4 and hence |Wf(ω)| ≤ |Wb(ω)| + 8 = 16 + 8 = 24. Hence nl(f) = 2^7 − 24/2 = 116. Since the function attains the maximum possible nonlinearity, the algebraic degree [1, 3] of the function must be 8 − 2 = 6.

Based on Table 1 and the previous discussion, we obtain related results on (i) the nonexistence of some 8-variable bent functions and (ii) a relationship between 8-variable bent functions and balanced Boolean functions with nonlinearity 118 (whose existence is not known to date). These results are placed in Appendix B.
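The claims of Theorem 4 can be checked numerically. The sketch below is ours (the bit convention, x1 as the most significant bit of the index, is our choice); as the starting bent function it uses the Maiorana-McFarland function with π the identity and g(Y) = 1 only at Y = (1,1,1,1), the specific case of Remark 1 below, which satisfies the conditions of Construction 1. The degree assertion relies on the divisibility results of [1, 3] cited above.

```python
# Numerical check of Theorem 4 (a sketch; bit convention x1 = MSB is ours).
def wt(x):
    return bin(x).count("1")

# Starting bent function: Maiorana-McFarland b(X, Y) = X.Y + Y1*Y2*Y3*Y4
# (pi = identity, g(Y) = 1 only for Y = (1,1,1,1)).
def b(x):
    X, Y = x >> 4, x & 0xF
    return (wt(X & Y) & 1) ^ (Y == 0xF)

S = [x for x in range(256) if wt(x) in (0, 1, 8)]        # the 10 toggled points
f = [b(x) ^ (x in S) for x in range(256)]

# Walsh spectrum W_f(w) = sum_x (-1)^(f(x) + w.x)
W = [sum((-1) ** (f[x] ^ (wt(x & w) & 1)) for x in range(256)) for w in range(256)]
assert all(W[w] == 0 for w in range(256) if wt(w) <= 1)  # 1-resilient
assert 2 ** 7 - max(map(abs, W)) // 2 == 116             # nl(f) = 116

# Algebraic degree via the Moebius (ANF) transform
anf = f[:]
for i in range(8):
    for x in range(256):
        if x >> i & 1:
            anf[x] ^= anf[x ^ (1 << i)]
assert max(wt(u) for u in range(256) if anf[u]) == 6     # degree exactly 6
```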
3.1 A Subclass of Maiorana-McFarland Bent Functions
The original Maiorana-McFarland class of bent functions is as follows [6]. Consider n-variable Boolean functions on (X, Y), where X, Y ∈ {0,1}^{n/2}, of the form f(X, Y) = X · π(Y) + g(Y), where π is a permutation on {0,1}^{n/2} and g is any Boolean function on n/2 variables. The function f can be seen as the concatenation of 2^{n/2} distinct (up to complementation) affine functions on n/2 variables. Once again we state what kind of bent function b(x) on 8 variables we require:
1. b(x) = 0 for wt(x) ≤ 1 and b(x) = 1 for wt(x) = 8,
2. Wb(ω) = 16 for wt(ω) ≤ 1 and Wb(ω) = −16 for wt(ω) = 8.
In this case n = 8, i.e., n/2 = 4. We have to decide which permutations π on {0,1}^4 and which functions g on {0,1}^4 we can take so that the conditions on b are satisfied. We present a set of conditions below which, taken all together, is sufficient for the construction of such functions. Before going into the conditions, let us fix the notation and ordering of input variables as x = (x1, x2, x3, x4, x5, x6, x7, x8), X = (X1, X2, X3, X4), and Y = (Y1, Y2, Y3, Y4). Further we identify X1 = x1, X2 = x2, X3 = x3, X4 = x4, Y1 = x5, Y2 = x6, Y3 = x7, Y4 = x8.
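As a quick sanity check of the Maiorana-McFarland construction itself (not yet of the extra conditions above), the following sketch builds f(X, Y) = X · π(Y) + g(Y) for an arbitrary permutation π and an arbitrary g on 4 variables and confirms that the Walsh spectrum is ±16 everywhere, i.e., the function is bent. The random choices and the indexing x = 16·X + Y are our assumptions for illustration.

```python
import random

random.seed(2004)
wt = lambda x: bin(x).count("1")

pi = list(range(16))
random.shuffle(pi)                            # an arbitrary permutation of {0,1}^4
g = [random.randrange(2) for _ in range(16)]  # an arbitrary 4-variable function

# Index x = 16*X + Y; b(X, Y) = X.pi(Y) + g(Y)
b = [(wt(X & pi[Y]) & 1) ^ g[Y] for X in range(16) for Y in range(16)]

W = [sum((-1) ** (b[x] ^ (wt(x & w) & 1)) for x in range(256)) for w in range(256)]
assert set(map(abs, W)) == {16}               # bent: |W_b(w)| = 2^(n/2) for all w
```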
1. First of all, the function b must take the value 0 at the points (0,0,0,0,0,0,0,0), (1,0,0,0,0,0,0,0), (0,1,0,0,0,0,0,0), (0,0,1,0,0,0,0,0), (0,0,0,1,0,0,0,0), and this condition is satisfied if we choose π(0,0,0,0) = (0,0,0,0) and g(0,0,0,0) = 0.
2. Next, the function b must take the value 0 at the points (0,0,0,0,1,0,0,0), (0,0,0,0,0,1,0,0), (0,0,0,0,0,0,1,0), (0,0,0,0,0,0,0,1), and this condition is satisfied if we choose g(Y) = 0 for wt(Y) = 1.
3. We need b to be 1 at the input (1,1,1,1,1,1,1,1). Thus if π(1,1,1,1) is a vector of odd weight, then g(1,1,1,1) needs to be 0; otherwise, if π(1,1,1,1) is a vector of even weight, then g(1,1,1,1) has to be 1.
4. Since we have already decided that π(0,0,0,0) = (0,0,0,0) and g(0,0,0,0) = 0, the Wb(ω) values for ω ∈ {(0,0,0,0,1,0,0,0), (0,0,0,0,0,1,0,0), (0,0,0,0,0,0,1,0), (0,0,0,0,0,0,0,1)} become +2^{n/2} = 16.
5. Further, if π(Y) ∈ {(1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1)}, then we take g(Y) = 0. This guarantees that the Wb(ω) values for ω ∈ {(1,0,0,0,0,0,0,0), (0,1,0,0,0,0,0,0), (0,0,1,0,0,0,0,0), (0,0,0,1,0,0,0,0)} become +2^{n/2} = 16.
6. Lastly, if π(Y) = (1,1,1,1), we have to fix g(Y) = (wt(Y) + 1) mod 2. This guarantees that Wb(1,1,1,1,1,1,1,1) = −2^{n/2} = −16.
Given a bent function from the Maiorana-McFarland class f(X, Y) = X · π(Y) + g(Y), the dual of such a function f is Y · π^{−1}(X) + g(π^{−1}(X)). It is interesting to check whether the above points can be replaced by more precise arguments using this idea.
Theorem 5. Let n = 8, x ∈ {0,1}^n and X, Y ∈ {0,1}^{n/2}. Let b(x) be a Maiorana-McFarland type bent function b(x) = b(X, Y) = X · π(Y) + g(Y), where π is a permutation on {0,1}^{n/2} and g is a Boolean function on n/2 variables, with the following conditions:
(1) if Y = (0,0,0,0), then π(Y) = Y;
(2) if wt(π(Y)) ≤ 1 or wt(Y) ≤ 1, then g(Y) = 0;
(3) if Y = (1,1,1,1), then g(Y) = (wt(π(Y)) + 1) mod 2;
(4) if wt(π(Y)) = 4, then g(Y) = (wt(Y) + 1) mod 2.
Then (1) b(x) = 0 for wt(x) ≤ 1 and b(x) = 1 for wt(x) = 8, and (2) Wb(ω) = 16 for wt(ω) ≤ 1 and Wb(ω) = −16 for wt(ω) = 8. Further, there are at least 2^46.297 distinct b's (up to complementation) satisfying these conditions, and in turn there are at least 2^46.297 distinct (up to complementation) (8, 1, 6, 116) functions.

Proof. The properties of b are discussed above in detail. The count of such functions is arrived at as follows. Note that there are 2^{n/2} = 16 places for the permutation π. Let there be i many Y's, 0 ≤ i ≤ 4, such that wt(π(Y)) = 1 for wt(Y) = 1. There are 4 elements of weight 1 and 10 elements of weight 2 or 3. Thus the π(Y)'s for wt(Y) = 1 may be chosen in C(4,i) · C(10,4−i) ways. Note that π(Y) cannot be (1,1,1,1) for wt(Y) = 1. Now there are two cases.
1. Consider π(1,1,1,1) = (1,1,1,1). Then the number of options is C(4,i) · C(10,4−i) · 4! · 10! · 2^{6+i}. This is because the 4 elements where wt(Y) = 1 can be permuted in 4! ways and the 10 elements where wt(Y) = 2, 3 can be permuted in 10! ways. The function g(Y) is fixed when Y is (0,0,0,0) (1 place, g(Y) = 0), or wt(Y) = 1 (4 places, g(Y) = 0), or wt(π(Y)) = 1 (4 − i places, g(Y) = 0), or wt(Y) = wt(π(Y)) = 4 (1 place, g(Y) = 1). Thus g(Y) is fixed in 10 − i places, and we can put any choice from {0, 1} in the remaining 16 − (10 − i) = 6 + i places.
2. Consider π(1,1,1,1) ≠ (1,1,1,1). Then the number of options is C(4,i) · C(10,4−i) · 10 · 4! · 10! · 2^{5+i}. Choose one element of weight 2 or 3 as π(1,1,1,1); this can be done in 10 ways. The 4 elements where wt(Y) = 1 can be permuted in 4! ways. The 10 elements where wt(Y) = 2, 3 can be permuted in 10! ways. The function g(Y) is fixed when Y is (0,0,0,0) (1 place, g(Y) = 0), or wt(Y) = 1 (4 places, g(Y) = 0), or wt(π(Y)) = 1 (4 − i places, g(Y) = 0), or wt(Y) = 4 (1 place, g(Y) = (wt(π(Y)) + 1) mod 2), or wt(π(Y)) = 4 (1 place, g(Y) = (wt(Y) + 1) mod 2). Thus g(Y) is fixed in 11 − i places, and we can put any choice from {0, 1} in the remaining 16 − (11 − i) = 5 + i places.

So the total number of options is

6 Σ_{i=0}^{4} C(4,i) · C(10,4−i) · 4! · 10! · 2^{6+i} = 6 · 4! · 10! · 2^6 Σ_{i=0}^{4} C(4,i) · C(10,4−i) · 2^i ≈ 2^46.297492.

Remark 1. Following Theorem 3, it is clear that for the function f discussed in Theorem 4, ∆f ≤ 40. Now we present the following specific case. Consider π(Y) = Y for all Y ∈ {0,1}^4, g(Y) = 0 for all Y ∈ {0,1}^4 \ {(1,1,1,1)}, and g(Y) = 1 for Y = (1,1,1,1). Let b(x) = b(X, Y) = X · π(Y) + g(Y), and let f(x) be as given in Construction 1. Then f is an (8, 1, 6, 116) function with ∆f = 24. Note that in this method we obtain an (8, 1, 6, 116) function f with ∆f = 24, which had earlier been found by simulated annealing and linear transformation in [5].
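Both the count in the proof of Theorem 5 and the ∆f value of the specific function in Remark 1 can be reproduced in a few lines. This is a sketch under our own encoding assumptions (x1 as the most significant index bit, and a tolerance of our choosing for the logarithm).

```python
from math import comb, factorial, log2

# Count from the proof of Theorem 5
total = 6 * factorial(4) * factorial(10) * 2 ** 6 * sum(
    comb(4, i) * comb(10, 4 - i) * 2 ** i for i in range(5))
assert abs(log2(total) - 46.297492) < 1e-3

# Specific case of Remark 1: pi = identity, g(Y) = 1 only at Y = (1,1,1,1)
wt = lambda x: bin(x).count("1")
b = [(wt((x >> 4) & x) & 1) ^ ((x & 0xF) == 0xF) for x in range(256)]
f = [b[x] ^ (wt(x) in (0, 1, 8)) for x in range(256)]

# Maximum absolute autocorrelation value Delta_f
D = max(abs(sum((-1) ** (f[x] ^ f[x ^ d]) for x in range(256)))
        for d in range(1, 256))
assert D == 24
```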
4 The 10-Variable 1-Resilient Functions
We start with 10-variable bent functions. Theorem 1 and Theorem 2 do not directly show how the exact construction of a 1-resilient function from a bent function is possible. Let us now describe a method in which we identify a subclass of 10-variable Maiorana-McFarland type bent functions for this purpose. As described in Section 2, we need to modify at least k = 22 points (see Corollary 1). Following Theorem 1 and Theorem 2, it is clear that we first need to select k/2 + 2^{n/2−2} = 19 distinct points. Note that we can take 1 point of weight 0 and 10 points of weight 1; thus we need to find 8 more points of weight 2. Once these 19 points are selected, there are 3 more points to be chosen.
S ⊕ b(S) =

    x10 x9 x8 x7 x6 x5 x4 x3 x2 x1
     0   0  0  0  0  0  0  0  0  0
     0   0  0  0  0  0  0  0  0  1
     0   0  0  0  0  0  0  0  1  0
     0   0  0  0  0  0  0  1  0  0
     0   0  0  0  0  0  1  0  0  0
     0   0  0  0  0  1  0  0  0  0
     0   0  0  0  1  0  0  0  0  0
     0   0  0  1  0  0  0  0  0  0
     0   0  1  0  0  0  0  0  0  0
     0   1  0  0  0  0  0  0  0  0
     1   0  0  0  0  0  0  0  0  0
     0   0  0  0  0  0  0  1  1  0
     0   0  0  0  0  1  1  0  0  0
     0   0  0  0  0  1  0  0  0  1
     0   0  0  1  1  0  0  0  0  0
     0   0  1  1  0  0  0  0  0  0
     0   1  1  0  0  0  0  0  0  0
     1   1  0  0  0  0  0  0  0  0
     1   0  0  0  1  0  0  0  0  0
    -------------------------------
     0   0  0  0  0  0  0  0  0  0
     0   0  0  0  0  0  0  0  1  1
     0   0  0  0  0  0  1  1  0  0
Now we refer to the matrix S ⊕ b(S) given here. We present the first 19 points, and after the horizontal line we show the next 3 points. Note that the choice of the all-zero point and the points of weight 1 is clear from the discussion in Theorem 1. However, it remains to be sorted out how exactly the 8 points of weight 2 are chosen. We do that by observation, choosing the 8 points of weight 2 out of the total of C(10,2) = 45 weight-2 points. The remaining 3 points (one of weight 0 and the other two of weight 2) are chosen so that the weight of each column equals k/2 − 2^{n/2−2} = 3. Now we need a bent function b on 10 variables with the property that b(x) = 0 when x is any of the first 19 points and b(x) = 1 when x is the complement of any of the last 3 points. This means that the last three rows need to be complemented when they are considered as input points of the function. Thus we construct two sets S1, S2 as follows and denote S = S1 ∪ S2.
S1 = {(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 0, 0, 0, 1), (0, 0, 0, 0, 0, 0, 0, 0, 1, 0), (0, 0, 0, 0, 0, 0, 0, 1, 0, 0), (0, 0, 0, 0, 0, 0, 1, 0, 0, 0), (0, 0, 0, 0, 0, 1, 0, 0, 0, 0), (0, 0, 0, 0, 1, 0, 0, 0, 0, 0), (0, 0, 0, 1, 0, 0, 0, 0, 0, 0), (0, 0, 1, 0, 0, 0, 0, 0, 0, 0), (0, 1, 0, 0, 0, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 0, 1, 1, 0), (0, 0, 0, 0, 0, 1, 1, 0, 0, 0), (0, 0, 0, 0, 0, 1, 0, 0, 0, 1), (0, 0, 0, 1, 1, 0, 0, 0, 0, 0), (0, 0, 1, 1, 0, 0, 0, 0, 0, 0), (0, 1, 1, 0, 0, 0, 0, 0, 0, 0), (1, 1, 0, 0, 0, 0, 0, 0, 0, 0), (1, 0, 0, 0, 1, 0, 0, 0, 0, 0)} and S2 = {(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (1, 1, 1, 1, 1, 1, 1, 1, 0, 0), (1, 1, 1, 1, 1, 1, 0, 0, 1, 1)}.
Also consider the sets
S1′ = {(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 0, 0, 0, 1), (0, 0, 0, 0, 0, 0, 0, 0, 1, 0), (0, 0, 0, 0, 0, 0, 0, 1, 0, 0), (0, 0, 0, 0, 0, 0, 1, 0, 0, 0), (0, 0, 0, 0, 0, 1, 0, 0, 0, 0), (0, 0, 0, 0, 1, 0, 0, 0, 0, 0), (0, 0, 0, 1, 0, 0, 0, 0, 0, 0), (0, 0, 1, 0, 0, 0, 0, 0, 0, 0), (0, 1, 0, 0, 0, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 0, 1, 1, 0), (0, 0, 0, 0, 0, 1, 1, 0, 0, 0), (0, 0, 0, 0, 0, 1, 0, 0, 0, 1)},
S3 = {(0, 0, 0, 0, 0, 0, 0, 1, 0, 1), (0, 0, 0, 0, 0, 0, 0, 1, 1, 1), (0, 0, 0, 0, 0, 0, 1, 0, 0, 1), (0, 0, 0, 0, 0, 0, 1, 0, 1, 0), (0, 0, 0, 0, 0, 0, 1, 1, 1, 0), (0, 0, 0, 0, 0, 1, 0, 0, 1, 1), (0, 0, 0, 0, 1, 1, 1, 0, 0, 1), (0, 0, 0, 0, 0, 1, 1, 1, 0, 0), (0, 0, 0, 0, 0, 1, 1, 1, 1, 1)}, and
S4 = {(0, 0, 1, 1, 1, 0, 0, 0, 0, 0), (0, 1, 1, 1, 0, 0, 0, 0, 0, 0), (1, 0, 0, 1, 1, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 1, 1, 0, 0, 0), (1, 1, 0, 0, 1, 0, 0, 0, 0, 0), (1, 1, 1, 0, 0, 0, 0, 0, 0, 0), (1, 1, 1, 1, 1, 0, 0, 0, 0, 0)}.
We will return to the sets S1′, S3 and S4 a little later. We now give the exact construction.

Construction 2. We need a 10-variable bent function b(x) with the following properties:
1. b(x) = 0 when x ∈ S1 and b(x) = 1 when x ∈ S2,
2. Wb(ω) = +32 when ω ∈ S1′ ∪ S3 ∪ S4.
The function f(x) is as follows:
f(x) = 1 ⊕ b(x), if x ∈ S;
f(x) = b(x), otherwise.

From Theorem 1 it is clear that the function f(x) is 1-resilient. Now we need to calculate the nonlinearity of f; in fact, we will prove that nl(f) = 488, the currently best known nonlinearity for 10-variable 1-resilient functions. By Proposition 1, Wf(ω) = Wb(ω) − 2 Wb(ω)|S. Thus it is important to analyse the values of Wb(ω)|S for all ω ∈ {0,1}^10. However, this cannot be done as cleanly as in the 8-variable case of Theorem 4, so we use a computer program to calculate Wb(ω)|S for all ω ∈ {0,1}^10. Note that when |Wb(ω)|S| ≤ 8, at those points |Wf(ω)| ≤ 48.
Thus we have no restriction on the Walsh spectrum of the bent function b at those points in order to reach nonlinearity 488 for f. However, we need to concentrate on the cases where |Wb(ω)|S| ≥ 12. We have checked that this happens exactly when ω ∈ S1′ ∪ S3 ∪ S4, and all these values are either +12 or +16. Thus, as given in Construction 2, the Walsh spectrum of the function b should be +32 at these points. Hence Construction 2 provides 10-variable 1-resilient functions having nonlinearity 488. Using a technique similar to that of Theorem 5, it is possible to obtain a count of such functions. Note that we have not yet discussed the algebraic degree and autocorrelation properties of these functions. We now consider a specific case and check the algebraic degree and autocorrelation property. Take x = (x1, x2, x3, x4, x5, x6, x7, x8, x9, x10), X = (X1, X2, X3, X4, X5), and Y = (Y1, Y2, Y3, Y4, Y5). Further we identify X1 = x1, X2 = x2, X3 = x3, X4 = x4, X5 = x5, Y1 = x6, Y2 = x7, Y3 = x8, Y4 = x9, Y5 = x10. Consider a 10-variable Maiorana-McFarland type bent function b(x) = b(X, Y) = X · π(Y) + g(Y),
where π is a permutation on {0,1}^5 with π(Y) = Y and g is the constant-0 function on 5 variables. It can be checked that this bent function satisfies the conditions required in Construction 2. We then prepare f as given in Construction 2. We checked that the nonlinearity of f is 488, the algebraic degree is 8, and ∆f = 48. It is now important to note the following two points.
1. The construction in [12, Theorem 4] required 26 points to be modified to obtain a 1-resilient function from a bent function; here we need to modify only 22 points. Further, we have checked that the ∆f value of the function constructed in [12] is 64. The function we construct here has ∆f = 48, the best known value, achieved for the first time here.
2. The (10, 1, 8, 488) function was first constructed in [10], and we have checked that the ∆f value of that function is 320. Thus our construction provides better parameters.
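The whole 10-variable construction can be verified directly. The sketch below is ours; it assumes the points of S1 and S2 are read coordinate-wise with x1 as the most significant index bit (either consistent convention gives an affinely equivalent function, since b is invariant under reversing the variable order). It checks 1-resiliency, nl = 488 and ∆f = 48 as claimed.

```python
wt = lambda x: bin(x).count("1")

S1 = ["0000000000", "0000000001", "0000000010", "0000000100", "0000001000",
      "0000010000", "0000100000", "0001000000", "0010000000", "0100000000",
      "1000000000", "0000000110", "0000011000", "0000010001", "0001100000",
      "0011000000", "0110000000", "1100000000", "1000100000"]
S2 = ["1111111111", "1111111100", "1111110011"]
S = {int(s, 2) for s in S1 + S2}

# b(X, Y) = X.Y with pi = identity, g = 0; X = (x1..x5) = top 5 bits
b = [wt((x >> 5) & x) & 1 for x in range(1024)]
f = [b[x] ^ (x in S) for x in range(1024)]

def wht(tt):                      # fast Walsh-Hadamard transform
    a = [(-1) ** v for v in tt]
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

W = wht(f)
assert all(W[w] == 0 for w in range(1024) if wt(w) <= 1)   # 1-resilient
assert 2 ** 9 - max(map(abs, W)) // 2 == 488               # nonlinearity 488
D = max(abs(sum((-1) ** (f[x] ^ f[x ^ d]) for x in range(1024)))
        for d in range(1, 1024))
assert D == 48                                             # Delta_f = 48
```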
5 The 12-Variable Case
From Corollary 1 we find that dBR12(1) ≥ 42. However, it seems that it is not possible to construct a 1-resilient function by toggling 42 bits of a bent function. Instead, we succeeded in constructing a (12, 1, 10, 2000) function f with ∆f = 120 by toggling 44 points of a bent function. Thus, taking k = 44, we first have to find k/2 + 2^{n/2−2} = 38 distinct points. We select the all-zero input point and the twelve input points of weight one. Now there are C(12,2) = 66 input points of weight two; out of them we choose 38 − 13 = 25 points by trial and error. These points are 2560, 2304, 2176, 2112, 1280, 1152, 1088, 640, 576, 320, 1536, 384, 40, 36, 34, 33, 20, 18, 17, 10, 9, 5, 24, 6, 2080 when written as decimal integers corresponding to 12-bit binary numbers. We need a bent function that takes the value zero at these 38 input points. Next we take the six input points 4095, 3055, 3575, 3835, 3965, 4030; we need a bent function that takes the value one at these six points. Now we present the bent function. Take x = (x1, x2, ..., x12), X = (X1, X2, X3, X4, X5, X6), and Y = (Y1, Y2, Y3, Y4, Y5, Y6). Further we identify X1 = x1, ..., X6 = x6 and Y1 = x7, ..., Y6 = x12. Consider a 12-variable Maiorana-McFarland type bent function b(x) = b(X, Y) = X · π(Y) + g(Y), where π is a permutation on {0,1}^6 with π(Y) = Y, except for the cases π(1,1,1,1,1,0) = (1,1,1,1,1,1) and π(1,1,1,1,1,1) = (1,1,1,1,1,0). Here g is the constant-0 function on 6 variables. The construction presented in [12] requires 54 points to be toggled and achieves nonlinearity 1996; thus our construction is clearly better. Further, we get ∆f = 120 for the (12, 1, 10, 2000) function that we construct here. This is the best known autocorrelation parameter, which was not known earlier.
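Verifying the 12-variable parameters follows the same pattern on 2^12 points. As a minimal sketch (the bit convention, x1 as the most significant index bit, is our assumption), the following confirms that the modified-permutation function above is indeed bent, i.e., |Wb(ω)| = 2^6 = 64 everywhere; the full nonlinearity and autocorrelation check then proceeds exactly as in the 10-variable case once the 44 toggle points are encoded.

```python
wt = lambda x: bin(x).count("1")

def pi(y):
    # identity except swapping (1,1,1,1,1,0) <-> (1,1,1,1,1,1)
    return y ^ 1 if y in (0b111110, 0b111111) else y

b = [wt((x >> 6) & pi(x & 0x3F)) & 1 for x in range(4096)]

def wht(tt):                      # fast Walsh-Hadamard transform
    a = [(-1) ** v for v in tt]
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

spec = wht(b)
assert set(map(abs, spec)) == {64}   # bent on 12 variables
```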
6 Conclusion
In this paper we present a lower bound on the minimum distance dBRn(1) between bent and 1-resilient functions on n variables, where n is even. We have also shown that it is possible to obtain 1-resilient functions by modifying exactly dBRn(1) many bits for n = 4, 6, 8, 10, which shows that the minimum distance is tight in these cases. For the case n = 12 we could not prove that the bound is tight, as we needed to toggle at least 44 points of a bent function to obtain a 1-resilient function. The tightness of the bound for n ≥ 12 remains an open question, and to the best of our understanding the bound is really not tight there. The case n = 8 could be handled nicely, but the problem starts to become complicated from n = 10 and requires some computer simulation. Many open questions are still to be solved. First of all, a relatively hard question is to find the minimum distance between bent and m-resilient functions on n variables, which we may denote by dBRn(m). It seems natural that dBRn(n−2) > dBRn(n−3) > ... > dBRn(1), though this needs a proof. Note that the (n−2)-resilient functions on n variables are basically the affine functions, which are known to be at maximum distance from the bent functions [14]. The functions we provide here possess the currently best known parameters. The upper bound on the nonlinearity of 1-resilient functions is 2^{n−1} − 2^{n/2−1} − 4 for n even, as described in [16]. The tightness of this bound [16] has been shown up to n = 8. For n ≥ 10, there is no evidence of a 1-resilient function attaining that bound [16]. Our construction modifies dBRn(1) > 2^{n/2−1} many bits, and it seems unlikely that modifying that many bits will result in a fall of nonlinearity of only 4 for n ≥ 10.
Acknowledgement. The authors would like to thank the anonymous reviewer for important comments that improved the technical quality of the paper; the reviewer kindly pointed out some technical problems in the submitted version of this draft. The authors also thank Mr. Kishan Chand Gupta of the Indian Statistical Institute, Kolkata, for detailed discussions on the proof of Theorem 1.
References
[1] C. Carlet. On the coset weight divisibility and nonlinearity of resilient and correlation immune functions. In Sequences and Their Applications - SETA 2001, Discrete Mathematics and Theoretical Computer Science, pages 131–144. Springer Verlag, 2001.
[2] C. Carlet. A larger class of cryptographic Boolean functions via a study of the Maiorana-McFarland constructions. In Advances in Cryptology - CRYPTO 2002, number 2442 in Lecture Notes in Computer Science, pages 549–564. Springer Verlag, 2002.
[3] C. Carlet and P. Sarkar. Spectral domain analysis of correlation immune and resilient Boolean functions. Finite Fields and Their Applications, 8(1):120–130, January 2002.
[4] P. Charpin and E. Pasalic. On propagation characteristics of resilient functions. In SAC 2002, number 2595 in Lecture Notes in Computer Science, pages 175–195. Springer Verlag, 2003.
[5] J. Clark, J. Jacob, S. Stepney, S. Maitra and W. Millan. Evolving Boolean functions satisfying multiple criteria. In INDOCRYPT 2002, number 2551 in Lecture Notes in Computer Science, pages 246–259. Springer Verlag, 2002.
[6] J. F. Dillon. Elementary Hadamard Difference Sets. PhD Thesis, University of Maryland, 1974.
[7] H. Dobbertin. Construction of bent functions and balanced Boolean functions with high nonlinearity. In Fast Software Encryption, number 1008 in Lecture Notes in Computer Science, pages 61–74. Springer Verlag, 1994.
[8] M. Fedorova and Y. V. Tarannikov. On the constructing of highly nonlinear resilient Boolean functions by means of special matrices. In Progress in Cryptology - INDOCRYPT 2001, number 2247 in Lecture Notes in Computer Science, pages 254–266. Springer Verlag, 2001.
[9] X. Guo-Zhen and J. Massey. A spectral characterization of correlation immune combining functions. IEEE Transactions on Information Theory, 34(3):569–571, May 1988.
[10] S. Maitra and E. Pasalic. Further constructions of resilient Boolean functions with very high nonlinearity. IEEE Transactions on Information Theory, 48(7):1825–1834, July 2002.
[11] S. Maitra. Autocorrelation properties of correlation immune Boolean functions. In INDOCRYPT 2001, number 2247 in Lecture Notes in Computer Science, pages 242–253. Springer Verlag, December 2001.
[12] S. Maity and T. Johansson. Construction of cryptographically important Boolean functions. In INDOCRYPT 2002, number 2551 in Lecture Notes in Computer Science, pages 234–245. Springer Verlag, 2002.
[13] B. Preneel, W. Van Leekwijck, L. Van Linden, R. Govaerts, and J. Vandewalle. Propagation characteristics of Boolean functions. In Advances in Cryptology - EUROCRYPT'90, Lecture Notes in Computer Science, pages 161–173. Springer Verlag, 1991.
[14] O. S. Rothaus. On bent functions. Journal of Combinatorial Theory, Series A, 20:300–305, 1976.
[15] P. Sarkar and S. Maitra. Construction of nonlinear Boolean functions with important cryptographic properties. In Advances in Cryptology - EUROCRYPT 2000, number 1807 in Lecture Notes in Computer Science, pages 485–506. Springer Verlag, 2000.
[16] P. Sarkar and S. Maitra. Nonlinearity bounds and constructions of resilient Boolean functions. In Advances in Cryptology - CRYPTO 2000, number 1880 in Lecture Notes in Computer Science, pages 515–532. Springer Verlag, 2000.
[17] J. Seberry, X. M. Zhang, and Y. Zheng. Nonlinearly balanced Boolean functions and their propagation characteristics. In Advances in Cryptology - CRYPTO'93, pages 49–60. Springer Verlag, 1994.
[18] Y. V. Tarannikov. On resilient Boolean functions with maximum possible nonlinearity. In Progress in Cryptology - INDOCRYPT 2000, number 1977 in Lecture Notes in Computer Science, pages 19–30. Springer Verlag, 2000.
[19] Y. V. Tarannikov. New constructions of resilient Boolean functions with maximal nonlinearity. In Fast Software Encryption - FSE 2001, to be published in Lecture Notes in Computer Science, pages 70–81 (in preproceedings). Springer Verlag, 2001.
[20] Y. V. Tarannikov, P. Korolev and A. Botev. Autocorrelation coefficients and correlation immunity of Boolean functions. In ASIACRYPT 2001, Lecture Notes in Computer Science. Springer Verlag, 2001.
[21] Y. Zheng and X. M. Zhang. Improved upper bound on the nonlinearity of high order correlation immune functions. In Selected Areas in Cryptography - SAC 2000, number 2012 in Lecture Notes in Computer Science, pages 264–274. Springer Verlag, 2000.
[22] X. M. Zhang and Y. Zheng. GAC - the criterion for global avalanche characteristics of cryptographic functions. Journal of Universal Computer Science, 1(5):316–333, 1995.
[23] Y. Zheng and X. M. Zhang. On relationships among propagation degree, nonlinearity and correlation immunity. In Advances in Cryptology - ASIACRYPT'00, Lecture Notes in Computer Science. Springer Verlag, 2000.
Appendix A
We are not interested in the case n = 2, since there is no nonlinear 2-variable 1-resilient function. We now consider the cases n = 4, 6. Note that r = 0 in these two cases, and we arrive at dBR4(1) ≥ 4 and dBR6(1) ≥ 6. We have also checked that this bound is tight, since we can construct a 4-variable (respectively 6-variable) 1-resilient function by changing 4 (respectively 6) output points of a 4-variable (respectively 6-variable) bent function. For the 4-variable case, we have to take the rows of S ⊕ b(S) as {0001, 0010, 0100, 1000}, due to the constraint that the number of 1's in each column has to be 1 and there are at least 3 distinct rows. Thus, take a bent function with truth table 0000011001010011 and toggle the function at the inputs {(0,0,0,1), (0,0,1,0), (1,0,0,0), (1,0,1,1)}. Then we get a (4, 1, 2, 4) function with the truth table 0110011011000011. For the 6-variable case, take a bent function with truth table 0000000001011010001111000110011001101001001100110101010100001111 and toggle the outputs at the input points {(0,0,0,0,0,1), (0,0,0,0,1,0), (0,0,0,1,0,0), (0,0,1,0,0,0), (1,0,0,0,0,0), (1,0,1,1,1,1)}. Then we get a (6, 1, 4, 24) function with the truth table 0110100011011010001111000110011011101001001100100101010100001111.
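The two truth tables above can be checked mechanically. The sketch below reads both tables with f(0,...,0) first and x1 as the most significant index bit, which is the convention that matches the listed toggle points; the helper names are ours.

```python
wt = lambda x: bin(x).count("1")

def toggle(tt, points):
    out = list(map(int, tt))
    for p in points:                      # p is a bit tuple (x1, ..., xn)
        idx = int("".join(map(str, p)), 2)
        out[idx] ^= 1
    return "".join(map(str, out))

def nl_and_resilient(tt):
    n = len(tt).bit_length() - 1
    W = [sum((-1) ** (int(tt[x]) ^ (wt(x & w) & 1)) for x in range(2 ** n))
         for w in range(2 ** n)]
    res = all(W[w] == 0 for w in range(2 ** n) if wt(w) <= 1)
    return 2 ** (n - 1) - max(map(abs, W)) // 2, res

f4 = toggle("0000011001010011",
            [(0, 0, 0, 1), (0, 0, 1, 0), (1, 0, 0, 0), (1, 0, 1, 1)])
assert f4 == "0110011011000011"
assert nl_and_resilient(f4) == (4, True)        # a (4, 1, 2, 4) function

f6 = toggle("00000000010110100011110001100110"
            "01101001001100110101010100001111",
            [(0, 0, 0, 0, 0, 1), (0, 0, 0, 0, 1, 0), (0, 0, 0, 1, 0, 0),
             (0, 0, 1, 0, 0, 0), (1, 0, 0, 0, 0, 0), (1, 0, 1, 1, 1, 1)])
assert f6 == ("01101000110110100011110001100110"
              "11101001001100100101010100001111")
assert nl_and_resilient(f6) == (24, True)       # a (6, 1, 4, 24) function
```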
Appendix B
Note that in the Walsh spectrum of a bent function on 8 variables there are 120 values of +16 and 136 values of −16, or vice versa. It is known that even if this condition is satisfied by some spectrum, the inverse Walsh transform may not produce a Boolean function. We discuss that issue here.
Lemma 1. Consider a function b(x) on 8 variables with the properties:
1. b(x) = 0 for wt(x) ≤ 1 and b(x) = 1 for wt(x) = 8,
2. Wb(ω) = 16 for wt(ω) ≤ 3 and Wb(ω) = −16 for wt(ω) ≥ 6.
This function cannot be bent.

Proof. If such a function b were bent, then by Table 1 we would obtain a 1-resilient function with nonlinearity 120. This is a contradiction.

Corollary 2. Consider a function b(x) on 8 variables with the properties:
1. b(x) = 0 for wt(x) ≤ 3 and b(x) = 1 for wt(x) ≥ 6,
2. Wb(ω) = 16 for wt(ω) ≤ 1 and Wb(ω) = −16 for wt(ω) = 8.
This function cannot be bent.

Proof. The result follows from Lemma 1 and the duality property of bent functions.

Next we present an important result related to the existence of a balanced 8-variable function with nonlinearity 118.

Theorem 6. Take a bent function h(x) on 8 variables with the following properties:
1. h(x) = 0 for wt(x) ≤ 1 and h(x) = 1 for wt(x) = 8,
2. Wh(ω) = 16 for wt(ω) ≤ 2 and Wh(ω) = −16 for wt(ω) ≥ 6.
Define the set T = {x ∈ {0,1}^8 | wt(x) = 1}. Construct a function g(x) as:
g(x) = 1 ⊕ h(x), if x ∈ T;
g(x) = h(x), otherwise.
Then g is a balanced 8-variable function with nonlinearity 118.

Proof. The proof is similar to the proof of Theorem 4.
We have tried some heuristic search to find a bent function as described in Theorem 6, but could not find any. Obtaining such a bent function, or proving its nonexistence, is an interesting open question.
Results on Rotation Symmetric Bent and Correlation Immune Boolean Functions

Pantelimon Stănică1, Subhamoy Maitra2, and John A. Clark3

1
Mathematics Department, Auburn University Montgomery Montgomery, AL 36124-4023, USA
[email protected] 2 Applied Statistics Unit, Indian Statistical Institute 203, B T Road, Kolkata 700 108, INDIA
[email protected] 3 Department of Computer Science, University of York York YO10 3EE, England
[email protected]
Abstract. Recent research shows that the class of Rotation Symmetric Boolean Functions (RSBFs), i.e., the class of Boolean functions that are invariant under circular translation of indices, is potentially rich in functions of cryptographic significance. Here we present new results regarding the Rotation Symmetric (rots) correlation immune (CI) and bent functions. We present important data structures for efficient search strategy of rots bent and CI functions. Further, we prove the nonexistence of homogeneous rots bent functions of degree ≥ 3 on a single cycle. Keywords: Rotation Symmetric Boolean Function, Bent Functions, Balancedness, Nonlinearity, Autocorrelation, Correlation Immunity, Resiliency
1 Introduction
A variety of criteria for choosing Boolean functions with cryptographic applications (for secret key cryptosystems) have been identified: balancedness, nonlinearity, autocorrelation, correlation immunity, algebraic degree, etc. The trade-offs among these criteria have long received attention in the Boolean function literature (see [7] and the references therein). The more criteria that have to be taken into account, the more difficult it is to obtain a Boolean function satisfying all of them. It has been found recently that the class of RSBFs is extremely rich in cryptographically significant Boolean functions. These functions have been analyzed in [4], where the authors studied the nonlinearity of these Boolean functions up to 9 variables and found encouraging results. This study has been
This author is associated with the Institute of Mathematics “Simion Stoilow” of the Romanian Academy, Bucharest - Romania.
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 161–177, 2004. c International Association for Cryptologic Research 2004
Pantelimon Stănică et al.
extended in [15, 16], and important properties (beyond those of [4]) of these functions up to 8 variables have been demonstrated. Also, the enumeration of RSBFs of specific degree has been discussed in [15, 16]. On the other hand, in [11], Pieprzyk and Qu studied these functions as components in the rounds of a hashing algorithm, and research in this direction was later continued in [3]. The space of RSBFs on n variables is of size approximately 2^{2^n/n}, i.e., roughly the n-th root of the size 2^{2^n} of the total space. Thus any kind of search becomes comparatively easier, and it has been shown in [15] that it is easy to get a 7-variable, 2-resilient RSBF with nonlinearity 56, which had earlier been considered a function that is not easy to search for [10]. Moreover, these functions also possess the best known autocorrelation spectra. Thus it is important to present tools that can be used to efficiently search the space of RSBFs. We present important data structures, the matrices nA and nB, that make this search and the study of bent functions more efficient. Using these data structures, for the first time we could find 8-variable, 1-resilient, algebraic degree 6, nonlinearity 116, PC(1) functions with maximum absolute value 32 in the autocorrelation spectrum. Functions with such parameters have not been reported earlier. Moreover, interesting results are obtained for 9-variable correlation immune functions. The space of these functions in the rotation symmetric class is too large for exhaustive search; hence we exploited the simulated annealing technique to find these functions. The results found by simulated annealing are as follows: we could find 9-variable, 2-resilient functions of algebraic degree 6 and nonlinearity 240, and unbalanced 9-variable, 3rd order correlation immune functions of algebraic degree 5 and nonlinearity 240. Such functions have been posed as important open questions in [13, 14].
Note that the details of the simulated annealing technique are not included in this paper; they have been published in [2]. In this paper, we also analyze the RSBF class using combinatorial techniques in Section 3. We present enumerative results (based on constructive techniques) on balanced and correlation immune RSBFs. Further, we show that it is possible to transform a class of RSBFs into correlation immune functions, depending on the full rank of binary circulant matrices over $Z_2$. In [15], it was observed that there is no homogeneous rots (rotation symmetric) bent function of degree ≥ 3 up to 10 variables. Here we theoretically show the nonexistence of homogeneous rots bent functions of degree ≥ 3 generated by a single cycle for any (even) number of input variables ≥ 6.
2 Preliminaries
A Boolean function on n variables may be viewed as a mapping from $V_n = \{0,1\}^n$ into $\{0,1\}$. A Boolean function $f(x_1, \ldots, x_n)$ is also interpreted as the output column of its truth table $f$, i.e., a binary string of length $2^n$: $f = [f(0,0,\cdots,0), f(1,0,\cdots,0), f(0,1,\cdots,0), \ldots, f(1,1,\cdots,1)]$. The Hamming distance between $S_1, S_2$ is denoted by $d(S_1, S_2) = \#(S_1 \neq S_2)$. Also, the Hamming weight, or simply the weight, of a binary string S is the number
Results on Rotation Symmetric Bent
of ones in S. This is denoted by wt(S). An n-variable function f is said to be balanced if its output column in the truth table contains an equal number of 0's and 1's (i.e., $wt(f) = 2^{n-1}$). The addition operator over GF(2) is denoted by ⊕. An n-variable Boolean function $f(x_1, \ldots, x_n)$ can be considered a multivariate polynomial over GF(2). This polynomial can be expressed as a sum-of-products representation of all distinct k-th order products ($0 \le k \le n$) of the variables. More precisely, $f(x_1, \ldots, x_n)$ can be written as
$$a_0 \oplus \bigoplus_{1 \le i \le n} a_i x_i \oplus \bigoplus_{1 \le i < j \le n} a_{ij} x_i x_j \oplus \ldots \oplus a_{12\ldots n} x_1 x_2 \ldots x_n,$$
where the coefficients $a_0, a_i, a_{ij}, \ldots, a_{12\ldots n} \in \{0,1\}$. This representation of f is called the algebraic normal form (ANF) of f. The number of variables in the highest order product term with nonzero coefficient is called the algebraic degree, or simply the degree, of f, denoted deg(f). Take $0 \le b \le n$. An n-variable function is called nondegenerate on b variables if its ANF contains exactly b distinct input variables. A Boolean function is said to be homogeneous if its ANF contains terms of the same degree only. Functions of degree at most one are called affine functions. An affine function with constant term equal to zero is called a linear function. The set of all n-variable affine (respectively linear) functions is denoted by A(n) (respectively L(n)). The nonlinearity of an n-variable function f is $nl(f) = \min_{g \in A(n)} d(f, g)$, i.e., the distance from the set of all n-variable affine functions. Let $x = (x_1, \ldots, x_n)$ and $\omega = (\omega_1, \ldots, \omega_n)$ both belong to $\{0,1\}^n$, and let $x \cdot \omega = x_1\omega_1 \oplus \ldots \oplus x_n\omega_n$. Let f(x) be a Boolean function on n variables. Then the Walsh transform of f(x) is a real-valued function over $\{0,1\}^n$ defined as
$$W_f(\omega) = \sum_{x \in \{0,1\}^n} (-1)^{f(x) \oplus x \cdot \omega}.$$
In terms of the Walsh spectrum, the nonlinearity of f is given by
$$nl(f) = 2^{n-1} - \frac{1}{2} \max_{\omega \in \{0,1\}^n} |W_f(\omega)|.$$
In [5], an important characterization of correlation immune functions has been presented, which we use as the definition here. A function $f(x_1, \ldots, x_n)$ is m-th order correlation immune (respectively m-resilient) iff its Walsh transform satisfies $W_f(\omega) = 0$ for $1 \le wt(\omega) \le m$ (respectively $0 \le wt(\omega) \le m$). Following the notation of [13, 14], by an $(n, m, d, \sigma)$ function we denote an n-variable, m-resilient function with degree d and nonlinearity $\sigma$. Further, by an
$[n, m, d, \sigma]$ function we denote an unbalanced n-variable, m-th order correlation immune function with degree d and nonlinearity $\sigma$. Propagation Characteristics (PC) and the Strict Avalanche Criterion (SAC) [12] are important properties of Boolean functions to be used in S-boxes. Further, Zhang and Zheng [18] identified related cryptographic measures called Global Avalanche Characteristics (GAC). Let $\alpha \in \{0,1\}^n$ and let f be an n-variable Boolean function. Let us denote the autocorrelation value of the Boolean function f with respect to the vector $\alpha$ as
$$\Delta_f(\alpha) = \sum_{x \in \{0,1\}^n} (-1)^{f(x) \oplus f(x \oplus \alpha)},$$
and the absolute indicator
$$\Delta_f = \max_{\alpha \in \{0,1\}^n,\, \alpha \neq 0} |\Delta_f(\alpha)|.$$
A function is said to satisfy PC(k) if $\Delta_f(\alpha) = 0$ for $1 \le wt(\alpha) \le k$.

2.1 Rotation Symmetric Boolean Functions
Let $x_i \in \{0,1\}$ for $1 \le i \le n$. For $1 \le k \le n$, we define
$$\rho_n^k(x_i) = x_{i+k} \ \text{if } i+k \le n, \qquad \rho_n^k(x_i) = x_{i+k-n} \ \text{if } i+k > n.$$
Let $(x_1, x_2, \ldots, x_{n-1}, x_n) \in V_n$. We can extend the definition of $\rho_n^k$ to tuples and monomials as $\rho_n^k(x_1, x_2, \ldots, x_n) = (\rho_n^k(x_1), \rho_n^k(x_2), \ldots, \rho_n^k(x_n))$ and $\rho_n^k(x_{i_1} x_{i_2} \cdots) = \rho_n^k(x_{i_1}) \rho_n^k(x_{i_2}) \cdots$.

Definition 1. A Boolean function f is called Rotation Symmetric if for each input $(x_1, \ldots, x_n) \in \{0,1\}^n$, $f(\rho_n^k(x_1, \ldots, x_n)) = f(x_1, \ldots, x_n)$ for $1 \le k \le n$.

Following [15], let us denote $G_n(x_1, \ldots, x_n) = \{\rho_n^k(x_1, \ldots, x_n),$ for $1 \le k \le n\}$. Note that $G_n(x_1, \ldots, x_n)$ generates a partition of the set $V_n$. Let $g_n$ be the number of such partitions. Using Burnside's lemma, it can be shown (see also [15]) that the number of n-variable RSBFs is $2^{g_n}$, where
$$g_n = \frac{1}{n} \sum_{k|n} \phi(k)\, 2^{n/k},$$
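Burnside's count above can be cross-checked by enumerating the rotation orbits directly. The following is a small sketch (our own code, not part of the paper; function names are ours) that builds the partition of $V_n$ into rotation classes and counts $g_n$ and the per-weight class counts $g_{n,w}$:

```python
# Sketch: enumerate rotation orbits G_n(x) of {0,1}^n and cross-check
# Burnside's formula g_n = (1/n) * sum_{k|n} phi(k) * 2^{n/k}.
from math import gcd

def orbits(n):
    """Partition {0,1}^n into rotation orbits; each orbit is a set of
    integers, bit i of the integer holding variable x_{i+1}."""
    seen, parts = set(), []
    for v in range(2 ** n):
        if v in seen:
            continue
        orbit, w = set(), v
        for _ in range(n):                      # apply rho_n^k for k = 1..n
            w = ((w << 1) | (w >> (n - 1))) & ((1 << n) - 1)
            orbit.add(w)
        seen |= orbit
        parts.append(orbit)
    return parts

def g(n):                                       # number of orbits, by direct count
    return len(orbits(n))

def g_burnside(n):                              # closed form via Burnside's lemma
    phi = lambda k: sum(1 for j in range(1, k + 1) if gcd(j, k) == 1)
    return sum(phi(k) * 2 ** (n // k) for k in range(1, n + 1) if n % k == 0) // n

def g_w(n, w):                                  # orbits whose members have weight w
    return sum(1 for p in orbits(n) if bin(next(iter(p))).count("1") == w)
```

For example, `g(6)` returns 14, matching the 14 representatives $\Lambda_{6,i}$ tabulated in Section 3.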
$\phi$ being Euler's phi function. Further, the following results have been proved regarding n-variable RSBFs of some specific degree. The number of
(i) degree w homogeneous functions is $2^{g_{n,w}} - 1$,
(ii) the number of degree w functions is $(2^{g_{n,w}} - 1)\, 2^{\sum_{i=0}^{w-1} g_{n,i}}$, and
(iii) the number of functions with degree at most w is $2^{\sum_{i=0}^{w} g_{n,i}}$,
where $g_{n,w}$ is defined as follows (see also [15]). Consider $G_n(x_1, \ldots, x_n)$, where $wt(x_1, \ldots, x_n)$ is exactly w, and define $g_{n,w}$ as the number of partitions over the n-bit binary strings of weight w (total number $\binom{n}{w}$) determined by $G_n$. Further, denote by $h_{n,w}$ the number of distinct sets $G_n(x_1, \ldots, x_n)$ where $wt(x_1, \ldots, x_n) = w$ and $|G_n(x_1, \ldots, x_n)| = n$, that is, the number of long cycles of weight w. It is easy to see that $h_{n,w} \le g_{n,w}$. Write $k \,\|\, m$ if k ($1 < k \le m$) is a proper divisor of m. The following results were obtained in [15].
(i) $g_{n,w} = \frac{1}{n}\binom{n}{w}$, if $\gcd(n, w) = 1$. Also, $g_{n,0} = g_{n,n} = 1$.
(ii) $g_{n,w} = \frac{1}{n}\left(\binom{n}{w} - \sum_{k \,\|\, \gcd(n,w)} \frac{n}{k}\, h_{\frac{n}{k}, \frac{w}{k}}\right) + \sum_{k \,\|\, \gcd(n,w)} h_{\frac{n}{k}, \frac{w}{k}}$, if $w < n$.
Filiol and Fontaine [4] discussed the set of idempotent Boolean functions in an experimental setting. Let $B = (b_1, \ldots, b_n)$ be a basis of $F_2^n$ (which is identified with $F_{2^n}$). An idempotent f is a Boolean function on $F_{2^n}$ that satisfies $f^2 = f$. Define the Mattson–Solomon (MS) polynomial by
$$MS_f(Z) = \sum_{j=0}^{2^n - 2} A_j\, Z^{2^n - j - 1}, \quad \text{where } A_j = \sum_{i=0}^{2^n - 1} f(\alpha^i)\, \alpha^{ij},$$
where α is a primitive element of F2n . Using the representation f= f (g)(g) g∈F2∗n
(in the multiplicative algebra F2 [F2n , ×]), one gets that f is an idempotent iff f (g) = f (g 2 ), ∀ g; the coefficients of the MS polynomial belong to F2 ; Aj = Ak for all k in the 2-cyclotomic class of j ({j, 2j, . . . , 2n−1 j}); the ANF of f ´a(using n−1 a normal basis (γ, γ 2 , . . . , γ 2 ) remains invariant under circular shift. This gives that the corpus of idempotents is the same as the class of Rotation Symmetric Boolean functions. For n = 5, 7, they found idempotents of highest nonlinearity (12, respectively 56) of degrees 2, 3 (for n = 5), and degrees 2, 3, 4, 5, 6 (for n = 7). For n = 6, 8 they found all idempotents of highest nonlinearity (28, respectively 120), of degrees 2, 3, respectively, 2, 3, 4. They were not able to find all idempotent functions for n = 8, though. Finally, for n = 9, they found 1142395 functions (up to equivalence) with nonlinearity 240, some of which are balanced, of degrees 2, 3, 4, 5, 6, 7.
3 Study on RSBFs
Motivated by [4, 15], in this section we will investigate the richness of the RSBFs class in terms of cryptographic properties and present some important data
structures. The data structures will help in running the search algorithms very fast. In this direction we start with a few technical results. In the preliminaries, we defined $G_n(x_1, \ldots, x_n) = \{\rho_n^k(x_1, \ldots, x_n),$ for $1 \le k \le n\}$. As an example, for n = 4 we get the following partition of $\{0,1\}^n$:
G4(0,0,0,0) = {(0,0,0,0)};
G4(0,0,0,1) = {(0,0,0,1), (0,0,1,0), (0,1,0,0), (1,0,0,0)};
G4(0,0,1,1) = {(0,0,1,1), (0,1,1,0), (1,0,0,1), (1,1,0,0)};
G4(0,1,0,1) = {(0,1,0,1), (1,0,1,0)};
G4(0,1,1,1) = {(0,1,1,1), (1,0,1,1), (1,1,0,1), (1,1,1,0)};
G4(1,1,1,1) = {(1,1,1,1)}.
Note that there are $g_n$ such partitions, and the lexicographically first element of each part is taken as the representative element. We denote these representative elements by $\Lambda_{n,i}$, where i varies from 0 to $g_n - 1$ and the representative elements are again arranged lexicographically. That is, in the above example, $\Lambda_{4,0} = (0,0,0,0)$, $\Lambda_{4,1} = (1,0,0,0)$, $\Lambda_{4,2} = (1,1,0,0)$, $\Lambda_{4,3} = (1,0,1,0)$, $\Lambda_{4,4} = (1,1,1,0)$, $\Lambda_{4,5} = (1,1,1,1)$. By RSTT (rotation symmetric truth table) we mean the $g_n$-bit binary string $[f(\Lambda_{n,0}), f(\Lambda_{n,1}), \ldots, f(\Lambda_{n,g_n-1})]$, which gives the complete information of the function f when it is rots.

Lemma 1. Let $u, v \in \{0,1\}^n$ with $u \neq v$ and $u \in G_n(v)$. Let f be an n-variable RSBF. Then $W_f(u) = W_f(v)$, which implies that the Walsh spectrum of f can take at most $g_n$ distinct values.

Proof. First we show that for $a \in \{0,1\}$,
$$\sum_{x \in G_n(\Lambda_{n,i})} (-1)^{a \oplus x \cdot u} = \sum_{x \in G_n(\Lambda_{n,i})} (-1)^{a \oplus x \cdot v}.$$
Since $u \in G_n(v)$, we have $v = \rho_n^k(u)$ for some k. Now
$$\sum_{x \in G_n(\Lambda_{n,i})} (-1)^{a \oplus x \cdot u} = \sum_{x \in G_n(\Lambda_{n,i})} (-1)^{a \oplus \rho_n^k(x) \cdot \rho_n^k(u)} = \sum_{y \in G_n(\Lambda_{n,i})} (-1)^{a \oplus y \cdot v},$$
since rotation preserves the inner product and we may take $y = \rho_n^k(x)$, which runs over the same orbit. Then
$$W_f(u) = \sum_{x \in \{0,1\}^n} (-1)^{f(x) \oplus x \cdot u} = \sum_{i=0}^{g_n - 1} \sum_{x \in G_n(\Lambda_{n,i})} (-1)^{f(x) \oplus x \cdot u} = \sum_{i=0}^{g_n - 1} \sum_{x \in G_n(\Lambda_{n,i})} (-1)^{f(x) \oplus x \cdot v} = W_f(v),$$
using the above result and the fact that f is constant on each $G_n(\Lambda_{n,i})$.

Note that Lemma 1 helps to run any heuristic in a much smaller space. Now we define an important matrix, ${}_n A$, with respect to the set of n-variable RSBFs:
$${}_n A_{i,j} = \sum_{x \in G_n(\Lambda_{n,i})} (-1)^{x \cdot \Lambda_{n,j}}.$$
See the following example corresponding to 6-variable case.
i:     0      1      2      3      4      5      6
Λ6,i:  000000 000001 000011 000101 000111 001001 001011
i:     7      8      9      10     11     12     13
Λ6,i:  001101 001111 010101 010111 011011 011111 111111

        1  1  1  1  1  1  1  1  1  1  1  1  1  1
        6  4  2  2  0  2  0  0 −2  0 −2 −2 −4 −6
        6  2  2 −2  2 −2 −2 −2  2 −6 −2 −2  2  6
        6  2 −2  2 −2 −2 −2 −2 −2  6  2 −2  2  6
        6  0  2 −2  0 −6  0  0 −2  0  2  6  0 −6
        3  1 −1 −1 −3  3  1  1 −1 −3 −1  3  1  3
        6  0 −2 −2  0  2 −4  4  2  0  2 −2  0 −6
6A =    6  0 −2 −2  0  2  4 −4  2  0  2 −2  0 −6
        6 −2  2 −2 −2 −2  2  2  2  6 −2 −2 −2  6
        2  0 −2  2  0 −2  0  0  2  0 −2  2  0 −2
        6 −2 −2  2  2 −2  2  2 −2 −6  2 −2 −2  6
        3 −1 −1 −1  3  3 −1 −1 −1  3 −1  3 −1  3
        6 −4  2  2  0  2  0  0 −2  0 −2 −2  4 −6
        1 −1  1  1 −1  1 −1 −1  1 −1  1  1 −1  1

This matrix is of size $g_n \times g_n$. Now for an n-variable RSBF f we have
$$W_f(\omega) = \sum_{x \in \{0,1\}^n} (-1)^{f(x) \oplus x \cdot \omega} = \sum_{i=0}^{g_n-1} \sum_{x \in G_n(\Lambda_{n,i})} (-1)^{f(x) \oplus x \cdot \omega} = \sum_{i=0}^{g_n-1} (-1)^{f(\Lambda_{n,i})} \sum_{x \in G_n(\Lambda_{n,i})} (-1)^{x \cdot \Lambda_{n,j}},$$
if $\omega \in G_n(\Lambda_{n,j})$. Thus $W_f(\Lambda_{n,j}) = \sum_{i=0}^{g_n-1} (-1)^{f(\Lambda_{n,i})}\, {}_n A_{i,j}$. To summarize, we have the following result.

Proposition 1. $W_f(\Lambda_{n,j}) = \sum_{i=0}^{g_n-1} (-1)^{f(\Lambda_{n,i})}\, {}_n A_{i,j}$.

In terms of Proposition 1, we can list the following.

Lemma 2. Let f be an n-variable RSBF.
1. $nl(f) = 2^{n-1} - \frac{1}{2} \max_{\Lambda_{n,j},\, 0 \le j \le g_n-1} \left| \sum_{i=0}^{g_n-1} (-1)^{f(\Lambda_{n,i})}\, {}_n A_{i,j} \right|$.
2. f is balanced iff $\sum_{i=0}^{g_n-1} (-1)^{f(\Lambda_{n,i})}\, {}_n A_{i,0} = 0$.
3. f is m-th order CI (respectively m-resilient) iff
$$\sum_{i=0}^{g_n-1} (-1)^{f(\Lambda_{n,i})}\, {}_n A_{i,j} = 0, \ \text{for } 1 \text{ (respectively } 0\text{)} \le wt(\Lambda_{n,j}) \le m.$$
4. f is bent iff
$$\sum_{i=0}^{g_n-1} (-1)^{f(\Lambda_{n,i})}\, {}_n A_{i,j} = \pm 2^{n/2} \ \text{for } 0 \le j \le g_n - 1.$$
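Proposition 1 and Lemma 2 reduce all Walsh computations to the $g_n \times g_n$ matrix ${}_n A$. A minimal sketch of this computation (our code, not the authors' implementation; we take the numerically smallest pattern of each orbit as its representative, which may only permute rows and columns relative to the paper's ordering):

```python
# Sketch: build nA_{i,j} = sum_{x in G_n(Lambda_{n,i})} (-1)^{x . Lambda_{n,j}}
# and evaluate W_f(Lambda_{n,j}) from an RSTT, as in Proposition 1.

def rot_orbit(v, n):
    """Rotation orbit of the n-bit pattern v (an integer)."""
    out, mask = set(), (1 << n) - 1
    for _ in range(n):
        v = ((v << 1) | (v >> (n - 1))) & mask
        out.add(v)
    return out

def representatives(n):
    """One representative per orbit, in increasing numeric order."""
    reps, seen = [], set()
    for v in range(2 ** n):
        if v not in seen:
            orb = rot_orbit(v, n)
            reps.append(min(orb))
            seen |= orb
    return reps

def matrix_A(n):
    reps = representatives(n)
    orbs = [rot_orbit(r, n) for r in reps]
    dot = lambda x, y: bin(x & y).count("1") & 1        # x . y over GF(2)
    return [[sum((-1) ** dot(x, lj) for x in orb) for lj in reps]
            for orb in orbs]

def walsh_from_rstt(rstt, n):
    """W_f(Lambda_{n,j}) for all j, via Proposition 1."""
    A = matrix_A(n)
    g = len(A)
    return [sum((-1) ** rstt[i] * A[i][j] for i in range(g)) for j in range(g)]
```

Column 0 of ${}_n A$ holds the orbit sizes, which is exactly why item 2 of Lemma 2 characterizes balancedness.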
Theorem 1. The number of balanced RSBFs is exactly $2\pi_n$, where $\pi_n$ is the number of partitions of the space $V_n$ as $V_n = A_n \cup B_n$, where $A_n$ and $B_n$ have the same cardinality and both consist of complete cycles of any length. Further, if $n = p$ is an odd prime, then the number of balanced RSBFs is $2 \cdot \binom{(2^p-2)/p}{(2^{p-1}-1)/p}$; if $n = p^a$ ($a > 1$) and p is an odd prime, then the number of balanced RSBFs is $2\pi_n$, with
$$\pi_n \ge \binom{x}{x/2} \cdot \prod_{i=1}^{a-1} \binom{x_i}{x_i/2}, \quad \text{where } x_i = \frac{2^{p^i} - 2^{p^{i-1}}}{p^i}, \text{ and }$$
$$x = p^{-a}\left(2^{p^a} + \sum_{j=1}^{a} \phi(p^j) \cdot 2^{p^{a-j}}\right) - \sum_{i=1}^{a-1} x_i - 2.$$
Proof. Using item 2 of Lemma 2, to determine balanced RSBFs it suffices to find the RSBFs satisfying $\sum_{i=0}^{g_n-1} (-1)^{f(\Lambda_{n,i})}\, {}_n A_{i,0} = 0$. According to the definition, ${}_n A_{i,0} = \sum_{x \in G_n(\Lambda_{n,i})} (-1)^{x \cdot \Lambda_{n,0}} = \#G_n(\Lambda_{n,i})$. Since the values $(-1)^{f(\Lambda_{n,i})}$ are $\pm 1$, and f is constant on $G_n(v)$ for any v, we get the first claim.
If $n = p$ is prime, the number of long cycles is $h_p = \frac{2^p - 2}{p}$ and the number of short cycles is 2 (the trivial ones) (see Subsection 2.1). Therefore, to partition $V_n = A_n \cup B_n$ (with $A_n$, $B_n$ having the same number of elements), we need to place a short cycle in each of $A_n$, $B_n$, and the remaining $p \cdot h_p$ elements must be placed half in $A_n$ and half in $B_n$ (keeping cycles together). That can be done in $\binom{(2^p-2)/p}{(2^{p-1}-1)/p}$ ways. The second claim is proved.
If $n = p^a$ ($a > 1$), the number of short cycles of length $p^i$ (for any $i = 1, \ldots, a-1$) is $x_i = (2^{p^i} - 2^{p^{i-1}})/p^i$ (see Subsection 2.1). For each i, we can put half of the cycles in $A_n$ and half in $B_n$. The same can be done with the long cycles. Since the number of long cycles is x, the result is proved.

For example, consider the 4-variable balanced RSBFs. We have $V_4 = G_4(\Lambda_{4,0}) \cup G_4(\Lambda_{4,1}) \cup G_4(\Lambda_{4,2}) \cup G_4(\Lambda_{4,3}) \cup G_4(\Lambda_{4,4}) \cup G_4(\Lambda_{4,5})$. Now consider $W_4 = G_4(\Lambda_{4,0}) \cup G_4(\Lambda_{4,3}) \cup G_4(\Lambda_{4,5})$. Hence $V_4 = W_4 \cup G_4(\Lambda_{4,1}) \cup G_4(\Lambda_{4,2}) \cup G_4(\Lambda_{4,4})$. Therefore a balanced RSBF must output 1 exactly on two of $W_4$, $G_4(\Lambda_{4,1})$, $G_4(\Lambda_{4,2})$, $G_4(\Lambda_{4,4})$. Hence $\pi_4 = 3$ and there are 6 balanced RSBFs on 4 variables.

The reason we do not exhaust all possibilities in the second part of the previous theorem is that we can get a different partition of $V_n$ satisfying the requirements by placing more short cycles in $A_n$ (or $B_n$), as long as one ends up with the same number of elements in $A_n$ and $B_n$.

Note that we have defined $\rho_n^k(x_{i_1} x_{i_2} \cdots) = \rho_n^k(x_{i_1}) \rho_n^k(x_{i_2}) \cdots$ in Subsection 2.1. By abuse of notation, let us denote $G_n(x_{i_1} x_{i_2} \ldots x_{i_l}) = \{\rho_n^k(x_{i_1} x_{i_2} \ldots x_{i_l}),$ for $1 \le k \le n\}$.
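The example above ($\pi_4 = 3$, six balanced RSBFs on 4 variables) can be confirmed by brute force over the $2^{g_n}$ RSTTs. A sketch under no assumptions beyond the definitions (our code):

```python
# Sketch: count balanced RSBFs by assigning one output bit per rotation orbit
# and checking that the resulting truth table has weight 2^{n-1}.
from itertools import product

def rot_orbits(n):
    mask, seen, parts = (1 << n) - 1, set(), []
    for v in range(2 ** n):
        if v in seen:
            continue
        orb, w = set(), v
        for _ in range(n):
            w = ((w << 1) | (w >> (n - 1))) & mask
            orb.add(w)
        seen |= orb
        parts.append(orb)
    return parts

def count_balanced_rsbf(n):
    parts = rot_orbits(n)
    count = 0
    for bits in product((0, 1), repeat=len(parts)):     # one bit per orbit = RSTT
        weight = sum(b * len(p) for b, p in zip(bits, parts))
        if weight == 2 ** (n - 1):                      # balanced truth table
            count += 1
    return count
```

For n = 3 (an odd prime) this also reproduces the closed form $2\binom{(2^3-2)/3}{(2^2-1)/3} = 4$.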
We select the representative element of $G_n(x_{i_1} x_{i_2} \ldots x_{i_l})$ as the lexicographically first element. As an example, the representative element of $\{x_1x_2x_3, x_2x_3x_4, x_3x_4x_1, x_4x_1x_2\}$ is $x_1x_2x_3$. Note that the variable $x_1$ always occurs in the lexicographically first element (the representative element). We now define the short algebraic normal form (SANF) of an RSBF. An RSBF $f(x_1, \ldots, x_n)$ can be written as
$$a_0 + a_1 x_1 + \sum a_{1j}\, x_1 x_j + \ldots + a_{12\ldots n}\, x_1 x_2 \ldots x_n,$$
where the coefficients $a_0, a_1, a_{1j}, \ldots, a_{12\ldots n} \in \{0,1\}$, and the existence of a representative term $x_1 x_{i_2} \ldots x_{i_l}$ implies the existence of all the terms from the set $G_n(x_1 x_{i_2} \ldots x_{i_l})$ in the ANF. This representation of f is called the short algebraic normal form (SANF) of f. Note that the number of terms in each summation ($\sum$) corresponding to same-degree terms depends on the number of short and long cycles. As an example, consider the ANF of a 4-variable RSBF, $x_1 + x_2 + x_3 + x_4 + x_1x_2x_3 + x_2x_3x_4 + x_3x_4x_1 + x_4x_1x_2$. Its SANF is $x_1 + x_1x_2x_3$.
We can easily identify a monomial $x_{i_1} x_{i_2} \cdots x_{i_k}$ with a binary string of length n in which the positions $i_1, i_2, \ldots, i_k$ contain '1' and the remaining positions contain '0'. By abuse of notation we associate the n-bit patterns with monomials. It is clear that, for a rotation symmetric Boolean function, the monomials in $G_n(\Lambda_{n,i})$ are either all present in the ANF or all absent. Let us define another matrix ${}_n B$ as
$${}_n B_{i,j} = \bigoplus_{e \in G_n(\Lambda_{n,j})} e\big|_{\Lambda_{n,i}},$$
where $e\big|_{\Lambda_{n,i}}$ denotes the value of the monomial e at the input point $\Lambda_{n,i}$. That is, we take an RSBF (say h) whose monomials all come from a single Rotation Symmetric group (say the one represented by $\Lambda_{n,j}$). Then we check the value of h at the representative input point $\Lambda_{n,i}$ and put that value in the location ${}_n B_{i,j}$, which thus contains either 0 or 1. Given ${}_n B$ and the SANF of an RSBF, one can directly get the RSTT of the RSBF. The example ${}_6 B$ is as follows.

1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0 0 0 0 0
1 0 0 1 0 0 0 0 0 0 0 0 0 0
1 1 0 1 1 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 0 0 0 0 0 0
1 1 1 1 0 1 1 0 0 0 0 0 0 0
1 1 1 1 0 1 0 1 0 0 0 0 0 0
1 0 1 0 0 1 1 1 1 0 0 0 0 0
1 1 0 1 0 0 0 0 0 1 0 0 0 0
1 0 0 1 1 1 1 1 0 1 1 0 0 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0
1 1 0 0 1 0 1 1 0 1 0 1 1 0
1 0 0 0 0 1 0 0 0 0 0 1 0 1
Note that the matrices ${}_n A$, ${}_n B$ help to perform the search much faster than a naive Boolean function implementation.

3.1 Correlation Immune (CI) and Resilient RSBFs
We start our discussion with the construction of 1st order CI RSBFs. Note that the second column of the matrix ${}_n A$ is instrumental in the analysis of first order CI functions. From item 3 of Lemma 2 we get that f is 1st order CI iff $\sum_{i=0}^{g_n-1} (-1)^{f(\Lambda_{n,i})}\, {}_n A_{i,j} = 0$ for $wt(\Lambda_{n,j}) = 1$, i.e., when $j = 1$, i.e., $\Lambda_{n,j} = \Lambda_{n,1}$. Note that ${}_n A_{i,1} = \frac{n - 2\,wt(\Lambda_{n,i})}{k}$ for cycles of length $\frac{n}{k}$, where k ($1 \le k \le n$) is a divisor of n. See the second column of ${}_6 A$ as an example. Thus we have the following result.

Theorem 2. An n-variable Rotation Symmetric Boolean function f is 1st order CI iff $\sum_{i=0}^{g_n-1} \frac{n - 2\,wt(\Lambda_{n,i})}{k_i}\, (-1)^{f(\Lambda_{n,i})} = 0$, where $|G_n(\Lambda_{n,i})| = \frac{n}{k_i}$.

Based on this we present the following enumerative result when n is prime.

Corollary 1. There are at least $2 \prod_{w=1}^{(n-1)/2} \sum_{k=0}^{g_{n,w}} \binom{g_{n,w}}{k}^2$ many 1st order CI RSBFs on n variables, where n is an odd prime. In this case, $g_{n,w} = \frac{1}{n}\binom{n}{w}$.
Proof. For n prime we know that $g_n = \frac{2^n - 2}{n} + 2$. There are $\frac{2^n - 2}{n}$ full cycles and two trivial short cycles (all zero and all one). Thus it is clear that ${}_n A_{i,1} = n - 2\,wt(\Lambda_{n,i})$ for $1 \le i \le g_n - 2$, and ${}_n A_{0,1} = 1$, ${}_n A_{g_n-1,1} = -1$. Note that ${}_n A_{i_1,1} = -{}_n A_{i_2,1}$ when $wt(\Lambda_{n,i_1}) = n - wt(\Lambda_{n,i_2})$. Now consider an assignment of 0 or 1 values at the output corresponding to the $g_{n,w}$ classes where $wt(\Lambda_{n,i_1}) = w$. We have to put the same number of 0's and 1's at the outputs corresponding to the $g_{n,n-w}$ classes where $wt(\Lambda_{n,i_2}) = n - w$. The two trivial cycles should also have the same value at the output, either both zero or both one. This satisfies the condition $\sum_{i=0}^{g_n-1} (-1)^{f(\Lambda_{n,i})}\, {}_n A_{i,1} = 0$, i.e., $W_f(\Lambda_{n,1}) = 0$, i.e., f is 1st order CI. Hence the number of possible options is $2 \times \prod_{w=1}^{(n-1)/2} \left( \sum_{k=0}^{g_{n,w}} \binom{g_{n,w}}{k} \cdot \binom{g_{n,w}}{k} \right)$.

Note that a similar strategy can be exploited for higher order correlation immune or resilient RSBFs. However, in those cases the analysis will be more involved.

3.2 A Large Subclass of RSBFs that Are Transformable to 1st Order CI Functions
We first investigate the independence of the vectors of a full cycle, i.e., the vectors in $G_n(\Lambda_{n,i})$ when $|G_n(\Lambda_{n,i})| = n$.

Lemma 3. Consider the elements of $G_n(\Lambda_{n,i})$ for some i, where $|G_n(\Lambda_{n,i})| = n$. Let $\Lambda_{n,i} = (a_1, a_2, \ldots, a_n)$ be of weight w, and let the positions of 1's in $\Lambda_{n,i}$ be $s_1 = 1, s_2, \ldots, s_w$. The vectors in $G_n(\Lambda_{n,i})$ are linearly dependent (over $Z_2$) iff there is an n-th root of unity $\mu$ such that $1 + \mu^{s_2} + \cdots + \mu^{s_w} = 0$ over $Z_2$.
Proof. The set $\{(a_1, a_2, \ldots, a_n), (a_n, a_1, \ldots, a_{n-1}), \ldots\}$ is linearly dependent over $Z_2$ if and only if the circulant matrix
$$circ(a_1, a_2, \ldots, a_n) = \begin{pmatrix} a_1 & a_2 & a_3 & \ldots & a_n \\ a_n & a_1 & a_2 & \ldots & a_{n-1} \\ a_{n-1} & a_n & a_1 & \ldots & a_{n-2} \\ \vdots & & & & \vdots \\ a_2 & a_3 & a_4 & \ldots & a_1 \end{pmatrix}$$
has zero determinant over $Z_2$. We observe that the matrix is circulant, and it is known that the determinant of a circulant matrix is given by
$$\det(circ(a_1, a_2, \ldots, a_n)) = \prod_{\mu} \left( a_1 + a_2\mu + a_3\mu^2 + \cdots + a_n\mu^{n-1} \right),$$
where the product runs over all n of the n-th roots of unity. Since the $a_i$'s are 1 in the positions described by the $s_j$'s and 0 elsewhere, we get that
$$\det(circ(a_1, a_2, \ldots, a_n)) = \prod_{\mu} \left( 1 + \mu^{s_2} + \cdots + \mu^{s_w} \right),$$
which is zero if and only if one of the factors is zero, that is, iff there exists an n-th root of unity $\mu$ such that $1 + \mu^{s_2} + \cdots + \mu^{s_w} = 0$ (over $Z_2$).

Corollary 2. Take n arbitrary. If $wt(\Lambda_{n,i})$ is even, then the full cycle generated by $\Lambda_{n,i}$ is dependent.

Proof. We have $\det(circ(\Lambda_{n,i})) = \prod_{\mu}(1 + \mu^{s_2} + \cdots + \mu^{s_w}) = 0$, since $1 + \mu^{s_2} + \cdots + \mu^{s_w} = 0$ (in $Z_2$) for $\mu = 1$ (which is an n-th root of unity for any n).

Now we present some examples. Take the cycle generated by (1, 1, 0, 0) in $V_4$. The circulant determinant is $\det(circ(1,1,0,0)) = \prod_{\mu}(1 + \mu) = 0$, since $\mu = -1$ is one of the 4th roots of unity. Another example is the cycle generated over $V_6$ by (1, 1, 1, 0, 1, 0). We have $\det(circ(1,1,1,0,1,0)) = \prod_{\mu}\left(1 + \mu + \mu^2 + \mu^4\right) = 0$, since $\mu = 1$ (a 6th root of unity) satisfies $1 + \mu + \mu^2 + \mu^4 = 0$ over $Z_2$. On the other hand, the full cycle generated in $V_6$ by (1, 1, 0, 0, 1, 0) is linearly independent.

Corollary 3. Let n be a positive integer, and let p be the least odd prime occurring in the factorization of n. Take $\Lambda_{n,i}$ (a generator of a full cycle) of odd weight w with $s_w \le p - 2$. Then the full cycle generated by $\Lambda_{n,i}$ is independent.

Proof. As before, under the above conditions, if we have dependence then there is an n-th root of unity $\mu$ such that $P(\mu) = 0$, where $P(x) = x^{s_w} + \cdots + x^{s_2} + 1$. Since w is odd, $\mu \neq \pm 1$. There exists $k \mid n$ such that $\mu$ is a primitive k-th root of unity. Therefore the cyclotomic polynomial $\Phi_k(x)$ divides P(x) over $Z_2$ (see [6], Ch. 2 & 3). If $k < p$, then k must be a power of 2, say $2^l$ (since k is a divisor of n, and p is the least odd prime dividing n). But that is impossible, since then $\Phi_k(x)$ would divide $x^{2^l} - 1$, so (over $Z_2$) $1 = \mu^{2^l} = \mu$. Therefore $k \ge p$.
Assume $k = 2^l p_1^{\alpha_1} \cdots p_r^{\alpha_r}$. If $r \ge 1$, then $p_i \ge p$, so $\phi(k) \ge \phi(p_i) = p_i - 1 \ge p - 1$. But the degree of P(x) is at most $p - 2$, and that of $\Phi_k(x)$ is greater than or equal to $p - 1$. That is a contradiction. If $r = 0$, then $k = 2^l > p$, and the previous case's argument applies.
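Lemma 3 and the examples above can be reproduced numerically by computing the rank of the binary circulant matrix over $Z_2$. A sketch (our code, not the authors'):

```python
# Sketch: test linear independence of a full cycle by the GF(2) rank of
# circ(a_1,...,a_n), as in Lemma 3.

def gf2_rank(rows):
    """Rank over GF(2) of row vectors encoded as integers."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            lsb = pivot & -pivot                # eliminate this bit elsewhere
            rows = [r ^ pivot if r & lsb else r for r in rows]
    return rank

def circulant_rows(bits):
    """Rows of circ(a_1,...,a_n), each encoded as an integer."""
    row, rows = list(bits), []
    for _ in range(len(bits)):
        rows.append(int("".join(map(str, row)), 2))
        row = [row[-1]] + row[:-1]              # next row: rotate right by one
    return rows

def full_cycle_independent(bits):
    return gf2_rank(circulant_rows(bits)) == len(bits)
```

This confirms that (1, 1, 0, 0) and (1, 1, 1, 0, 1, 0) generate dependent cycles while (1, 1, 0, 0, 1, 0) generates an independent one, matching the examples in the text.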
Corollary 3 is the best we can get in that direction, as we see by taking the cycles in $V_{14}$ generated by $(1,1,1,1,1,0,0,0,\ldots)$ and $(1,1,1,1,1,1,1,0,\ldots)$. Here the prime 7 is the least odd prime dividing n = 14. The weight w and $s_w$ of the first generator are both 5, and its cycle is independent; the weight w and $s_w$ of the second generator are both 7, and its cycle is dependent. With the background of Lemma 3, Corollary 2 and Corollary 3, we present the following result.

Theorem 3. Let f be an n-variable RSBF with $W_f(\Lambda_{n,j}) = 0$ for some j such that $G_n(\Lambda_{n,j})$ contains n independent vectors. Then the function f can be transformed into a 1st order correlation immune function g, which may or may not be an RSBF. Further, if f is balanced, i.e., $W_f(0) = 0$, then g is 1-resilient.

Proof. Given the set of n independent vectors at which the values of the Walsh spectrum are 0, it is possible to apply a linear transformation on the function f to get a function g which is 1st order correlation immune (using the methods of [8]). Note that, after the linear transformation, the Rotation Symmetric property of g is not guaranteed.

Theorem 3 presents a simple method to get 1st order CI or 1-resilient functions easily from RSBFs satisfying some conditions. Moreover, the combinatorially interesting point is that the conditions are related to the full rank of binary circulant matrices over $Z_2$ and n-th roots of unity, as described in Lemma 3.

3.3 Search for Important Functions
Recall the notation $g_{n,w}$ from Subsection 2.1. It is clear that for an RSBF, the $\binom{n}{w}$ many monomials of degree w are partitioned into $g_{n,w}$ many groups, and the monomials of each group are either present or absent together. Now the search technique works as follows.
1. Choose a candidate RSBF (say f) represented by its SANF.
2. Use ${}_n B$ to get the RSTT of f from the SANF.
3. Use ${}_n A$ and the RSTT of f to analyze the Walsh spectrum of f.
Let us now consider the (8, 1, 6, 116) functions. These functions are of considerable interest, as evident from [7, 1, 9]. Note that so far there has been no evidence of (8, 1, 6, 116) functions with the PC(1) property. We show here that there are such functions in the RSBF class. We consider f(0) = 0, and there cannot be any term of degree 7 or 8 in the ANF. Thus we need to take any combination from the $\sum_{i=1}^{5} g_{8,i}$ groups and at least one group from the $g_{8,6}$ groups. This search space is of size $2^{\sum_{i=1}^{5} g_{8,i}} (2^{g_{8,6}} - 1)$. Note that $g_{8,1} = 1$, $g_{8,2} = g_{8,6} = 4$, $g_{8,3} = g_{8,5} =$
7, $g_{8,4} = 10$. Thus we need to search a space of size $2^{29}(2^4 - 1) \approx 2^{33}$; the search needed little more than a day on a Pentium 1.6 GHz computer with 256 MB RAM running the Linux 7.2 operating system. We searched the complete space and found 10272 such functions. The $\Delta_f$ (autocorrelation) values of the functions are 32 (2176 many), 40 (1024 many), 48 (128 many), 64 (6688 many) and 128 (256 many). Next we searched the set of these 10272 functions for the propagation property. There are 2672 such functions. The $\Delta_f$ values of these functions are 32 (384 many), 40 (256 many), 64 (1936 many) and 128 (96 many). Thus we have the following theorem.

Theorem 4. There are 10272 many (8, 1, 6, 116) RSBFs f with f(0) = 0. Among them there are 2672 many (8, 1, 6, 116) RSBFs which are also PC(1), and out of these, 384 many functions have $\Delta_f$ value as low as 32.

The following is the truth table (in hex) of an (8, 1, 6, 116), PC(1) RSBF with $\Delta_f = 32$:
0055 6267 7d59 2d7a 3be6 32c3 4da2 3bcc 0f8b fd3c 5a49 b05a 31f6 c94c 5e9a e4a0

Next we concentrate on 9-variable functions. As we discuss, it will be clear that even though the search space is reduced, it is not possible to go for an exhaustive search. Thus we attempted heuristic search using simulated annealing; the details of simulated annealing are not included in this paper and have been published in [2].

Let us consider the (9, 2, 6, 240) functions with f(0) = 0. There cannot be any term of degree 7, 8 or 9. Thus we need to take any combination from the $\sum_{i=1}^{5} g_{9,i}$ groups and at least one group from the $g_{9,6}$ groups. Now $g_{9,1} = 1$, $g_{9,2} = 4$, $g_{9,3} = g_{9,6} = 10$, $g_{9,4} = g_{9,5} = 14$. Thus the search space is of size $2^{\sum_{i=1}^{5} g_{9,i}}(2^{g_{9,6}} - 1) = 2^{43}(2^{10} - 1) \approx 2^{53}$. With the current computational facility this search would be extremely time consuming. Hence we attempted heuristic search in this case and succeeded in getting such functions. Note that this function was posed as an important open question in [13, 14]. The best functions that had been achieved earlier [13] are (9, 2, 6, 232) and (9, 2, 5, 240), i.e., the first one has smaller nonlinearity (than the upper bound 240) when the algebraic degree is maximum, and the second one has smaller algebraic degree (than the upper bound 6) when the nonlinearity is maximum.

Next we consider the (9, 3, 5, 240) functions with f(0) = 0. There cannot be any term of degree 6, 7, 8 or 9. Thus we need to take any combination from the $\sum_{i=1}^{4} g_{9,i}$ groups and at least one group from the $g_{9,5}$ groups. Thus the search space is of size $2^{\sum_{i=1}^{4} g_{9,i}}(2^{g_{9,5}} - 1) = 2^{29}(2^{14} - 1) \approx 2^{43}$. Though this search space is not extremely large, with our current implementation it is expected to take almost 3 years to complete the search on a single Pentium 1.6 GHz computer with 256 MB RAM running Linux 7.2. Hence we attempted heuristic search, but could not succeed. Instead we could achieve unbalanced [9, 3, 5, 240] functions, which were also not known earlier.
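The parameters reported above can in principle be re-derived from the given hex truth table with a short script that computes nonlinearity, resiliency order and the absolute indicator from a truth table. The following is our own sketch; in particular, the bit ordering we assume for the hex string (most-significant bit first) is a guess and may need to be reversed to reproduce the claimed values:

```python
# Sketch: Walsh spectrum, nonlinearity, resiliency order and absolute
# indicator of a Boolean function given as a 0/1 truth table.

def walsh(tt):
    """W_f(w) for all w; index m encodes the input, x1 in the least
    significant bit (an internal convention of this sketch)."""
    dot = lambda x, y: bin(x & y).count("1") & 1
    return [sum((-1) ** (tt[x] ^ dot(x, w)) for x in range(len(tt)))
            for w in range(len(tt))]

def nonlinearity(tt):
    return len(tt) // 2 - max(abs(v) for v in walsh(tt)) // 2

def resiliency(tt):
    """Largest m with W_f(w) = 0 for all 0 <= wt(w) <= m; -1 if unbalanced."""
    W, n = walsh(tt), len(tt).bit_length() - 1
    m = -1
    for order in range(n + 1):
        if any(W[w] for w in range(len(tt)) if bin(w).count("1") == order):
            break
        m = order
    return m

def delta_f(tt):
    """Absolute indicator: max |autocorrelation| over nonzero shifts."""
    N = len(tt)
    return max(abs(sum((-1) ** (tt[x] ^ tt[x ^ a]) for x in range(N)))
               for a in range(1, N))

# The (8,1,6,116) example from the text, as a 256-bit truth table
# (bit-order assumption is ours):
hex_tt = "005562677d592d7a3be632c34da23bcc0f8bfd3c5a49b05a31f6c94c5e9ae4a0"
tt8 = [int(b) for b in bin(int(hex_tt, 16))[2:].zfill(256)]
```

As a sanity check, the 3-variable majority function has nonlinearity 2 and is balanced but not 1-resilient, which the functions above reproduce.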
4 Rotation Symmetric Bent Functions
Let us now discuss a sieving strategy for rots bent functions. Given the matrix ${}_n A$, a rots bent function needs to satisfy the bent condition, item 4 of Lemma 2. Thus the idea is to get the RSTT of the function, which can be seen as a column of $g_n$ elements. Now one needs to calculate $\sum_{i=0}^{g_n-1} (-1)^{f(\Lambda_{n,i})}\, {}_n A_{i,j}$ and check whether this is equal to $\pm 2^{n/2}$ for $0 \le j \le g_n - 1$. The first time it fails for some j, we terminate checking that function and go to the next. This gives very good performance for search strategies.

At the time of the search we can consider that b(0) = 0 and that the function is free from linear terms. Moreover, for a bent function, the maximum possible algebraic degree is $\frac{n}{2}$. Here the matrix ${}_n B$ comes into play. We need to consider only those columns of ${}_n B$ where $2 \le wt(\Lambda_{n,j}) \le \frac{n}{2}$. Then we choose all the linear combinations of those columns and search for the bent functions. Thus the algorithm needs to check $2^{\sum_{i=2}^{n/2} g_{n,i}} - 1$ combinations, as we ignore the all-zero combination. Note that in this case, once we get any $\sum_{i=0}^{g_n-1}(-1)^{f(\Lambda_{n,i})}\, {}_n A_{i,j}$ not equal to $\pm 2^{n/2}$ for some $0 \le j \le g_n - 1$, we need not check the function further for bentness and move to the next function. Thus the process of sieving is much faster.

Filiol and Fontaine [4] counted all the bent functions b on 8 variables where b(0) = 0 and b is free from linear terms. There are 3776 such functions, and in total $3776 \times 4 = 15104$ many. With the matrices ${}_8 A$, ${}_8 B$ and our sieving method we need just one minute on a Pentium 1.6 GHz computer with 256 MB RAM running the Linux 7.2 operating system. The number of functions to be checked is $2^{\sum_{i=2}^{4} g_{8,i}} - 1 = 2^{21} - 1$ for n = 8. Note that $g_{10} = 108$ and $g_{10,2} = 5$, $g_{10,3} = 12$, $g_{10,4} = 22$, $g_{10,5} = 26$. Thus the search required is over $2^{65} - 1$ functions and, with the current computational facility, it is not possible to exhaust this set easily. That is the reason some kind of heuristic search is required in this case, and we found a good number of bent functions in each attempt using simulated annealing. We can also increase the speed of the algorithm by noting that there cannot be any single-cycle rots bent function of degree ≥ 3.

In [15] it has been observed that, up to 10 variables, there is no rotation symmetric homogeneous bent function of degree > 2, and it has been conjectured that this is true for any even n. Our result on single-cycle rots bent functions provides a partial answer to that.

We have already denoted $V_n = \{0,1\}^n$. For a Boolean function $f : V_{2n} \to V_1$, let $k_i$ (i = 1, ..., 4) be the number of inputs x with f(x) = 1 in each of the four quarters of the truth table of f. If S is a bit string, by $(S)_u$ or $S_u$ we shall mean the string obtained by concatenating u copies of S. The concatenation of two strings u, v will be denoted by uv or u|v. Further, $\bar{h}$ is the complement of h, and for a fixed integer d, $\hat{h}$ is equal to h (a bit string in $V_s$) with the last $2^{s-d}$ bits of its truth table complemented. Let A = 0,0,1,1; B = 0,1,0,1; C = 0,1,1,0; D = 0,0,0,0; U = 1,0,0,0; V = 0,0,0,1; X = 0,1,0,0; Y = 0,0,1,0. The following result was a central proposition in [17].
Proposition 2. Let $f : V_{2n} \to V_1$ be a bent Boolean function (not necessarily homogeneous) with corresponding $k_i$ (i = 1, 2, 3, 4). Then (i) three of the $k_i$'s are equal and one is different, and (ii) $\min(k_1, k_2, k_3, k_4) \ge 2^{2n-3} - 2^{n-1}$.

The following lemma (Lemma 11 of [3]) turns out to be quite useful. It gives the truth table of every monomial of arbitrary degree.

Lemma 4 ([3]). The truth table of any monomial $x_{i_1} \cdots x_{i_s}$ of degree s is
$$\left(D_{2^{n-i_1-2}} \cdots \left(D_{2^{n-i_s-2}}\, \bar{D}_{2^{n-i_s-2}}\right)_{2^{i_s-i_{s-1}-1}} \cdots \right)_{2^{i_1-1}}, \quad \text{if } 1 \le i_1 < \cdots < i_s \le n-2; \tag{1}$$
$$\left(D_{2^{n-i_1-2}} \cdots \left(D_{2^{n-i_{s-1}-2}}\, M_{2^{n-i_{s-1}-2}}\right)_{2^{i_{s-1}-i_{s-2}-1}} \cdots \right)_{2^{i_1-1}},$$
where M = A or B according as $i_s = n-1$, respectively $i_{s-1} < n-1$ and $i_s = n$; and
$$\left(D_{2^{n-i_1-2}} \cdots \left(D_{2^{n-i_{s-2}-2}}\, V_{2^{n-i_{s-2}-2}}\right)_{2^{i_{s-2}-i_{s-3}-1}} \cdots \right)_{2^{i_1-1}}, \quad \text{if } i_{s-1} = n-1 \text{ and } i_s = n.$$
Theorem 5. There are no homogeneous RSBFs with a single full cycle of degree $d \ge 3$ on $V_n$ ($n \ge 6$ even) that are bent.
Proof. Any full one-cycle RSBF is affinely equivalent to the RSBF f generated by $x_1x_2\cdots x_d$. We now show that the first quarter of the truth table of f has weight strictly smaller than the lower bound of Proposition 2(ii), thus contradicting Proposition 2; therefore f is not bent. An immediate application of Lemma 4 gives that, for $i \le n-d-2$, the truth table of $x_ix_{i+1}\cdots x_{i+d}$ is $\bigl(D_{2^{n-i-2}} \cdots (D_{2^{n-i-d-2}}\,\bar D_{2^{n-i-d-2}})\bigr)_{2^{i-1}}$, that of $x_{n-d}\cdots x_{n-2}x_{n-1}$ is $(D_{2^{d-2}} \cdots (DA))_{2^{n-d-1}}$, and that of $x_{n-d+1}\cdots x_{n-1}x_n$ is $(D_{2^{d-3}} \cdots (DV))_{2^{n-d}}$. Therefore the first quarter of the truth table of f is given by the first quarter of
$$\sum_{i=1}^{n-d-2} \bigl(D_{2^{n-i-2}} \cdots (D_{2^{n-i-d-2}}\,\bar D_{2^{n-i-d-2}})\bigr)_{2^{i-1}} + (D_{2^{d-1}-1}A)_{2^{n-d-1}} + (D_{2^{d-2}-1}V)_{2^{n-d}}$$
$$= \sum_{i=1}^{n-d-2} \bigl(D_{2^{n-i-1}-2^{n-i-d-2}}\,\bar D_{2^{n-i-d-2}}\bigr)_{2^{i-1}} + (D_{2^{d-2}-1}V\,D_{2^{d-2}-1}Y)_{2^{n-d-1}}. \quad (2)$$
To see that this is so, observe that the only terms missing are $x_1x_{n-d+2}\cdots x_{n-1}x_n + \cdots$. But all of these contain $x_1$, $x_{n-1}$, and $x_n$. Therefore, in all the missing terms, $i_1 = 1$, $i_{s-1} = n-1$, $i_s = n$, so the last case of Lemma 4 implies that they all have 0s in the first quarter of their truth table; thus these terms contribute nothing to the weight of the first quarter of f. For ease of writing, denote the first quarter of the truth table of f (on $V_n$) by $h_d^{n-2}$. Let $n = d+2$ and consider $h_d^d$. Since the first quarter of the truth table of f (on $V_n$), that is $h_d^d$, is obtained by setting the last two variables $x_{d+1} = x_{d+2} = 0$, and since the degree is d, it follows easily that $h_d^d$ is nonzero only for $x_1 = x_2 = \cdots = x_d = 1$, that is, $h_d^d = D_{2^{d-2}-1}V$. Inductively on s, by using
176
Pantelimon Stănică et al.
the displayed relation (2), we obtain the recurrence $h_d^s = h_d^{s-1}\hat h_d^{s-1}$ (write the displayed relation (2) for $s-1$ and for s, and look at how the first quarter of that expression for s changes from the expression for $s-1$; this is why we needed the definition of $\hat h$, to express that change). As an example, let $d = 3$, and let f be the RSBF generated by $x_1x_2x_3$. Write $fq(f)$ for the first quarter of f. If $n = 5$, then $fq(f) = 00000001 = DV$; if $n = 6$, then $fq(f) = DV\,\widehat{(DV)} = DV\,DY$; if $n = 7$, then $fq(f) = DV\,DY\,\widehat{(DV\,DY)}$. When d is fixed we shall write $h_d^s$ simply as $h^s$. Using the recurrence and Maple (a trademark of Waterloo Maple) we easily obtained the sequence of weights of $h_d^n$ for the first few values of n, say $d \le n \le d+10$:

n:          d   d+1  d+2  d+3  d+4  d+5  d+6  d+7  d+8  d+9   d+10
wt(h_d^n):  1   2    6    14   32   72   156  336  712  1496  3120     (3)
Fixing d, and using the recurrence $h^s = h^{s-1}\hat h^{s-1}$, we get
$$h^s = h^{s-1}h^{s-2}h^{s-3}\bar h^{s-4}\bar{\hat h}{}^{s-4} \quad\text{and}\quad \hat h^s = h^{s-1}h^{s-2}\bar h^{s-3}h^{s-4}\hat h^{s-4}.$$
Therefore, denoting by $w^s$ the weight of $h^s$ and by $\hat w^s$ the weight of $\hat h^s$, we arrive at the identities $\hat w^s = 2w^{s-1} + 2w^{s-2} - w^s + 2^{s-2}$ and $w^s = w^{s-1} + \hat w^{s-1}$. We deduce ($s \ge 6$)
$$wt(h^s) = 2\left(wt(h^{s-2}) + wt(h^{s-3})\right) + 2^{s-3}. \quad (4)$$
Next we want to prove that $wt(h_d^s) < wt(h_3^s) < 2^{s-1} - 2^{\frac{s+2}{2}}$, for $s \ge 5$ and $d > 3$. From these inequalities we derive the theorem. The first inequality on weights follows easily from the recursive definition of $h_d^s$. The second inequality will be proved by induction. If $s = 5$, then $wt(h_3^5) = 6 < 2^4 - 2^3 = 8$; if $s = 6$, then $wt(h_3^6) = 14 < 2^5 - 2^4 = 16$; if $s = 7$, then $wt(h_3^7) = 32 < 2^6 - 2^4 = 48$. So the claim certainly holds in these cases. Assume the inequality true for all values from 5 to $n-1$. Now, for dimension n,
$$wt(h^{n-2}) = 2\left(wt(h^{n-4}) + wt(h^{n-5})\right) + 2^{n-5} \le 2\left(2^{n-5} - 2^{\frac{n-2}{2}} + 2^{n-6} - 2^{\frac{n-3}{2}}\right) + 2^{n-5} = 2^{n-3} - 2^{\frac{n-2}{2}+1} - 2^{\frac{n-3}{2}+1} < 2^{n-3} - 2^{\frac{n}{2}},$$
since $2^{\frac{n-2}{2}+1} + 2^{\frac{n-3}{2}+1} > 2^{\frac{n}{2}}$.
References

[1] J. Clark, J. Jacob, S. Stepney, S. Maitra and W. Millan. Evolving Boolean Functions Satisfying Multiple Criteria. In INDOCRYPT 2002, volume 2551 of Lecture Notes in Computer Science, pages 246–259. Springer-Verlag, 2002.
[2] J. Clark, J. Jacob, S. Maitra and P. Stănică. Almost Boolean Functions: The Design of Boolean Functions by Spectral Inversion. In CEC 2003, the 2003 Congress on Evolutionary Computation, volume 3, pages 2173–2180. IEEE Press, December 8–12, 2003, Canberra, Australia.
[3] T. W. Cusick and P. Stănică. Fast Evaluation, Weights and Nonlinearity of Rotation-Symmetric Functions. Discrete Mathematics, 258(1–3):289–301, 2002.
[4] E. Filiol and C. Fontaine. Highly nonlinear balanced Boolean functions with a good correlation-immunity. In Advances in Cryptology – EUROCRYPT '98. Springer-Verlag, 1998.
[5] X. Guo-Zhen and J. Massey. A spectral characterization of correlation immune combining functions. IEEE Transactions on Information Theory, 34(3):569–571, May 1988.
[6] R. Lidl and H. Niederreiter. Introduction to Finite Fields and Their Applications. Cambridge University Press, 1994.
[7] S. Maitra and E. Pasalic. Further constructions of resilient Boolean functions with very high nonlinearity. IEEE Transactions on Information Theory, 48(7):1825–1834, July 2002.
[8] S. Maitra and P. Sarkar. Cryptographically significant Boolean functions with five-valued Walsh spectra. Theoretical Computer Science, 276(1–2):133–146, 2002.
[9] S. Maity and T. Johansson. Construction of Cryptographically Important Boolean Functions. In INDOCRYPT 2002, volume 2551 of Lecture Notes in Computer Science, pages 234–245. Springer-Verlag, 2002.
[10] E. Pasalic, S. Maitra, T. Johansson and P. Sarkar. New constructions of resilient and correlation immune Boolean functions achieving upper bound on nonlinearity. In Workshop on Coding and Cryptography – WCC 2001, Paris, January 8–12, 2001. Electronic Notes in Discrete Mathematics, Volume 6, Elsevier Science, 2001.
[11] J. Pieprzyk and C. X. Qu. Fast Hashing and Rotation-Symmetric Functions. Journal of Universal Computer Science, 5(1):20–31, 1999.
[12] B. Preneel, W. Van Leekwijck, L. Van Linden, R. Govaerts, and J. Vandewalle. Propagation characteristics of Boolean functions. In Advances in Cryptology – EUROCRYPT '90, Lecture Notes in Computer Science, pages 161–173. Springer-Verlag, 1991.
[13] P. Sarkar and S. Maitra. Construction of nonlinear Boolean functions with important cryptographic properties. In Advances in Cryptology – EUROCRYPT 2000, number 1807 in Lecture Notes in Computer Science, pages 485–506. Springer-Verlag, May 2000.
[14] P. Sarkar and S. Maitra. Nonlinearity bounds and construction of resilient Boolean functions. In Mihir Bellare, editor, Advances in Cryptology – CRYPTO 2000, volume 1880 of Lecture Notes in Computer Science, pages 515–532. Springer-Verlag, Berlin, 2000.
[15] P. Stănică and S. Maitra. Rotation Symmetric Boolean Functions – Count and Cryptographic Properties. In R. C. Bose Centenary Symposium on Discrete Mathematics and Applications, December 2002. Electronic Notes in Discrete Mathematics, Volume 15, Elsevier.
[16] P. Stănică and S. Maitra. A constructive count of Rotation Symmetric functions. Information Processing Letters, 88:299–304, 2003.
[17] T. Xia, J. Seberry, J. Pieprzyk and C. Charnes. Homogeneous bent functions of degree n in 2n variables do not exist for n > 3. Discrete Mathematics, to appear.
[18] X.-M. Zhang and Y. Zheng. GAC – the criterion for global avalanche characteristics of cryptographic functions. Journal of Universal Computer Science, 1(5):316–333, 1995.
A Weakness of the Linear Part of Stream Cipher MUGI

Jovan Dj. Golić

System on Chip, Telecom Italia Lab, Telecom Italia
Via Reiss Romoli 274, I-10148 Turin, Italy
[email protected]
Abstract. The linearly updated component of the stream cipher MUGI, called the buffer, is analyzed theoretically by using the generating function method. In particular, it is proven that the intrinsic response of the buffer, without the feedback from the nonlinearly updated component, consists of binary linear recurring sequences with small linear complexity 32 and with extremely small period 48. It is then shown how this weakness can in principle be used to facilitate the linear cryptanalysis of MUGI with two main objectives: to reconstruct the secret key and to find linear statistical distinguishers. Keywords: Stream ciphers, combiners with memory, linear finite-state machines, linear cryptanalysis
1 Introduction
MUGI is a specific keystream generator for stream cipher applications proposed in [8]. Due to its design rationale, it is suitable for software implementations. Efficient hardware implementations in terms of speed are also possible, but are more complex with respect to the gate count than the usual designs based on linear feedback shift registers (LFSRs), nonlinear combining functions, and irregular clocking. MUGI has been evaluated within the CRYPTREC project of the Information-technology Promotion Agency for possible electronic government applications in Japan. It is interesting to note that, in mathematical terms, the structure of MUGI, as well as of PANAMA, which is a stream cipher of a similar type previously proposed in [1], is essentially that of a combiner with memory, which is a well-known type of keystream generator (see [5]). Specific features are the following:

– the nonlinear combining function has a large internal memory size and is based on a round function of the block cipher AES [2]
– the driving linear finite-state machine (LFSM) providing input to the combining function is not an LFSR with a primitive connection polynomial
– the LFSM receives feedback from a part of the internal memory of the combining function
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 178–192, 2004. © International Association for Cryptologic Research 2004
– the output at a given time is a binary word taken from the internal memory of the combining function and then bitwise added to the plaintext word to produce the ciphertext word. A security analysis of MUGI is presented in [9] and [10]. The main claims are essentially that MUGI is not vulnerable to common attacks on block ciphers and also to some attacks on stream ciphers. In particular, the linear cryptanalysis method for block ciphers [7] is adapted to deal with MUGI and to show that particular linear approximations cannot be effective for MUGI. However, some general methods for analyzing stream ciphers based on combiners with memory, most notably the so-called linear cryptanalysis of stream ciphers [3], [4], [5], and [6], are not addressed. Since MUGI can essentially be regarded as a combiner with memory, such methods are in principle also applicable to MUGI. Also, the underlying LFSM of MUGI, the so-called buffer, is not analyzed in [9] and [10]. Recall that linear cryptanalysis of product block ciphers composed of an iterated round function is based on the fact that the input and output to the block cipher are known, in the known plaintext scenario, and that all the intermediate outputs are unknown. Linear cryptanalysis of stream ciphers is essentially different from the linear cryptanalysis of block ciphers because of the underlying iterative structure in which the initial state is unknown and the output sequence produced from a sequence of internal states is known, in the known plaintext scenario. It essentially consists in finding linear relations among the unknown internal variables, possibly conditioned on the known output sequence, that hold with probabilities different from one half. 
It has two main objectives:

– to reconstruct the initial state of the keystream generator as well as the secret key
– to derive a linear statistical distinguisher which can distinguish the output sequence from a purely random sequence, defined as a sequence of mutually independent uniformly distributed random variables.

The main objective of this paper is to show that the linear part of MUGI, that is, the buffer, is analyzable and that it is surprisingly weak. The second objective is to investigate how this weakness can in principle be used to facilitate the cryptanalysis of MUGI, especially the linear cryptanalysis. The paper is organized as follows. Section 2 contains a brief description of MUGI. The analysis of the LFSM of MUGI is presented in Section 3, with some elements shown in the Appendix; a related transformation that eliminates the LFSM from the underlying system of nonlinear recurrences is given in Section 4; and the framework for the linear cryptanalysis of MUGI is outlined in Section 5. Section 6 contains a summary of the established weaknesses of MUGI and problems for future investigation.
2 Description of MUGI
A concise description of MUGI is specified here in as much detail as needed for the analysis. More details can be found in [8].
The keystream generator is essentially a combiner with memory, with the specific properties described in Section 1. It is a finite-state machine (FSM) whose internal state has two components:

– a linearly updated component, called buffer, $b = b_0b_1\cdots b_{15}$, where each $b_i$ is a 64-bit word; the size of this component is 1024 bits
– a nonlinearly updated component, called state, $a = a_0a_1a_2$, where each $a_i$ is a 64-bit word; the size of this component is 192 bits.

The next-state or update function is invertible and has two components, $\phi = (\rho, \lambda)$, where $\rho$ updates a and $\lambda$ updates b, that is, $(a^{(t+1)}, b^{(t+1)}) = (\rho(a^{(t)}, b^{(t)}), \lambda(a^{(t)}, b^{(t)}))$. The $\rho$ component is a nonlinear function defined in terms of an invertible $(64\times64)$-bit function F by a kind of Feistel structure. More precisely:
$$a_0^{(t+1)} = a_1^{(t)}$$
$$a_1^{(t+1)} = a_2^{(t)} \oplus F(a_1^{(t)} \oplus b_4^{(t)}) \oplus C_1 \quad (1)$$
$$a_2^{(t+1)} = a_0^{(t)} \oplus F(a_1^{(t)} \oplus {}^{<17}b_{10}^{(t)}) \oplus C_2.$$
The $\rho$ function is invertible for any given $b_4^{(t)}$ and ${}^{<17}b_{10}^{(t)}$. Here, for a 64-bit word x, ${}^{<i}x$ and ${}^{>i}x$ denote the rotations of x by i bits to the left and right, respectively. The function F is derived from the round function of AES and is a composition of (G, G) and a permutation of 8 bytes, where the $(32\times32)$-bit function G is a composition of a parallel combination of 4 $(8\times8)$-bit S-boxes and a linear $(32\times32)$-bit function, MixColumn, of AES (see [2]). The byte permutation is (4,5,2,3,0,1,6,7) and $C_1$ and $C_2$ are 64-bit constants. The $\lambda$ component is a linear function defined by the following equations:
$$b_i^{(t+1)} = b_{i-1}^{(t)}, \quad i \ne 0, 4, 10$$
$$b_0^{(t+1)} = b_{15}^{(t)} \oplus a_0^{(t)} \quad (2)$$
$$b_4^{(t+1)} = b_3^{(t)} \oplus b_7^{(t)}$$
$$b_{10}^{(t+1)} = b_9^{(t)} \oplus {}^{<32}b_{13}^{(t)}.$$
The $\lambda$ function is invertible for any given $a_0^{(t)}$. The 64-bit output of the keystream generator at time t is defined as $a_2^{(t)}$. The initial internal state of the keystream generator is produced from a 128-bit secret key K and a 128-bit initialization vector IV in three stages, by using the keystream generator itself. In the first two stages only $\rho$ is used and in the third stage both $\rho$ and $\lambda$ are used. At the beginning of the third stage, the initial value of a, $a^{(0)}$, depends on both K and IV, but the initial value of the buffer b, $b^{(0)}$, depends on K only. The third stage consists of iterating $\phi$ (i.e., both $\rho$ and $\lambda$) 15 times, without producing the output $(a_2^{(t)})_{t=0}^{15}$, and the keystream generation starts from the 16-th iteration on.
Fig. 1. The buffer as LFSM

3 Analysis of Buffer
In this section, the buffer is analyzed as a non-autonomous LFSM with one input sequence, namely, $a_0 = (a_0^{(t)})_{t=0}^{\infty}$. The input sequence and all the internal sequences in the buffer are 64-bit sequences. Our objective is to derive expressions for the internal sequences in the buffer in terms of the input sequence $a_0$ and the initial state of the buffer, $b^{(0)} = b_0^{(0)}b_1^{(0)}\cdots b_{15}^{(0)}$. In view of the $\lambda$ update function, the 16 internal sequences in the buffer can be divided into three groups, the sequences in each group being phase shifts of each other (see Fig. 1, where $R_j$ denotes the rotation by j bits to the left, which is a linear transformation of a 64-bit word).

3.1 Linear Recurrences
From the $\lambda$ update function, we directly obtain the following linear recurrences, all holding for $t \ge 1$:
$$b_4^{(t)} = b_4^{(t-4)} \oplus b_0^{(t-4)}$$
$$b_{10}^{(t)} = R_{32}b_{10}^{(t-4)} \oplus b_4^{(t-6)} \quad (3)$$
$$b_0^{(t)} = a_0^{(t-1)} \oplus b_{10}^{(t-6)}.$$
In vectorial notation, where vectors are represented as one-column matrices, $R_{32}$ is represented as a matrix. The initial state of the buffer can now be represented as $(b_0^{(t)})_{t=-3}^{0}(b_4^{(t)})_{t=-5}^{0}(b_{10}^{(t)})_{t=-5}^{0}$. Then, by eliminating $b_0$, we obtain
$$b_4^{(t)} = b_4^{(t-4)} \oplus b_{10}^{(t-10)} \oplus a_0^{(t-5)}, \quad t \ge 5$$
$$b_{10}^{(t)} = b_4^{(t-6)} \oplus R_{32}b_{10}^{(t-4)}, \quad t \ge 1. \quad (4)$$
This is a system of two 64-bit linear recurrences (that is, 128 binary linear recurrences) in terms of 64-bit sequences b4 and b10 , where a0 is regarded as a given 64-bit sequence.
3.2 Generating Functions
The system can be solved by using the generating function technique, dealing with the z-transforms of 64-bit sequences. In vectorial notation, the z-transforms or generating functions of $b_4$, $b_{10}$, and $a_0$ are defined as the formal power series
$$B_4 = \sum_{t=0}^{\infty} b_4^{(t)}z^t, \quad B_{10} = \sum_{t=0}^{\infty} b_{10}^{(t)}z^t, \quad A_0 = \sum_{t=0}^{\infty} a_0^{(t)}z^t, \quad (5)$$
respectively. It is shown in the Appendix how to convert the system (4) of linear recurrences from the time domain into the generating function domain. Namely, letting I denote the $64\times64$ identity matrix, we thus obtain the representation
$$(1 \oplus z^4)B_4 \oplus z^{10}B_{10} = z^5A_0 \oplus \Delta_1 \quad (6)$$
$$z^6B_4 \oplus (I \oplus z^4R_{32})B_{10} = \Delta_2 \quad (7)$$
where
$$\Delta_1 = b_4^{(0)} \oplus z^4b_0^{(0)} \oplus \sum_{t=1}^{3}\left(b_4^{(t-4)} \oplus b_0^{(t-4)}\right)z^t \oplus \sum_{t=5}^{9}b_{10}^{(t-10)}z^t \quad (8)$$
$$\Delta_2 = \sum_{t=1}^{5}b_4^{(t-6)}z^t \oplus b_{10}^{(0)} \oplus R_{32}\sum_{t=1}^{3}b_{10}^{(t-4)}z^t \quad (9)$$
are 64-dimensional vectors ($64\times1$ matrices) whose elements are polynomials in z defined by the initial state of the buffer and whose degrees are at most 9 and 5, respectively. Essentially, we obtain a system of 128 linear equations with coefficients being polynomials in z and with unknowns being the 128 generating functions of the 64 binary sequences in $b_4$ and the 64 binary sequences in $b_{10}$.

3.3 Solution
The system has a unique solution, which can be found in the following way. First, by elimination we obtain
$$F(z)B_4 = z^5(I \oplus z^4R_{32})A_0 \oplus (I \oplus z^4R_{32})\Delta_1 \oplus z^{10}\Delta_2 \quad (10)$$
$$F(z)B_{10} = z^{11}A_0 \oplus z^6\Delta_1 \oplus (1 \oplus z^4)\Delta_2 \quad (11)$$
where
$$F(z) = I \oplus z^4(I \oplus R_{32}) \oplus z^8R_{32} \oplus z^{16}I \quad (12)$$
denotes a $64\times64$ matrix whose coefficients are polynomials in z of degree at most 16.
When regarded over the field of rational functions in z, F(z) is invertible, as is seen from the following equation:
$$F(z)F(z) = \left(I \oplus z^4(I \oplus R_{32}) \oplus z^8R_{32} \oplus z^{16}I\right)^2 = I \oplus z^8(I \oplus R_{32}^2) \oplus z^{16}R_{32}^2 \oplus z^{32}I = I \oplus z^8(I \oplus I) \oplus z^{16}I \oplus z^{32}I = (1 \oplus z \oplus z^2)^{16}I \quad (13)$$
because $R_{32}^2 = I$. Thus we get that $F(z)/f(z)$ is the inverse of $F(z)$, where
$$f(z) = 1 \oplus z^{16} \oplus z^{32} = (1 \oplus z \oplus z^2)^{16} = \frac{1 \oplus z^{48}}{1 \oplus z^{16}}. \quad (14)$$
Accordingly, we obtain the solution for the generating functions $B_4$ and $B_{10}$ in the form
$$B_4 = \frac{1}{f(z)}z^5G(z)A_0 \oplus \frac{1}{f(z)}\Delta_1' = \frac{1}{1 \oplus z^{48}}z^5(1 \oplus z^{16})G(z)A_0 \oplus \frac{1}{1 \oplus z^{48}}(1 \oplus z^{16})\Delta_1' \quad (15)$$
$$B_{10} = \frac{1}{f(z)}z^{11}F(z)A_0 \oplus \frac{1}{f(z)}\Delta_2' = \frac{1}{1 \oplus z^{48}}z^{11}(1 \oplus z^{16})F(z)A_0 \oplus \frac{1}{1 \oplus z^{48}}(1 \oplus z^{16})\Delta_2' \quad (16)$$
where
$$G(z) = F(z)(I \oplus z^4R_{32}) = (1 \oplus z \oplus z^2 \oplus z^3 \oplus z^4)^4I \oplus z^{20}R_{32} \quad (17)$$
denotes a $64\times64$ matrix whose coefficients are polynomials in z of degree at most 20, and
$$\Delta_1' = G(z)\Delta_1 \oplus z^{10}F(z)\Delta_2 \quad (18)$$
$$\Delta_2' = F(z)\left(z^6\Delta_1 \oplus (1 \oplus z^4)\Delta_2\right) \quad (19)$$
are 64-dimensional vectors ($64\times1$ matrices) whose elements are polynomials in z defined by the initial state of the buffer and whose degrees are at most 31.

3.4 Discussion
Both b4 and b10 have two components, one being a linear transform of the input sequence a0 to the buffer and the other being a linear transform of the initial conditions contained in ∆1 and ∆2 . For both b4 and b10 , the other, intrinsic component consists of 64 binary linear recurring subsequences each produced by an LFSR with the feedback polynomial f (z), or alternatively, by a cycling LFSR with the feedback polynomial 1 ⊕ z 48 . The following properties should then be considered as serious weaknesses of the buffer design.
– The exponent (period) of f(z) is only 48; the period of each of the intrinsic binary subsequences thus equals 48 or divides 48.
– The degree of f(z) is only 32, whereas with an appropriate design it could have been as large as 1024, which is the bit-size of the internal state of the buffer.
– Due to their extremely small period, the statistical properties of the intrinsic binary subsequences are very bad.
– In common designs of keystream generators, the linear component, with the feedback from the nonlinear component disconnected, ensures a large period of the corresponding internal state sequence, which itself very likely provides a lower bound on the period of the keystream sequence as well as good statistical properties. Consequently, the design of MUGI does not satisfy this criterion.
– The polynomial f(z) defines a linear sequential transform of any buffer sequence, in particular $b_4$ or $b_{10}$, that is equal to a linear sequential transform of the input sequence, which is produced by the nonlinear component. Its low degree, 32, small number of nonzero coefficients, 3, and small period, 48, facilitate the initial state reconstruction and finding statistical distinguishers for the keystream sequence (see Sections 4 and 5).
– The intrinsic binary subsequences do not depend on the initialization vector, but only on the secret key. Namely, for each $1 \le j \le 64$, the j-th binary subsequence of both $b_4$ and $b_{10}$ solely depends on $(b_{i,j}^{(0)})_{i=0}^{15}(b_{i,(j+32)\bmod 64}^{(0)})_{i=0}^{15}$, that is, on the j-th and the $(j+32)\bmod 64$-th binary subsequences of $b^{(0)}$, which are defined by 32 secret key bits only. This is because the mixing between different binary subsequences in the buffer, provided by the linear transform $R_{32}$, is not good. Divide-and-conquer secret key reconstruction attacks may be facilitated by this property.
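The first weakness can be observed directly: with the feedback input a0 disconnected (forced to zero), the entire 1024-bit buffer state returns to its initial value after 48 steps. A sketch:

```python
import random
random.seed(3)
MASK = (1 << 64) - 1

def rotl(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK

def lam_zero(b):
    """Buffer update (2) with the input word a0 forced to zero."""
    nb = [b[i - 1] for i in range(16)]
    nb[0] = b[15]
    nb[4] = b[3] ^ b[7]
    nb[10] = b[9] ^ rotl(b[13], 32)
    return nb

b0 = [random.getrandbits(64) for _ in range(16)]
b = b0
for _ in range(48):
    b = lam_zero(b)
assert b == b0   # the intrinsic state sequence has period dividing 48
```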
4 Elimination of Buffer
The obtained expressions (15) and (16) for the generating functions of the 64-bit buffer sequences $b_4$ and $b_{10}$, respectively, can be transformed into the time domain and then appropriately substituted in the recurrences (1) for the update function $\rho$. In this way, we can derive recurrences involving only the state sequences $a_1$ and $a_2$, where the output sequence $a_2$ is assumed to be known, in the known plaintext scenario, except for the first 16 outputs $(a_2^{(t)})_{t=0}^{15}$, which are discarded. Namely, from (1) we first eliminate $a_0$ and use the fact that F is invertible to get, for $t \ge 0$,
$$a_1^{(t)} \oplus b_4^{(t)} = F^{-1}(a_1^{(t+1)} \oplus a_2^{(t)} \oplus C_1) \quad (20)$$
$$a_1^{(t)} \oplus {}^{<17}b_{10}^{(t)} = F^{-1}(a_1^{(t-1)} \oplus a_2^{(t+1)} \oplus C_2) \quad (21)$$
where $F^{-1}$ is the inverse of F and formally $a_1^{(-1)} = a_0^{(0)}$. Then, by converting (15) and (16) into the time domain, we get the following linear recurrences, holding
for $t \ge 48$:
$$b_4^{(t)} \oplus b_4^{(t-48)} = a_0^{(t-5)} \oplus a_0^{(t-9)} \oplus a_0^{(t-13)} \oplus a_0^{(t-17)} \oplus a_0^{(t-25)} \oplus a_0^{(t-29)} \oplus a_0^{(t-33)} \oplus a_0^{(t-37)} \oplus {}^{<32}a_0^{(t-25)} \oplus {}^{<32}a_0^{(t-41)} \quad (22)$$
$$b_{10}^{(t)} \oplus b_{10}^{(t-48)} = a_0^{(t-11)} \oplus a_0^{(t-15)} \oplus a_0^{(t-31)} \oplus a_0^{(t-43)} \oplus {}^{<32}a_0^{(t-15)} \oplus {}^{<32}a_0^{(t-19)} \oplus {}^{<32}a_0^{(t-31)} \oplus {}^{<32}a_0^{(t-35)}. \quad (23)$$
Finally, by combining (20) with (22) and (21) with (23), we get the following recurrences involving $a_1$ and $a_2$ only, which hold for $t \ge 48$:
$$a_1^{(t)} \oplus a_1^{(t-48)} \oplus F^{-1}(a_1^{(t+1)} \oplus a_2^{(t)} \oplus C_1) \oplus F^{-1}(a_1^{(t-47)} \oplus a_2^{(t-48)} \oplus C_1) = a_1^{(t-6)} \oplus a_1^{(t-10)} \oplus a_1^{(t-14)} \oplus a_1^{(t-18)} \oplus a_1^{(t-26)} \oplus a_1^{(t-30)} \oplus a_1^{(t-34)} \oplus a_1^{(t-38)} \oplus {}^{<32}a_1^{(t-26)} \oplus {}^{<32}a_1^{(t-42)} \quad (24)$$
$$a_1^{(t)} \oplus a_1^{(t-48)} \oplus F^{-1}(a_1^{(t-1)} \oplus a_2^{(t+1)} \oplus C_2) \oplus F^{-1}(a_1^{(t-49)} \oplus a_2^{(t-47)} \oplus C_2) = {}^{<17}a_1^{(t-12)} \oplus {}^{<17}a_1^{(t-16)} \oplus {}^{<17}a_1^{(t-32)} \oplus {}^{<17}a_1^{(t-44)} \oplus {}^{<49}a_1^{(t-16)} \oplus {}^{<49}a_1^{(t-20)} \oplus {}^{<49}a_1^{(t-32)} \oplus {}^{<49}a_1^{(t-36)}. \quad (25)$$
The recurrences are nonlinear because of the nonlinear $F^{-1}$. As the first 16 outputs are not known, it is interesting to consider (24) and (25) only for $t \ge 64$. One can also obtain different recurrences by using the polynomial f(z) of degree 32 instead of the polynomial $1 \oplus z^{48}$. They will hold for $t \ge 32$, but each will involve $F^{-1}$ three times, which makes them less useful. In principle, there are at least two general ways of using the recurrences (24) and (25).

– One way is to try to eliminate $a_1$, possibly with certain approximation probabilities, thus yielding a recurrence in $a_2$ holding with a certain probability, which will represent a statistical distinguisher between the keystream sequence and a purely random sequence (see Section 5).
– The other way is to assume that $a_2$ is known, in the known plaintext scenario, and try to solve the corresponding nonlinear equations for $a_1$, possibly by an algebraic approach or by a probabilistic approach using approximations to the nonlinear equations.
5 Linear Cryptanalysis of MUGI
A general way of conducting the linear cryptanalysis of stream ciphers is to linearize the (vectorial) next-state function and the output function, with certain approximation probabilities, and to analyze the LFSM resulting from these linear approximations (see [3], [4], [5], and [6]). In particular, this method is also applicable if only one bit of the internal state is known at a time. This is essentially different from the linear cryptanalysis of iterated block ciphers, where one concatenates mutually correlated linear functions of the input and output to the round function. The obtained LFSM is in fact an LFSM approximation to the keystream generator, which itself is a nonlinear autonomous FSM. Our objective here is to present a framework for conducting the linear cryptanalysis of MUGI by using the results from Sections 3 and 4. Linearizing the next-state function of MUGI reduces to linearizing the nonlinear function F or its inverse $F^{-1}$. More precisely, we linearize equations (20) and (21), where, for convenience, $t-1$ is substituted for t, and the sequences $b_4$ and $b_{10}$ are determined by (15) and (16), respectively. In turn, linearizing $F^{-1}$ reduces to linearizing the S-boxes of AES. The effectiveness of the linear cryptanalysis depends on the underlying linear approximations and the corresponding approximation probabilities. A linear approximation to an S-box is a pair of input, $\alpha$, and output, $\beta$, linear functions with a nonzero correlation coefficient $c(\alpha, \beta) = \Pr(\alpha = \beta) - \Pr(\alpha \ne \beta)$. It is well known that the maximal correlation coefficient magnitude is 1/8. This value is relatively small, but allows a lot of freedom when choosing the linear approximations. In particular, for any given $\alpha$, there are 5 values of $\beta$ with $|c(\alpha, \beta)| = 1/8$, 16 values of $\beta$ with $|c(\alpha, \beta)| = 7/64$, and 36 values of $\beta$ with $|c(\alpha, \beta)| = 6/64$.
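The multiplicativity of correlation coefficients under XOR of independent variables (the piling-up lemma of [7], used again in Section 5.1) can be checked exactly for two variables:

```python
# two independent binary variables with Pr(x_i = 1) = p_i, correlation c_i = 1 - 2*p_i
p1, p2 = 9 / 16, 33 / 64                # example biases: c1 = -1/8, c2 = -1/32
c1, c2 = 1 - 2 * p1, 1 - 2 * p2
p_xor = p1 * (1 - p2) + p2 * (1 - p1)   # Pr(x1 XOR x2 = 1)
assert abs((1 - 2 * p_xor) - c1 * c2) < 1e-12   # c(x1 XOR x2) = c1 * c2
```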
A linear approximation to $F^{-1}$ consists of linear approximations to the 64 component Boolean functions, and each of the 64 correlation coefficients is determined by the corresponding active S-box. More generally, one can also consider linear approximations to any 64 linearly independent linear combinations of the 64 component Boolean functions of $F^{-1}$. In this case more than just one S-box can be active for each linear combination considered, so that the resulting correlation coefficient is the product of the involved correlation coefficients of the active S-boxes. In the more general scenario, let $L_2$ define a set of 64 linear approximations to 64 linearly independent linear combinations of the component Boolean functions of $F^{-1}$, defined by $L_1F^{-1}$. More precisely, an invertible matrix $L_1$ is first applied to both (20) and (21) and then a matrix $L_2$ is substituted for $L_1F^{-1}$ on the right-hand sides of the equations. We thus get
$$L_1a_1^{(t-1)} \oplus L_1b_4^{(t-1)} = L_2a_1^{(t)} \oplus L_2a_2^{(t-1)} \oplus L_2C_1 \oplus e_1^{(t)} \quad (26)$$
$$L_1a_1^{(t-1)} \oplus L_1{}^{<17}b_{10}^{(t-1)} = L_2a_1^{(t-2)} \oplus L_2a_2^{(t)} \oplus L_2C_2 \oplus e_2^{(t)} \quad (27)$$
where $e_1$ and $e_2$ are the 64-bit approximation-error sequences whose binary component subsequences are expressed as nonbalanced Boolean functions of the corresponding inputs to $F^{-1}$. The greater the imbalance, the better the underlying linear approximations. Equations (26) and (27) in fact define an LFSM with input sequences $e_1$ and $e_2$. The LFSM can be solved for $a_1$ and $a_2$ by using the generating function method. Let $A_1$, $A_2$, $E_1$, $E_2$, and $A_0 = zA_1 \oplus a_0^{(0)}$ denote the generating functions of $a_1$, $a_2$, $e_1$, $e_2$, and $a_0$, respectively. Then, after some algebraic manipulations, by using the expressions (15) and (16) for $B_4$ and $B_{10}$, in terms of $1 \oplus z^{48}$, respectively, we get
$$F_1(z)A_1 = z(1 \oplus z^{48})L_2A_2 \oplus (1 \oplus z^{48})E_1 \oplus \Delta_1'' \oplus (1 \oplus z^{48})\bar C_1 \quad (28)$$
$$F_2(z)A_1 = (1 \oplus z^{48})L_2A_2 \oplus (1 \oplus z^{48})E_2 \oplus \Delta_2'' \oplus (1 \oplus z^{48})\bar C_2 \quad (29)$$
where $\bar C_1 = C_1/(1 \oplus z)$ and $\bar C_2 = C_2/(1 \oplus z)$ denote the generating functions of the constant sequences, of period equal to 1, corresponding to the constants $C_1$ and $C_2$, respectively, and
$$F_1(z) = (1 \oplus z^{48})(zL_1 \oplus L_2) \oplus z^7(1 \oplus z^{16})L_1G(z) \quad (30)$$
$$F_2(z) = z\left((1 \oplus z^{48})(L_1 \oplus zL_2) \oplus z^{12}(1 \oplus z^{16})L_1R_{17}F(z)\right) \quad (31)$$
$$\Delta_1'' = (1 \oplus z^{16})\left(zL_1\Delta_1' \oplus z^6L_1G(z)a_0^{(0)}\right) \oplus (1 \oplus z^{48})L_2a_1^{(0)} \quad (32)$$
$$\Delta_2'' = z(1 \oplus z^{16})L_1R_{17}\Delta_2' \oplus z\left(z^{11}(1 \oplus z^{16})L_1R_{17}F(z) \oplus (1 \oplus z^{48})L_2\right)a_0^{(0)} \oplus (1 \oplus z^{48})L_2a_2^{(0)}. \quad (33)$$
Note that $\Delta_1''$ and $\Delta_2''$ are 64-dimensional vectors whose elements are polynomials in z defined by the initial state of the whole keystream generator ($b^{(0)}$ and $a_0^{(0)}a_1^{(0)}a_2^{(0)}$) and whose degrees are at most 49. Here $E_1$ and $E_2$ are generating functions of unknown, but nonbalanced sequences, so that (28) and (29) in fact constitute a system of binary linear recurrences each holding with a probability different from one half. To eliminate the unknown $A_1$ from the system, assume for simplicity that $F_1(z)$ is invertible, where $F_1(z)^{-1} = F_1(z)^*/\det F_1(z)$, $F_1(z)^*$ being the adjunct matrix of $F_1(z)$. (As $L_1$ is invertible, it is likely that $F_1(z)$ or $F_2(z)$ is invertible.) We then obtain the correlation equation
$$\frac{F_2(z)F_1(z)^*\Delta_1'' \oplus \det F_1(z)\,\Delta_2''}{1 \oplus z^{48}} = F_2(z)F_1(z)^*E_1 \oplus \det F_1(z)E_2 \oplus \left(zF_2(z)F_1(z)^* \oplus \det F_1(z)I\right)L_2A_2 \oplus \left(F_2(z)F_1(z)^*\bar C_1 \oplus \det F_1(z)\bar C_2\right). \quad (34)$$
In the time domain, the left-hand side of (34) is a 64-bit sequence, x, in the generating function domain denoted as X, which depends on the initial conditions, and the last two terms on the right-hand side of (34) are linear sequential transforms of the (known) sequence $a_2$ and of the constant sequences corresponding to $C_1$ and $C_2$, respectively. The first term on the right-hand side of (34), $E = F_2(z)F_1(z)^*E_1 \oplus \det F_1(z)E_2$, is the noise term depending on the performed linear approximations. Consequently, (34) means that the linear recurring sequence x, which depends on the initial conditions and is ultimately periodic with period of only 48 (or smaller), is termwise correlated to a linear sequential transform of $a_2$ and of the constant sequences corresponding to $C_1$ and $C_2$. Equivalently, this is the case for each of the 64 constituent binary subsequences.

5.1 Initial State Reconstruction
If $t = 0$ is taken as the initial time, then the first 16 elements of the output sequence, $(a_2^{(t)})_{t=0}^{15}$, are unknown, and the initial state of the buffer, $b^{(0)}$, represented through $\Delta_1''$ and $\Delta_2''$, depends on the secret key only. Alternatively, if $t = 16$ is taken as the initial time, then the output sequence is known, but the initial state of the buffer, $b^{(16)}$, depends on the initialization vector as well. Note that the initial state of MUGI, $b^{(0)}$ and $a_0^{(0)}a_1^{(0)}a_2^{(0)}$, can be obtained from any internal state of MUGI (e.g., $b^{(16)}$ and $a_0^{(16)}a_1^{(16)}a_2^{(16)}$) by reversing the next-state function. Furthermore, the 128-bit secret key can be obtained from $b_0^{(0)}b_1^{(0)}$ by reversing the update function $\rho$.
A Weakness of the Linear Part of Stream Cipher MUGI

Here we used the well-known fact that the correlation coefficient of a binary sum of mutually independent binary random variables is equal to the product of their individual correlation coefficients (see also the piling-up lemma from [7]). Accordingly, the periodic part of the 64-bit linear recurring sequence x, that is, any corresponding segment composed of 48 consecutive 64-bit words, can in principle be statistically reconstructed by a sort of repetition attack based on the fact that any bit of the periodic part of x is repeated with period 48. If c denotes the underlying correlation coefficient for a considered bit of x, then the bit can be reconstructed by the majority count from the given output segment of length O(48 c^−2) with complexity O(c^−2). It is easy to see that the periodic part of x depends (linearly) only on b^(0) and a0^(0), and not on a1^(0). Namely, this is due to

  X = [(1 ⊕ z^16) F2(z)F1(z)* (z L1 ∆1 ⊕ z^6 L1 G(z) a0^(0))] / (1 ⊕ z^48)
      ⊕ [det F1(z) (1 ⊕ z^16) (z L1 R^17 ∆2 ⊕ z^12 L1 R^17 F(z) a0^(0))] / (1 ⊕ z^48)
      ⊕ F2(z)F1(z)* L2 a1^(0) ⊕ det F1(z) (z L2 a0^(0) ⊕ L2 a2^(0)),    (35)

where ∆1 and ∆2 depend only on b^(0). Accordingly, the initial time of the 48-word segment of x to be reconstructed should on one hand be sufficiently large so as to make the terms depending on a1^(0) and a2^(0) in (35) vanish and, on the other, should be at least 16 so that the involved terms of the output sequence a2 in (34) are all known. The recovered 48 64-bit words of the periodic part of x define a system of 48 · 64 binary linear equations in the unknown 17 · 64 initial state bits, b^(0) and a0^(0), which can then be obtained by solving the system. The 128-bit secret key can be obtained from b0^(0) b1^(0) by reversing the update function ρ. Alternatively, if the objective is to recover the initial state of MUGI, regardless of the initialization algorithm and the secret key, but taking into account the fact that the first 16 elements of the output sequence, (a2^(t))_{t=0}^{15}, are unknown, then t = 16 should be taken as the initial time, b^(16) and a0^(16) should be reconstructed by the repetition attack described above, and the 64 bits of a1^(16) can be reconstructed by exhaustive search in view of the fact that a2^(16) is known. Finally, the initial state, b^(0) and a0^(0) a1^(0) a2^(0), is then obtained from the internal state b^(16) and a0^(16) a1^(16) a2^(16) by reversing the next-state function. The feasibility of the attack can be estimated by computing the noise correlation coefficients, as described above, for different vectorial linear transformations L1 and L2. Assuming for simplicity that the underlying correlation coefficient |c| = 2^−3m is the same for all the bits of x to be reconstructed, the attack would in theory be effective if it is faster than the exhaustive search over the initial states. Note that the exhaustive search requires 18 64-bit output values to be produced for each guessed initial state or, more precisely, for each guessed internal state at time t = 16.
Roughly speaking, the attack is effective if 48 · 2^(6m) < 2^(18·64), that is, if m ≤ 191, where the underlying complexity units are assumed to be the same. Of course, it has to be noted that the required output sequence length would be of the same order of magnitude as the complexity.
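The repetition attack described above amounts to majority decoding of a bit observed, once per 48-word period, through noise of correlation c. The toy simulation below uses an assumed correlation c = 2^−3 and an arbitrary oversampling factor; it only illustrates why O(c^−2) observations suffice.

```python
import random

def observe(b, c, n):
    """n noisy observations of bit b: each equals b with probability (1 + c)/2."""
    return [b if random.random() < (1 + c) / 2 else 1 - b for _ in range(n)]

random.seed(2)
c = 2 ** -3                 # assumed correlation of the considered bit of x
n = 30 * round(c ** -2)     # O(c^-2) observations, one per 48-step period
b = 1                       # the unknown periodic bit to be recovered
obs = observe(b, c, n)
recovered = 1 if 2 * sum(obs) > n else 0
print(recovered, n)
```

With these parameters a few thousand observations already make the majority count reliable.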
Jovan Dj. Golić

5.2 Linear Statistical Weakness
The equation (34) can also be put into the form

  L(z) A2 = [(1 ⊕ z^48)/(1 ⊕ z)] (F2(z)F1(z)* C1 ⊕ det F1(z) C2)
            ⊕ (F2(z)F1(z)* ∆1 ⊕ det F1(z) ∆2) ⊕ (1 ⊕ z^48) E,    (36)

where the matrix L(z) = (1 ⊕ z^48)(z F2(z)F1(z)* ⊕ det F1(z) I) defines a linear sequential transform of the output sequence a2. This equation specifies a linear statistical distinguisher between the output sequence and a purely random sequence. Namely, all the terms except the noise term on the right-hand side of (36) are polynomials in z and as such vanish in the time domain after a sufficiently large t depending on the degrees of the involved polynomials. So, (36) means that a linear sequential transform of the output sequence is termwise correlated to the all-zero 64-bit sequence, where the approximation/correlation noise is defined by (1 ⊕ z^48)E. Equivalently, the 64 constituent binary subsequences, obtained as linear sequential transforms of the output sequence, are bitwise correlated to the all-zero binary sequence, where the corresponding correlation coefficients can be approximated as squares of the correlation coefficients of the corresponding binary noise subsequences of e. If c is such a correlation coefficient, then the output sequence length required for detecting the weakness in the corresponding binary subsequence is O(c^−4). The output sequence length required to detect the weakness by using all the 64 subsequences is then O(c^−4 / 64). The feasibility of the attack can be estimated by computing the noise correlation coefficients for different vectorial linear transformations L1 and L2. Assuming for simplicity that the underlying correlation coefficient |c| = 2^−3m is the same for all the bits of x to be reconstructed, the attack would in theory be effective if the total required output sequence length, proportional to the complexity, is smaller than the expected period for the size of the internal state, that is, if 2^(12m)/64 < 2^(18·64), that is, if m ≤ 96.
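The O(c^−4) figure is the standard sample-size estimate for such a correlation distinguisher. The toy simulation below uses an artificially large correlation c = 1/2 of our own choosing (far larger than any realistic value here) so that the sample size stays small; it checks that a sequence whose bits have correlation c^2 to the all-zero sequence separates from a purely random one at roughly that sample size.

```python
import random

def corr_seq(c, n):
    """n bits, each 0 with probability (1 + c)/2 (correlation c to the zero bit)."""
    return [0 if random.random() < (1 + c) / 2 else 1 for _ in range(n)]

def est_corr(bits):
    return 1 - 2 * sum(bits) / len(bits)

random.seed(3)
c = 0.5                       # toy correlation, chosen large for a small demo
n = 200 * round(c ** -4)      # O(c^-4) "keystream" bits
threshold = c ** 2 / 2

weak = est_corr(corr_seq(c ** 2, n))   # transformed keystream: correlation c^2
flat = est_corr(corr_seq(0.0, n))      # purely random sequence
print(weak > threshold, flat < threshold)
```

The biased sequence exceeds the decision threshold while the random one stays below it.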
6 Conclusions

Our main finding is that the linearly updated component of MUGI, the so-called buffer, is not designed properly. It is proven that if the feedback from the nonlinearly updated component is disconnected, then the binary subsequences of the buffer, comprising its intrinsic response, are linear recurring sequences with a linear complexity of only 32 and with a period of only 48. Accordingly, the buffer neither provides a large lower bound on the period nor ensures good statistics of the output sequences of MUGI, unlike the usual designs of keystream generators. Furthermore, as each such subsequence depends on only 32 bits of the initial state of the buffer, the mixing between the 1024 bits of the initial state
of the buffer is not good. In addition, it is pointed out that the 128-bit secret key can directly be recovered from the reconstructed internal state of MUGI at any time, which is not desirable. As a consequence of this small period, it is shown that the buffer sequence can be eliminated from the update equations for the nonlinearly updated component of MUGI, the so-called state, thus yielding nonlinear recurrences involving only the output sequence and a part of the state sequence. It is then pointed out how the weakness of the buffer can be used to facilitate the linear cryptanalysis of MUGI. This is achieved by developing a framework for the linear cryptanalysis with two main objectives: to reconstruct the initial state or the secret key and to find linear statistical distinguishers for MUGI. This framework can be used for future experiments to investigate if the proposed attacks are feasible. More precisely, the feasibility depends on the magnitude of the corresponding correlation coefficients which can be estimated by examining the corresponding matrices with polynomial elements that result from the chosen linearizations of the S-boxes.
Appendix: Generating Function Representation

Firstly, by virtue of (5), the system of linear recurrences (4) results in

  Σ_{t=5}^∞ b4^(t) z^t = z^4 Σ_{t=5}^∞ b4^(t−4) z^(t−4) ⊕ z^10 Σ_{t=5}^∞ b10^(t−10) z^(t−10) ⊕ z^5 Σ_{t=5}^∞ a0^(t−5) z^(t−5)    (37)

  Σ_{t=1}^∞ b10^(t) z^t = z^6 Σ_{t=1}^∞ b4^(t−6) z^(t−6) ⊕ z^4 R^32 Σ_{t=1}^∞ b10^(t−4) z^(t−4),    (38)

where, for a negative upper index, b0^(−s), b4^(−s), and b10^(−s) stand for the initial buffer cells b_s^(0), b_{4+s}^(0), and b_{10+s}^(0), respectively. Then, we get

  (1 ⊕ z^4) B4 ⊕ z^10 B10
    = z^5 A0 ⊕ Σ_{t=0}^{4} b4^(t) z^t ⊕ z^4 b4^(0) ⊕ Σ_{t=5}^{9} b10^(t−10) z^t
    = z^5 A0 ⊕ (1 ⊕ z^4) b4^(0) ⊕ Σ_{t=1}^{4} (b4^(t−4) ⊕ b0^(t−4)) z^t ⊕ Σ_{t=5}^{9} b10^(t−10) z^t
    = z^5 A0 ⊕ b4^(0) ⊕ z^4 b0^(0) ⊕ Σ_{t=1}^{3} (b4^(t−4) ⊕ b0^(t−4)) z^t ⊕ Σ_{t=5}^{9} b10^(t−10) z^t    (39)

  (I ⊕ z^4 R^32) B10 ⊕ z^6 B4 = b10^(0) ⊕ Σ_{t=1}^{5} b4^(t−6) z^t ⊕ R^32 Σ_{t=1}^{3} b10^(t−4) z^t.    (40)
By using the simplified notation (8) and (9), we then directly obtain (6) and (7).
Acknowledgement This work is based on a result of evaluation requested by the Japanese CRYPTREC project: http://www.ipa.go.jp/security/enc/CRYPTREC/index-e.html.
References

[1] J. Daemen and C. Clapp, "Fast hashing and stream encryption with PANAMA," Fast Software Encryption - FSE '98, Lecture Notes in Computer Science, vol. 1372, pp. 60-74, 1998.
[2] J. Daemen and V. Rijmen, The Design of Rijndael: AES - The Advanced Encryption Standard. Berlin: Springer-Verlag, 2002.
[3] J. Dj. Golić, "Correlation via linear sequential circuit approximation of combiners with memory," Advances in Cryptology - EUROCRYPT '92, Lecture Notes in Computer Science, vol. 658, pp. 124-137, 1993.
[4] J. Dj. Golić, "Linear cryptanalysis of stream ciphers," Fast Software Encryption - FSE '94, Lecture Notes in Computer Science, vol. 1008, pp. 154-169, 1995.
[5] J. Dj. Golić, "Correlation properties of a general combiner with memory," Journal of Cryptology, vol. 9, pp. 111-126, 1996.
[6] J. Dj. Golić, "Linear models for keystream generators," IEEE Transactions on Computers, vol. 45, pp. 41-49, 1996.
[7] M. Matsui, "Linear cryptanalysis method for DES cipher," Advances in Cryptology - EUROCRYPT '93, Lecture Notes in Computer Science, vol. 765, pp. 159-169, 1994.
[8] D. Watanabe, S. Furuya, H. Yoshida, and K. Takaragi, MUGI Pseudorandom Number Generator, Specification, Ver. 1.2, 2001, available at http://www.sdl.hitachi.co.jp/crypto/mugi/index-e.html.
[9] D. Watanabe, S. Furuya, H. Yoshida, and K. Takaragi, MUGI Pseudorandom Number Generator, Self-evaluation Report, Ver. 1.1, 2001, available at http://www.sdl.hitachi.co.jp/crypto/mugi/index-e.html.
[10] D. Watanabe, S. Furuya, H. Yoshida, K. Takaragi, and B. Preneel, "A new keystream generator MUGI," Fast Software Encryption 2002, Lecture Notes in Computer Science, vol. 2365, pp. 179-194, 2002.
Vulnerability of Nonlinear Filter Generators Based on Linear Finite State Machines

Jin Hong¹, Dong Hoon Lee¹, Seongtaek Chee¹, and Palash Sarkar²

¹ National Security Research Institute, 161 Gajeong-dong, Yuseong-gu, Daejeon, 305-350, Korea, {jinhong,dlee,chee}@etri.re.kr
² Indian Statistical Institute, 203, B.T. Road, Kolkata 700108, India, [email protected]
Abstract. We present a realization of an LFSM that utilizes an LFSR. This is based on a well-known fact from linear algebra. This structure is used to show that a previous attempt at using a CA in place of an LFSR in constructing a stream cipher did not necessarily increase its security. We also give a general method for checking whether or not a nonlinear filter generator based on an LFSM allows reduction to one that is based on an LFSR and which is vulnerable to Anderson information leakage. Keywords: Stream cipher, nonlinear filter model, LFSR, CA, Anderson information leakage
1 Introduction
Linear feedback shift registers (LFSR) are one of the most useful building blocks for constructing stream ciphers. There are classical models of memoryless synchronous stream ciphers that utilize LFSR's: the nonlinear filter model (NF) and the nonlinear combiner model (NC). For the NF model, building on previous works [9, 3], Anderson [1] showed that much information about the state of the LFSR may be obtained from the output key stream if the distribution of possible states of the LFSR in relation to output stream blocks is not uniform. And for a random NF model, this distribution is usually quite irregular. In the paper [10], presented at CRYPTO 2002, a model that combines the NF and NC models was introduced. This model utilized a cellular automaton (CA) instead of an LFSR to eliminate the above-mentioned information leakage of NF models. In the paper, it is claimed that this non-uniform distribution stems from the fact that a particular state bit of an LFSR affects the output key stream several times. This would be unavoidable in an NF generator based on an LFSR. It is also claimed that this property can be avoided through the use of a CA, thus removing Anderson information leakage. In this paper, we show this claim to be incorrect.

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 193-209, 2004.
© International Association for Cryptologic Research 2004
A CA is a special case of the linear finite state machines (LFSM) and can be viewed as a one-dimensional array of cells. The cells change state at each clock tick, and the new state of a cell is completely determined by its present state and those of its left and right neighbors. CA's have been applied to various fields such as biological systems, fault-tolerant computation, VLSI design, and cryptography. (See [12] for a survey on the general theory of CA.) In the cryptographic field, CA's have been used in designing hash functions and stream ciphers [8, 13]. It was believed that, from the security perspective, a CA would give properties better than those of an LFSR. However, we shall show that the use of a CA in place of an LFSR does not necessarily increase security. Recalling a well-known fact from linear algebra, we give a way to realize an LFSM utilizing an LFSR. In short, the realization is done by attaching a linear map to an LFSR. We understand that, due to its simplicity, this could have been known to experts of this field. But we could not find any references, and it seems that this fact was not looked at from the security perspective. The realization could be of interest in its own right. For example, it gives a natural way of running a CA in the reverse direction, something which was thought to be a complex procedure. But as will be shown through the examination of arguments in [10], this realization also has grave consequences for the use of LFSM's as cryptographic building blocks.
The paper is organized as follows. Section 2 shall present the simple mathematical fact that is the starting point of this paper. This is used in Section 3 in realizing an LFSM using an LFSR and a linear map. In Section 4, we review the notion of Anderson information leakage and examine the system given in [10]. Using the realization of a CA which utilizes an LFSR, we shall argue that the system did not achieve its design goal. The section that follows presents an explicit example confirming these arguments. Next, in Section 6 we give a general method for checking whether or not a given NF generator based on an LFSM admits a reduction to an NF generator based on an LFSR that is vulnerable to Anderson information leakage. The last section closes the paper with some concluding remarks. Some readers might want to read Appendix E, which contains remarks on what further developments the basic idea of this paper might bring.
2 Basic Facts and Definitions

In this section we shall recall some elementary facts from linear algebra and introduce two classes of linear finite state machines, CA and LFSR.

2.1 Linear Algebra
Let us denote by I the n × n identity matrix. The characteristic polynomial of a matrix M with entries in the binary field F2 is defined to be the polynomial

  char(M) = det(xI − M) ∈ F2[x].    (1)
We define the companion matrix of a monic polynomial

  p(x) = a0 + a1 x + · · · + a_{n−1} x^(n−1) + x^n ∈ F2[x]    (2)

to be the matrix

  ⎛ 0    1    0    0   · · ·   0       ⎞
  ⎜ 0    0    1    0   · · ·   0       ⎟
  ⎜ 0    0    0    1   · · ·   0       ⎟
  ⎜ ...                        ...     ⎟
  ⎜ 0    0    0   · · ·  0     1       ⎟
  ⎝ a0   a1   a2  · · · a_{n−2} a_{n−1} ⎠    (3)
We shall accept the following statement as a fact.

Fact 1. Let p(x) be the characteristic polynomial of a square matrix M. Denote by L the companion matrix of the monic polynomial p(x). If p(x) is irreducible, then there exists an invertible (basis transition) matrix T satisfying T M T^−1 = L.

This is the only fact from linear algebra we shall need in this paper. Readers familiar with linear algebra can look up Appendix A to see a justification for this fact.

Remark 1. The matrix T appearing in this fact is not unique. If the size of the square matrix M is n, there can be up to 2^n − 1 of them.

2.2 Linear Finite State Machine
An n-bit linear finite state machine (LFSM), denoted by M, is a pair (F2^n, M), where M is an n × n matrix. The internal state of M is described by an n-bit vector v = (v0, . . . , v_{n−1}) ∈ F2^n. The evolution of M over discrete time t ≥ 0 is described by a sequence of n-bit vectors v^(0), v^(1), . . . , satisfying

  v^(t+1) = M v^(t).    (4)

Here, v^(0) is the initial state. For t ≥ 0, we shall write v^(t) = (v0^(t), v1^(t), . . . , v_{n−1}^(t)). It is well known that if the characteristic polynomial of M is primitive over F2, then each of the sequences

  vi = (vi^(t))_{t≥0}    (5)

has period 2^n − 1 [7]. This is the maximum possible period obtainable for the state sequence of an LFSM. The most popular subclasses of the LFSM's are CA's and LFSR's.
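The transition (4) is just a matrix-vector product over F2, and a primitive characteristic polynomial forces the maximal state period 2^n − 1. A minimal sketch of our own, using the companion matrix of the primitive polynomial 1 + x + x^3 as the example M:

```python
def lfsm_step(M, v):
    """One LFSM transition v -> M v over F2 (matrix-vector product mod 2)."""
    n = len(v)
    return [sum(M[i][j] & v[j] for j in range(n)) % 2 for i in range(n)]

# Companion matrix of the primitive polynomial p(x) = 1 + x + x^3,
# i.e. a0 = 1, a1 = 1, a2 = 0 in the notation of (2) and (3).
M = [[0, 1, 0],
     [0, 0, 1],
     [1, 1, 0]]

v = [1, 0, 0]
states = [v]
while True:
    v = lfsm_step(M, v)
    if v == states[0]:
        break
    states.append(v)
print(len(states))  # period of the state sequence: 2^3 - 1 = 7
```

Every nonzero initial state traces the same cycle of 2^3 − 1 = 7 states.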
2.3 Cellular Automaton
A cellular automaton (CA) is an LFSM whose defining matrix M is tridiagonal. If the upper and lower subdiagonal entries of M are all equal to 1, then it is called a 90/150 CA. Visually, the general matrix defining a 90/150 CA will be of the form

  ⎛ c0   1    0    0   · · ·   0       ⎞
  ⎜ 1    c1   1    0   · · ·   0       ⎟
  ⎜ 0    1    c2   1   · · ·   0       ⎟
  ⎜ ...                        ...     ⎟
  ⎜ 0   · · ·  1  c_{n−2}  1           ⎟
  ⎝ 0   · · ·  0   1    c_{n−1}        ⎠    (6)

where each ci is either 0 or 1. We shall only consider 90/150 CA's in this paper. The sequences obtained from such a CA satisfy the following relation. For each 0 ≤ i ≤ n − 1 and t ≥ 0,

  vi^(t+1) = v_{i−1}^(t) ⊕ ci vi^(t) ⊕ v_{i+1}^(t),

where we take v_{−1}^(t) = vn^(t) = 0.

2.4 Linear Feedback Shift Register
A linear feedback shift register (LFSR) corresponding to a monic polynomial p(x) given by (2) is an LFSM with the defining matrix set to the companion matrix of p(x) given by (3). So if we write the internal state of the LFSR at time t ≥ 0 as v^(t) = (v0^(t), v1^(t), . . . , v_{n−1}^(t)), we have

  vi^(t+1) = v_{i+1}^(t)  for each 0 ≤ i ≤ n − 2,

and

  v_{n−1}^(t+1) = a0 v0^(t) ⊕ a1 v1^(t) ⊕ · · · ⊕ a_{n−1} v_{n−1}^(t).

Hence, register contents will be shifted to the left by one cell during the evolution process.
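Both subclasses realize the generic transition (4). A short sketch checking the 90/150 recurrence and the shift-and-feedback step against their defining matrices; the 3-cell diagonal and the polynomial 1 + x + x^3 are examples of our own choosing:

```python
def ca_step(c, v):
    """One 90/150 CA step: v_i' = v_{i-1} XOR c_i*v_i XOR v_{i+1}, null boundary."""
    n = len(v)
    return [(v[i - 1] if i > 0 else 0) ^ (c[i] & v[i]) ^ (v[i + 1] if i < n - 1 else 0)
            for i in range(n)]

def lfsr_step(a, v):
    """One LFSR step for p(x) as in (2): shift left by one cell, and feed back
    a0*v0 XOR a1*v1 XOR ... XOR a_{n-1}*v_{n-1} into the last cell."""
    fb = 0
    for ai, vi in zip(a, v):
        fb ^= ai & vi
    return v[1:] + [fb]

def mat_vec(M, v):
    """Matrix-vector product over F2, i.e. the generic LFSM transition (4)."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in M]

# 90/150 CA on 3 cells with diagonal (1, 0, 0) versus its tridiagonal matrix (6).
c = [1, 0, 0]
M_ca = [[1, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
# LFSR for the example polynomial p(x) = 1 + x + x^3 versus its companion matrix (3).
a = [1, 1, 0]
M_lfsr = [[0, 1, 0],
          [0, 0, 1],
          [1, 1, 0]]

v = [1, 1, 0]
print(ca_step(c, v) == mat_vec(M_ca, v),
      lfsr_step(a, v) == mat_vec(M_lfsr, v))  # True True
```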
3 Reducing an LFSM to an LFSR
In this section, we shall see how the contents of the previous section relate to each other. We give a way to realize an LFSM, utilizing an LFSR and a linear map. It seems that the method we are going to give is known to the experts of this field. But we could not find any references, so it is explained here for completeness.
Let us be given an LFSM M defined by a matrix M. Denote the characteristic polynomial of M by p(x) and the companion matrix of p(x) by L. Notice that the matrix L defines an LFSR. We say that this LFSR is associated with the LFSM M. Suppose that p(x) is irreducible. Then we know from Fact 1 that there exists some invertible matrix T such that

  T M T^−1 = L.    (7)

Now, recalling that the evolution of the LFSM internal state is given by (4), if the initial state of the LFSM M was v, the state v^(t) of the LFSM at time t ≥ 0 will be given by v^(t) = M^t v. Here, M^t denotes M multiplied t times and not the transpose of M. Similarly, if the initial state of the LFSR defined by the matrix L was w, the state w^(t) of the LFSR at time t ≥ 0 will be given by w^(t) = L^t w. Now, if we had w = T v, using (7), we can easily check the following sequence of equalities:

  v^(t) = M^t v = (T^−1 L T)^t v = T^−1 L^t T v = T^−1 L^t w = T^−1 w^(t).    (8)

This shows that an LFSM is intimately related to the LFSR defined by its characteristic polynomial.

Proposition 1. The current internal state v^(t) of an LFSM which starts at the initial state v^(0) may be calculated using the internal state w^(t) of the associated LFSR through the simple linear equation

  v^(t) = T^−1 w^(t)    (9)

by initializing the LFSR with w^(0) = T v^(0).

So, even though an LFSM seems much more complicated than an LFSR, the two are only a simple linear transformation apart. The only hypothesis on the LFSM we have used in this section is that its characteristic polynomial be irreducible. In most cryptographic applications of an LFSM, the characteristic polynomial will be taken to be primitive, in order to achieve maximal period, so this is not a very restrictive assumption. Hence any cryptographic system that bases its safety on the complexity of an LFSM, compared to an LFSR, may not be as safe as it seems at first sight. Since CA's are just a special type of LFSM's, the same can be said of systems using CA's.
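Proposition 1 can be verified exhaustively at toy size. The sketch below finds T by brute force over all 3 × 3 binary matrices rather than by any clever construction; the 3-cell 90/150 CA (diagonal (1, 0, 0), characteristic polynomial 1 + x^2 + x^3, which is irreducible over F2) and its companion matrix are examples of our own choosing.

```python
from itertools import product

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] & B[k][j] for k in range(n)) % 2 for j in range(n)]
            for i in range(n)]

def mat_vec(A, v):
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in A]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# A 3-cell 90/150 CA with diagonal (1, 0, 0); its characteristic polynomial
# is p(x) = 1 + x^2 + x^3, irreducible (in fact primitive) over F2.
M = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
# Companion matrix of p(x): a0 = 1, a1 = 0, a2 = 1 in the notation of (3).
L = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 1]]

# Exhaustive search for an invertible T with T M = L T (feasible only at toy size).
mats = [[list(bits[0:3]), list(bits[3:6]), list(bits[6:9])]
        for bits in product([0, 1], repeat=9)]
T = next(A for A in mats
         if mat_mul(A, M) == mat_mul(L, A)
         and any(mat_mul(A, B) == I3 for B in mats))
T_inv = next(B for B in mats if mat_mul(T, B) == I3)

# Proposition 1: the LFSM state is always T^-1 times the associated LFSR state.
v = [1, 1, 0]
w = mat_vec(T, v)
ok = True
for _ in range(7):
    ok &= (v == mat_vec(T_inv, w))
    v = mat_vec(M, v)
    w = mat_vec(L, w)
print(ok)  # True
```

Fact 1 guarantees such a T exists here, since p(x) is irreducible.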
4 Security of Nonlinear Filter Models Utilizing a CA
In this section, we shall present a system which has tried to use a CA in place of an LFSR in order to remove some unwanted property of a stream cipher. We shall apply the theory of Section 3 to show that the attempt did not succeed in achieving its goal.
4.1 The NF-CA Model
In the paper [10], a memoryless synchronous stream cipher called the filter-combiner (FC) model was introduced. We shall not present the whole FC model in this paper, but use only a small part of the model in explaining one of the main arguments of that work. The referenced paper contains more than what is presented here.

Let M = (F2^n, M) be a CA. We assume that the characteristic polynomial p(x) of the CA is primitive. It is known that the n sequences given by (5) are then all exactly the same periodic sequence, with only the starting points different. Hence they are relative shifts of each other. We apply a nonlinear filter f with good properties, for example, high resiliency and nonlinearity, on the cells of the CA to obtain a stream cipher. The system is to satisfy the following loosely stated constraints. We refer the reader to the original paper [10] for exact statements.

1. The number r of cells used as inputs to f is small relative to the size n of the CA (r ≤ log2 n).
2. The starting points of the sequences obtained from the cells used as inputs to f are (almost) evenly distributed within the common periodic sequence.
3. The number of bits encrypted using the system does not come close to 2^n / r.

We shall call this reduced model by the name NF-CA, a nonlinear filter model utilizing a CA. The paper claims that under these constraints the NF-CA is resistant to Anderson information leakage [1]. Anderson information leakage is an observation on the nonlinear filter model (a stream cipher that applies a nonlinear filter on an LFSR) that allows one to gather information on the initial state of the LFSR from the key stream. More explanation is given in the next subsection. The author of [10] believed that Anderson information leakage was fundamentally due to using the same bit more than once as input to the nonlinear filter in obtaining the key stream. This reasoning was also somewhat vaguely stated in the paper [1].
Hence the main objective behind the above constraints was to remove the possibility of any part of the periodic sequence being used more than once.

4.2 Anderson Information Leakage
Consider the filter model of stream ciphers. This is a stream cipher that uses the cell states of an LFSR as inputs to a nonlinear filter in obtaining a key stream. We shall write this model as NF-LFSR for short. Suppose we use k consecutive cells of the LFSR as inputs to the nonlinear filter f. Let us take the convention, as given in Section 2.4, that the contents of the register are shifted to the left at each step. If we fix the contents of the k cells used as inputs to f, we can calculate one bit of output from the NF-LFSR. Similarly, if we fix the contents of the k cells and also the (k − 1) more cells that lie immediately to their right, we can calculate k output bits from the NF-LFSR.
Now, suppose we classify all possible (2k − 1)-bit states according to the k-bit output key stream each will give. In the ideal case, each class will contain exactly 2^(k−1) elements. That this distribution usually is not uniform was investigated by Anderson [1] to show that much information about the state of the LFSR may be obtained from the output key stream. He gives an explicit example using a 2-resilient nonlinear filter on 5 variables, and constructs such a table to show that the distribution is indeed irregular. To show that actually useful information may be found, he lists all possible 9-bit initial states that can give the 5-bit output stream 11010:

  001010101  001110001  001110010  100110001  100110010
  101001011  101110001  101110010  110110001  110110010
If we look closely at these values, we see that there is only a single 0 among all the 5th bits, and a single 1 among both the 6th and 7th bits. In other words, if the key stream we obtain is 11010, then at the starting point of this key stream, the state of the 5th cell of the LFSR would have been 1 with probability 0.9. Likewise, the states of the 6th and 7th cells would have been 0 with probability 0.9. Irregularity in the distribution of initial states classified according to output stream blocks thus gives an NF-LFSR the potential to leak information on the initial LFSR state. We remark that some further developments of Anderson's idea appear in [6, 2, 4].

4.3 Information Leakage of the NF-CA
Let the initial state of the NF-CA, or equivalently, that of the CA M = (F2^n, M), be denoted by v^(0). We shall add dummy variables to the nonlinear filter f and view it as defined on the whole CA. Then the t-th output key stream bit ct of the NF-CA will be given by

  ct = f(v^(t)).    (10)

We may follow through the arguments of Section 3 in constructing an associated LFSR and finding a linear transformation T satisfying (7). And by applying (9) to the above equation, we may write

  ct = (f ◦ T^−1)(w^(t)),    (11)

where we have taken the initial state of the associated LFSR to be w^(0) = T v^(0). Notice that since T^−1 is a simple linear map, we may view the map g = f ◦ T^−1 as just another normal nonlinear filter. That is, we have

  ct = g(w^(t)).    (12)

We see that the right-hand side is now the output of a normal NF-LFSR.

Proposition 2. The NF-CA which uses nonlinear filter f on a CA initialized to v^(0) may be realized as an NF-LFSR. This is done by applying the nonlinear filter g = f ◦ T^−1 to the associated LFSR and initializing it to w^(0) = T v^(0).
Now, we do not yet have any criterion for measuring an NF-LFSR’s resistance to Anderson information leakage. And, as stated in Anderson’s paper [1], random NF-LFSR’s tend to leak a lot of information. Hence there is a non-dismissible chance of (12) and hence (10) representing a stream cipher which is not resistant to Anderson information leakage. Remark 2. Anderson information leakage does not seem to be applicable to the nonlinear combiner model. Hence Anderson information leakage is probably not applicable to the FC model of [10]. But, once again, this is due to the use of combiner part of the FC model rather than from the three constraints of Section 4.1.
5 Explicit Example of Leaking NF-CA
We have constructed a small but concrete example to verify that it is possible for an NF-CA to satisfy all three of the constraints introduced in Section 4.1 and still show Anderson information leakage.

5.1 CA and Its Relation to an LFSR
Consider the 90/150 CA represented by a matrix M of the form (6) with the diagonal entries given by

  (c0, c1, . . . , c22) = (1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0).

For clarity, we have written down the explicit matrix M in Appendix B. The characteristic polynomial p(x) of M is

  1 + x^2 + x^4 + x^7 + x^9 + x^10 + x^11 + x^13 + x^14 + x^15 + x^17 + x^19 + x^20 + x^22 + x^23.

This is a primitive polynomial, so each of the 23 cells of the CA gives a sequence of period 2^23 − 1. Let L be the companion matrix of p(x). To define T, we let T1 = (1, 0, 0, . . . , 0) be its top row and recursively fix the i-th row Ti by setting

  Ti = M T_{i−1}    (13)

for 1 < i ≤ 23. The actual matrix T may be found in Appendix C. Checking

  T M T^−1 = L    (14)

is easy. We remark that such a T is not unique. Any invertible matrix obtained through the process (13) starting with an arbitrary nonzero vector will satisfy (14). For the above T, its inverse T^−1 is given in Appendix D.
5.2 Shifts between CA Cells and the Nonlinear Filter

Let mi = (mi^(t))_{t≥0} denote the sequence generated by the i-th cell of the CA, for 1 ≤ i ≤ 23. Then the relative shifts between m1 and mi are

  0 1988170 8388605 5964510 4125305 3763873 6190462 6778815 · · · .

These have been calculated using a program implementing [11]. From this, one can check that the relative shifts between the four sequences obtained from the 2nd, 3rd, 5th, and 7th cells are all quite close to 2^21 or 2^22: the six pairwise shifts among these cells are approximately 2^21.03, 2^20.92, 2^21.98, 2^22.00, 2^20.98, and 2^21.07. For example, the shift between m3 and m2 is

  1988170 − 8388605 = −6400435 ≡ 1988172 (mod 2^23 − 1) ≈ 2^20.92301···.

Hence, if we apply the nonlinear filter given by

  f = m2 ⊕ m3 ⊕ (m5 · m7)    (15)

on the CA, Rule 2 of Section 4.1 is satisfied. It is easily checked that f is a 1-resilient function, so we are not using a very bad filter. That Rule 1 is also satisfied is easily checked by calculating log2 23 = 4.52356 · · · ≥ 4. To satisfy Rule 3, we just need to use less than 2^21 bits from the NF-CA we shall create. This should not be a problem since we shall be using less than 20 key stream bits.

5.3 Equivalent NF-LFSR

We may recall from equations (10) and (11) that

  f(v^(t)) = (f ◦ T^−1)(w^(t)).
Table 1. The number of possible input states for each 7-bit output

Output #  Output #  Output #  Output #  Output #  Output #  Output #  Output #
00  38    08  65    10  79    18  49    20  73    28  79    30  65    38  63
01  51    09  29    11  63    19  73    21  93    29  83    31  49    39  71
02  61    0A  67    12  45    1A  43    22  51    2A  77    32  99    3A  69
03  73    0B  63    13  37    1B  59    23  71    2B  49    33  75    3B  85
04  51    0C  69    14  87    1C  73    24  61    2C  75    34  57    3C  39
05  75    0D  69    15  67    1D  85    25  69    2D  43    35  45    3D  59
06  73    0E  87    16  77    1E  59    26  39    2E  57    36  67    3E  53
07  89    0F  63    17  57    1F  71    27  55    2F  49    37  55    3F  73
40  51    48  77    50  87    58  65    60  61    68  67    70  57    78  47
41  83    49  53    51  59    59 101    61  61    69  59    71  53    79  43
42  65    4A  87    52  85    5A  59    62  47    6A  57    72  59    7A  53
43  89    4B  71    53  57    5B  63    63  55    6B  41    73  55    7B  81
44  43    4C  53    54  75    5C  61    64  69    6C  91    74  69    7C  51
45  55    4D  33    55  59    5D  69    65  89    6D  79    75  53    7D  75
46  65    4E  71    56  41    5E  39    66  47    6E  73    76 103    7E  73
47  61    4F  67    57  49    5F  55    67  83    6F  45    77  63    7F  89
And in terms of the state (l1, l2, . . . , l23) of the LFSR, the nonlinear filter f translates into the nonlinear filter

  g = f ◦ T^−1 = (l1 ⊕ l2 ⊕ l3) ⊕ (l2 ⊕ l4 ⊕ l5) · (l1 ⊕ l2 ⊕ l3 ⊕ l5 ⊕ l7).

We have used the 2nd, 3rd, 5th, and 7th rows of the explicitly calculated T^−1 given in Appendix D. Notice that the 7th bit of the LFSR is the rightmost bit used to find the states of the 4 CA cells we have chosen to use for the NF-CA.

5.4 Information Leakage
Since the 7th bit of the LFSR remains in effect until we obtain 7 bits of the output stream, we trace the 7-bit output stream by running the obtained NF-LFSR on all possible 13-bit states of the leftmost part of the LFSR. The input count for each output is given in Table 1. The 7-bit output stream is represented in hexadecimal notation, and # denotes the corresponding possible input state count. The leftmost of the 7 bits in the hexadecimal notation (excluding the leading 0 of the 8 bits) is the first output bit. In the ideal case, all the counts should be equal to (or at least near) 2^6 = 64. But as we see in Table 1, this is not the case; the counts are quite irregular. A count as large as 103 appears, and one as small as 29 also appears. So this shows the potential of this structure to leak information. For example, let us look at the following list of all 13-bit input states that give the 7-bit output stream 0x09 = 0001001. As given by the table, there are 29 such states.
Vulnerability of Nonlinear Filter Generators 1101000000100 0011000000100 1011000000100 0111000000100 0110101110100 0110100011100 0110100101010 1101000000110
0011000000110 1011000000110 0111000000110 0110101110110 0110100011110 0000011111110 0000011100001 0000000110001
0110101110001 0110101110101 0110100011101 0110101110011 1101000001011 0011000001011 1011000001011 0111000001011
0110010001011 1110010001011 0110100101011 0110101110111 0110100011111
We find that, with probability 28/29, exactly one of the bits 4, 5, and 6 is equal to 1. In particular, the sum of the three bits equals 1 with probability 28/29. Anderson's information leakage argument is thus applicable to this structure. Therefore, applying a nonlinear filter with good cryptographic properties to a CA and using cells of large relative shifts, as suggested in [10], does not necessarily prevent Anderson information leakage.
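The tracing described above can be reproduced by direct enumeration. The sketch below (an illustration, not code from the paper) assumes the filter g of Section 5.4 applied to a window that slides one cell per clock, so output bit t uses bits l_{1+t}, ..., l_{7+t} of the 13-bit state; under these conventions it recovers the 29 input states listed above for output 0x09:

```python
# Enumerate all 13-bit leftmost LFSR states and trace the 7 output bits of
# the filter g = (l1^l2^l3) ^ (l2^l4^l5)&(l1^l2^l3^l5^l7).
from collections import Counter

def g(w):  # w = (l1, ..., l7), a window of 7 consecutive state bits
    a = w[0] ^ w[1] ^ w[2]
    b = w[1] ^ w[3] ^ w[4]
    c = w[0] ^ w[1] ^ w[2] ^ w[4] ^ w[6]
    return a ^ (b & c)

counts = Counter()
states_of = {}
for state in range(1 << 13):
    l = [(state >> (12 - i)) & 1 for i in range(13)]  # l[0] is the leftmost bit l1
    out = 0
    for t in range(7):          # the first output bit becomes the leftmost of the 7
        out = (out << 1) | g(l[t:t + 7])
    counts[out] += 1
    states_of.setdefault(out, []).append(l)
```

`counts[0x09]` should give the 29 states of the list, 28 of which have exactly one of bits 4, 5, 6 set.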
6  Checking the Vulnerability of a Given NF-LFSM
In this section, we give a general method for checking whether or not a given NF-LFSM allows reduction to a vulnerable NF-LFSR. Since CA and LFSR are subclasses of LFSM, our method applies even to an NF-LFSR; that is, we can check whether the nonlinear filter of an NF-LFSR may be rewritten in a form that shows information leakage.

Let an NF-LFSM be defined by a matrix M of size n and a nonlinear filter f. Denote by L the companion matrix of M and write Z(L) = {Z ∈ GL(n) | ZL = LZ} for the centralizer of L. For a given companion matrix L, it is easy to write down Z(L) more explicitly. The following lemma may be proved through a straightforward application of ZL = LZ.

Lemma 1. For the companion matrix L given by (3), the centralizer Z(L) consists of the elements Z = (z_{i,j})_{i,j=0}^{n-1} ∈ GL(n) satisfying

    z_{i+1,0} = a_0 z_{i,n-1},
    z_{i+1,1} = a_1 z_{i,n-1} ⊕ z_{i,0},
    z_{i+1,2} = a_2 z_{i,n-1} ⊕ z_{i,1},
    ...
    z_{i+1,n-1} = a_{n-1} z_{i,n-1} ⊕ z_{i,n-2}

for all 0 ≤ i < n − 1.
Jin Hong et al.
The important implication of this lemma is that every entry of a Z ∈ Z(L) may be written, in a uniform way, as a linear sum of the terms in its first row. For example,

      ( 0 1 0 )     {  (  x           y           z           )           }
    Z ( 0 0 1 )  =  {  (  az          x⊕bz        y⊕cz        )  ∈ GL(3) }.
      ( a b c )     {  (  ay⊕acz      az⊕by⊕bcz   x⊕bz⊕cy⊕cz  )           }

Now fix any matrix T̄ satisfying T̄ M T̄⁻¹ = L. It is an easy exercise in linear algebra to show that the set of all T satisfying (7) is given by Z(L)T̄ := {Z T̄ | Z ∈ Z(L)}. And since Z ∈ Z(L) if and only if Z⁻¹ ∈ Z(L), we have the following proposition.

Proposition 3. The set of all T⁻¹ satisfying (7) is given by T̄⁻¹ Z(L) := {T̄⁻¹ Z | Z ∈ Z(L)}.

We are now ready to give a general method for checking whether or not a given NF-LFSM allows reduction to a vulnerable NF-LFSR. If a nonlinear filter applied to an LFSR uses a small number of variables, and if those variables correspond to LFSR cells that are close to each other, then such an NF-LFSR is vulnerable to Anderson information leakage; otherwise, the NF-LFSR is highly immune to information leakage. Hence, for a given NF-LFSM, it suffices to check the possibility of finding a T for which the filter g = f ◦ T⁻¹ given by Proposition 2 uses variables from a small clustered set. Decide on a (small) number s < n. If it is possible to choose T so that all variables used by g fall within some s consecutive LFSR cells, we conclude that there is a high probability that the NF-LFSM yields to Anderson information leakage. Otherwise we presume that the NF-LFSM does not leak information.

Procedure for Checking Vulnerability.
1. From the matrix M of size n defining the LFSM, calculate its characteristic polynomial and the associated LFSR L.
2. Fix any matrix T̄ satisfying T̄ M T̄⁻¹ = L. Using the idea of (13) is one way to do this.
3. Write Z(L) in the form given by Lemma 1, so that all entries of lower rows are expressed as linear combinations of the first-row terms. We denote the first-row terms by x_0, . . . , x_{n-1}.
4. Recalling Proposition 3, multiply T̄⁻¹ with the general element of Z(L) obtained in the previous step to express the general T⁻¹. All entries of the general T⁻¹ will again be linear combinations of the x_j. Let r be the number of rows used by f. Note that of the general expression of T⁻¹, which is an n × n array of linear sums over the x_j, only the r rows that correspond to variables used by the nonlinear filter f have any significance.
5. Remove all rows not corresponding to variables used by f. Now, suppose that for some explicit nontrivial values of the variables x_j, all the remaining entries evaluate to zero, except for those contained in the first s columns. Then g = f ◦ T⁻¹ would use only s variables for the T⁻¹ evaluated at those explicit values.
6. Temporarily remove the first s columns from the remaining array of T⁻¹ entries.
7. Check whether setting all remaining entries to zero yields a nontrivial solution.
8. If a nontrivial solution is found, conclude that the NF-LFSM allows reduction to a vulnerable NF-LFSR.
9. Otherwise, bring back the array of linear sums obtained after Step 5.
10. Unless we have tried all possible sets of s consecutive columns, (temporarily) remove the next set of s consecutive columns and go back to Step 7.
11. If no nontrivial solution is found, conclude that the NF-LFSM resists Anderson information leakage.

Notice that at Step 7 we have a set of r × (n − s) equations in n variables. For most interesting values of r and s, the number of equations is larger than the number of variables, but our (small number of) tests show that nontrivial solutions do occur from time to time. We close this section by adding that the complexity of this process is easily seen to be of polynomial order in n.
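Step 7 amounts to asking whether a homogeneous linear system over GF(2) — one equation per remaining entry, one unknown per x_j — admits a nontrivial solution, i.e. whether the coefficient matrix has rank less than n. A minimal sketch (not from the paper), with each linear form encoded as an integer bitmask over hypothetical variables x_0, x_1, ..., might look like:

```python
def gf2_rank(rows):
    """Rank over GF(2) of a binary matrix; each row is an int bitmask."""
    rows = list(rows)
    rank = 0
    while rows:
        r = rows.pop()
        if r == 0:
            continue
        rank += 1
        pivot = r & -r                       # lowest set bit of the pivot row
        rows = [x ^ r if x & pivot else x for x in rows]
    return rank

def has_nontrivial_zero(rows, n_vars):
    """True iff forcing all encoded linear forms to zero admits some x != 0."""
    return gf2_rank(rows) < n_vars
```

For example, the three forms x0⊕x1, x1⊕x2, x0⊕x2 over three variables have rank 2, so the all-ones assignment is a nontrivial common zero.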
7  Conclusion
We have seen that an LFSM (or a CA) is intimately connected to an LFSR by the simple relation (9). This structure allows one to realize an LFSM using an LFSR and a linear transformation. Since an LFSR is much simpler than an LFSM, this has implications for the security of any system that (iteratively) uses an LFSM as one of its building blocks under the assumption that it is more complex than an LFSR. One such attempt has been examined in this paper: using a CA in place of an LFSR in order to remove Anderson information leakage from a nonlinear filter model fails. We have also given a general method for checking whether or not a given NF-LFSM allows reduction to an NF-LFSR which is vulnerable to information leakage.
References

[1] Ross Anderson, Searching for the optimum correlation attack. Fast Software Encryption, LNCS 1008, pp. 137–143, Springer-Verlag, 1995.
[2] Jovan Dj. Golić, On the security of nonlinear filter generators. Fast Software Encryption, LNCS 1039, pp. 173–188, Springer-Verlag, 1996.
[3] Jovan Dj. Golić, Correlation via linear sequential circuit approximation of combiners with memory. Advances in Cryptology – EUROCRYPT '92, LNCS 658, pp. 113–123, Springer-Verlag, 1993.
[4] Jovan Dj. Golić, Andrew Clark, and Ed Dawson, Generalized inversion attack on nonlinear filter generators. IEEE Transactions on Computers, vol. 49 (10), pp. 1100–1109, October 2000.
[5] Thomas W. Hungerford, Algebra. GTM 73, Springer-Verlag, 1980.
[6] Sangjin Lee, Seongtaek Chee, Sangjoon Park, and Sungmo Park, Conditional correlation attack on nonlinear filter generators. Advances in Cryptology – ASIACRYPT '96, LNCS 1163, pp. 360–367, Springer-Verlag, 1996.
[7] R. Lidl and H. Niederreiter, Introduction to Finite Fields and Their Applications. Cambridge University Press, 1994.
[8] Miodrag Mihaljević, Yuliang Zheng, and Hideki Imai, A family of fast dedicated one-way hash functions based on linear cellular automata over GF(q). IEICE Trans. Fundamentals, vol. E82-A(1), pp. 1–8, January 1999.
[9] W. Meier and O. Staffelbach, Fast correlation attacks on certain stream ciphers. Journal of Cryptology, vol. 1, pp. 159–176, 1989.
[10] Palash Sarkar, The filter-combiner model for memoryless synchronous stream ciphers. Advances in Cryptology – CRYPTO 2002, LNCS 2442, pp. 533–548, Springer-Verlag, 2002.
[11] Palash Sarkar, Computing shifts in 90/150 cellular automata sequences. Finite Fields and Their Applications, vol. 9 (2), pp. 175–186, April 2003.
[12] Palash Sarkar, A brief history of cellular automata. ACM Computing Surveys, vol. 32 (1), pp. 80–107, March 2000.
[13] Palash Sarkar, Hiji-bij-bij: a new stream cipher with a self-synchronizing mode of operation. IACR ePrint 2003-014, 2003. http://eprint.iacr.org.
A  Explanation for Fact 1
Let E denote a vector space over a field K. We have gathered some well-known facts from linear algebra in the following theorem. Basic definitions and proofs may be found in standard textbooks (for example, [5]).

Theorem 1. Let φ : E → E be a linear transformation. Then the following statements hold.
1. There exist monic polynomials p_1, . . . , p_t ∈ K[x] of positive degree and φ-cyclic subspaces E_1, . . . , E_t of E such that E = E_1 ⊕ · · · ⊕ E_t and p_1 | p_2 | · · · | p_t.
2. The sequence p_1, . . . , p_t is uniquely determined by E and φ. (Its members are called the invariant factors of φ.)
3. If E is a φ-cyclic space and φ has minimal polynomial p(x) of degree r, then dim_K E = r and there exists an ordered basis of E relative to which the matrix of φ is the companion matrix of p(x).
4. The characteristic polynomial of φ is the product of its invariant factors.

From these statements we may easily obtain the following corollary, stated in terms of matrices.
Corollary 1. Let A be an n × n matrix with entries in the field K. If the characteristic polynomial of A is irreducible, then A is similar to the companion matrix of its characteristic polynomial.

To make everything explicit for those without a mathematical background, we explain this corollary, giving some basic definitions. The companion matrix of a monic polynomial p(x) = a_0 + a_1 x + · · · + a_{n−1} x^{n−1} + x^n ∈ K[x] is usually defined to be the matrix

    ( 0 0 0 · · · 0  −a_0     )
    ( 1 0 0 · · · 0  −a_1     )
    ( 0 1 0 · · · 0  −a_2     )
    ( ⋮        ⋱     ⋮        )
    ( 0 0 · · · 0 1  −a_{n−1} )

We can see that the form given by (3) is the transpose of this one, taking into account that we were working over the binary field there. Now let p(x) be the characteristic polynomial of a square matrix A and let B be the companion matrix of p(x). Corollary 1 states that if p(x) is irreducible, then there exists some invertible matrix C satisfying C A C⁻¹ = B. Taking the transpose of both sides gives (Cᵀ)⁻¹ Aᵀ Cᵀ = Bᵀ. It is clear that the characteristic polynomial of A is equal to that of Aᵀ. Hence, Fact 1 follows.
B  Matrix M Defining the CA

1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
C  Matrix T

1 1 0 0 1 1 1 0 0 0 1 0 1 0 1 0 0 1 1 0 0 0 1
0 0 1 1 1 0 0 0 1 0 1 0 1 0 0 1 1 0 0 0 1 0 0
0 0 0 1 1 0 1 1 1 0 0 0 0 1 0 0 0 0 1 1 1 1 0
0 0 0 0 1 1 1 1 1 0 1 0 0 0 0 1 1 1 1 1 0 0 1
0 0 0 0 0 1 0 0 1 1 0 0 0 1 1 1 1 1 0 1 1 0 1
0 0 0 0 0 0 1 0 1 1 1 0 1 0 0 1 1 0 0 1 1 1 0
0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 1 1 1 0
0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 1 0 1 0 0 1
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 1 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 1 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
D  Matrix T⁻¹

1 1 0 1 0 1 1 1 1 0 1 1 0 1 0 1 1 0 1 0 1 1 1
0 1 0 1 0 0 1 0 0 1 1 1 1 1 1 0 1 0 1 0 0 1 1
0 1 0 1 1 1 1 0 0 1 1 1 1 0 0 0 1 0 1 1 1 1 0
0 0 1 1 0 0 1 1 1 0 0 1 0 0 0 0 0 1 1 0 0 1 1
0 0 0 1 1 1 0 0 1 0 1 1 1 0 1 0 1 1 1 0 1 1 0
0 0 0 0 1 1 1 1 1 1 0 0 1 0 1 1 0 0 1 1 1 1 0
0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 1 1 0 0 1 0 0
0 0 0 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0 1 0 1 0 1
0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 0 1 1 0 1 1 0 1
E  Remarks on Further Developments
Here, we have gathered some remarks on what further implications the basic idea of this paper might have.

Remark 3. The arguments of this paper are not constrained to the binary field. An LFSM whose cells represent elements of any finite field can be realized using an LFSR over the same finite field.

Remark 4. One can deduce from Fact 1 that any two square matrices with a common irreducible characteristic polynomial are related by an invertible matrix. So even though we have focused on the relationship between an LFSM and an LFSR, the arguments of this paper can be applied to realizing an LFSM with a different (and possibly simpler) LFSM.
Remark 5. Those familiar with linear algebra will know that it is easy to extend the idea of this paper to the case when the characteristic polynomial of an LFSM is not irreducible. In such a case, the resulting realization will use several LFSRs whose sizes add up to the size of the original LFSM.

Remark 6. One can view the idea of this paper from a different direction and use it to realize an LFSR with an LFSM. In this case, we can exploit our freedom over the choice of transition matrices. So, for example, applying it to an NF-LFSR, one can turn any nonlinear filter of high resiliency into a filter having correlation of degree one.

Remark 7. In a way, Anderson information leakage is fundamentally due to the fact that the inputs to different variables of the filter are supplied by sequences that are just small shifts of each other. This paper shows that as long as LFSMs are used, this is unavoidable. So it might be a good idea to look at nonlinear feedback shift registers now.
VMPC One-Way Function and Stream Cipher

Bartosz Zoltak
[email protected]
http://www.vmpcfunction.com
Abstract. A simple one-way function along with its proposed application in symmetric cryptography is described. The function is computable with three elementary operations on permutations per byte. Inverting the function, using the most efficient method known to the author, is estimated to require an average computational effort of about 2^260 operations. The proposed stream cipher based on the function was designed to be efficient in software implementations and, in particular, to eliminate the known weaknesses of the alleged RC4 keystream generator while retaining most of its speed and simplicity.

Keywords: one-way function; stream cipher; cryptanalysis; RC4; lower bound
1  Introduction
A simple transformation of permutations appearing to be hard to invert is described, together with its proposed practical application in a software-efficient symmetric encryption algorithm. The transformation, here termed the "VMPC" function as an abbreviation of "Variably Modified Permutation Composition", is a combination of elementary operations on permutations and integers. The simplest variant of the function can be coded with three basic "MOV" instructions from the Intel 80x86 processor instruction set per byte. When applied to 256-element permutations (a comfortable size in practical cryptographic applications), the function requires an estimated average of 2^260 computational operations to be inverted using the most efficient method known to the author. The very low computational cost required to obtain the practical one-way property makes the function a plausible candidate for cryptographic applications. The simplicity of the function also raises the question whether it might be possible to prove a lower bound on the complexity of inverting it. This is currently an open problem and a possible subject of future research.

A proposition of an encryption algorithm constructed as a stream cipher based on the VMPC function is described in Sections 8–14. The cipher was designed both to be efficient in software implementations and to resist the known attacks against this kind of algorithm (like the alleged RC4 keystream generator), in particular attacks distinguishing the keystream from a truly random source and attacks against the cipher's Key Scheduling Algorithm (KSA).

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 210–225, 2004.
© International Association for Cryptologic Research 2004
2  Definition of the VMPC Function
Notation:
n : size of the permutations
P, Q : n-element permutations; for simplicity of further references P and Q are assumed to be one-to-one mappings A → A, A = {0, 1, . . . , n − 1}
k : level of the function; k < n
+ : addition modulo n

Definition 1. A k-level VMPC function, referred to as VMPCk, is a transformation of P into Q such that

    Q[x] = P[P_k[P_{k−1}[. . . [P_1[P[x]]] . . .]]],  x ∈ {0, 1, . . . , n − 1},

where P_i is an n-element permutation such that P_i[x] = f_i(P[x]), with f_i any function ensuring that P_i[x] ≠ P[x] ≠ P_j[x] for i, j ∈ {1, 2, ..., k}, i ≠ j. For simplicity of further references f_i is assumed to be f_i(x) = x + i.

Example: Q = VMPC1(P) : Q[x] = P[P[P[x]] + 1]
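Definition 1, with f_i(x) = x + i, can be written out directly; this sketch (an illustration, not code from the paper) reproduces the 10-element example values given later in Table 2:

```python
def vmpc(P, k):
    """Q = VMPCk(P) for an n-element permutation P, with f_i(x) = x + i mod n."""
    n = len(P)
    Q = []
    for x in range(n):
        v = P[x]
        for i in range(1, k + 1):   # P_i[v] = P[v] + i  (mod n)
            v = (P[v] + i) % n
        Q.append(P[v])              # final outer application of P
    return Q

P = [2, 0, 4, 3, 6, 9, 7, 8, 5, 1]  # the example permutation of Table 2
```

For instance, `vmpc(P, 1)` yields the Q1 row of Table 2.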
3  Difficulty of Inverting the VMPC Function
The n-element permutation P has to be recovered from the n-element permutation Q, where Q = VMPCk(P). By definition each element of Q is formed by k + 2, usually different, elements of P. One element of Q can be formed by many possible configurations of elements of P (e.g. for Q = VMPC1(P), Q[X] = Y can be formed by P[X] = a, P[a] = b, P[b + 1] = Y for any reasonable combination of values of a and b). All possible configurations are equally likely to be correct. If any of them is chosen, it needs to be verified against all of those elements of Q which use any of the elements of P included in the picked configuration. Each element of P is usually used to form k + 2 different elements of Q, thus (k + 2) × (k + 1) new elements of Q usually need to be inverted (all k + 2 elements of P used to form each of these elements of Q need to be revealed) to verify the elements of P from the picked configuration.

Because the cycle structure of P is corrupted by the addition operation(s), it is usually impossible to find two different elements of Q which share at least k + 1 elements of P. Instead, usually only such an element of Q can be found, call it Q[x], which shares only one of its k + 2 elements of P with another element of Q. This forces the k elements of P used to form Q[x] to be guessed in order to invert Q[x]. However, for each newly guessed element of P there usually occur k + 1 new elements of Q which use this element of P and which need to be inverted to verify the guess.

The algorithm falls into a loop, where at every step usually k new elements of P need to be guessed to enable continuation of verification of the previously
Table 1. Assembler implementation of the 1-level VMPC function

Instruction              Description
MOV AL, [Pm+EAX]         Store the (EAX=AL)-th element of Pm in AL
MOV AL, [Pm+EAX]         Store the (EAX=AL)-th element of Pm in AL
MOV AL, [Pm+EAX+1]       Store the ((EAX=AL)+1)-th element of Pm in AL
guessed elements. In consequence the k + 2 elements of P picked at the beginning of the process indirectly depend on all n elements of Q. The described scenario is the usual case. In some circumstances the verification process can be simplified by benefiting from coincidences, where for example it is possible to find two elements of Q which share more than one element of P (e.g. for Q = VMPC1(P): Q[2] = 3 : P[2] = 4, P[4] = 8, P[9] = 3 and Q[1] = 8 : P[1] = 9, P[9] = 3, P[4] = 8). The proposed algorithm for inverting the VMPC function (Section 6) was optimized to benefit from the possible coincidences. The average number of elements of P which need to be guessed for n = 256 was reduced to about 34 for the 1-level VMPC function, to about 57 for 2-level VMPC, to about 77 for 3-level VMPC and to about 92 for the 4-level VMPC function. Searching through half of the possible states of these elements of P requires on average about 2^260 steps for the 1-level VMPC function, about 2^420 for 2-level VMPC, about 2^550 for 3-level VMPC and about 2^660 steps for the 4-level VMPC function.
4  A 3-Instruction Implementation of the VMPC Function
An implementation in assembly language of the 1-level VMPC function, where Q[x] = P[P[P[x]] + 1], for 256-element permutations P and Q is described. Assume that:
– Pm is a 257-byte array indexed by numbers from 0 to 256. The P permutation is stored in the Pm array at indexes from 0 to 255 (Pm[0...255] = P) and Pm[256] = Pm[0]
– the EAX 32-bit register specifies which element of the Q permutation to compute ("AL" always denotes the 8 least significant bits of EAX; here EAX = AL)

The 3 MOV instructions in Table 1 store the EAX-th element of permutation Q, where Q = VMPC1(P), in the AL (and EAX) register.
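The same 257-byte-array trick can be expressed in C; this sketch (an illustration, not the paper's code) mirrors the three MOV instructions, with the wrap byte Pm[256] = Pm[0] absorbing the possible index overflow in the final lookup:

```c
#include <assert.h>

/* Pm stores the permutation P in bytes 0..255, plus the wrap byte
   Pm[256] = Pm[0], so the index P[P[x]] + 1 == 256 needs no reduction. */
static unsigned char Pm[257];

/* One byte of Q = VMPC1(P): Q[x] = P[P[P[x]] + 1] */
static unsigned char vmpc1_byte(unsigned char x) {
    unsigned char al = Pm[x];   /* MOV AL, [Pm+EAX]   */
    al = Pm[al];                /* MOV AL, [Pm+EAX]   */
    return Pm[al + 1];          /* MOV AL, [Pm+EAX+1] ; al+1 may equal 256 */
}
```

Since `al` is promoted to `int` in `al + 1`, the index 256 is reachable and lands on the wrap byte rather than wrapping to 0.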
5  Example Values of the VMPC Function

Values of the 1-, 2-, 3- and 4-level VMPC functions of an example 10-element permutation P are given in Table 2:
Table 2. Example values of the VMPC function

index            0 1 2 3 4 5 6 7 8 9
P                2 0 4 3 6 9 7 8 5 1
Q1 = VMPC1(P)    9 3 8 6 5 4 1 7 2 0
Q2 = VMPC2(P)    0 9 2 5 8 7 3 1 6 4
Q3 = VMPC3(P)    3 4 9 5 0 2 7 6 1 8
Q4 = VMPC4(P)    8 5 3 1 6 7 0 2 9 4

6  Proposed Algorithm for Inverting the VMPC Function
The algorithm outputs an n-element permutation P satisfying Q = VMPCk(P).
Notation:
P : n-element table in which the searched permutation will be stored
X, Y : temporary variables
Argument, Value, Base, Parameter of an element of P : For an element P[x] = y : x is termed the Argument; x can be either the Base or the Parameter. y is termed the Value; y is then the Parameter or the Base, respectively.
Example: For an element P[3] = 5: if Argument 3 is the Parameter, Value 5 is the Base.

1.1) Reveal one element of P by assuming P[X] = Y, where X and Y are any values from {0, 1, ..., n − 1}
1.2) Choose at random whether X is the Base and Y the Parameter, or Y the Base and X the Parameter, of the element P[X] = Y. Denote P[X] = Y as the Current element of P.
2) Reveal all possible elements of P by running the Deducing Process (see Sect. 6.1)
3) If n elements of P are revealed with no contradiction: terminate the algorithm and output P
4) If fewer than n elements of P are revealed with no contradiction:
4.1) Reveal a new element of P by running the Selecting Process (see Sect. 6.2). Denote the revealed element as the Current element of P.
4.2) Save the Parameter of the Current element of P
4.3) Go to step 2
5) If a contradiction occurred in step 2:
5.1) Remove all elements of P revealed in step 2 when the Current element of P had been revealed
5.2) Increment modulo n the Parameter of the Current element of P
5.3) If the Parameter of the Current element of P returned to the value saved in step 4.2:
5.3.1) Remove the Current element of P
5.3.2) Denote the element which had been the Current element of P directly before the element removed in step 5.3.1 became the Current one as the Current element of P
5.3.3) Go to step 5.1
6) Go to step 2

6.1  The Deducing Process
The Deducing Process reveals all possible elements of P given Q and the already revealed elements of P.
Notation as in Section 6, with:
C, A : temporary variables
Statement y : the sequence of k + 2 elements of P used to compute Q[y]
Word x of Statement y : the x-th element of the sequence of k + 2 elements of P used to compute Q[y]
Example: For Q = VMPC2(P) : Q[x] = P[P[P[P[x]] + 1] + 2]: Assume that P[2] = 3, P[3] = 5, P[6] = 2, P[4] = 7, which forms Q[2] = 7. The elements P[2] = 3, P[3] = 5, P[6] = 2, P[4] = 7 form Statement 2. The element P[2] = 3 is Word 1 of Statement 2; P[3] = 5 is Word 2 of Statement 2, etc.

1.1) Set C to 0
1.2) Set A to 0
2) If the element P[A] is revealed:
2.1) If the element P[A] and k other revealed elements of P fit a general pattern of k + 1 Words of any Statement: deduce the remaining Word of that Statement (see Example 6.1.1)
2.2) If the deduced Word is not a revealed element of P:
2.2.1) Reveal the deduced Word as an element of P
2.2.2) Set C to 1
2.3) If the deduced Word contradicts any already revealed element of P (see Example 6.1.2): output a contradiction and terminate the Deducing Process
3.1) Increment A
3.2) If A is lower than n: go to step 2
3.3) If C is equal 1: go to step 1.1

Example 6.1.1. For Q = VMPC2(P) : Q[x] = P[P[P[P[x]] + 1] + 2]: Assume that Q[0] = 9 and that the following elements of P are revealed: P[0] = 1, P[1] = 3, P[8] = 9. Word 3 of Statement 0 can be deduced as P[4] = 6 (P[3 + 1] = 8 − 2).
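Example 6.1.1 can be checked mechanically. The sketch below (an illustration, not the paper's code) walks Statement 0 of Q = VMPC2(P): Words 1 and 2 are known, the Argument of Word 3 follows from Word 2, and its Value follows from matching Word 4 against the known element with Value 9:

```python
# Known data of Example 6.1.1, for Q = VMPC2(P): Q[x] = P[P[P[P[x]] + 1] + 2]
Q0 = 9                    # Q[0] = 9
P = {0: 1, 1: 3, 8: 9}    # the revealed elements of P

word1 = P[0]              # Word 1: P[0] = 1
word2 = P[word1]          # Word 2: P[1] = 3
arg3 = word2 + 1          # Word 3 is P[word2 + 1] (additions are mod n; no wrap here)
# Word 4 is P[val3 + 2] = Q[0]; the revealed element with Value 9 fixes its Argument:
arg4 = next(a for a, v in P.items() if v == Q0)
val3 = arg4 - 2           # hence Word 3 is P[4] = 6
```

The deduction yields exactly the element P[4] = 6 claimed in the example.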
Example 6.1.2. For Q = VMPC2(P) : Q[x] = P[P[P[P[x]] + 1] + 2]: Assume that Q[7] = 2 and that the following elements of P are revealed: P[1] = 8, P[9] = 3, P[5] = 2 and P[6] = 1. Word 1 of Statement 7, deduced as P[7] = 1, contradicts the already revealed element P[6] = 1.

6.2  The Selecting Process (for k not higher than 4)
The Selecting Process selects a new element of P to be revealed which is expected to maximize the number of elements of P possible to deduce in further steps of the inverting algorithm. The Selecting Process outputs a selected Base and a randomly chosen Parameter of a new element of P.
Notation as in Section 6.1, with:
B, R : temporary variables
Ta, Tv : temporary tables
Weight : table of numbers: Weight[0; 1; 2; 3; 4] = (0; 2; 5; 9; 14). Example: Weight[3] = 9

1.1) Set Ta and Tv to 0
1.2) Set B to 0
1.3) Set R to 1
2) Count the number of revealed elements of P which fit a general pattern of Words of a Statement in which an unrevealed element of P with Argument B would be Word R. Increment Ta[B] by the Weight of this number (see Example 6.2.1)
3) Count the number of revealed elements of P which fit a general pattern of Words of a Statement in which an unrevealed element of P with Value B would be Word R. Increment Tv[B] by the Weight of this number
4.1) Increment R
4.2) If R is lower than k + 3: go to step 2
4.3) Increment B
4.4) If B is lower than n: go to step 1.3
5.1) Pick any index of Ta or Tv for which the number stored in tables Ta and Tv is maximal
5.2) If the index picked in step 5.1 is an index of Ta:
5.2.1) Store this index in variable X
5.2.2) Generate a random number Y ∈ {0, 1, . . . , n − 1}, such that an element of P with Value Y is not revealed
5.2.3) Output P[X] = Y, where X is the Base and Y is the Parameter
5.3) If the index picked in step 5.1 is an index of Tv:
5.3.1) Store this index in variable Y
5.3.2) Generate a random number X ∈ {0, 1, . . . , n − 1},
such that an element of P with Argument X is not revealed
5.3.3) Output P[X] = Y, where Y is the Base and X is the Parameter

Example 6.2.1. For Q = VMPC2(P) : Q[x] = P[P[P[P[x]] + 1] + 2]: Assume that B = 8, R = 2, Q[6] = 9 and that P[6] = 8 and P[2] = 9 are revealed. There are two revealed elements of P which fit a general pattern of Words of a Statement in which P[8] would be Word 2: P[6] = 8 and P[2] = 9:

Word 1      Word 2      Word 3         Word 4
P[6] = 8    P[8] = y    P[y+1] = 0     P[2] = 9

(Ta[8] = Ta[8] + Weight[2] = Ta[8] + 5)
7  Example Complexities of Inverting the VMPC Function
Complexity of inverting the VMPC function (Table 3) was approximated as the average number of times the Deducing Process (step 2) needs to be run by the inverting algorithm described in Section 6 until a permutation P satisfying Q = VMPCk(P) is found. The average numbers of elements of P which need to be assumed are given in Table 3 in brackets. Complexities of inverting the VMPC function of the following levels were approximated:

Q = VMPC1(P) : Q[x] = P[P[P[x]] + 1]
Q = VMPC2(P) : Q[x] = P[P[P[P[x]] + 1] + 2]
Q = VMPC3(P) : Q[x] = P[P[P[P[P[x]] + 1] + 2] + 3]
Q = VMPC4(P) : Q[x] = P[P[P[P[P[P[x]] + 1] + 2] + 3] + 4]
Example: For the 1-level VMPC function applied to 256-element permutations: on average about 34 elements of P need to be assumed. Searching through half of the possible states of these elements requires about 2^260 calls to the Deducing Process.
8  Design Objectives for the VMPC Stream Cipher and Its KSA
– The Cipher should require no initial outputs to be discarded directly after running the KSA.
– The probability that the Cipher's output enters a short cycle should be negligibly low.
– Output generated by the Cipher should be free from statistical biases.
– The effort required to recover the internal state from the Cipher's output should be higher than a brute-force search of all possible 512-bit keys.
– The KSA should resist related-key attacks and attacks against the scheme of using the Initialization Vector (IV).
– The KSA should provide random-looking diffusion of a change of one byte of a key of size up to 512 bits onto the generated permutation and onto the output generated by the Cipher.
9  Description of the VMPC Stream Cipher
The algorithm generates a stream of 8-bit values.
Variables:
P : 256-byte table storing a permutation initialized by the VMPC KSA
s : 8-bit variable initialized by the VMPC KSA
n : 8-bit variable
L : desired length of the keystream in bytes
10  Description of the VMPC Key Scheduling Algorithm
The VMPC Key Scheduling Algorithm (KSA) transforms a cryptographic key and (optionally) an Initialization Vector into a 256-element permutation P and initializes variable s.
Variables as in Section 9, with:
c : fixed length of the cryptographic key in bytes, 16 ≤ c ≤ 64
K : c-element table storing the cryptographic key
z : fixed length of the Initialization Vector in bytes, 16 ≤ z ≤ 64
V : z-element table storing the Initialization Vector
m : 16-bit variable
Table 3. Example complexities of inverting the VMPC function

n     VMPC1          VMPC2          VMPC3          VMPC4
6     2^4.1 (2.3)    2^5.5 (3.1)    2^6.1 (3.3)    2^6.9 (3.8)
8     2^5.5 (2.7)    2^7.5 (3.4)    2^8.8 (4.0)    2^9.8 (4.4)
10    2^7.1 (3.0)    2^9.7 (4.0)    2^11.5 (4.7)   2^13.0 (5.2)
16    2^11.5 (3.8)   2^16.6 (5.4)   2^20.4 (6.6)   2^23.3 (7.5)
32    2^24 (6.0)     2^37 (9.1)     2^47 (11.5)    2^54 (13.4)
64    2^53 (10.2)    2^84 (16.2)    2^108 (21.0)   2^127 (24.9)
128   2^117 (18)     2^190 (30)     2^245 (40)     2^292 (47)
256   2^260 (34)     2^420 (57)     2^550 (77)     2^660 (92)
218
Bartosz Zoltak

Table 4. VMPC Stream Cipher

1. n = 0
2. Repeat steps 3-6 L times:
3.   s = P[(s + P[n]) modulo 256]
4.   Output P[(P[P[s]] + 1) modulo 256]
5.   Temp = P[n]; P[n] = P[s]; P[s] = Temp
6.   n = (n + 1) modulo 256
Table 5. VMPC Key Scheduling Algorithm

1. s = 0
2. for n from 0 to 255: P[n] = n
3. for m from 0 to 767: execute steps 4-6:
4.   n = m modulo 256
5.   s = P[(s + P[n] + K[m modulo c]) modulo 256]
6.   Temp = P[n]; P[n] = P[s]; P[s] = Temp
7. If Initialization Vector is used: execute step 8:
8. for m from 0 to 767: execute steps 9-11:
9.   n = m modulo 256
10.  s = P[(s + P[n] + V[m modulo z]) modulo 256]
11.  Temp = P[n]; P[n] = P[s]; P[s] = Temp
11 VMPC Stream Cipher Test Vectors
Table 6 gives 16 test output-bytes generated by the VMPC Stream Cipher for a given 16-byte key (K) and a given 16-byte Initialization Vector (V):
12 Performance of the VMPC Stream Cipher
Performance of a moderately optimized 32-bit assembler implementation of the VMPC Stream Cipher and its KSA, measured on an Intel Pentium 4, 2.66 GHz processor, is given in Tables 7 and 8.
Table 6. Test-output of the VMPC Stream Cipher

K; c = 16 [hex]: 96, 61, 41, 0A, B7, 97, D8, A9, EB, 76, 7C, 21, 17, 2D, F6, C7
V; z = 16 [hex]: 4B, 5C, 2F, 00, 3E, 67, F3, 95, 57, A8, D2, 6F, 3D, A2, B1, 55

Output-byte position [dec]:    0    1    2    3     252    253    254    255
Output-byte value [hex]:      A8   24   79   F5      B8     FC     66     A4
Output-byte position [dec]: 1020 1021 1022 1023  102396 102397 102398 102399
Output-byte value [hex]:      E0   56   40   A5      81     CA     49     9A
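The algorithms of Tables 4 and 5 translate almost line for line into Python. The sketch below (function names and structure are mine, not the paper's) regenerates the keystream for the key and IV above, so the Table 6 values serve as a built-in sanity check for any reimplementation:

```python
def vmpc_ksa(key, iv=None):
    """VMPC KSA (Table 5): returns (P, s) from a 16..64-byte key and optional IV."""
    P, s = list(range(256)), 0
    for src in (key, iv):          # key pass (steps 3-6), then optional IV pass (8-11)
        if src is None:
            continue
        for m in range(768):
            n = m % 256
            s = P[(s + P[n] + src[m % len(src)]) % 256]
            P[n], P[s] = P[s], P[n]
    return P, s

def vmpc_keystream(P, s, L):
    """VMPC Stream Cipher (Table 4): L keystream bytes from state (P, s)."""
    P = list(P)
    out, n = [], 0
    for _ in range(L):
        s = P[(s + P[n]) % 256]
        out.append(P[(P[P[s]] + 1) % 256])
        P[n], P[s] = P[s], P[n]
        n = (n + 1) % 256
    return out

K = bytes.fromhex('9661410AB797D8A9EB767C21172DF6C7')
V = bytes.fromhex('4B5C2F003E67F39557A8D26F3DA2B155')
ks = vmpc_keystream(*vmpc_ksa(K, V), 102400)
print([hex(b) for b in ks[:4]])   # Table 6 lists A8 24 79 F5 here
```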
Table 7. Performance of the VMPC Stream Cipher

MBytes/s   MBits/s   cycles/byte
210        1680      12.7

13 Analysis of the VMPC Stream Cipher

13.1 Theoretical Probability of Equal Consecutive Outputs
The probability of two consecutive outputs being equal appears to be an important parameter for a cipher based, as VMPC is, on an internal permutation (P). A sole permutation is obviously distinguishable from a truly random stream as its values never repeat. The construction of a cipher based on an internal permutation should corrupt the regular structure of the permutation in such a way as to force the outputs to repeat with a random-looking probability. This section explains theoretically why the probability of consecutive outputs generated by the VMPC Stream Cipher being equal is the same as we would expect from a random oracle, i.e. that Prob(out[x] = out[x+1]) = 2^-N, where N is the word-size of the Cipher in bits; the standard value of N is 8. To compute the probability, two scenarios need to be considered: (1) there is no swap in step 5 (Table 4) and (2) there is a swap in step 5. In (1): Prob(no-swap) = Prob(s[x] = n[x]) = 2^-N. As a result of (1), permutation P will have the same arrangement of elements in steps x and x+1. This implies a distinction into two sub-scenarios: (1a) where s[x] = s[x+1] and (1b) where s[x] ≠ s[x+1], which directly determines whether out[x] = out[x+1] or out[x] ≠ out[x+1].
Table 8. Performance of the VMPC KSA for 128-, 256- and 512-bit keys

keys/s    milliseconds/key
310 000   0.0032
In (1a): Prob(s[x] = s[x+1]) = 2^-N and (1aR): Prob(out[x] = out[x+1]) = 1.
In (1b): Prob(s[x] ≠ s[x+1]) = 1 − 2^-N and (1bR): Prob(out[x] = out[x+1]) = 0.
In (2): Prob(swap) = Prob(s[x] ≠ n[x]) = 1 − 2^-N and (2R): Prob(out[x] = out[x+1]) = 2^-N, regardless of the relation between s[x] and s[x+1], because P (and VMPC1(P)) in steps x and x+1 are different permutations. This probability was also confirmed experimentally.
By combining the probabilities in scenarios (1)(1a)(1aR), (1)(1b)(1bR) and (2)(2R) we get:

Prob(out[x] = out[x+1]) = 2^-N × 2^-N × 1 + 2^-N × (1 − 2^-N) × 0 + (1 − 2^-N) × 2^-N = 2^-N,

which ends the proof.

13.2 Recovering the Cipher's Internal State
A method analogous to the Forward Tracking Algorithm proposed by Mister and Tavares in [3] was applied to break the VMPC Stream Cipher. Following this approach, an average of over 2^900 computational operations is estimated to be required to recover the Cipher's internal state from its output.

13.3 Digraph and Trigraph Probabilities
Frequencies of occurrence of each of the possible 2^16 pairs of consecutive output values (out[x], out[x+1]) were measured in a stream of 2^40.1 output bytes. None of the measured frequencies showed a statistically significant deviation from its expected value of 1/65536.
Frequencies of occurrence of each of the possible 2^24 triplets of two consecutive output values and the n variable (out[x], out[x+1], n) were measured in a stream of 2^41.85 output bytes. None of the measured frequencies showed a statistically significant deviation from its expected value of 1/16777216.
Frequencies of occurrence of each of the possible 2^24 trigraphs of consecutive output values (out[x], out[x+1], out[x+2]) were measured in a stream of 2^41.6 output bytes. None of the measured frequencies showed a statistically significant deviation from its expected value of 1/16777216.

13.4 Single-Output Probabilities
Frequencies of occurrence of each of the possible 2^8 output values (out[x]) were measured in a stream of 2^41.85 output bytes. None of the measured frequencies showed a statistically significant deviation from its expected value of 1/256.
Table 9. VMPC Stream Cipher cycle lengths

M   Cycle lengths
4   200, 88, 40, 36, 12, 8
5   1 860, 640, 295, 110, 45, 25, 20, 5
6   15 510, 5 580, 2 508, 936, 516, 510, 252, 90, 12, 6
7   215 089, 23 821, 3 990, 2 485, 1 015, 392, 70, 56, 28, 14
8   2 401 728, 79 504, 53 512, 42 120, 2 136, 1 032, 288, 96, 24, 16 (2 different cycles of length 16 possible), 8
9   20 355 471, 2 908 098, 2 728 890, 1 359 855, 949 725, 609 174, 299 592, 125 091, 27 306, 13 068, 6 219, 5 067, 2 853, 2 538, 180, 90, 18 (3 different cycles of length 18 possible), 9
10  113 748 840, 99 425 590, 75 813 290, 37 178 940, 20 169 740, 9 955 030, 3 239 140, 2 349 150, 572 500, 363 830, 45 520, 8 730, 7 520, 700, 390, 370, 40 (17 different cycles of length 40 possible), 20, 10 (2 different cycles of length 10 possible)
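The M = 4 row of Table 9 can be reproduced exhaustively, since there are only 4! × 4^2 = 384 internal states (P, s, n). The sketch below is my own scaled-down reimplementation of the state-update of Table 4 (steps 3, 5 and 6, with all arithmetic modulo M as described in the Appendix); it partitions the state space into cycles:

```python
from itertools import permutations

def cycle_lengths(M):
    """Cycle lengths of the scaled-down VMPC state-update over all
    M! * M^2 states (P, s, n); the update is a bijection, so the
    state space decomposes into disjoint cycles."""
    def step(state):
        P, s, n = list(state[0]), state[1], state[2]
        s = P[(s + P[n]) % M]            # step 3
        P[n], P[s] = P[s], P[n]          # step 5 (swap)
        return (tuple(P), s, (n + 1) % M)  # step 6

    states = {(p, s, n) for p in permutations(range(M))
              for s in range(M) for n in range(M)}
    lengths = []
    while states:
        start = states.pop()
        cur, length = step(start), 1
        while cur != start:              # walk this cycle once
            states.discard(cur)
            cur = step(cur)
            length += 1
        lengths.append(length)
    return sorted(lengths, reverse=True)

print(cycle_lengths(4))   # Table 9 lists 200, 88, 40, 36, 12, 8 for M = 4
```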
Frequencies of occurrence of each of the possible 2^16 configurations of an output value and the n variable (out[x], n) were measured in a stream of 2^39.4 output bytes. None of the measured frequencies showed a statistically significant deviation from its expected value of 1/65536.

13.5 First-Outputs Probabilities
Frequencies of occurrence of each of the possible 2^8 values on each of the first 256 byte-positions of the keystream generated directly after running the KSA were measured in a sample of 2^40.3 bytes of the Cipher's output for 2^32.3 different keys. None of the measured frequencies showed a statistically significant deviation from its expected value of 1/256. [In [6] Mantin and Shamir show that the second output of RC4 takes on value 0 with probability 1/128 rather than 1/256.]

13.6 Short Cycles
Probability of Entering a Cycle Not Longer than X. Following Knuth [1], the probability of entering a cycle not longer than X for an n-element random permutation is X/n. To compare cycle lengths in the output of the VMPC Stream Cipher to cycle lengths in a random permutation, the Cipher was scaled down to use M-element permutations for M ∈ {4, 5, ..., 10}. The total number of M! × M^2 possible internal states of the Cipher is determined by all possible configurations of permutation P and variables s and n. The observed cycle lengths, listed in the Appendix, do not show an appreciable difference from a model of cycles in a random (M! × M^2)-element permutation. Probability of entering a cycle not longer than X by the VMPC Stream Cipher
is conjectured from this to be approximately X/(256! × 256^2) ≈ X/2^1700. An example estimate is that the probability that the Cipher's output will enter a cycle not longer than 2^850 is about 1/2^850.

Finney States. In [10] Finney defined a theoretical class of internal states of RC4 which produce a short cycle of length 65280 by swapping P[n] = 1 in each step (the KSA of RC4 prevents the cipher's internal state from entering this class). The class is diagnosed by n + 1 = s and P[n + 1] = 1. Such a phenomenon is possible because the step s = s + P[n] of the state-transformation function of RC4 retains the linear structure of P[n] in variable s (P[n], after the increment of n, is always equal to 1). The VMPC Stream Cipher uses an additional table-lookup (s = P[s + P[n]]), which, assuming that P was properly initialized, corrupts a possible linear structure of P[n] (or s) and prevents situations analogous to Finney states from occurring.
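The Finney condition is easy to verify as an invariant of the RC4 state-update: once it holds, every subsequent step re-establishes it. A sketch (plain RC4 PRGA state-update; the indices are called i and j here, corresponding to the paper's n and s, and the state is set up artificially):

```python
import random

def rc4_step(S, i, j):
    """One step of the RC4 PRGA state-update (the output byte is ignored here)."""
    i = (i + 1) % 256
    j = (j + S[i]) % 256
    S[i], S[j] = S[j], S[i]
    return i, j

# Build an arbitrary permutation, then force the Finney condition
# j = i + 1 and S[j] = 1 (which RC4's own KSA can never produce).
rng = random.Random(1994)
S = list(range(256))
rng.shuffle(S)
i = 17                                   # arbitrary starting position
j = (i + 1) % 256
k = S.index(1)
S[k], S[j] = S[j], S[k]                  # place the value 1 at position j

for _ in range(10000):
    i, j = rc4_step(S, i, j)
    assert j == (i + 1) % 256 and S[j] == 1   # condition persists forever
print("Finney condition preserved")
```

The invariant holds because after the step i becomes the old j, j advances by S[old j] = 1, and the swap carries the value 1 along to the new j, exactly the linear structure that VMPC's extra table-lookup destroys.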
13.7 Binary Derivatives of Bit Output Sequences
This family of tests was inspired by Golić's [8], where the author describes a statistical bias in the second binary derivative of the least significant bit output sequence of RC4. The author finds that the bias allows the attacker to distinguish RC4 output from a truly random source using about 64^N/225 outputs, where N is the cipher's word-size in bits (e.g. for N = 8 the required length is about 2^40)^1. Output generated by the VMPC Stream Cipher showed no bias in this family of tests. The following objectives were taken in testing VMPC here: the N = 7 word-size was chosen to make the tested algorithm as close to the real 8-bit one as possible while significantly decreasing the output-sequence length required to reveal the bias for RC4 (for 7-bit RC4, about 2^34.2 outputs according to the original estimates in [8]). First, second and third binary derivatives of all 7 bit-output sequences were tested (21 frequencies of (out_k[x] + out_k[x+A] = 1) were measured for k ∈ {0, 1, ..., 6}, A ∈ {1, 2, 3}, where out_k[x] denotes the k-th bit of the x-th output word). In a sequence of 2^44.8 (about 10^13.5) VMPC outputs tested according to this approach, none of the measured frequencies showed a statistically significant deviation from its expected value of 0.5.

13.8 Equal Neighboring Outputs Probabilities
Frequencies of occurrence of situations where there occurs a given number (0, 1, 2, 3, 4, 5 and over 5) of direct (generated consecutively) and indirect (separated by one output byte) equal neighboring outputs in the consecutive 256-byte

^1 Authors of [5] consider this estimate somewhat optimistic and suggest that the required keystream length for N = 8 is about 2^44.7.
sub-streams of the Cipher's output, and the average total number of direct and indirect equal neighboring outputs, showed no statistically significant deviation from their expected values in a sample of 2^43.1 bytes of the Cipher's output.

13.9 Statistical Tests on the Cipher's Output
Keystreams generated by the VMPC Stream Cipher were tested by two popular batteries of statistical tests: the DIEHARD battery [11] and the NIST statistical tests suite [12]. No bias was found by any of the 15 tests included in the DIEHARD battery or by any of the 16 tests from the NIST suite.
14 Analysis of the VMPC Key Scheduling Algorithm
The VMPC Key Scheduling Algorithm was tested for diffusion of changes of the cryptographic key onto the generated permutation and onto the Cipher's output. A change of one byte of a cryptographic key of size 128, 256 or 512 bits appears to cause a random-looking change in the generated permutation and in the Cipher's output. The KSA was designed to provide the diffusion without the use of the Initialization Vector, and the tests were run without the IV. The Initialization Vector would obviously mix the generated permutation further, which would improve the diffusion effect.

14.1 Given Numbers of Equal Permutation Elements Probabilities
Frequencies of occurrence of situations where in two permutations, generated from keys differing in one byte, there occurs a given number (0, 1, 2, 3, 4, 5) of equal elements in the corresponding positions, and the average number of equal elements in the corresponding positions, showed no statistically significant deviation from their expected values in samples of 2^33.2 pairs of 128-, 256- and 512-bit keys.

14.2 Given Numbers of Equal Cipher's Outputs Probabilities
Frequencies of occurrence of situations where in two 256-byte streams generated by the VMPC Stream Cipher directly after running the KSA for keys differing in one byte, there occurs a given number (0, 1, 2, 3, 4, 5) of equal values in the corresponding byte-positions, and the average number of equal values in the corresponding byte-positions, showed no statistically significant deviation from their expected values in samples of 2^33.2 pairs of 128-, 256- and 512-bit keys.
14.3 Equal Corresponding Permutation Elements Probabilities
Frequencies of occurrence of situations where the elements in the corresponding positions of permutations generated from keys differing in one byte are equal (for each of the 256 positions) showed no statistically significant deviation from their expected value in samples of 2^33.2 pairs of 128-, 256- and 512-bit keys.
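The expected values in these diffusion tests follow from a simple model: if the two permutations were independent and random, each of the 256 positions would match with probability 1/256, i.e. one equal element per pair of permutations on average. A toy-scale empirical sketch of the test, assuming the key pass of the KSA in Table 5 (the sample size here is tiny next to the paper's 2^33.2 pairs):

```python
import random

def vmpc_ksa(key):
    """VMPC KSA of Table 5, key pass only (no IV): returns permutation P."""
    P, s = list(range(256)), 0
    for m in range(768):
        n = m % 256
        s = P[(s + P[n] + key[m % len(key)]) % 256]
        P[n], P[s] = P[s], P[n]
    return P

rng = random.Random(2004)
pairs, matches = 400, 0
for _ in range(pairs):
    key = bytearray(rng.randrange(256) for _ in range(16))   # 128-bit key
    key2 = bytearray(key)
    key2[rng.randrange(16)] ^= 1 + rng.randrange(255)        # change one key byte
    P1, P2 = vmpc_ksa(key), vmpc_ksa(key2)
    matches += sum(a == b for a, b in zip(P1, P2))
print(matches / pairs)   # should hover around 1.0 if diffusion is random-looking
```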
15 Conclusions
A simple one-way function has been presented, together with a description of the most efficient method found for inverting it. An open problem is whether the simplicity of the function makes a hypothetical attempt to prove a lower bound on the complexity of inverting it worth undertaking. A stream cipher employing the function was proposed, together with some analysis of the cipher's cryptographic strength, of the statistical properties of its output, and of the statistical properties of its Key Scheduling Algorithm. The analyses performed so far have not revealed any weakness in the design and indicate that the cipher has a number of security advantages over the alleged RC4 keystream generator while retaining most of its speed and simplicity.
References

[1] Donald E. Knuth: The Art of Computer Programming, vol. 1: Fundamental Algorithms, Third Edition. Addison Wesley Longman, 1997.
[2] Donald E. Knuth: The Art of Computer Programming, vol. 2: Seminumerical Algorithms, Third Edition. Addison Wesley Longman, 1998.
[3] Serge Mister, Stafford E. Tavares: Cryptanalysis of RC4-like Ciphers. Proceedings of SAC 1998, LNCS, vol. 1556, Springer-Verlag, 1999.
[4] Lars R. Knudsen, Willi Meier, Bart Preneel, Vincent Rijmen, Sven Verdoolaege: Analysis Methods for (Alleged) RC4. Proceedings of ASIACRYPT 1998, LNCS, vol. 1514, Springer-Verlag, 1998.
[5] Scott R. Fluhrer, David A. McGrew: Statistical Analysis of the Alleged RC4 Keystream Generator. Proceedings of FSE 2000, LNCS, vol. 1978, Springer-Verlag, 2001.
[6] Itsik Mantin, Adi Shamir: A Practical Attack on Broadcast RC4. Proceedings of FSE 2001, LNCS, vol. 2355, Springer-Verlag, 2002.
[7] Scott Fluhrer, Itsik Mantin, Adi Shamir: Weaknesses in the Key Scheduling Algorithm of RC4. Proceedings of SAC 2001, LNCS, vol. 2259, Springer-Verlag, 2001.
[8] Jovan Dj. Golić: Linear Statistical Weakness of Alleged RC4 Keystream Generator. Proceedings of EUROCRYPT 1997, LNCS, vol. 1233, Springer-Verlag, 1997.
[9] Alexander L. Grosul, Dan S. Wallach: A Related-Key Cryptanalysis of RC4. Technical Report TR-00-358, Department of Computer Science, Rice University, 2000.
[10] Hal Finney: An RC4 Cycle That Can't Happen. Post in sci.crypt, 1994.
[11] George Marsaglia: DIEHARD battery of statistical tests with documentation. http://stat.fsu.edu/~geo/diehard.html.
[12] NIST statistical tests suite with documentation. http://csrc.nist.gov/rng.
Appendix: Cycle Lengths Observed in the Output of the VMPC Stream Cipher

The observed cycle lengths in the output of the scaled-down variants of the Cipher for M ∈ {4, 5, ..., 10} are listed in Table 9. M denotes the number of elements in the P permutation. All addition operations performed by the Cipher here are additions modulo M.
A New Stream Cipher HC-256

Hongjun Wu
Institute for Infocomm Research
21 Heng Mui Keng Terrace, Singapore 119613
[email protected]
Abstract. Stream cipher HC-256 is proposed in this paper. It generates keystream from a 256-bit secret key and a 256-bit initialization vector. HC-256 consists of two secret tables, each one with 1024 32-bit elements. The two tables are used alternately as S-boxes. At each step one element of a table is updated and one 32-bit output is generated. The encryption speed of the C implementation of HC-256 is about 1.9 bits per clock cycle (4.2 clock cycles per byte) on the Intel Pentium 4 processor.
1 Introduction
Stream ciphers are used for shared-key encryption. Modern software-efficient stream ciphers can run four to five times faster than block ciphers. However, very few efficient and secure stream ciphers have been published. Even the most widely used stream cipher, RC4 [25], has several weaknesses [14, 16, 22, 9, 10, 17, 21]. In the recent NESSIE project, none of the six stream cipher submissions met the stringent security requirements [23]. In this paper we aim to design a very simple, secure, software-efficient and freely available stream cipher. HC-256 is the stream cipher we propose in this paper. It consists of two secret tables, each one with 1024 32-bit elements. At each step we update one element of a table with a non-linear feedback function. Every 2048 steps all the elements of the two tables are updated. At each step, HC-256 generates one 32-bit output using a 32-bit-to-32-bit mapping similar to that used in Blowfish [28]. Then linear masking is applied before the output is generated. In the design of HC-256, we take into consideration the superscalar feature of modern (and future) microprocessors. Without compromising security, we try to reduce the dependency between operations. The dependency between the steps is reduced so that three consecutive steps can be computed in parallel. At each step, three parallel additions are used in the feedback function and three additions are used to combine the four table-lookup outputs instead of the addition-xor-addition used in Blowfish (a similar idea, using three xors to combine those four terms, has been suggested by Schneier and Whiting [29]). With this high degree of parallelism, HC-256 runs very efficiently on modern processors. We implemented HC-256 in C and tested its performance on the Pentium 4 processor. The encryption speed of HC-256 reaches 1.93 bits/cycle. This paper is organized as follows. We introduce HC-256 in Section 2. The security of HC-256 is analyzed in Section 3.
Section 4 discusses the implementation and performance of HC-256. Section 5 concludes this paper.

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 226-244, 2004.
© International Association for Cryptologic Research 2004
2 Stream Cipher HC-256
In this section, we describe the stream cipher HC-256. From a 256-bit key and a 256-bit initialization vector, it generates keystream with length up to 2^128 bits.

2.1 Operations, Variables and Functions
The following operations are used in HC-256:

+   : x + y means x + y mod 2^32, where 0 ≤ x < 2^32 and 0 ≤ y < 2^32
⊟   : x ⊟ y means x − y mod 1024
⊕   : bit-wise exclusive OR
||  : concatenation
>>  : right shift operator. x >> n means x being right-shifted n bits.
<<  : left shift operator. x << n means x being left-shifted n bits.
>>> : right rotation operator. x >>> n means ((x >> n) ⊕ (x << (32 − n))), where 0 ≤ n < 32, 0 ≤ x < 2^32.

Two tables P and Q are used in HC-256. The key and the initialization vector of HC-256 are denoted as K and IV. We denote the keystream being generated as s.

P  : a table with 1024 32-bit elements. Each element is denoted as P[i] with 0 ≤ i ≤ 1023.
Q  : a table with 1024 32-bit elements. Each element is denoted as Q[i] with 0 ≤ i ≤ 1023.
K  : the 256-bit key of HC-256.
IV : the 256-bit initialization vector of HC-256.
s  : the keystream being generated from HC-256. The 32-bit output of the i-th step is denoted as s_i. Then s = s_0 || s_1 || s_2 || ...
There are six functions used in HC-256. f1(x) and f2(x) are the same as the σ0^{256}(x) and σ1^{256}(x) used in the message schedule of SHA-256 [24]. For g1(x) and h1(x), the table Q is used as the S-box. For g2(x) and h2(x), the table P is used as the S-box.

f1(x) = (x >>> 7) ⊕ (x >>> 18) ⊕ (x >> 3)
f2(x) = (x >>> 17) ⊕ (x >>> 19) ⊕ (x >> 10)
g1(x, y) = ((x >>> 10) ⊕ (y >>> 23)) + Q[(x ⊕ y) mod 1024]
g2(x, y) = ((x >>> 10) ⊕ (y >>> 23)) + P[(x ⊕ y) mod 1024]
h1(x) = Q[x_0] + Q[256 + x_1] + Q[512 + x_2] + Q[768 + x_3]
h2(x) = P[x_0] + P[256 + x_1] + P[512 + x_2] + P[768 + x_3]

where x = x_3 || x_2 || x_1 || x_0, x is a 32-bit word, and x_0, x_1, x_2 and x_3 are four bytes. x_3 and x_0 denote the most significant byte and the least significant byte of x, respectively.
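As a sketch, the six functions map to Python as below. The 32-bit arithmetic is made explicit with a mask; passing the tables p and q as arguments is my choice, not something the paper specifies; and the rotation uses `|` instead of ⊕, which is equivalent here since the two shifted halves occupy disjoint bits:

```python
MASK = 0xFFFFFFFF

def rotr(x, n):          # x >>> n on 32-bit words
    return ((x >> n) | (x << (32 - n))) & MASK

def f1(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def f2(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def g1(x, y, q):         # table Q used as the S-box
    return ((rotr(x, 10) ^ rotr(y, 23)) + q[(x ^ y) % 1024]) & MASK

def g2(x, y, p):         # table P used as the S-box
    return ((rotr(x, 10) ^ rotr(y, 23)) + p[(x ^ y) % 1024]) & MASK

def h1(x, q):            # four byte-indexed lookups into Q, summed mod 2^32
    return (q[x & 0xFF] + q[256 + ((x >> 8) & 0xFF)]
            + q[512 + ((x >> 16) & 0xFF)] + q[768 + (x >> 24)]) & MASK

def h2(x, p):            # same shape, with P as the S-box
    return (p[x & 0xFF] + p[256 + ((x >> 8) & 0xFF)]
            + p[512 + ((x >> 16) & 0xFF)] + p[768 + (x >> 24)]) & MASK

print(hex(f1(1)), hex(f2(1)))   # 0x2004000 0xa000
```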
2.2 Initialization Process (Key and IV Setup)
The initialization process of HC-256 consists of expanding the key and initialization vector into P and Q (similar to the message setup in SHA-256) and running the cipher 4096 steps without generating output.

1. Let K = K_0 || K_1 || ... || K_7 and IV = IV_0 || IV_1 || ... || IV_7, where each K_i and IV_i denotes a 32-bit number. The key and IV are expanded into an array W_i (0 ≤ i ≤ 2559) as:

   W_i = K_i                                                     for 0 ≤ i ≤ 7
   W_i = IV_{i−8}                                                for 8 ≤ i ≤ 15
   W_i = f2(W_{i−2}) + W_{i−7} + f1(W_{i−15}) + W_{i−16} + i     for 16 ≤ i ≤ 2559

2. Update the tables P and Q with the array W:

   P[i] = W_{i+512}    for 0 ≤ i ≤ 1023
   Q[i] = W_{i+1536}   for 0 ≤ i ≤ 1023
3. Run the cipher (the keystream generation algorithm in Subsection 2.3) 4096 steps without generating output.

The initialization process completes and the cipher is ready to generate keystream.

2.3 The Keystream Generation Algorithm
At each step, one element of a table is updated and one 32-bit output is generated. An S-box is used to generate only 1024 outputs, then it is updated in the next 1024 steps. The keystream generation process of HC-256 is given below ("⊟" denotes "−" modulo 1024, s_i denotes the output of the i-th step).

i = 0;
repeat until enough keystream bits are generated
{
    j = i mod 1024;
    if (i mod 2048) < 1024
    {
        P[j] = P[j] + P[j ⊟ 10] + g1(P[j ⊟ 3], P[j ⊟ 1023]);
        s_i = h1(P[j ⊟ 12]) ⊕ P[j];
    }
    else
    {
        Q[j] = Q[j] + Q[j ⊟ 10] + g2(Q[j ⊟ 3], Q[j ⊟ 1023]);
        s_i = h2(Q[j ⊟ 12]) ⊕ Q[j];
    }
    i = i + 1;
}
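Combining the expansion of Subsection 2.2 with the loop above gives a compact, unoptimized Python sketch of the whole cipher. It follows the pseudocode directly, but it has not been checked against official test vectors, so treat it as illustrative rather than as a verified reference implementation:

```python
MASK = 0xFFFFFFFF

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def f1(x): return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)
def f2(x): return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def hc256_keystream(key_words, iv_words, nwords):
    """key_words, iv_words: 8 32-bit words each; returns nwords 32-bit outputs."""
    # Step 1: expand key and IV into W[0..2559]
    w = list(key_words) + list(iv_words)
    for i in range(16, 2560):
        w.append((f2(w[i-2]) + w[i-7] + f1(w[i-15]) + w[i-16] + i) & MASK)
    p, q = w[512:1536], w[1536:2560]          # step 2: fill P and Q

    def h(tbl, x):                            # h1/h2 with tbl = Q or P
        return (tbl[x & 0xFF] + tbl[256 + ((x >> 8) & 0xFF)]
                + tbl[512 + ((x >> 16) & 0xFF)] + tbl[768 + (x >> 24)]) & MASK

    def step(i):                              # one step of the loop above
        j = i % 1024
        a, b = (p, q) if (i % 2048) < 1024 else (q, p)
        x, y = a[(j - 3) % 1024], a[(j - 1023) % 1024]
        g = ((rotr(x, 10) ^ rotr(y, 23)) + b[(x ^ y) % 1024]) & MASK
        a[j] = (a[j] + a[(j - 10) % 1024] + g) & MASK
        return h(b, a[(j - 12) % 1024]) ^ a[j]

    for i in range(4096):                     # step 3: run 4096 steps, discard
        step(i)
    # 4096 ≡ 0 (mod 2048), so continuing the counter equals restarting i at 0
    return [step(4096 + i) for i in range(nwords)]

print([hex(x) for x in hc256_keystream([0]*8, [0]*8, 4)])
```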
2.4 Encryption and Decryption
The keystream is XORed with the message for encryption. The decryption is to XOR the keystream with the ciphertext.
3 Security Analysis of HC-256
We start with a brief review of the attacks on stream ciphers. Many stream ciphers are based on linear feedback shift registers (LFSRs), and a number of correlation attacks, such as [30, 31, 19, 11, 20, 4, 15], were developed to analyze them. Later Golić [12] devised the linear cryptanalysis of stream ciphers. That technique can be applied to a wide range of stream ciphers. Recently Coppersmith, Halevi and Jutla [6] developed distinguishing attacks (the linear attack and the low diffusion attack) on stream ciphers with linear masking. The correlation attacks cannot be applied to HC-256 because HC-256 uses non-linear feedback functions to update the two tables P and Q. The output function of HC-256 uses a 32-bit-to-32-bit mapping similar to that used in Blowfish. The analysis of Blowfish shows that it is extremely difficult to apply linear cryptanalysis [18] to a large secret S-box. The large secret S-boxes of HC-256 are updated during the keystream generation process, and it is almost impossible to develop linear relations linking the input and output bits of an S-box. Vaudenay has found some differential weakness of randomly generated large S-boxes [32]. But it is very difficult to launch differential cryptanalysis [2] against HC-256 since it is a synchronous stream cipher for which the keystream generation is independent of the message. In this section, we will analyze the security of the secret key, the randomness of the keystream, and the security of the initialization process.

3.1 Period
The 65547-bit state of HC-256 ensures that the period of the keystream is extremely large. But the exact period of HC-256 is difficult to predict. The average period of the keystream is estimated to be about 2^65546 (if we assume that the invertible next-state function of HC-256 is random). The large number of states also completely eliminates the threat of time-memory tradeoff attacks on stream ciphers [1, 13].

3.2 The Security of the Key
We begin with the study of a modified version of HC-256 (with no linear masking). Our analysis shows that even for this weak version of HC-256, it is impossible to recover the secret key faster than exhaustive key search. The reason is that the keystream is generated from a highly non-linear function (h1(x) or h2(x)), so the keystream leaks a very small amount of information at each step. Recovering P and Q requires the partial information leaked from a lot of steps. Because the tables are updated in a highly non-linear way, it is difficult to retrieve the information of P and Q from that leaked information.
HC-256 With No Linear Masking. For HC-256 with no linear masking, the output at the i-th step is generated as s_i = h1(P[i ⊟ 12]) or s_i = h2(Q[i ⊟ 12]). If two outputs generated from the same S-box are equal, then very likely those two inputs to the S-box are equal. According to the analysis of the randomness of the outputs of h1(x) and h2(x) given in Subsection 3.3, s_{2048×α+i} = s_{2048×α+j} (0 ≤ i < j < 1024) with probability about 2^-31. If s_{2048×α+i} = s_{2048×α+j}, then at the (2048×α+j)-th step, P[i ⊟ 12] = P[j ⊟ 12] with probability about 0.5 (31 bits of information about the table P are leaked). We note that for every 1024 steps in the range (2048×α, 2048×α+1024), the same S-box is used in h1(x). The probability that there are two equal outputs is (1024 choose 2) × 2^-31 ≈ 2^-12. On average each output leaks (2^-12 × 31)/1024 ≈ 2^-17 bits of information about the table P. To recover P, we need to analyze at least (1024 × 32)/2^-17 ≈ 2^32 outputs. Recovering P from those 2^32 outputs involves very complicated non-linear equations and solving them is computationally infeasible. Recovering Q is as difficult as recovering P. We note that the table Q is used as the S-box to update P, and vice versa. P and Q interact in such a complicated way that recovering them from the keystream cannot be faster than exhaustive key search.

HC-256. The analysis above shows that the secret key of HC-256 with no linear masking is secure. With the linear masking, the information leakage is greatly reduced and it would be even more difficult to recover the secret key from the keystream. We thus conclude that the key of HC-256 cannot be recovered faster than exhaustive key search.
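The chain of estimates above is plain logarithm arithmetic and can be recomputed directly (the figures 2^-12, 2^-17 and 2^32 are the paper's, rederived here):

```python
from math import comb, log2

# Pr[two equal outputs among the 1024 outputs of one S-box epoch],
# with per-pair collision probability 2^-31 (Theorem 1):
p_epoch = comb(1024, 2) * 2**-31
print(round(log2(p_epoch), 2))          # close to -12

# A collision leaks about 31 bits about P; averaged over the 1024 outputs:
leak_per_output = p_epoch * 31 / 1024
print(round(log2(leak_per_output), 2))  # close to -17

# Outputs needed to collect the 1024 x 32 = 32768 bits describing P:
outputs_needed = 1024 * 32 / leak_per_output
print(round(log2(outputs_needed), 2))   # close to 32
```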
3.3 Randomness of the Keystream
In this subsection, we investigate the randomness of the keystream of HC-256. Because large, secret and frequently updated S-boxes are used in the cipher, the efficient attack is to analyze the randomness of the overall 32-bit words. Under this guideline, we developed some attacks against HC-256 with no linear masking. Then we show that the linear masking eliminates those threats.

3.3.1 Keystream of HC-256 with No Linear Masking
The attacks on HC-256 with no linear masking investigate the security weaknesses in the output and feedback functions. We developed two attacks against HC-256 with no linear masking.

Weakness of h1(x) and h2(x). For HC-256 with no linear masking, the output is generated as s_i = h1(P[i ⊟ 12]) or s_i = h2(Q[i ⊟ 12]). Because there is no difference between the analysis of h1(x) and h2(x), we use h(x) to refer to h1(x) and h2(x) here. Assume that h(x) is a 32-bit-to-32-bit S-box H(x) with randomly generated secret elements and that the inputs to H are randomly generated. Because the elements of H(x) are randomly generated, the output of H(x)
is not uniformly distributed. If a lot of outputs are generated from H(x), some values in the range [0, 2^32) never appear and some appear with probability larger than 2^-32. Then it is straightforward to distinguish the outputs from a random signal. However, each H(x) in HC-256 is used to generate only 1024 outputs before it gets updated. The direct computation of the distribution of the outputs of H(x) from those 1024 outputs cannot be successful. Instead, we consider collisions between the outputs of H(x). The following theorem gives the collision rate of the outputs of H(x).

Theorem 1. Let H be an m-bit-to-n-bit S-box whose n-bit elements are all randomly generated, where m ≥ n and n is a large integer. Let x1 and x2 be two m-bit random inputs to H. Then H(x1) = H(x2) with probability about 2^-n + 2^-m.

Proof. If x1 = x2, then H(x1) = H(x2). If x1 ≠ x2, then H(x1) = H(x2) with probability 2^-n. x1 = x2 with probability 2^-m and x1 ≠ x2 with probability 1 − 2^-m. The probability that H(x1) = H(x2) is thus 2^-m + (1 − 2^-m) × 2^-n ≈ 2^-n + 2^-m.

Attack 1. According to Theorem 1, for the 32-bit-to-32-bit S-box H, the collision rate of the outputs is 2^-32 + 2^-32 = 2^-31. With 2^35 pairs of (H(x1), H(x2)), we can distinguish the output from a random signal with success rate 0.761. (The success rate can be improved to 0.996 with 2^36 pairs.) Note that only 1024 outputs are generated from the same S-box H, so 2^26 outputs are needed to distinguish the keystream of HC-256 with no linear masking.

Experiment. To compute the collision rate of the outputs of HC-256 (with no linear masking), we generated 2^39 outputs (2^48 pairs). The collision rate is 2^-31 − 2^-40.09. The experiment confirms that the collision rate of the outputs of h(x) is very close to 2^-31, and that approximating h(x) with a randomly generated S-box has a negligible effect on the attack.

Remark 1. The distinguishing attack above can be slightly improved if we consider the differential attack on Blowfish.
Vaudenay [32] has pointed out that a collision in a randomly generated S-box in Blowfish can be applied to distinguish the outputs of Blowfish with a reduced round number (8 rounds). The basic idea of Vaudenay's differential attack is that if Q[i] = Q[j] for 0 ≤ i, j < 256, i ≠ j, then for a_0 ⊕ a_0' = i ⊕ j, h1(a_3 || a_2 || a_1 || a_0) = h1(a_3 || a_2 || a_1 || a_0') with probability 2^-7, where each a_i denotes an 8-bit number. We can detect the collision in the S-box with success rate 0.5 since that S-box Q is used as inputs to h2(x) to produce 1024 outputs. If Q[i] = Q[j] for 256α ≤ i, j < 256α + 256, 0 ≤ α < 4, i ≠ j, and x1 and x2 are two random inputs (note that we cannot introduce or identify inputs with a particular difference to h(x)), then the probability that h1(x1) = h1(x2) becomes 2^-31 + 2^-32. However, the chance that there is one useful collision in the S-box is only (256 choose 2) × 4 / 2^32 ≈ 2^-15. The average collision rate becomes 2^-15 × (2^-31 + 2^-32) + (1 − 2^-15) × 2^-31 ≈ 2^-31 + 2^-47. The increase in the collision rate is so small that the collision in the S-box has a negligible effect on this attack.
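Theorem 1 is easy to check numerically at reduced size. The sketch below builds a random 10-bit-to-8-bit S-box and computes the exact collision probability of two uniform independent inputs from the value counts; by Theorem 1 it should be close to 2^-8 + 2^-10 (all names here are illustrative, not the paper's):

```python
import random
from collections import Counter

def exact_collision_rate(m, n, rng):
    """Pr[H(x1) = H(x2)] for uniform independent x1, x2 and a random
    m-bit-to-n-bit S-box H, computed exactly as sum_v (count_v / 2^m)^2."""
    M = 2 ** m
    sbox = [rng.randrange(2 ** n) for _ in range(M)]
    counts = Counter(sbox)
    return sum((c / M) ** 2 for c in counts.values())

rng = random.Random(2004)
rate = exact_collision_rate(10, 8, rng)   # 1024-entry S-box over 8-bit values
predicted = 2**-8 + 2**-10                # Theorem 1: 2^-n + 2^-m
print(rate, predicted)
```

For one random S-box the rate fluctuates slightly around the prediction, which is exactly the small deviation the Experiment paragraph reports at full 32-bit size.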
Weakness of the Feedback Function. The table P is updated with the non-linear feedback function P[i mod 1024] = P[i mod 1024] + P[i ⊟ 10] + g1(P[i ⊟ 3], P[i ⊟ 1023]). The following attack distinguishes the keystream by exploiting this relation.

Attack 2. Assume that h(x) is a one-to-one mapping function. Consider two groups of outputs (s_i, s_{i−3}, s_{i−10}, s_{i−2047}, s_{i−2048}) and (s_j, s_{j−3}, s_{j−10}, s_{j−2047}, s_{j−2048}). If i ≠ j and 1024×α + 10 ≤ i, j < 1024×α + 1023, they are equal with probability about 2^-128. The collision rate would be 2^-160 if the outputs were truly random. 2^-128 is much larger than 2^-160, so the keystream can be distinguished from a random signal with about 2^128 pairs of such five-tuple groups of outputs. Note that the S-box is updated every 1024 steps, so 2^119 outputs are needed in the attack.

The two attacks given above show that HC-256 with no linear masking does not generate secure keystream.

3.3.2 Keystream of HC-256
With the linear masking being applied, it is no longer possible to exploit those two weaknesses separately and the attacks given above cannot be applied directly. We need to remove the linear masking first. We recall that at the ith step, if (i mod 2048) < 1024, the table P is updated as P [i mod 1024] = P [i mod 1024] + P [i 10] + g1 ( P [i 3], P [i 1023] ) We know that si = h1 (P [i12])⊕P [i mod 1024]. For 10 ≤ (i mod 2048) < 1023, this feedback function can be written alternatively as si ⊕ h1 (zi ) = (si−2048 ⊕ h1 (zi−2048 )) + (si−10 ⊕ h1 (zi−10 )) + g1 ( si−3 ⊕ h1 (zi−3 ), si−2047 ⊕ h1 (zi−2047 ) )
(1)
where h1(x) and h'1(x) indicate two different functions, since they are related to different S-boxes; zj denotes P[j ⊟ 12] at the j-th step. The linear masking is removed successfully in (1). However, it is very difficult to apply (1) directly to distinguish the keystream. To simplify the analysis, we attack a weak version of (1): we replace all the '+' operations in the feedback function with '⊕' and write (1) as

si ⊕ si−2048 ⊕ si−10 ⊕ (si−3 >>> 10) ⊕ (si−2047 >>> 23) = h1(zi) ⊕ h'1(zi−2048) ⊕ h1(zi−10) ⊕ (h1(zi−3) >>> 10) ⊕ (h'1(zi−2047) >>> 23) ⊕ Q[ri],
(2)
where ri = (si−3 ⊕ h1(zi−3) ⊕ si−2047 ⊕ h'1(zi−2047)) mod 1024. Because of the random nature of h1(x) and Q, the right-hand side of (2) is not uniformly distributed. But since each S-box is used in only 1024 steps, these 1024 outputs are not sufficient to compute the distribution of si ⊕ si−2048 ⊕ si−10 ⊕ (si−3 >>>
A New Stream Cipher HC-256
10) ⊕ (si−2047 >>> 23). Instead we need to study the collision rate. An effective way is to eliminate the term h1(zi) before analyzing the collision rate. Replace i with i + 10. For 10 ≤ i mod 2048 < 1013, (2) can be written as

si+10 ⊕ si−2038 ⊕ si ⊕ (si+7 >>> 10) ⊕ (si−2037 >>> 23) = h1(zi+10) ⊕ h'1(zi−2038) ⊕ h1(zi) ⊕ (h1(zi+7) >>> 10) ⊕ (h'1(zi−2037) >>> 23) ⊕ Q[ri+10]
(3)
For the left-hand sides of (2) and (3) to be equal, i.e., for the following equation si ⊕ si−2048 ⊕ si−10 ⊕ (si−3 >>> 10) ⊕ (si−2047 >>> 23) = si+10 ⊕ si−2038 ⊕ si ⊕ (si+7 >>> 10) ⊕ (si−2037 >>> 23)
(4)
to hold, we require (after eliminating the term h1(zi)) that

h1(zi−10) ⊕ h'1(zi−2048) ⊕ (h1(zi−3) >>> 10) ⊕ (h'1(zi−2047) >>> 23) ⊕ Q[ri] = h1(zi+10) ⊕ h'1(zi−2038) ⊕ (h1(zi+7) >>> 10) ⊕ (h'1(zi−2037) >>> 23) ⊕ Q[ri+10]
(5)
For 22 ≤ i mod 2048 < 1013, we note that zi−10 = zi ⊕ zi−2048 ⊕ (zi−3 >>> 10) ⊕ (zi−2047 >>> 23), and zi+10 = zi ⊕ zi−2038 ⊕ (zi+7 >>> 10) ⊕ (zi−2037 >>> 23). Approximate (5) as

H(x1) = H(x2)
(6)
where H denotes a random secret 106-bit-to-32-bit S-box, and x1 and x2 are two 106-bit random inputs, x1 = zi−3||zi−2047||zi−2048||ri and x2 = zi+7||zi−2037||zi−2038||ri+10. (The effect of zi is included in H.) According to Theorem 1, (6) holds with probability 2^−32 + 2^−106. So (4) holds with probability 2^−32 + 2^−106.

We approximate the binomial distribution with the normal distribution. For the random signal, p = 2^−32, the mean µ = Np and the standard deviation σ = √(Np(1 − p)), where N is the total number of equations (4). For HC-256, p' = 2^−32 + 2^−106, the mean µ' = Np' and the standard deviation σ' = √(Np'(1 − p')). If |µ − µ'| > 2(σ + σ'), i.e., N > 2^184, the output of the cipher can be distinguished from a random signal with success rate 0.9772. After verifying the validity of 2^184 equations (4), we can successfully distinguish the keystream from a random signal. We note that the S-box is updated every 1024 steps, so only about 2^10 equations (4) can be obtained from the 1024 steps in the range 1024 × α ≤ i < 1024 × α + 1024. To distinguish the keystream from a random signal, 2^184 outputs are needed in the attack.

The attack above can be improved by exploiting the relation ri = (si−3 ⊕ h1(zi−3) ⊕ si−2047 ⊕ h'1(zi−2047)) mod 1024. If (si−3 ⊕ si−2047) mod 1024 = (si+7 ⊕ si−2037) mod 1024, then (6) holds with probability 2^−32 + 2^−96, and 2^164 equations (4) are needed in the attack. Note that only about one equation (4) can now be obtained from the 1024 steps in the range 1024 × α ≤ i < 1024 × α + 1024.
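The threshold N > 2^184 follows from the stated condition; a brief derivation under the same normal approximation (with p ≈ 2^−32 and p' − p = 2^−106):

```latex
|\mu' - \mu| = N(p'-p) = N\cdot 2^{-106}, \qquad
\sigma + \sigma' \approx 2\sqrt{Np} = 2^{\,1-16}\sqrt{N}.
% Requiring |\mu' - \mu| > 2(\sigma + \sigma'):
N\cdot 2^{-106} > 2^{\,2-16}\sqrt{N}
\;\Longleftrightarrow\; \sqrt{N} > 2^{92}
\;\Longleftrightarrow\; N > 2^{184}.
```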
To distinguish the keystream from a random signal, 2^174 outputs are needed in the attack. We note that the attack above applies only to a version of HC-256 in which all the '+' operations in the feedback function are replaced with '⊕'. To distinguish the keystream of the actual HC-256, more than 2^174 outputs are needed. So we conclude that it is impossible to distinguish a 2^128-bit keystream of HC-256 from a random signal.
3.4  Security of the Initialization Process (Key/IV Setup)
The initialization process of HC-256 consists of two stages, as given in Subsection 2.2. First, we expand the key and IV into P and Q. At this stage, every bit of the key/IV affects all the bits of the two tables, and any difference in related keys/IVs results in uncontrollable differences in P and Q. Then we run the cipher for 4096 steps without generating output so that P and Q become more random. After the initialization process, we expect that any difference in the keys/IVs will not result in a biased keystream.
4  Implementation and Performance of HC-256
The direct C implementation of the encryption algorithm given in Subsection 2.3 runs at about 0.6 bit/cycle on the Pentium 4 processor. The program size is very small, but the speed is only about 1.5 times that of AES [7]. At each step in the direct implementation, we need to compute (i mod 2048), i ⊟ 3, i ⊟ 10 and i ⊟ 1023, and at each step there is a branch decision based on the value of (i mod 2048). These operations greatly affect the encryption speed. The optimization process reduces the number of these operations.
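Since 1024 and 2048 are powers of two, the modular reductions and the ⊟ subtractions can be computed with bitwise masks rather than divisions; this is the trick the optimized code relies on. A small self-contained illustration (our own, not part of the cipher code):

```c
#include <stdint.h>

/* Verify that masking reproduces the mod-2^k reductions and the
   mod-1024 subtraction used by HC-256; returns 0 on success. */
static int check_masks(void) {
    for (uint32_t i = 0; i < 100000; i++) {
        if ((i % 2048) != (i & 0x7ff)) return -1;   /* i mod 2048 */
        if ((i % 1024) != (i & 0x3ff)) return -1;   /* i mod 1024 */
        /* i - 10 modulo 1024: unsigned wraparound, then mask */
        if (((i + 1024 - 10) % 1024) != ((i - 10) & 0x3ff)) return -1;
    }
    return 0;
}
```

The masked form compiles to a single AND, whereas a general modulus would require a division or multiplication sequence on this class of processors.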
4.1  The Optimized Implementation of HC-256
This subsection describes the optimized C implementation of HC-256 given in Appendix B ("hc256.h"). In the optimized code, loop unrolling is used and only one branch decision is made for every 16 steps. Experiments show that the branch decision in the optimized code affects the encryption speed by less than one percent. There are several fast implementations of the feedback functions of P and Q. We use the implementation given in Appendix B because it achieves the best consistency across different platforms. The details of that implementation are given below. The feedback function of P is given as

P[i mod 1024] = P[i mod 1024] + P[i ⊟ 10] + g1(P[i ⊟ 3], P[i ⊟ 1023])

A register X containing 16 elements is introduced for P. If (i mod 2048) < 1024 and i mod 16 = 0, then at the beginning of the i-th step, X[j] = P[(i − 16 + j) mod 1024] for j = 0, 1, ..., 15, i.e., X contains the values of P[i ⊟ 16], P[i ⊟
15], ..., P[i ⊟ 1]. In the 16 steps starting from the i-th step, P and X are updated as

P[i]    = P[i]    + X[6] + g1(X[13], P[i+1]);   X[0]  = P[i];
P[i+1]  = P[i+1]  + X[7] + g1(X[14], P[i+2]);   X[1]  = P[i+1];
P[i+2]  = P[i+2]  + X[8] + g1(X[15], P[i+3]);   X[2]  = P[i+2];
P[i+3]  = P[i+3]  + X[9] + g1(X[0],  P[i+4]);   X[3]  = P[i+3];
...
P[i+14] = P[i+14] + X[4] + g1(X[11], P[i+15]);             X[14] = P[i+14];
P[i+15] = P[i+15] + X[5] + g1(X[12], P[(i+16) mod 1024]);  X[15] = P[i+15];

Note that at the i-th step, the two elements P[i ⊟ 10] and P[i ⊟ 3] can be obtained directly from X. Also, for the output function si = h1(P[i ⊟ 12]) ⊕ P[i mod 1024], the element P[i ⊟ 12] can be obtained from X. In this implementation, there is no need to compute i ⊟ 3, i ⊟ 10 and i ⊟ 12. A register Y with 16 elements is used in the implementation of the feedback function of Q in the same way as described above.

To reduce the memory requirement and the program size, the initialization process implemented in Appendix B is not as straightforward as the description in Subsection 2.2. To reduce the memory requirement, we do not implement the array W in the program; instead we carry out the key and IV expansion on P and Q directly. To reduce the program size, we implement the feedback functions of those 4096 steps without involving X and Y.
4.2  Performance of HC-256
Encryption Speed. We use the C code given in Appendices B and C to measure the encryption speed. The processor used in the test is a Pentium 4 (2.4 GHz, 8 KB Level 1 data cache, 512 KB Level 2 cache, no hyper-threading). The speed is measured by repeatedly encrypting the same 512-bit buffer 2^26 times (the buffer is defined as 'static uint32 DATA[16]' in Appendix C). The encryption speed is given in Table 1. The C implementation of HC-256 is faster than the C implementations of almost all other stream ciphers. (However, different designers may have made different efforts to optimize their code, and the encryption speed may be measured in different ways, so the speed comparison is not absolutely accurate.) SEAL [26] is a software-efficient cipher and its C implementation runs at the
Table 1. The speed of the C implementation of HC-256 on Pentium 4

  Operating System                   Compiler                                      Optimization option   Speed (bit/cycle)
  Windows XP (SP1)                   Intel C++ Compiler 7.1                        -O3                   1.93
  Windows XP (SP1)                   Microsoft Visual C++ 6.0 Professional (SP5)   -Release              1.81
  Red Hat Linux 9 (Linux 2.4.20-8)   Intel C++ Compiler 7.1                        -O3                   1.92
  Red Hat Linux 9 (Linux 2.4.20-8)   gcc 3.2.2                                     -O3                   1.83
speed of about 1.6 bit/cycle on the Pentium III processor. Scream [5] runs at about the same speed as SEAL. The C implementation of SNOW 2.0 [8] runs at about 1.67 bit/cycle on the Pentium 4 processor. TURING [27] runs at about 1.3 bit/cycle on the Pentium III mobile processor. The C implementation of MUGI [33] runs at about 0.45 bit/cycle on the Pentium III processor. The encryption speed of Rabbit [3] is about 2.16 bit/cycle on the Pentium III processor, but it is programmed in assembly language inlined in C.

Remark 2. In HC-256, there is a dependency between the feedback and output functions, since the P[i mod 1024] (or Q[i mod 1024]) being updated at the i-th step is immediately used as linear masking. This dependency reduces the speed of HC-256 by about 3%. We do not remove this dependency from the design of HC-256 for security reasons: our analysis shows that each term used as linear masking should not have been used in an S-box in the previous steps, otherwise the linear masking could be removed much more easily. In our optimized implementation, we do not work around this dependency because its effect on the encryption speed is very limited on the Pentium 4 processor.

Initialization Process. The key setup of HC-256 requires about 74,000 clock cycles (measured by repeating the setup process 2^16 times on the Pentium 4 processor with the Intel C++ compiler 7.1). This amount of computation is more than that required by most other stream ciphers (for example, the initialization process of Scream takes 27,500 clock cycles). The reason is that two large S-boxes are used in HC-256: to eliminate the threat of related-key/IV attacks, the tables must be updated with the key and IV thoroughly, and this process requires a lot of computation. So it is undesirable to use HC-256 in applications where the key (or IV) is updated frequently.
5  Conclusion
In this paper, we proposed a software-efficient stream cipher HC-256. Our analysis shows that HC-256 is very secure. However, the extensive security analysis of any new cipher requires a lot of effort from many researchers. We thus invite and encourage the readers to analyze the security of HC-256.
Finally we explicitly state that HC-256 is available royalty-free and HC-256 is not covered by any patent in the world.
References

[1] S. Babbage, "A Space/Time Tradeoff in Exhaustive Search Attacks on Stream Ciphers", European Convention on Security and Detection, IEE Conference Publication No. 408, May 1995.
[2] E. Biham and A. Shamir, "Differential Cryptanalysis of DES-like Cryptosystems", in Advances in Cryptology – Crypto'90, LNCS 537, pp. 2-21, Springer-Verlag, 1991.
[3] M. Boesgaard, M. Vesterager, T. Pedersen, J. Christiansen, and O. Scavenius, "Rabbit: A New High-Performance Stream Cipher", in Fast Software Encryption (FSE'03), LNCS 2887, pp. 307-329, Springer-Verlag, 2003.
[4] V. V. Chepyzhov, T. Johansson, and B. Smeets, "A Simple Algorithm for Fast Correlation Attacks on Stream Ciphers", in Fast Software Encryption (FSE'00), LNCS 1978, pp. 181-195, Springer-Verlag, 2000.
[5] D. Coppersmith, S. Halevi, and C. Jutla, "Scream: A Software-Efficient Stream Cipher", in Fast Software Encryption (FSE'02), LNCS 2365, pp. 195-209, Springer-Verlag, 2002.
[6] D. Coppersmith, S. Halevi, and C. Jutla, "Cryptanalysis of Stream Ciphers with Linear Masking", in Advances in Cryptology – Crypto 2002, LNCS 2442, pp. 515-532, Springer-Verlag, 2002.
[7] J. Daemen and V. Rijmen, "AES Proposal: Rijndael", available online from NIST at http://csrc.nist.gov/encryption/aes/rijndael/.
[8] P. Ekdahl and T. Johansson, "A New Version of the Stream Cipher SNOW", in Selected Areas in Cryptography (SAC 2002), LNCS 2595, pp. 47-61, Springer-Verlag, 2002.
[9] S. Fluhrer and D. McGrew, "Statistical Analysis of the Alleged RC4 Keystream Generator", in Fast Software Encryption (FSE'00), LNCS 1978, pp. 19-30, Springer-Verlag, 2001.
[10] S. Fluhrer, I. Mantin, and A. Shamir, "Weaknesses in the Key Scheduling Algorithm of RC4", in Selected Areas in Cryptography (SAC 2001), LNCS 2259, pp. 1-24, Springer-Verlag, 2001.
[11] J. D. Golić, "Towards Fast Correlation Attacks on Irregularly Clocked Shift Registers", in Advances in Cryptology – Eurocrypt'95, pp. 248-262, Springer-Verlag, 1995.
[12] J. D. Golić, "Linear Models for Keystream Generators", IEEE Transactions on Computers, 45(1):41-49, Jan. 1996.
[13] J. D. Golić, "Cryptanalysis of Alleged A5 Stream Cipher", in Advances in Cryptology – Eurocrypt'97, LNCS 1233, pp. 239-255, Springer-Verlag, 1997.
[14] J. D. Golić, "Linear Statistical Weakness of Alleged RC4 Keystream Generator", in Advances in Cryptology – Eurocrypt'97, pp. 226-238, Springer-Verlag, 1997.
[15] T. Johansson and F. Jönsson, "Fast Correlation Attacks through Reconstruction of Linear Polynomials", in Advances in Cryptology – Crypto 2000, LNCS 1880, pp. 300-315, Springer-Verlag, 2000.
[16] L. Knudsen, W. Meier, B. Preneel, V. Rijmen, and S. Verdoolaege, "Analysis Methods for (Alleged) RC4", in Advances in Cryptology – Asiacrypt'98, LNCS 1514, pp. 327-341, Springer-Verlag, 1998.
[17] I. Mantin and A. Shamir, "A Practical Attack on Broadcast RC4", in Fast Software Encryption (FSE'01), LNCS 2355, pp. 152-164, Springer-Verlag, 2002.
[18] M. Matsui, "Linear Cryptanalysis Method for DES Cipher", in Advances in Cryptology – Eurocrypt'93, LNCS 765, pp. 386-397, Springer-Verlag, 1994.
[19] W. Meier and O. Staffelbach, "Fast Correlation Attacks on Certain Stream Ciphers", Journal of Cryptology, 1(3):159-176, 1989.
[20] M. Mihaljević, M. P. C. Fossorier, and H. Imai, "A Low-Complexity and High-Performance Algorithm for Fast Correlation Attack", in Fast Software Encryption (FSE'00), pp. 196-212, Springer-Verlag, 2000.
[21] I. Mironov, "(Not So) Random Shuffles of RC4", in Advances in Cryptology – Crypto 2002, LNCS 2442, pp. 304-319, Springer-Verlag, 2002.
[22] S. Mister and S. E. Tavares, "Cryptanalysis of RC4-like Ciphers", in Selected Areas in Cryptography (SAC'98), LNCS 1556, pp. 131-143, Springer-Verlag, 1999.
[23] NESSIE, "NESSIE Project Announces Final Selection of Crypto Algorithms", available at https://www.cosic.esat.kuleuven.ac.be/nessie/deliverables/press_release_feb27.pdf.
[24] National Institute of Standards and Technology, "Secure Hash Standard (SHS)", available at http://csrc.nist.gov/cryptval/shs.html.
[25] R. L. Rivest, "The RC4 Encryption Algorithm", RSA Data Security, Inc., March 12, 1992.
[26] P. Rogaway and D. Coppersmith, "A Software-Optimized Encryption Algorithm", Journal of Cryptology, 11(4):273-287, 1998.
[27] G. G. Rose and P. Hawkes, "Turing: A Fast Stream Cipher", in Fast Software Encryption (FSE'03), LNCS 2887, pp. 290-306, Springer-Verlag, 2003.
[28] B. Schneier, "Description of a New Variable-Length Key, 64-bit Block Cipher (Blowfish)", in Fast Software Encryption (FSE'93), LNCS 809, pp. 191-204, Springer-Verlag, 1994.
[29] B. Schneier and D. Whiting, "Fast Software Encryption: Designing Encryption Algorithms for Optimal Software Speed on the Intel Pentium Processor", in Fast Software Encryption (FSE'97), LNCS 1267, pp. 242-259, Springer-Verlag, 1997.
[30] T. Siegenthaler, "Correlation-Immunity of Nonlinear Combining Functions for Cryptographic Applications", IEEE Transactions on Information Theory, IT-30:776-780, 1984.
[31] T. Siegenthaler, "Decrypting a Class of Stream Ciphers Using Ciphertext Only", IEEE Transactions on Computers, C-34(1):81-85, Jan. 1985.
[32] S. Vaudenay, "On the Weak Keys of Blowfish", in Fast Software Encryption (FSE'96), LNCS 1039, pp. 27-32, Springer-Verlag, 1996.
[33] D. Watanabe, S. Furuya, H. Yoshida, K. Takaragi, and B. Preneel, "A New Keystream Generator MUGI", in Fast Software Encryption (FSE'02), LNCS 2365, pp. 179-194, Springer-Verlag, 2002.
A  Test Vectors of HC-256
Let K = K0||K1||···||K7 and IV = IV0||IV1||···||IV7. The first 512 bits of keystream are given for different values of key and IV.

1. The key and IV are set as 0:
   8589075b 0df3f6d8 2fc0c542 5179b6a6
   3465f053 f2891f80 8b24744e 18480b72
   ec2792cd bf4dcfeb 7769bf8d fa14aee4
   7b4c50e8 eaf3a9c8 f506016c 81697e32

2. The key is set as 0, the IV is set as 0 except that IV0 = 1:
   bfa2e2af e9ce174f 8b05c2fe b18bb1d1
   ee42c05f 01312b71 c61f50dd 502a080b
   edfec706 633d9241 a6dac448 af8561ff
   5e04135a 9448c434 2de7e9f3 37520bdf

3. The IV is set as 0, the key is set as 0 except that K0 = 0x55:
   fe4a401c ed5fe24f d19a8f95 6fc036ae
   3c5aa688 23e2abc0 2f90b3ae a8d30e42
   59f03a6c 6e39eb44 8f7579fb 70137a5e
   6d10b7d8 add0f7cd 723423da f575dde6

B  The Optimized C Implementation of HC-256 ("hc256.h")
#include <stdlib.h>
typedef unsigned long uint32;
typedef unsigned char uint8;

uint32 P[1024], Q[1024];
uint32 X[16], Y[16];
uint32 counter2048;            // counter2048 = i mod 2048

#ifndef _MSC_VER
#define rotr(x,n)   (((x)>>(n))|((x)<<(32-(n))))
#else
#define rotr(x,n)   _lrotr(x,n)
#endif

#define h1(x,y) {                               \
    uint8 a,b,c,d;                              \
    a = (uint8) (x);                            \
    b = (uint8) ((x) >> 8);                     \
    c = (uint8) ((x) >> 16);                    \
    d = (uint8) ((x) >> 24);                    \
    (y) = Q[a]+Q[256+b]+Q[512+c]+Q[768+d];      \
}

#define h2(x,y) {                               \
    uint8 a,b,c,d;                              \
    a = (uint8) (x);                            \
    b = (uint8) ((x) >> 8);                     \
    c = (uint8) ((x) >> 16);                    \
    d = (uint8) ((x) >> 24);                    \
    (y) = P[a]+P[256+b]+P[512+c]+P[768+d];      \
}

#define step_A(u,v,a,b,c,d,m) {                 \
    uint32 tem0,tem1,tem2,tem3;                 \
    tem0 = rotr((v),23);                        \
    tem1 = rotr((c),10);                        \
    tem2 = ((v) ^ (c)) & 0x3ff;                 \
    (u) += (b)+(tem0^tem1)+Q[tem2];             \
    (a) = (u);                                  \
    h1((d),tem3);                               \
    (m) ^= tem3 ^ (u);                          \
}

#define step_B(u,v,a,b,c,d,m) {                 \
    uint32 tem0,tem1,tem2,tem3;                 \
    tem0 = rotr((v),23);                        \
    tem1 = rotr((c),10);                        \
    tem2 = ((v) ^ (c)) & 0x3ff;                 \
    (u) += (b)+(tem0^tem1)+P[tem2];             \
    (a) = (u);                                  \
    h2((d),tem3);                               \
    (m) ^= tem3 ^ (u);                          \
}
void encrypt(uint32 data[])    // each call encrypts 512 bits of data
{
    uint32 cc, dd;
    cc = counter2048 & 0x3ff;
    dd = (cc+16) & 0x3ff;

    if (counter2048 < 1024) {
        counter2048 = (counter2048 + 16) & 0x7ff;
        step_A(P[cc+0], P[cc+1], X[0], X[6], X[13],X[4], data[0]);
        step_A(P[cc+1], P[cc+2], X[1], X[7], X[14],X[5], data[1]);
        step_A(P[cc+2], P[cc+3], X[2], X[8], X[15],X[6], data[2]);
        step_A(P[cc+3], P[cc+4], X[3], X[9], X[0], X[7], data[3]);
        step_A(P[cc+4], P[cc+5], X[4], X[10],X[1], X[8], data[4]);
        step_A(P[cc+5], P[cc+6], X[5], X[11],X[2], X[9], data[5]);
        step_A(P[cc+6], P[cc+7], X[6], X[12],X[3], X[10],data[6]);
        step_A(P[cc+7], P[cc+8], X[7], X[13],X[4], X[11],data[7]);
        step_A(P[cc+8], P[cc+9], X[8], X[14],X[5], X[12],data[8]);
        step_A(P[cc+9], P[cc+10],X[9], X[15],X[6], X[13],data[9]);
        step_A(P[cc+10],P[cc+11],X[10],X[0], X[7], X[14],data[10]);
        step_A(P[cc+11],P[cc+12],X[11],X[1], X[8], X[15],data[11]);
        step_A(P[cc+12],P[cc+13],X[12],X[2], X[9], X[0], data[12]);
        step_A(P[cc+13],P[cc+14],X[13],X[3], X[10],X[1], data[13]);
        step_A(P[cc+14],P[cc+15],X[14],X[4], X[11],X[2], data[14]);
        step_A(P[cc+15],P[dd+0], X[15],X[5], X[12],X[3], data[15]);
    }
    else {
        counter2048 = (counter2048 + 16) & 0x7ff;
        step_B(Q[cc+0], Q[cc+1], Y[0], Y[6], Y[13],Y[4], data[0]);
        step_B(Q[cc+1], Q[cc+2], Y[1], Y[7], Y[14],Y[5], data[1]);
        step_B(Q[cc+2], Q[cc+3], Y[2], Y[8], Y[15],Y[6], data[2]);
        step_B(Q[cc+3], Q[cc+4], Y[3], Y[9], Y[0], Y[7], data[3]);
        step_B(Q[cc+4], Q[cc+5], Y[4], Y[10],Y[1], Y[8], data[4]);
        step_B(Q[cc+5], Q[cc+6], Y[5], Y[11],Y[2], Y[9], data[5]);
        step_B(Q[cc+6], Q[cc+7], Y[6], Y[12],Y[3], Y[10],data[6]);
        step_B(Q[cc+7], Q[cc+8], Y[7], Y[13],Y[4], Y[11],data[7]);
        step_B(Q[cc+8], Q[cc+9], Y[8], Y[14],Y[5], Y[12],data[8]);
        step_B(Q[cc+9], Q[cc+10],Y[9], Y[15],Y[6], Y[13],data[9]);
        step_B(Q[cc+10],Q[cc+11],Y[10],Y[0], Y[7], Y[14],data[10]);
        step_B(Q[cc+11],Q[cc+12],Y[11],Y[1], Y[8], Y[15],data[11]);
        step_B(Q[cc+12],Q[cc+13],Y[12],Y[2], Y[9], Y[0], data[12]);
        step_B(Q[cc+13],Q[cc+14],Y[13],Y[3], Y[10],Y[1], data[13]);
        step_B(Q[cc+14],Q[cc+15],Y[14],Y[4], Y[11],Y[2], data[14]);
        step_B(Q[cc+15],Q[dd+0], Y[15],Y[5], Y[12],Y[3], data[15]);
    }
}
// The following defines the initialization functions
#define f1(x)  (rotr((x),7) ^ rotr((x),18) ^ ((x) >> 3))
#define f2(x)  (rotr((x),17) ^ rotr((x),19) ^ ((x) >> 10))
#define f(a,b,c,d)  (f2((a)) + (b) + f1((c)) + (d))
#define feedback_1(u,v,b,c) {                   \
    uint32 tem0,tem1,tem2;                      \
    tem0 = rotr((v),23); tem1 = rotr((c),10);   \
    tem2 = ((v) ^ (c)) & 0x3ff;                 \
    (u) += (b)+(tem0^tem1)+Q[tem2];             \
}

#define feedback_2(u,v,b,c) {                   \
    uint32 tem0,tem1,tem2;                      \
    tem0 = rotr((v),23); tem1 = rotr((c),10);   \
    tem2 = ((v) ^ (c)) & 0x3ff;                 \
    (u) += (b)+(tem0^tem1)+P[tem2];             \
}

void initialization(uint32 key[], uint32 iv[])
{
    uint32 i, j;

    // expand the key and IV into P and Q
    for (i = 0; i < 8; i++)   P[i] = key[i];
    for (i = 8; i < 16; i++)  P[i] = iv[i-8];

    for (i = 16; i < 528; i++)
        P[i] = f(P[i-2],P[i-7],P[i-15],P[i-16])+i;
    for (i = 0; i < 16; i++)
        P[i] = P[i+512];
    for (i = 16; i < 1024; i++)
        P[i] = f(P[i-2],P[i-7],P[i-15],P[i-16])+512+i;

    for (i = 0; i < 16; i++)
        Q[i] = P[1024-16+i];
    for (i = 16; i < 32; i++)
        Q[i] = f(Q[i-2],Q[i-7],Q[i-15],Q[i-16])+1520+i;
    for (i = 0; i < 16; i++)
        Q[i] = Q[i+16];
    for (i = 16; i < 1024; i++)
        Q[i] = f(Q[i-2],Q[i-7],Q[i-15],Q[i-16])+1536+i;

    // run the cipher 4096 steps without generating output
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 10; j++)
            feedback_1(P[j],P[j+1],P[(j-10)&0x3ff],P[(j-3)&0x3ff]);
        for (j = 10; j < 1023; j++)
            feedback_1(P[j],P[j+1],P[j-10],P[j-3]);
        feedback_1(P[1023],P[0],P[1013],P[1020]);
        for (j = 0; j < 10; j++)
            feedback_2(Q[j],Q[j+1],Q[(j-10)&0x3ff],Q[(j-3)&0x3ff]);
        for (j = 10; j < 1023; j++)
            feedback_2(Q[j],Q[j+1],Q[j-10],Q[j-3]);
        feedback_2(Q[1023],Q[0],Q[1013],Q[1020]);
    }

    // initialize counter2048, and tables X and Y
    counter2048 = 0;
    for (i = 0; i < 16; i++) X[i] = P[1008+i];
    for (i = 0; i < 16; i++) Y[i] = Q[1008+i];
}
C  Test HC-256 ("test.c")

// This program prints the first 512-bit keystream,
// then measures the average encryption speed.
#include "hc256.h"
#include <stdio.h>
#include <time.h>

int main()
{
    uint32 key[8], iv[8];
    static uint32 DATA[16];   // the DATA buffer is encrypted
    clock_t start, finish;
    double duration, speed;
    uint32 i;

    // initialize the key and IV
    for (i = 0; i < 8; i++) key[i] = 0;
    for (i = 0; i < 8; i++) iv[i] = 0;

    // key and IV setup
    initialization(key, iv);

    // generate and print the first 512-bit keystream
    for (i = 0; i < 16; i++) DATA[i] = 0;
    encrypt(DATA);
    for (i = 0; i < 16; i++) printf(" %8x ", DATA[i]);

    // measure the encryption speed by encrypting
    // DATA repeatedly 0x4000000 times
    start = clock();
    for (i = 0; i < 0x4000000; i++) encrypt(DATA);
    finish = clock();

    duration = ((double)(finish - start)) / CLOCKS_PER_SEC;
    speed = ((double)i)*32*16/duration;
    printf("\n The encryption takes %4.4f seconds.\n"
           " The encryption speed is %13.2f bit/second \n",
           duration, speed);
    return (0);
}
A New Weakness in the RC4 Keystream Generator and an Approach to Improve the Security of the Cipher

Souradyuti Paul and Bart Preneel

Katholieke Universiteit Leuven, Dept. ESAT/COSIC
Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
{souradyuti.paul,bart.preneel}@esat.kuleuven.ac.be
Abstract. The paper presents a new statistical bias in the distribution of the first two output bytes of the RC4 keystream generator. The number of outputs required to reliably distinguish RC4 outputs from random strings using this bias is only 2^25 bytes. Most importantly, the bias does not disappear even if the initial 256 bytes are dropped. This paper also proposes a new pseudorandom bit generator, named RC4A, which is based on RC4's exchange-shuffle model. It is shown that the new cipher offers increased resistance against most attacks that apply to RC4. RC4A uses fewer operations per output byte and offers the prospect of implementations that can exploit its inherent parallelism to improve its performance further.
1  Introduction
RC4 is the most widely used software-based stream cipher. The cipher has been integrated into TLS/SSL and WEP implementations. The cipher was designed by Ron Rivest in 1987 and kept as a trade secret until it was leaked in 1994. RC4 is extremely fast and its design is simple. The RC4 stream cipher is based on a secret internal state of N = 256 bytes and two index-pointers of size n = 8 bits. In this paper we present a bias in the distribution of the first two output bytes. We observe that the first two output words are equal with probability significantly less than expected. Based on this bias we construct a distinguisher with non-negligible advantage that distinguishes RC4 outputs from random strings with only 2^24 pairs of output bytes when N = 256. More significantly, the bias remains detectable even after discarding the initial N output bytes. This fact helps us create another practical distinguisher, with only 2^32 pairs of output bytes, that works 256 rounds away from the beginning when N = 256. A second contribution of the paper is a modified RC4 keystream generator, within the scope of the existing model of an exchange shuffle, in order to achieve
This work was partially supported by the Concerted Research Action GOAMEFISTO-666 of the Flemish government.
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 245–259, 2004. c International Association for Cryptologic Research 2004
KSA(K, S)
    for i = 0 to N − 1
        S[i] = i
    j = 0
    for i = 0 to N − 1
        j = (j + S[i] + K[i mod l]) mod N
        Swap(S[i], S[j])

PRGA(S)
    i = 0
    j = 0
    Output Generation loop:
        i = (i + 1) mod N
        j = (j + S[i]) mod N
        Swap(S[i], S[j])
        Output = S[(S[i] + S[j]) mod N]

Fig. 1. The Key Scheduling Algorithm (KSA) and the Pseudo-Random Generation Algorithm (PRGA)
better security. The new cipher is given the name RC4A. We compare its security to that of the original RC4. Most of the known attacks on RC4 are less effective on RC4A. The new cipher needs fewer instructions per byte, and it is possible to exploit its inherent parallelism to improve its performance.
1.1  Description of RC4
RC4 runs in two phases (described in Fig. 1). The first part is the key scheduling algorithm KSA, which takes an array S, or S-box, and derives a permutation of {0, 1, 2, ..., N − 1} from it using a variable-size key K. The second part is the output generation part PRGA, which produces pseudo-random bytes using the permutation derived from the KSA. Each iteration, loop, or 'round' produces one output value. Plaintext bytes are bit-wise XORed with the output bytes to produce ciphertext. In most applications RC4 is used with word length n = 8 bits and N = 256. The symbol l denotes the byte-length of the secret key.
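The two phases of Fig. 1 translate directly into C. The sketch below is our own illustration (with N = 256 and n = 8, so all index arithmetic is carried out by the 8-bit type itself); it can be checked against the widely published keystream for the key "Key", whose first bytes are eb 9f 77 81:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { uint8_t S[256]; uint8_t i, j; } rc4_state;

/* KSA: derive the initial permutation of 0..255 from an l-byte key */
static void rc4_ksa(rc4_state *st, const uint8_t *key, size_t l) {
    for (int k = 0; k < 256; k++) st->S[k] = (uint8_t)k;
    uint8_t j = 0;
    for (int k = 0; k < 256; k++) {
        j = (uint8_t)(j + st->S[k] + key[k % l]);   /* mod 256 via uint8_t */
        uint8_t t = st->S[k]; st->S[k] = st->S[j]; st->S[j] = t;
    }
    st->i = 0; st->j = 0;
}

/* PRGA: one round, returning one keystream byte */
static uint8_t rc4_next(rc4_state *st) {
    st->i = (uint8_t)(st->i + 1);
    st->j = (uint8_t)(st->j + st->S[st->i]);
    uint8_t t = st->S[st->i]; st->S[st->i] = st->S[st->j]; st->S[st->j] = t;
    return st->S[(uint8_t)(st->S[st->i] + st->S[st->j])];
}
```

Each plaintext byte would then be XORed with one `rc4_next` output to encrypt, exactly as described above.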
1.2  Previous Attacks on RC4
RC4 came under intensive scrutiny after it was made public in 1994. Finney [1] showed a class of internal states that RC4 never enters. The class contains all the states for which j = i + 1 and S[j] = 1. A fraction N^−2 of all possible states are Finney's forbidden states. It is simple to show that these states are connected by a cycle of length N(N − 1). We know that RC4 states are also connected in a cycle (because the next-state function of RC4 is a bijective mapping on a finite set) and the initial state, where i = 0 and j = 0, is not one of Finney's forbidden states. Jenkins [6] detected a probabilistic correlation between the secret information (S, j) and the public information (i, output). Golić [4] showed a positive correlation between the second binary derivative of the least significant bit output sequence and 1. Using this correlation, RC4 outputs can be distinguished from a perfectly random stream of bits by observing only 2^44.7 output bytes. Fluhrer and McGrew [3] observed stronger correlations between consecutive bytes. Their
distinguisher works using 2^30.6 output bytes. Properties of the state transition graph of RC4 were analyzed by Mister and Tavares [11]. Grosul and Wallach demonstrated a related-key attack that works better on very long keys [5]. Andrew Roos also discovered classes of weak keys [15]. Knudsen et al. have attacked versions of RC4 with n < 8 using their backtracking algorithm, in which the adversary guesses the internal state and checks whether an anomaly occurs at a later stage [7]. In case of contradiction, the algorithm backtracks through the internal states and re-guesses. So far this remains the only efficient algorithm which attempts to discover the secret internal state of this cipher. The most serious weakness in RC4 was observed by Mantin and Shamir [9], who noted that the probability of a zero output byte at the second round is twice as large as expected. In broadcast applications a practical ciphertext-only attack can exploit this weakness. Fluhrer et al. [2] have recently shown that if some portion of the secret key is known then RC4 can be broken completely. This is of practical importance because in the Wired Equivalent Privacy protocol (WEP in short) a fixed secret key is concatenated with IV modifiers to encrypt different messages. In [16] it is shown that this attack is feasible. Pudovkina [14] has attempted to detect a bias, only analytically, in the distribution of the first and second output values of RC4 and of digraphs under certain uniformity assumptions. Mironov modeled RC4 as a Markov chain and recommended dumping the initial 12·N bytes of the output stream (at least 3·N) in order to obtain a uniform distribution of the initial permutation of elements [10]. More recently Paul and Preneel [12] have formally proved that a known elements of the S-box along with the two index-pointers cannot predict more than a output bytes in the next N rounds.
They have also designed an efficient algorithm to deduce certain special RC4 states known as non-fortuitous predictive states.
1.3  Organization
The remainder of this paper is organized as follows. Section 2 introduces the new statistical bias in RC4. Section 3 reflects on the design principles of RC4. Our new variant RC4A is introduced in Sect. 4 and its security is analyzed in Sect. 5. Section 6 presents our concluding remarks.
2  The New Weakness
Our major observation is that the distribution of the first two output bytes of RC4 is not uniform. We note that the probability that the first two output bytes are equal is (1/N)(1 − 1/N). This fact is, in a sense, counter-intuitive given the results obtained by Fluhrer and McGrew [3], who showed that the first two outputs take the value (0, 0) with probability significantly larger than 1/N^2. Pudovkina [14] analytically obtained, under certain assumptions, that the first two output bytes
[Fig. 2. (a) Just before the beginning of the output generation. (b) The first output S1[X + 2] is produced. (c) The second output S2[Z + 2] is produced. S1[X + 2] ≠ S2[Z + 2].]
are equal with probability larger than 1/N. However, experiments revealed that this result is incorrect. Our main objective is to find a reasonable explanation for this particular bias. Throughout the paper, Sr[l] and Or denote the l-th element of the S-box after the swapping at round r and the output byte generated at that round, respectively. The indices ir and jr should be understood similarly. All arithmetic operations are computed modulo N.
2.1  Motivational Observation
Theorem 1. If S0[1] = 2 then the first two output bytes of RC4 are always different.

Proof. Fig. 2 shows the execution of the first two rounds of the output generation. We note that O1 = S1[X + 2] and O2 = S2[Z + 2]. Clearly X + 2 and Z + 2 point to two different cells of the array. Therefore, the first two outputs can be the same only if (X = 0 and Z = 2) or if (X = 2 and Z = 0). But this is impossible because X, Z ≠ 2, as they are permutation elements and 2 already occupies another cell.
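The proof does not depend on N = 256, so Theorem 1 can be checked exhaustively for a small S-box. The sketch below (our own illustration, using N = 8) enumerates all 8! initial permutations with Heap's algorithm and counts counterexamples — permutations with S0[1] = 2 whose first two PRGA outputs coincide; the theorem predicts zero:

```c
#include <stdint.h>

#define SMALL_N 8   /* small S-box size; the proof's argument needs only N >= 5 */

/* Run two PRGA rounds on a copy of perm; return 1 if the two outputs are equal */
static int first_two_equal(const int *perm) {
    int S[SMALL_N], i = 0, j = 0, out[2];
    for (int k = 0; k < SMALL_N; k++) S[k] = perm[k];
    for (int r = 0; r < 2; r++) {
        i = (i + 1) % SMALL_N;
        j = (j + S[i]) % SMALL_N;
        int t = S[i]; S[i] = S[j]; S[j] = t;
        out[r] = S[(S[i] + S[j]) % SMALL_N];
    }
    return out[0] == out[1];
}

static long visited;          /* permutations examined */
static long counterexamples;  /* permutations violating Theorem 1 */

/* Heap's algorithm: visit every permutation of a[0..k-1] exactly once */
static void heap_perms(int *a, int k) {
    if (k == 1) {
        visited++;
        if (a[1] == 2 && first_two_equal(a)) counterexamples++;
        return;
    }
    for (int m = 0; m < k - 1; m++) {
        heap_perms(a, k - 1);
        int idx = (k % 2 == 0) ? m : 0;
        int t = a[idx]; a[idx] = a[k-1]; a[k-1] = t;
    }
    heap_perms(a, k - 1);
}

static long count_counterexamples(void) {
    int a[SMALL_N];
    for (int k = 0; k < SMALL_N; k++) a[k] = k;
    visited = 0;
    counterexamples = 0;
    heap_perms(a, SMALL_N);
    return counterexamples;
}
```

Running the check confirms that none of the 5040 permutations with S0[1] = 2 produces two equal first outputs.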
2.2 Quantifying the Bias in the First Two Outputs
Theorem 1 immediately implies a bias in the first two outputs of the cipher, captured in the following corollaries.

Corollary 1. If the first two output bytes are equal then S0[1] ≠ 2.

Proof. This is an immediate deduction from Theorem 1. This fact can be used to speed up exhaustive search by a factor of N/(N − 1).
A New Weakness in the RC4 Keystream Generator
Corollary 2. The probability that the first two output bytes are equal is (1/N)(1 − 1/N) (assuming that S0[1] = 2 occurs with probability 1/N, and that for the rest of the permutations, for which S0[1] ≠ 2, the first two output bytes are equal with probability 1/N).¹

Proof. If S0[1] = 2 occurs with probability 1/N, then the first two output bytes are different for a fraction 1/N of the total keys (see Theorem 1). The output bytes are equal with probability 1/N for each of the other keys, i.e., for a fraction (1 − 1/N) of the total keys. Combining these two,

P[O1 = O2] = P[O1 = O2 | S0[1] = 2] · P[S0[1] = 2] + P[O1 = O2 | S0[1] ≠ 2] · P[S0[1] ≠ 2]
           = 0 · (1/N) + (1/N) · (1 − 1/N)
           = (1/N) · (1 − 1/N) .
2.3 Distinguisher Based on the Weakness
A distinguisher is an efficient algorithm which distinguishes a stream of bits from a perfectly random stream of bits, that is, a stream of bits chosen according to the uniform distribution. There are two ways an adversary can attempt to distinguish between strings, one generated by a pseudorandom generator and the other by a perfectly random source. In the first case the adversary randomly selects a single key and produces a keystream, seeded by the chosen key, long enough to detect a bias. In this scenario the adversary is “weak”, as she has a keystream produced by a single key, and the distinguisher is therefore called a weak distinguisher. In the other case the adversary may use any number of randomly chosen keys and the respective keystreams generated by those keys. In this case the adversary is “strong”, because she may collect outputs to her advantage from many keystreams to detect a bias; the distinguisher so constructed is termed a strong distinguisher. A bias present in the output at time t in a single stream may hardly be detected by a weak distinguisher, but a strong distinguisher can easily discover the anomaly with fewer bytes. This fact was used effectively by Mantin and Shamir [9] to identify a strong bias toward zero in the second output byte of RC4.

We also construct a strong distinguisher for RC4, based on the non-uniformity of the first two output bytes. We use the following result (see [9] for a proof).

Theorem 2. If an event e occurs in a distribution X with probability p and in Y with probability p(1 + q), then, for small p and q, O(1/(p·q^2)) samples are required to distinguish X from Y with non-negligible probability of success.

In our case, X, Y and e are the distribution of the first two output bytes collected from a perfectly random source, the distribution of these variables from RC4, and the occurrence of equal successive bytes, respectively. Therefore, the number of required samples equals O(N^3) (by Corollary 2, p = 1/N and q = −1/N). Experimental observations agree well with the theoretical results. For N = 256, with 2^24 pairs of the first two output bytes, generated from as many randomly chosen keys, our distinguisher separates RC4 outputs from random strings with an advantage of 40% when the threshold is set at 65408. Empirical results show that the number of pairs for which the bytes are equal trails the expected number from a random source by 1.21 standard deviations of the binomial distribution with parameters (2^24, 1/256).

¹ Note that this condition is more relaxed than assuming that the initial permutation is distributed uniformly.
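The sample complexities quoted here (and in Sect. 2.4 below) follow directly from Theorem 2. A small arithmetic sketch (ours, not the authors' code):

```python
from math import log2

def samples_needed(p, q):
    """Theorem 2: O(1/(p*q^2)) samples distinguish event probabilities p and p*(1+q)."""
    return 1.0 / (p * q * q)

N = 256
# Bias in the first two bytes (Corollary 2): p = 1/N, q = -1/N  ->  N^3 samples
n1 = samples_needed(1 / N, -1 / N)
# Bias at rounds t+1, t+2 with t = 0 mod N (Sect. 2.4): p = 1/N, q = -1/N^2  ->  N^5 samples
n2 = samples_needed(1 / N, -1 / N ** 2)
print(log2(n1), log2(n2))  # 24.0 40.0
```

This reproduces the 2^24 samples used above and the 2^40 samples the theory predicts for the weaker bias of Sect. 2.4.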
2.4 Bias after Dropping the First N Output Bytes
A similar but smaller bias is also expected in the output bytes Ot+1 and Ot+2, where t = 0 mod N and t > 0, if we assume that P[St[1] = 2 ∩ jt = 0] = 1/N^2 and that the probability that Ot+1 = Ot+2 for the rest of the internal states is 1/N. In a similar manner as before, we can compute P[Ot+1 = Ot+2], where t = 0 mod N and t > 0:

P[Ot+1 = Ot+2] = P[Ot+1 = Ot+2 | St[1] = 2 ∩ jt = 0] · P[St[1] = 2 ∩ jt = 0]
               + P[Ot+1 = Ot+2 | St[1] ≠ 2 ∪ jt ≠ 0] · P[St[1] ≠ 2 ∪ jt ≠ 0]
             = 0 · (1/N^2) + (1/N) · (1 − 1/N^2)
             = (1/N) · (1 − 1/N^2) .

Therefore, the number of samples needed to establish the distinguisher is O(N^5) according to Theorem 2 (note that in this case p = 1/N and q = −1/N^2). However, our experiments for N = 256 show that the bias can be detected much sooner: 2^32 pairs of output bytes (each pair chosen from rounds t = 257 and t = 258) are sufficient for a distinguisher, where the theory predicts that this should be 2^40. In our experiments, the number of pairs for which the bytes are equal trails the expected number from a random source by 1.13 standard deviations of the binomial distribution with parameters (2^32, 1/256). A large number of experiments showed that the frequency of the simultaneous occurrence of j = 0 and S[1] = 2 at the 256th round is much higher than expected. This phenomenon accounts for the optimistic behavior of our distinguisher. However, it is still unknown how to quantify the bias in P[j256 = 0 ∩ S256[1] = 2] theoretically. It is worth noting at this point that Mironov, based on an idealized model of RC4, has suggested dropping the initial 3·N bytes (more conservatively, the initial 12·N bytes) in order to obtain a uniform distribution of the initial permutation, thereby ruling out any possibility of a strong attack [10].
2.5 Possibility of a Weak Distinguisher
The basic fact examined in Theorem 1 can be used to characterize a general internal state which produces unequal consecutive bytes. One can see that, for an RC4 internal state, if it = jt and St[it + 1] = 2 then Ot+1 ≠ Ot+2. If we assume uniformity of RC4 internal states when the value of i is fixed, then the above observation allows for a weak distinguisher for RC4 (i.e., distinguishing RC4 using a single stream generated by a randomly chosen key). However, extensive experiments show that, in the case of a single stream, the bias in the output bytes due to these special states is much weaker. With a sample size of 2^32 pairs of bytes, the number of pairs for which the output bytes are equal trails the expected number from a random source by 0.21 standard deviations of the binomial distribution with parameters (2^32, 1/256) (compared to 1.13 standard deviations for the strong distinguisher with the same sample size). The experimentally obtained standard deviation of the distribution of the number of pairs with equal members, in a sample of 2^32 pairs, equals 0.9894·σ, where σ is the standard deviation of the binomial distribution with parameters (2^32, 1/256). The closeness of these two distributions shows that such a weak distinguisher is less effective than the strong distinguisher with respect to the required number of outputs. Further experimental work is required to determine the effectiveness of distinguishers which require outputs for fewer keys (say 2^20 rather than 2^32) but longer output streams than just a pair of consecutive bytes per key. It is still unclear how this correlation between the internal and external states can be used to mount a full attack on RC4. However, the observation reveals a weakness in the working principle of the cipher, even if N output bytes are dropped.
3 Analyzing RC4 Design Principles
The pseudorandom bit generation in RC4 is divided into two stages (see Sect. 1.1). The key scheduling algorithm KSA turns the identity permutation S into a pseudorandom permutation of elements, and the pseudorandom byte generation algorithm PRGA issues one output byte from a pseudorandom location of S in every round. At every round the secret internal state S is changed by the swapping of two elements, one in a known location and another pointed to by a ‘random’ index. The whole idea is inspired by the principle of an exchange shuffle to obtain a ‘random’ distribution of a deck of cards [8]. Therefore, the security of RC4 in this model of exchange shuffle depends mainly on the following three factors.
– Uniform distribution of the initial permutation of elements (that is, S).
– Uniform distribution of the value of the index-pointer of the element to be swapped with the element contained in a known index (that is, the index-pointer j).
– Uniform distribution of the value of the index-pointer from which the output is issued in a round (i.e., S[i] + S[j]).
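For reference, the two stages just described can be written out as follows. This is our transcription of the standard textbook formulation of RC4, with the modulus N kept as a parameter rather than fixed at 256.

```python
def ksa(key, N=256):
    """Key scheduling: turn the identity permutation into a key-dependent one."""
    S = list(range(N))
    j = 0
    for i in range(N):
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    return S

def prga(S, nbytes, N=256):
    """Output generation: one exchange-shuffle swap and one output byte per round."""
    S = list(S)
    i = j = 0
    out = []
    for _ in range(nbytes):
        i = (i + 1) % N
        j = (j + S[i]) % N
        S[i], S[j] = S[j], S[i]           # the exchange shuffle step
        out.append(S[(S[i] + S[j]) % N])  # output taken from location S[i] + S[j]
    return out

stream = prga(ksa(list(b"Key")), 16)
print(bytes(stream).hex())
```

All three uniformity factors listed above concern the quantities `S`, `j` and `S[i] + S[j]` that appear explicitly in `prga`.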
Note that the above three conditions are necessary conditions for the security of the cipher, but they are by no means sufficient. This fact is clearly demonstrated by Golić in [4] using Linear Sequential Circuit Approximation (LSCA for short) methods, which capture the basic flaw in the model that the arrangement of the S-box elements in two successive rounds can be approximated as identical, because of the ‘negligible’ change of S in two successive rounds. In this paper we take the key scheduling algorithm of RC4 for granted and assume that the distribution of the initial permutation of elements is uniform; we focus solely on the pseudorandom output generation process. As a consequence, the adversary concentrates on deriving the secret internal state S (not the secret key K) from the known outputs, exploiting correlations between the internal state and the output bytes. Most of the known attacks on RC4 that derive part of the secret internal state are based on fixing a few elements of the S-box, along with the two index pointers i and j, that give information about the outputs at certain rounds with probability 1 or very close to it. This at once results in distinguishing attacks on the cipher and helps to derive the secret internal state with probability significantly larger than expected. Paul and Preneel have proved that, under reasonable assumptions, the maximum probability with which a part of the internal state (i.e., certain S-box elements and the value of j) can be predicted by observing known outputs is 1/N, which is too high [12]. Note that these correlations between the internal state and the external state immediately violate the ‘randomness’ criteria of an ideal cipher. The only algorithm attempting to derive the entire internal state of RC4 from the sequence of outputs is by Knudsen et al.; it is based on a “guess on demand” strategy [7]. The expected complexity of this algorithm is much smaller than that of a trivial exhaustive search, because it implicitly uses the weakness of RC4 that a part of the internal state can be guessed with non-trivial probability by observing certain outputs. In the following discussion we modify RC4 in an attempt to achieve tighter security than the original cipher, within the scope of the existing model of exchange shuffle, without degrading its speed.
4 RC4A: An Attempt to Improve RC4

4.1 RC4A Design Principles
Most of the existing known plaintext attacks on RC4 harness the stronger correlations between the internal and external states (in generic terms, b-predictive a-state attacks [9, 12]). In principle, making the output generation dependent on more random variables weakens the correlation between them, i.e., the probability of guessing the internal state by observing the output sequence can be reduced. The larger the number of variables, the weaker the correlation between them will be. On the other hand, intuitively, a larger number of variables increases the time complexity, as it involves more arithmetic operations.
1. Set i = 0, j1 = j2 = 0
2. i++
3. j1 = j1 + S1[i]
4. Swap(S1[i], S1[j1])
5. I2 = S1[i] + S1[j1]
6. Output = S2[I2]
7. j2 = j2 + S2[i]
8. Swap(S2[i], S2[j2])
9. I1 = S2[i] + S2[j2]
10. Output = S1[I1]
11. Repeat from step 2.

Fig. 3. Pseudo-random Generation Algorithm of RC4A
4.2 RC4A Description
We take one randomly chosen key k1. A second key k2 is generated from a pseudorandom bit generator (e.g. RC4) using k1 as the seed. Applying the Key Scheduling Algorithm described in Fig. 1, we construct two S-boxes S1 and S2 using the keys k1 and k2 respectively. As mentioned before, we assume that S1 and S2 are two random permutations of {0, 1, 2, . . . , N − 1}: in this new scheme the Key Scheduling Algorithm is assumed to produce a uniform distribution on the permutations of {0, 1, 2, . . . , N − 1}. Therefore, our effort focuses on the security of the Pseudo-Random Generation Algorithm. Fig. 3 shows the pseudo-code of the pseudorandom byte generation algorithm of RC4A. All arithmetic operations are computed modulo N. The transition of the internal states of the two S-boxes is based on an exchange shuffle as before. Here we introduce two variables j1 and j2, corresponding to S1 and S2, instead of one. The only modification is that the index-pointer S1[i] + S1[j1] evaluated on S1 produces output from S2, and vice versa (see steps 5, 6 and 9, 10 of Fig. 3). The next round starts after each output generation.

RC4A Uses Fewer Instructions per Output Byte than RC4. To produce two successive output bytes, the pointer i is incremented once in RC4A, whereas it is incremented twice to produce as many output bytes in RC4.

Parallelism in RC4A. The performance of RC4A can be further improved by extracting the parallelism latent in the algorithm. The parallel steps of the algorithm can easily be found by drawing a dependency graph of the steps shown in Fig. 3. In the following list the parallel steps of RC4A are shown within brackets.
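The pseudo-code of Fig. 3 translates directly into the following sketch (our transcription). The key-to-S-box setup is idealized here as two given permutations, matching the text's assumption that the KSA yields random permutations.

```python
def rc4a_prga(S1, S2, nbytes, N=256):
    """RC4A output generation (Fig. 3): two S-boxes feeding each other's output index."""
    S1, S2 = list(S1), list(S2)
    i = j1 = j2 = 0
    out = []
    while len(out) < nbytes:
        i = (i + 1) % N                       # step 2
        j1 = (j1 + S1[i]) % N                 # step 3
        S1[i], S1[j1] = S1[j1], S1[i]         # step 4
        out.append(S2[(S1[i] + S1[j1]) % N])  # steps 5-6: index from S1, output from S2
        j2 = (j2 + S2[i]) % N                 # step 7
        S2[i], S2[j2] = S2[j2], S2[i]         # step 8
        out.append(S1[(S2[i] + S2[j2]) % N])  # steps 9-10: index from S2, output from S1
    return out[:nbytes]

import random
rng = random.Random(0)
S1 = list(range(256)); rng.shuffle(S1)
S2 = list(range(256)); rng.shuffle(S2)
print(rc4a_prga(S1, S2, 8))
```

Note that `i` is incremented once per two output bytes, which is the instruction-count advantage claimed below.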
1. (3, 7)
2. (4, 5, 9)
3. (6, 10)
4. (8, 2)
The existence of many parallel steps in RC4A is certainly an important aspect of this new cipher and it offers the possibility of a faster stream cipher if RC4A is implemented efficiently.
5 Security Analysis of RC4A
The RC4A pseudorandom bit generator has passed all the statistical tests listed in [13]. RC4A achieves two major gains over RC4. By making every output byte depend on at least two random values of S1 and S2 (e.g., O1 depends on S1[1], S1[j1] and S2[S1[1] + S1[j1]]), the number of secret internal states of RC4A becomes (N!)^2 × N^3. So, for N = 256, the number of secret internal states of RC4A is approximately 2^3392, whereas the number is only 2^1700 for RC4. Let the events EA and EB denote the occurrence of an internal state (i.e., a known elements in the S-boxes together with i, j1 and j2) and of the corresponding b outputs when the i value of the internal state is known. We assume uniformity of the internal state and the corresponding external state for any fixed i value of the internal state. Assuming that a is much smaller than N, and disregarding the small bias induced in EB due to EA, we apply Bayes’ Rule to get

P[EA | EB] = (P[EA] / P[EB]) · P[EB | EA] ≈ (N^−(a+2) / N^−b) · 1 = N^(b−a−2) .   (1)
Paul and Preneel [12] have proved that a ≥ b for RC4 for small values of a. We omit the proof as it is quite rigorous and beyond the scope of this paper. Exactly the same technique can be applied here to prove that a known elements of the S-boxes, along with i, j1 and j2, cannot predict more than a elements for small values of a. Therefore, the maximum probability with which any internal state of RC4A can be predicted from a known output sequence equals 1/N^2, compared to 1/N for RC4. In the following sections we describe how RC4A resists the two major attacks on it: one attempts to derive the entire internal state deterministically, and the other derives a part of the internal state probabilistically.
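The state-count comparison above can be checked directly (our arithmetic sketch): the RC4 state is one permutation plus the pointers i and j, while the RC4A state is two permutations plus i, j1 and j2.

```python
from math import factorial, log2

N = 256
rc4_states  = factorial(N) * N**2        # S, i, j
rc4a_states = factorial(N)**2 * N**3     # S1, S2, i, j1, j2
print(round(log2(rc4_states)), round(log2(rc4a_states)))  # 1700 3392
```

These are the 2^1700 and 2^3392 figures quoted in the text.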
5.1 Precluding the Backtracking Algorithm by Knudsen et al.
As mentioned earlier, the “guess on demand” backtracking algorithm by Knudsen et al. is so far the best algorithm to deduce the internal state of RC4 from known plaintext [7]. We now briefly discuss the functionality of the variant of this algorithm applied to RC4A.

The algorithm simulates RC4A by observing only the output bytes in recursive function calls. The values of S[i] and S[j] in one S-box are guessed from the permutation elements to agree with the output and its possible location in the other S-box. If they match, the algorithm calls the round function for the next round. If an anomaly occurs, it backtracks through the previous rounds and re-guesses. The number of outputs m needed to uniquely determine the entire internal state is bounded below by the inequality 2^{nm} > (2^n!)^2. Therefore, m ≥ 2N (note that N = 2^n).

Theorem 3 (RC4 vs RC4A). Let C_{rc4a} be the expected computational complexity of deriving the secret internal state of RC4A from the first 2N known output bytes with the algorithm by Knudsen et al., and let C_{rc4} be the corresponding complexity for RC4 using the first N known output bytes, where complexity is measured in terms of the number of value assignments. Then C_{rc4a} is much higher than C_{rc4}, and C_{rc4a} can be approximated by C_{rc4}^2 under certain assumptions.

Proof. According to the algorithm by Knudsen et al., the internal state of RC4 is derived using only the first N output bytes, that is, by simulating RC4 for the first N rounds. The variant of this algorithm which works on RC4A uses the initial 2N bytes and thereby runs for the first 2N rounds. Let the algorithms A1 and A2 derive the secret internal states of RC4 and RC4A respectively. At every round the S-boxes are assigned either 0, 1, 2, or 3 elements before the algorithm moves to the next round. Let A2, at the t-th round, go to the next round after assigning k elements an expected number of m_{k,t} times. The number of value assignments in the t-th round is then \sum_{k=0}^{3} k \cdot m_{k,t}. Note that each of the \sum_{k=0}^{3} m_{k,t} iterations gives rise to an S-box arrangement in the next round. It is possible that we reach some S-box arrangements from which no further transition to the next round is possible because of contradictions; in such a case, we assume the assignment of zero elements in the S-box until the N-th round is reached. Let L_t be the number of S-box arrangements at the t-th round from which these \sum_{k=0}^{3} m_{k,t} arrangements are generated. Consequently,

    \sum_{k=0}^{3} m_{k,t} = L_{t+1} .   (2)

Now we set

    \sum_{k=0}^{3} k \cdot m_{k,t} = \tilde{k}_t \cdot \sum_{k=0}^{3} m_{k,t} = \tilde{k}_t \cdot L_{t+1} .   (3)

In Eqn. (3), \tilde{k}_t is the expected number of elements assigned to the S-boxes in each iteration of that particular round. If each of the L_t arrangements is assumed to produce an expected number \tilde{L}_{t+1} of S-box arrangements in the next round, then Eqn. (3) becomes

    \sum_{k=0}^{3} k \cdot m_{k,t} = \tilde{k}_t \cdot (\tilde{L}_{t+1} \cdot L_t) .   (4)

Denoting the total number of value assignments in the t-th round by C(t), it follows from Eqn. (4) that

    C(t) = \tilde{k}_t \cdot (\tilde{L}_{t+1} \cdot L_t) .   (5)

Proceeding this way it can be shown that

    C(t+s) = \tilde{k}_{t+s} \cdot L_t \cdot \prod_{i=t}^{t+s} \tilde{L}_{i+1} .   (6)

If t = 1 then L_t = 1. Setting t + s = n in Eqn. (6), we get

    C(n) = \tilde{k}_n \cdot \prod_{i=1}^{n} \tilde{L}_{i+1} .   (7)

From Eqn. (2) and Eqn. (3), C(n) can be evaluated for all n ∈ {1, 2, . . . , 2N} when m_{k,t} is known for all (k, t) ∈ {0, 1, 2, 3} × {1, 2, . . . , 2N}. It is important to note that on a random output sequence \tilde{k}_{2f−1} ≈ \tilde{k}_{2f} and \tilde{L}_{2f} ≈ \tilde{L}_{2f+1} for all f ∈ {1, 2, . . . , N}. The reason behind this approximation is that, with the algorithm by Knudsen et al., the difference between the expected number of assignments in the S-boxes in the (2f − 1)-th and the 2f-th rounds is very small. Therefore, the overall complexity C_{rc4a} becomes

    C_{rc4a} = \sum_{n=1}^{2N} C(n) = \sum_{n=1}^{2N} \tilde{k}_n \cdot \prod_{i=1}^{n} \tilde{L}_{i+1}
             ≈ \tilde{k}_1 \cdot \tilde{L}_2 + \sum_{i=1}^{N} \tilde{k}_{2i} \cdot \prod_{j=1}^{i} \tilde{L}_{2j}^2 + \sum_{i=2}^{N} \tilde{k}_{2i−1} \cdot \tilde{L}_{2i} \cdot \prod_{j=1}^{i−1} \tilde{L}_{2j}^2 .

Replacing \tilde{k}_{2i−1} ≈ \tilde{k}_{2i} by x_i and \tilde{L}_{2i} ≈ \tilde{L}_{2i+1} by g_i, we get

    C_{rc4a} = \sum_{i=1}^{N} x_i \cdot \prod_{j=1}^{i} g_j^2 + x_1 \cdot g_1 + \sum_{i=2}^{N} x_i \cdot g_i \cdot \prod_{j=1}^{i−1} g_j^2 .   (8)

Applying a similar technique as above, it is easy to see that

    C_{rc4} = \sum_{i=1}^{N} x_i \cdot \prod_{j=1}^{i} g_j .   (9)

Again we note that the difference between the expected number of elements already assigned in S1 for RC4A at round (2t − 1) and the expected number of elements in S for RC4 at round t is negligible. Therefore, the corresponding \tilde{k}_t and \tilde{L}_{t+1} for RC4 can be approximated by \tilde{k}_{2t−1} and \tilde{L}_{2t} for RC4A. As the g_i’s are real numbers greater than 1 and the x_i’s are non-negative real numbers, it is easy to see from Eqn. (8) and Eqn. (9) that C_{rc4a} ≫ C_{rc4}. We observe from the algorithm that x_i ∈ {y ∈ R : 0 ≤ y ≤ 3} and that x_i decreases as i increases; intuitively, x_i is less than one in the last rounds. Therefore, assuming C_{rc4a} ≈ \prod_{i=1}^{N} g_i^2 and C_{rc4} ≈ \prod_{i=1}^{N} g_i, we get C_{rc4a} ≈ C_{rc4}^2.

By Theorem 3, the expected complexity to deduce the secret internal state of RC4A (N = 256) with the algorithm by Knudsen et al. is 2^1558, whereas the corresponding complexity for RC4 is 2^779.

5.2 Resisting the Fortuitous States Attack
Fluhrer and McGrew discovered certain RC4 states in which only m known consecutive S-box elements participate in producing the next m successive outputs. Those states are defined to be Fortuitous States (see [3, 12] for a detailed analysis). Fortuitous states increase the probability of guessing a part of the internal state in a known plaintext attack (see Eqn. (1)). The larger the probability of occurrence of a fortuitous state, the smaller the number of rounds required to observe one. RC4A also weakens the fortuitous state attack to a large degree. A moment’s reflection shows that RC4A does not have any fortuitous state of length 1. We now compare the probability of occurrence of a fortuitous state of length 2a in RC4A to that of length a in RC4. It is easy to note that a fortuitous state of length 2a of RC4A implies, and is implied by, two fortuitous states of length a of RC4 appearing simultaneously in S1 and S2. If C denotes the number of fortuitous states of length a of RC4, then the expected number of fortuitous states of length 2a in RC4A is C^2/N. Let P_a denote the probability of occurrence of a fortuitous state of length a in RC4, and P_2a the probability of occurrence of a fortuitous state of length 2a in RC4A. Then, for small values of a, P_a = C/N^(a+2) and P_2a = C^2/N^(2a+4), which immediately implies P_2a < P_a.
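The comparison of P_a and P_2a is a one-line computation: the two formulas give P_2a = P_a^2, a quadratic loss for the attacker. The sketch below (ours) uses exact fractions and a hypothetical count C, since the true counts of fortuitous states are tabulated in [3] and not reproduced here.

```python
from fractions import Fraction

N, a = 256, 2
C = 500  # hypothetical number of length-a fortuitous states, for illustration only

P_a  = Fraction(C, N**(a + 2))        # probability of a length-a state in RC4
P_2a = Fraction(C**2, N**(2*a + 4))   # probability of a length-2a state in RC4A

print(P_a, P_2a, P_2a == P_a**2)
```

Since P_a < 1 for any valid C, the squared probability is always strictly smaller.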
5.3 Resisting other Attacks
One can see that the strong positive bias of the second output byte of RC4 toward zero [9], as well as the bias described in the first part of this paper, are also diminished in this new cipher, as more random variables have to be fixed for the biased state to occur.
5.4 Open Problems and Directions for Future Work
Although RC4A has improved security over the original cipher against most of the known plaintext attacks, it is still as vulnerable as RC4 against the attack by Golić, which uses the positive correlation between the second binary derivative of the least significant bit output sequence and 1. The weakness originates from the slow change of the S-box in successive rounds, which seems to be inherent in any model based on an exchange shuffle. Therefore, it remains an open problem whether it is possible to remove this weakness from the output words of a stream cipher based on an exchange shuffle while retaining all of its speed and security. Our work leaves room for more research. It is worthwhile to note that one output byte generation in the existing model of exchange shuffle involves two random pointers: j and S[i] + S[j]. In RC4 both pointers fetch values from a single S-box. We obtained better results by making S[i] + S[j] fetch a value from a different S-box. What if we obtain S[j] from another S-box and generate output using three S-boxes?
6 Conclusions
In this paper we have described a new statistical weakness in the first two output bytes of the RC4 keystream generator. The weakness does not disappear even after dropping the initial N bytes. Based on this observation, we recommend dropping at least the initial 2N bytes of RC4 in all future applications. In the second part of the paper we attempted to improve the security of RC4 by introducing more random variables in the output generation process, thereby reducing the correlation between the internal and the external states. As a final comment we would like to mention that the security of RC4A could be further improved. For example, one could introduce key-dependent values of i and j at the beginning of the first round, and one could address the weaknesses of the Key Scheduling Algorithm. In this paper, we have assumed that the original Key Scheduling Algorithm produces a uniform distribution of the initial permutation of elements, which is certainly not correct.
Acknowledgements

We are grateful to Adi Shamir for helpful comments. The authors also thank Kenneth G. Paterson for pointing out the possibility of a weak distinguisher using our observation. We would like to thank Joseph Lano for providing us with experimental results. Thanks are due to Alex Biryukov, Christophe De Cannière, Jorge Nakahara Jr. and Dai Watanabe for many useful discussions. The authors also acknowledge the constructive comments of the anonymous reviewers.
References

[1] H. Finney, “An RC4 cycle that can’t happen,” Post in sci.crypt, September 1994.
[2] S. Fluhrer, I. Mantin, A. Shamir, “Weaknesses in the Key Scheduling Algorithm of RC4,” SAC 2001 (S. Vaudenay, A. Youssef, eds.), vol. 2259 of LNCS, pp. 1–24, Springer-Verlag, 2001.
[3] S. Fluhrer, D. McGrew, “Statistical Analysis of the Alleged RC4 Keystream Generator,” Fast Software Encryption 2000 (B. Schneier, ed.), vol. 1978 of LNCS, pp. 19–30, Springer-Verlag, 2000.
[4] J. Golić, “Linear Statistical Weakness of Alleged RC4 Keystream Generator,” Eurocrypt ’97 (W. Fumy, ed.), vol. 1233 of LNCS, pp. 226–238, Springer-Verlag, 1997.
[5] A. Grosul, D. Wallach, “A related key cryptanalysis of RC4,” Department of Computer Science, Rice University, Technical Report TR-00-358, June 2000.
[6] R. Jenkins, “Isaac and RC4,” Published on the Internet at http://burtleburtle.net/bob/rand/isaac.html.
[7] L. Knudsen, W. Meier, B. Preneel, V. Rijmen, S. Verdoolaege, “Analysis Methods for (Alleged) RC4,” Asiacrypt ’98 (K. Ohta, D. Pei, eds.), vol. 1514 of LNCS, pp. 327–341, Springer-Verlag, 1998.
[8] D. E. Knuth, “The Art of Computer Programming,” vol. 2, Seminumerical Algorithms, Addison-Wesley, 1981.
[9] I. Mantin, A. Shamir, “A Practical Attack on Broadcast RC4,” Fast Software Encryption 2001 (M. Matsui, ed.), vol. 2355 of LNCS, pp. 152–164, Springer-Verlag, 2001.
[10] I. Mironov, “(Not So) Random Shuffles of RC4,” Crypto 2002 (M. Yung, ed.), vol. 2442 of LNCS, pp. 304–319, Springer-Verlag, 2002.
[11] S. Mister, S. Tavares, “Cryptanalysis of RC4-like Ciphers,” SAC ’98 (S. Tavares, H. Meijer, eds.), vol. 1556 of LNCS, pp. 131–143, Springer-Verlag, 1999.
[12] S. Paul, B. Preneel, “Analysis of Non-fortuitous Predictive States of the RC4 Keystream Generator,” Indocrypt 2003 (T. Johansson, S. Maitra, eds.), vol. 2904 of LNCS, pp. 52–67, Springer-Verlag, 2003.
[13] B. Preneel et al., “NESSIE Security Report,” Version 2.0, IST-1999-12324, February 19, 2003, http://www.cryptonessie.org.
[14] M. Pudovkina, “Statistical Weaknesses in the Alleged RC4 keystream generator,” Cryptology ePrint Archive 2002–171, IACR, 2002.
[15] A. Roos, “Class of weak keys in the RC4 stream cipher,” Post in sci.crypt, September 1995.
[16] A. Stubblefield, J. Ioannidis, A. Rubin, “Using the Fluhrer, Mantin and Shamir attack to break WEP,” Proceedings of the 2002 Network and Distributed Systems Security Symposium, pp. 17–22, 2002.
Improving Immunity of Feistel Ciphers against Differential Cryptanalysis by Using Multiple MDS Matrices

Taizo Shirai and Kyoji Shibutani

Ubiquitous Technology Laboratories, Sony Corporation
7-35 Kitashinagawa 6-chome, Shinagawa-ku, Tokyo, 141-0001 Japan
{taizo.shirai,kyoji.shibutani}@jp.sony.com
Abstract. A practical measure to estimate the immunity of block ciphers against differential and linear attacks consists of finding the minimum number of active S-boxes, or a lower bound for this minimum number. An evaluation of lower bounds on the number of differentially active S-boxes of AES, Camellia (without FL/FL^−1) and Feistel ciphers with an MDS-based matrix of branch number 9 showed that the percentage of active S-boxes in Feistel ciphers is lower than in AES. The cause is a difference cancellation property which can occur at the XOR operation in the Feistel structure. In this paper we propose a new design strategy to avoid such difference cancellation by employing multiple MDS-based matrices in the diffusion layer of the F-function. The effectiveness of the proposed method is confirmed by an experimental result showing that the percentage of active S-boxes of the newly designed Feistel cipher becomes the same as for AES.

Keywords: MDS, Feistel cipher, active S-boxes, multiple MDS design
1 Introduction
Throughout recent cryptographic primitive selection projects, such as the AES, NESSIE and CRYPTREC projects, many types of symmetric-key block ciphers have been selected for wide practical use [16, 17, 18]. A highly regarded design strategy in many well-known symmetric-key block ciphers consists in employing small non-linear functions (S-boxes) and designing a linear diffusion layer to achieve a high minimum number of active S-boxes [2, 4, 17, 18]. If the diffusion layer guarantees a sufficiently high minimum number of differentially active S-boxes, and the S-boxes have a low maximum differential probability, the resistance against differential attacks will be strong enough. Let a be the lower bound on the minimum number of active S-boxes, and DP_max the maximum differential probability (MDP) of the S-boxes. It is then guaranteed that there is no differential path whose differential characteristic probability (DCP) is higher than
The first author was a guest researcher at Katholieke Universiteit Leuven, Dept. ESAT/SCD-COSIC from 2003 to 2004.
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 260–278, 2004.
© International Association for Cryptologic Research 2004
(DP_max)^a. For instance, in the case of a 128-bit block cipher using 8-bit bijective S-boxes with DP_max = 2^−6, the necessary condition to rule out any path with DCP > 2^−128 is that a should be at least 22. In order to determine the appropriate number of rounds of a fast and secure cipher, it is thus essential to have an accurate estimation of the lower bound a [1, 4]. Regarding this problem, finding an optimal linear diffusion is one of the research topics included in the future research agenda for cryptology proposed by the STORK project in the EU [19].

Comparing the minimum number of active S-boxes of two well-known ciphers, AES and Camellia without FL/FL^−1 (denoted by Camellia*), it is shown that the ratio of the minimum number of active S-boxes to the total number of S-boxes is lower for Camellia* than for AES. Even if the diffusion matrix of Camellia* is replaced by an 8 × 8 MDS-based matrix with branch number 9 (which we call an MDS-Feistel cipher), the ratio does not increase significantly, and there is an apparent gap between these Feistel ciphers (Camellia* and MDS-Feistel) and the SPN cipher AES. We found that the low percentage of active S-boxes in an MDS-Feistel structure is due to a difference cancellation which always occurs in the differential path that realizes the minimum number of active S-boxes. In such a case, the output difference of the F-function in the i-th round is canceled completely by the output difference of the (i + 2j)-th round (j > 0). It is obvious that one of the conditions for difference cancellations is employing a unique diffusion matrix for all F-functions.
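The bound a ≥ 22 quoted in the example follows from requiring (DP_max)^a ≤ 2^−128; a quick arithmetic sketch (ours):

```python
def min_active_sboxes(dp_max_log2=-6, target_log2=-128):
    """Smallest a with (DP_max)^a <= 2^target, i.e. a * dp_max_log2 <= target_log2."""
    a = 1
    while a * dp_max_log2 > target_log2:
        a += 1
    return a

print(min_active_sboxes())  # 22
```

With DP_max = 2^−6, 21 active S-boxes only give 2^−126, so 22 is the smallest sufficient count.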
We then employ q (< r) m x m MDS matrices, chosen such that any m columns taken from these q matrices also satisfy the MDS property. These MDS matrices are first allocated to the odd-round F-functions, and then to the even-round F-functions, in an order that keeps the involution property. This construction removes any chance of difference cancellation within 2q + 1 consecutive rounds. We also present an evaluation result that confirms the effectiveness of the new design: our strategy makes a Feistel cipher achieve a high ratio of minimum active S-boxes, at the same level as AES. Our results open a way to design faster Feistel ciphers while keeping their advantage that the same implementation can be used for encryption and decryption, except for the order of the subkeys. This paper is organized as follows: In Sect. 2, we give some definitions used in this paper. In Sect. 3, we compare the minimum number of active S-boxes of various ciphers. In Sect. 4, we explain how difference cancellation occurs. In Sect. 5, we propose our new design strategy, the multiple MDS Feistel structure design. In Sect. 6, we investigate the effect of the multiple MDS Feistel structure design. Finally, in Sect. 7, we discuss the new design and future research.
2 Preliminaries
In this section, we state some definitions and notions used in the rest of this paper.

Definition 1 (active S-box). An S-box which has a non-zero input difference is called an active S-box.

Definition 2 (chi function). For any difference dX in GF(2^n), a function chi : GF(2^n) -> {0, 1} is defined as follows:

  chi(dX) = 0 if dX = 0, and chi(dX) = 1 if dX != 0.

For any differential vector dX = (dX[1], dX[2], ..., dX[m]) in GF(2^n)^m, the truncated difference deltaX in {0, 1}^m is defined as

  deltaX = chi(dX) = (chi(dX[1]), chi(dX[2]), ..., chi(dX[m])).

Definition 3 ((truncated) Hamming weight of a vector in GF(2^n)^m). Let v = (v1, v2, ..., vm) in GF(2^n)^m. The Hamming weight of the vector v is defined as

  wh(v) = #{ vi | vi != 0, 1 <= i <= m }.

Theorem 1 ([7]). A [k + m, k, d] linear code with generator matrix G = [I_{k x k} M_{k x m}] is MDS iff every square submatrix (formed from any i rows and any i columns, for any i = 1, 2, ..., min{k, m}) of M_{k x m} is nonsingular.

Following the above theorem, we call a matrix M an MDS matrix if every square submatrix of M is nonsingular.

Definition 4 (branch number). Let v = (v1, v2, ..., vm) in GF(2^n)^m. The branch number B of a linear mapping theta : GF(2^n)^m -> GF(2^n)^m is defined as

  B(theta) = min_{v != 0} { wh(v) + wh(theta(v)) }.
If M is an m x m MDS matrix and theta : x -> Mx, then B(theta) = m + 1.

Definition 5 (Feistel structure using an SP-type F-function). An SP-type F-function is defined as follows. Let n be the bit width of the bijective S-boxes, and m the number of S-boxes employed in an F-function. In the i-th round F-function: (1) the mn-bit round key ki in GF(2^n)^m and the data xi in GF(2^n)^m are XORed: wi = xi XOR ki; (2) wi is split into m pieces of n-bit data, and each n-bit piece is input to the corresponding S-box; (3) the output values of the S-boxes, regarded as zi in GF(2^n)^m, are transformed by an m x m matrix M over GF(2^n): yi = M zi. A Feistel structure using an SP-type F-function is shown in Fig. 1.
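As a small illustration of Definition 4 (my own sketch: the 2 x 2 matrix and the reduction polynomial x^4 + x + 1 over GF(2^4) are choices made here, not from the paper), the branch number can be computed by brute force, and for an MDS matrix it reaches m + 1:

```python
# Brute-force computation of the branch number of a 2x2 MDS matrix over GF(2^4).

def gf16_mul(a, b):
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13  # reduce modulo x^4 + x + 1
    return r

def wh(v):
    # truncated Hamming weight: number of nonzero components
    return sum(1 for x in v if x != 0)

M = [[1, 1], [1, 2]]  # MDS: all entries and the determinant are nonzero

def mat_vec(M, v):
    return [gf16_mul(M[i][0], v[0]) ^ gf16_mul(M[i][1], v[1]) for i in range(2)]

B = min(wh(v) + wh(mat_vec(M, v))
        for a in range(16) for b in range(16) if (v := (a, b)) != (0, 0))
print(B)  # -> 3 = m + 1, the maximum possible for m = 2
```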
[Fig. 1. The general model of an SP-type F-function: in round i, the mn-bit data xi is XORed with the round key ki to give wi, wi is split across the S-boxes S1, ..., Sm, and their outputs zi are transformed by the diffusion matrix M into yi.]
3 Comparison of the Minimum Number of Active S-Boxes
First, we compare the lower bound of the minimum number of active S-boxes of three typical ciphers: AES, Camellia without FL/FL^-1 (we call it Camellia*), and a Feistel cipher using an 8 x 8 MDS matrix with branch number 9 (we call it the MDS-Feistel cipher). Note that we assumed that the MDS-Feistel cipher uses eight 8-bit bijective S-boxes in the F-function like Camellia*; the block sizes of these block ciphers are therefore all 128 bits. The lower bound estimation for these ciphers has been obtained as follows.
- AES: The wide trail strategy guarantees B^2 = 25 active S-boxes in 4 consecutive rounds. The lower bound is obtained by using Matsui's truncated path search technique, slightly modified to analyze AES [2, 8, 9, 12]. Let a(r) be the minimum number of active S-boxes for r rounds; then the conjectured a(r) from the estimation is a(0) = 0, a(1) = 1, a(2) = 5, a(3) = 9, and a(r) = a(r - 4) + 25 for r >= 4.
- Camellia*: The lower bound is obtained from Shirai et al.'s result. They used an improved estimation method based on Matsui's technique, which was also used in the designers' evaluation of Camellia [2, 8, 9, 14]. The improved method discards algebraic contradictions in difference paths [12, 13].
- MDS-Feistel: The lower bound is also obtained by using Matsui's truncated path search technique, slightly modified to analyze the MDS-Feistel round function [2, 8, 9, 12]. Shimizu has shown a similar but more limited result for the lower bound by a method not based on Matsui's approach, and has conjectured the equation a(r) = floor(r/4)(B + 1) + (r mod 4) - 1 [11]. We confirmed that our result matches Shimizu's conjectured equation.
Table 1 shows the lower bound on the number of active S-boxes for r-round ciphers, and the ratio of active S-boxes to all S-boxes in the r-round cipher. Fig. 2 shows a graph of the ratios of active S-boxes to all S-boxes.
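Shimizu's conjectured formula can be checked directly against the MDS-Feistel column of Table 1 (a small verification sketch; the function name is mine):

```python
def a_mds(r, B=9):
    # Shimizu's conjectured lower bound for the MDS-Feistel cipher:
    # a(r) = floor(r/4) * (B + 1) + (r mod 4) - 1
    return (r // 4) * (B + 1) + (r % 4) - 1

# MDS-Feistel column of Table 1, rounds 1..16
table1_mds = [0, 1, 2, 9, 10, 11, 12, 19, 20, 21, 22, 29, 30, 31, 32, 39]
assert [a_mds(r) for r in range(1, 17)] == table1_mds
print("conjecture matches Table 1 for rounds 1-16")
```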
Table 1. The lower bound of the minimum number of active S-boxes

Round   AES  (ratio)   Camellia*  (ratio)   MDS(B = 9)  (ratio)
  1       1    6.3%        0        0.0%        0         0.0%
  2       5   15.6%        1        6.3%        1         6.3%
  3       9   18.8%        2        8.3%        2         8.3%
  4      25   39.1%        7       21.9%        9        28.1%
  5      26   32.5%        9       22.5%       10        25.0%
  6      30   31.3%       12       25.0%       11        22.9%
  7      34   30.1%       14       25.0%       12        21.4%
  8      50   39.1%       16       25.0%       19        29.7%
  9      51   35.4%       20       27.8%       20        27.7%
 10      55   34.4%       22       23.8%       21        26.3%
 11      59   33.5%       24       27.5%       22        25.0%
 12      75   39.1%        -          -        29        30.2%
 13      76   36.5%        -          -        30        28.8%
 14      80   36.5%        -          -        31        27.7%
 15      84   35.0%        -          -        32        26.7%
 16     100   39.1%        -          -        39        30.5%
  inf     -   39.1%        -          -         -        34.4%
[Fig. 2. The percentage of active S-boxes for AES, Camellia* and MDS-Feistel: ratio of active S-boxes to all employed S-boxes (%) plotted against the round number, for rounds 0-16.]
The fact that the minimum numbers of active S-boxes are smaller for the Feistel ciphers (Camellia* and MDS-Feistel) than for AES is not unexpected, because each round contains only half as many S-boxes (8 in the Feistel ciphers, 16 in AES). However, Fig. 2 shows the non-trivial fact that the percentage of active S-boxes is also lower in these Feistel ciphers than in AES. We also note that even with an MDS matrix of branch number 9, which is the best possible branch number for an 8 x 8 matrix, the construction does not gain significantly compared to Camellia*, which uses a non-MDS matrix of branch number 5. The percentage of active S-boxes indicates how many of the existing S-boxes are effectively used in the first consecutive rounds, and can be considered a reference for the efficiency of the diffusion property of ciphers that have the same block length and the same S-box bit length. As described in [1], if we choose 8-bit S-boxes with maximum differential probability (MDP) 2^-6, 22 active S-boxes are a necessary condition to rule out the existence of differential characteristics with a probability higher than 2^-128. In the case of AES, 22 active S-boxes are already achieved after only 4 rounds, whereas the Feistel ciphers require more than 11 rounds to guarantee 22 active S-boxes. Because the minimum number of active S-boxes is often taken into consideration when determining the number of rounds of a block cipher, if more active S-boxes can be guaranteed in SP-type Feistel ciphers, we may be able to design ciphers with fewer rounds (i.e., faster ciphers). In the following sections, we analyze a mechanism that explains why the ratio for MDS-Feistel ciphers is low, and we propose a new design strategy that achieves more active S-boxes and thus enables us to construct Feistel structures with fewer rounds.
4 Difference Cancellation
Through our analysis of the MDS-Feistel cipher, we found that every path that contains the minimum number of active S-boxes includes a particular phenomenon, where differences generated at a certain round are canceled some rounds later at an XOR operation. We call this phenomenon difference cancellation. The left half of Fig. 3 shows an example of the 3-round difference cancellation. Differences are represented in the truncated way, in which each 8-bit difference is represented as 0 or 1 depending on whether the difference is 0 or not [6]. The 3-round difference cancellation starts from the difference deltax_{i-1} = (00000000) and ends with the difference deltax_{i+3} = (00000000) again. This means that a certain difference is generated and then canceled between these two 0-differences. In this case, the full-Hamming-weight difference deltay_i = (11111111) is canceled by deltay_{i+2} = (11111111) at once. Consequently, there are no active S-boxes in the (i + 3)-th round. Similarly, the 5-round difference cancellation, shown in the right half of Fig. 3, has the form deltax_i = deltax_{i+4} = (00000001), deltax_{i+2} = (00000000), and the output differences of the active S-boxes in the i-th and (i + 4)-th rounds are equal.
[Fig. 3. Difference Cancellation: the 3-round difference cancellation (left) and the 5-round difference cancellation (right), shown with truncated differences.]
In both cases, a truncated difference (11111111) generated by one active S-box is canceled by a truncated difference (11111111) which is also generated by one active S-box. These difference cancellations are derived from 2 active S-boxes, and we call this type of difference cancellation a 2-derived difference cancellation. An interesting fact is that at least one of these 3-round or 5-round 2-derived difference cancellations can be found in every differential path of more than 6 rounds that realizes the minimum number of active S-boxes in the MDS-Feistel cipher. Details are shown in Appendix A.

4.1 Observation on Difference Cancellation

Let X, Y in {0, 1}^8, and let X ->_M Y denote that a truncated difference X can produce a truncated difference Y through a matrix M. Let M_MDS be an 8 x 8 MDS matrix; then the following property of M_MDS contributes to the occurrence of the 2-derived difference cancellation:

  (00000001) ->_{M_MDS} (11111111)  and  (11111111) ->_{M_MDS} (00000001).

These transitions appear several times in the above 3-round and 5-round difference cancellations. More precisely, let C_M(X, Y) in {0, 1} be a function which expresses whether the truncated differences X and Y can be connected:

  C_M(X, Y) = 1 if X ->_M Y, and C_M(X, Y) = 0 otherwise.
We can observe that a 2-derived difference cancellation can occur if there exists at least one pair of truncated differences X, Y with wh(X) = 1 satisfying C_M(X, Y) * C_M(Y, X) != 0. By the MDS property, any m x m MDS matrix M_MDS satisfies C_{M_MDS}(X, Y) = 1 for all X, Y with wh(X) + wh(Y) >= m + 1 and wh(X), wh(Y) != 0; otherwise C_{M_MDS}(X, Y) = 0. Clearly at least one pair X, Y with wh(X) = 1 satisfies C_{M_MDS}(X, Y) * C_{M_MDS}(Y, X) != 0 (take Y of full weight m), so a 2-derived difference cancellation can occur in an MDS matrix construction. This observation explains why Camellia*'s lower bounds are not too low even though it employs a non-MDS matrix M_Ca of branch number 5: for any choice of X, Y with wh(X) = 1, C_{M_Ca}(X, Y) * C_{M_Ca}(Y, X) = 0. Thus M_Ca never produces the 2-derived difference cancellation, and it keeps a moderate number of active S-boxes. However, even though the 2-derived difference cancellation is avoided by choosing a Camellia-type matrix, if certain X, Y with wh(X) = 2 satisfying C_M(X, Y) * C_M(Y, X) != 0 exist, then a 4-derived difference cancellation would be a building block for a small number of active S-boxes, and a significant gain in the number of active S-boxes may not be expected. In the next section, another approach to avoid m-derived difference cancellation is introduced, using multiple MDS matrices in a Feistel structure.
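The connection function C_M can be evaluated exhaustively for a toy example. The sketch below uses a 2 x 2 MDS matrix over GF(2^4) of my own choosing (not a matrix from the paper) and confirms that a weight-1 pattern and the full-weight pattern connect in both directions, which is exactly the condition enabling a 2-derived cancellation:

```python
from itertools import product

def gf16_mul(a, b):
    # GF(2^4) multiplication modulo x^4 + x + 1 (an assumed polynomial)
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r

M = [[1, 1], [1, 2]]  # a 2x2 MDS matrix

def trunc(v):
    # truncated difference: chi applied componentwise
    return tuple(int(x != 0) for x in v)

def C(M, X, Y):
    # C_M(X, Y) = 1 iff some concrete difference of pattern X maps to pattern Y
    for v in product(range(16), repeat=2):
        if trunc(v) == X:
            out = [gf16_mul(M[i][0], v[0]) ^ gf16_mul(M[i][1], v[1])
                   for i in range(2)]
            if trunc(out) == Y:
                return 1
    return 0

X, Y = (1, 0), (1, 1)  # a weight-1 pattern and the full-weight pattern
print(C(M, X, Y) * C(M, Y, X))  # -> 1: a 2-derived cancellation is possible
print(C(M, (1, 0), (0, 1)))     # -> 0: weight 1 -> weight 1 is forbidden (B = 3)
```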
5 Multiple MDS Feistel Structure Design

5.1 Basic Strategy
Suppose that some intermediate differential data dx_{i-1} = 0, and that the output of the F-function of every second round is added to this data (e.g. dy_i, dy_{i+2}, ..., dy_{i+2j}). Consider a situation where the differential data dx_{i+2j+1} becomes 0 after XORing the output of the F-function in the (i + 2j)-th round, caused by a difference cancellation as shown in Fig. 4. In the difference cancellation, the following condition holds:

  Sum_{k=0..j} dy_{i+2k} = 0 .                                   (1)

Therefore,

  M Sum_{k=0..j} dz_{i+2k} = 0 .                                 (2)

When a nonsingular matrix M is employed, we obtain

  Sum_{k=0..j} dz_{i+2k} = 0 .                                   (3)
The above equation shows that, in the minimal case, a difference cancellation occurs with only 2 active S-boxes among the dz_{i+2k} (0 <= k <= j), which is exactly the case of
[Fig. 4. Difference Cancellation: starting from dx_{i-1} = 0, the F-function outputs dy_i, dy_{i+2}, ..., dy_{i+2j} accumulate until dx_{i+2j+1} = 0.]
2-derived difference cancellations shown in the previous section. Now consider a setting with multiple matrices, in which a different matrix is used in each F-function. Let M_i be the diffusion matrix employed in the i-th round. Obviously, the transformation from (1) to (2) is no longer valid when the matrices M_i differ from each other. In such a setting, we can rewrite (1) as

  M_i dz_i + M_{i+2} dz_{i+2} + ... + M_{i+2j} dz_{i+2j} = 0 .   (4)
The above condition can be written as the product of a large m x m(j + 1) matrix and a vector with m(j + 1) elements:

  [M_i  M_{i+2}  ...  M_{i+2j}] (dz_i, dz_{i+2}, ..., dz_{i+2j})^T = 0 .   (5)

If these matrices are chosen such that no combination of l column vectors of the large matrix is linearly dependent (2 <= l <= m), a k-derived difference cancellation (k <= l) can never happen in the consecutive 2j rounds. From this observation, we introduce a strategy of choosing matrices M_i, ..., M_{i+2j} such that any m column vectors of the large matrix [M_i ... M_{i+2j}] are linearly independent.

5.2 Construction Steps
We propose a new design strategy that employs multiple MDS matrices in the Feistel network, in order to avoid an occurrence of m-derived difference cancellation in any consecutive 2q rounds where q is the number of employed matrices.
The construction steps are as follows. Without loss of generality, we assume the number of rounds is 2r.
1. Choose q (<= r) MDS matrices M_0, M_1, ..., M_{q-1}.
2. Check that any m of the qm column vectors of the matrices M_i satisfy the MDS property.
3. Assign matrix M_{i mod q} to the (2i + 1)-th round (0 <= i < r).
4. Assign matrix M_{i mod q} to the (2r - 2i)-th round (0 <= i < r) (reverse order).
In this construction, since any m columns of the large matrix [M_i M_{i+2} ... M_{i+2q-2}] have been chosen to be linearly independent, there is no chance to generate an m-derived difference cancellation in any 2q - 1 consecutive rounds. Fig. 5 shows the construction for the example settings r = 6 with q = 3 and q = 6.
When m and n are small, we can randomly generate MDS matrices and check the MDS condition on the column vectors in Step 2. However, if m and n are large, it might be difficult to find such a set of matrices. In that case, we can use the algorithm that generates a Reed-Solomon code's generator matrix. The algorithm can generate a large MDS matrix immediately, because its complexity is O(N^3), where N is the dimension of the matrix [7]. Once a qm x qm MDS matrix M_L is constructed, any combination of m rows of M_L can be regarded as a matrix [M_0 M_1 ... M_{q-1}] with the proposed additional MDS property, by Theorem 1. We can find a 128 x 128 MDS matrix over GF(2^8) from the [256, 128, 129] extended RS code; thus 16 MDS matrices of dimension 8 satisfying the condition of Step 2 can be found in it. We show an example of such a set of matrices in Appendix B.
[Fig. 5. Examples of the New Design (r = 6, q = 3 and q = 6): the assignment of the matrices M0, ..., M{q-1} to the 12 rounds, in reverse order for the even rounds.]
6 Evaluation of the Proposed Construction
We estimated a lower bound for the number of active S-boxes of the new construction with m = 8, r = 6 (a 12-round cipher) for q = 1, ..., 6 matrices. We adopted a weight-based approach in the evaluation algorithm, because the known truncated path approach employed in the evaluations of the other ciphers requires too much memory and time.

6.1 Algorithm
The following algorithm outputs a lower bound for the number of active S-boxes of our proposed construction, based on the weight-based approach. Let the number of rounds be R.
1. Set L = infinity.
2. For each possible combination of the weights 0, 1, ..., 8 of deltax_0, deltax_1, ..., deltax_{R+1} (there are 9^{R+2} candidates):
   (a) For i = 2 to R + 1, do the following:
       i. For j = 2 to i, stepping j <- j + 2, do the following:
          A. Check whether the given weight combination of wh(deltax_{i-j}), wh(deltax_i) and the given weights of active S-boxes in wh(deltax_{i-j+1}), wh(deltax_{i-j+3}), ..., wh(deltax_{i-1}) are possible in the weight context of the given MDS property.
          B. If the check passes, continue the loop; otherwise exit the loop.
   (b) If all checks pass, count the total number A of active S-boxes in the path. If A < L, set L = A.
3. Output L as the lower bound of the minimum number of active S-boxes for R rounds.
The procedure for checking the possibility of a weight distribution in Step A is described in Appendix C. We note that this algorithm can be sped up for the R-round evaluation by recursively using the result of the (R - 1)-round evaluation. This technique can be seen in Matsui's path search method [8].

6.2 Result
Table 2 shows the resulting lower bounds of the minimum number of active S-boxes for four types of 12-round multiple-MDS Feistel ciphers: the cases m = 8, r = 6, q = 1, 2, 3, 6. The graph of the ratio of active S-boxes to the total number of S-boxes is shown in Fig. 6. It can be confirmed that the result for the case q = 1 is the same as the result for the MDS-Feistel cipher shown in Table 1. Moreover, the results show that the lower bounds for the cases q = 3 and q = 6 are always the same. The numbers increase significantly in the case q = 2 compared to the case q = 1; however, there is no gain in rounds 8 and 9. In the case of q >= 3,
Table 2. Result of Evaluation

Round        q = 1 (rate)   q = 2 (rate)   q = 3 (rate)   q = 6 (rate)
  1            0    0.0%      0    0.0%      0    0.0%      0    0.0%
  2            1    6.3%      1    6.3%      1    6.3%      1    6.3%
  3            2    8.3%      2    8.3%      2    8.3%      2    8.3%
  4            9   28.1%      9   28.1%      9   28.1%      9   28.1%
  5           10   25.0%     10   25.0%     10   25.0%     10   25.0%
  6           11   22.9%     18   37.5%     18   37.5%     18   37.5%
  7           12   21.4%     18   32.1%     18   32.1%     18   32.1%
  8           19   29.7%     19   29.7%     20   31.3%     20   31.3%
  9           20   27.7%     20   27.8%     27   37.5%     27   37.5%
 10           21   26.3%     27   33.8%     28   35.0%     28   35.0%
 11           22   25.0%     28   31.8%     32   36.4%     32   36.4%
 12           29   30.2%     36   37.5%     36   37.5%     36   37.5%
avrg.(4-12)        26.3%          31.5%          33.4%          33.4%
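The rate columns of Table 2 follow directly from the active-S-box counts, since a 12-round Feistel cipher with m = 8 contains 8r S-boxes after r rounds. A quick recomputation sketch (values match the table's rates up to rounding):

```python
# Active-S-box counts from Table 2 for rounds 1-12 (q = 6 equals q = 3)
counts = {
    1: [0, 1, 2, 9, 10, 11, 12, 19, 20, 21, 22, 29],
    2: [0, 1, 2, 9, 10, 18, 18, 19, 20, 27, 28, 36],
    3: [0, 1, 2, 9, 10, 18, 18, 20, 27, 28, 32, 36],
}

def rates(a):
    # 8 S-boxes per round, hence 8r S-boxes in total after r rounds
    return [100.0 * a[r - 1] / (8 * r) for r in range(1, len(a) + 1)]

avg_4_12 = {q: sum(rates(a)[3:]) / 9 for q, a in counts.items()}
for q in sorted(avg_4_12):
    print(f"q = {q}: avg(4-12) = {avg_4_12[q]:.1f}%")
# -> 26.3%, 31.5%, 33.4%, matching the avrg.(4-12) row
```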
[Fig. 6. The percentage of active S-boxes for AES, Camellia* and the multiple-MDS Feistel ciphers (q = 1, q = 2, and q = 3-6): ratio of active S-boxes to all employed S-boxes (%) plotted against the round number.]
the lower bound is even higher than for the case q = 2 when there are more than 8 rounds, and more than 22 active S-boxes are guaranteed in 9 rounds. The ratio of the new design successfully reaches the level of AES after more than 6 rounds. These results show that the design with triple MDS matrices has a clear advantage over the single-MDS-matrix design. Also, our experiments indicate that not many MDS matrices seem to be required to benefit from the proposed design.
7 Discussion

7.1 Implementation Aspects
This new construction incurs an additional implementation cost because it employs multiple matrices. Multiple diffusion matrices require additional gate area in a hardware implementation and additional lookup tables in memory in a software implementation. However, the speed impact in hardware is expected to be negligible, because only switching circuits for the matrices need to be added. Detailed observations on the hardware implementation of many types of SP-type Feistel networks can be found in [15]. If all lookup tables can be stored in the fastest cache memory, not much extra time cost is to be expected. If b matrices of dimension 8 over GF(2^8) are employed in the 128-bit block setting, the cipher requires at most 16b KB of lookup tables, where each entry is 64 bits. Since some recent 64-bit CPUs have a 64 KB first-level data cache, the 48 KB of lookup tables required by 3 matrices would be acceptable. In such a setting, it is estimated that only 8R table lookups and 9R XOR operations are required for an R-round computation, excluding the key scheduling procedure.

7.2 Future Research
Though we only discussed immunity against differential attacks throughout this paper, we can directly extend the result to linear attacks if we construct a PS-type F-function, in which the order of the S-box layer and the diffusion layer is exchanged relative to the SP-type F-function [5]. This is due to the duality of differential and linear masks [5, 10]. However, it is not yet clear whether immunity against the linear attack is obtained when a cipher is designed to have immunity against the differential attack based on our strategy. This theoretical explanation is a topic of future research. Our evaluation method adopted a simple weight-based approach to estimate lower bounds for the proposed designs. Since this approach achieves an algorithm with feasible time and memory at the expense of the information contained in the truncated differential form, a more detailed algorithm may produce tighter lower bounds. We consider it an important research topic to develop a new algorithm that bounds the minimum number of active S-boxes more tightly for the new design.
Acknowledgements

The authors would like to thank Bart Preneel, Alex Biryukov, Christophe De Cannière and Joseph Lano of COSIC, K.U.Leuven for their helpful discussions and suggestions. Furthermore, we thank the anonymous reviewers for their helpful feedback.
References

[1] K. Aoki, T. Ichikawa, M. Kanda, M. Matsui, S. Moriai, J. Nakajima and T. Tokita, "Specification of Camellia - a 128-bit Block Cipher," primitive submitted to NESSIE by NTT and Mitsubishi, 2000. See also http://info.isl.ntt.co.jp/camellia.
[2] K. Aoki, T. Ichikawa, M. Kanda, M. Matsui, S. Moriai, J. Nakajima and T. Tokita, "Camellia: A 128-Bit Block Cipher Suitable for Multiple Platforms - Design and Analysis," Selected Areas in Cryptography, SAC 2000, LNCS 2012, pp. 39-56, 2000.
[3] E. Biham and A. Shamir, "Differential Cryptanalysis of DES-like Cryptosystems," CRYPTO '90, LNCS 537, pp. 2-21, 1991.
[4] J. Daemen and V. Rijmen, "The Design of Rijndael: AES - The Advanced Encryption Standard," Springer-Verlag, 2002.
[5] M. Kanda, "Practical Security Evaluation against Differential and Linear Cryptanalysis for Feistel Ciphers with SPN Round Function," Selected Areas in Cryptography, SAC 2000, LNCS 2012, pp. 324-338, 2000.
[6] L. R. Knudsen, "Truncated and Higher Order Differentials," Fast Software Encryption - Second International Workshop, LNCS 1008, pp. 196-211, 1995.
[7] R. Lidl and H. Niederreiter, "Finite Fields," Encyclopedia of Mathematics and its Applications 20, Cambridge Univ. Press, 1997.
[8] M. Matsui, "Differential Path Search of the Block Cipher E2," Technical Report ISEC99-19, IEICE, 1999. (In Japanese.)
[9] M. Matsui and T. Tokita, "Cryptanalysis of Reduced Version of the Block Cipher E2," Fast Software Encryption, FSE '99, LNCS 1636, 1999.
[10] M. Matsui, "Linear Cryptanalysis of the Data Encryption Standard," EUROCRYPT '93, LNCS 765, pp. 386-397, 1994.
[11] H. Shimizu, "On the Security of Feistel Cipher with SP-type F Function," 7A-3, SCIS 2001, 2001. (In Japanese.)
[12] T. Shirai, S. Kanamaru and G. Abe, "Improved Upper Bounds of Differential and Linear Characteristic Probability for Camellia," Fast Software Encryption, FSE 2002, LNCS 2365, pp. 128-142, 2002.
[13] T. Shirai, "Differential, Linear, Boomerang and Rectangle Cryptanalysis of Reduced-Round Camellia," Preproceedings of the Third NESSIE Workshop, 2002.
[14] S. Moriai, M. Sugita, K. Aoki and M. Kanda, "Security of E2 against Truncated Differential Cryptanalysis," Selected Areas in Cryptography, SAC '99, LNCS 1758, pp. 106-117, 2000.
[15] L. Xiao and H. M. Heys, "Hardware Performance Characterization of Block Cipher Structures," Topics in Cryptology - CT-RSA 2003, The Cryptographers' Track at the RSA Conference 2003, pp. 176-192, 2003.
[16] National Institute of Standards and Technology, "Advanced Encryption Standard," FIPS 197, 2001.
[17] NESSIE project - New European Schemes for Signatures, Integrity, and Encryption, http://www.cryptonessie.org.
[18] CRYPTREC project, http://www.ipa.go.jp/security/enc/CRYPTREC/.
[19] STORK project - Strategic Roadmap for Crypto, Public Document D6: "Open Problems in Cryptology," version 2.1, chapter 4, 2003. Available at http://www.stork.eu.org/.
Appendix A

All differential paths with the minimum number of active S-boxes for more than 5 rounds of the MDS-Feistel cipher using an 8 x 8 MDS matrix can be represented using only the following eight types of differential paths as building blocks (Fig. 7). Patterns (A), (B) and (C) are prefix patterns, which appear only at the beginning of differential paths, and patterns (X), (Y) and (Z) are suffix patterns, which appear only at the end of differential paths. Patterns (P) and (Q) are middle iteration patterns, which appear in the middle of differential paths and are sometimes iterated more than once, depending on the total number of rounds. (P) and (Q) are shown as the 3-round and 5-round difference cancellations in Sect. 4, respectively.
[Fig. 7. Building Blocks based on deltaA = (00000001). Prefix patterns: (A) 1 round, 0 active; (B) 2 rounds, 1 active; (C) 4 rounds, 9 active. Middle iteration patterns: (P) 4 rounds, 10 active; (Q) 6 rounds, 18 active. Suffix patterns: (X) 0 rounds, 0 active; (Y) 1 round, 1 active; (Z) 3 rounds, 9 active.]
Table 3. All path patterns of the minimum number of active S-boxes for each round
R.
1
2
3
4
M. A.
0
1
2
9
R
5
6
7
8
M. A.
10
11
12
19
Pat.
APX BZ CY
APY BPX
BPY
APZ AQY BQX CPX
R.
9
10
11
12
M. A.
20
21
22
29
Pat.
APPX BPZ BQY CPY
APPY BPPX
BPPY
APPZ APQY, AQPY BPQX, BQPX CPPX
R.
13
14
15
16
M. A.
30
31
32
39
Pat.
APPPX BPPZ BPQY, BQPY CPPY
APPPY BPPPX
BPPPY
APPPZ APPQY, APQPY, AQPPY BPPQX, BPQPX, BQPPY CPPPX
R.
17
18
19
20
M. A.
40
31
32
39
Pat.
R.
APPPPX APPPPY BPPPPY APPPPZ BPPPZ BPPPPX APPPQY, APPQPY, APQPPY, AQPPPY BPPQY, BPQPY, BQPPY BPPPQX, BPPQPX, BPQPPX, BQPPPX CPPPY CPPPPX 4n+1
4n+2
4n + 3
4n + 4
M. A.
10n
10n+1
10n+2
10n + 9
Pat.
AP n X BP n−1 Z B(P n−2 Q)Y (n > 1) CP n−1 Y
AP n Y BP n X
BP n Y
AP n Z A(P n−1 Q)Y B(P n−1 Q)X CP n X
(R. ≥ 5)
R. : Round Number M. A. : the minimum numbers of active S-boxes Pat. : path expressions constructed from basic patterns. P k : iteration of pattern P for k times (P k Q): all possible patterns generated from pattern P for k times and Q for once
Each pattern in Fig. 7 shows a representative path using the truncated difference (00000001); each pattern also stands for 7 other paths, obtained by replacing (00000001) with one of (00000010), (00000100), ..., (10000000).
Table 3 shows the search results for the 5-round to 20-round differential paths of the MDS-Feistel cipher. The Pat. field shows the path patterns expressed by their building blocks: any resulting path can be expressed by one of the path patterns in the corresponding field. We can see that there is a 4-round regularity; the last rows of the table show this regularity in a generalized form. From this experimental result, any differential path with a minimum number of active S-boxes for more than 6 rounds contains at least one (P) or (Q). This is the reason why difference cancellation should be avoided in order to gain more active S-boxes, as described in Sect. 4.
Appendix B

In this appendix, we show a set of example 8 x 8 MDS matrices in which any combination of 8 columns forms an MDS matrix, obtained from the right part of the standard-form generator matrix of a [256, 128, 129] Reed-Solomon code [7]. In this example we employed the primitive polynomial p(x) = x^8 + x^4 + x^3 + x^2 + 1. Let alpha be a root of p(x); we set the parity check matrix H as

  H = ( 1              1              ...  1        1  1
        alpha^254      alpha^253      ...  alpha    1  0
        ...            ...                 ...
        (alpha^254)^126 (alpha^253)^126 ... alpha^126  1  0
        (alpha^254)^127 (alpha^253)^127 ... alpha^127  1  0 ) .

Then we calculated the generator matrix G = [I_{128x128} M_{128x128}]. The following 16 matrices are obtained from the first 8 rows of M_{128x128}, simply by splitting them into blocks of 8 columns as [M0 M1 ... M15]. Each element is expressed as a hexadecimal value corresponding to the binary representation of an element of GF(2^8).

M0 =
(Sixteen 8 × 8 matrices M0, M1, ..., M15, with entries in GF(2^8) written in hexadecimal.)

Improving Immunity of Feistel Ciphers against Differential Cryptanalysis
Appendix C

Our evaluation algorithm employs a check procedure that judges whether a given weight distribution is possible or not. Let M_i (1 ≤ i ≤ 2r) be the i-th round diffusion matrix, and let N_j (1 ≤ j ≤ q) be the q matrices designed by our proposed design strategy. For any combination of Δx_i and Δx_{i+2j} ∈ GF(2^n)^m, the following relation holds:

Δx_i + Δx_{i+2j} = Σ_{k=1}^{j} M_{i+2k−1} Δz_{i+2k−1}    (6)
In the case q = r, all M_k in the above equation are guaranteed to be distinct because they are chosen from the N_j without overlap. We then consider the weight conditions on both sides of (6). Let W1 be w_h(Δx_i + Δx_{i+2j}). We obtain the following inequality:

|w_h(Δx_i) − w_h(Δx_{i+2j})| ≤ W1 ≤ min(m, w_h(Δx_i) + w_h(Δx_{i+2j}))    (7)
On the other hand, let W2 be w_h(Σ_{k=1}^{j} M_{i+2k−1} Δz_{i+2k−1}). If at least one of the Hamming weights w_h(Δz_{i+2k−1}) is nonzero, then

max(m + 1 − Σ_{k=1}^{j} w_h(Δz_{i+2k−1}), 0) ≤ W2 ≤ m    (8)
If all the Hamming weights w_h(Δz_{i+2k−1}) are 0, then obviously W2 = 0. The evaluation algorithm compares the ranges for W1 and W2 implied by a given weight distribution: the checking procedure returns false if the two weight ranges have no overlap, because this means no path with such a weight distribution exists.
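The check can be sketched as follows (a paraphrase of the procedure with our own function names, not the authors' code):

```python
# Sketch of the weight-condition check from the evaluation algorithm.
# Given m (the number of n-bit bundles), it compares the range of
# W1 = wh(dx_i + dx_{i+2j}) from inequality (7) with the range of
# W2 from inequality (8); a weight distribution is rejected when the
# two ranges cannot overlap.

def w1_range(m, w_dx1, w_dx2):
    """Range of wh(dx_i + dx_{i+2j}) per inequality (7)."""
    lo = abs(w_dx1 - w_dx2)
    hi = min(m, w_dx1 + w_dx2)
    return lo, hi

def w2_range(m, w_dz_list):
    """Range of the right-hand-side weight per inequality (8)."""
    if all(w == 0 for w in w_dz_list):
        return 0, 0                       # all dz zero => W2 = 0
    lo = max(m + 1 - sum(w_dz_list), 0)
    return lo, m

def is_possible(m, w_dx1, w_dx2, w_dz_list):
    """False when the two weight ranges have no overlap."""
    a_lo, a_hi = w1_range(m, w_dx1, w_dx2)
    b_lo, b_hi = w2_range(m, w_dz_list)
    return max(a_lo, b_lo) <= min(a_hi, b_hi)

# Example with m = 8 bundles: weights 3 and 2 give W1 in [1, 5].
print(w1_range(8, 3, 2))              # (1, 5)
print(is_possible(8, 3, 2, [5, 5]))   # W2 in [0, 8] -> overlap -> True
print(is_possible(8, 3, 2, [1, 1]))   # W2 in [7, 8] -> no overlap -> False
```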
Taizo Shirai and Kyoji Shibutani
In settings where q < r and j > q, some of the matrices N_j appear more than once in equation (6). In that case, we can rewrite (6) in a partially bundled form,

Δx_i + Δx_{i+2j} = Σ_{k=1}^{q} N_k Δz′_k,    (9)

where

Δz′_k = Σ_{l : N_k = M_l} Δz_l.    (10)
In this case, we also have to consider the weight condition for an XOR of multiple elements in order to treat Δz′_k = Σ_{l : N_k = M_l} Δz_l. For j ≥ 1, let W3 be w_h(Σ_{i=1}^{j} Δa_i), and let w_max be max(w_h(Δa_1), w_h(Δa_2), ..., w_h(Δa_j)). We obtain the range of W3 as

max(0, 2w_max − Σ_{i=1}^{j} w_h(Δa_i)) ≤ W3 ≤ min(m, Σ_{i=1}^{j} w_h(Δa_i))    (11)

This weight condition on W3 can then be used to determine the weight condition on W2 for equation (9).
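Bound (11) can be exercised numerically (our own helper names; elements of GF(2^n)^m are represented as tuples of m integers):

```python
import random
from functools import reduce

def wh(v):
    """Bundle (Hamming) weight: number of nonzero n-bit coordinates of v."""
    return sum(1 for c in v if c != 0)

def w3_range(m, weights):
    """Range of W3 = wh(a_1 + ... + a_j) given by inequality (11)."""
    wmax, total = max(weights), sum(weights)
    return max(0, 2 * wmax - total), min(m, total)

# Randomised sanity check with m = 4 bundles of n = 4 bits and j = 3 vectors:
random.seed(1)
m, n, j = 4, 4, 3
for _ in range(1000):
    vectors = [tuple(random.randrange(2 ** n) for _ in range(m)) for _ in range(j)]
    s = tuple(reduce(lambda a, b: a ^ b, coords) for coords in zip(*vectors))
    lo, hi = w3_range(m, [wh(v) for v in vectors])
    assert lo <= wh(s) <= hi
print("bound (11) held on all samples")
```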
ICEBERG: An Involutional Cipher Efficient for Block Encryption in Reconfigurable Hardware

François-Xavier Standaert, Gilles Piret, Gaël Rouvroy, Jean-Jacques Quisquater, and Jean-Didier Legat

UCL Crypto Group, Laboratoire de Microélectronique, Université Catholique de Louvain, Place du Levant 3, B-1348 Louvain-la-Neuve, Belgium
{standaert,piret,rouvroy,quisquater,legat}@dice.ucl.ac.be
Abstract. We present a fast involutional block cipher optimized for reconfigurable hardware implementations. ICEBERG uses 64-bit text blocks and 128-bit keys. All components are involutional and allow very efficient combinations of encryption and decryption. Hardware implementations of ICEBERG allow the key to be changed at every clock cycle without any performance loss, and its round keys are derived "on-the-fly" in both encryption and decryption modes (no storage of round keys is needed). The resulting design offers better hardware efficiency than other recent 128-bit-key block ciphers. Resistance against side-channel cryptanalysis was also considered as a design criterion for ICEBERG. Keywords: block cipher design, efficient implementations, reconfigurable hardware, side-channel resistance.
1 Introduction
In October 2000, NIST (National Institute of Standards and Technology) selected Rijndael as the new Advanced Encryption Standard. The selection process included performance evaluation on both software and hardware platforms. However, although implementation versatility was a criterion for the selection of the AES, it appeared that Rijndael is not optimal for reconfigurable hardware implementations. Its highly expensive substitution boxes are a typical bottleneck, but the combination of encryption and decryption in hardware is probably just as critical. In general, observing the AES candidates [1, 2], one may assess that the criteria selected for their evaluation led to highly conservative designs, although the context of certain cryptanalyses may be considered very unlikely (e.g. more than 2^100 chosen plaintexts). More recent designs of the NESSIE project
This work has been funded by the Walloon Region (Belgium) through the research project TACTILS, http://www.dice.ucl.ac.be/crypto/TACTILS/T-home.html
NESSIE: New European Schemes for Signatures, Integrity, and Encryption. See http://www.cryptonessie.org.
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 279–299, 2004. © International Association for Cryptologic Research 2004
(e.g. Khazad [3], Misty [4]) provide improved efficiency. They also allowed interesting comparisons between Feistel networks (e.g. Misty) and substitution-permutation networks with respect to hardware efficiency. Although Khazad is not a Feistel network, its structure is designed so that, by choosing all components to be involutions, the inverse operation of the cipher differs from the forward operation in the key scheduling only. ICEBERG is also based on an involutional structure but allows the keys to be derived more efficiently than Khazad. Moreover, the combination of encryption and decryption is improved due to a simplified diffusion layer.
Reconfigurable hardware devices usually enable high-performance encryption/decryption solutions for real-time applications with multi-Gbps data streams. Video processing is the typical context where high throughput has to be provided at low hardware cost. Although present encryption algorithms may provide very high encryption rates, it is often at the cost of expensive designs. The main feature of ICEBERG is that it has been defined in order to allow very efficient reconfigurable hardware implementations. An additional criterion was the simplicity of the design. ICEBERG is scalable for different architectures (loop, unrolled, pipeline) and FPGA technologies. As a consequence, ASIC implementations are also efficient. All its components easily fit into 4-input lookup tables, and its key scheduling allows the round keys to be derived "on-the-fly" in encryption and decryption modes. This involves no storage requirements for the round keys. The resulting design offers better hardware efficiency than other recent 128-bit-key block ciphers. As a consequence, very low-cost hardware crypto-processors and high-throughput data encryption are potential applications of ICEBERG. Finally, resistance against side-channel cryptanalysis was also considered as a design criterion for ICEBERG.
Small substitution tables are used in order to allow efficient Boolean masking. Moreover, the key agility offers the opportunity to consider new encryption modes where the key is changed frequently in order to make the averaging of side-channel traces impractical. This paper is structured as follows. Section 2 presents the design goals of ICEBERG and section 3 gives its specifications. The security analysis of ICEBERG is in section 4 and its performance analysis in section 5. Finally, conclusions are in section 6. Some tables and proofs are given in the appendices.
2 Design Goals
Present reconfigurable components like FPGAs are usually made of reconfigurable logic blocks combined with fast-access memories (RAM blocks) and high-speed arithmetic circuits [5, 6]. Basic logic blocks of FPGAs include a 4-input function generator (called a lookup table, LUT), carry logic and a storage element. As reconfigurable components are divided into logic elements and storage elements, an efficient implementation will be the result of a better compromise
FPGA : Field Programmable Gate Array. ASIC : Application Specific Integrated Circuit.
between the combinatorial logic used, the sequential logic used and the resulting performance. These observations lead to different definitions of implementation efficiency [7, 8, 9, 10, 11, 12]:
1. In terms of performance, let the efficiency of a block cipher be the ratio Throughput (Mbits/s) / Area (LUTs, RAM blocks).
2. In terms of resources, the efficiency is easily tested by computing the ratio Nbr of LUTs / Nbr of registers: it should be close to one.
The general design goal of ICEBERG is to provide an efficient algorithm for reconfigurable hardware implementations meeting the usual security requirements of block ciphers. More precisely, our design goals were:
1. Good security properties: ICEBERG has resistance against known attacks comparable to recently published block ciphers (AES, NESSIE).
2. Hardware implementation efficiency (as previously defined).
3. Hardware implementation versatility: ICEBERG is scalable for different architectures (loop, unrolled, pipeline) and FPGA technologies.
4. Resistance against side-channel cryptanalysis.
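As a small illustration, the two efficiency ratios above can be computed directly (the throughput and area figures below are hypothetical examples, not measurements):

```python
# The two efficiency measures of section 2, computed for hypothetical
# implementation figures (the numbers below are examples, not measured data).

def perf_efficiency(throughput_mbps, area_luts):
    """Efficiency 1: throughput (Mbits/s) per LUT."""
    return throughput_mbps / area_luts

def resource_ratio(n_luts, n_registers):
    """Efficiency 2: LUTs per register; close to 1 is best."""
    return n_luts / n_registers

print(perf_efficiency(1000, 704))   # Mbits/s per LUT
print(resource_ratio(704, 704))     # 1.0 for a fully pipelined design
```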
3 Specifications

3.1 Block and Key Size

Let n be the block bit-size and k the key bit-size. The state X is represented as an n-bit vector, where X(i) (0 ≤ i < n) denotes the i-th bit from the right. Alternatively, X can be represented as an array of n/4 4-bit blocks, where X_j is the j-th block from the right. ICEBERG operates on 64-bit blocks and uses a 128-bit key. It is an involutional iterative block cipher based on the repetition of R identical key-dependent round functions.

3.2 The Non-Linear Layer γ
Function γ consists of the successive application of non-linear substitution boxes and bit permutations (i.e. wire crossings).

Substitution layers S0, S1: The substitution layer S_j consists of the parallel application of substitution boxes s_j to the blocks of the state.

S_j : Z_{2^4}^{16} → Z_{2^4}^{16} : x ↦ y = S_j(x) ⇔ y_i = s_j(x_i), 0 ≤ i ≤ 15    (1)

Tables of the S-boxes s0 and s1 are given in appendix B.
Bit permutation layer P8: The permutation layer P8 consists of the parallel application of 8 permutations p8 to the state, where p8 consists of bit permutations on 8-bit blocks of data. The table of p8 is given in appendix B.

P8 : Z_{2^8}^{8} → Z_{2^8}^{8} : x ↦ y = P8(x) ⇔ y(8i + j) = x(8i + p8(j)), 0 ≤ i ≤ 7, 0 ≤ j ≤ 7    (2)
Based on the previous descriptions, the non-linear layer γ can be expressed as:

γ : Z_2^{64} → Z_2^{64} : γ ≡ S0 ∘ P8 ∘ S1 ∘ P8 ∘ S0    (3)
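A side note on the structure of (3): because the composition is palindromic, γ is automatically an involution whenever S0, S1 and P8 are. A toy sketch (with stand-in 4-bit involutions of our own, not the real tables from appendix B):

```python
# gamma = S0 o P8 o S1 o P8 o S0 is an involution whenever S0, S1 and P8
# are involutions, since (A.B.C.B.A)^2 collapses layer by layer to the
# identity. Toy 4-bit stand-ins below; the real tables are in appendix B.

S0 = [x ^ 0b0101 for x in range(16)]          # XOR with a constant: involution
S1 = [x ^ 0b0011 for x in range(16)]          # another involution
P8 = [(x >> 2) | ((x & 0b0011) << 2) for x in range(16)]  # swap bit pairs

def gamma(x):
    for layer in (S0, P8, S1, P8, S0):
        x = layer[x]
    return x

assert all(S[S[x]] == x for S in (S0, S1, P8) for x in range(16))
assert all(gamma(gamma(x)) == x for x in range(16))
print("gamma is an involution")
```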
For cryptanalytic and software implementation purposes, γ may also be viewed as a single layer consisting of the application of 8 identical 8 × 8 S-boxes, for which the table is given in appendix B.

3.3 The Key Addition Layer σK
The affine key addition σK consists of the bitwise exclusive or (XOR, ⊕) of a key vector K:

σK : Z_2^{64} → Z_2^{64} : x ↦ y = σK(x) ⇔ y(i) = x(i) ⊕ K(i), 0 ≤ i ≤ 63    (4)

3.4 The Linear Layer εK
Function εK consists of the successive application of binary matrix multiplications and wire-crossing layers, combined with the key addition layer for efficiency purposes. We describe it as:

εK : Z_2^{64} → Z_2^{64} : εK ≡ P64 ∘ P4 ∘ σK ∘ M ∘ P64    (5)
where M, P64, and P4 are defined as follows:

Matrix multiplication layer M: The matrix multiplication layer M is based on the parallel application of a simple involutional matrix multiplication. Let V ∈ Z_2^{4×4} be a binary involutional (i.e. such that V^2 = I) matrix:

V = [ 0 1 1 1 ]
    [ 1 0 1 1 ]
    [ 1 1 0 1 ]
    [ 1 1 1 0 ]

M is then defined as:

M : Z_{2^4}^{16} → Z_{2^4}^{16} : x ↦ y = M(x) ⇔ y_i = V · x_i, 0 ≤ i ≤ 15    (6)
We define diffusion boxes D as performing multiplication by V . Table of D is given in appendix B.
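As an illustration, multiplication by V can be sketched as follows (taking bit 0 as the least significant bit is our convention; the authoritative table of D is in appendix B):

```python
# Diffusion box D: multiply a 4-bit column vector by the involutional
# matrix V over Z2. With V = J - I (all-ones minus identity), each output
# bit is the XOR of the three *other* input bits. Bit 0 = LSB here (our
# convention; the paper's table of D is in appendix B).

def D(x):
    bits = [(x >> i) & 1 for i in range(4)]
    parity = bits[0] ^ bits[1] ^ bits[2] ^ bits[3]
    out = [parity ^ b for b in bits]          # drop own bit from the XOR
    return sum(b << i for i, b in enumerate(out))

# V is an involution (V^2 = I over Z2), so D(D(x)) = x for every nibble:
assert all(D(D(x)) == x for x in range(16))
print([hex(D(x))[2:] for x in range(16)])
```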
Fig. 1. The round function ρK: the non-linear layer (S0, S1 and P8 stages), followed by the diffusion boxes D, the key addition (⊕) and the permutations P4 and P64.
Bit permutation layer P64: Permutation P64 performs bit permutations on 64-bit blocks of data.

P64 : Z_2^{64} → Z_2^{64} : x ↦ y = P64(x) ⇔ y(i) = x(P64(i)), 0 ≤ i ≤ 63    (7)
The table of permutation P64 is given in appendix B.

Bit permutation layer P4: The permutation layer P4 consists of the parallel application of 16 permutations p4 to the state; p4 consists of bit permutations on 4-bit blocks of data. The table of p4 is given in appendix B.

P4 : Z_{2^4}^{16} → Z_{2^4}^{16} : x ↦ y = P4(x) ⇔ y_i(j) = x_i(p4(j)), 0 ≤ i ≤ 15, 0 ≤ j ≤ 3    (8)
The purpose of permutation P4 is to efficiently distinguish encryption from decryption. This will become clearer in section 3.8 and appendix A.

3.5 The Round Function ρK

Finally, the whole round function can be expressed as:

ρK : Z_2^{64} → Z_2^{64} : ρK ≡ εK ∘ γ    (9)

It is illustrated in Figure 1.

3.6 The Key Schedule

The key scheduling process consists of a key expansion and a key selection.
The key expansion: This process expands the cipher key K ∈ Z_2^{128} into a sequence of keys K^0, K^1, ..., K^R, also in Z_2^{128}. We set the initial key K^0 = K. Then we expand K^0 by a simple key round function βC so that:

K^{i+1} = βC(K^i)    (10)

where 0 ≤ i < R and C ∈ Z_2 is a round constant discussed in section 3.9. The key round βC is pictured in Figure 2. It consists of the application of non-linear substitution boxes, shift operations and bit permutations:

βC : Z_2^{128} → Z_2^{128} : βC ≡ τC ∘ P128 ∘ S′ ∘ P128 ∘ τC    (11)
where τC, S′, and P128 are defined as follows:

– Shift layer τC: The shift layer τC consists of the application of a variable shift operator to the bytes of the key: shift left if C = 1, shift right if C = 0.

τC : Z_2^{128} → Z_2^{128} : x ↦ y = τC(x) ⇔
  if C = 0 : y(i) = x((i + 8) mod 128), 0 ≤ i ≤ 127
  if C = 1 : y(i) = x((i − 8) mod 128), 0 ≤ i ≤ 127    (12)
– Substitution layer S′: The substitution layer S′ consists of the parallel application of substitution boxes s0 to the blocks of the key. The table of S-box s0 is given in appendix B.

S′ : Z_{2^4}^{32} → Z_{2^4}^{32} : x ↦ y = S′(x) ⇔ y_i = s0(x_i), 0 ≤ i ≤ 31    (13)

– Bit permutation layer P128: P128 performs bit permutations on 128-bit blocks of data.

P128 : Z_2^{128} → Z_2^{128} : x ↦ y = P128(x) ⇔ y(i) = x(P128(i)), 0 ≤ i ≤ 127    (14)
The table of P128 is given in appendix B.
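Since the two shift directions in (12) undo each other, τ with C = 0 and C = 1 are mutual inverses, which is what lets the key schedule run backwards on the same hardware. A quick sketch (our indexing, state as a list of bits):

```python
# Shift layer tau_C on a 128-bit state represented as a list of bits,
# following (12): C = 0 reads from (i + 8) mod 128, C = 1 from (i - 8).
# tau_0 and tau_1 are mutual inverses.
import random

def tau(C, x):
    step = 8 if C == 0 else -8
    return [x[(i + step) % 128] for i in range(128)]

random.seed(0)
x = [random.randrange(2) for _ in range(128)]
assert tau(1, tau(0, x)) == x and tau(0, tau(1, x)) == x
print("tau_0 and tau_1 are mutual inverses")
```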
The key selection: From every 128-bit vector K^i, we first apply a simple compression function E that selects the 64 bits corresponding to the key bytes of K^i having odd indices. We denote the resulting key by K64^i. Then we apply a key selection layer φ that consists of the parallel application of a selection function X to the blocks of the key.

φ_sel : Z_{2^4}^{16} → Z_{2^4}^{16} : K64^i ↦ RK^i_sel = φ_sel(K64^i) ⇔ RK^i_{sel,j} = X_sel(K64^i_j), 0 ≤ j ≤ 15    (15)
Fig. 2. The key round βC.

Fig. 3. The key selection function Xsel.
The selection function X takes a 4-bit input and a selection bit sel:

X_sel : Z_{2^4} → Z_{2^4} : x ↦ y = X_sel(x) ⇔
  y(0) = (x(0) ⊕ x(1) ⊕ x(2)) · sel ∨ (x(0) ⊕ x(1)) · ¬sel
  y(1) = (x(1) ⊕ x(2)) · sel ∨ x(1) · ¬sel
  y(2) = (x(2) ⊕ x(3) ⊕ x(0)) · sel ∨ (x(2) ⊕ x(3)) · ¬sel
  y(3) = (x(3) ⊕ x(0)) · sel ∨ x(3) · ¬sel    (16)

This selection process is represented in Figure 3. As a result, we obtain a 64-bit round key denoted by RK_1^i if sel = 1 and RK_0^i if sel = 0.

3.7 Encryption Process

ICEBERG is defined, for the cipher key K, as the transformation ICEBERG[K] = α_R[RK_1^0, RK_1^1, ..., RK_0^R] applied to the plaintext, where:

α_R[RK_1^0, RK_1^1, ..., RK_0^R] = σ_{RK_0^R} ∘ γ ∘ (∘_{r=1}^{R−1} ρ_{RK_1^r}) ∘ σ_{RK_1^0}    (17)
The standard number of rounds is R = 16.

3.8 Decryption Process

We now show that ICEBERG is an involutional cipher in the sense that the only difference between encryption and decryption is in the key schedule. We will need the following theorem, proven in appendix A:

Theorem 1. For any K64 ∈ Z_{2^4}^{16}: ε_{RK0} = ε_{RK1}^{−1}, where RK0 = φ_0(K64) and RK1 = φ_1(K64).
Then, the decryption process can be obtained as follows. We start from the encryption process:

α_R[RK_1^0, RK_1^1, ..., RK_0^R] = σ_{RK_0^R} ∘ γ ∘ (∘_{r=1}^{R−1} ε_{RK_1^r} ∘ γ) ∘ σ_{RK_1^0}    (18)

Then we have for decryption:

α_R^{−1}[RK_1^0, RK_1^1, ..., RK_0^R] = σ_{RK_1^0} ∘ (∘_{r=R−1}^{1} γ ∘ ε_{RK_1^r}^{−1}) ∘ γ ∘ σ_{RK_0^R}

⇔ α_R^{−1}[RK_1^0, RK_1^1, ..., RK_0^R] = σ_{RK_1^0} ∘ γ ∘ (∘_{r=R−1}^{1} ε_{RK_1^r}^{−1} ∘ γ) ∘ σ_{RK_0^R}    (19)

Finally, the above theorem leads to:

α_R^{−1}[RK_1^0, RK_1^1, ..., RK_0^R] = σ_{RK_1^0} ∘ γ ∘ (∘_{r=R−1}^{1} ε_{RK_0^r} ∘ γ) ∘ σ_{RK_0^R}    (20)
3.9 Round Constants

ICEBERG is an involutional cipher in the sense that the only difference between encryption and decryption is in the key schedule. Moreover, if properly chosen, the round constants allow the keys to be computed "on-the-fly" in encryption and decryption modes. Basically, we would like the round keys to satisfy:

K^0 = K^R
K^1 = K^{R−1}
K^2 = K^{R−2}
...    (21)

This requires that R be even. Then, if the first half of the round constants (i.e. up to round 8) is 0 (shift left) and the second half is 1 (shift right), the resulting round keys satisfy Equation (21). As a consequence, the only difference between encryption and decryption is the selection function φ of the key bits, as ε_{RK1} ≡ ε_{RK0}^{−1}.
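The palindromic sequence (21) can be illustrated with a toy stand-in for βC (a plain 128-bit byte rotation here, chosen only because its two directions are mutual inverses, as β0 and β1 are when all inner layers are involutions):

```python
# If the round constant is 0 (shift left) for the first R/2 key rounds and
# 1 (shift right) for the last R/2, and beta_1 is the inverse of beta_0,
# the expanded keys satisfy K^i = K^(R-i). Toy beta: a 128-bit rotation
# standing in for the real beta_C.

R = 16
MASK = (1 << 128) - 1

def beta(C, k):
    if C == 0:                                   # rotate left by one byte
        return ((k << 8) & MASK) | (k >> 120)
    return (k >> 8) | ((k & 0xFF) << 120)        # rotate right: the inverse

constants = [0] * (R // 2) + [1] * (R // 2)
K = [0x0123456789ABCDEF0123456789ABCDEF]
for C in constants:
    K.append(beta(C, K[-1]))

assert all(K[i] == K[R - i] for i in range(R + 1))
print("round keys form a palindrome: K^i == K^(R-i)")
```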
4 Security Analysis

4.1 Design Properties of the Components
S-Boxes: The non-linear layer γ may be viewed as made up of the parallel application of 8 copies of the same 8 × 8 S-box. We designed this S-box such that it has the following properties:
– It is an involution.
– Its δ-parameter is 2^{−5}.
– Its λ-parameter is 2^{−2}.
– Its nonlinear order ν is maximal, namely 7.
For efficiency purposes, the S-box was generated from a fixed permutation p8 and small 4 × 4 s-boxes s0 and s1 that fit perfectly into 4-input LUTs. The generation of the S-boxes is detailed in Appendix C.

The bit permutations: P64 and P128 were designed so as to disturb the bit alignment inside bytes as much as possible, in order to provide resistance against certain attacks. A remarkable property of P64 and P128 is that 2 bits from the same byte are always mapped to 2 bits belonging to different bytes. p8 is involutional and allows good substitution boxes to be generated. Finally, p4 allows the selection function Xsel to be implemented in one LUT.

The Diffusion Layer: Because we attached much importance to hardware implementation aspects in the design of the diffusion layer, it is not optimal. More precisely, it is easy to see that its byte branch number is 4, as the bit branch number of p4 ∘ σK ∘ D is 4, and because of the remarkable property of P64 we have just mentioned. The diffusion boxes D were designed so that their combination with the key addition layer σK can be done inside one LUT.

The key round: The key round has been chosen for its efficiency properties, as well as to provide resistance against key schedule cryptanalysis and slide attacks:
1. Non-periodicity is provided by the shift operation τC.
2. Non-linearity is provided by the non-linear S-boxes.
3. Good diffusion properties are provided by the combination of shifts, S-boxes and bit permutations.
Moreover, the shift layer τC is used to allow property (19) to be respected. The selection function Xsel is necessary to prove the property of appendix A and is designed such that it fits into a single LUT.
δ equals the probability of the best differential approximation. We define the bias ε of a linear approximation that holds with probability p as ε = |p − 1/2|. The λ-parameter of an S-box is equal to 2 times the bias of its best linear approximation.
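Both parameters can be computed exhaustively for a small S-box. A generic sketch (function names ours; demonstrated on toy tables, since the actual ICEBERG S-box table lives in appendix B and is not reproduced here):

```python
# Exhaustive computation of the delta-parameter (probability of the best
# differential) and lambda-parameter (2x the bias of the best linear
# approximation) of a small S-box, per the definitions in the footnotes.

def delta_parameter(sbox):
    n = len(sbox)
    best = 0
    for a in range(1, n):                 # nonzero input differences
        counts = [0] * n
        for x in range(n):
            counts[sbox[x ^ a] ^ sbox[x]] += 1
        best = max(best, max(counts))
    return best / n

def lambda_parameter(sbox):
    n = len(sbox)
    best_bias = 0.0
    for a in range(1, n):                 # nonzero input masks
        for b in range(1, n):             # nonzero output masks
            matches = sum(
                1 for x in range(n)
                if bin(a & x).count("1") % 2 == bin(b & sbox[x]).count("1") % 2
            )
            best_bias = max(best_bias, abs(matches / n - 0.5))
    return 2 * best_bias

identity = list(range(16))
assert delta_parameter(identity) == 1.0    # a -> a with probability 1
assert lambda_parameter(identity) == 1.0   # perfectly linear
```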
4.2 Strength Against Known Attacks
Linear and Differential Cryptanalysis: From the properties of the S-box and the diffusion layer, we can compute that a differential characteristic [13] over 2 rounds of ICEBERG has probability at most (2^{−5})^4 = 2^{−20}. Also, a linear characteristic [14] over these 2 rounds has input-output correlation at most (2^{−2})^4 = 2^{−8}. Therefore loose bounds can be computed for the full cipher (16 rounds):
– The probability of the best differential characteristic is smaller than 2^{−160}.
– The input-output correlation of the best linear characteristic is smaller than 2^{−64}.
The security margin is very likely big enough to prevent variants of differential and linear attacks, such as boomerang [15] and rectangle [16] attacks, multiple linear cryptanalysis [17], non-linear approximations of outer rounds [18], etc. Note also that the security margin of ICEBERG against linear and differential cryptanalysis is comparable to that of Khazad. This is probably more than necessary, as resistance against structural attacks [19, 20] was probably more decisive in the choice of the number of rounds of Khazad (8) than security margins against linear and differential cryptanalysis.

Truncated and Impossible Differentials: Truncated differentials were introduced in [21], and impossible differentials in [22, 23]. They typically apply to ciphers operating on well-aligned data blocks (often bytes), such as Khazad or the AES (and many others). Our cipher does not fall into this category because of the P64 layer, which makes it very difficult to attack this way. Therefore such an attack on the full 16-round cipher seems very unlikely.

Square Attacks: Like truncated differential attacks, square attacks [19] generally apply to ciphers operating on well-aligned data blocks. Therefore the P64 layer should prevent them efficiently, at least on more than a few rounds.
More precisely, consider a batch of 256^m (m ∈ {1, ..., 7}) plaintexts such that 8 − m bytes remain constant for all of them (these bytes are said to be passive), while the concatenation of the m other bytes takes every possible value (it is said to be active). This property is preserved by the γ layer. On the contrary, the P64 layer makes all bytes garbled (i.e. neither active nor passive), which prevents a basic "square characteristic" from being pushed further. The same type of argument can be applied to truncated differential attacks.

Interpolation Attacks: Interpolation attacks [24] are made possible when the S-box has a simple algebraic structure, allowing the cipher to be expressed as a sufficiently simple polynomial or rational expression. The diffusion layer also plays a role in this respect. As the S-box of ICEBERG has no simple algebraic expression, interpolation attacks are prevented for more than a few rounds of our cipher.
Higher Order Differential Cryptanalysis: This attack was introduced by Knudsen in [21], and relies on finding high-order differentials that are constant for the whole cipher. But as the nonlinear order of the S-box we use is maximal, namely 7, we can expect that the maximal value of 63 for the non-linear order of the cipher is reached after a few rounds of ICEBERG.

Slide Attacks: Slide attacks [25, 26] work against ciphers using a periodic key schedule. Although the sequence of subkeys produced by the key schedule of ICEBERG is not periodic, it has a particular structure, namely:

(K^0, K^1, ..., K^7, K^8, K^7, ..., K^0)

The key schedule of the GOST cipher has some similarities with that of ICEBERG. The vulnerability of some variants and reduced-round versions of GOST to slide attacks is examined in [26]. However, none of the attacks presented there seems to be applicable to our cipher.

Related-Key Attacks: The first related-key attack was described in [27], and is the related-key counterpart of the slide attack. Let us examine a slightly simplified version of ICEBERG, where the initial key addition σ_{RK_1^0} is replaced by a normal round ρ_{RK_1^0}, and the final σ_{RK_0^R} ∘ γ is also replaced by a normal round ρ_{RK_0^R}. Then, if 2 keys K and K* are such that K^1 = K*^0, and 2 plaintexts P and P* are such that P* = ρ_{RK_1^0}(P), the encryption of P under K and of P* under K* will proceed in the same way (with a difference of 1 round) during 8 rounds. After that, however, the round keys, and hence the computations, will differ; therefore such a related-key attack does not work against our key schedule. Without the simplification we made on the first and last round of ICEBERG, a related-key attack becomes even more difficult. Differential related-key attacks [28] are also very unlikely to be applicable to ICEBERG, due to the good diffusion and nonlinearity of its key schedule.
The only remarkable property of the key round is in the selection function Xsel where some symbols are independent of the selection bit. Namely, hexadecimal input symbols 0, 2, 8, A become 0, C, 3, F regardless of sel = 0 or sel = 1. However, this point is very unlikely to be an exploitable weakness. Biryukov’s observations on Involutional Ciphers: Observations of Biryukov on Khazad and Anubis [29] remain valid for ICEBERG. However this study could at best threaten 5 rounds of our cipher, while it is made out of 16 rounds.
Side-channel cryptanalysis: Although cryptosystem designers frequently assume that secret parameters will be manipulated in closed, reliable computing environments, Kocher et al. stressed in 1998 [30] that actual computers and microchips leak information correlated to the data handled. Side-channel attacks based on time, power and electromagnetic measurements have been successfully applied to smart card implementations of block ciphers. Protecting implementations against side-channel attacks is usually difficult and expensive. Masking all the data with random Boolean values is suggested in several papers [31, 32], and the use of small substitution tables makes this efficient to implement, although it is still an expensive solution. The key agility provided by ICEBERG (changing the key at every plaintext block is free) also offers interesting opportunities to prevent most side-channel attacks by defining new encryption modes where the key is changed sufficiently often. As most side-channel attacks need to collect several leakage traces to remove the noise from the useful information, changing the key frequently, even in a well-chosen deterministic way (e.g. LFSR-based), would make most attacks somewhat impractical. Actually, only template attacks [33] allow information to be extracted from a single sample, but their context is also more specific, as they require that the adversary has access to an experimental device (identical to the device attacked) that he can program to his choosing.
5 Performance Analysis
ICEBERG has been designed to allow very efficient reconfigurable hardware implementations, as defined in section 2. For this purpose, we applied the following design rules:
1. All components easily fit into 4-input LUTs. Practically, ICEBERG is made of the parallel application of 4-input-bit transforms combined with bit permutations or shifts.
2. All components are involutional, so that encryption and decryption can be done with the same hardware. The only difference between encryption and decryption is in the selection bit of φ_sel.
3. The key expansion allows the round keys to be derived "on-the-fly" in encryption and decryption modes. There is no need to store the round keys, and the key can be changed in one clock cycle.
4. The scheduling of the algorithm is balanced so that the round and the key round can be done in the same number of clock cycles.
5. The non-linear layer can be efficiently implemented in the RAM blocks available in most modern FPGAs.

5.1 Hardware Implementations
As all components easily fit into 4-input LUTs, we can directly evaluate the hardware cost of ICEBERG:
Component      HW cost (LUTs)    Component        HW cost (LUTs)
S0, S1         64                τC               128
γ              192               S′               128
εK             64                Key round βC     384
Round ρK       256               φ_sel            64
Complete round + key round: 704
As a comparison, Khazad needs 576 LUTs for its round and 768 LUTs for its key round [11], with a more expensive encryption/decryption structure. AES Rijndael is even more critical, as its round needs 2608 LUTs and its key round 768 LUTs [12]. Although comparisons between hardware implementations are made difficult by their high dependency on the design methodology, we may reasonably expect to obtain ICEBERG encryption/decryption for the cost of Khazad encryption only, and half the cost of Rijndael encryption, with comparable encryption rates. Moreover, the parallel nature of ICEBERG allows every possible throughput/area tradeoff to be implemented, as well as very efficient pipelining. The maximum pipeline can be obtained by inserting registers after every LUT, which allows an optimal ratio Nbr of LUTs / Nbr of registers = 1 to be reached. Loop and unrolled architectures are easily implementable with LUT-based or RAM-based substitution boxes. As a consequence, ICEBERG offers various efficient implementation opportunities. Its regular structure makes the classical design optimizations easily reachable. Software efficiency was not a design goal of ICEBERG; it is briefly discussed in Appendix D.
6 Conclusion
This paper presented the platform-specific encryption algorithm ICEBERG and the rationale behind its design. ICEBERG is based on a fast involutional structure in order to provide very efficient hardware implementation opportunities. We showed the specific constraints this type of platform places on block cipher design. We also underlined that the overall structure of a cipher is important for efficiency purposes (for example, in designing rounds and key rounds that can be done in the same number of clock cycles, or in allowing "on-the-fly" key derivation in both encryption and decryption modes). We believe ICEBERG to be as secure as the AES and NESSIE candidates and much more efficient for reconfigurable hardware implementations. ICEBERG also offers free opportunities to defeat most side-channel attacks by using adequate encryption modes.
Acknowledgements The authors are grateful to Paulo Barreto for providing valuable comments and help during the design of ICEBERG.
292
Francois-Xavier Standaert et al.
References

[1] NIST home page, http://csrc.nist.gov/CryptoToolkit/aes/.
[2] J. Daemen, V. Rijmen, The Block Cipher Rijndael, Smart Card Research and Applications, pp. 288-296, Springer-Verlag, LNCS 1820, 2000.
[3] P. Barreto, V. Rijmen, The KHAZAD Legacy-Level Block Cipher, submission to the NESSIE project, available from http://www.cosic.esat.kuleuven.ac.be/nessie/
[4] M. Matsui, Supporting Document of MISTY1, submission to the NESSIE project, available from http://www.cosic.esat.kuleuven.ac.be/nessie/
[5] Xilinx, Virtex-II FPGAs Data Sheet, http://www.xilinx.com.
[6] Altera, Stratix 1.5V FPGAs Data Sheet, http://www.altera.com.
[7] M. McLoone, J. V. McCanny, High Performance Single-Chip FPGA Rijndael Algorithm Implementations, in the proceedings of CHES 2001, pp. 65-76, Springer-Verlag, LNCS 2162.
[8] V. Fischer, M. Drutarovsky, Two Methods of Rijndael Implementation in Reconfigurable Hardware, in the proceedings of CHES 2001, pp. 65-76, Springer-Verlag, LNCS 2162.
[9] A. Satoh et al., A Compact Rijndael Hardware Architecture with S-Box Optimization, Advances in Cryptology - ASIACRYPT 2001, pp. 239-254, Springer-Verlag, LNCS 2248.
[10] Helion Technology, High Performance AES (Rijndael) Cores for Xilinx FPGA, http://www.heliontech.com.
[11] F.-X. Standaert, G. Rouvroy, J.-J. Quisquater, J.-D. Legat, Efficient FPGA Implementations of Block Ciphers KHAZAD and MISTY1, in the proceedings of the Third NESSIE Workshop, November 6-7, 2002, Munich, Germany.
[12] F.-X. Standaert, G. Rouvroy, J.-J. Quisquater, J.-D. Legat, A Methodology to Implement Block Ciphers in Reconfigurable Hardware and its Application to Fast and Compact AES Rijndael, in the proceedings of FPGA 2003, February 23-25, 2003, Monterey, California.
[13] E. Biham, A. Shamir, Differential Cryptanalysis of DES-like Cryptosystems (extended abstract), proceedings of Crypto 90, pp. 2-21, Springer-Verlag, LNCS 537, 1990.
[14] M. Matsui, Linear Cryptanalysis Method for DES Cipher, proceedings of Eurocrypt 93, pp. 386-397, Springer-Verlag, LNCS 765, 1993.
[15] D. Wagner, The Boomerang Attack, proceedings of FSE 99, pp. 156-170, Springer-Verlag, LNCS 1636, 1999.
[16] E. Biham, O. Dunkelman, N. Keller, The Rectangle Attack - Rectangling the Serpent, proceedings of Eurocrypt 2001, pp. 340-357, Springer-Verlag, LNCS 2045, 2001.
[17] B. S. Kaliski, M. J. B. Robshaw, Linear Cryptanalysis Using Multiple Approximations, proceedings of Crypto 94, pp. 26-39, Springer-Verlag, LNCS 839, 1994.
[18] L. Knudsen, M. J. B. Robshaw, Non-Linear Approximations in Linear Cryptanalysis, proceedings of Eurocrypt 96, pp. 224-236, Springer-Verlag, LNCS 1070, 1996.
ICEBERG : An Involutional Cipher Efficient for Block Encryption
293
[19] J. Daemen, L. Knudsen, V. Rijmen, The Block Cipher SQUARE, proceedings of FSE 1997, pp. 149-165, Springer-Verlag, LNCS 1267, 1997.
[20] N. Ferguson, J. Kelsey, S. Lucks et al., Improved Cryptanalysis of Rijndael, proceedings of FSE 2000, pp. 213-230, Springer-Verlag, LNCS 1978, 2000.
[21] L. Knudsen, Truncated and Higher Order Differentials, proceedings of FSE 94, pp. 196-211, Springer-Verlag, LNCS 1008, 1995.
[22] E. Biham, A. Biryukov, A. Shamir, Cryptanalysis of Skipjack Reduced to 31 Rounds Using Impossible Differentials, proceedings of Eurocrypt 99, pp. 12-23, Springer-Verlag, LNCS 1592, 1999.
[23] E. Biham, A. Biryukov, A. Shamir, Miss in the Middle Attacks on IDEA, Khufu, and Khafre, proceedings of FSE 99, pp. 124-138, Springer-Verlag, LNCS 1636, 1999.
[24] T. Jakobsen, L. Knudsen, The Interpolation Attack on Block Ciphers, proceedings of FSE 97, pp. 28-40, Springer-Verlag, LNCS 1267, 1997.
[25] A. Biryukov, D. Wagner, Slide Attacks, proceedings of FSE 99, pp. 245-259, Springer-Verlag, LNCS 1636, 1999.
[26] A. Biryukov, D. Wagner, Advanced Slide Attacks, proceedings of Eurocrypt 2000, pp. 589-606, Springer-Verlag, LNCS 1807, 2000.
[27] E. Biham, New Types of Cryptanalytic Attacks Using Related Keys, proceedings of Eurocrypt 93, pp. 229-246, Springer-Verlag, LNCS 765, 1994.
[28] J. Kelsey, B. Schneier, D. Wagner, Related-Key Cryptanalysis of 3-WAY, Biham-DES, CAST, DES-X, NewDES, RC2, and TEA, pp. 196-208, Springer-Verlag, LNCS 718, 1993.
[29] A. Biryukov, Analysis of Involutional Ciphers: Khazad and Anubis, proceedings of FSE 2003, Springer-Verlag, to appear.
[30] P. Kocher, J. Jaffe, B. Jun, Differential Power Analysis, proceedings of Crypto 99, pp. 398-412, Springer-Verlag, LNCS 1666.
[31] L. Goubin, J. Patarin, DES and Differential Power Analysis: The Duplication Method, proceedings of CHES 1999, pp. 158-172, Springer-Verlag, LNCS 1717.
[32] S. Chari et al., Towards Sound Approaches to Counteract Power-Analysis Attacks, proceedings of Crypto 99, pp. 398-412, Springer-Verlag, LNCS 1666.
[33] S. Chari, J. Rao, P. Rohatgi, Template Attacks, proceedings of CHES 2002, pp. 13-28, Springer-Verlag, LNCS 2523.
[34] A. Pfitzmann, R. Aßmann, More Efficient Software Implementations of (Generalized) DES, Institut für Rechnerentwurf und Fehlertoleranz, Universität Karlsruhe, Interner Bericht 18/90.
[35] E. Biham, A Fast New DES Implementation in Software, Technion - Computer Science Department, Technical Report CS0891, 1997.
[36] A. M. Youssef, S. E. Tavares, H. Heys, A New Class of Substitution-Permutation Networks, proceedings of Selected Areas in Cryptography (SAC 96), pp. 132-147, 1996.
[37] H. M. Heys, S. E. Tavares, Known Plaintext Cryptanalysis of Tree-Structured Block Ciphers, Electronics Letters, vol. 31, pp. 784-785, May 1995.
[38] L. Knudsen, Block Ciphers - Analysis, Design and Applications, doctoral dissertation, DAIMI PB 485, Aarhus University, Denmark, 1994.
[39] J. Daemen, Cipher and Hash Function Design, doctoral dissertation, KU Leuven, March 1995.
[40] V. Rijmen, Cryptanalysis and Design of Iterated Block Ciphers, doctoral dissertation, KU Leuven, October 1997.
A    Proof of Theorem 1
We have to prove that P4 ◦ σRK1 ◦ M ≡ M ◦ σRK0 ◦ P4. Inputs and outputs of every transform are represented in Figure 4; we simply write down the relations between them. First we encrypt with RK1:

b0 = a1 ⊕ a2 ⊕ a3
b1 = a0 ⊕ a2 ⊕ a3
b2 = a0 ⊕ a1 ⊕ a3
b3 = a0 ⊕ a1 ⊕ a2

c0 = a1 ⊕ a2 ⊕ a3 ⊕ k0 ⊕ k1 ⊕ k2
c1 = a0 ⊕ a2 ⊕ a3 ⊕ k1 ⊕ k2
c2 = a0 ⊕ a1 ⊕ a3 ⊕ k2 ⊕ k3 ⊕ k0
c3 = a0 ⊕ a1 ⊕ a2 ⊕ k3 ⊕ k0

d0 = a0 ⊕ a2 ⊕ a3 ⊕ k1 ⊕ k2
d1 = a1 ⊕ a2 ⊕ a3 ⊕ k0 ⊕ k1 ⊕ k2
d2 = a0 ⊕ a1 ⊕ a2 ⊕ k3 ⊕ k0
d3 = a0 ⊕ a1 ⊕ a3 ⊕ k2 ⊕ k3 ⊕ k0

Then, when we decrypt with RK0:

e0 = d1 ⊕ d2 ⊕ d3 = a1 ⊕ k0 ⊕ k1
e1 = d0 ⊕ d2 ⊕ d3 = a0 ⊕ k1
e2 = d0 ⊕ d1 ⊕ d3 = a3 ⊕ k2 ⊕ k3
e3 = d0 ⊕ d1 ⊕ d2 = a2 ⊕ k3
Fig. 4. Theorem 1.
296
Francois-Xavier Standaert et al.
From (16), we have:

f0 = a1    f1 = a0    f2 = a3    f3 = a2

And finally, permutation P4 finishes the decryption:

g0 = a0    g1 = a1    g2 = a2    g3 = a3

Remark that permutation P4 allows the selection function Xsel to be efficiently implemented in LUTs, as it has at most 4 inputs: 3 key bits and a selection bit.
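The XOR relations above can be checked mechanically. The sketch below (an illustration of ours, not from the paper) encodes each value as the set of symbols occurring an odd number of times, takes the expressions for c0..c3 verbatim from the proof, applies the P4 swap and the XOR-of-the-other-three decryption step, and asserts the stated results for e0..e3:

```python
# Symbolic XOR over GF(2): a value is the set of symbols with odd multiplicity.
def xor(*terms):
    out = set()
    for t in terms:
        out ^= set(t)           # symmetric difference = XOR of symbol sets
    return frozenset(out)

a = [frozenset({f"a{i}"}) for i in range(4)]
k = [frozenset({f"k{i}"}) for i in range(4)]

# Expressions for c0..c3 taken verbatim from the proof:
c = [xor(a[1], a[2], a[3], k[0], k[1], k[2]),
     xor(a[0], a[2], a[3], k[1], k[2]),
     xor(a[0], a[1], a[3], k[2], k[3], k[0]),
     xor(a[0], a[1], a[2], k[3], k[0])]

d = [c[1], c[0], c[3], c[2]]    # P4 swaps the adjacent pairs

# Decryption step: each e_i is the XOR of the three other d's.
e = [xor(*(d[j] for j in range(4) if j != i)) for i in range(4)]

assert e[0] == xor(a[1], k[0], k[1])
assert e[1] == xor(a[0], k[1])
assert e[2] == xor(a[3], k[2], k[3])
assert e[3] == xor(a[2], k[3])
```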
B    Tables

Table 1. p4.
 x      0 1 2 3
 p4(x)  1 0 3 2

Table 2. p8.
 x      0 1 2 3 4 5 6 7
 p8(x)  0 1 4 5 2 3 6 7

Table 3. s0.
 x      0 1 2 3 4 5 6 7 8 9 a b c d e f
 s0(x)  d 7 3 2 9 a c 1 f 4 5 e 6 0 b 8

Table 4. s1.
 x      0 1 2 3 4 5 6 7 8 9 a b c d e f
 s1(x)  4 a f c 0 d 9 b e 6 1 7 3 5 8 2

Table 5. D.
 x      0 1 2 3 4 5 6 7 8 9 a b c d e f
 D(x)   0 e d 3 b 5 6 8 7 9 a 4 c 2 1 f
Table 6. 8 x 8 substitution box (row: high nibble, column: low nibble).

      00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
 00:  24 c1 38 30 e7 57 df 20 3e 99 1a 34 ca d6 52 fd
 10:  40 6c d3 3d 4a 59 f8 77 fb 61 0a 56 b9 d2 fc f1
 20:  07 f5 93 cd 00 b6 62 a7 63 fe 44 bd 5f 92 6b 68
 30:  03 4e a2 97 0b 60 83 a3 02 e5 45 67 f4 13 08 8b
 40:  10 ce be b4 2a 3a 96 84 c8 9f 14 c0 c4 6f 31 d9
 50:  ab ae 0e 64 7c da 1b 05 a8 15 a5 90 94 85 71 2c
 60:  35 19 26 28 53 e2 7f 3b 2f a9 cc 2e 11 76 ed 4d
 70:  87 5e c2 c7 80 b0 6d 17 b2 ff e4 b7 54 9d b8 66
 80:  74 9c db 36 47 5d de 70 d5 91 aa 3f c9 d8 f3 f2
 90:  5b 89 2d 22 5c e1 46 33 e6 09 bc e8 81 7d e9 49
 a0:  e0 b1 32 37 ea 5a f6 27 58 69 8a 50 ba dd 51 f9
 b0:  75 a1 78 d0 43 f7 25 7b 7e 1c ac d4 9a 2b 42 e3
 c0:  4b 01 72 d7 4c fa eb 73 48 8c 0c f0 6a 23 41 ec
 d0:  b3 ef 1d 12 bb 88 0d c3 8d 4f 55 82 ee ad 86 06
 e0:  a0 95 65 bf 7a 39 98 04 9b 9e a4 c6 cf 6e dc d1
 f0:  cb 1f 8f 8e 3c 21 a6 b5 16 af c5 18 1e 0f 29 79

Table 7. P64.

  i:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 P64:   0 12 23 25 38 42 53 59 22  9 26 32  1 47 51 61
  i:   16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
 P64:  24 37 18 41 55 58  8  2 16  3 10 27 33 46 48 62
  i:   32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 P64:  11 28 60 49 36 17  4 43 50 19  5 39 56 45 29 13
  i:   48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
 P64:  30 35 40 14 57  6 54 20 44 52 21  7 34 15 31 63

Table 8. P128.

   i:     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 P128:   76 110  83 127  67 114  92  97  98  65 121 106  78 112  91  82
   i:    16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31
 P128:   71 101  89 126  72 107  81 118  90 124  73  88  64 104 100  85
   i:    32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
 P128:  109  87  75 113 120  66 103 115 122 108  95  69  74 116  80 102
   i:    48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63
 P128:   84  96 125  68  93 105 119  79 123  86  70 117 111  77  99  94
   i:    64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79
 P128:   28   9  37   4  51  43  58  16  20  26  44  34   0  61  12  55
   i:    80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95
 P128:   46  22  15   2  48  31  57  33  27  18  24  14   6  52  63  42
   i:    96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111
 P128:   49   7   8  62  30  17  47  38  29  53  11  21  41  32   1  60
   i:   112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
 P128:   13  35   5  39  45  59  23  54  36  10  40  56  25  50  19   3
C    Generation of the ICEBERG S-box [3]
As P8 is fixed, the only part of the ICEBERG S-box structure still unspecified consists of the s0 and s1 involutions, which are generated pseudo-randomly in a verifiable way. The searching algorithm starts with two copies of a simple involution without fixed points (namely, the negation mapping u → ū = u ⊕ 0xF), and pseudo-randomly derives from each of them a sequence of 4 × 4 substitution boxes ("mini-boxes") with the optimal values δ = 1/4, λ = 1/2, and ν = 3. At each step, in alternation, only one of the sequences is extended with a new mini-box. The most recently generated mini-box from each sequence is taken, and the pair is combined according to the ICEBERG S-box shuffle structure; finally, the resulting 8 × 8 S-box, if free of fixed points, is tested for the design criteria regarding δ, λ, and ν. Given a mini-box at any point during the search, a new one is derived from it by choosing two pairs of mutually inverse values and swapping them, keeping the result an involution without fixed points; this is repeated until the running mini-box has optimal values of δ, λ, and ν. The pseudo-random number generator is implemented using the AES cipher Rijndael in counter mode, with a fixed key consisting of 128 zero bits and an initial counter value consisting of 128 zero bits.
The following pseudo-code fragment illustrates the computation of the chains of mini-boxes and the resulting S-box:

procedure ShuffleStructure(s0, s1)
    for w ← 0 to 255 do
        u0 ← s0[w >> 4]; v0 ← s0[w & 0x0F];
        u1 ← (u0 & 0xC) | ((v0 & 0xC) >> 2);
        v1 ← (v0 & 0x3) | ((u0 & 0x3) << 2);
        u0 ← s1[u1]; v0 ← s1[v1];
        u1 ← (u0 & 0xC) | ((v0 & 0xC) >> 2);
        v1 ← (v0 & 0x3) | ((u0 & 0x3) << 2);
        S[w] ← (s0[u1] << 4) | s0[v1];
    end for
    return S;
end procedure

procedure SearchRandomSBox()
    // initialize mini-boxes to the negation involution:
    for u ← 0 to 15 do
        s0[u] ← ū; s1[u] ← ū;
    end for
    // look for an S-box conforming to the design criteria:
    repeat
        // swap mini-boxes (update the "older" one only):
        s0 ↔ s1;
        // randomly generate a "good" GF(2^4) involution free of fixed points:
Fig. 5. A modified ρK.
        repeat
            repeat
                // randomly select x and y such that
                // x ≠ y and s1[x] ≠ y (this implies s1[y] ≠ x):
                z ← RandomByte();
                x ← z >> 4; y ← z & 0x0F;
            until x ≠ y ∧ s1[x] ≠ y;
            // swap entries:
            u ← s1[x]; v ← s1[y];
            s1[x] ← v; s1[u] ← y;
            s1[y] ← u; s1[v] ← x;
        until δ(s1) = 1/4 ∧ λ(s1) = 1/2 ∧ ν(s1) = 3;
        // build the S-box from the mini-boxes:
        S ← ShuffleStructure(s0, s1);
        // test the design criteria:
    until #FixedPoints(S) = 0 ∧ δ(S) ≤ 2^-5 ∧ λ(S) ≤ 2^-2 ∧ ν(S) = 7;
    return S;
end procedure
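For illustration, the shuffle structure can be re-implemented in Python using the s0 and s1 tables of Appendix B. The shift amounts in the nibble mixing are our reconstruction (the printed pseudo-code lost them), verified against the first entries of the published 8 × 8 S-box (Table 6):

```python
# 4-bit involutions s0 and s1 from Tables 3 and 4 of Appendix B.
S0 = [0xd, 0x7, 0x3, 0x2, 0x9, 0xa, 0xc, 0x1,
      0xf, 0x4, 0x5, 0xe, 0x6, 0x0, 0xb, 0x8]
S1 = [0x4, 0xa, 0xf, 0xc, 0x0, 0xd, 0x9, 0xb,
      0xe, 0x6, 0x1, 0x7, 0x3, 0x5, 0x8, 0x2]

def shuffle_structure(s0, s1):
    S = []
    for w in range(256):
        u, v = s0[w >> 4], s0[w & 0x0F]
        # bit shuffle between the two nibbles (shift amounts reconstructed):
        u, v = (u & 0xC) | ((v & 0xC) >> 2), (v & 0x3) | ((u & 0x3) << 2)
        u, v = s1[u], s1[v]
        u, v = (u & 0xC) | ((v & 0xC) >> 2), (v & 0x3) | ((u & 0x3) << 2)
        S.append((s0[u] << 4) | s0[v])
    return S

S = shuffle_structure(S0, S1)
assert S[0x00] == 0x24 and S[0x01] == 0xC1      # first entries of Table 6
assert all(S[S[w]] == w for w in range(256))    # S is an involution ...
assert all(S[w] != w for w in range(256))       # ... without fixed points
```

Since s0, s1 and the nibble shuffle are all involutions, the composed S-box is automatically an involution; only the fixed-point and δ/λ/ν criteria need testing in the search loop.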
D    Software implementations
Software efficiency is not a design goal of ICEBERG. Nevertheless, its round function may be implemented using a table-lookup approach, as suggested in [34], and its performance is therefore comparable to that of Khazad. The key expansion of ICEBERG is actually its most critical part, as a lookup-table implementation requires separate tables for the transforms τC ◦ P128 and S ◦ P128 ◦ τC. Note that ICEBERG can also be implemented in bitslice mode, as suggested in [35]. If a software-efficient key schedule is wanted, an alternative key round based on a small Feistel structure can be used, as illustrated in Figure 5. We just use a conditional switch of the two 64-bit vectors so that we can encrypt during half the rounds and decrypt afterwards, in order to satisfy Equation (21). This only slightly affects hardware performance (an additional multiplexer is necessary to select the round keys).
Related Key Differential Attacks on 27 Rounds of XTEA and Full-Round GOST

Youngdai Ko¹, Seokhie Hong¹, Wonil Lee¹, Sangjin Lee¹, and Ju-Sung Kang²

¹ Center for Information Security Technologies (CIST), Korea University, Anam Dong, Sungbuk Gu, Seoul, Korea
{koyd,hsh,wonil,sangjin}@cist.korea.ac.kr
² Section 0741, Information Security Technology Division, ETRI, 161 Kajong-Dong, Yusong-Gu, Taejon, 305-350, Korea
[email protected]
Abstract. In this paper, we present a related key truncated differential attack on 27 rounds of XTEA, which is the best known attack so far. With an expected success rate of 96.9%, we can attack 27 rounds of XTEA using 2^20.5 chosen plaintexts and with a complexity of 2^115.15 27-round XTEA encryptions. We also propose several attacks on GOST. First, we present a distinguishing attack on full-round GOST, which can distinguish it from a random permutation with probability 1 - 2^-64 using a related key differential characteristic. We also show that H. Seki et al.'s idea combined with our related key differential characteristic can be applied to attack 31 rounds of GOST. Lastly, we propose a related key differential attack on full-round GOST. In this attack, we can recover 12 bits of the master key with 2^35 chosen plaintexts, 2^36 encryption operations, and an expected success rate of 91.7%.

Keywords: Related key differential attack, Distinguishing attack, XTEA, GOST, Differential characteristic
1    Introduction
XTEA [10] was proposed by R. Needham and D. Wheeler as a modified version of TEA [7], in order to resist related key attacks [5]. XTEA is a very simple block cipher using only exclusive-or operations, additions, and shifts. Until now, the best known result on XTEA is a truncated differential attack on 23 rounds (8∼30 or 30∼52) proposed in [3]. In this paper, we present related key truncated differential attacks on 25 (1∼25) and 27 (4∼30) rounds of XTEA. GOST was proposed in the former Soviet Union [2]. It has a very simple round function and key schedule. GOST uses key addition modulo 2^32 in each round function, so the probability of a differential characteristic depends not only on the values of the input and output differences but also on the value of the round key. In order to reduce the effect of the round key addition, H. Seki et al. introduced a specific set of differential characteristics and proposed a differential attack on
This work is supported by the MOST research fund(M1-0326-08-0001).
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 299–316, 2004. c International Association for Cryptologic Research 2004
300    Youngdai Ko et al.

Table 1. Various attacks on reduced-round XTEA

Attack method            paper       Rounds  # of Chosen Plaintexts  Total Complexity
Impossible Diff. attack  [6]         14      2^62.5                  2^85
Diff. attack             [3]         15      2^59                    2^120
Truncated Diff. attack   [3]         23      2^20.55                 2^120.65
R·K Truncated Diff.      this paper  27      2^20.5                  2^115.15
Table 2. Various attacks on GOST

Attack method          paper       Rounds  # of C·P     Total Complexity
R·K Diff. attack       [4]         24      theoretical  theoretical
A set of Diff. Char.   [8]         13      2^51         Not mentioned
R·K Diff. attack       [8]         21      2^56         Not mentioned
Distinguishing attack  this paper  full    2            2
R·K Diff. attack       this paper  31      2^26         2^39
R·K Diff. attack       this paper  full    2^35         2^36
13 rounds of GOST, as well as a related key differential attack on 21 rounds of GOST [8]. Here, we present several attacks on GOST. First, we introduce a distinguishing attack on full-round GOST, which can distinguish it from a random oracle with probability 1 - 2^-64 using a related key differential characteristic. We also present a related key differential attack on 31 rounds of GOST, using our related key differential characteristic combined with H. Seki et al.'s set of differential characteristics. Finally, we describe a related key differential attack on full-round GOST. In this attack, we can recover 12 bits of the master key with 2^35 chosen plaintexts, 2^36 encryption operations, and an expected success rate of 91.7%. Tables 1 and 2 depict recent results on XTEA and GOST, respectively. The outline of this paper is as follows. In Section 2, we present the notations used in this paper. In Section 3, we describe an 8-round related key truncated differential characteristic of XTEA and propose related key truncated differential attacks on 25 (1∼25) and 27 (4∼30) rounds of XTEA. In Section 4, we present a distinguishing attack and related key differential attacks on 31 rounds of GOST and on full-round GOST. We conclude in Section 5.
2    Notations
Here, we describe several notations used in this paper. Let ⊞, ⊕, ·, << and >> denote addition modulo 2^32, exclusive-or, multiplication modulo 2^32, and the left and right shift operations, respectively. Let <<< be left rotation and || be concatenation of two binary strings. Let e_i be a 32-bit binary string in which the i-th bit is one and the others are zero. Let A[i] be the i-th bit of a 32-bit block A, and let A[i∼j] denote A[j] || A[j-1] || · · · || A[i].
3    Related Key Truncated Differential Attacks on XTEA
In this section, we first briefly describe the XTEA algorithm and introduce an 8-round related key truncated differential characteristic of XTEA, which is similar to that of [3].¹ Then, we show that related key differential cryptanalysis can be applied to attack several reduced-round versions of XTEA using this 8-round related key truncated differential characteristic.

3.1    Description of XTEA
XTEA is a 64-round Feistel block cipher with 64-bit block size and 128-bit key size. The operations used in XTEA are just exclusive-or, additions, and shifts. As shown in Fig. 1, XTEA has a very simple round function. Let δ be the constant value 9e3779b9 (in hexadecimal) and let P = (L_n, R_n) be the input to the n-th round, for 1 ≤ n ≤ 64. Then the output of the n-th round is (L_{n+1}, R_{n+1}), where L_{n+1} = R_n and R_{n+1} is computed as follows. For each i (1 ≤ i ≤ 32), if n = 2i - 1,

R_{n+1} = L_n ⊞ ((((R_n << 4) ⊕ (R_n >> 5)) ⊞ R_n) ⊕ ((i-1)·δ ⊞ K_{((i-1)·δ) & 3})),

and if n = 2i,

R_{n+1} = L_n ⊞ ((((R_n << 4) ⊕ (R_n >> 5)) ⊞ R_n) ⊕ (i·δ ⊞ K_{((i·δ) >> 11) & 3})).

XTEA has a very simple key schedule: the 128-bit master key K is split into four 32-bit blocks K_0, K_1, K_2, K_3. Then, for r = 1, ..., 64, the round key K^r is derived from the following equation:

K^r = K_{(((r-1)/2)·δ) & 3}     if r is odd,
K^r = K_{(((r/2)·δ) >> 11) & 3} if r is even.

Table 3 depicts the entire key schedule.
¹ There is an explicit separation between our truncated differential characteristic and that of [3]. We use the internal difference caused by the specific related key and plaintext pair, whereas S. Hong et al. [3] used the difference resulting from the plaintext pair only. This is why we call this characteristic a related key truncated differential characteristic in this paper.
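The round function and key indexing above can be sketched in Python (an informal illustration of ours, not reference code); the derived key-index sequence is checked against the first row of Table 3:

```python
# Illustrative XTEA sketch in the Feistel formulation of the paper:
# round n uses sum = ((n-1)//2)*delta for odd n and (n//2)*delta for even n,
# with key index sum & 3 (odd rounds) or (sum >> 11) & 3 (even rounds).
DELTA = 0x9E3779B9
M32 = 0xFFFFFFFF

def xtea_round_params(n):
    """Return (sum, key index) used in Feistel round n (1-based)."""
    if n % 2 == 1:                        # n = 2i - 1
        s = (((n - 1) // 2) * DELTA) & M32
        return s, s & 3
    s = ((n // 2) * DELTA) & M32          # n = 2i
    return s, (s >> 11) & 3

def xtea_encrypt(L, R, key, rounds=64):
    for n in range(1, rounds + 1):
        s, idx = xtea_round_params(n)
        f = ((((R << 4) ^ (R >> 5)) + R) & M32) ^ ((s + key[idx]) & M32)
        L, R = R, (L + f) & M32
    return L, R

def xtea_decrypt(L, R, key, rounds=64):
    for n in range(rounds, 0, -1):
        s, idx = xtea_round_params(n)
        f = ((((L << 4) ^ (L >> 5)) + L) & M32) ^ ((s + key[idx]) & M32)
        L, R = (R - f) & M32, L
    return L, R

# Key indices of rounds 1..8 match the first row of Table 3: K0 K3 K1 K2 K2 K1 K3 K0
assert [xtea_round_params(n)[1] for n in range(1, 9)] == [0, 3, 1, 2, 2, 1, 3, 0]
assert xtea_decrypt(*xtea_encrypt(1, 2, (3, 4, 5, 6)), (3, 4, 5, 6)) == (1, 2)
```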
Fig. 1. 2i-th round of XTEA
3.2    8-round Related Key Truncated Differential Characteristic
In [3], S. Hong et al. suggested an 8-round truncated differential characteristic in order to attack 23 rounds of XTEA (8∼30 or 30∼52). Here, we construct a similar 8-round related key truncated differential characteristic; see Fig. 2. Let Ψ be this characteristic, let γ be 0^32 or e_30, and let RK_i be the round key of the i-th round. We consider identical input values (zero difference) to the i-th round and a related round key pair RK_i and RK'_i = RK_i ⊕ e_30. Then the value of the 30-th bit of the right output difference of the i-th round is always one, and the other bits are all zero (except for the 31-st bit, which is unknown but which we do not need to consider). That is, the output difference of the i-th round is (0, e_30) or (0, e_31 ⊕ e_30) with probability 1. As shown in Fig. 2, every bit has one of three possible colors: white, black, and gray. Every white bit denotes a zero difference; the bit which we focus on is the black bit. Note that
Table 3. Key schedule of XTEA

Round  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
Key    K0  K3  K1  K2  K2  K1  K3  K0  K0  K0  K1  K3  K2  K2  K3  K1

Round  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
Key    K0  K0  K1  K0  K2  K3  K3  K2  K0  K1  K1  K1  K2  K0  K3  K3

Round  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48
Key    K0  K2  K1  K1  K2  K1  K3  K0  K0  K3  K1  K2  K2  K1  K3  K1

Round  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
Key    K0  K0  K1  K3  K2  K2  K3  K2  K0  K1  K1  K0  K2  K3  K3  K2
Fig. 2. 8-round related key truncated differential characteristic Ψ (γ = 0^32 or e_30)
the value of the black bit does not change throughout Ψ, while its position is shifted to the right by up to 5 bits per round, and the values of the gray bits are irrelevant. That is, if for each j (i+1 ≤ j ≤ i+7) the relation between RK_j and RK'_j is RK'_j = RK_j ⊕ γ (γ = 0^32 or e_30), then by the property of the round function F of XTEA, the black bit will be located at bit position 0 of the left output difference with probability 1 after (i+7) rounds.

3.3    Related Key Truncated Differential Attacks on XTEA
Using the above 8-round related key truncated differential characteristic, we can attack 25 (1∼25) and 27 (4∼30) rounds of XTEA. Here, we apply conventional related key differential cryptanalysis as described in [4]. We exploit the property that if there exists a related key pair (K, K') such that a non-zero input difference of two plaintexts can be changed into a zero output difference, then we can bypass several rounds for free in our attack.

Attack on 25 (1∼25) Rounds of XTEA. We consider the related key pair K = (K0, K1, K2, K3) and K' = (K0 ⊕ e31... K3) — more precisely K' = (K0 ⊕ e30, K1, K2, K3) — in order to attack 25 (1∼25) rounds of XTEA. Then, according to the key schedule of XTEA, K0 and K0 ⊕ e30 are used in the first round. Assume that there exist plaintext-ciphertext pairs (P, C) and (P', C'), respectively encrypted under the master keys K and K', such that the first round output values of the two plaintexts P
and P' under the round keys K0 and K0 ⊕ e30 are the same, i.e. such that the output difference of the first round is zero. (Here, we call the first 32-bit blocks of K and K', namely K0 and K0 ⊕ e30, 'the related round key pair'.) Then, due to the key schedule of XTEA, we can bypass 6 rounds for free in our attack. This means that the input difference to the 8-th round is zero. According to the key schedule of XTEA, the related round key pair K0 and K0 ⊕ e30 is reused in the 8-th round. Now we can apply the 8-round related key truncated differential characteristic described in Fig. 2. As a result, bit position 0 of the left input difference to round 16, which is colored black in Fig. 3, is one with probability 1. In order to obtain the above assumed plaintext pair P and P', which has the same output value after the first round under the keys K and K' respectively, we consider the following 1-round structure of plaintexts S(P):

S(P) = {P, P ⊕ (e31, 0), P ⊕ (e30, 0), P ⊕ (e31 ⊕ e30, 0)}

We request the encryption of every plaintext in S(P) under the related key pair K = (K0, K1, K2, K3) and K' = (K0 ⊕ e30, K1, K2, K3), respectively. Let C(P) be the set of ciphertexts of the elements of S(P) under the key K, i.e. C(P) = {E_K(P), E_K(P ⊕ (e31, 0)), E_K(P ⊕ (e30, 0)), E_K(P ⊕ (e31 ⊕ e30, 0))}, and let C'(P) be the set of ciphertexts of the elements of S(P) under the key K', i.e. C'(P) = {E_K'(P), E_K'(P ⊕ (e31, 0)), E_K'(P ⊕ (e30, 0)), E_K'(P ⊕ (e31 ⊕ e30, 0))}. Then it is easy to see that there are exactly four plaintext pairs that have the same output value after the first round, i.e. we can obtain the required zero output difference after the first round. We denote these four plaintext pairs by (P_u, P'_u), where 1 ≤ u ≤ 4, and denote by (C_u = E_K(P_u), C'_u = E_K'(P'_u)) the ciphertexts corresponding to plaintext P_u under key K and plaintext P'_u under key K', respectively.
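The claim that every structure yields exactly four colliding pairs can be checked empirically. The sketch below (our own illustration) implements the first round of XTEA (where sum = 0, so the key term is just K0) and counts the pairs (v, w) that collide under the related round key pair, for random L, R, K0:

```python
import random

M32 = 0xFFFFFFFF
E30, E31 = 1 << 30, 1 << 31

def round1_output(L, R, k0):
    # First round of XTEA: sum = 0, so the key term reduces to K0.
    f = ((((R << 4) ^ (R >> 5)) + R) & M32) ^ k0
    return (L + f) & M32

random.seed(2004)
vs = [0, E31, E30, E31 ^ E30]       # the four offsets defining S(P)
for _ in range(100):
    L, R, K0 = (random.getrandbits(32) for _ in range(3))
    pairs = [(v, w) for v in vs for w in vs
             if round1_output(L ^ v, R, K0) == round1_output(L ^ w, R, K0 ^ E30)]
    assert len(pairs) == 4          # exactly four colliding pairs per structure
```

Intuitively, the two key terms differ only in bit 30, so their XOR masks differ by ±2^30 as integers, and flipping bits 30/31 of L can always compensate, once per choice of v.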
Now we use the above property of the black bit located in bit position 0 of the left input difference in round 16 in order to attack 25 (1∼25) rounds of XTEA and recover 111 bits of the subkey derived from the master key K. Algorithm 1 describes how to recover these 111 bits of subkey material from the ciphertexts. We compute the difference of every dotted bit position and the values of the bit pair of every gray bit position in order to get the black bit of the left half of the input difference in round 16 (see Fig. 3). In detail, in order to compute the black bit of the input difference of round 16, L16[0], we need to know the differences of R17[0], L17[0], and L17[5]. Also, in order to know the differences of R17[0], L17[0], and L17[5], we need to know the differences of L18[10] and R18[5]. (For the knowledge of these differences, we need L18[0∼10], R18[0∼5], and K0[0∼4].) Due to the structure of the round function of XTEA, the key and output bit positions related to the black bit increase by 5 bits per round. Consequently, if we guess all the bits of K0, K2, K3, and 15 bits of K1, and we also get the output pair after the 25-th round, we can compute the black bit of the input difference in round 16. In Algorithm 1, σ denotes the function which outputs the one-bit difference in L16[0] given a ciphertext pair and the guessed key bits. Let K be the concatenation of K1[0∼14], K2, and K3, i.e., K = K1[0∼14] || K2 || K3. Note that
K is a 79-bit string. In Algorithm 1, we guess K0 and K. Using this algorithm, we are able to find 111 bits of K = (K0, K1, K2, K3).
Input: m structures S(P^1), S(P^2), ..., S(P^m), and the corresponding m pairs of ciphertext sets (C(P^1), C'(P^1)), ..., (C(P^m), C'(P^m))
Output: 111-bit partial key value of K = (K0, K1, K2, K3)

1. For K0 = 0, 1, ..., 2^32 - 1
   1.1. For i = 1, ..., m
      1.1.1. Find the four plaintext pairs (P^i_u, P'^i_u), 1 ≤ u ≤ 4, such that for each u the following two conditions hold:
         (a) P^i_u = (L_{P^i} ⊕ v) || R_{P^i} and P'^i_u = (L_{P^i} ⊕ w) || R_{P^i} for some v, w ∈ {0, e31, e30, e31 ⊕ e30}
         (b) [(L_{P^i} ⊕ v) ⊞ F(R_{P^i}, K0)] ⊕ [(L_{P^i} ⊕ w) ⊞ F(R_{P^i}, K0 ⊕ e30)] = 0
         // Here, we use the notation L_{P^i} for the left half and R_{P^i} for the right half of the plaintext P^i.
   1.2. For K = 0, 1, ..., 2^79 - 1
      1.2.1. For i = 1, ..., m
         1.2.1.1. For u = 1, ..., 4
            Compute σ^i_u = σ(C^i_u, C'^i_u, K0, K)
            If i = m, u = 4, and σ^i_u = 1, then output K0, K and stop.
            Else if σ^i_u = 0, goto 1.2.

Algorithm 1. Related key truncated differential attack on 25 rounds of XTEA
The output of the algorithm is the right value of some 111 bits of K = (K0, K1, K2, K3) with high probability if m is sufficiently large. For each i (0 ≤ i ≤ 2^111 - 1), the probability that the attack algorithm outputs the i-th key candidate is (1 - 2^-4m)^i. So, the average success rate of this attack is

2^-111 · Σ_{i=0}^{2^111 - 1} (1 - 2^-4m)^i = 2^(4m-111) · (1 - (1 - 2^-4m)^(2^111)) ≈ 2^(4m-111) · (1 - e^(-2^(111-4m))).
Let k denote the size of the key space, so that the candidates range over 0, ..., 2^k - 1 (here k = 111). For m structures of plaintexts, the expected number of trials required until a wrong key candidate is discarded is 1 + 2^-1 + 2^-2 + · · · + 2^(-4m+1) = 2 - 2^(-4m+1). If a key candidate is right, then the number of trials is exactly 4m. Thus, the average number of trials in the attack is

2^-k · Σ_{i=0}^{2^k - 1} (i · (2 - 2^(-4m+1)) + 4m) = 4m + (1 - 2^-4m)(2^k - 1).
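The closed form above can be confirmed with exact rational arithmetic for small parameters (an illustrative check of ours; the k and m values are arbitrary):

```python
from fractions import Fraction

def avg_trials(k, m):
    # Average over all 2^k positions of the right key candidate.
    per_wrong = 2 - Fraction(1, 2 ** (4 * m - 1))      # 2 - 2^(-4m+1)
    return sum(i * per_wrong + 4 * m for i in range(2 ** k)) / Fraction(2 ** k)

def closed_form(k, m):
    return 4 * m + (1 - Fraction(1, 2 ** (4 * m))) * (2 ** k - 1)

for k in (4, 8, 10):
    for m in (1, 2, 3):
        assert avg_trials(k, m) == closed_form(k, m)   # identity holds exactly
```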
So, if we use m = 29 plaintext structures, the attack on 25 rounds of XTEA will succeed on average with probability 96.9%. This success rate implies that our
Fig. 3. Related key truncated differential attack on 25 rounds of XTEA
attack reduces almost the entire key space efficiently. It has a data complexity of 29 · 4 = 116 chosen plaintexts and a time complexity of (116 + (1 - 2^-116)(2^111 - 1)) · (6.5/25) · 2 ≈ 2^110.05 25-round XTEA encryptions.

Attack on 27 (4∼30) Rounds of XTEA. In the key schedule of XTEA, K3 is not used from the 24-th round until the 30-th round. This means that we may extend the attack by more rounds for free. Using this observation, we can attack 27 (4∼30) rounds of XTEA. This attack differs from the attack on 25 (1∼25) rounds of XTEA in only two aspects. One is the use of the related key pair K = (K0, K1, K2, K3) and K' = (K0, K1 ⊕ e30, K2, K3). The other is the use of 2-round plaintext structures S'(P). First, we describe what we mean by a 2-round plaintext structure. Let P be a plaintext and let A be the set of all 32-bit values whose lower 22 bits are fixed to 10 · · · 0. We define the 2-round structure of plaintexts S'(P) as follows:

S'(P) = {P} ∪ {P ⊕ (w, v) | w ∈ A, v ∈ X},
Table 4. Various attacks on 27-round variants of XTEA

Rounds        Key Bits                                    Relation of keys
13-th ∼ 39-th K1, K2, K3: 32 bits each, K0: 15 bits       K ⊕ K' = (0, 0, 0, e30)
17-th ∼ 43-rd K0, K1, K3: 32 bits each, K2: 15 bits       K ⊕ K' = (0, e30, 0, 0)
22-nd ∼ 48-th K1, K2, K3: 32 bits each, K0: 20 bits       K ⊕ K' = (0, 0, e30, 0)
31-st ∼ 57-th K0, K2, K3: 32 bits each, K1: 15 bits       K ⊕ K' = (e30, 0, 0, 0)
35-th ∼ 61-st K0, K1, K2: 32 bits each, K3: 15 bits       K ⊕ K' = (0, 0, e30, 0)
where X is the following set:

X = {01000010 · · · 0, 01000110 · · · 0, 01001110 · · · 0, 01011110 · · · 0, 01111110 · · · 0, 00111110 · · · 0, 11000010 · · · 0, 11000110 · · · 0, 11001110 · · · 0, 11011110 · · · 0, 11111110 · · · 0, 10111110 · · · 0}
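A quick count (our own illustration) confirms the structure size and the stated data complexity: A contributes 2^10 values (ten free top bits once the lower 22 bits are fixed), X contributes 12, so a structure holds 1 + 1024 · 12 = 12,289 plaintexts, and 121 structures give about 2^20.5 chosen plaintexts:

```python
import math

A_SIZE = 2 ** 10            # lower 22 bits fixed -> 10 free top bits
X_SIZE = 12                 # the twelve patterns listed above
struct_size = 1 + A_SIZE * X_SIZE
assert struct_size == 12289

total = 121 * struct_size   # 121 structures used in the 27-round attack
assert total == 1486969
assert abs(math.log2(total) - 20.5) < 0.01
```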
depicts these attacks. ‘Key Bits’ denotes the total number of bits in K0 , K1 , K2 , and K3 recovered by the attack.
308
Youngdai Ko et al.
Fig. 4. The i-th round of GOST: F adds the subkey Ki mod 2^32, applies the S-boxes S1, ..., S8, and rotates left by 11; (Li, Ri) is mapped to (Li+1, Ri+1)
4   Related Key Differential Attacks on GOST

In this section, we describe the specification of GOST and briefly introduce the differential cryptanalysis by H. Seki et al. of a reduced-round version of it [8]. Next, we show that we can distinguish full-round GOST from a random permutation with probability 1 − 2^−64 using a related key differential characteristic, and we also present a related key differential attack on 31 rounds of GOST. Finally, we propose a related key differential attack on full-round GOST.

4.1   Description of GOST and Previous Work
GOST is a 32-round Feistel block cipher with 64-bit block size and 256-bit key size. It iterates a simple round function F composed of key additions, eight different 4 × 4 S-boxes Si (1≤i≤8) and cyclic rotations. See Fig. 4. The key schedule of GOST is very simple. The 256-bit master key K is split into eight 32-bit blocks K1 , · · · , K8 , i.e. K = (K1 , · · · , K8 ) and each round uses one of them as shown in Table 5. Due to the subkey addition operation in the round function, the differential properties of GOST vary not only with the values of the input and output differences, but also with the value of the subkey itself. In order to minimize the dependence of the differential probability on the key, H. Seki et al. introduced the idea of using a set of differential characteristics [8]. They use two differential sets ∆ = {0abc} and ∇ = {abc0} where a, b, c ∈ {0, 1}, which respectively represent nonzero 4-bit input and output differences of an S-box. In addition,
Table 5. Key schedule of GOST

Round   1 ... 8     9 ... 16    17 ... 24   25 ... 32
Key     K1 ... K8   K1 ... K8   K1 ... K8   K8 ... K1
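The cipher just described can be sketched compactly. The GOST standard does not fix the S-boxes, so the sketch below uses eight placeholder permutations; `rol`-style rotation, `gost_f`, `subkeys`, `gost_encrypt` and `gost_decrypt` are our own helper names, not from the paper.

```python
import random

MASK32 = 0xFFFFFFFF

# Placeholder 4x4 S-boxes: GOST does not standardise them, so we use
# eight fixed pseudo-random permutations of {0,...,15} for illustration.
_rng = random.Random(1)
SBOXES = [_rng.sample(range(16), 16) for _ in range(8)]

def gost_f(r, k):
    """Round function F: add subkey mod 2^32, S-box layer, rotate left by 11."""
    t = (r + k) & MASK32
    t = sum(SBOXES[i][(t >> (4 * i)) & 0xF] << (4 * i) for i in range(8))
    return ((t << 11) | (t >> 21)) & MASK32

def subkeys(K):
    """Key schedule of Table 5: K1..K8 three times, then K8..K1."""
    return list(K) * 3 + list(reversed(K))

def gost_encrypt(block, ks):
    L, R = block
    for i, k in enumerate(ks):
        if i < 31:
            L, R = R, L ^ gost_f(R, k)
        else:                     # no swap in the last round
            L = L ^ gost_f(R, k)
    return (L, R)

def gost_decrypt(block, ks):
    # Feistel structure: decryption is the same walk with reversed subkeys.
    return gost_encrypt(block, list(reversed(ks)))
```

As in any Feistel cipher, the structure guarantees invertibility regardless of the S-box choice, which is why the placeholder S-boxes are harmless for illustrating the round structure and key schedule.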
Related Key Differential Attacks on 27 Rounds of XTEA     309

Fig. 5. 32-round related key differential characteristic of GOST (the difference (e31, e31) is preserved through all 32 rounds)
they computed the average probability of differentials for each S-box, denoted p_Si:

    p_Si = Prob{∆ --Si--> ∇}.

The value of Prob{∆ --Si--> ∇} varies from 0.30 to 0.75, depending both on the S-box Si and on the key value. See Table 6 (for more details, refer to [8]). Using this set of characteristics, they attack 13 rounds of GOST and also present a combined related key attack on 21 rounds of GOST.
4.2   Related Key Differential Attacks on GOST

Now we present several attacks on GOST.

Distinguishing Attack. We can distinguish full-round GOST from a truly random permutation with probability 1 − 2^−64 using a related key differential characteristic. Here, we consider an attacker that has two oracles O and O′. O is the oracle which, given a plaintext P, outputs the ciphertext E_K(P) under the key K = (K1, ..., K8). O′ is the oracle which, given a plaintext P, outputs the ciphertext E_K′(P) under the key K′ = (K1⊕e31, K2⊕e31, ..., K8⊕e31). The key K = (K1, ..., K8) is unknown to the attacker; however, he knows the relation K ⊕ K′ = (e31, ..., e31). Let us first consider the function E as GOST. In this case, if we query O for P = (PL, PR) and O′ for P′ = (PL⊕e31, PR⊕e31), and obtain the corresponding ciphertexts C and C′, then the output difference C ⊕ C′ of full-round GOST is always (e31, e31). More specifically, it is easy to see that for every round, the input difference of each S-box after key addition is zero with probability 1. Therefore the difference between the plaintexts, (e31, e31), is maintained after every round. In other words, this is a 32-round related key differential characteristic with probability 1 (see Fig. 5).
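This probability-1 characteristic is easy to check empirically. The sketch below again uses placeholder S-boxes (GOST's are not fixed by the standard); the key observation, that (x ⊕ e31) + (k ⊕ e31) ≡ x + k (mod 2^32) because the two sign-flips of 2^31 cancel modulo 2^32, makes the F inputs collide in every round, so the test holds for any S-box choice.

```python
import random

MASK32, E31 = 0xFFFFFFFF, 1 << 31
_rng = random.Random(2)
SBOXES = [_rng.sample(range(16), 16) for _ in range(8)]   # placeholder S-boxes

def gost_f(r, k):
    t = (r + k) & MASK32                                   # subkey addition mod 2^32
    t = sum(SBOXES[i][(t >> (4 * i)) & 0xF] << (4 * i) for i in range(8))
    return ((t << 11) | (t >> 21)) & MASK32                # <<< 11

def encrypt(block, ks):                                    # 32 Feistel rounds
    L, R = block
    for i, k in enumerate(ks):
        if i < 31:
            L, R = R, L ^ gost_f(R, k)
        else:
            L ^= gost_f(R, k)
    return (L, R)

def distinguish_trial(rng):
    K = [rng.getrandbits(32) for _ in range(8)]
    Kp = [k ^ E31 for k in K]                              # K' = K xor (e31,...,e31)
    sched = lambda key: key * 3 + key[::-1]                # key schedule of Table 5
    PL, PR = rng.getrandbits(32), rng.getrandbits(32)
    C = encrypt((PL, PR), sched(K))
    Cp = encrypt((PL ^ E31, PR ^ E31), sched(Kp))
    return (C[0] ^ Cp[0], C[1] ^ Cp[1])

# Every trial must give the output difference (e31, e31).
diffs = {distinguish_trial(random.Random(seed)) for seed in range(50)}
```

Since a truly random permutation would match this fixed output difference only with probability 2^−64 per query, observing it once already distinguishes GOST with overwhelming advantage.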
Fig. 6. 24-round related key differential characteristic (rounds 1∼24) with probability 1
If we instead consider the function E as a truly random permutation, then the output pair is unpredictable, so we cannot obtain any information from it. Hence we can successfully distinguish full-round GOST from a truly random permutation with high probability, namely 1 − 2^−64. Note that this distinguishing attack is possible with only two chosen plaintexts under the given key relation.

Related Key Differential Attack on 31 Rounds of GOST. We use the differential probability for each S-box presented in [8], as their differential characteristic enables us to mount a related key differential attack on 31 rounds of GOST. For this attack we consider the following two related keys K and K′:

K  = (K1, K2, K3, K4, K5, K6, K7, K8)
K′ = (K1⊕e31, K2, K3⊕e31, K4, K5⊕e31, K6, K7⊕e31, K8)

We request the encryption of P = (PL, PR) under the key K and of P′ = (PL, PR⊕e31) under the key K′. Then we obtain a 24-round related key differential characteristic with probability 1; see Fig. 6. With this 24-round related key differential characteristic, we can bypass 24 rounds for free in our attack. As shown in Fig. 6, the output difference of the 24-th round is (0, e31), i.e., the input difference of the 25-th round is (0, e31). We use the set of differential characteristics [8] mentioned in Section 4.1 to construct another 6-round related key differential characteristic from the 25-th round through the 30-th round; see Fig. 7. In this figure, # denotes an element of the differential set ∆ = {0abc}, where a, b, c ∈ {0, 1}. In the 25-th round, the input difference of
Table 6. Average differential probability of each S-box

p_S1   p_S2   p_S3   p_S4   p_S5   p_S6   p_S7   p_S8
0.43   0.38   0.37   0.37   0.37   0.35   0.47   0.45
Fig. 7. 6-round related key differential characteristic (rounds 25∼30)
round function F, 80000000_x, becomes 00000#00_x with probability 3/4.² After the 25-th round, each average related key differential probability is computed using Table 6 [8]. Thus, combining these two related key differential characteristics, we construct a 30-round related key differential characteristic from the 1-st round to the 30-th round of GOST with probability about 2^−23.33.
Now, we consider a 1R [1] related key differential attack on 31 rounds of GOST using the above constructed 30-round related key differential characteristic. Considering 2^26 chosen plaintext pairs, there remain about 2^12 ciphertext pairs after the filtering step. Among them, we expect that there exist at least 5 right pairs. A wrong key is counted with probability 2^−17 by the above 30-round related key differential characteristic. The signal-to-noise ratio S/N [1] of this related key differential characteristic is about 2^22.67. Thus, according to [9], we can recover the 32 bits of the 31-st round subkey with about 2^26 chosen plaintexts and time complexity (2^32 × 2^26 × 2^−14 × 1/31) ≈ 2^39, with an expected success rate of 97.9%.

² We only need to compute the probability Prob{1000 --S8--> ∇ = {abc0}} because of the structure of the round function F. This probability is easily checked by simulation and is also given in [8].

Full Rounds Attack on GOST. In this section, we suggest an algorithm to find 12 bits of K1 with a high probability of success. Consider P = (PL, PR) and P′ = (PL⊕e30, PR⊕e30) encrypted under the keys K = (K1, ..., K8) and K′ = (K1⊕e30, K2⊕e30, ..., K8⊕e30), respectively. Then, after key addition, in each round the input difference becomes 0 with probability 2^−1. Thus, we can construct a 30-round related key differential characteristic with probability 2^−30 as shown in Fig. 8. In this figure, white bits denote a zero difference, black bits denote a nonzero difference, and gray bits are unknown. Let C = (CL, CR) and C′ = (C′L, C′R) be the ciphertexts of P and P′ under the keys K and K′, respectively, and assume that (P, P′) is a right pair for a related key differential characteristic as described in Fig. 8 (i.e., the output difference after round 30 is (e30, e30)). Then there are four types of differential characteristics, C1, C2, C3 and C4, as listed below.

C1. CR ⊕ C′R = e30 and CL ⊕ C′L = e30.
This case means that the input differences of the S-boxes in round 31 and round 32 are 0, so we can recover K1[30] by checking CR[30] + K1[30] = C′R[30] + (K1[30] ⊕ 1), where "+" means integer addition.

C2. CR ⊕ C′R = e30, (CL ⊕ C′L)[0∼6] = 0, (CL ⊕ C′L)[11∼29] = 0 and (CL ⊕ C′L)[31] = 0. (Refer to Fig. 8a.)
This case means that the input difference of S8 in round 31 is zero, but in round 32 it is nonzero, so we can recover K1[30] by checking CR[30] + K1[30] = C′R[30] + (K1[30] ⊕ 1). Also, if such a pair is given, K1[28], K1[29] and K1[31] can be recovered by checking
S8(CR[28∼31] + K1[28∼31]) ⊕ CL[7∼10] = S8(C′R[28∼31] + K1[28∼31]) ⊕ C′L[7∼10]
or
S8(CR[28∼31] + 1 + K1[28∼31]) ⊕ CL[7∼10] = S8(C′R[28∼31] + 1 + K1[28∼31]) ⊕ C′L[7∼10].
If we add CR[0∼27] to K1[0∼27], a carry may occur at the 27-th bit position, so we need to check the above two equations. We denote S8(CR[28∼31] + K1[28∼31]) ⊕ CL[7∼10] by F_kj(CR[28∼31]) ⊕ CL[7∼10] in Algorithm 2.
C3. CR ⊕ C′R ≠ e30 and (CR ⊕ C′R)[7] = 0.
This case means that the input difference of S8 in round 31 is nonzero and (CR ⊕ C′R)[7] = 0. In this case, if we know K1[0∼11], we can compute the exact values of (F_K1(CR))[11∼22] and (F_K1(C′R))[11∼22]. So K1[8∼11] can be recovered by checking
(F_K1[0∼11](CR))[19∼22] ⊕ CL[19∼22] = (F_K1[0∼11](C′R))[19∼22] ⊕ C′L[19∼22].
Note that (F_K1[0∼7](CR))[11∼18] ⊕ CL[11∼18] = (F_K1[0∼7](C′R))[11∼18] ⊕ C′L[11∼18] for any arbitrary candidate key K1[0∼7], so we cannot find the right value of K1[0∼7] with high probability. That is the reason why we only consider recovering K1[8∼11].

Fig. 8. 32-round related key differential characteristic of GOST (a: case C2, b: case C4)
C4. CR ⊕ C′R ≠ e30 and (CR ⊕ C′R)[7] ≠ 0. (Refer to Fig. 8b.)
This case means that the input difference of S8 in round 31 is nonzero and (CR ⊕ C′R)[7] ≠ 0. By similar arguments as for case C3, we can recover K1[4∼11] by checking
(F_K1[0∼11](CR))[15∼22] ⊕ CL[15∼22] = (F_K1[0∼11](C′R))[15∼22] ⊕ C′L[15∼22].
Note that the key bits found in case C1 and case C3 can also be found in case C2 and case C4, respectively. An attack algorithm is given in Appendix A. Let us consider the success probability of Algorithm 2. If we choose 2^35 pairs, there exist at least 4 pairs satisfying conditions C2 and C4 respectively, with probability about 0.96. Also, there are at most 15 wrong pairs surviving the filtering step with probability about 0.99. Since the probability that a wrong key is counted at most 3 times in step 3 and step 4 is about 1, the success probability of Algorithm 2 is about 0.917, using about 2 × 2^35 encryptions.
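The probability-1/2 behaviour of the key-addition step under an e30 difference, which underlies the 2^−30 characteristic above, can be checked exhaustively on a reduced word size. The snippet below is a scaled-down stand-in, using 8-bit words and flipping the second-most-significant bit (the analogue of e30 for 32-bit GOST words):

```python
# Exhaustive check on 8-bit words: flipping the second-MSB (bit 6) in both
# the input x and the subkey k leaves x + k mod 2^8 unchanged in exactly
# half of all cases (namely when bit 6 of x and bit 6 of k differ, so the
# two +-2^6 shifts cancel); otherwise the sum moves by +-2^7.
N, FLIP = 8, 1 << 6
hits = sum(
    ((x ^ FLIP) + (k ^ FLIP)) % (1 << N) == (x + k) % (1 << N)
    for x in range(1 << N)
    for k in range(1 << N)
)
probability = hits / (1 << (2 * N))   # the per-round probability 2^-1
```

The same argument at bit position 30 of 32-bit words gives the per-round probability 2^−1, hence 2^−30 over the 30-round characteristic.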
5   Conclusion

We presented related key differential attacks on XTEA and GOST. In the case of XTEA, we use 121 structures to attack 27 rounds of XTEA with an expected success rate of 96.9%; this attack requires about 2^20.5 chosen plaintexts and 2^115.15 27-round XTEA encryptions. Furthermore, we can successfully distinguish the block cipher GOST from a random permutation with probability 1 − 2^−64 and attack full-round GOST. As a result, we can recover 12 bits of the master key with an expected success rate of 91.7%, using 2^35 chosen plaintexts and 2^36 encryption operations. We therefore believe that our results are valuable for analyzing the security of GOST.
Acknowledgements The authors are thankful to Deukjo Hong for discussing XTEA and very much appreciate Helena Handschuh’s and the anonymous referees’ aid in improving the presentation of this work.
References

[1] E. Biham and A. Shamir, "Differential Cryptanalysis of the Data Encryption Standard", Springer-Verlag, 1993.
[2] GOST, Gosudarstvennyi Standard 28147-89, "Cryptographic Protection for Data Processing Systems", Government Committee of the USSR for Standards, 1989.
[3] S. Hong, D. Hong, Y. Ko, D. Chang, W. Lee, and S. Lee, "Differential Cryptanalysis of TEA and XTEA", Pre-Proceedings of the 6th Annual International Conference on Information Security and Cryptology (ICISC '03), Lecture Notes in Computer Science, Springer-Verlag, 2003, pp. 413-428.
[4] J. Kelsey, B. Schneier, and D. Wagner, "Key Schedule Cryptanalysis of IDEA, G-DES, GOST, SAFER, and Triple-DES", Advances in Cryptology - CRYPTO '96, volume 1109 of Lecture Notes in Computer Science, Springer-Verlag, 1996, pp. 237-251.
[5] J. Kelsey, B. Schneier, and D. Wagner, "Related-Key Cryptanalysis of 3-WAY, Biham-DES, CAST, DES-X, NewDES, RC2, and TEA", Proceedings of the International Conference on Information and Communications Security (ICICS '97), volume 1334 of Lecture Notes in Computer Science, Springer-Verlag, 1997, pp. 233-246.
[6] D. Moon, K. Hwang, W. Lee, S. Lee, and J. Lim, "Impossible Differential Cryptanalysis of Reduced Round XTEA and TEA", Fast Software Encryption '02, volume 2365 of Lecture Notes in Computer Science, Springer-Verlag, 2002, pp. 49-60.
[7] R. Needham and D. Wheeler, "eXtended Tiny Encryption Algorithm", October 1997.
[8] H. Seki and T. Kaneko, "Differential Cryptanalysis of Reduced Rounds of GOST", Seventh Annual Workshop on Selected Areas in Cryptography (SAC '00), volume 2012 of Lecture Notes in Computer Science, Springer-Verlag, 2001, pp. 315-323.
[9] A. Selçuk and A. Biçak, "On Probability of Success in Linear and Differential Cryptanalysis", Third International Conference, SCN 2002, volume 2365 of Lecture Notes in Computer Science, Springer-Verlag, 2002, pp. 174-185.
[10] D. Wheeler and R. Needham, "TEA, a Tiny Encryption Algorithm", Fast Software Encryption, Second International Workshop Proceedings, volume 1008 of Lecture Notes in Computer Science, Springer-Verlag, 1995, pp. 97-110.
Appendix A

Assumption: The attacker knows that K ⊕ K′ is equal to (e30, e30, ..., e30)
Input: (Pi, P′i), i = 1, ..., 2^35, where Pi ⊕ P′i = (e30, e30) as in Fig. 8
Output: 12-bit partial key K1: K1[4∼11] and K1[28∼31]

// Setup stage //
· Let K = {k_1, k_2, ..., k_{2^4}} and K′ = {k′_1, k′_2, ..., k′_{2^8}} be the sets of candidate keys for K1[28∼31] and K1[4∼11], respectively
· D, D′: empty sets
· ctr_1 = 0, ..., ctr_{2^4} = 0; ctr′_1 = 0, ..., ctr′_{2^8} = 0

// Filtering step //
1. For i = 1, ..., 2^35:
   1.1. Request the ciphertexts Ci = E_K(Pi) and C′i = E_K′(P′i).
        If Ci ⊕ C′i satisfies condition C2, D = D ∪ {(Ci, C′i)}.
        If Ci ⊕ C′i satisfies condition C4, D′ = D′ ∪ {(Ci, C′i)}.

// Finding key K1[28∼31] //
2. For each (Ci, C′i) ∈ D:     /* for convenience let Ci = (CL, CR) and C′i = (C′L, C′R) */
   2.1. For j = 1, ..., 2^4:
        If F_kj(CR[28∼31]) ⊕ CL[7∼10] = F_kj(C′R[28∼31]) ⊕ C′L[7∼10], ctr_j += 1.
        If F_kj(CR[28∼31] + 1) ⊕ CL[7∼10] = F_kj(C′R[28∼31] + 1) ⊕ C′L[7∼10], ctr_j += 1.
        If ctr_j ≥ 4, output kj as K1[28∼31] and go to step 3.

// Finding key K1[4∼11] //
3. For each (Cm, C′m) ∈ D′:    /* for convenience let Cm = (CL, CR) and C′m = (C′L, C′R) */
   3.1. For j = 1, ..., 2^8:
   3.2.   For i = 0, ..., 2^4 − 1:
          If F_{k′j||i}(CR)[15∼22] ⊕ CL[15∼22] = F_{k′j||i}(C′R)[15∼22] ⊕ C′L[15∼22], ctr′_j += 1.
          /* k′j denotes K1[4∼11] and i denotes K1[0∼3], so k′j||i denotes K1[0∼11] */
   3.3. If ctr′_j ≥ 4, output k′j as K1[4∼11] and terminate this algorithm.

Algorithm 2: 32-round related key differential attack on GOST.
On the Additive Differential Probability of Exclusive-Or

Helger Lipmaa¹, Johan Wallén¹, and Philippe Dumas²

¹ Laboratory for Theoretical Computer Science, Helsinki University of Technology, P.O. Box 5400, FIN-02015 HUT, Espoo, Finland
{helger,johan}@tcs.hut.fi
² Algorithms Project, INRIA Rocquencourt, 78153 Le Chesnay Cedex, France
[email protected]
Abstract. We study the differential probability adp⊕ of exclusive-or when differences are expressed using addition modulo 2^N. This function is important when analysing symmetric primitives that mix exclusive-or and addition—especially when addition is used to add in the round keys. (Such primitives include idea, Mars, rc6 and Twofish.) We show that adp⊕ can be viewed as a formal rational series with a linear representation in base 8. This gives a linear-time algorithm for computing adp⊕, and enables us to compute several interesting properties like the fraction of impossible differentials, and the maximal differential probability for any given output difference. Finally, we compare our results with the dual results of Lipmaa and Moriai on the differential probability of addition modulo 2^N when differences are expressed using exclusive-or.

Keywords: Additive differential probability, differential cryptanalysis, rational series.
1   Introduction

Symmetric cryptographic primitives like block ciphers are typically constructed from a small set of simple building blocks like bitwise exclusive-or and addition modulo 2^N. Surprisingly little is known about how these two operations interact with respect to different cryptanalytic attacks, and some of the fundamental relations between them have been established only recently [LM01, Lip02, Wal03]. Our goal is to shed light on this question by studying the interaction of these two operations in one concrete application, differential cryptanalysis [BS91]: we study the differential probability of exclusive-or when differences are expressed using addition modulo 2^N. This problem is dual to the one explored by Lipmaa and Moriai [LM01, Lip02]. We hope that our results will be helpful in evaluating the precise security of ciphers that mix addition and exclusive-or against differential cryptanalysis.
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 317–331, 2004. © International Association for Cryptologic Research 2004
318
Helger Lipmaa et al.
Differential Cryptanalysis. Differential cryptanalysis studies the propagation of differences in functions. Let G, H be Abelian groups and let f : G → H be a function. The input difference x − x* ∈ G is said to propagate to the output difference f(x) − f(x*) ∈ H through f. A differential of f is a pair (α, β) ∈ G × H. This is usually denoted by α → β. If the difference between x, x* ∈ G is x − x* = α, the differential α → β can be used to predict the corresponding output difference f(x) − f(x*). It is thus natural to measure the efficiency of a differential by its differential probability

    dp^f(α → β) = Pr_{x∈G}[f(x + α) − f(x) = β].
When a cipher uses both bitwise exclusive-or and addition modulo 2^N, both operators are natural choices for expressing differences (depending on how the round keys are added). Depending on this choice, one must study either the differential properties of addition when differences are expressed using exclusive-or, or the dual differential probability of exclusive-or when differences are expressed using addition modulo 2^N. The differential probability of addition was studied in detail in [LM01, Lip02]. However, the dual differential probability of exclusive-or has remained open. This dual case is just as interesting in practice, since most of the popular block ciphers that mix addition and exclusive-or use addition—and not exclusive-or—for adding in the round keys. (Examples include idea [LMM91], Mars [BCD+98], rc6 [RRSY98] and Twofish [SKW+99].)

We will exclusively deal with the set {0, 1, ..., 2^N − 1} equipped with two group operations. On one hand, we use the usual addition modulo 2^N, which we denote by +. On the other hand, we identify {0, 1, ..., 2^N − 1} and the set Z_2^N of N-tuples of bits using the natural correspondence that identifies x_{N−1}·2^{N−1} + ··· + x_1·2 + x_0 ∈ Z_{2^N} with (x_{N−1}, ..., x_1, x_0) ∈ Z_2^N. In this way the usual componentwise addition ⊕ in Z_2^N (or bitwise exclusive-or) carries over to a group operation on {0, 1, ..., 2^N − 1}. We can thus especially view ⊕ as a function ⊕ : Z_{2^N} × Z_{2^N} → Z_{2^N}. We call the differential probability of the resulting mapping the additive differential probability of exclusive-or and denote it by adp⊕ : Z³_{2^N} → [0, 1],

    adp⊕(α, β → γ) = Pr_{x,y}[((x + α) ⊕ (y + β)) − (x ⊕ y) = γ].   (1)

The dual mapping, the exclusive-or differential probability of addition, denoted xdp+ : Z³_{2^N} → [0, 1], is given by

    xdp+(α, β → γ) = Pr_{x,y}[((x ⊕ α) + (y ⊕ β)) ⊕ (x + y) = γ].

This dual mapping was studied in detail by Lipmaa and Moriai [LM01, Lip02], who gave a closed formula for xdp+. Their formula in particular leads to a Θ(log N)-time algorithm for computing xdp+ and the differential probability of some related mappings like the pseudo-Hadamard transform [Lip02].
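Both probabilities can be computed straight from the definitions by exhaustive enumeration for small N; `adp_xor` and `xdp_add` below are our own names for (1) and its dual. This brute force costs Θ(2^{2N}) per differential, which is the baseline that the linear-time algorithm of Sect. 2 improves on.

```python
from fractions import Fraction

def adp_xor(alpha, beta, gamma, N):
    """adp⊕(α, β → γ) of (1): differences are additive, the function is XOR."""
    M = 1 << N
    hits = sum(
        ((((x + alpha) % M) ^ ((y + beta) % M)) - (x ^ y)) % M == gamma
        for x in range(M) for y in range(M)
    )
    return Fraction(hits, M * M)

def xdp_add(alpha, beta, gamma, N):
    """xdp+(α, β → γ): differences are XOR, the function is addition mod 2^N."""
    M = 1 << N
    hits = sum(
        (((x ^ alpha) + (y ^ beta)) % M) ^ ((x + y) % M) == gamma
        for x in range(M) for y in range(M)
    )
    return Fraction(hits, M * M)
```

For instance, adp_xor(0, 0, 0, N) = xdp_add(0, 0, 0, N) = 1 for any N, since a zero input difference always propagates to a zero output difference.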
Our contributions. In this paper, we present a detailed analysis of the mapping adp⊕ : Z³_{2^N} → [0, 1]. This concrete problem has been addressed (and in a rather ad hoc manner) in a few papers, including [Ber92], but it has never been addressed completely—probably because of its "apparent complexity" [Ber92]. We show that adp⊕ can be expressed as a formal rational series in the sense of formal language theory, with a linear representation in base 8. That is, there are eight square matrices A_i, a column vector C and a row vector L, such that if we write the differential (α, β → γ) as an octal word w = w_{N−1} ··· w_1 w_0 in a natural way,

    adp⊕(α, β → γ) = adp⊕(w) = L A_{w_{N−1}} ··· A_{w_1} A_{w_0} C.

This representation immediately gives a linear-time algorithm for computing adp⊕. This should be compared to the naïve Θ(2^{2N})-time algorithm, which seems to be the only previously known algorithm for adp⊕. In addition, we derive some other properties, like the fraction 3/7 + (4/7)·(1/8^N) of differentials with nonzero probability, and determine the maximal differential probability max_{α,β} adp⊕(α, β → γ) for any given output difference γ. Finally, we show how our approach based on rational series can easily be adapted to study the dual mapping xdp+.

The paper is organised as follows. We first show that adp⊕ is a rational series and derive a linear representation for it. This gives an efficient algorithm that computes adp⊕(w) in time O(|w|). In Sect. 3, we discuss the distribution of adp⊕ and differentials with maximal probability. Sect. 4 describes how similar methods can be used to analyse xdp+. The appendix contains some omitted proofs.
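The fraction 3/7 + (4/7)·(1/8^N) of possible differentials can be cross-checked by brute force for small N; `adp_nonzero` and `possible_fraction` are our own helper names for this sanity check.

```python
from fractions import Fraction

def adp_nonzero(alpha, beta, gamma, N):
    """True iff adp⊕(α, β → γ) > 0, by exhaustive search over (x, y)."""
    M = 1 << N
    return any(
        ((((x + alpha) % M) ^ ((y + beta) % M)) - (x ^ y)) % M == gamma
        for x in range(M) for y in range(M)
    )

def possible_fraction(N):
    """Fraction of the 8^N differentials (α, β → γ) with nonzero adp⊕."""
    M = 1 << N
    count = sum(
        adp_nonzero(a, b, g, N)
        for a in range(M) for b in range(M) for g in range(M)
    )
    return Fraction(count, M ** 3)
```

For N = 1 this gives 1/2 (the four possible octal digits are 0, 3, 5 and 6), matching the closed form 3/7 + 4/56.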
2   Rational Series adp⊕

Throughout this paper, we let N denote the default word length. We will consider adp⊕ as a function of octal words by writing the differential (α, β → γ) as the octal word w = w_{N−1} ··· w_0, where w_i = α_i·4 + β_i·2 + γ_i. This defines adp⊕ as a function from the octal words of length N to the interval [0, 1]. As N varies over the set of nonnegative integers, we obtain a function from the set of all octal words to [0, 1]. In the terminology of formal language theory, the additive differential probability adp⊕ is a formal series over the monoid of octal words with coefficients in the field of real numbers. A remarkable subset of these series is the set of rational series [BR88]. One possible characterisation of such a rational series S is the following: there exist a square matrix A_k of size q × q for each letter k in the alphabet, a row matrix L of size 1 × q and a column matrix C of size q × 1 such that for each word w = w_1 ··· w_ℓ, the value of the series is S(w) = L A_{w_1} ··· A_{w_ℓ} C. The family ⟨L, (A_k)_k, C⟩ is called a linear representation of dimension q of the rational series. In our case, the alphabet is the octal alphabet {0, 1, ..., 7}.

Theorem 1 (Linear representation of adp⊕). The formal series adp⊕ has the 8-dimensional linear representation ⟨L, (A_k)_k, C⟩, where L = (1 1 1 1 1 1 1 1),
C = (1 0 0 0 0 0 0 0)^T,

          ⎛4 0 0 1 0 1 1 0⎞
          ⎜0 0 0 1 0 1 0 0⎟
          ⎜0 0 0 1 0 0 1 0⎟
A_0 = 1/4 ⎜0 0 0 1 0 0 0 0⎟
          ⎜0 0 0 0 0 1 1 0⎟
          ⎜0 0 0 0 0 1 0 0⎟
          ⎜0 0 0 0 0 0 1 0⎟
          ⎝0 0 0 0 0 0 0 0⎠

and A_k, k ≠ 0, is obtained from A_0 by permuting row i with row i⊕k and column j with column j⊕k: (A_k)_{ij} = (A_0)_{i⊕k, j⊕k}. (For completeness, the matrices A_0, ..., A_7 are given in Table 1.) Thus, adp⊕ is a rational series.

For example, the differential (α, β → γ) = (00110, 10100 → 01110) corresponds to the octal word w = 21750 and adp⊕(α, β → γ) = adp⊕(w) = L A_2 A_1 A_7 A_5 A_0 C = 5/32. The linear representation immediately implies that adp⊕(w) can be computed using O(|w|) arithmetic operations. Since the arithmetic operations can be carried out using 2|w|-bit integer arithmetic, which can be implemented in constant time on a |w|-bit ram model, we have

Corollary 1. The additive differential probability adp⊕(w) can be computed in time O(|w|) on a standard unit cost |w|-bit ram model of computation.

This can be compared with the O(log|w|)-time algorithm for computing xdp+(w) from [LM01]. As a side remark (we will not use this result later), note that the matrices A_0, ..., A_7 in the linear representation of adp⊕ are substochastic. Thus, we could view the linear representation as an inhomogeneous Markov chain by adding a dummy state and dummy state transitions.

The rest of this section is devoted to the technical proof of Theorem 1. To prove this result, we will first give a different formulation of adp⊕. For x, y ∈ {0, ..., 2^N − 1}, let xy denote their componentwise product in Z_2^N (equivalently, the bitwise and of two N-bit strings). Let

    borrow(x, y) = x ⊕ y ⊕ (x − y)
Table 1. All eight matrices A_0, ..., A_7. (Each A_k follows from A_0 via (A_k)_{ij} = (A_0)_{i⊕k, j⊕k}; the explicit 8 × 8 matrices are omitted here.)
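Theorem 1 translates directly into code. The sketch below builds A_0 from the statement above, derives the other A_k by the index-XOR permutation, and evaluates L·A_{w_{N−1}}···A_{w_0}·C with exact rational arithmetic; `adp_matrix` is our own name. It reproduces the worked example adp⊕(00110, 10100 → 01110) = 5/32.

```python
from fractions import Fraction

# 4*A0 from Theorem 1 (integer entries; the 1/4 factor is applied below).
A0_ROWS = [
    [4, 0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
# (A_k)_ij = (A_0)_{i xor k, j xor k}
A = [
    [[Fraction(A0_ROWS[i ^ k][j ^ k], 4) for j in range(8)] for i in range(8)]
    for k in range(8)
]

def adp_matrix(alpha, beta, gamma, N):
    """adp⊕(α, β → γ) via the linear representation, in time O(N)."""
    v = [Fraction(1), 0, 0, 0, 0, 0, 0, 0]      # the column vector C
    for i in range(N):                           # least significant digit first
        w_i = ((alpha >> i) & 1) * 4 + ((beta >> i) & 1) * 2 + ((gamma >> i) & 1)
        v = [sum(A[w_i][r][c] * v[c] for c in range(8)) for r in range(8)]
    return sum(v)                                # L = (1,...,1) sums the entries

# The worked example: w = 21750 in octal, i.e. (00110, 10100 -> 01110).
example = adp_matrix(0b00110, 0b10100, 0b01110, 5)
```

Processing the octal digits from least to most significant applies A_{w_0} to C first, exactly matching the matrix product L A_{w_{N−1}} ··· A_{w_0} C in the theorem.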
denote the borrows, as an N-tuple of bits, in the subtraction x − y. Alternatively, borrow(x, y) can be defined recursively by borrow(x, y)_0 = 0 and borrow(x, y)_{i+1} = 1 if and only if x_i − borrow(x, y)_i < y_i as integers. This can be used to define borrow(x, y)_N = 1 if and only if x_{N−1} − borrow(x, y)_{N−1} < y_{N−1} as integers. The borrows can be used to give an alternative formulation of adp⊕.

Lemma 1. For all α, β, γ ∈ Z_{2^N},

    adp⊕(w) = Pr_{x,y}[a ⊕ b ⊕ c = α ⊕ β ⊕ γ],

where a = borrow(x, α), b = borrow(y, β) and c = borrow(x⊕y, (x−α)⊕(y−β)).

Proof. By replacing x and y with x − α and y − β in the definition (1) of adp⊕, we see that adp⊕(α, β → γ) = Pr_{x,y}[(x ⊕ y) − ((x − α) ⊕ (y − β)) = γ]. Since (x ⊕ y) − ((x − α) ⊕ (y − β)) = γ if and only if γ = c ⊕ x ⊕ y ⊕ (x − α) ⊕ (y − β) = a ⊕ b ⊕ c ⊕ α ⊕ β if and only if a ⊕ b ⊕ c = α ⊕ β ⊕ γ, the result follows.

We furthermore need the following technical lemma.

Lemma 2. For all x, y, α, β, γ,

    a_{i+1} = (aa′ ⊕ α ⊕ a′x)_i ,
    b_{i+1} = (bb′ ⊕ β ⊕ b′y)_i and
    c_{i+1} = [c ⊕ a′ ⊕ b′ ⊕ c(a′ ⊕ b′) ⊕ (a′ ⊕ b′)(x ⊕ y)]_i ,

where a = borrow(x, α), b = borrow(y, β), c = borrow(x ⊕ y, (x − α) ⊕ (y − β)), a′ = a ⊕ α and b′ = b ⊕ β.

Proof. By the recursive definition of borrow(x, y), borrow(x, y)_{i+1} = 1 if and only if x_i < y_i + borrow(x, y)_i as integers. The latter event occurs if and only if either y_i = borrow(x, y)_i and at least two of x_i, y_i and borrow(x, y)_i are one, or y_i ≠ borrow(x, y)_i and at least two of x_i, y_i and borrow(x, y)_i are zero. That is, borrow(x, y)_{i+1} = 1 if and only if [y ⊕ borrow(x, y) ⊕ maj(x, y, borrow(x, y))]_i = 1, where maj(u, v, w) denotes the majority of the bits u, v, w. Since maj(u, v, w) = uv ⊕ uw ⊕ vw, we have borrow(x, y)_{i+1} = [y ⊕ borrow(x, y) ⊕ xy ⊕ x borrow(x, y) ⊕ y borrow(x, y)]_i. For a, we thus have a_{i+1} = (α ⊕ a ⊕ xα ⊕ xa ⊕ αa)_i = (a ⊕ a′α ⊕ a′x)_i = [a ⊕ a′(a′ ⊕ a) ⊕ a′x]_i = (aa′ ⊕ α ⊕ a′x)_i. The formula for b_{i+1} is completely analogous. For c, we have c_{i+1} = [(x−α) ⊕ (y−β) ⊕ c ⊕ (x⊕y)((x−α) ⊕ (y−β)) ⊕ (x⊕y)c ⊕ ((x−α) ⊕ (y−β))c]_i = [x ⊕ a′ ⊕ y ⊕ b′ ⊕ c ⊕ (x⊕y)(x ⊕ a′ ⊕ y ⊕ b′) ⊕ (x⊕y)c ⊕ (x ⊕ a′ ⊕ y ⊕ b′)c]_i = [c ⊕ a′ ⊕ b′ ⊕ c(a′ ⊕ b′) ⊕ (a′ ⊕ b′)(x ⊕ y)]_i.

Proof (of Theorem 1). Let (α, β → γ) be the differential associated with the word w. Denote N = |w| and let x, y be uniformly distributed random variables in Z_{2^N}. Denote a = borrow(x, α), b = borrow(y, β) and c = borrow(x⊕y, (x−α)⊕(y−β)). Let ξ be the octal word of borrow triples, ξ_i = a_i·4 + b_i·2 + c_i. We define ξ_N in the natural way using borrow(u, v)_N = 1 if and only if u_{N−1} − borrow(u, v)_{N−1} < v_{N−1} as integers. For compactness, denote xor(w) = α ⊕ β ⊕ γ and xor(ξ) = a ⊕ b ⊕ c. Let P(w, k) be the 8 × 1 substochastic matrix

    P_j(w, k) = Pr_{x,y}[xor(ξ) ≡ xor(w) (mod 2^k), ξ_k = j]

for 0 ≤ k ≤ N. Let M(w, k) be the 8 × 8 substochastic transition matrix

    M_{ij}(w, k) = Pr_{x,y}[xor(ξ)_k = xor(w)_k, ξ_{k+1} = i | xor(ξ) ≡ xor(w) (mod 2^k), ξ_k = j]

for 0 ≤ k < N. Then P_i(w, k+1) = Σ_j M_{ij}(w, k) P_j(w, k) and thus P(w, k+1) = M(w, k) P(w, k). Note furthermore that P(w, 0) = C, since a_0 = b_0 = c_0 = 0, and that L P(w, N) = Σ_j Pr_{x,y}[xor(ξ) ≡ xor(w) (mod 2^N), ξ_N = j] = Pr_{x,y}[xor(ξ) ≡ xor(w) (mod 2^N)] = adp⊕(w), where the last equality is due to Lemma 1. We will show that M(w, k) = A_{w_k} for all k. By induction, it follows that adp⊕(w) = L P(w, N) = L M(w, N−1) ··· M(w, 0) C = L A_{w_{N−1}} ··· A_{w_0} C.

It remains to show that M(w, k) = A_{w_k} for all k. Towards this end, let x, y be such that xor(ξ) ≡ xor(w) (mod 2^k) and ξ_k = j. We will count the number of ways we can choose (x_k, y_k) such that xor(ξ)_k = xor(w)_k and ξ_{k+1} = i. Denote a′ = a ⊕ α, b′ = b ⊕ β and c′ = c ⊕ γ. Note that xor(ξ)_k = xor(w)_k if and only if c′_k = (a′ ⊕ b′)_k. Under the assumption that xor(ξ)_k = xor(w)_k, we have (cc′ ⊕ γ)_k = [c(a′ ⊕ b′) ⊕ c ⊕ a′ ⊕ b′]_k. By Lemma 2, (x_k, y_k) must thus be a solution to V (x_k, y_k)^T = U in Z_2, where U and V are the matrices

        ⎛(aa′ ⊕ α)_k ⊕ a_{k+1}⎞           ⎛    a′_k          0     ⎞
    U = ⎜(bb′ ⊕ β)_k ⊕ b_{k+1}⎟  and  V = ⎜     0           b′_k   ⎟
        ⎝(cc′ ⊕ γ)_k ⊕ c_{k+1}⎠           ⎝(a′ ⊕ b′)_k  (a′ ⊕ b′)_k⎠

over Z_2. If this equation has a solution, it has exactly 2^{2−rank(V)} solutions. But rank(V) = 0 if and only if a′_k = b′_k = 0 (then there are 4 solutions) and rank(V) = 2 otherwise (then there is 1 solution). The equation has a solution (x_k, y_k) exactly when rank(V) = rank(V|U). From this and from the requirement that c′_k = (a′ ⊕ b′)_k, we see that there are solutions exactly in the following cases.

– If a′_k = b′_k = 0, then c′_k = 0 and rank(V) = 0. There are solutions (4 solutions) if and only if a_{k+1} = α_k, b_{k+1} = β_k and c_{k+1} = γ_k.
– If a′_k = 0 and b′_k = 1, then c′_k = 1 and rank(V) = 2. There is a single solution if and only if a_{k+1} = α_k.
– If a′_k = 1 and b′_k = 0, then c′_k = 1 and rank(V) = 2. There is a single solution if and only if b_{k+1} = β_k.
– If a′_k = 1 and b′_k = 1, then c′_k = 0 and rank(V) = 2. There is a single solution if and only if c_{k+1} = γ_k.
On the Additive Differential Probability of Exclusive-Or
Since j = ξ_k = a_k4 + b_k2 + c_k and i = ξ_{k+1} = a_{k+1}4 + b_{k+1}2 + c_{k+1}, the derivation so far can be summarised as

M_{ij}(w, k) = 1    if j = (α_k, β_k, γ_k) and i = (α_k, β_k, γ_k) ,
             = 1/4  if j = (α_k, β_k ⊕ 1, γ_k ⊕ 1) and i = (α_k, ∗, ∗) ,
             = 1/4  if j = (α_k ⊕ 1, β_k, γ_k ⊕ 1) and i = (∗, β_k, ∗) ,
             = 1/4  if j = (α_k ⊕ 1, β_k ⊕ 1, γ_k) and i = (∗, ∗, γ_k) ,
             = 0    otherwise ,

where we have identified the integer r_2·4 + r_1·2 + r_0 with the binary tuple (r_2, r_1, r_0) and ∗ represents an arbitrary element of {0, 1}. It follows that M(w, k) = A_0 if w_k = 0 and that M_{i,j}(w, k) = M_{i⊕w_k, j⊕w_k}(0, k). That is, M(w, k) = A_{w_k} for all w, k. This completes the proof.
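The linear representation lends itself to a direct implementation. The sketch below is our illustration, not code from the paper: it fills in A_0 from the description of M_{ij}(0, k) above, derives the other seven matrices by the index shift M_{i,j}(w, k) = M_{i⊕w_k, j⊕w_k}(0, k), and evaluates a word by multiplying from the right. A brute-force evaluation of the definition of adp⊕ is included for cross-checking.

```python
from fractions import Fraction

def build_matrices():
    # A_0 from M_ij(0, k): entry 1 at (0, 0); entries 1/4 in columns 3, 5, 6
    q = Fraction(1, 4)
    A0 = [[Fraction(0)] * 8 for _ in range(8)]
    A0[0][0] = Fraction(1)
    for i in (0, 1, 2, 3):          # j = (0,1,1) = 3, i = (0,*,*)
        A0[i][3] = q
    for i in (0, 1, 4, 5):          # j = (1,0,1) = 5, i = (*,0,*)
        A0[i][5] = q
    for i in (0, 2, 4, 6):          # j = (1,1,0) = 6, i = (*,*,0)
        A0[i][6] = q
    # A_w[i][j] = A_0[i xor w][j xor w]
    return [[[A0[i ^ w][j ^ w] for j in range(8)] for i in range(8)]
            for w in range(8)]

A = build_matrices()

def adp_xor(word):
    """adp_xor(w) = L A_{w_{N-1}} ... A_{w_0} C for an octal string w."""
    v = [Fraction(1)] + [Fraction(0)] * 7      # C = e_0
    for ch in reversed(word):                  # rightmost letter acts first
        M = A[int(ch, 8)]
        v = [sum(M[i][j] * v[j] for j in range(8)) for i in range(8)]
    return sum(v)                              # L = (1, ..., 1)

def adp_xor_bruteforce(word):
    """Pr over x, y that ((x+alpha) xor (y+beta)) == ((x xor y) + gamma)."""
    N = len(word)
    mask = (1 << N) - 1
    alpha = beta = gamma = 0
    for k, ch in enumerate(reversed(word)):
        d = int(ch, 8)
        alpha |= ((d >> 2) & 1) << k
        beta |= ((d >> 1) & 1) << k
        gamma |= (d & 1) << k
    hits = sum(1 for x in range(1 << N) for y in range(1 << N)
               if (((x + alpha) & mask) ^ ((y + beta) & mask))
                  == (((x ^ y) + gamma) & mask))
    return Fraction(hits, 1 << (2 * N))
```

For example, `adp_xor("30")` returns 1, in line with the effect of trailing zeros discussed below, and the two functions agree on every two-letter word.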
3 Distribution of adp⊕ and Maximal Differentials

3.1 Distribution
We will use notation from formal languages to describe octal words (and thus differentials). In particular, we will use concatenation (xy), the corresponding powers (x^0 = λ is the empty word and x^{n+1} = xx^n), union (x + y) and the Kleene star (x∗ = ∪_{n≥0} x^n). Throughout this section, (L, (A_k)_k, C) is the linear representation of adp⊕. We will first consider the effect of tailing and leading zeros.

Corollary 2. For all octal words w, adp⊕(w0∗) = adp⊕(w).

This trivial result follows from the observation that A_0C = C.

Corollary 3. Let w be a word and let a = (a_0, . . . , a_7)^T = A_{w_{|w|−1}} · · · A_{w_0}C. Let α = a_0 and β = a_3 + a_5 + a_6. Let w′ be a word of the form w′ = 0∗w. Then adp⊕(w′) = α + β/3 + (8/3) · β · 4^{−(|w′|−|w|)}.

Proof. Using a Jordan form J = P^{−1}A_0P of A_0, it is easy to see that

A_0^k = 4^{−k} ·
( 4^k  0  0  (4^k−1)/3  0  (4^k−1)/3  (4^k−1)/3  0 )
( 0    0  0  1          0  1          0          0 )
( 0    0  0  1          0  0          1          0 )
( 0    0  0  1          0  0          0          0 )
( 0    0  0  0          0  1          1          0 )
( 0    0  0  0          0  1          0          0 )
( 0    0  0  0          0  0          1          0 )
( 0    0  0  0          0  0          0          0 ) .

If we let j = |w′| − |w|, we see that LA_0^j a = a_0 + ((4^j − 1)/(3 · 4^j))(a_3 + a_5 + a_6) + (3/4^j)(a_3 + a_5 + a_6) = α + β/3 + (8/3) · β · 4^{−(|w′|−|w|)}.
This means that adp⊕(0^n w) decreases with n and adp⊕(0^n w) → α + β/3 as n → ∞. This can be compared to [LM01], where it was shown that xdp+(00w) = xdp+(0w) for all w.

Theorem 2. The additive differential probability adp⊕(w) is nonzero if and only if w has the form w = 0∗ or w = w′(3 + 5 + 6)0∗ for some octal word w′.

Proof. Since A_1C = A_2C = A_4C = A_7C = 0, adp⊕(w′(1 + 2 + 4 + 7)0∗) = 0. Conversely, let w be a word of the form w = w′(3 + 5 + 6)0∗. Let e_i be the canonical (column) basis vector with a 1 in the ith component and 0 in the others. By direct computation, the kernels are ker A_0 = ker A_3 = ker A_5 = ker A_6 = ⟨e_1, e_2, e_4, e_7⟩ and ker A_1 = ker A_2 = ker A_4 = ker A_7 = ⟨e_0, e_3, e_5, e_6⟩. For all i and all j ≠ i with e_j ∉ ker A_i, it can be seen that A_ie_i = e_i and that A_ie_j has the form A_ie_j = (e_k + e_ℓ + e_m + e_n)/4, where k ≠ ℓ, m ≠ n, e_k, e_ℓ ∉ ker A_0 and e_m, e_n ∉ ker A_1. Since C = e_0, we see by induction that A_{w_i} · · · A_{w_0}C ∉ ker A_{w_{i+1}} for all i. Thus, adp⊕(w) ≠ 0.

Since the matrices A_k are nonnegative, an alternative proof is the following. Since

LA_{w_{N−1}} · · · A_{w_0}C = Σ_{i_0,...,i_N} L_{i_N}(A_{w_{N−1}})_{i_N,i_{N−1}} · · · (A_{w_0})_{i_1,i_0}C_{i_0} ,

we see that adp⊕(w) is nonzero if and only if there are indexes i_0, . . . , i_N such that L_{i_N}(A_{w_{N−1}})_{i_N,i_{N−1}} · · · (A_{w_0})_{i_1,i_0}C_{i_0} ≠ 0. We construct a nondeterministic finite automaton with state set {0, . . . , 7, init} and input alphabet {0, . . . , 7} as follows. There is an empty transition from the initial state init to state i if and only if L_i ≠ 0. There is a transition labelled x from state i to state j if and only if (A_x)_{i,j} ≠ 0. The state i is accepting if and only if C_i ≠ 0. Clearly, the automaton accepts the word w read from left to right if and only if adp⊕(w) ≠ 0. If we convert the automaton to a minimal deterministic automaton, we obtain the following automaton.
[Figure: the minimal deterministic automaton. The initial state, which is accepting, has a self-loop labelled 0, 3, 5, 6 and a transition labelled 1, 2, 4, 7 to a non-accepting state; the non-accepting state has a self-loop labelled 0, 1, 2, 4, 7 and a transition labelled 3, 5, 6 back to the accepting state.]
This automaton clearly accepts the language 0∗ + (0 + 1 + · · · + 7)∗(3 + 5 + 6)0∗. A complete determination of the distribution of adp⊕ falls outside the scope of this paper, and we will restrict ourselves to some of the most important results. First, we turn to the fraction of possible differentials, that is, differentials with adp⊕(w) ≠ 0.
Corollary 4. For all N ≥ 0, Pr_{|w|=N}[adp⊕(w) ≠ 0] = 3/7 + (4/7) · 8^{−N}.
Proof. According to Theorem 2, adp⊕(w) ≠ 0 if and only if w is the zero word or has the form w = w′ξ0^k, where w′ is an arbitrary word of length N − k − 1 and ξ ∈ {3, 5, 6}. For a fixed value of k, we can choose w′ and ξ in 3 · 8^{N−k−1} ways. Thus, there are 1 + Σ_{k=0}^{N−1} 3 · 8^{N−k−1} = 4/7 + (3/7) · 8^N words with adp⊕(w) ≠ 0 in total.

This result can be compared with [LM01, Theorem 2], which states that the corresponding probability for xdp+ is Pr_{|w|=N}[xdp+(w) ≠ 0] = (4/7) · (7/8)^N. This means, in particular, that Pr_{|w|=N}[adp⊕(w) ≠ 0] → 3/7 while Pr_{|w|=N}[xdp+(w) ≠ 0] → 0 as N → ∞. Since the number of possible differentials is larger for adp⊕ than for xdp+, the average possible differential will obtain a smaller probability.

Next, if w = (0 + 3 + 5 + 6)0∗ then clearly adp⊕(w) = 1. On the other hand, for any ξ ∈ {0, . . . , 7}, adp⊕(ξw) ≤ 1/2. It follows that Pr_{|w|=N}[adp⊕(w) = 1] = 4 · 8^{−N}, and Pr_{|w|=N}[adp⊕(w) = k] = 0 if 1/2 < k < 1. One can further establish easily that adp⊕(w) = 1/2 if and only if w = Σ(0 + 3 + 5 + 6)0∗, where Σ = 0 + 1 + · · · + 7. The following straightforward lemma is useful in such calculations.

Lemma 3. Let w be an octal word. Denote Σ_0 = {0, 3, 5, 6}, Σ_1 = {1, 2, 4, 7} and A_w = A_{w_{|w|−1}} · · · A_{w_0}. Then adp⊕(xw) = adp⊕(0w) for all x ∈ Σ_0 and adp⊕(yw) = adp⊕(1w) for all y ∈ Σ_1. Thus, adp⊕(w) = adp⊕(xw) + adp⊕(yw) and adp⊕(xzw) = adp⊕(yzw) + (A_wC)_z for all x, z ∈ Σ_b and y ∈ Σ_{1−b}.

Proof. By direct calculation, we see that LA_0 = LA_3 = LA_5 = LA_6 = (1 0 0 1 0 1 1 0) and LA_1 = LA_2 = LA_4 = LA_7 = (0 1 1 0 1 0 0 1). Since adp⊕(vw) = LA_vA_wC, we have adp⊕(xw) = adp⊕(0w) and adp⊕(yw) = adp⊕(1w) for all x ∈ Σ_0 and y ∈ Σ_1. Finally, if x, z ∈ Σ_b and y ∈ Σ_{1−b}, we have adp⊕(w) = LA_wC = (LA_x + LA_y)A_wC = adp⊕(xw) + adp⊕(yw) and adp⊕(xzw) − adp⊕(yzw) = (L(A_x − A_y)A_z)A_wC = e_zA_wC = (A_wC)_z, where e_z is a row vector with a 1 in column z and 0 in the other columns.

3.2 Maximal Differentials
Although many of the enumerative aspects of adp⊕ seem infeasible, some optimisation problems are surprisingly simple. For all output differences γ, denote

adp⊕_2max(γ) = max_{α,β} adp⊕(α, β → γ) .
For all γ, there is a simple differential with adp⊕(α, β → γ) = adp⊕_2max(γ).

Theorem 3. For all output differences γ, adp⊕(0^N, γ → γ) = adp⊕_2max(γ).

The proof is omitted from the conference version.
4 Rational Series xdp+
Lipmaa and Moriai [LM01, Lip02] used completely different techniques to analyse the exclusive-or differential probability xdp+ of addition. We will now demonstrate the power of our approach by showing that it can be easily adapted to analyse xdp+ as well.

4.1 Linear Representation
As for adp⊕, we write the differential (α, β → γ) as the octal word w = w_{N−1} . . . w_0, where w_i = α_i4 + β_i2 + γ_i. When N varies, we obtain a rational series xdp+ with a linear representation of dimension 2.

Theorem 4 (Linear representation of xdp+). The formal series xdp+ has the 2-dimensional linear representation (L, (X_k)_{k=0}^{7}, C), where L = (1 1), C = (1 0)^T and X_k is given by

(X_k)_{ij} = 1 − T(k_2 + k_1 + j)   if i = 0 and k_2 ⊕ k_1 ⊕ k_0 = j ,
(X_k)_{ij} = T(k_2 + k_1 + j)       if i = 1 and k_2 ⊕ k_1 ⊕ k_0 = j ,
(X_k)_{ij} = 0                      otherwise

for i, j ∈ {0, 1}, where k = k_2·4 + k_1·2 + k_0 and T : {0, 1, 2, 3} → R is the mapping T(0) = 0, T(1) = T(2) = 1/2 and T(3) = 1. (For completeness, all the matrices X_k are given in Table 2.)

Thus, xdp+ is a rational series. For example, the differential (α, β → γ) = (11100, 00110 → 10110) corresponds to the word w = 54730 and xdp+(α, β → γ) = xdp+(w) = LX_5X_4X_7X_3X_0C = 1/4. The proof of this result is given in the appendix.

4.2 Words with a Given Probability
The simplicity of the linear representation of xdp+ allows us to derive an explicit description of all words with a certain differential probability.

Theorem 5. For all nonempty words w, xdp+(w) ∈ {0} ∪ {2^{−k} | k ∈ {0, 1, . . . , |w| − 1}}. The differential probability xdp+(w) = 0 if and only if w has the form w = w′(1 + 2 + 4 + 7), w = w′(1 + 2 + 4 + 7)0w′′ or w = w′(0 + 3 + 5 + 6)7w′′ for arbitrary words w′, w′′, and xdp+(w) = 2^{−k} if and only if xdp+(w) ≠ 0 and |{0 ≤ i < N − 1 | w_i ∉ {0, 7}}| = k.

Table 2. All the eight matrices X_k in Theorem 4 (rows separated by semicolons)

X_0 = 1/2 (2 0; 0 0)   X_1 = 1/2 (0 1; 0 1)   X_2 = 1/2 (0 1; 0 1)   X_3 = 1/2 (1 0; 1 0)
X_4 = 1/2 (0 1; 0 1)   X_5 = 1/2 (1 0; 1 0)   X_6 = 1/2 (1 0; 1 0)   X_7 = 1/2 (0 0; 0 2)
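The matrices of Theorem 4 can be generated mechanically from T. The sketch below is our illustration, not the paper's code; it checks the worked example above and, exhaustively for short words, the agreement with the definition of xdp+.

```python
from fractions import Fraction

T = [Fraction(0), Fraction(1, 2), Fraction(1, 2), Fraction(1)]

def build_X():
    Xs = []
    for k in range(8):
        k2, k1, k0 = (k >> 2) & 1, (k >> 1) & 1, k & 1
        j = k2 ^ k1 ^ k0                 # the only nonzero column of X_k
        M = [[Fraction(0), Fraction(0)], [Fraction(0), Fraction(0)]]
        M[0][j] = 1 - T[k2 + k1 + j]
        M[1][j] = T[k2 + k1 + j]
        Xs.append(M)
    return Xs

X = build_X()

def xdp_add(word):
    """xdp_add(w) = L X_{w_{N-1}} ... X_{w_0} C for an octal string w."""
    v = [Fraction(1), Fraction(0)]       # C
    for ch in reversed(word):            # rightmost letter acts first
        M = X[int(ch, 8)]
        v = [M[0][0] * v[0] + M[0][1] * v[1],
             M[1][0] * v[0] + M[1][1] * v[1]]
    return v[0] + v[1]                   # L = (1, 1)

def xdp_add_bruteforce(word):
    """Pr over x, y that ((x^alpha) + (y^beta)) xor (x + y) == gamma."""
    N = len(word)
    mask = (1 << N) - 1
    alpha = beta = gamma = 0
    for k, ch in enumerate(reversed(word)):
        d = int(ch, 8)
        alpha |= ((d >> 2) & 1) << k
        beta |= ((d >> 1) & 1) << k
        gamma |= (d & 1) << k
    hits = sum(1 for x in range(1 << N) for y in range(1 << N)
               if ((((x ^ alpha) + (y ^ beta)) ^ (x + y)) & mask) == gamma)
    return Fraction(hits, 1 << (2 * N))
```

In particular, `xdp_add("54730")` evaluates to 1/4, matching the example following Theorem 4.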
Proof. Let L, X_k and C be as in Theorem 4 and denote e_0 = (1 0)^T and e_1 = (0 1)^T. Then the kernels of X_i are ker X_0 = ker X_3 = ker X_5 = ker X_6 = ⟨e_1⟩ and ker X_1 = ker X_2 = ker X_4 = ker X_7 = ⟨e_0⟩. By direct calculation, X_0e_0 = e_0, X_3e_0 = X_5e_0 = X_6e_0 = (e_0 + e_1)/2, X_1e_1 = X_2e_1 = X_4e_1 = (e_0 + e_1)/2 and X_7e_1 = e_1. Since C = e_0, we thus have xdp+(w) = 0 if and only if w has the form w = w′(1 + 2 + 4 + 7), w = w′(1 + 2 + 4 + 7)0w′′ or w = w′(0 + 3 + 5 + 6)7w′′ for arbitrary words w′, w′′. Similarly, when w is such that xdp+(w) ≠ 0, we see that X_{w_{n−1}} · · · X_{w_0}C has the form (2^{−ℓ}, 2^{−ℓ})^T, (2^{−ℓ}, 0)^T or (0, 2^{−ℓ})^T, where ℓ = |{i | w_i ∉ {0, 7}, 0 ≤ i < n}|, for all n. Thus, xdp+(w) = 2^{−k}, where k = |{0 ≤ i < N − 1 | w_i ∉ {0, 7}}|.

For example, if w is the word w = 54730, we see that xdp+(w) ≠ 0 and |{0 ≤ i < 4 | w_i ∉ {0, 7}}| = 2. Thus, xdp+(w) = 2^{−2}. This result immediately gives the closed formula from [LM01] and thus the O(log N)-time algorithm.

4.3 Distribution
Based on the explicit description of all words with a certain differential probability, it is easy to determine the distribution of xdp+. Let A(n, k), B(n, k) and C(n, k) denote the languages that consist of the words of length n > 0 with xdp+(w) = 2^{−k} and w_{n−1} = 0, w_{n−1} = 7 and w_{n−1} ∉ {0, 7}, respectively. The languages are clearly given recursively by

A(n, k) = 0A(n − 1, k) + 0C(n − 1, k − 1) ,
B(n, k) = 7B(n − 1, k) + 7C(n − 1, k − 1) ,
C(n, k) = Σ_0A(n − 1, k) + Σ_1B(n − 1, k) + (Σ_0 + Σ_1)C(n − 1, k − 1) ,

where Σ_0 = 3 + 5 + 6 and Σ_1 = 1 + 2 + 4. The base cases are A(1, 0) = 0, B(1, 0) = ∅ and C(1, 0) = 3 + 5 + 6. Let A(z, u) = Σ_{n,k} |A(n, k)|u^kz^n, B(z, u) = Σ_{n,k} |B(n, k)|u^kz^n and C(z, u) = Σ_{n,k} |C(n, k)|u^kz^n be the corresponding ordinary generating functions. The recursive description of the languages immediately gives the linear system

A(z, u) = zA(z, u) + uzC(z, u) + z ,
B(z, u) = zB(z, u) + uzC(z, u) ,
C(z, u) = 3zA(z, u) + 3zB(z, u) + 6uzC(z, u) + 3z .

Denote D(z, u) = A(z, u) + B(z, u) + C(z, u) + 1. Then the coefficient of u^kz^n in D(z, u), [u^kz^n]D(z, u), gives the number of words of length n with xdp+(w) = 2^{−k} (the extra 1 comes from the case n = 0). By solving the linear system, we see that

D(z, u) = 1 + 4z / (1 − (1 + 6u)z) .

Since the coefficient of z^n in D(z, u) for n > 0 is [z^n]D(z, u) = 4[z^n] z Σ_{m=0}^{∞} (1 + 6u)^m z^m = 4(1 + 6u)^{n−1}, we see that

[u^kz^n]D(z, u) = 4 · 6^k (n−1 choose k)

for all 0 ≤ k < n. The coefficient of z^n in D(z, 1) for n > 0, [z^n]D(z, 1) = 4[z^n] z/(1 − 7z) = 4 · 7^{n−1}, gives the number of words of length n with xdp+(w) ≠ 0.

Theorem 6 ([LM01, Theorem 2]). There are 4 · 7^{n−1} words of length n > 0 with xdp+(w) ≠ 0. Of these, 4 · 6^k (n−1 choose k) have probability 2^{−k} for all 0 ≤ k < n.

Table 3. Short comparison of the functions xdp+ and adp⊕ and of their computational complexity

                                                   xdp+                adp⊕
Possibility verification (algorithm complexity)    Θ(1)                Θ(log N)
Probability of possibility                         (1/2)·(7/8)^{N−1}   3/7 + (4/7)·8^{−N}
Evaluation of a possible differential
(algorithm complexity)                             Θ(log N)            Θ(N)
Maximal differentials adp⊕_2max and xdp+_2max:
Finding max. differential (α, β)                   Θ(log N)            Θ(1)
Computing max. differential when (α, β) is known   Θ(log N)            Θ(N)
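These counts can be confirmed by brute-force enumeration with the linear representation of Theorem 4. The following quick sketch is ours, not part of the paper:

```python
from fractions import Fraction
from itertools import product

T = [Fraction(0), Fraction(1, 2), Fraction(1, 2), Fraction(1)]
X = []
for k in range(8):
    k2, k1, k0 = (k >> 2) & 1, (k >> 1) & 1, k & 1
    j = k2 ^ k1 ^ k0
    M = [[Fraction(0), Fraction(0)], [Fraction(0), Fraction(0)]]
    M[0][j] = 1 - T[k2 + k1 + j]
    M[1][j] = T[k2 + k1 + j]
    X.append(M)

def xdp_add(letters):
    """Evaluate the linear representation on a tuple of octal digits."""
    v = [Fraction(1), Fraction(0)]
    for c in reversed(letters):
        M = X[c]
        v = [M[0][0] * v[0] + M[0][1] * v[1],
             M[1][0] * v[0] + M[1][1] * v[1]]
    return v[0] + v[1]

def distribution(n):
    """Histogram of xdp+ over all 8^n words of length n."""
    counts = {}
    for word in product(range(8), repeat=n):
        p = xdp_add(word)
        counts[p] = counts.get(p, 0) + 1
    return counts
```

For each small n, the histogram matches the counts of Theorem 6 exactly.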
5 Conclusions
We analysed the additive differential probability adp⊕ of exclusive-or. We expect that our results, combined with the work of Lipmaa and Moriai, will facilitate advanced differential cryptanalysis of ciphers that mix addition and exclusive-or, as well as the design of such ciphers. These results can also be used to guide the choice between addition and exclusive-or as the key-mixing operation. In general, adp⊕ is much more difficult to analyse than xdp+ (note especially the straightforward analysis of xdp+ in Sect. 4). On the other hand, it is easier to find the maximal differentials for adp⊕, although the maximal differentials for xdp+ have higher probability: it can be seen that adp⊕_2max(γ) ≤ xdp+_2max(γ) = max_{α,β} xdp+(α, β → γ) for all γ. (See Fig. 1.) A short comparison of some of the properties of xdp+ and adp⊕ is given in Table 3. Perhaps the main contribution of this paper is the formal series approach. In addition to the new results, we were able to give a simpler proof of the results of Lipmaa and Moriai on xdp+. The results from [Wal03] can also be rephrased using our approach. We expect that our approach of using formal series also has other applications in cryptanalysis.
Acknowledgements This work was partially supported by the Finnish Defence Forces Research Institute of Technology.
Fig. 1. Top: tabulation of values log_2 adp⊕_2max(γ), 0 ≤ γ ≤ 255, for N = 8. Bottom: partial tabulation of values log_2 adp⊕_2max(γ) (solid) and log_2 xdp+_2max(γ) (dashed), 0 ≤ γ ≤ 95, for N = 8
References

[BCD+98] Carolynn Burwick, Don Coppersmith, Edward D'Avignon, Rosario Gennaro, Shai Halevi, Charanjit Jutla, Stephen M. Matyas Jr., Luke O'Connor, Mohammad Peyravian, David Safford, and Nevenko Zunic. MARS — A Candidate Cipher for AES. Available from http://www.research.ibm.com/security/mars.html, June 1998.
[Ber92] Thomas A. Berson. Differential Cryptanalysis Mod 2^32 with Applications to MD5. In Rainer A. Rueppel, editor, Advances in Cryptology — EUROCRYPT '92, volume 658 of Lecture Notes in Computer Science, pages 71–80, Balatonfüred, Hungary, 24–28 May 1992. Springer-Verlag. ISBN 3-540-56413-6.
[BR88] Jean Berstel and Christophe Reutenauer. Rational Series and Their Languages. EATCS Monographs on Theoretical Computer Science. Springer-Verlag, 1988.
[BS91] Eli Biham and Adi Shamir. Differential Cryptanalysis of DES-like Cryptosystems. Journal of Cryptology, 4(1):3–72, 1991.
[Lip02] Helger Lipmaa. On Differential Properties of Pseudo-Hadamard Transform and Related Mappings. In Alfred Menezes and Palash Sarkar, editors, INDOCRYPT 2002, volume 2551 of Lecture Notes in Computer Science, pages 48–61, Hyderabad, India, 15–18 December 2002. Springer-Verlag.
[LM01] Helger Lipmaa and Shiho Moriai. Efficient Algorithms for Computing Differential Properties of Addition. In Mitsuru Matsui, editor, Fast Software Encryption 2001, volume 2355 of Lecture Notes in Computer Science, pages 336–350, Yokohama, Japan, 2–4 April 2001. Springer-Verlag, 2002.
[LMM91] Xuejia Lai, James L. Massey, and Sean Murphy. Markov Ciphers and Differential Cryptanalysis. In Donald W. Davies, editor, Advances in Cryptology — EUROCRYPT '91, volume 547 of Lecture Notes in Computer Science, pages 17–38, Brighton, UK, April 1991. Springer-Verlag.
[RRSY98] Ronald L. Rivest, Matt J. B. Robshaw, R. Sidney, and Y. L. Yin. The RC6 Block Cipher. Available from http://theory.lcs.mit.edu/~rivest/rc6.ps, June 1998.
[SKW+99] Bruce Schneier, John Kelsey, Doug Whiting, David Wagner, Chris Hall, and Niels Ferguson. The Twofish Encryption Algorithm: A 128-Bit Block Cipher. John Wiley & Sons, April 1999. ISBN 0471353817.
[Wal03] Johan Wallén. Linear Approximations of Addition Modulo 2^n. In Thomas Johansson, editor, Fast Software Encryption 2003, volume 2887 of Lecture Notes in Computer Science, pages 261–273, Lund, Sweden, 24–26 February 2003. Springer-Verlag.
A Proof of Theorem 4
In order to prove Theorem 4, we introduce the following notation. Define the carry function carry : Z_2^N × Z_2^N → Z_2^N of addition modulo 2^N by carry(x, y) = (x + y) ⊕ x ⊕ y. It is easy to see that

xdp+(α, β → γ) = Pr_{x,y}[carry(x, y) ⊕ carry(x ⊕ α, y ⊕ β) = α ⊕ β ⊕ γ] .

Denote c = carry(x, y) and c∗ = carry(x ⊕ α, y ⊕ β), where x, y, α and β are understood from context. Note that c_i can be recursively defined as c_0 = 0 and c_{i+1} = 1 if and only if at least two of x_i, y_i and c_i are 1. To simplify some of the formulae, denote xor(x, y, z) = x ⊕ y ⊕ z and ∆c = c ⊕ c∗. Then xdp+(α, β → γ) = Pr_{x,y}[∆c = xor(α, β, γ)]. Let furthermore xy denote the componentwise product of x and y, (xy)_i = x_iy_i. The linear representation of xdp+ follows easily from the following result [LM01, Lemma 2].

Lemma 4. Fix α, β ∈ Z_2^n and i ≥ 0. Then

Pr_{x,y}[∆c_{i+1} = 1 | ∆c_i = r] = T(α_i + β_i + r) ,

where T is as in Theorem 4.

This result follows easily from the recursive definition of the carry function and a case-by-case analysis.

Proof (of Theorem 4). Let (α, β → γ) be the differential associated to the word w. Let x, y be uniformly distributed independent random variables over
Z_2^{|w|}. For compactness, we denote xor(w) = α ⊕ β ⊕ γ. Let P(w, k) be the 2 × 1 substochastic matrix given by

P_j(w, k) = Pr_{x,y}[∆c ≡ xor(w) (mod 2^k), ∆c_k = j]

for 0 ≤ k ≤ |w|, and let M(w, k) be the 2 × 2 substochastic transition matrix

M_{ij}(w, k) = Pr_{x,y}[∆c_k = xor(w)_k, ∆c_{k+1} = i | ∆c ≡ xor(w) (mod 2^k), ∆c_k = j]

for 0 ≤ k < |w|. Since P_i(w, k + 1) = Σ_j M_{ij}(w, k)P_j(w, k), we have P(w, k + 1) = M(w, k)P(w, k). Note furthermore that P(w, 0) = C and that xdp+(w) = Σ_j P_j(w, |w|) = LP(w, |w|). By Lemma 4, it is clear that

M_{ij}(w, k) = 1 − T(α_k + β_k + j)   if i = 0 and xor(w)_k = j ,
M_{ij}(w, k) = T(α_k + β_k + j)       if i = 1 and xor(w)_k = j ,
M_{ij}(w, k) = 0                      otherwise .

That is, M(w, k) = X_{w_k} for all k. It follows by induction that xdp+(w) = LX_{w_{|w|−1}} · · · X_{w_0}C.
Two Power Analysis Attacks against One-Mask Methods

Mehdi-Laurent Akkar^1, Régis Bévan^2, and Louis Goubin^3

^1 Texas Instruments, 821 Avenue Jack-Kilby, BP 5, 06271 Villeneuve-Loubet Cedex, France. [email protected]
^2 Oberthur Card Systems, 25 rue Auguste Blanche, 92800 Puteaux, France. [email protected]
^3 Axalto, Cryptography Research & Advanced Security, 36-38 rue de la Princesse, BP 45, 78431 Louveciennes Cedex, France. [email protected]
Abstract. In order to protect a cryptographic algorithm against Power Analysis attacks, a well-known method consists in hiding all the internal data with randomly chosen masks. Following this idea, an AES implementation can be protected against Differential Power Analysis (DPA) by the "Transformed Masking Method", proposed by Akkar and Giraud at CHES'2001, requiring two distinct masks. At CHES'2002, Trichina, De Seta and Germani suggested the use of a single mask to improve the performance of the protected implementation. We show here that their countermeasure can still be defeated by usual first-order DPA techniques. In another direction, Akkar and Goubin introduced at FSE'2003 a new countermeasure for protecting secret-key cryptographic algorithms against high-order differential power analysis (HO-DPA). As a particular case, the "Unique Masking Method" is particularly well suited to the protection of DES implementations. However, we prove in this paper that this method is not sufficient, by exhibiting a (first-order) enhanced differential power analysis attack. We also show how to avoid this new attack.

Keywords: Tamper-resistant devices, Side-Channel attacks, Power Analysis, DPA, Transformed Masking Method, Unique Masking Method, DES, AES.
1 Introduction
The framework of Differential Power Analysis (also known as DPA) was introduced by P. Kocher, J. Jaffe and B. Jun in 1998 ([13]) and subsequently published in 1999 ([14]). The initial focus was on symmetrical cryptosystems such as DES (see [13, 16, 2]) and the AES candidates (see [4, 5, 8]), but public key cryptosystems have since also been shown to be vulnerable to the DPA attacks (see [17, 7, 11, 12, 19]).

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 332–347, 2004.
© International Association for Cryptologic Research 2004
In software, two main families of countermeasures against DPA are known:
– In [11, 12], L. Goubin and J. Patarin described a generic countermeasure consisting in splitting all the intermediate variables, using the secret sharing principle. This duplication method was also proposed shortly after by S. Chari et al. in [5] and [6].
– In [3], M.-L. Akkar and C. Giraud introduced the transformed masking method (TMM), an alternative countermeasure to DPA. The basic idea is to perform all the computation such that all the data are XORed with a random mask. Moreover, the tables (e.g. the DES S-Boxes) are modified such that the output of a round is masked by the same mask as the input.

Both these methods have been proven secure against the initial DPA attacks, and are now widely used in real-life implementations of many algorithms. The TMM method can be used to protect AES implementations against DPA. Two masking values are then required to cope with the (non-linear) ByteSub operation. In a recent paper, E. Trichina, D. De Seta and L. Germani [20] proposed the "Simplified Adaptive Multiplicative Masking" (SAMM), a variation of TMM with a single masking value, thus providing simpler and faster implementations for AES. Unfortunately, we will show in this paper that this method can be broken by usual DPA attacks.

Also suggested by P. Kocher, J. Jaffe and B. Jun [13, 14], and formalized by T. Messerges [15], Higher-Order Differential Power Analysis (HO-DPA) consists in studying correlations between the secret data and several points of the electric consumption curves (instead of single points for the basic DPA attack). To protect secret-key algorithms against this new class of attacks, M.-L. Akkar and L. Goubin recently proposed [1] a new countermeasure: the so-called "Unique Masking Method" (UMM). In this paper, we describe an unexpected power-analysis attack, which can be applied to implementations of secret-key algorithms using the UMM method.
More precisely, in the chosen-plaintext model, the attacker can recover the secret key by successively applying two classical DPA attacks on the second round.

The paper is organized as follows:
– In section 2, we recall basic notions about Differential Power Analysis (DPA), the Transformed Masking Method (TMM) and about the Simplified Adaptive Multiplicative Masking (SAMM).
– In section 3, we analyze the mathematical hypotheses on which the security of SAMM relies and point out a flaw in the design of the countermeasure.
– In section 4, we show how this flaw can be exploited by studying the power consumption of a real component.
– In section 5, we recall basic notions about Higher-Order Differential Power Analysis (HO-DPA) and about the Unique Masking Method (UMM).
– In section 6, we theoretically study the security of the UMM applied to DES and show how it could be cryptanalysed.
– In section 7, we give our perspectives and conclusions about the attacks presented in this paper.
2 Background

2.1 Differential Power Analysis
Differential Power Analysis (DPA) was introduced by Kocher, Jaffe and Jun in 1998 [13] and published in 1999 [14]. The basic idea is to make use of potential correlations between the data handled by the micro-controller and the measured electric consumption values. Since these correlations are often very low, statistical methods must be applied to deduce sufficient information from them.

The principle of DPA attacks consists in comparing consumption values measured on the real physical device (for instance a GSM chip or a smart card) with values computed in a hypothetical model of this device (the hypotheses being made, among others, on the nature of the implementation, and chiefly on a part of the secret key). By comparing these two sets of values, the attacker tries to recover all or part of the secret key.

The initial target of DPA attacks was limited to symmetric algorithms. Vulnerability of DES – first shown by Kocher, Jaffe and Jun [13, 14] – was further studied by Goubin and Patarin [11, 12], Messerges, Dabbish, Sloan [16] and Akkar, Bévan, Dischamp, Moyart [2]. Applications of these attacks were also largely taken into account during the AES selection process, notably by Biham, Shamir [4], Chari, Jutla, Rao, Rohatgi [5] and Daemen, Rijmen [8]. However, public-key algorithms were also shown to be threatened: Goubin, Patarin [11, 12] and Messerges, Dabbish, Sloan [17] showed how to apply DPA against RSA, and the case of elliptic curve cryptosystems was analyzed by Coron [7], Okeya, Sakurai [19] and many others (see for instance [10] for a detailed bibliography).

In the basic DPA attack (see [13, 14] or [9]), also known as first-order DPA (or just DPA), the attacker records the power consumption signals and computes statistical properties of the signal for each individual moment in time of the computation.
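To make the principle concrete, here is a toy, self-contained simulation in the spirit of the attacks discussed. It is entirely our illustration: the 4-bit PRESENT S-box is used purely as an example target, the Hamming-weight leakage model, noise level and trace count are assumptions, and the distinguisher is the common correlation variant of the statistical comparison described above.

```python
import random

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
HW = [bin(v).count("1") for v in range(16)]

def simulate_traces(key, n, sigma=0.1, seed=2004):
    """Hypothetical device leaking HW(SBOX[p ^ key]) plus Gaussian noise."""
    rng = random.Random(seed)
    traces = []
    for _ in range(n):
        p = rng.randrange(16)
        traces.append((p, HW[SBOX[p ^ key]] + rng.gauss(0.0, sigma)))
    return traces

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def recover_key(traces):
    """Keep the key guess whose predictions best match the measurements."""
    leaks = [l for _, l in traces]
    scores = {g: abs(pearson([HW[SBOX[p ^ g]] for p, _ in traces], leaks))
              for g in range(16)}
    return max(scores, key=scores.get)
```

With a few thousand simulated traces, the correct key guess stands out clearly against all wrong guesses.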
This attack does not require any knowledge about the individual electric consumption of each instruction, nor about the position in time of each of these instructions. It only relies on the following fundamental hypothesis (quoted from [12]):

Fundamental hypothesis (order 1): There exists an intermediate variable, that appears during the computation of the algorithm, such that knowing a few key bits (in practice less than 32 bits) allows us to decide whether two inputs (respectively two outputs) give or not the same value for this variable.

2.2 The Transformed Masking Method for AES
More details about this technique can be found in [3]. The idea is to mask the message at the beginning of the AES algorithm, and to recover the same mask at the end of each round. An important step for the AES is to securely perform the inversion step. For this, one needs to compute A_{i,j}^{−1} ⊕ X_{i,j} from A_{i,j} ⊕ X_{i,j}, where A_{i,j} is the block (i, j) in an AES
Fig. 1. Modified inversion in GF(2^8) with masking countermeasure
computation and X_{i,j} the corresponding masking value. To perform this securely, Akkar and Giraud proposed to use the following operations (see Fig. 1):

1. Multiply the masked value A by a non-zero random value Y to get AY ⊕ XY
2. XOR with XY to get AY
3. Perform the inversion to get A^{−1}Y^{−1}
4. XOR with XY^{−1} to get A^{−1}Y^{−1} ⊕ XY^{−1}
5. Multiply by Y to get A^{−1} ⊕ X

2.3 The Simplified Adaptive Multiplicative Masking for AES
More details about this technique can be found in [20]. The general idea is not to use a different masking value Y to switch from an additive mask to a multiplicative one, but to use the same masking value X instead. The algorithm is the following:

1. Multiply the masked value A by X to get AX ⊕ X²
2. XOR with X² to get AX
3. Perform the inversion to get A^{−1}X^{−1}
4. XOR with 1 to get A^{−1}X^{−1} ⊕ 1
5. Multiply by X to get A^{−1} ⊕ X
One can notice that the particular value AX ⊕ X² appears during the computation. The authors of [20] suggest that even if AX ⊕ X² is not fully random,
it is sufficiently random to serve the purpose. In the next section, we will precisely study the function AX ⊕ X² and show that it introduces a weakness in the method.
3 Theoretical Analysis of the SAMM

3.1 Study of the repartition of AX + X²
In the following part we will denote K = IF_2 and K_8 = IF_{2^8}. We will also define

f_A : K_8 → K_8, X ↦ AX + X²

and

F : K_8 → P(K_8), A ↦ {f_A(X), X ∈ K_8} ,

where P(K_8) represents the power set of K_8.

Remark: In what follows, we will denote by #(A) the cardinality of A.

With these definitions we have the following result:

Theorem 1. If A ∈ K_8 then we have #F(A) = 128 if A ≠ 0, and #F(A) = 256 if A = 0.

Proof.
– Case A = 0: f_0(X) = X² is bijective on K_8. Therefore we obviously have #F(0) = 256.
– Case A ≠ 0: given Y, we want to solve the equation f_A(X) = Y. By defining Z = X/A (possible because A ≠ 0), we obtain the equation Z² + Z = Y/A², and it is well known that this equation has no solution if Trace(Y/A²) = 1 and two solutions (if one gets one solution W, the other is W + 1) if Trace(Y/A²) = 0. Therefore it is easy to see that #F(A) = 128.
This theorem shows that the simplified mask covers only one half of the 256 possible values of K_8. This was already noticed in the article [20]. However, let us now study the distribution of F(A_1) and F(A_2) for two distinct values A_1 and A_2. The following proposition gives us a more precise result:

Theorem 2. If (A_1, A_2) ∈ (K_8 \ {0})² and A_1 ≠ A_2, then #(F(A_1) ∩ F(A_2)) = 64.

Proof. Let A_1 and A_2 be two elements of K_8 \ {0}. We are looking for the values Y such that the two equations

Y = A_1X + X²
Y = A_2X + X²

are simultaneously solvable or unsolvable. This is equivalent to Trace(Y/A_1²) = Trace(Y/A_2²). By linearity of the trace operator we obtain

Trace(Y/A_1² − Y/A_2²) = 0 .

Let us now consider Z ∈ K_8. To Z corresponds a unique value Y such that Y/A_1² − Y/A_2² = Z, which is Y = Z/(1/A_1² − 1/A_2²). Since there exist 128 elements Z of trace 0, there exist 128 elements Y such that the previous system is simultaneously solvable or unsolvable. Let us now consider:

n_1 = #{Y ∈ K_8 such that Trace(Y/A_1²) = 0 and Trace(Y/A_2²) = 1}
n_2 = #{Y ∈ K_8 such that Trace(Y/A_2²) = 0 and Trace(Y/A_1²) = 1}
n_3 = #{Y ∈ K_8 such that Trace(Y/A_1²) = 0 and Trace(Y/A_2²) = 0}
n_4 = #{Y ∈ K_8 such that Trace(Y/A_2²) = 1 and Trace(Y/A_1²) = 1}

The last result gives us the equation n_3 + n_4 = 128. This equation, together with obvious trace considerations, gives the following system:

n_3 + n_4 = 128
n_1 + n_3 = 128
n_2 + n_3 = 128
n_2 + n_4 = 128

Solving this system, we obtain n_1 = n_2 = n_3 = n_4 = 64, thus achieving the proof of Theorem 2.
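Both theorems are easy to confirm by exhaustive computation. The sketch below is ours and instantiates K_8 as GF(2^8) with the AES polynomial 0x11B (an assumption; the counting results themselves do not depend on the chosen irreducible polynomial).

```python
def gmul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B
        b >>= 1
    return r

def F(A):
    """The image set {A*X + X^2 : X in K_8}."""
    return {gmul(A, x) ^ gmul(x, x) for x in range(256)}
```

Enumerating all A confirms #F(0) = 256, #F(A) = 128 for every A ≠ 0, and an intersection of exactly 64 values for distinct nonzero masks.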
3.2 Consequences
Theorem 2 is really important because it proves that, when the SAMM is implemented, two distinct intermediate values are "projected" onto two sets of 128 values each, and that these two sets have 64 values in common and 64 different ones. Therefore, when an attacker performs a DPA on an implementation including the SAMM, he will on average record the consumption of F(A) instead of A; but for two distinct values A and B, F(A) and F(B) are also distinct, allowing the attacker to distinguish the two cases. This gives strong evidence that the attack is very likely to work. Moreover, even if the attacker does not know that the SAMM is implemented, he will be able to recover the key, because the attack is exactly the same.
4 Semi-real and Real Analysis of the SAMM
We have seen in the previous section that there is a theoretical flaw in the SAMM method. To go further, we have to check which results are obtained when using different models of power consumption. We also have to experiment on a real embedded device. More work about the consumption model of embedded devices can be found in many papers: see for example [6, 2, 16].

4.1 Linear Model
The "linear model" considers that the power consumption of the card is proportional to the value of each bit of the manipulated value. For example, the consumption of an 8-bit value X = b_0b_1b_2b_3b_4b_5b_6b_7 will be equal to

c_0·b_0 + c_1·b_1 + c_2·b_2 + c_3·b_3 + c_4·b_4 + c_5·b_5 + c_6·b_6 + c_7·b_7 ,

where {c_i}_{i<8} is the average consumption of bit i. For example, the Hamming weight (denoted by HW) model is a linear one with c_i = c_j for all (i, j). We have performed some experiments in the following way: we have computed, for every A = 0..255, the whole subset F(A), and we have computed the average weight of each bit of the values in F(A) to check if there was any bias in the results. The results are as follows:
– For 8 values of A (0x03, 0x15, 0x87, 0x8C, 0xCE, 0xEB, 0xED and 0xF6), one specific bit of the eight bits of f_A(X) always vanishes, whatever the value of X is!
– For the 248 other values A, the probability that "the i-th bit of f_A(X) is equal to zero" is equal to 1/2, so an attacker is unable to predict one bit in order to compute a classical DPA attack.
Two Power Analysis Attacks against One-Mask Methods
339
If we now analyze the distribution of the HW of the values F(A) we get the nine following sets:

Subset    #   HW 0  HW 1  HW 2  HW 3  HW 4  HW 5  HW 6  HW 7  HW 8
   1      1     1     8    28    56    70    56    28     8     1
   2     28     2    12    32    52    60    52    32    12     2
   3     56     2    10    26    50    70    62    30     6     0
   4      8     2    14    42    70    70    42    14     2     0
   5     70     2     8    24    56    76    56    24     8     2
   6     28     2     4    32    60    60    60    32     4     2
   7      8     2     2    42    42    70    70    14    14     0
   8     56     2     6    26    62    70    50    30    10     0
   9      1     2     0    56     0   140     0    56     0     2
For example, line 8 in the table means that there are 56 values of A such that there exist 2 values X for which HW(f_A(X)) = 0, 6 values X for which HW(f_A(X)) = 1, ..., and no X for which HW(f_A(X)) = 8. Moreover, one can notice that the average Hamming weight of every subset is equal to 4, except for subset 4 (which contains the 8 special values detailed in the previous paragraph), whose mean is equal to 3.5. The explicit values A of the nine subsets can be found in the appendix. The conclusion is that, even though a small bias exists, if a component follows a linear model of consumption (such as the HW model) the masking method would probably be quite effective in practice. However, one has to check what happens with a real embedded device: some experiments on an 8051-based 8-bit CPU are discussed in the following section.

4.2 Real Device Analysis of the SAMM
To obtain concrete results, we have compared the power consumption of a card manipulating a value A with that of a card manipulating the value AX ⊕ X² with random X. For this we have recorded the consumption of a load operation on an 8-bit CPU based on an 8051 core. The following curves (see Fig. 2 and Fig. 3) have been obtained by averaging 1024 power consumption traces. For the unmasked value we have used the record of 512 times the same value A. For the SAMM we have used the average of twice the 256 values AX ⊕ X² with X ∈ [0, 255]. We have then ordered the values by consumption to get an idea of their variance. As seen before, there are eight special values (the ones for which one bit is always 0) that we have excluded from the curves; indeed the consumption of these values was really different from the others¹. As can be seen on these two curves, masked or unmasked, there is a quite important difference between the consumption values. This proves that,
¹ This suggests that an SPA attack, focusing on these particular values, may be quite easy!
340
Mehdi-Laurent Akkar et al.
Fig. 2. Distribution of the consumption for unmasked values

Fig. 3. Distribution of the consumption for SAMM values
in a real embedded device, even if it becomes somewhat harder (the variance in the SAMM curve is smaller), a successful DPA attack on a classical AES implementation will also succeed on an AES implementation using the SAMM method. If the attacker knows that the SAMM method is used, he can use an adapted DPA attack to retrieve the key with fewer measurements than the usual first-order DPA. For every hypothesis K_j on the key byte, one computes the difference of
the means of the following sets:

S0 = {measurements for the messages M_k | M_k ⊕ K_j ∈ subset 4}
S1 = {measurements for the messages M_k | M_k ⊕ K_j ∉ subset 4}

Then the correct hypothesis is the one with the highest difference of means.

Remark. For obvious confidentiality reasons, details about the chip we used must remain undisclosed.
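This adapted attack can be illustrated with a small simulation of ours. It uses an idealized, noiseless leakage equal to the mean Hamming weight from the table (3.5 when the masked intermediate A = M ⊕ K lies in subset 4, and 4.0 otherwise); the key byte 0x5A is an arbitrary choice for the demo:

```python
# Values A of subset 4 (one output bit of f_A always vanishes), from the appendix
SUBSET4 = {0x03, 0x15, 0x87, 0x8C, 0xCE, 0xEB, 0xED, 0xF6}

KEY = 0x5A  # hypothetical key byte to recover
MSGS = range(256)

# Idealized mean consumption of the masked intermediate A = M xor KEY:
# average HW is 3.5 when A lies in subset 4, and 4.0 otherwise.
leak = {m: 3.5 if (m ^ KEY) in SUBSET4 else 4.0 for m in MSGS}

def diff_of_means(kj):
    """DPA selection: partition traces on whether M xor Kj falls in subset 4."""
    s0 = [leak[m] for m in MSGS if (m ^ kj) in SUBSET4]
    s1 = [leak[m] for m in MSGS if (m ^ kj) not in SUBSET4]
    return sum(s1) / len(s1) - sum(s0) / len(s0)

scores = [diff_of_means(k) for k in range(256)]
recovered = max(range(256), key=lambda k: scores[k])
```

The correct hypothesis yields a difference of exactly 0.5 (4.0 − 3.5); every wrong hypothesis mixes the two populations and scores strictly less.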
5 The Unique Masking Method

5.1 High-Order Differential Power Analysis
As mentioned in Section 2, differential power analysis implies the use of a hypothetical model of the physical device which performs the cryptographic computations. If this model is able to predict a single value, for instance the electric consumption of the device at a single instant t, the differential power analysis is said to be of first order. If the model is able to predict several such values, the differential power analysis is said to be of higher order. High-order differential power analysis (HO-DPA), suggested by Kocher, Jaffe and Jun [13, 14] (see also [9]), was formalized by Messerges in [15]. In the spirit of [12], Akkar and Goubin (see [1]) gave a necessary and sufficient condition for a DPA attack of order n to be applicable:

Fundamental hypothesis (order n): There exists a set of n intermediate variables, appearing during the computation of the algorithm, such that knowing a few key bits (in practice fewer than 32 bits) allows one to decide whether or not two inputs (respectively two outputs) give the same value for a known function of these n variables.

In [1], Akkar and Goubin studied the impact of HO-DPA attacks on implementations of cryptographic algorithms, and showed that the usual countermeasures against DPA are not sufficient to avoid this new class of attacks. Moreover, they proposed a generic protection against higher-order attacks, illustrated in detail for the DES case.

5.2 A Countermeasure against HO-DPA
The so-called "Unique Masking Method" (UMM) aims at providing a generic protection against differential power analysis of order n, whatever the value of n. The method rests on two principles: first, only the values that depend on fewer than 32 bits of the key are masked, in order to prevent DPA; second, independent intermediate variables depending on fewer than 32 bits of the key must not be masked by the same value, in order to thwart HO-DPA. Let us describe the basic idea on the DES example. The unique mask is a random 32-bit value α. From this value, two sets of 8 S-boxes, denoted by S̃1 and S̃2, are defined by

∀x ∈ (F_2)^48,  S̃1(x) = S(x ⊕ E(α))
∀x ∈ (F_2)^48,  S̃2(x) = S(x) ⊕ P^(-1)(α)
Fig. 4. The five possibilities for DES rounds
where S denotes the 8 usual DES S-boxes, E is the expansion function and P is the classical DES permutation just after the S-boxes. Let f_{K_i} (1 ≤ i ≤ 16) be the functions involved in the Feistel scheme of DES (with the usual S-boxes), and f̃_{1,K_i} (resp. f̃_{2,K_i}) the analogous function with the S̃1 (resp. S̃2) S-boxes. DES rounds can then be built from five possible frames, given in Figure 4 (solid lines correspond to unmasked data, dashed lines to masked data). To be consistent with the DES computation, the sequence of rounds has to follow some rules, which can be summarized by a finite automaton, as shown in Figure 5. Initial states correspond to non-masked inputs (A or B), final states correspond to non-masked outputs (A or E). As an example, BCDCDCEBCDCDCDCE is a valid sequence. To provide protection against differential power analysis attacks, all the values depending on fewer than 36 key bits are masked by α. This gives further constraints on the sequence of rounds: the first three rounds have to be of the form BCD or BCE, and symmetrically the last three must be of the form BCE or DCE. For instance, the sequence BCDCDCEBCDCDCDCE fulfills these conditions. The unique masking method has several advantages: the structure of classical implementations can be kept unchanged, the only difference being the generation of random tables (see [1] for several practical methods for securely generating S-boxes from the mask α). Moreover, performance remains acceptable. For instance, [1] reports on a DES implementation, including the unique masking
Fig. 5. Valid sequences for the rounds
method (together with SPA and DFA protections), which runs in 38 ms (at 10 MHz) on an ST19 chip.
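The mask flow through S̃1 and S̃2 can be checked on a toy example of ours: a random 4-bit S-box stands in for the DES S-boxes, and the expansion E and permutation P are taken as the identity purely for brevity. S̃1 absorbs the mask on its input, while S̃2 introduces it on its output:

```python
import random

random.seed(1)
S = list(range(16))
random.shuffle(S)       # toy 4-bit S-box standing in for the DES S-boxes
E = lambda x: x         # toy expansion function (identity, for brevity)
P_inv = lambda x: x     # toy inverse of the P permutation (identity)

alpha = 0b1011          # the unique mask, here a 4-bit value

S1_tilde = [S[x ^ E(alpha)] for x in range(16)]      # S~1(x) = S(x xor E(alpha))
S2_tilde = [S[x] ^ P_inv(alpha) for x in range(16)]  # S~2(x) = S(x) xor P^-1(alpha)

for x in range(16):
    # S~1 removes the mask: a masked input yields the unmasked S output
    assert S1_tilde[x ^ E(alpha)] == S[x]
    # S~2 adds the mask: an unmasked input yields an output masked by P^-1(alpha)
    assert S2_tilde[x] == S[x] ^ P_inv(alpha)
```

This is exactly the property that lets the C-, D- and E-type rounds of Figure 4 pass data between masked and unmasked form.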
6 Enhanced DPA of the Unique Masking Method

6.1 Basic Idea
For all the proposed sequences of rounds, the second round is always a "C"-type round. The output of the S-boxes is unmasked and stays unmasked after being XORed with the left part of the message. After this XOR, the value goes through an "E"-type or "D"-type round, for which the computation of the beginning of the f function (the expansion) is unmasked. In the second case (where the third round is of "D" type), even the S-box outputs and the P permutation are unmasked. The fact that the output of the second-round S-boxes is unmasked will be the basis of our attack.

6.2 Attacking the UMM
The attack is a chosen-message attack. The main idea of the attack is to retrieve two intermediate values which are not protected against DPA, and then to get the key bits by solving an equation involving the two intermediate values. Before describing the attack, let us introduce some notation:
– IP denotes the initial permutation.
– E denotes the expansion function (from 32 to 48 bits).
– S denotes the S-boxes.
– P denotes the P permutation after the S-boxes.
– ⊕ denotes the XOR operation.
– K_i denotes the 48-bit subkey of round i.
– Finally, for a message M, L_i and R_i will denote the left part and the right part (32 bits each) of the output of round i.

The attack can be described as follows:
– First Part:
• We perform a DES computation with chosen messages M_i for which R_{0,i} (the right part of the message M_i after IP) is set to an arbitrary constant R_0. The left part L_{0,i} is random.
• We then perform a first-order DPA attack on the input of each S-box of the second round. Because the output of the S-boxes is unmasked, we will guess the value of the second round key XORed with the unknown but constant output of the S-boxes of the first round. The found value will be:
δ = K_2 ⊕ E(P(S(K_1 ⊕ E(R_0))))
– Second Part:
• We perform a second first-order DPA with other messages having a different known constant value R_0′, which provides:
δ′ = K_2 ⊕ E(P(S(K_1 ⊕ E(R_0′))))
– Final Part:
• By XORing the two values found in the previous steps, we obtain:
δ ⊕ δ′ = (K_2 ⊕ E(P(S(K_1 ⊕ E(R_0))))) ⊕ (K_2 ⊕ E(P(S(K_1 ⊕ E(R_0′)))))
The value K_2 vanishes, and the linearity of the functions E and P gives us the equation:
S(K_1 ⊕ E(R_0)) ⊕ S(K_1 ⊕ E(R_0′)) = g(δ ⊕ δ′)
• Because we know R_0 and R_0′, an exhaustive search on each 6-bit subkey of K_1 gives us all the possible values for that subkey. On average, the differential properties of S give about 4 possibilities for each subkey. Since there are 8 subkeys and we also need to find the 8 key bits which are not in K_1, this gives 4^8 × 2^8 = 2^24 possibilities for the key. So an exhaustive search with one known plaintext/ciphertext pair takes a few seconds on a PC. If one does not have access to such a pair, another attack with a third constant value R_0″ decreases the possibilities for K_1 (in practice, K_1 is completely determined if R_0″ is not badly chosen). Once K_1 is known, it is then possible to perform a classical DPA against K_2 to directly get the 56 bits of the DES key.
The reader should notice that even though the attack exploits the correlation of two results, it consists simply of applying a standard first-order DPA attack twice. So the number of traces and the processing time are just twice those needed for a classical DPA against an unprotected DES.
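The final exhaustive-search step can be sketched for a single S-box. In the sketch below (ours), the standard DES S-box S1 plays the role of S, and the 6-bit subkey and the two known expanded R_0 chunks are arbitrary demo values. Candidates are exactly the keys matching the observed output difference, and they always come in pairs {k, k ⊕ r ⊕ r′}, which is why the true subkey is never uniquely determined by a single pair:

```python
# Standard DES S-box S1: the outer input bits select the row, the inner four the column
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def sbox(x):
    """Apply S1 to a 6-bit input."""
    row = ((x >> 5) & 1) << 1 | (x & 1)
    col = (x >> 1) & 0xF
    return S1[row][col]

true_k = 0o53        # the 6-bit chunk of K1 we pretend to attack (arbitrary)
r, r2 = 0o12, 0o45   # the corresponding chunks of E(R0) and E(R0'), known

# the observed output difference g(delta xor delta') restricted to this S-box
target = sbox(true_k ^ r) ^ sbox(true_k ^ r2)

candidates = [k for k in range(64)
              if sbox(k ^ r) ^ sbox(k ^ r2) == target]
```

Repeating this for the eight S-boxes, and intersecting with the candidates from a second pair (R_0, R_0″), narrows K_1 down as described above.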
6.3 Attack Scenarios and Countermeasures
– Our attack is feasible only if the attacker is able to keep the right part of the message after the initial permutation constant. If this is not possible, the attacker has to "wait" for messages having the same R_0 part (1 message every 2^32 messages on average), so the attack becomes quite long to perform.
– If we consider a scenario in which only the output is known, the attack becomes as difficult as the one where the message cannot be controlled. A chosen-ciphertext attack mainly applies against an authentication scheme, in which the device has to encipher a challenge from the outside.
– A solution consists in masking the output of the second round, which seems to make this attack infeasible. One can use a different mask, but the use of α itself is not forbidden, since the bits of R_1 and R_2 that are masked by the same value depend on 42 bits of the key. We then need one more function f̃_{3,K_i} with the modified S-boxes S̃3 such that ∀x ∈ (F_2)^48, S̃3(x ⊕ E(α)) = S(x) ⊕ P^(-1)(α).
7 Conclusion
In this paper we presented two new power analysis attacks.

The first one applies to the Simplified Adaptive Multiplicative Masking (SAMM). Even if, in some models (HW model or linear model), the SAMM seems to be quite secure, it is in practice vulnerable to usual (first-order) DPA attacks. Moreover we have seen that, from a theoretical point of view, this countermeasure is not correct, because the distribution of the values AX ⊕ X² is quite unbalanced. To obtain DPA-resistant implementations of AES, we thus recommend using the original TMM method with two masks (correctly implemented with respect to the zero problem, as described in [20, 1]), or using a dynamic inversion of the S-box if the 256 bytes needed in RAM are available.

The second attack applies to the Unique Masking Method for DES, showing that in the chosen-message scenario an enhanced first-order DPA attack is still possible. This is due not to the method itself but to a different model of the attacker, allowing her full access to the inputs of the algorithm. In the case of DES, we have then shown that unique masks have to be extended to at least two rounds to protect the implementation against a chosen-text attack. The drawbacks are a performance overhead and an increase of the required memory by one third (corresponding to the computation of one more modified S-box).
References

[1] M.-L. Akkar, L. Goubin, A Generic Protection against High-Order Differential Power Analysis. In Proceedings of FSE'2003, LNCS 2887, Springer-Verlag, 2003.
[2] M.-L. Akkar, R. Bevan, P. Dischamp, D. Moyart, Power Analysis: What is now Possible. In Proceedings of ASIACRYPT'2000, LNCS 1976, pp. 489-502, Springer-Verlag, 2000.
[3] M.-L. Akkar, C. Giraud, An Implementation of DES and AES Secure against Some Attacks. In Proceedings of CHES'2001, LNCS 2162, pp. 309-318, Springer-Verlag, 2001.
[4] E. Biham, A. Shamir, Power Analysis of the Key Scheduling of the AES Candidates. In Proceedings of the Second Advanced Encryption Standard (AES) Candidate Conference, March 1999. Available from http://csrc.nist.gov/encryption/aes/round1/Conf2/aes2conf.htm
[5] S. Chari, C. S. Jutla, J. R. Rao, P. Rohatgi, A Cautionary Note Regarding Evaluation of AES Candidates on Smart-Cards. In Proceedings of the Second Advanced Encryption Standard (AES) Candidate Conference, March 1999. Available from http://csrc.nist.gov/encryption/aes/round1/Conf2/aes2conf.htm
[6] S. Chari, C. S. Jutla, J. R. Rao, P. Rohatgi, Towards Sound Approaches to Counteract Power-Analysis Attacks. In Proceedings of CRYPTO'99, LNCS 1666, pp. 398-412, Springer-Verlag, 1999.
[7] J.-S. Coron, Resistance Against Differential Power Analysis for Elliptic Curve Cryptosystems. In Proceedings of CHES'99, LNCS 1717, pp. 292-302, Springer-Verlag, 1999.
[8] J. Daemen, V. Rijmen, Resistance Against Implementation Attacks: A Comparative Study of the AES Proposals. In Proceedings of the Second Advanced Encryption Standard (AES) Candidate Conference, March 1999. Available from http://csrc.nist.gov/encryption/aes/round1/Conf2/aes2conf.htm
[9] J. Daemen, M. Peters, G. Van Assche, Bitslice Ciphers and Power Analysis Attacks. In Proceedings of FSE'2000, LNCS 1978, Springer-Verlag, 2000.
[10] L. Goubin, A Refined Power-Analysis Attack on Elliptic Curve Cryptosystems. In Proceedings of PKC'2003, LNCS 2567, pp. 199-211, Springer-Verlag, 2003.
[11] L. Goubin, J. Patarin, Procédé de sécurisation d'un ensemble électronique de cryptographie à clé secrète contre les attaques par analyse physique. European Patent, Bull CP8, February 4th, 1999, Publication Number: 2789535.
[12] L. Goubin, J. Patarin, DES and Differential Power Analysis – The Duplication Method. In Proceedings of CHES'99, LNCS 1717, pp. 158-172, Springer-Verlag, 1999.
[13] P. Kocher, J. Jaffe, B. Jun, Introduction to Differential Power Analysis and Related Attacks. Technical Report, Cryptography Research Inc., 1998. Available from http://www.cryptography.com/dpa/technical/index.html
[14] P. Kocher, J. Jaffe, B. Jun, Differential Power Analysis. In Proceedings of CRYPTO'99, LNCS 1666, pp. 388-397, Springer-Verlag, 1999.
[15] T. S. Messerges, Using Second-Order Power Analysis to Attack DPA Resistant Software. In Proceedings of CHES'2000, LNCS 1965, pp. 238-251, Springer-Verlag, 2000.
[16] T. S. Messerges, E. A. Dabbish, R. H. Sloan, Investigations of Power Analysis Attacks on Smartcards. In Proceedings of the USENIX Workshop on Smartcard Technology, pp. 151-161, May 1999. Available from http://www.eecs.uic.edu/~tmesserg/papers.html
[17] T. S. Messerges, E. A. Dabbish, R. H. Sloan, Power Analysis Attacks of Modular Exponentiation in Smartcards. In Proceedings of CHES'99, LNCS 1717, pp. 144-157, Springer-Verlag, 1999.
[18] I. Mironov, (Not So) Random Shuffles of RC4. In Proceedings of CRYPTO'2002, LNCS 2442, pp. 304-319, Springer-Verlag, 2002.
[19] K. Okeya, K. Sakurai, Power Analysis Breaks Elliptic Curve Cryptosystem even Secure against the Timing Attack. In Proceedings of INDOCRYPT'2000, LNCS 1977, pp. 178-190, Springer-Verlag, 2000.
[20] E. Trichina, D. De Seta, L. Germani, Simplified Adaptive Multiplicative Masking for AES. In Proceedings of CHES'2002, LNCS 2523, pp. 187-197, Springer-Verlag, 2002.
Appendix. Explicit subsets of values A, depending on the distribution of the Hamming weight of f_A(X).
– Subset 1: 0x00
– Subset 2: 0x01, 0x06, 0x09, 0x22, 0x24, 0x2B, 0x46, 0x48, 0x56, 0x62, 0x65, 0x68, 0x75, 0x7B, 0x7C, 0x7D, 0x99, 0x9D, 0xA9, 0xB1, 0xC1, 0xCA, 0xCD, 0xEF, 0xF0, 0xF8, 0xFA, 0xFB
– Subset 3: 0x02, 0x04, 0x07, 0x0C, 0x0E, 0x12, 0x18, 0x19, 0x1C, 0x23, 0x29, 0x2A, 0x2D, 0x2E, 0x31, 0x37, 0x3E, 0x3F, 0x44, 0x49, 0x52, 0x54, 0x59, 0x5B, 0x5C, 0x5F, 0x60, 0x67, 0x6B, 0x78, 0x79, 0x81, 0x89, 0x8D, 0x8E, 0x8F, 0x90, 0x95, 0x96, 0x98, 0x9C, 0xA5, 0xB0, 0xB2, 0xB6, 0xB7, 0xB8, 0xBF, 0xC0, 0xC3, 0xD6, 0xD9, 0xE1, 0xE7, 0xEA, 0xF7
– Subset 4: 0x03, 0x15, 0x87, 0x8C, 0xCE, 0xEB, 0xED, 0xF6
– Subset 5: 0x05, 0x08, 0x0A, 0x0D, 0x11, 0x1A, 0x1D, 0x1E, 0x1F, 0x21, 0x27, 0x32, 0x34, 0x38, 0x3A, 0x3B, 0x3D, 0x43, 0x47, 0x4B, 0x4C, 0x4E, 0x4F, 0x51, 0x64, 0x69, 0x6D, 0x6E, 0x76, 0x77, 0x7E, 0x85, 0x88, 0x92, 0x97, 0x9B, 0x9E, 0xA0, 0xA1, 0xA2, 0xA4, 0xA8, 0xAA, 0xAC, 0xAE, 0xB3, 0xB9, 0xBB, 0xBE, 0xC4, 0xC5, 0xC7, 0xC8, 0xC9, 0xCF, 0xD0, 0xD1, 0xD2, 0xD5, 0xDC, 0xE5, 0xE8, 0xE9, 0xEC, 0xF3, 0xF4, 0xF5, 0xFD, 0xFE, 0xFF
– Subset 6: 0x0B, 0x0F, 0x10, 0x20, 0x26, 0x28, 0x36, 0x39, 0x40, 0x5E, 0x63, 0x6C, 0x6F, 0x83, 0x8A, 0x9F, 0xA7, 0xAD, 0xB5, 0xBC, 0xBD, 0xC2, 0xCC, 0xD8, 0xDB, 0xE4, 0xE6, 0xF9
– Subset 7: 0x13, 0x1B, 0x2C, 0x66, 0x72, 0x80, 0x84, 0xD3
– Subset 8: 0x14, 0x17, 0x25, 0x2F, 0x30, 0x33, 0x35, 0x3C, 0x41, 0x42, 0x45, 0x4A, 0x4D, 0x50, 0x53, 0x55, 0x57, 0x58, 0x5A, 0x5D, 0x61, 0x6A, 0x70, 0x71, 0x73, 0x74, 0x7A, 0x7F, 0x82, 0x86, 0x8B, 0x91, 0x93, 0x94, 0x9A, 0xA3, 0xA6, 0xAB, 0xAF, 0xB4, 0xBA, 0xC6, 0xCB, 0xD4, 0xD7, 0xDA, 0xDD, 0xDE, 0xDF, 0xE0, 0xE2, 0xE3, 0xEE, 0xF1, 0xF2, 0xFC
– Subset 9: 0x16
Nonce-Based Symmetric Encryption

Phillip Rogaway

Dept. of Computer Science, University of California, Davis, CA 95616, USA
Dept. of Computer Science, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
[email protected]
www.cs.ucdavis.edu/~rogaway/
Abstract. Symmetric encryption schemes are usually formalized so as to make the encryption operation a probabilistic or state-dependent function E of the message M and the key K: the user supplies M and K and the encryption process does the rest, flipping coins or modifying internal state in order to produce a ciphertext C. Here we investigate an alternative syntax for an encryption scheme, where the encryption process E is a deterministic function that surfaces an initialization vector (IV). The user supplies a message M, a key K, and an initialization vector N, getting back the (one and only) associated ciphertext C = E_K^N(M). We concentrate on the case where the IV is guaranteed to be a nonce: something that takes on a new value with every message one encrypts. We explore definitions, constructions, and properties for nonce-based encryption. Symmetric encryption with a surfaced IV more directly captures real-world constructions like CBC mode, and encryption schemes constructed to be secure under nonce-based security notions may be less prone to misuse.

Keywords: Initialization vector, modes of operation, nonces, provable security, symmetric encryption.
1 Introduction
Ever since Goldwasser and Micali's landmark paper [7], formalizations of encryption schemes have usually made the encryption algorithm probabilistic or stateful. In this paper we investigate a different formalization for symmetric encryption: the encryption algorithm is made a deterministic function, but one of its arguments is a user-supplied initialization vector (IV). Effectively, the user, and not the encryption algorithm, is made responsible for flipping coins or maintaining state. We are mostly interested in security properties that can be guaranteed as long as the IV is a nonce: a value, like a counter, used at most once within a session. Our formalization leads to what is effectively a stronger notion of privacy than the conventional formalization, and a stronger notion of authenticity as well. As a consequence, encryption schemes created so as to satisfy the given notions would seem to be less likely to be misused.

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 348–359, 2004.
© International Association for Cryptologic Research 2004
Algorithm CBC.Encrypt_K^N(M)
  if |M| ∉ {n, 2n, 3n, ...} then return ⊥
  Parse M into M_1 ··· M_m where |M_i| = n
  C_0 ← N
  for i ← 1 to m do C_i ← E_K(C_{i-1} ⊕ M_i)
  return C_1 ··· C_m

Algorithm CBC.Decrypt_K^N(C)
  if |C| ∉ {n, 2n, 3n, ...} then return ⊥
  Parse C into C_1 ··· C_m where |C_i| = n
  C_0 ← N
  for i ∈ [1 .. m] do M_i ← C_{i-1} ⊕ E_K^{-1}(C_i)
  return M_1 ··· M_m
Fig. 1. Scheme CBC. Encryption and decryption depend on a block cipher E : Key × {0,1}^n → {0,1}^n. The key space for the encryption scheme is the same as the key space for the block cipher. The IV N is a string in {0,1}^n. The encryption scheme is the pair (CBC.Encrypt, CBC.Decrypt)
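A minimal executable sketch of the scheme in Fig. 1 follows. The block cipher E is instantiated, purely for illustration, by a toy 4-round Feistel network built from SHA-256; this toy cipher and all names are ours, not part of any standard:

```python
import hashlib

BLOCK = 16  # block length in bytes (n = 128 bits)

def _round(key, half, i):
    # toy Feistel round function; NOT a vetted block cipher
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def E(K, B):
    """Toy block cipher E_K on 16-byte blocks (4-round Feistel)."""
    L, R = B[:8], B[8:]
    for i in range(4):
        L, R = R, xor(L, _round(K, R, i))
    return L + R

def E_inv(K, B):
    """Inverse permutation E_K^{-1}."""
    L, R = B[:8], B[8:]
    for i in reversed(range(4)):
        L, R = xor(R, _round(K, L, i)), L
    return L + R

def cbc_encrypt(K, N, M):
    """CBC.Encrypt_K^N(M): the IV N is an argument supplied by the caller."""
    if len(M) % BLOCK != 0 or len(N) != BLOCK:
        return None  # stands in for the paper's invalid symbol
    C, prev = b"", N
    for i in range(0, len(M), BLOCK):
        prev = E(K, xor(prev, M[i:i + BLOCK]))
        C += prev
    return C

def cbc_decrypt(K, N, C):
    if len(C) % BLOCK != 0 or len(N) != BLOCK:
        return None
    M, prev = b"", N
    for i in range(0, len(C), BLOCK):
        M += xor(prev, E_inv(K, C[i:i + BLOCK]))
        prev = C[i:i + BLOCK]
    return M
```

Note that the ciphertext is exactly as long as the message: the IV is not part of the ciphertext, which is the zero-stretch point the paper returns to in Section 2.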
Algorithm CBC$.Encrypt_K(M)
  if |M| ∉ {n, 2n, 3n, ...} then return ⊥
  Parse M into M_1 ··· M_m where |M_i| = n
  C_0 ←$ {0,1}^n
  for i ← 1 to m do C_i ← E_K(C_{i-1} ⊕ M_i)
  return C_0 C_1 ··· C_m

Algorithm CBC$.Decrypt_K(C)
  if |C| ∉ {2n, 3n, 4n, ...} then return ⊥
  Parse C into C_0 C_1 ··· C_m where |C_i| = n
  for i ∈ [1 .. m] do M_i ← C_{i-1} ⊕ E_K^{-1}(C_i)
  return M_1 ··· M_m
Fig. 2. Scheme CBC$. The mechanism depends on a block cipher E : Key × {0, 1}n → {0, 1}n . Scheme CBC$ is a conventional probabilistic encryption scheme. It is just like scheme CBC except that the encryption routine chooses the IV N = C0 internally and at random. The user cannot influence it. The value is now returned as part of the ciphertext
Comparing CBC and CBC$ encryption. Popular modes of operation for encryption have always surfaced an IV. For example, CBC using a block cipher E : Key × {0,1}^n → {0,1}^n requires an initialization vector N ∈ {0,1}^n to encrypt a message M (or decrypt a ciphertext C) under a key K ∈ Key. See Fig. 1 and note that the initialization vector N is an argument to both CBC.Encrypt and CBC.Decrypt. Given that the IV is manifestly present in the description of CBC mode, this would seem to be quite natural. Nonetheless, the approach is at odds with the customary formalization of symmetric encryption going back to [1, 7]. There one explicitly models some particular manner of generating the IV and folds this into the definition of the scheme. For example, one considers the scheme CBC$ (i.e., CBC with a random IV) as defined in Fig. 2.

Contributions. It is the purpose of this note to treat symmetric encryption schemes in a way that explicitly surfaces the IV. The approach was first taken in our earlier work on authenticated encryption [9, 10], where we adopted nonce-based definitions without significant comment. Here we more systematically investigate the explicit-IV notion of encryption, giving definitions, schemes, and basic results. We are mostly interested in the case when the IV is a nonce: a value used at most once within the scope of a given session. This note aims to call attention to the explicit-IV approach and to nudge future work on practical encryption schemes into adopting the nonce-based framework.

Standards. We believe that a nonce-based formalization is especially desirable when constructing an encryption scheme for a cryptographic standard: not knowing how the scheme will be used, standards would do well to achieve the strongest practical notion of security relative to the interface that they export. The viewpoint, then, is that conventional encryption modes like CBC, as defined in Fig. 1, are "deficient" insofar as they do not achieve a strong notion of security unless one assumes something very strong about their IVs. One would prefer an encryption mode that achieves a strong notion of security while assuming very little about the IV. It is thus our view that, in the future, standards for privacy-only encryption would do well to achieve privacy in the ind$-sense that we will define in Section 3, while standards for authenticated encryption would do well to achieve, in addition, authenticity in the auth-sense that we define in Section 6.

Further reasons to surface the IV. Another motivation for explicitly surfacing the IV in the definition of an encryption scheme is that books and systems often get wrong what it may or may not be. Books will say, for example, that it is fine for the IV in CBC encryption to be a counter, or the last block of the previously encrypted ciphertext. Both statements are wrong, assuming that one intends to achieve a strong notion of privacy. Having definitions that expose the IV across the encryption and decryption interface facilitates answering what the IV may or may not be in order to achieve a given notion of security.
Yet another motivation for surfacing the IV is that it allows a particularly simple and strong notion of privacy: indistinguishability from random bits with respect to an adaptive chosen-plaintext-and-IV attack (ind$, to be defined later). This attack allows the adversary to select not only plaintexts but also the IVs that will be used to encrypt each of them, subject only to the constraint that no IV is reused. The model captures the possibility that the IVs may be chosen in an unfortunate way by the sender, possibly even influenced by the adversary, since we do not mandate any requirement on an IV beyond its non-reuse.

A small warning. Nothing in this paper should be construed to suggest that the overall encryption process should become deterministic and stateless (and therefore not semantically secure). We are simply drawing the abstraction boundary a little differently, so that what is "inside" the scheme is deterministic, the coins or state being pushed "outside" of the scheme's formalization.
2 Syntax
Definitions. We begin by specifying the syntax for an encryption scheme that surfaces an IV. An IV-based encryption scheme is a pair of algorithms Π = (E, D) where E : Key × IV × Plaintext → Ciphertext and D : Key × IV × Ciphertext → Plaintext ∪ {⊥} are deterministic functions. These functions are called the encryption function and the decryption function, respectively. Here Key, IV, Plaintext, and Ciphertext are nonempty sets of strings, the first of which is finite or is otherwise endowed with a distribution (the understood distribution on a finite set being the uniform one). These sets are called the key space, the IV space, the message space, and the ciphertext space. We insist that Plaintext has the structure that if it contains a string M then it contains every string M′ having the same length as M. We often write E_K^N(M) in place of E(K, N, M) and D_K^N(C) in place of D(K, N, C). We require that D_K^N(E_K^N(M)) = M for any K ∈ Key, N ∈ IV, and M ∈ Plaintext. For simplicity, we assume that |E_K^N(M)| depends only on |M| and, in particular, that |E_K^N(M)| = |M| + τ for some constant τ associated to the encryption scheme. We call τ the stretch of the encryption scheme.

Comments. (1) We will often use the word nonce instead of IV and write Nonce, the nonce space, instead of IV. We do this when we are thinking in terms of our nonce-based definitions for privacy (to follow). In such cases we call an IV-based encryption scheme a nonce-based encryption scheme. (2) We emphasize that E and D are deterministic and stateless functions: they may not flip coins or preserve state. (3) What we call the ciphertext C = E_K^N(M) is not expected to encode the IV, even though the IV is needed to decrypt. The IV may be communicated "out of band" to the receiver, maintained as shared state, or it may be manifest within the context of use, as when the IV is the sector index on a disk.
(4) The encryption function E may be length-preserving, meaning that |E_K^N(M)| = |M| for all K, N, M. Indeed we will see that encryption schemes can achieve a strong notion of privacy yet have zero stretch. (5) We have allowed for the possibility that the decryption of a string returns the distinguished value ⊥, which is used to indicate that the ciphertext is invalid. While this possibility is not needed for basic notions of privacy, it is needed for defining authenticity. (6) We have not said that the sender or receiver are stateless and without benefit of coins, only that E and D are. For example, the sender might maintain a counter to use as the IV. It is simply that this state is outside of the functionality of E.
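Comments (2), (4), and (6) can be made concrete with a small sketch of ours (an illustration only, not a recommended design): a deterministic, stateless, length-preserving scheme, built CTR-style with SHA-256 standing in for a PRF, wrapped by a sender object that keeps the nonce counter outside of E:

```python
import hashlib

def _pad(K, N, nbytes):
    """Keystream derived from (K, N); SHA-256 stands in for a PRF here."""
    out, ctr = b"", 0
    while len(out) < nbytes:
        out += hashlib.sha256(K + N + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:nbytes]

def E(K, N, M):
    """Deterministic, stateless, and length-preserving: stretch tau = 0."""
    return bytes(a ^ b for a, b in zip(M, _pad(K, N, len(M))))

D = E  # XORing the same keystream decrypts

class Sender:
    """The coins/state live outside E: this wrapper maintains a nonce counter."""
    def __init__(self, K):
        self.K, self.ctr = K, 0
    def send(self, M):
        N = self.ctr.to_bytes(8, "big")
        self.ctr += 1  # never reused within the session
        return N, E(self.K, N, M)
```

The scheme itself never flips a coin or touches state; the user-side Sender is what guarantees the nonce property.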
3 Privacy
Indistinguishability from random bits. Our preferred notion of privacy is “indistinguishability from random bits under an adaptive chosen-plaintext-andIV attack”. To formalize this, let adversary A be an algorithm with access to an oracle and let Π = (E, D) be an IV-based encryption scheme with key space
Key and IV space Nonce and stretch τ. We define

Adv^{ind$}_Π(A) = Pr[K ←$ Key : A^{E_K(·,·)} ⇒ 1] − Pr[A^{$(·,·)} ⇒ 1]

The superscript ind$ may alternatively be written as ind$-cpa. The oracle E_K(·,·), on input (N, M), returns E_K^N(M). We sometimes refer to this as the real encryption oracle. The oracle $(·,·), on input (N, M), returns |M| + τ random bits. We sometimes refer to this as the random-bits oracle. Both oracles return ⊥ if N ∉ Nonce or M ∉ Plaintext. When we write A^O ⇒ 1 we are referring to the event that adversary A, running with its oracle O, outputs the bit 1. We call an adversary A nonce-respecting if it never repeats a nonce: if A asks (N, M), it never subsequently asks (N, M′) for any M′. This must hold regardless of A's coins and regardless of oracle responses. We assume that any ind$-adversary is nonce-respecting.

Conventional indistinguishability. It is more customary to focus on a different kind of indistinguishability. Once again, let Π = (E, D) be a nonce-based encryption scheme with key space Key and nonce space Nonce. Let A be an adversary. Then define

Adv^{ind}_Π(A) = Pr[K ←$ Key : A^{E_K(·,·)} ⇒ 1] − Pr[K ←$ Key : A^{E_K(·,0^{|·|})} ⇒ 1]

The superscript ind may alternatively be written as ind-cpa. The first oracle is a real encryption oracle, as before. The second oracle, on input (N, M), returns E_K(N, 0^{|M|}). We call this a fake encryption oracle. Both oracles return ⊥ if N ∉ Nonce or M ∉ Plaintext, and A is always assumed to be nonce-respecting.

Resource-parameterized definitions. If Π is a scheme, A is an adversary, and Adv^{xxx}_Π(A) is a measure of adversarial advantage already defined, then we write Adv^{xxx}_Π(R) to mean the maximal value of Adv^{xxx}_Π(A) over all adversaries A that use resources bounded by R. Here R is a list of variables specifying the resources of interest for the adversary in question.
Adversarial resources to which we pay attention are: t, the running time of the adversary; q, the number of queries asked by the adversary; and σ, the aggregate length of these queries, plus the length of the adversary's output, measured in n-bit blocks, for some understood value n. When an adversary's query or output is a tuple of strings we count in σ the sum of the lengths of each component. Fractional blocks and the empty string contribute 1. By convention, the running time of an algorithm includes the description size of that algorithm, relative to some standard encoding.

Discussion: favoring ind$ over ind. It is easy to verify that the ind$-notion of security implies the ind-notion, and by a tight reduction, while ind does not imply ind$ at all. (The same is true if one speaks of indistinguishability and indistinguishability from random bits in the context of conventional, probabilistic encryption schemes.) Despite this, typical encryption schemes seem to achieve
Nonce-Based Symmetric Encryption
Algorithm CBC1.Encrypt^N_K(M)
  if |M| ∉ {n, 2n, 3n, . . .} then return ⊥
  parse M into M_1 · · · M_m where |M_i| = n
  C_0 ← E_K(N)
  for i ← 1 to m do C_i ← E_K(C_{i−1} ⊕ M_i)
  return C_1 · · · C_m
Algorithm CBC1.Decrypt^N_K(C)
  if |C| ∉ {n, 2n, 3n, . . .} then return ⊥
  parse C into C_1 · · · C_m where |C_i| = n
  C_0 ← E_K(N)
  for i ∈ [1 .. m] do M_i ← C_{i−1} ⊕ E_K^{−1}(C_i)
  return M_1 · · · M_m
Fig. 3. Scheme CBC1. The scheme is not secure
ind$ if they achieve ind (again, the IV is not considered part of the ciphertext). Furthermore, it usually seems to be no extra trouble (indeed it is often slightly simpler) to directly demonstrate that some scheme achieves ind$-security. Doing so is useful because an encryption scheme that satisfies ind$ makes a more versatile tool: it can be used to directly provide a pseudorandom generator or a pseudorandom function. Finally, ind$ seems to us conceptually simpler and easier to work with. For all of these reasons, we like ind$ as the basic notion of security for building practical IV-based symmetric encryption schemes. (The counter-argument is that being indistinguishable from random bits is irrelevant to the goal of encryption: one would argue that it goes beyond the intuition about what secure encryption needs to provide. This is true, and yet it has often proven desirable to use definitions that reach beyond the minimal notions that satisfy one's intuition.)
4 Insecure Schemes
One can see right away that CBC encryption, as formalized in Fig. 1, is not ind-secure (and therefore it is not ind$-secure, either). Here is the attack. The adversary is trying to distinguish a real encryption oracle from a fake encryption oracle. Let us write 0 for 0n and 1 for 0n−1 1. The adversary asks a first oracle query of (N1 , M1 ) = (0, 0), getting back a ciphertext C1 . Let it then ask a second oracle query of (N2 , M2 ) = (1, 1), getting back a ciphertext C2 . If C1 = C2 then the adversary outputs 1 (it believes it has a real encryption oracle) and otherwise the adversary outputs 0 (it knows that it has a fake encryption oracle). The adversary is extremely efficient and has advantage close to 1. The attack above motivates a natural alternative to CBC: encipher the IV before using it, as shown in Fig. 3. We call the scheme CBC1. The key space Key for the encryption scheme remains the key space for the underlying block cipher and the nonce space Nonce is {0, 1}n . The scheme CBC1 still doesn’t work. Let the adversary ask query (N1 , M1 ) = (0, 00), obtaining ciphertext C11 C12 (where C11 and C12 are n bits). Note that if the adversary was provided a real encryption oracle then C12 = EK (C11 ). So next the adversary asks (N2 , M2 ) = (C11 , C12 ⊕ C11 ), getting result C21 . If C21 = C12
Algorithm CBC2.Encrypt^N_{K1 K2}(M)
  if |M| ∉ {n, 2n, 3n, . . .} then return ⊥
  parse M into M_1 · · · M_m where |M_i| = n
  C_0 ← E_{K1}(N)
  for i ← 1 to m do C_i ← E_{K2}(C_{i−1} ⊕ M_i)
  return C_1 · · · C_m
Algorithm CBC2.Decrypt^N_{K1 K2}(C)
  if |C| ∉ {n, 2n, 3n, . . .} then return ⊥
  parse C into C_1 · · · C_m where |C_i| = n
  C_0 ← E_{K1}(N)
  for i ∈ [1 .. m] do M_i ← C_{i−1} ⊕ E_{K2}^{−1}(C_i)
  return M_1 · · · M_m
Fig. 4. Scheme CBC2. The scheme is now ind$-secure
then the adversary outputs 1 (it guesses that it has a real encryption oracle) and otherwise it outputs 0 (it is sure that it has a fake encryption oracle). The adversary is very efficient and is easily seen to have advantage close to 1.
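Both attacks can be replayed concretely. The sketch below uses a toy 8-bit “block cipher” (a fixed random byte permutation), an illustrative assumption; the queries are exactly the ones described in the text:

```python
import random

# Toy 8-bit "block cipher": a fixed random permutation of the byte values.
perm = list(range(256))
random.Random(0).shuffle(perm)
E = lambda x: perm[x]                      # E_K for one fixed key K

def cbc_encrypt(nonce, blocks):
    """CBC with C_0 <- N: the insecure scheme of Fig. 1."""
    c, out = nonce, []
    for m in blocks:
        c = E(c ^ m)
        out.append(c)
    return out

def cbc1_encrypt(nonce, blocks):
    """CBC1 with C_0 <- E_K(N) (Fig. 3): still insecure."""
    c, out = E(nonce), []
    for m in blocks:
        c = E(c ^ m)
        out.append(c)
    return out

# Attack on CBC: the queries (N1, M1) = (0, 0) and (N2, M2) = (1, 1)
# collide, since N1 ^ M1 == N2 ^ M2 == 0.
assert cbc_encrypt(0x00, [0x00]) == cbc_encrypt(0x01, [0x01])

# Attack on CBC1: from C11 C12 <- Encrypt(N=0, M=0 0) the adversary knows
# C12 = E_K(C11), so the query (N2, M2) = (C11, C12 ^ C11) must return C12.
c11, c12 = cbc1_encrypt(0x00, [0x00, 0x00])
(c21,) = cbc1_encrypt(c11, [c12 ^ c11])
assert c21 == c12
```

Against a fake or random-bits oracle the same checks succeed only with probability about 2^{-n}, which is where the advantage close to 1 comes from.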
5 Secure Schemes
Despite the two examples above, it is easy to construct encryption schemes that are secure in the ind$-sense. Consider first the scheme CBC2 shown in Fig. 4. The key space for the encryption scheme is Key × Key, where Key is the key space for the underlying block cipher. The nonce space is {0, 1}^n. The message space remains ({0, 1}^n)^+.

The following result shows that CBC2 is a secure encryption scheme. We state the theorem in the information-theoretic setting. Passing to the complexity-theoretic case is standard. By P_n we mean the set of all permutations on {0, 1}^n. These are block ciphers in the natural way. Thus by CBC2[P_n × P_n] we mean the scheme where E_{K1} and E_{K2} are random permutations from n bits to n bits.

Theorem 1. Let n, σ ≥ 1. Then Adv^{ind$}_{CBC2[P_n × P_n]}(σ) ≤ σ^2/2^n.
♦
To avoid having two block-cipher keys one can modify the scheme using tricks like those from [4, 8]. However, it is not necessary to use a CBC-like scheme at all; simple forms of counter mode (CTR) work fine, and such modes have the advantage of being parallelizable and of working directly on messages of any bit length. See Fig. 5 and Fig. 6 for two counter-based encryption schemes that are secure in the ind$-sense. The first has a nonce space of {0, 1}^{n/2} (assume that n is even) and the second has a nonce space of {0, 1}^n but uses one extra block-cipher call. When S is an n-bit string and i is a number we denote by S + i the n-bit string obtained by treating S as a number (msb first, lsb last), adding i modulo 2^n to this number, and then turning the result back into an n-bit string (msb first, lsb last). One should anticipate use of CTR1 only on strings of at most 2^{n/2} blocks, though the ind$-security of the scheme has already vanished by that point when the block cipher E is a PRP. Similarly, one should anticipate use of CTR2 only on strings of at most 2^n blocks, though the ind$-security of the scheme has long before vanished when the function E is a PRP.
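The S + i convention can be made precise in a few lines; working at byte granularity is an assumption for illustration:

```python
# The "S + i" convention: treat the n-bit string S as a number (msb first,
# lsb last), add i modulo 2^n, and turn the result back into an n-bit string.
def add_mod(s: bytes, i: int) -> bytes:
    n = 8 * len(s)
    return ((int.from_bytes(s, "big") + i) % 2 ** n).to_bytes(len(s), "big")

assert add_mod(b"\x00\xff", 1) == b"\x01\x00"
assert add_mod(b"\xff\xff", 1) == b"\x00\x00"   # wraps around modulo 2^n
```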
Nonce-Based Symmetric Encryption
Algorithm CTR1.Encrypt^N_K(M)
  S ← N ∥ 0^{n/2}
  m ← ⌈|M|/n⌉
  P ← E_K(S + 0) · · · E_K(S + m − 1)
  C ← M ⊕ P [first |M| bits]
  return C
Algorithm CTR1.Decrypt^N_K(C)
  S ← N ∥ 0^{n/2}
  m ← ⌈|C|/n⌉
  P ← E_K(S + 0) · · · E_K(S + m − 1)
  M ← C ⊕ P [first |C| bits]
  return M
Fig. 5. Scheme CTR1. The nonce space is Nonce = {0, 1}^{n/2}
Algorithm CTR2.Encrypt^N_K(M)
  S ← E_K(N)
  m ← ⌈|M|/n⌉
  P ← E_K(S + 0) · · · E_K(S + m − 1)
  C ← M ⊕ P [first |M| bits]
  return C
Algorithm CTR2.Decrypt^N_K(C)
  S ← E_K(N)
  m ← ⌈|C|/n⌉
  P ← E_K(S + 0) · · · E_K(S + m − 1)
  M ← C ⊕ P [first |C| bits]
  return M
Fig. 6. Scheme CTR2. The nonce space is Nonce = {0, 1}^n

Theorem 2. Let n, σ ≥ 1. Then Adv^{ind$}_{CTR1[P_n]}(σ) ≤ σ^2/2^n.
♦
Theorem 3. Let n, σ ≥ 1. Then Adv^{ind$}_{CTR2[P_n]}(σ) ≤ σ^2/2^n.
♦
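A runnable sketch of CTR1 (Fig. 5) follows, with n = 128 and SHA-256, truncated to one block, standing in for the block cipher E_K. The stand-in is an illustrative assumption (it is not even a permutation); a real instantiation would use a block cipher such as AES:

```python
import hashlib

BLOCK = 16   # block length in bytes, i.e. n = 128

def E(key: bytes, x: bytes) -> bytes:
    """Stand-in for the block cipher E_K (illustrative assumption)."""
    return hashlib.sha256(key + x).digest()[:BLOCK]

def S_plus(s: bytes, i: int) -> bytes:
    """S + i: msb-first addition modulo 2^n."""
    n = 8 * len(s)
    return ((int.from_bytes(s, "big") + i) % 2 ** n).to_bytes(len(s), "big")

def ctr1(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """CTR1: S <- N || 0^{n/2}, pad P <- E_K(S+0) ... E_K(S+m-1)."""
    assert len(nonce) == BLOCK // 2            # Nonce = {0,1}^{n/2}
    s = nonce + bytes(BLOCK // 2)
    m = -(-len(data) // BLOCK)                 # ceil(|M| / n)
    pad = b"".join(E(key, S_plus(s, i)) for i in range(m))
    return bytes(a ^ b for a, b in zip(data, pad))   # first |M| bits of P

key, nonce = b"k" * 16, bytes(8)
ct = ctr1(key, nonce, b"attack at dawn")
assert ctr1(key, nonce, ct) == b"attack at dawn"
```

Since CTR mode is an XOR stream, the same function both encrypts and decrypts, which matches the symmetry of the Encrypt and Decrypt algorithms in Fig. 5.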
Recall our convention that when multiple strings are encoded into a single one, as in a query (N, M), one sums the lengths of the components in the resource bound σ. This explains the absence of a term like q^2/2^n in the second bound (where q is the number of queries).
6 Stronger Notions of Security
One desirable property of a nonce-based encryption scheme is that an adversarially produced ciphertext, coupled with its nonce, should be deemed invalid by the receiver unless, of course, it is a copy of a prior ciphertext and its nonce. We recall the definition of this property and then look at some other strong properties for a nonce-based encryption scheme.

Authenticity. A notion of authenticity of ciphertexts for nonce-based encryption schemes was formalized in [9, 10] following [6, 3, 2]. Fix an encryption scheme Π = (E, D) with key space Key. Let A be a nonce-respecting adversary having an encryption oracle E_K. We say that A forges if it outputs a pair (N, C) such that C was not the response to any E_K(N, M) query and D_K^N(C) ≠ ⊥. We write

Adv^{auth}_Π(A) = Pr[K ←$ Key : A^{E_K(·,·)} forges]

and lift this to give resource-bounded definitions in the usual way.
Chosen-ciphertext security. We define indistinguishability from random bits under an adaptive chosen-plaintext-and-ciphertext-and-IV attack. The defining game is as with ind$-cpa except that the adversary is given access to a decryption oracle as well. Queries may not be repeated, and one forbids the adversary from making a decryption query of (N, C) if the adversary already encrypted some (N, M) and got back an answer C; one similarly forbids the adversary from encrypting (N, M) if the adversary already decrypted some (N, C) and got back an answer M. These restrictions must hold regardless of the adversary's coins and query responses. Only such an adversary is deemed to be valid. In defining chosen-ciphertext security one restricts attention to valid, nonce-respecting adversaries. Be clear that the nonce-respecting condition applies only to encryption queries; the adversary is free to repeat nonces in its decryption queries. This reflects the understanding that the party encrypting a message is the one responsible for providing fresh nonces; the receiver may be stateless.

Let A be a valid, nonce-respecting adversary and let Π = (E, D) be a nonce-based encryption scheme with key space Key. We define

Adv^{ind$-cca}_Π(A) = Pr[K ←$ Key : A^{E_K(·,·), D_K(·,·)} ⇒ 1] − Pr[K ←$ Key : A^{$(·,·), D_K(·,·)} ⇒ 1].

The notion can be modified to ind-cca in the natural way.

Nonmalleability. The notion of nonmalleability [5] can likewise be adapted to the nonce-based setting. The adversary's goal will be to create a ciphertext (perhaps by modifying ciphertexts that have already been seen) whose underlying plaintext is something about which the adversary can say something interesting. More specifically, the adversary will output a tuple (N, C, f, Y) where N is a nonce, C is a ciphertext, f : {0, 1}^* ∪ {⊥} → {0, 1}^* is a function (encoded as a program), and Y is a string. The output should be interpreted as the adversary guessing that the value of f(M) is Y, where M = D_K^N(C). The formalization captures the idea that the adversary should be right about this guess no more often than is inherent in the game.

We define nonmalleability under chosen-ciphertext attack (meaning chosen-plaintext-and-ciphertext-and-IV attack). Fix an encryption scheme Π = (E, D) having key space Key. Consider a valid, nonce-respecting adversary A with access to oracles E_K(·,·) and D_K(·,·). At the end of the adversary's execution, let Dec(N, C) be {M} if the adversary asked some query E_K(N, M) and this returned C, or the adversary asked some query D_K(N, C) and this returned M. If the adversary asked no such query, let Dec(N, C) = {⊥}. One can regard Dec(N, C) as a “fake” decryption of C for nonce N: if the adversary trivially knows the decryption to be M then the value is M; otherwise, the value is the “guess” that the ciphertext is invalid. Then define Adv^{nm-cca}_Π(A) as

Pr[K ←$ Key; (N, C, f, Y) ←$ A^{E_K(·,·), D_K(·,·)}; M ← D_K(N, C) : f(M) = Y] −
Pr[K ←$ Key; (N, C, f, Y) ←$ A^{E_K(·,·), D_K(·,·)}; M ←$ Dec(N, C) : f(M) = Y].
The corresponding notion for nonmalleability under a chosen-plaintext attack (nm-cpa) is obtained by insisting that the adversary asks no decryption queries. Though the above notions might not look like nonmalleability, really they are: the case of creating a ciphertext C′ whose plaintext M′ is related to the plaintext M of some other ciphertext C is just a special case.

Implications and separations. As with probabilistic encryption, one can work out the complete set of implications and separations between the defined notions of nonce-based encryption. The most useful relations are that ind$ plus auth implies both ind$-cca and nm-cca. The intuition is clear: the auth-condition effectively renders the decryption oracle useless, since it almost always returns an answer that the adversary can anticipate. We omit further details.

Achieving ind$ + auth by generic composition. None of the ind$-secure encryption schemes given so far (CBC2, CTR1, CTR2) achieves the auth-notion of authenticity (nor do they achieve ind-cca or nm-cca). We now explain the most natural way to modify an ind$-secure encryption scheme so as to achieve authenticity (while preserving ind$-security, of course).

Let Π = (E, D) be a nonce-based encryption scheme having nonce space Nonce = {0, 1}^n and key space Key. Think of Π as being ind$-secure. Let F : Key × Dom → {0, 1}^n be a function. Think of it as being a good pseudorandom function. We want to combine Π and F to give an encryption scheme Π̄ = (Ē, D̄) that will be ind$-secure and auth-secure. The simplest possibilities are as follows.

• Encrypt-then-PRF. Let Ē^N_{K K′}(M) = C ∥ T where C = E^N_K(M) and T = F_{K′}(N ∥ C). Decryption (including the test for authenticity) proceeds in the natural way.
• PRF-then-Encrypt. Let Ē^N_{K K′}(M) = E^N_K(M ∥ T) where T = F_{K′}(N ∥ M). Decryption (including the test for authenticity) proceeds in the natural way.
The definition above assumes that the encryption scheme Π and the PRF F have appropriately matching domains. The situation is different from conventional probabilistic encryption [2]; for nonce-based encryption, both encrypt-then-PRF and PRF-then-encrypt work correctly. The proofs are straightforward; see [10] for the slightly more complex setting in which “associated data” is present as well.
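The Encrypt-then-PRF composition can be sketched as follows, with HMAC-SHA256 standing in for the PRF F and a hash-based XOR stream standing in for an ind$-secure Π; both stand-ins, and the 16-byte tag length, are illustrative assumptions, not the paper's constructions:

```python
import hashlib
import hmac

TAG = 16   # tag length in bytes (illustrative)

def stream(k1: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR-stream stand-in for an ind$-secure Pi (illustrative)."""
    pad, i = b"", 0
    while len(pad) < len(data):
        pad += hashlib.sha256(k1 + nonce + i.to_bytes(4, "big")).digest()
        i += 1
    return bytes(a ^ b for a, b in zip(data, pad))

def prf(k2: bytes, data: bytes) -> bytes:
    """Stand-in for the PRF F: HMAC-SHA256, truncated to TAG bytes."""
    return hmac.new(k2, data, hashlib.sha256).digest()[:TAG]

def enc_then_prf(k1, k2, nonce, msg):
    c = stream(k1, nonce, msg)
    return c + prf(k2, nonce + c)            # T = F_{K'}(N || C)

def dec_then_verify(k1, k2, nonce, ct):
    c, t = ct[:-TAG], ct[-TAG:]
    if not hmac.compare_digest(t, prf(k2, nonce + c)):
        return None                          # reject as inauthentic
    return stream(k1, nonce, c)              # XOR stream: decrypt = encrypt

ct = enc_then_prf(b"k1", b"k2", b"n0", b"attack at dawn")
assert dec_then_verify(b"k1", b"k2", b"n0", ct) == b"attack at dawn"
assert dec_then_verify(b"k1", b"k2", b"n0", bytes([ct[0] ^ 1]) + ct[1:]) is None
```

Note how the tag is computed over the nonce together with the ciphertext, so both forged ciphertexts and replays under a different nonce are rejected.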
7 Directions
With the syntax of an encryption scheme having been modified to surface the IV, a number of weaker notions of security for IV-based encryption make sense. For example, to capture the requirement that “the IVs are to be some fixed sequence of distinct values,” have the adversary provide a deterministic algorithm F that gives distinct n-bit strings F(1), F(2), . . . , F(2^n). Require indistinguishability from random bits with respect to the resulting scheme.
In this paper we have only treated symmetric encryption. Public-key encryption schemes traditionally do not surface an IV. But they do use random bits, and it makes just as much sense to consider nonce-based public-key encryption schemes as it does to consider nonce-based symmetric encryption schemes. This provides an approach to effectively weakening the requirement for randomness on the sender.
Acknowledgements Some of the ideas of this note were developed in the course of preparing some lectures for Helsinki University of Technology, Finland (April 2002); thanks to Helger Lipmaa for inviting me to give those lectures. Kind thanks also to Mihir Bellare, John Black, and Tom Shrimpton for their useful comments and suggestions. This work was supported under NSF CCR-0208842 and a gift from CISCO Systems.
References

[1] M. Bellare, A. Desai, E. Jokipii, and P. Rogaway. A concrete security treatment of symmetric encryption: Analysis of the DES modes of operation. Proceedings of the 38th Annual Symposium on Foundations of Computer Science (FOCS 97), IEEE, 1997.
[2] M. Bellare and C. Namprempre. Authenticated encryption: Relations among notions and analysis of the generic composition paradigm. Advances in Cryptology – Asiacrypt '00, Lecture Notes in Computer Science, vol. 1976, T. Okamoto, ed., Springer-Verlag, 2000.
[3] M. Bellare and P. Rogaway. Encode-then-encipher encryption: How to exploit nonces or redundancy in plaintexts for efficient cryptography. Advances in Cryptology – Asiacrypt '00, Lecture Notes in Computer Science, vol. 1976, T. Okamoto, ed., Springer-Verlag, 2000.
[4] J. Black and P. Rogaway. CBC MACs for arbitrary-length messages: The three-key constructions. Advances in Cryptology – CRYPTO '00, Lecture Notes in Computer Science, vol. 1880, M. Bellare, ed., Springer-Verlag, pp. 197–215, Aug. 2000.
[5] D. Dolev, C. Dwork, and M. Naor. Non-malleable cryptography. SIAM Journal on Computing, vol. 30, no. 2, pp. 391–437, 2000.
[6] J. Katz and M. Yung. Unforgeable encryption and chosen ciphertext secure modes of operation. Fast Software Encryption (FSE 2000), Lecture Notes in Computer Science, vol. 1978, B. Schneier, ed., Springer-Verlag, pp. 284–299, 2001.
[7] S. Goldwasser and S. Micali. Probabilistic encryption. Journal of Computer and System Sciences, vol. 28, pp. 270–299, April 1984.
[8] T. Iwata and K. Kurosawa. One-key CBC MAC. Fast Software Encryption (FSE 2003), Lecture Notes in Computer Science (to appear), 2003.
[9] P. Rogaway, M. Bellare, J. Black, and T. Krovetz. OCB: A block-cipher mode of operation for efficient authenticated encryption. Proceedings of the 8th ACM Conference on Computer and Communications Security (CCS '01), ACM Press, pp. 196–205, 2001.
[10] P. Rogaway. Authenticated-encryption with associated-data. Proceedings of the 9th ACM Conference on Computer and Communications Security (CCS '02), ACM Press, pp. 98–107, 2002.
Ciphers Secure against Related-Key Attacks

Stefan Lucks
University of Mannheim, Germany
http://th.informatik.uni-mannheim.de/people/lucks/index.html
Abstract. In a related-key attack, the adversary is allowed to transform the secret key and request encryptions of plaintexts under the transformed key. This paper studies the security of PRF- and PRP-constructions against related-key attacks. For adversaries who can only transform a part of the key, we propose a construction and prove its security, assuming a conventionally secure block cipher is given. In terms of concrete security, this is an improvement over a recent result by Bellare and Kohno [2]. Further, based on some technical observations, we present two novel constructions for related-key secure PRFs, and we prove their security under number-theoretical infeasibility assumptions.

Keywords: related-key attacks, provable security, pseudorandom functions, block ciphers, concrete security
1 Introduction

In a related-key scenario, the adversary can partially control the key. The key remains secret to the adversary (i.e., she can't read it), but she can choose key transformations, modify the key accordingly, and request encryptions under the modified keys. The well-known DES complementation property can be viewed as a vulnerability against a related-key DES-distinguisher.

One motivation to study related-key attacks is to evaluate the security of secret-key cryptosystems, namely the security of block ciphers and their “key schedules”; see Knudsen [11] and Biham [3]. Kelsey, Schneier, and Wagner [9, 10] presented related-key attacks against several block ciphers, including three-key triple-DES. Today, related-key attacks are a well-established tool to evaluate the security of block ciphers, e.g. in the context of the AES [4, 5, 7].

Another motivation is the existence of cryptographic schemes whose security depends on the related-key security of some underlying primitive. Two examples are tweakable block ciphers by Liskov, Rivest, and Wagner [13] and RMAC by Jaulmes, Joux, and Valette [8]. Knudsen and Kohno [12] pointed out that the triple-DES based variant of RMAC (which had been proposed for standardisation [6]) can be attacked by exploiting the related-key insecurity of triple-DES.

Recently, Bellare and Kohno [1, 2] investigated related-key attacks from a theoretical point of view. They presented an approach to formally handle the notion of related-key attacks. As it turned out, the security of a scheme against related-key attacks greatly depends on the adversary's capabilities, namely on the set of key transformations available to her.
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 359–370, 2004.
© International Association for Cryptologic Research 2004
1.1 Focus of this Paper and Overview

In the current paper, we follow the approach from [1, 2], presenting some improved possibility results, i.e., constructions for block ciphers (PRPs) and pseudorandom function generators (PRFs) which are provably secure against related-key (RK) adversaries. We first concentrate on “partially-transforming” adversaries, where a part of the secret key is unaffected and remains constant. Then we deal with stronger “T^+-transforming” adversaries, where the adversary can add a known (or rather chosen) difference to the secret key. See Section 1.2 for the exact definitions. Finally, we provide a short summary. Our main results are:

– For some applications of RK-secure PRFs or PRPs, it would suffice to use a cipher secure against partially-transforming RK adversaries [2]. Section 2 introduces a new construction for PRPs provably secure against partially-transforming adversaries. A similar construction and a proof of security can be found in [2].¹ The concrete complexity (i.e., the upper bound on the adversary's advantage) shown in [2] turns out to be rather weak, though. The construction in Section 2 allows us to prove a better bound.
– In Section 3, we explore equivalent constructions for related-key secure PRFs, and we consider the composition of conventionally secure and related-key secure PRFs. Our observations may be useful as a tool for finding PRFs provably secure against more general related-key adversaries, instead of only partially-transforming ones.
– Section 4 describes two new PRF-constructions. Based on certain number-theoretical assumptions, we prove the security of these constructions against T^+-transforming adversaries. This is a step towards solving a challenge posed in [1, 2]. To the best of the author's knowledge, these constructions are the only PRFs so far with a standard-model proof of security against group-induced transformations.
Note, though, that the assumptions we make are new and not well-studied.
– Sections 5 and 6 conclude the paper with a remark on using a hash function as a tool to ensure related-key security, and with a summary.

1.2 Notation and Definitions

We write PRF for a Pseudo-Random Function generator and PRP for a Pseudo-Random Permutation generator (= block cipher). Let K, D, and R be finite sets. We write Perm(D) for the set of permutations over D, i.e., p : D → D is in Perm(D) if and only if p^{−1} : D → D exists with p^{−1}(p(d)) = d for all d ∈ D. We view a function F : K × D → R as a family of functions F(k, ·) = F_k(·) indexed by k ∈ K. If additionally D = R and F_k ∈ Perm(D) for all k ∈ K, then F is a family of permutations, also called a block cipher. We write F_k^{−1} for the inverse of F_k, i.e., for the decryption function. Perm(K, D) denotes the set of all block ciphers E : K × D → D. Below, E : K × D → D denotes a block cipher encryption function and E^{−1} denotes its inverse.

Recall the advantage of an adversary in a (conventional) chosen plaintext attack (cf. e.g. [14]): given E and an adversary A(CP-oracle) with access to a chosen plaintext
¹ . . . , but it was not included in the Eurocrypt version [1] of that paper.
oracle, the PRP-advantage of A when attacking E is the unsigned difference for A to distinguish the real case from a random case:

Adv^{prp}_E(A) = | Pr[k ∈R K : A(E_k(·)) = 1] − Pr[g ∈R Perm(D) : A(g(·)) = 1] |.

Let k ∈R K be a secret key. A related-key oracle E_{rk(·,k)}(·) is an oracle with two inputs, a key transformation t : K → K and an element d ∈ D. Given a query (t, d), the related-key oracle responds E_{t(k)}(d) in the real case, which is to be distinguished from a random case.

Definition 1 (Security of a PRP under RK attacks). Let the block cipher E and the set of transformations T be given. The adversary A(RK-oracle) with access to a related-key oracle is a T-transforming adversary,² if she is allowed to choose queries (t, d) ∈ T × D as oracle queries. The PRP-RK-advantage of a T-transforming adversary A when attacking E is

Adv^{prp-rk}_{T,E}(A) = | Pr[k ∈R K : A(E_{rk(·,k)}(·)) = 1] − Pr[k ∈R K, G ∈R Perm(K, D) : A(G_{rk(·,k)}(·)) = 1] |.

Here, the real case is the experiment “randomly choose k ∈ K and, on a query (t, d), respond with the value E_{t(k)}(d)”. The random case is: “randomly choose k ∈ K and G ∈R Perm(K, D), i.e., a family of |K| independent random permutations; respond G_{t(k)}(d) to oracle queries (t, d).” The attack game for A means to distinguish the real from the random case. Similarly, we define the security of a PRF under RK attacks.

We concentrate on the following types of key transformations:

Group-induced transformations: Let (K, ∗) be a group. We define T := { f : K → K | ∃δ ∈ K : f(k) = k ∗ δ }. In Section 4, we focus on T^+, where “+” denotes addition mod |K|.

Partial transformations: Set K = K1 × K2 for non-empty sets K1 and K2. T is a set of partial transformations if T can be rewritten as T = { t | ∃t′ ∈ T′ : t(k1, k2) = (k1, t′(k2)) }, where T′ is a set of functions K2 → K2.

Collision-free sets of transformations: T is collision-free if, for all k, k′ ∈ K, there exists at most one t ∈ T with t(k) = k′.
This is relevant in the context of protocol design, such as the previously mentioned RMAC and tweakable block ciphers. Sets of group-induced transformations are collision-free. Sets of partial transformations can be collision-free.
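The two transformation classes can be illustrated directly; taking keys to be integers modulo 2^16 (and pairs thereof, for the partial case) is an assumption for illustration:

```python
N = 2 ** 16   # |K| = 2^16: keys as integers (illustrative assumption)

def t_plus(delta):
    """Group-induced: t(k) = k + delta mod |K| (the T^+ class)."""
    return lambda k: (k + delta) % N

def t_partial(t2):
    """Partial: t(k1, k2) = (k1, t'(k2)); the first key half is untouched."""
    return lambda k: (k[0], t2(k[1]))

assert t_plus(5)(N - 2) == 3                          # wraps around the group
assert t_partial(lambda y: y ^ 1)((3, 4)) == (3, 5)   # k1 unchanged
```

For the group-induced class, collision-freeness is visible here: exactly one delta, namely k′ − k mod |K|, maps a given k to a given k′.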
² [1, 2] call this a “T-restricted adversary”. This could be misleading, since RK adversaries appear to be enhanced, not restricted, in comparison to conventional adversaries.
2 Secure PRPs and Partial Transformations

Set K∗ := {0, 1}^{m+n} and consider a set T of partial transformations:

T ⊆ { t ∈ {K∗ → K∗} | ∃τ : {0, 1}^n → {0, 1}^n : t(x, y) = (x, τ(y)) }.   (1)

Let E : {0, 1}^m × {0, 1}^n → {0, 1}^n be a block cipher and consider

E⁰ : {0, 1}^{m+n} × {0, 1}^n → {0, 1}^n,   E⁰_{(X,Y)}(M) = E_X(M).
The adversary has no control over the key X in use. So if E is conventionally secure, shouldn't E⁰ be secure against T-transforming adversaries? Consider the following adversary: choose transformations σ, τ : {0, 1}^n → {0, 1}^n with σ(Y) ≠ τ(Y) being likely for random Y. Ask for the encryptions of a random plaintext M under (X, σ(Y)) and (X, τ(Y)). In the real case (encryption using E⁰), you get the same answer both times. In the random case, if σ(Y) ≠ τ(Y) then M is encrypted under two independent random permutations, and the two answers are probably different. By comparing the two answers, the adversary can win her attack game. So we need a different construction. Assume E (as above) to be conventionally secure and consider

E′ : {0, 1}^{m+n} × {0, 1}^n → {0, 1}^n,   E′_{(X,Y)}(M) = E_X(Y ⊕ E_X(M)).
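The attack on E⁰ can be replayed with a toy 8-bit cipher (an illustrative assumption): since E⁰ ignores the transformed key part Y, the two queries always agree in the real case, while under independent random permutations they would almost always differ:

```python
import random

perms = {}
def E(x_key: int, m: int) -> int:
    """Toy 8-bit block cipher keyed by x_key (illustrative assumption)."""
    if x_key not in perms:
        p = list(range(256))
        random.Random(x_key).shuffle(p)
        perms[x_key] = p
    return perms[x_key][m]

def E0(x: int, y: int, m: int) -> int:
    """E^0_{(X,Y)}(M) = E_X(M): the Y part of the key is ignored."""
    return E(x, m)

X, Y, M = 7, 42, 0x5A
sigma, tau = (lambda y: y ^ 1), (lambda y: y ^ 2)     # sigma(Y) != tau(Y)
assert E0(X, sigma(Y), M) == E0(X, tau(Y), M)   # real case: always equal
```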
Theorem 1 (Security of E′ [2]). Let K∗, T, E, and E′ be as above. Let A′ be a T-transforming adversary. We limit the oracle queries (t_i, x_{i,j}) made by A′ as follows: r is the number of different transformations t_i and q is the highest number of different queries (t_i, x_{i,j}) for any t_i. (Formally, r = |{ t_i ∈ T | ∃ query (t_i, ·) }| and q = max_{t_i} |{ x_{i,j} ∈ {0, 1}^n | ∃ query (t_i, x_{i,j}) }|.³) For any such RK-adversary A′ attacking E′, we can construct a chosen plaintext adversary A attacking E with

Adv^{prp}_E(A) ≥ Adv^{prp-rk}_{T,E′}(A′) − (16 r^2 q′^2 + r q′ (q′ − 1)) / 2^{n+1},

where q′ = q · max_{k,k′ ∈ {0,1}^{m+n}} |{ transformations t ∈ T with t(k) = k′ }|, and A needs the same running time as A′.

Theorem 1 describes the concrete security of E′, depending on the security of E. As usual with concrete security analysis, Theorem 1 should provide a practically relevant security assurance for security architects. Intuitively, the difference between Adv^{prp}_E(A) and Adv^{prp-rk}_{T,E′}(A′) is low (or rather negligible). Unfortunately, this only holds for large n. E.g., with E = AES and thus n = 128, the difference may exceed 16 r^2 q′^2 / 2^{n+1}, even if T is collision-free. Assume the AES to be practically secure against chosen plaintext attacks. This means that the advantage of any “reasonable-time” adversary A against E is Adv^{prp}_E(A) = ε ≈ 0. Allow for r = q = 2^{31}. Since n = 128, a “reasonable-time” RK-adversary A′ can exist who distinguishes E′ from random with Adv^{prp-rk}_{T,E′}(A′) > 1/2 + ε.
³ Hence, the actual number of oracle queries A′ makes is between (r + q − 1) and rq.
The number of oracle queries made by A′ can be as low as r + q − 1 < 2^{32}. Therefore, E′ can be insecure in practice, in spite of Theorem 1 and the (assumed) security of E = AES. We don't claim that A′ exists – but we would like to prove its nonexistence. Thus, it is practically interesting to find an improved bound, either for the construction E′ or for an alternative construction. Below, we consider

E″ : {0, 1}^{2n} × {0, 1}^n → {0, 1}^n,   E″_{(X,Y)}(M) = E_{E_X(Y)}(M),
where E : {0, 1}^n × {0, 1}^n → {0, 1}^n is conventionally secure, as above. Thus, we have K∗ = {0, 1}^{2n}, and T is a set of partial transformations, as before (Eq. 1). For simplicity, we additionally require T to be collision-free.

Theorem 2 (Security of E″). Let K∗ = {0, 1}^{2n} and let T be a collision-free set of partial transformations. Let A′ be a T-transforming adversary for E″. Count the transformations in A′-queries by r = |{ t_i ∈ T | ∃ query (t_i, ·) }|. Then a chosen plaintext adversary A for E exists, making no more oracle queries than A′, with the same running time as A′ and the advantage

Adv^{prp}_E(A) ≥ Adv^{prp-rk}_{T,E″}(A′) / (r + 1).

Proof. Assume the nonexistence of an adversary A for E with the advantage a ≥ Adv^{prp-rk}_{T,E″}(A′)/(r + 1). Observe that the oracle queries (t_i, d_{i,j}) from A′ can be viewed as accessing r different oracles, each implementing a permutation. So in the real case, A′ is querying the r-tuple

P = (E_{E_X(t_1(Y))}, . . . , E_{E_X(t_r(Y))})

of permutations over {0, 1}^n. An oracle query (t_i, d_{i,j}) is equivalent to asking the i-th permutation p_i = E_{E_X(t_i(Y))} for p_i(d_{i,j}).⁴ Due to the collision-freeness of T, we have t_i(Y) ≠ t_j(Y) for t_i ≠ t_j, thus E_X(t_i(Y)) ≠ E_X(t_j(Y)). Hence, the r permutations in P are defined by r different keys E_X(t_1(Y)), . . . , E_X(t_r(Y)). But in the random case, the tuple of permutations can actually be viewed as r independent random permutations E_i* over {0, 1}^n. We write this tuple as

P_r = (E_1*, . . . , E_{r−1}*, E_r*).

The attack game of A′ is equivalent to distinguishing the r-tuple P of permutations from P_r. In doing so, the advantage of A′ is Adv^{prp-rk}_{T,E″}(A′).

There are other ways to respond to oracle queries (t_i, d_{i,j}), different from both the real and the random case. Let E* be a random permutation, and replace E_{E_X(t_i(Y))}(M) by E_{E*(t_i(Y))}(M). This way, we get r independent random values Z_i = E*(t_i(Y)), and a new r-tuple of permutations

P_0 = (E_{Z_1}, . . . , E_{Z_r}).
⁴ This does not restrict the order in which A′ makes her oracle queries. After making an oracle query (t_i, ·), and having seen the answer, A′ may of course freely choose some queries (t_{i′}, ·) for arbitrary values i′ ∈ {1, . . . , i}.
Distinguishing P0 from P is equivalent to distinguishing E X from E ∗ , which is exactly the attack game for A. From the assumption on A, we conclude that A can only distinguish P from P0 with an advantage < a. What is the advantage of A in distinguishing P0 from Pr ? Consider the r-tuples P1 = (EZ1 , . . . , EZr−2 , EZr−1 , Er∗ ), ∗ P2 = (EZ1 , . . . , EZr−2 , Er−1 , Er∗ ), .. .
∗ ∗ , Er−1 , Er∗ ). Pr = (E1∗ , . . . , Er−2 If, for any i ∈ {1, . . . , r}, A could distinguish Pi−1 from Pi with an advantage a, then A could as well distinguish EZi from Ei∗ in the same running time. Since Zi is just a random value, and Ei∗ a random function, independent from the other values and functions here, distinguishing EZi from Ei∗ is (again) equivalent to winning the attack game for A. Thus, the advantage of A to distinguish Pi−1 from Pi must be less than a. Finally, we put things together: A can only distinguish P from P0 with an advantage less than a, A can only distinguish P0 from P1 with an advantage less than a, . . . , A can only distinguish Pr−1 from Pr with an advantage less than a. Consequently, the advantage for A in distinguishing P from Pr must be strictly smaller than (r + 1)a. See the picture below.
[Figure: the hybrid chain P, P_0, P_1, ..., P_{r-1}, P_r; the total distinguishing advantage across the chain is < (r + 1)a.]

By the definition of a, we know that A' can distinguish P from P_r with the advantage (r + 1)a. This contradicts the assumption on A'.

Theorem 2 implies that if E is practically secure and r is not overwhelmingly large, then E' is secure, too. As above, consider E = AES (with a key size of 128 bits) and assume the AES to be practically secure against chosen plaintext attacks. This means that the advantage of any "reasonable-time" adversary A against E is Adv^{prp}_E(A) ≈ 0. Restrict A' to fewer than 2^32 oracle queries, thus r < 2^32. In this case, attacking E' can be at most 2^32 times better (i.e., lead to an advantage 2^32 times as large), compared to an attack on the AES in the same running time. We argue that the bound in Theorem 2 is sharp, and hence our result is close to optimal: the attack scenario on E' allows the adversary to see encryptions under r different keys. Consider an exhaustive key-search attack against E', trying to find any of the r keys, and compare it with an exhaustive key-search attack against E. The chances of successfully attacking E' are r times better than the chances of successfully attacking E.
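The factor-r gap between attacking E' and attacking E can be illustrated numerically. The sketch below is not from the paper: it just computes the exact success probability of an exhaustive search that accepts any of r target keys in a (hypothetical, far too small) 16-bit keyspace, and shows that it is about r times the single-target probability, matching the sharpness argument above.

```python
def search_success_prob(num_targets: int, guesses: int, keyspace: int = 2**16) -> float:
    """Probability that `guesses` distinct key guesses hit any one of
    `num_targets` distinct target keys (multi-target exhaustive search)."""
    miss = 1.0
    for i in range(guesses):
        miss *= (keyspace - num_targets - i) / (keyspace - i)
    return 1.0 - miss

single = search_success_prob(1, 256)   # one fixed target key
multi = search_success_prob(8, 256)    # any of r = 8 related keys
print(single, multi, multi / single)   # the ratio is close to r = 8
```

The ratio falls slightly below r only because the multi-target hits overlap; for r much smaller than the keyspace the advantage is essentially r times larger, which is why the (r + 1) loss in Theorem 2 cannot be improved substantially.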
3 Equivalence and Composition of PRF Constructions

In this section, we make some technical observations. While rather simple, these observations may nevertheless be useful both for understanding the phenomenon of related-key security and for designing ciphers provably secure against related-key attacks.
Ciphers Secure against Related-Key Attacks
365
Let F be a function F: K × D → R (which equivalently is a family of functions D → R). For F, we consider a set of transformations T ⊆ {K → K}. Let D = D' × D'' (where even |D'| = 1 or |D''| = 1 is allowed). We can rewrite F as a function F' with key space K' = K × D':

    F': (K × D') × D'' → R.

Equivalently, F' is a family of functions D'' → R. We consider the set T' of transformations

    T' = {t': K' → K' | ∃ t ∈ T, d' ∈ D' : t'(k, x) = (t(k), d')}.

Theorem 3 (Equivalence of F and F').
1. Let A be a T-transforming RK adversary for F. A T'-transforming RK adversary A' for F' exists, with the same running time, the same number of oracle queries, and the same advantage.
2. Let A' be a T'-transforming RK adversary for F'. A T-transforming RK adversary A for F exists, with the same running time, the same number of oracle queries, and the same advantage.

Proof. Consider claim 1 and the T-transforming RK adversary A for F. A's queries are of the form (t, (d', d'')) ∈ T × (D' × D''). Our T'-transforming adversary A' for F' is identical to A, except that each query (t, (d', d'')) is replaced by the equivalent query ((t, d'), d'') ∈ T' × D''. Thus, A' makes exactly the same number of oracle queries, needs the same running time, and wins the attack game with the same advantage as A. Proving claim 2 is similar.

In the context of Theorem 3, we even allowed |D''| = 1. In this case, F': K' × D'' → R can, of course, be rewritten as F': K' → R. This apparently trivial case is worth investigating. By means of some function F'': R' × D → R and a function F': K → R', we define a composed function F: K × D → R by

    F_k(d) = F''_{F'(k)}(d).

Theorem 4 (Security of the composed function F). Let A be a T-transforming adversary for F. We can construct a T-transforming adversary A' for F', and a chosen plaintext adversary A'' for F'', such that neither the running time of A' nor the running time of A'' exceeds the running time of A, and the following condition holds:

    Adv^{prf-rk}_{T,F}(A) ≤ Adv^{prf-rk}_{T,F'}(A') + Adv^{prf}_{F''}(A'').   (2)
Neither A' nor A'' makes more oracle queries than A.

Proof. Let k ∈ K be a random key, unknown to the adversary A. A distinguishes between the events Real and Random:

– Real: All responses to oracle queries (δ, d) ∈ K × D are generated as F''_{F'(k+δ)}(d).
– Random: Let F*: K × D → R be a random function. All responses to oracle queries (δ, d) ∈ K × D are generated as F*(δ, d).
We introduce a third event, K'-Random:

– K'-Random: Let F**: K → R' be a random function. All responses to oracle queries (δ, d) ∈ K × D are generated as F''_{F**(δ)}(d).

Distinguishing Real from K'-Random means distinguishing F' (queried under related keys) from a random function. This is exactly the task A' is supposed to do. Thus, we can turn A into A' without increasing either the running time or the number of queries. Similarly, we observe that if we can distinguish K'-Random from Random, we can mount a chosen plaintext attack against F'' and thus turn A into A'', again with the same running time and number of queries. What about Condition 2? See the picture below.

[Figure: A distinguishes Real from Random; A' distinguishes Real from K'-Random; A'' distinguishes K'-Random from Random.]

With A' and A'' as described above, Condition 2 holds.
In short, Theorem 4 implies that if F' is practically secure against T-transforming adversaries and F'' is practically secure against chosen plaintext adversaries, then F must be practically secure against T-transforming adversaries. This provides us with a tool for finding RK-secure PRFs, or proving their existence under reasonable assumptions: let T, K, D, and R be given. We are searching for F: K × D → R, practically secure w.r.t. T-transforming adversaries. It is sufficient to choose an appropriate R' and a conventionally secure PRF F'': R' × D → R, and then to search for a function F': K → R', practically secure w.r.t. T-transforming adversaries. The idea is that finding F' may be less difficult than finding F directly. In the next section, we concentrate on finding appropriate functions F'.
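The composition F_k(d) = F''_{F'(k)}(d) can be sketched as follows. This is an illustration only: HMAC-SHA256 stands in for the conventionally secure PRF F'', and the inner map `f_prime` is a deliberately naive placeholder for an RK-secure F' (finding a real one is the subject of Section 4); neither choice is prescribed by the paper.

```python
import hmac
import hashlib

def f_prime(k: bytes) -> bytes:
    # Placeholder for F': K -> R'. In the paper this is the component that
    # must resist related-key (T-transforming) adversaries; a plain hash is
    # used here only to make the sketch runnable, not as a security claim.
    return hashlib.sha256(b"F-prime" + k).digest()

def f_double_prime(r: bytes, d: bytes) -> bytes:
    # F'': R' x D -> R, instantiated with HMAC-SHA256 as a conventional PRF.
    return hmac.new(r, d, hashlib.sha256).digest()

def F(k: bytes, d: bytes) -> bytes:
    # Composed PRF: F_k(d) = F''_{F'(k)}(d).
    return f_double_prime(f_prime(k), d)

out = F(b"some 16-byte key", b"input block")
print(out.hex())
```

By Theorem 4, the security of `F` against T-transforming adversaries reduces to the RK security of `f_prime` plus the conventional PRF security of `f_double_prime`.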
4 PRFs and Group-Induced Key Transformations

In this section, we describe two PRFs and prove their security against T+-transforming adversaries under certain assumptions from algorithmic number theory. This is a step towards solving a "challenging problem" posed by Bellare and Kohno [1, 2]. Note, though, that our assumptions are non-standard and have not been studied much so far. It remains an open problem to describe some PRFs or PRPs and reduce their security against T+-transforming adversaries to a standard cryptographic assumption, such as Decisional Diffie-Hellman, Quadratic Residuosity, or others.
4.1 The RSA-Based PRF F'_RSA

Let N be the product of two large random primes. We define the function

    F'_RSA: Z_N → Z_N,   F'_RSA(k) := k^N mod N.

To evaluate the security of F'_RSA, we define an appropriate problem:
Definition 2. Let N be the product of two large random primes. Let R be a random value in Z_N. Define

    f(x) = (x + R)^N mod N.   (3)

Interactive Dependent RSA Problem (IDRP): Distinguish f from a random function Z_N → Z_N. The distinguisher is given N and oracle access to the function (but neither R nor the factors of N).

Interactive Dependent RSA Assumption: The IDRP is infeasible.

Some remarks on the IDRP:

1. We can make the above scheme and the IDRP more "RSA-like" by choosing any large RSA exponent e (that means, e and ϕ(N) have no common divisors) and rewriting Equation 3 as f(x) = (x + R)^e mod N. If e is small, however, this variant of the IDRP is feasible [15].
2. The IDRP can be seen as a generalisation of Pointcheval's Dependent-RSA problem [15]: for independent random x, y ∈ Z_N, distinguish the random pair (x, y) from the pair (x^e mod N, (x + 1)^e mod N). Given an efficient algorithm to solve the Dependent-RSA problem, we could efficiently solve the IDRP.

Theorem 5 (Security of F'_RSA). Under the Interactive Dependent RSA Assumption, no efficient T+-transforming adversary with significant advantage for F'_RSA can exist.

Proof. The proof is quite straightforward. Assume F'_RSA to be insecure. Then an efficient T+-transforming adversary A for F'_RSA wins the following attack game with significant advantage:
– choose δ ∈ Z_N, and define a key transformation t_δ(k) = k + δ mod N,
– ask for F'_RSA(t_δ(k)) = F'_RSA(k + δ) = (k + δ)^N mod N (with k unknown),
– and, after repeating the above two steps a couple of times, distinguish the results from the outputs of a random function.

This attack game is equivalent to solving the IDRP.
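The function F'_RSA and the additive key transformation can be written out directly. The primes below are toy values chosen only so the sketch runs instantly; real use requires large random primes, and nothing about these parameters comes from the paper.

```python
# Toy parameters only: p and q are far too small for any security.
p, q = 1009, 1013
N = p * q

def f_rsa(k: int) -> int:
    # F'_RSA(k) = k^N mod N, with keys k in Z_N.
    return pow(k, N, N)

def t_delta(k: int, delta: int) -> int:
    # Additive key transformation t_delta(k) = k + delta mod N.
    return (k + delta) % N

k = 123456 % N          # the secret key (known here only for the demo)
delta = 42
# A T+-transforming adversary may query F'_RSA(t_delta(k)) = (k+delta)^N mod N:
assert f_rsa(t_delta(k, delta)) == pow((k + delta) % N, N, N)
print(f_rsa(k), f_rsa(t_delta(k, delta)))
```

Each related-key query is exactly one evaluation of the IDRP oracle f(x) = (x + R)^N mod N with R playing the role of the unknown key, which is why the attack game and the IDRP coincide.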
4.2 The Diffie-Hellman-Based PRF F'_DH

In [1, 2], Bellare and Kohno consider two PRF constructions which are provably secure against chosen plaintext adversaries under the Decisional Diffie-Hellman assumption. Both turn out to be insecure against additive-transforming adversaries. How can we define a Diffie-Hellman-based PRF with plausible hope for security against additive-transforming adversaries?
Let P, P2, P3, and P4 be primes with P = 2·P2 + 1, P2 = 2·P3 + 1, and P3 = 2·P4 + 1. Let g be an element of order P2 in Z*_P, let g2 be an element of order P3 in Z*_{P2}, and let g3 be an element of order P4 in Z*_{P3}. As before, the key transformations are additions (below, we will formally define the set T+ in this context). We consider the following functions:

– The function F1(k) = g^k mod P is weak, since g^{k+δ} = g^k · g^δ mod P. Thus, given g^k = F1(k + 0) and δ, we can compare a response from the RK oracle with F1(k + δ) = g^k · g^δ mod P. This is used in [1, 2] for straightforward RK attacks on certain Diffie-Hellman-based PRFs.
– Similarly, the function F2(k) = g^{(g2^k)} mod P is also weak, since g^{(g2^{k+δ})} = g^{(g2^k)·(g2^δ)} = (g^{(g2^k)})^{(g2^δ)} mod P.
– The function

    F'_DH(k) = g^{(g2^{(g3^k)})} mod P

looks like a promising candidate. For F'_DH(k), the set of keys is Z_{P4}. Consequently, our set T+ of key transformations is defined by the addition modulo P4.
Definition 3. Let P, P4, g, g2, and g3 be defined as above. Let r be a random value in Z*_{P4}. Define

    f(x) = g^{(g2^{(g3^{x+r})})} mod P.

Define R = {z ∈ Z*_P | ∃ k ∈ Z_{P4} : z = F'_DH(k)}.

Diffie-Hellman Random Function Assumption (DHRFA): It is infeasible to distinguish f from a random function Z_{P4} → R.

Theorem 6 (Security of F'_DH). Under the DHRFA, there exists no efficient T+-transforming adversary for F'_DH with significant advantage.
The proof of Theorem 6 is similar to the proof of Theorem 5 and omitted here.
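The prime chain P = 2·P2 + 1, P2 = 2·P3 + 1, P3 = 2·P4 + 1 and the nested exponentiation can be checked concretely with the smallest such chain, (P4, P3, P2, P) = (2, 5, 11, 23). These tiny parameters and the hand-picked generators are of course for illustration only, not a secure instantiation.

```python
P4, P3, P2, P = 2, 5, 11, 23
assert P == 2 * P2 + 1 and P2 == 2 * P3 + 1 and P3 == 2 * P4 + 1

g, g2, g3 = 2, 3, 4   # orders P2, P3, P4 in Z*_P, Z*_P2, Z*_P3 respectively
assert pow(g, P2, P) == 1 and g != 1      # ord(g)  = 11 in Z*_23
assert pow(g2, P3, P2) == 1 and g2 != 1   # ord(g2) = 5  in Z*_11
assert pow(g3, P4, P3) == 1 and g3 != 1   # ord(g3) = 2  in Z*_5

def f_dh(k: int) -> int:
    # F'_DH(k) = g^(g2^(g3^k)) mod P, with keys k in Z_{P4}.
    e3 = pow(g3, k, P3)
    e2 = pow(g2, e3, P2)
    return pow(g, e2, P)

print([f_dh(k) for k in range(P4)])   # values of F'_DH on its whole key space
```

Reducing each exponent modulo the order of the next base (P3, then P2) is what makes the tower well defined; it is also why an additive shift of k acts multiplicatively only at the innermost level, blocking the F1- and F2-style attacks above.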
5 Using a Hash Function

As Ross Anderson pointed out at the FSE workshop in Delhi, a common engineering technique to ensure related-key security is to combine a block cipher E with a hash function H, defining a new block cipher E^H by E^H_X(M) = E_{H(X)}(M). This is a reasonable construction. In fact, for many types of related-key adversaries – including those considered in the current paper – it is straightforward to prove the security of E^H in the random oracle model, assuming E to be conventionally secure. This approach has the following drawbacks, however:
– For implementing E^H, we need to implement two cryptographic primitives, E and H.
– The security of E^H depends on the security of E and on the security of H. If, e.g., E is conventionally secure but H fails to meet its security requirements, E^H can be insecure.
– A random oracle proof of security for E^H does not reveal the concrete security requirements for H. On the other hand, it may be possible to prove the security of E^H against certain kinds of related-key adversaries in the standard model, making some non-standard assumptions on H.
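The construction E^H_X(M) = E_{H(X)}(M) is generic in E, which the sketch below reflects by taking E as a callable. SHA-256 stands in for H, and the XOR-based `toy_E` is a deliberately insecure stand-in used only to make the demo runnable; none of these instantiations are prescribed by the text.

```python
import hashlib
from typing import Callable

def hashed_key_cipher(E: Callable[[bytes, bytes], bytes]):
    """Given a block cipher E(key, block) taking 32-byte keys, return E^H
    with E^H_X(M) = E_{H(X)}(M), where H is SHA-256."""
    def E_H(X: bytes, M: bytes) -> bytes:
        return E(hashlib.sha256(X).digest(), M)
    return E_H

# Toy stand-in for E (NOT a real cipher): XOR with a key-derived pad.
def toy_E(key: bytes, block: bytes) -> bytes:
    pad = hashlib.sha256(b"pad" + key).digest()[:len(block)]
    return bytes(a ^ b for a, b in zip(block, pad))

E_H = hashed_key_cipher(toy_E)
ct = E_H(b"any-length key material", b"16-byte message!")
print(ct.hex())
```

Intuitively, a related-key transformation of X is scrambled by H before it reaches E, which is why the random-oracle proof mentioned above goes through; the drawbacks listed above are about what happens when H is a concrete, possibly flawed function rather than a random oracle.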
6 Summary This paper presented new constructions for related-key secure PRFs. For one construction, a tight security bound against partially-transforming adversaries has been shown, improving the concrete complexity of previous constructions. The proof assumes some block cipher to be secure in the conventional sense (i.e., without related keys). Two other constructions are shown secure against more general adversaries, however under certain non-standard number-theoretical assumptions.
References

[1] M. Bellare, T. Kohno. A theoretical treatment of related-key attacks: RKA-PRPs, RKA-PRFs, and applications. E. Biham, editor, Eurocrypt 2003, Springer Lecture Notes in Computer Science #2654, pp. 491–506.
[2] M. Bellare, T. Kohno. A theoretical treatment of related-key attacks: RKA-PRPs, RKA-PRFs, and applications. March 18, 2003. Full version of [1]. http://www.cs.ucsd.edu/users/tkohno/papers/RKA/ (URL checked: Jan. 14, 2004).
[3] E. Biham. New types of cryptanalytic attacks using related keys. T. Helleseth, editor, Eurocrypt '93, Springer Lecture Notes in Computer Science #765, pp. 398–409.
[4] J. Daemen, V. Rijmen. AES proposal: Rijndael.
[5] J. Daemen, V. Rijmen. The Design of Rijndael. Springer-Verlag, 2002.
[6] M. Dworkin. DRAFT Recommendation for block cipher modes of operation: the RMAC authentication mode. NIST Special Publication 800-38B, October 18, 2002.
[7] N. Ferguson, J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner, D. Whiting. Improved cryptanalysis of Rijndael. B. Schneier, editor, Fast Software Encryption 2000, Springer Lecture Notes in Computer Science #1978, pp. 213–230.
[8] E. Jaulmes, A. Joux, F. Valette. On the security of randomized CBC-MAC beyond the birthday paradox limit: a new construction. J. Daemen, V. Rijmen, editors, Fast Software Encryption 2002, Springer Lecture Notes in Computer Science.
[9] J. Kelsey, B. Schneier, D. Wagner. Key-schedule cryptanalysis of IDEA, G-DES, GOST, SAFER, and Triple-DES. N. Koblitz, editor, Crypto '96, Springer Lecture Notes in Computer Science #1109, pp. 237–251.
[10] J. Kelsey, B. Schneier, D. Wagner. Related-key cryptanalysis of 3-Way, Biham-DES, CAST, DES-X, NewDES, RC2, and TEA. Y. Han, T. Okamoto, S. Qing, editors, Information and Communications Security '97, Springer Lecture Notes in Computer Science #1334, pp. 233–246.
[11] L. Knudsen. Cryptanalysis of LOKI91. J. Seberry, Y. Zheng, editors, Auscrypt '92, Springer Lecture Notes in Computer Science #718, pp. 196–208.
[12] L. Knudsen, T. Kohno. Analysis of RMAC. T. Johansson, editor, Fast Software Encryption 2003, Springer Lecture Notes in Computer Science #2887, pp. 182–191.
[13] M. Liskov, R. Rivest, D. Wagner. Tweakable block ciphers. M. Yung, editor, Crypto '02, Springer Lecture Notes in Computer Science #2442, pp. 31–46.
[14] M. Naor, O. Reingold. On the construction of pseudo-random permutations: Luby-Rackoff revisited. Journal of Cryptology, vol. 12, 1999, pp. 29–66.
[15] D. Pointcheval. New public key cryptosystems based on the dependent-RSA problems. J. Stern, editor, Eurocrypt 1999, Springer Lecture Notes in Computer Science #1592, pp. 239–254.
Cryptographic Hash-Function Basics: Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance

Phillip Rogaway^{1,2} and Thomas Shrimpton^3

^1 Dept. of Computer Science, University of California, Davis, CA 95616, USA; [email protected]; www.cs.ucdavis.edu/~rogaway
^2 Dept. of Computer Science, Fac. of Science, Chiang Mai University, 50200 Thailand
^3 Dept. of Electrical and Computer Engineering, University of California, Davis, CA 95616, USA; [email protected]; www.ece.ucdavis.edu/~teshrim
Abstract. We consider basic notions of security for cryptographic hash functions: collision resistance, preimage resistance, and second-preimage resistance. We give seven different definitions that correspond to these three underlying ideas, and then we work out all of the implications and separations among these seven definitions within the concrete-security, provable-security framework. Because our results are concrete, we can show two types of implications, conventional and provisional, where the strength of the latter depends on the amount of compression achieved by the hash function. We also distinguish two types of separations, conditional and unconditional. When constructing counterexamples for our separations, we are careful to preserve specified hash-function domains and ranges; this rules out some pathological counterexamples and makes the separations more meaningful in practice. Four of our definitions are standard while three appear to be new; some of our relations and separations have appeared, others have not. Here we give a modern treatment that acts to catalog, in one place and with carefully considered nomenclature, the most basic security notions for cryptographic hash functions.

Keywords: collision resistance, cryptographic hash functions, preimage resistance, provable security, second-preimage resistance.
1 Introduction
This paper casts some new light on an old topic: the basic security properties of cryptographic hash functions. We provide definitions for various notions of collision resistance, preimage resistance, and second-preimage resistance, and

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 371–388, 2004. © International Association for Cryptologic Research 2004
then we work out all of the relationships among the definitions. We adopt a concrete-security, provable-security viewpoint, using reductions and definitions as the basic currency of our investigation.

Informal treatments of hash functions. Informal treatments of cryptographic hash functions can lead to a lot of ambiguity, with informal notions that might be formalized in very different ways and claims that might correspondingly be true or false. Consider, for example, the following quotes, taken from our favorite reference on cryptography [9, pp. 323–330]:

preimage-resistance — for essentially all pre-specified outputs, it is computationally infeasible to find any input which hashes to that output, i.e., to find any preimage x' such that h(x') = y when given any y for which a corresponding input is not known.

2nd-preimage resistance — it is computationally infeasible to find any second input which has the same output as any specified input, i.e., given x, to find a 2nd-preimage x' ≠ x such that h(x) = h(x').

collision resistance — it is computationally infeasible to find any two distinct inputs x, x' which hash to the same output, i.e., such that h(x) = h(x').

Fact. Collision resistance implies 2nd-preimage resistance of hash functions.

Note. (collision resistance does not guarantee preimage resistance)
In trying to formalize and verify such statements, certain aspects of the English are problematic and other aspects aren't. Consider the first statement above. Our community understands quite well how to deal with the term computationally infeasible. But how is it meant to specify the output y? (What, exactly, do "essentially all" and "pre-specified outputs" mean?) Is the hash function h to be a fixed function or a random element from a set of functions? Similarly, for the second quote, is it really meant that the specified point x can be any domain point (e.g., it is not chosen at random)? As for the bottom two claims, we shall see that the first is true under two formalizations we give for 2nd-preimage resistance and false under a third; the second statement is true for all hash functions under two formalizations of preimage resistance, while under a third the strength of this separation depends on the extent to which the hash function is compressing.^1

Scope. In this paper we are going to examine seven different notions of security for a hash-function family H: K × M → {0,1}^n. For a more complete discussion of nomenclature, see Appendix A and reference [9].
^1 We emphasize that it is most definitely not our intent here to criticize one of the most useful books on cryptography; we only use it to help illustrate that there are many ways to go when formalizing notions of hash-function security, and how one chooses to formalize things matters for making even the most basic of claims.
Name | Find         | Experiment                    | Some Aliases
Pre  | preimage     | random key, random challenge  | OWF
ePre | preimage     | random key, fixed challenge   |
aPre | preimage     | fixed key, random challenge   |
Sec  | 2nd-preimage | random key, random challenge  | weak CR
eSec | 2nd-preimage | random key, fixed challenge   | UOWHF
aSec | 2nd-preimage | fixed key, random challenge   |
Coll | collision    | random key (no challenge)     | strong CR, collision-free
How did we arrive at exactly these seven notions? We set out to be exhaustive. For two of our goals—finding a preimage and finding a second preimage—it makes sense to think of three different settings: the key and the challenge being random; the key being random and the challenge being fixed; or the key being fixed and the challenge being random. It makes no sense to think of the key and the challenge as both being fixed, for a trivial adversary would then succeed. For the final goal—finding a collision—there is no challenge and one is compelled to think of the key as being random, for a trivial adversary would prevail if the key were fixed. We thus have 2 · 3 + 1 = 7 sensible notions, which we name Pre, ePre, aPre, Sec, eSec, aSec, and Coll. The leading "a" in the name of a notion is meant to suggest always: if a hash function is secure for any fixed key, then it is "always" secure. The leading "e" in the name of a notion is meant to suggest everywhere: if a hash function is secure for any fixed challenge, then it is "everywhere" secure. Notions Coll, Pre, Sec, eSec are standard; variants ePre, aPre, and aSec would seem to be new.

Comments. The aPre and aSec notions may be useful for designing higher-level protocols that employ hash functions that are to be instantiated with SHA1-like objects. Consider a protocol that uses an object like SHA1 but says it is using a collision-resistant hash function, and proves security under such an assumption. There is a problem here, because there is no natural way to think of SHA1 as being a random element drawn from some family of hash functions. If the protocol could instead have used an aSec-secure hash-function family, doing the proof from that assumption, then instantiating with SHA1 would seem to raise no analogous, foundational issues.
In short, assuming that your hash function is aSec- or aPre-secure serves to eliminate the mismatch of using a standard cryptographic hash function after having done proofs that depend on using a random element from a hash-function family. Contributions. Despite the numerous papers that construct, attack, and use cryptographic hash functions, and despite a couple of investigations of cryptographic hash functions whose purpose was close to ours [15, 16], the area seems to have more than its share of conflicting terminology, informal notions, and assertions of implications and separations that are not supported by convincing proofs or counterexamples. Our goal has been to help straighten out some of the basics. See Appendix A for an abbreviated exposition of related work.
We begin by giving formal definitions for our seven notions of hash-function security. Our definitions are concrete (no asymptotics) and treat a hash function H as a family of functions, H: K × M → {0,1}^n. After defining the different notions of security we work out all of the relationships among them. Between each pair of notions xxx and yyy we provide either an implication or a separation. Informally, saying that xxx implies yyy means that if H is secure in the xxx-sense then it is also secure in the yyy-sense. To separate notions, we say, informally, that xxx nonimplies yyy if H can be secure in the xxx-sense without being secure in the yyy-sense.^2 Our implications and separations are quantitative, so we provide both an implication and a separation for the cases where this makes sense. Since we are providing implications and separations, we adopt the strongest feasible notions of each, in order to strengthen our results. We actually give two kinds of implications. We do this because, in some cases, the strength of an implication crucially depends on the amount of compression achieved by the hash function. For these provisional implications, if the hash function is substantially compressing (e.g., mapping 256 bits to 128 bits) then the implication is a strong one, but if the hash function compresses little or not at all, then the implication effectively vanishes. It is a matter of interpretation whether such a provisional implication is an implication with a minor "technical" condition, or if a provisional implication is fundamentally not an implication at all. A conventional implication is an ordinary one; the strength of the implication does not depend on how much the hash function compresses. We will also use two kinds of separations, but here the distinction is less dramatic, as both flavors of separations are strong.
The difference between a conventional separation and an unconditional separation lies in whether or not one must effectively assume the existence of an xxx-secure hash function in order to show that xxx nonimplies yyy. When we give separations, we are careful to impose the hash-function domain and range first; we don't allow these to be chosen so as to make for convenient counterexamples. This makes the problem of constructing counterexamples harder, but it also makes the results more meaningful. For example, if a protocol designer wants to know if collision-resistance implies preimage-resistance for a 160-bit hash function H, what good is a counterexample that uses H to make a 161-bit hash function H' that is collision resistant but not preimage-resistant? It would not engender any confidence that collision-resistance fails to imply preimage-resistance when all hash functions of interest have 160-bit outputs. Some of the counterexamples we use may appear to be unnatural, or to exhibit behavior unlike "real world" hash functions. This is not a concern; our goal is to demonstrate when one notion does not imply another by constructing counterexamples that respect imposed domain and range lengths; there is no need for the examples to look natural.
^2 We say "nonimplies" rather than "does not imply" because a separation is not the negation of an implication; a separation is effectively stronger and more constructive than that.
[Figure: diagram over the seven notions Coll, aSec, eSec, Sec, ePre, aPre, Pre.]

Fig. 1. Summary of the relationships among our seven notions of hash-function security. Solid arrows represent conventional implications, dotted arrows represent provisional implications (their strength depends on the relative size of the domain and range), and the lack of an arrow represents a separation
Our findings are summarized in Fig. 1, which shows when one notion implies the other (drawn with a solid arrow), when one notion provisionally implies the other (drawn with a dotted arrow), and when one notion nonimplies the other (we use the absence of an arrow and do not bother to distinguish between the two types of nonimplications). In Fig. 2 we give a more detailed summary of the results of this paper.
2 Preliminaries
We write M ←$ S for the experiment of choosing a random element from the distribution S and calling it M. When S is a finite set it is given the uniform distribution. The concatenation of strings M and M' is denoted by M‖M' or M M'. When M = M_1 · · · M_m ∈ {0,1}^m is an m-bit string and 1 ≤ a ≤ b ≤ m we write M[a..b] for M_a · · · M_b. The bitwise complement of a string M is written M̄. The empty string is denoted by ε. When a is an integer we write ⟨a⟩_r for the r-bit string that represents a.

A hash-function family is a function H: K × M → Y where K is a finite nonempty set and M and Y are sets of strings. We insist that Y = {0,1}^n for some n > 0. The number n is called the hash length of H. We also insist that if M ∈ M then {0,1}^{|M|} ⊆ M (the assumption is convenient and any reasonable hash function would certainly have this property). Often we will write the first argument to H as a subscript, so that H_K(M) = H(K, M) for all M ∈ M.

When H: K × M → Y and {0,1}^m ⊆ M we denote by Time_{H,m} the minimum, over all programs P_H that compute H, of the length of P_H plus the worst-case running time of P_H over all inputs (K, M) where K ∈ K and M ∈ {0,1}^m;
[Figure: a 7 × 7 matrix over the notions Pre, ePre, aPre, Sec, eSec, aSec, Coll, giving for each ordered pair either an implication (→, possibly "to δ_i") or a separation; the individual entries are not reproduced here.]

Fig. 2. Summary of results. The entry at row xxx and column yyy gives the relationships we establish between notions xxx and yyy. Here δ1 = 2^{n−m}, δ2 = 1 − 2^{n−m−1}, δ3 = 2^{−m}, δ4 = 1/|K|, and δ5 = 2^{1−m}. The hash functions H1, ..., H6 and G1, G2, G3 are specified in Fig. 3. The annotations (a)–(l) mean: (a) see Theorem 1; (b) by G1, see Proposition 2; (c) by G3, see Proposition 3; (d) by H1, see Theorem 5; (e) by H2, see Theorem 5; (f) by H6, see Theorem 4; (g) by H6, see Theorem 3; (h) by H3, see Theorem 5; (i) by H4, see Theorem 5; (j) by G2, see Proposition 4; (k) by H5, see Theorem 2; (l) see Proposition 1
plus the minimum, over all programs P_K that sample from K, of the time to compute the sample plus the size of P_K. We insist that P_H read its input, so that Time_{H,m} will always be at least m. Some underlying RAM model of computation must be fixed.

An adversary is an algorithm that takes any number of inputs. Some of these inputs may be long strings and so we establish the convention that the adversary can read the i-th bit of argument j by writing (i, j), in binary, on a distinguished query tape. The resulting bit is returned to the adversary in unit time. If A is an adversary and Adv^xxx_H(A) is a measure of adversarial advantage already defined then we write Adv^xxx_H(R) to mean the maximal value of Adv^xxx_H(A) over all adversaries A that use resources bounded by R. In this paper it is sufficient to consider only the resource t, the running time of the adversary. By convention, the running time is the actual worst-case running time of A (relative to some fixed RAM model) plus the description size of A (relative to some fixed encoding of algorithms).
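The string conventions above can be mirrored in a few lines of code; the helper names below (`substring`, `complement`, `encode`) are our own, introduced only to make the 1-indexed M[a..b], complement, and ⟨a⟩_r notation concrete.

```python
def substring(M: str, a: int, b: int) -> str:
    # M[a..b] for a 1-indexed bit string M = M_1 ... M_m.
    return M[a - 1:b]

def complement(M: str) -> str:
    # Bitwise complement of a bit string.
    return "".join("1" if c == "0" else "0" for c in M)

def encode(a: int, r: int) -> str:
    # <a>_r: the r-bit string that represents the integer a.
    return format(a, "0" + str(r) + "b")

M = "10110"
print(substring(M, 2, 4), complement(M), encode(5, 8))
```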
3 Definitions of Hash-Function Security
Preimage resistance. One would like to speak of the difficulty with which an adversary is able to find a preimage for a point in the range of a hash function. Several definitions make sense for this intuition of inverting.
H1_K(M)     = 0^n if M = 0^m;  H_K(M) otherwise
H2_K(M)     = 0^n if K = K0;  H_K(M) otherwise
H3_{b‖K}(M) = H_K(M[1..m−1] ‖ b)
H4_K(M)     = 0^n if M = 0^m or M = 1^m;  H_K(M) otherwise
H5_{c‖K}(M) = H_K(0^{m−n} ‖ H_K(c)) if M = 1^{m−n} ‖ H_K(c);  H_K(M) otherwise
H6_K(M)     = 0^n if M = 0^m;  H_K(M) if M ≠ 0^m and H_K(M) ≠ 0^n;  H_K(0^m) otherwise
G1_K(M)     = M[1..n] if M[n+1..m] = 0^{m−n};  0^n otherwise
G2_K(M)     = 1^{n−m} ‖ K if M ∈ {K, K̄};  0^{n−m} ‖ M otherwise
G3_K(M)     = ⟨i⟩_n if M = ⟨(K + i) mod 2^m⟩_m for some i ∈ [1..2^n − 1];  0^n otherwise

Fig. 3. Given a hash function H: K × {0,1}^m → {0,1}^n we construct hash functions H1, ..., H6: K × {0,1}^m → {0,1}^n for our conditional separations. The value K0 ∈ K is fixed and arbitrary. The hash functions G1: {ε} × {0,1}^m → {0,1}^n, G2: {0,1}^m × {0,1}^m → {0,1}^n, and G3: {1, ..., 2^m − 1} × {0,1}^m → {0,1}^n are used in our unconditional separations
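Two of the Fig. 3 constructions, H1 and G1, can be written out concretely. The given family H is abstracted as a callable, bit strings are Python strings of '0'/'1', and `toy_H` is a made-up stand-in (not from the paper) used only so the demo runs.

```python
import hashlib

def make_H1(H, m: int, n: int):
    # H1_K(M) = 0^n if M = 0^m, else H_K(M): zeroes out a single domain
    # point of an otherwise-unchanged family H.
    def H1(K, M: str) -> str:
        return "0" * n if M == "0" * m else H(K, M)
    return H1

def G1(K, M: str, m: int, n: int) -> str:
    # G1_K(M) = M[1..n] if M[n+1..m] = 0^(m-n), else 0^n (keyless family).
    return M[:n] if M[n:] == "0" * (m - n) else "0" * n

# Tiny demonstration with a toy 16-bit H (not cryptographic):
def toy_H(K: str, M: str) -> str:
    h = int.from_bytes(hashlib.sha256((K + "|" + M).encode()).digest()[:2], "big")
    return format(h, "016b")

H1 = make_H1(toy_H, m=32, n=16)
print(H1("key", "0" * 32))                 # the zeroed point
print(G1("", "1010" + "0" * 28, 32, 16))
```

Note how both constructions respect the imposed domain {0,1}^m and range {0,1}^n, which is exactly the discipline the separations demand.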
Definition 1 [Types of preimage resistance]. Let H: K × M → Y be a hash-function family and let m be a number such that {0,1}^m ⊆ M. Let A be an adversary. Then define:

    Adv^{Pre[m]}_H(A)  = Pr[K ←$ K; M ←$ {0,1}^m; Y ← H_K(M); M' ←$ A(K, Y) : H_K(M') = Y]
    Adv^{ePre}_H(A)    = max_{Y ∈ Y} Pr[K ←$ K; M' ←$ A(K) : H_K(M') = Y]
    Adv^{aPre[m]}_H(A) = max_{K ∈ K} Pr[M ←$ {0,1}^m; Y ← H_K(M); M' ←$ A(Y) : H_K(M') = Y]
The first definition, preimage resistance (Pre), is the usual way to define when a hash-function family is a one-way function. (Of course the notion is different from a function f : M → Y being a one-way function, as these are syntactically
different objects.) The second definition, everywhere preimage resistance (ePre), most directly captures the intuition that it is infeasible to find the preimage of range points: for whatever range point is selected, it is computationally hard to find its preimage. The final definition, always preimage resistance (aPre), strengthens the first definition in the way needed to say that a function like SHA1 is one-way: one regards SHA1 as one function from a family of hash functions (keyed, for example, by the initial chaining value) and we wish to say that for this particular function from the family it remains hard to find a preimage of a random point.

Second-preimage resistance. It is likewise possible to formalize multiple definitions that might be understood as the technical meaning of second-preimage resistance. In all cases a domain point M and a description of a hash function H_K are known to the adversary, whose job it is to find an M′ different from M such that H(K, M) = H(K, M′). Such an M and M′ are called partners.

Definition 2 [Types of second-preimage resistance] Let H: K × M → Y be a hash-function family and let m be a number such that {0,1}^m ⊆ M. Let A be an adversary. Then define:

Adv^{Sec[m]}_H(A) = Pr[ K ←$ K; M ←$ {0,1}^m; M′ ←$ A(K, M) : (M ≠ M′) ∧ (H_K(M) = H_K(M′)) ]

Adv^{eSec[m]}_H(A) = max_{M ∈ {0,1}^m} Pr[ K ←$ K; M′ ←$ A(K) : (M ≠ M′) ∧ (H_K(M) = H_K(M′)) ]

Adv^{aSec[m]}_H(A) = max_{K ∈ K} Pr[ M ←$ {0,1}^m; M′ ←$ A(M) : (M ≠ M′) ∧ (H_K(M) = H_K(M′)) ]
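The Sec[m] game can be simulated in the same toy style as before (our illustration, not the paper's). With a heavily compressing hash, almost every domain point has a partner, so an unbounded brute-force adversary wins almost always; the parameters and hash below are assumptions for demonstration only.

```python
import hashlib, random

M_BITS, N_BITS = 12, 4  # 4096 inputs into 16 outputs: partners abound

def H(K: int, M: int) -> int:
    data = K.to_bytes(4, "big") + M.to_bytes(2, "big")
    return hashlib.sha256(data).digest()[0] >> (8 - N_BITS)

def partner_finder(K: int, M: int):
    # Brute-force second-preimage adversary: scan for M' != M with
    # the same hash value under H_K.
    target = H(K, M)
    for Mp in range(2 ** M_BITS):
        if Mp != M and H(K, Mp) == target:
            return Mp
    return None  # M has no partner under H_K

def sec_advantage(trials: int = 50) -> float:
    # Monte-Carlo estimate of Adv^{Sec[m]} for this adversary.
    wins, rng = 0, random.Random(7)
    for _ in range(trials):
        K, M = rng.getrandbits(32), rng.getrandbits(M_BITS)
        Mp = partner_finder(K, M)
        if Mp is not None and Mp != M and H(K, Mp) == H(K, M):
            wins += 1
    return wins / trials

print(sec_advantage())  # close to 1.0 for this heavily compressing toy
```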
The first definition, second-preimage resistance (Sec), is the standard one. The second definition, everywhere second-preimage resistance (eSec), most directly formalizes that it is hard to find a partner for any particular domain point. This notion is also called a universal one-way hash-function family (UOWHF) and it was first defined by Naor and Yung [12]. The final definition, always second-preimage resistance (aSec), strengthens the first in the way needed to say that a function like SHA1 is second-preimage resistant: one regards SHA1 as one function from a family of hash functions and we wish to say that for this particular function it remains hard to find a partner for a random point.

Collision resistance. Finally, we would like to speak of the difficulty with which an adversary is able to find two distinct points in the domain of a hash function that hash to the same range point.

Definition 3 [Collision resistance] Let H: K × M → Y be a hash-function family and let A be an adversary. Then we define:
Adv^{Coll}_H(A) = Pr[ K ←$ K; (M, M′) ←$ A(K) : (M ≠ M′) ∧ (H_K(M) = H_K(M′)) ]
Cryptographic Hash-Function Basics
It does not make sense to think of strengthening this definition by maximizing over all K ∈ K: for any fixed function h: M → Y with |M| > |Y| there is an efficient algorithm that outputs an M and M′ that collide under h. While this program might be hard to find in practice, there is no known sense in which this can be formalized.
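The point about fixed functions can be made concrete with a generic birthday search: for a compressing h a colliding pair always exists by the pigeonhole principle, and for a small range it is easy to output one. The 24-bit truncation of SHA-256 below is an assumption chosen only so the search finishes quickly (about 2^12 trials in expectation).

```python
import hashlib

def h(M: bytes) -> bytes:
    # Fixed compressing function: 3-byte (24-bit) truncation of SHA-256.
    return hashlib.sha256(M).digest()[:3]

def birthday_collision():
    # Generic O(2^{n/2}) collision search: hash distinct counter values
    # and stop at the first repeated digest.
    seen = {}
    i = 0
    while True:
        M = i.to_bytes(8, "big")
        d = h(M)
        if d in seen:
            return seen[d], M
        seen[d] = M
        i += 1

M1, M2 = birthday_collision()
assert M1 != M2 and h(M1) == h(M2)
```

The colliding pair (M1, M2) is fixed once h is fixed, which is exactly why "collision resistance of a single fixed function" resists formalization.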
4 Equivalent Formalizations with a Two-Stage Adversary
Four of our definitions (ePre, aPre, eSec, aSec) maximize over some quantity that one may imagine the adversary to know. In each of these cases it is possible to modify the definition so as to have the adversary itself choose this value. That is, in a "first phase" of the adversary's execution it chooses the quantity in question; then a random choice is made by the environment; and then the adversary continues from where it left off, but now given this randomly chosen value. The corresponding definitions are then as follows:

Definition 4 [Equivalent versions of ePre, aPre, eSec, aSec] Let H: K × M → Y be a hash-function family and let m be a number such that {0,1}^m ⊆ M. Let A be an adversary. Then define:

Adv^{ePre}_H(A) = Pr[ (Y, S) ←$ A(); K ←$ K; M′ ←$ A(K, S) : H_K(M′) = Y ]

Adv^{aPre[m]}_H(A) = Pr[ (K, S) ←$ A(); M ←$ {0,1}^m; Y ← H_K(M); M′ ←$ A(Y, S) : H_K(M′) = Y ]

Adv^{eSec[m]}_H(A) = Pr[ (M, S) ←$ A(); K ←$ K; M′ ←$ A(K, S) : (M ≠ M′) ∧ (H_K(M) = H_K(M′)) ]

Adv^{aSec[m]}_H(A) = Pr[ (K, S) ←$ A(); M ←$ {0,1}^m; M′ ←$ A(M, S) : (M ≠ M′) ∧ (H_K(M) = H_K(M′)) ]

In the two-stage definition of Adv^{eSec[m]}_H(A) we insist that the message M output by A is of length m bits, that is, M ∈ {0,1}^m. Each of these four definitions is extended to its resource-parameterized version in the usual way.

The two-stage definitions above are easily seen to be equivalent to their one-stage counterparts. Saying here that definitions xxx and yyy are equivalent means that there is an absolute constant C such that Adv^{xxx[m]}_H(t) ≤ Adv^{yyy[m]}_H(C(t + m + n)) and Adv^{yyy[m]}_H(t) ≤ Adv^{xxx[m]}_H(C(t + m + n)). (Omit mention of +m and of [m] in the case of everywhere preimage resistance, since that notion does not depend on m.) Since the exact interpretation of the time t was model-dependent anyway, two measures of adversarial advantage that are equivalent in this sense need not be distinguished.

We give an example of the equivalence of one-stage and two-stage adversaries, explaining why eSec and eSec2 are equivalent, where eSec2 temporarily denotes the version of eSec defined in Definition 4 (and eSec refers to what is given in
Definition 2). Let A attack hash function H in the eSec sense. For every fixed M there is a two-stage adversary A2 that does as well as A at finding a partner for M . Specifically, let A2 be an adversary with the value M “hardwired in” to it. Adversary A2 prints out M and when it resumes it behaves like A. Similarly, let A2 be a two-stage adversary attacking H in the eSec2 sense. Consider the random coins used by A2 during its first stage and choose specific coins that maximize the probability that A2 will subsequently succeed. For these coins there is a specific pair (M, S) that A2 returns. Let A be a (one-stage) adversary that on input (K, M ) runs exactly as A2 would on input (K, S).
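The hardwiring argument above can be sketched in code. In the illustration below (ours, not the paper's), adversaries are modeled as plain callables, and "choosing the best first-stage coins" is mirrored by making stage one deterministic so that its output (M, S) is fixed.

```python
# Sketch: converting between one-stage and two-stage eSec adversaries.

def two_stage_from_one_stage(A, M):
    """A2 'prints out' the hardwired M in stage 1, then behaves like A."""
    def stage1():
        return M, None          # (chosen message, saved state S)
    def stage2(K, S):
        return A(K, M)          # resume: run A on (K, M)
    return stage1, stage2

def one_stage_from_two_stage(stage1, stage2):
    """Fix the first stage's coins, yielding a specific pair (M, S)."""
    M, S = stage1()             # deterministic here, so these are 'the' coins
    def A(K, M_given):
        assert M_given == M     # A does well only on the M that A2 chose
        return stage2(K, S)
    return M, A

# Toy usage: a one-stage adversary that always flips the last bit of M.
A = lambda K, M: M[:-1] + ("1" if M[-1] == "0" else "0")
s1, s2 = two_stage_from_one_stage(A, "0000")
M, S = s1()
assert M == "0000" and s2(7, S) == "0001"
```

The conversions change running time only by a constant plus the cost of writing M and S, which is where the C(t + m + n) slack in the equivalence comes from.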
5 Implications
Definitions of implications. In this section we investigate which of our notions of security (Pre, aPre, ePre, Sec, aSec, eSec, and Coll) imply which others. First we explain our notion of an implication.

Definition 5 [Implications] Fix K, M, m, and n where {0,1}^m ⊆ M. Suppose that xxx and yyy are labels for which Adv^{xxx ·}_H and Adv^{yyy ·}_H have been defined for any H: K × M → {0,1}^n.

– Conventional implication. We say that xxx implies yyy, written xxx → yyy, if Adv^{yyy ·}_H(t) ≤ c · Adv^{xxx ·}_H(t′) for all hash functions H: K × M → {0,1}^n, where c is an absolute constant and t′ = t + c · Time_{H,m}.

– Provisional implication. We say that xxx implies yyy to ε, written xxx → yyy to ε, if Adv^{yyy ·}_H(t) ≤ c · Adv^{xxx ·}_H(t′) + ε for all hash functions H: K × M → {0,1}^n, where c is an absolute constant and t′ = t + c · Time_{H,m}.

In the definition above, and later, the · is a placeholder which is either [m] (for Pre, aPre, Sec, aSec, eSec) or empty (for ePre, Coll). Conventional implications are what one expects: xxx → yyy means that if a hash function is secure in the xxx-sense, then it is secure in the yyy-sense. Whether or not a provisional implication carries the usual semantics of the word implication depends on the value of ε. Below we will demonstrate provisional implications with a value of ε = 2^{n−m}, and so the interpretation of such a result is that we have demonstrated a "real" implication for hash functions that are substantially compressing (e.g., if the hash function maps 256 bits to 128 bits), while we have given a non-result if the hash function is length-preserving, length-increasing, or compresses just a little.

Conventional implications. The conventional implications among our notions are straightforward, so we quickly dispense with those, omitting the proofs. In particular, the following are easily verified.

Proposition 1. [Conventional implications] Fix K, M, m such that {0,1}^m ⊆ M, and n > 0. Let Coll, Pre, aPre, ePre, Sec, aSec, eSec be the corresponding security notions. Then:
(1) Coll → Sec
(2) Coll → eSec
(3) aSec → Sec
(4) eSec → Sec
(5) aPre → Pre
(6) ePre → Pre
In addition to the above, of course xxx → xxx for each notion xxx that we have given.

Provisional implications. We now give five provisional implications. The value of ε implicit in these claims depends on the relative difference of the domain length m and the hash length n. Intuitively, one can follow paths through the graph in Figure 1, composing implications to produce the five provisional implications. The formal proofs of these five results appear in the full version [14].

Theorem 1. [Provisional implications] Fix K, M, m such that {0,1}^m ⊆ M, and n > 0. Let Coll, Pre, aPre, Sec, aSec, eSec be the corresponding security notions. Then:

(1) Sec → Pre to 2^{n−m}
(2) aSec → Pre to 2^{n−m}
(3) eSec → Pre to 2^{n−m}
(4) Coll → Pre to 2^{n−m}
(5) aSec → aPre to 2^{n−m}

6 Separations
Definitions. We now investigate separations among our seven security notions. We emphasize that asserting a separation—which we will also call a nonimplication—is not the assertion of a lack of an implication (though it does effectively imply this for any practical hash function). In fact, we will show that both a separation and an implication can exist between two notions, the relative strength of the separation/implication being determined by the amount of compression performed by the hash function. Intuitively, xxx nonimplies yyy if it is possible for something to be xxx-secure but not yyy-secure. We provide two variants of this idea. The first notion, a conventional nonimplication, says that if H is a hash function that is secure in the xxx-sense then H can be converted into a hash function H′ having the same domain and range that is still secure in the xxx-sense but that is now completely insecure in the yyy-sense. The second notion, an unconditional nonimplication, says that there is a hash function H that is secure in the xxx-sense but completely insecure in the yyy-sense. Thus the first kind of separation effectively assumes an xxx-secure hash function in
order to separate xxx from yyy, while the second kind of separation does not need to do this.³

Definition 6 [Separations] Fix K, M, m, and n where {0,1}^m ⊆ M. Suppose that xxx and yyy are labels for which Adv^{xxx ·}_H and Adv^{yyy ·}_H have been defined for any H: K × M → {0,1}^n.

– Conventional separation. We say that xxx nonimplies yyy to ε, in the conventional sense, written xxx ↛ yyy to ε, if for any H: K × M → {0,1}^n there exists an H′: K × M → {0,1}^n such that Adv^{xxx ·}_{H′}(t) ≤ c · Adv^{xxx ·}_H(t′) + ε and yet Adv^{yyy ·}_{H′}(t′) = 1, where c is an absolute constant and t′ = t + c · Time_{H,m}.

– Unconditional separation. We say that xxx nonimplies yyy to ε, in the unconditional sense, written xxx ⇸ yyy to ε, if there exists an H: K × M → {0,1}^n such that Adv^{xxx ·}_H(t) ≤ ε for all t and yet Adv^{yyy ·}_H(t′) = 1, where t′ = c · Time_{H,m} for some absolute constant c.

When ε = 0 above we say that we have a strong separation and we omit saying "to ε" in speaking of it. When ε > 0 above we say that we have a provisional separation. The degree to which a provisional separation should be regarded as a "real" separation depends on the value ε.

Some provisional separations. The following separations depend on the relative values of the domain size m and the range size n. As an example, if the hash-function family H is length-preserving, meaning H: K × {0,1}^n → {0,1}^n, then its being second-preimage resistant won't imply its being preimage resistant: just consider the identity function, which is perfectly second-preimage resistant (no domain point has a partner) but trivially breakable in the sense of finding preimages. This counterexample is well-known. We now generalize and extend this counterexample, giving a "gap" of 1 − 2^{n−m−1} for three of our pairs of notions. Thus we have a strong separation when m = n and a rapidly weakening separation as m exceeds n by more and more.
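The identity-function counterexample above can be exercised directly. The sketch below (ours, for illustration) shows a length-preserving "family" in which a preimage adversary always wins while no second-preimage adversary can possibly win, since no domain point has a partner.

```python
def identity_hash(K: bytes, M: bytes) -> bytes:
    # Length-preserving 'hash': H_K(M) = M for every key K.
    return M

def preimage_adversary(K: bytes, Y: bytes) -> bytes:
    # Inverting is trivial: the range point is its own preimage,
    # so this adversary achieves Pre-advantage 1.
    return Y

def second_preimage_adversary(K: bytes, M: bytes):
    # Hopeless: no M' != M can satisfy identity(M') == identity(M),
    # so every adversary has Sec-advantage 0.
    return None

K, M = b"k", b"\x01\x02"
Y = identity_hash(K, M)
assert identity_hash(K, preimage_adversary(K, Y)) == Y  # preimage found
assert second_preimage_adversary(K, M) is None          # no partner exists
```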
Taken together with Proposition 1 we see that this behavior is not an artifact of the proof: as m exceeds n, the 2^{n−m} implication we have given effectively takes over.

Proposition 2. [Separations, part 1a] Fix m ≥ n > 0 and let Sec, Pre, aSec, aPre be the corresponding security notions. Then:

(1) Sec ⇸ Pre to 1 − 2^{n−m−1}
(2) aSec ⇸ Pre to 1 − 2^{n−m−1}
(3) aSec ⇸ aPre to 1 − 2^{n−m−1}

The proof is given in the full version of this paper [14].

Proposition 3. [Separations, part 1b] Fix m ≥ n > 0, and let Pre and eSec be the corresponding security notions. Then eSec ⇸ Pre to 1 − 2^{n−m−1}.

³ That unconditional separations are (sometimes) possible in this domain is a consequence of the fact that, for some values of the domain and range, secure hash functions trivially exist (e.g., the identity function H_K(M) = M is collision-free).
The proof is given in the full version of this paper [14].

Additional separations. We now give some further nonimplications. Unlike those just given, these nonimplications do not have a corresponding provisional implication. Here the separation is the whole story of the relationship between the notions, and the strength of the separation does not depend on the amount of compression performed by the hash function.

Theorem 2. [Separations, part 2A] Fix m > n > 0 and let eSec and Coll be the corresponding security notions. Then eSec ↛ Coll.

The proof is in Appendix B. Because of the structure of the counterexample used in Theorem 2, we give the following proposition for completeness.

Proposition 4. Fix n > 0 and m ≤ n, and let eSec and Coll be the corresponding security notions. Then eSec ⇸ Coll to 2^{−(m+1)}.

The proof is given in the full version of this paper [14].

Theorem 3. [Separations, part 2B] Fix m, n such that n > 0, and let Coll and ePre be the corresponding security notions. Then Coll ↛ ePre.

The proof is given in the full version of this paper [14].

Theorem 4. [Separations, part 2C] Fix m, n such that n > 0, and let eSec and ePre be the corresponding security notions. Then eSec ↛ ePre.

The proof is given in the full version of this paper [14].

The remaining 28 separations are not as hard to show as those given so far, so we present them as one theorem and without proof. The specific constructions H1, H2, H3, H4 are those given in Fig. 3.

Theorem 5. [Separations, part 3] Fix m, n such that n > 0, and let Coll, Pre, aPre, ePre, Sec, aSec, eSec be the corresponding security notions. Let H: K × {0,1}^m → {0,1}^n be a hash function and define H1, ..., H6 from it according to Fig. 3.
Then:

(1) Pre ↛ ePre to 2^{−m}: Adv^{Pre[m]}_{H1}(t) ≤ 1/2^m + Adv^{Pre[m]}_H(t) and Adv^{ePre}_{H1}(t′) = 1
(2) Pre ↛ aPre to 1/|K|: Adv^{Pre[m]}_{H2}(t) ≤ 1/|K| + Adv^{Pre[m]}_H(t) and Adv^{aPre[m]}_{H2}(t′) = 1
(3) Pre ↛ Sec: Adv^{Pre[m]}_{H3}(t) ≤ 2 · Adv^{Pre[m]}_H(t) and Adv^{Sec[m]}_{H3}(t′) = 1
(4) Pre ↛ eSec: Adv^{Pre[m]}_{H3}(t) ≤ 2 · Adv^{Pre[m]}_H(t) and Adv^{eSec[m]}_{H3}(t′) = 1
(5) Pre ↛ aSec: Adv^{Pre[m]}_{H3}(t) ≤ 2 · Adv^{Pre[m]}_H(t) and Adv^{aSec[m]}_{H3}(t′) = 1
(6) Pre ↛ Coll: Adv^{Pre[m]}_{H3}(t) ≤ 2 · Adv^{Pre[m]}_H(t) and Adv^{Coll}_{H3}(t′) = 1
(7) ePre ↛ aPre to 1/|K|: Adv^{ePre}_{H2}(t) ≤ 1/|K| + Adv^{ePre}_H(t) and Adv^{aPre[m]}_{H2}(t′) = 1
(8) ePre ↛ Sec: Adv^{ePre}_{H3}(t) ≤ 2 · Adv^{ePre}_H(t) and Adv^{Sec[m]}_{H3}(t′) = 1
(9) ePre ↛ eSec: Adv^{ePre}_{H3}(t) ≤ 2 · Adv^{ePre}_H(t) and Adv^{eSec[m]}_{H3}(t′) = 1
(10) ePre ↛ aSec: Adv^{ePre}_{H3}(t) ≤ 2 · Adv^{ePre}_H(t) and Adv^{aSec[m]}_{H3}(t′) = 1
(11) ePre ↛ Coll: Adv^{ePre}_{H3}(t) ≤ 2 · Adv^{ePre}_H(t) and Adv^{Coll}_{H3}(t′) = 1
(12) aPre ↛ ePre to 2^{−m}: Adv^{aPre[m]}_{H1}(t) ≤ 1/2^m + Adv^{aPre[m]}_H(t) and Adv^{ePre}_{H1}(t′) = 1
(13) aPre ↛ Sec: Adv^{aPre[m]}_{H3}(t) ≤ 2 · Adv^{aPre[m]}_H(t) and Adv^{Sec[m]}_{H3}(t′) = 1
(14) aPre ↛ eSec: Adv^{aPre[m]}_{H3}(t) ≤ 2 · Adv^{aPre[m]}_H(t) and Adv^{eSec[m]}_{H3}(t′) = 1
(15) aPre ↛ aSec: Adv^{aPre[m]}_{H3}(t) ≤ 2 · Adv^{aPre[m]}_H(t) and Adv^{aSec[m]}_{H3}(t′) = 1
(16) aPre ↛ Coll: Adv^{aPre[m]}_{H3}(t) ≤ 2 · Adv^{aPre[m]}_H(t) and Adv^{Coll}_{H3}(t′) = 1
(17) Sec ↛ ePre to 2^{−m}: Adv^{Sec[m]}_{H1}(t) ≤ 1/2^m + Adv^{Sec[m]}_H(t) and Adv^{ePre}_{H1}(t′) = 1
(18) Sec ↛ aPre to 1/|K|: Adv^{Sec[m]}_{H2}(t) ≤ 1/|K| + Adv^{Sec[m]}_H(t) and Adv^{aPre[m]}_{H2}(t′) = 1
(19) Sec ↛ eSec to 2^{−m+1}: Adv^{Sec[m]}_{H4}(t) ≤ 1/2^{m−1} + Adv^{Sec[m]}_H(t) and Adv^{eSec[m]}_{H4}(t′) = 1
(20) Sec ↛ aSec to 1/|K|: Adv^{Sec[m]}_{H2}(t) ≤ 1/|K| + Adv^{Sec[m]}_H(t) and Adv^{aSec[m]}_{H2}(t′) = 1
(21) Sec ↛ Coll to 2^{−m+1}: Adv^{Sec[m]}_{H4}(t) ≤ 1/2^{m−1} + Adv^{Sec[m]}_H(t) and Adv^{Coll}_{H4}(t′) = 1
(22) eSec ↛ aPre to 1/|K|: Adv^{eSec[m]}_{H2}(t) ≤ 1/|K| + Adv^{eSec[m]}_H(t) and Adv^{aPre[m]}_{H2}(t′) = 1
(23) eSec ↛ aSec to 1/|K|: Adv^{eSec[m]}_{H2}(t) ≤ 1/|K| + Adv^{eSec[m]}_H(t) and Adv^{aSec[m]}_{H2}(t′) = 1
(24) aSec ↛ ePre to 2^{−m}: Adv^{aSec[m]}_{H1}(t) ≤ 1/2^m + Adv^{aSec[m]}_H(t) and Adv^{ePre}_{H1}(t′) = 1
(25) aSec ↛ eSec to 2^{−m+1}: Adv^{aSec[m]}_{H4}(t) ≤ 1/2^{m−1} + Adv^{aSec[m]}_H(t) and Adv^{eSec[m]}_{H4}(t′) = 1
(26) aSec ↛ Coll to 2^{−m+1}: Adv^{aSec[m]}_{H4}(t) ≤ 1/2^{m−1} + Adv^{aSec[m]}_H(t) and Adv^{Coll}_{H4}(t′) = 1
(27) Coll ↛ aPre to 1/|K|: Adv^{Coll}_{H2}(t) ≤ 1/|K| + Adv^{Coll}_H(t) and Adv^{aPre[m]}_{H2}(t′) = 1
(28) Coll ↛ aSec to 1/|K|: Adv^{Coll}_{H2}(t) ≤ 1/|K| + Adv^{Coll}_H(t) and Adv^{aSec[m]}_{H2}(t′) = 1

where t′ = t + c · Time_{H,m} for some absolute constant c.
Acknowledgements Thanks to Mihir Bellare, Daniel Brown, and to various anonymous reviewers who provided useful comments on an earlier draft of this paper. This work was supported by NSF 0085961, NSF 0208842, and a gift from Cisco Systems. Many thanks to the NSF and Cisco for their support. Work on this paper was carried out while the authors were at Chiang Mai University, Chulalongkorn University, and UC Davis.
References

[1] R. Anderson. The classification of hash functions. In IMA Conference in Cryptography and Coding IV, pages 83–94, December 1993.
[2] M. Bellare, A. Desai, D. Pointcheval, and P. Rogaway. Relations among notions of security for public-key encryption schemes. In H. Krawczyk, editor, Advances in Cryptology – CRYPTO '98, volume 1462 of Lecture Notes in Computer Science, pages 232–249. Springer-Verlag, 1998.
[3] M. Bellare and P. Rogaway. Collision-resistant hashing: Towards making UOWHFs practical. In Advances in Cryptology – CRYPTO '97, volume 1294 of Lecture Notes in Computer Science, pages 470–484. Springer-Verlag, 1997.
[4] J. Black, P. Rogaway, and T. Shrimpton. Black-box analysis of the block-cipher-based hash-function constructions from PGV. In Advances in Cryptology – CRYPTO '02, volume 2442 of Lecture Notes in Computer Science. Springer-Verlag, 2002.
[5] D. Brown and D. Johnson. Formal security proofs for a signature scheme with partial message recovery. Lecture Notes in Computer Science, 2020:126–144, 2001.
[6] I. Damgård. Collision free hash functions and public key signature schemes. In Advances in Cryptology – EUROCRYPT '87, volume 304 of Lecture Notes in Computer Science. Springer-Verlag, 1988.
[7] I. Damgård. A design principle for hash functions. In G. Brassard, editor, Advances in Cryptology – CRYPTO '89, volume 435 of Lecture Notes in Computer Science. Springer-Verlag, 1990.
[8] S. Goldwasser and S. Micali. Probabilistic encryption. Journal of Computer and System Sciences, 28:270–299, April 1984.
[9] A. Menezes, P. van Oorschot, and S. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996.
[10] R. Merkle. One way hash functions and DES. In G. Brassard, editor, Advances in Cryptology – CRYPTO '89, volume 435 of Lecture Notes in Computer Science. Springer-Verlag, 1990.
[11] I. Mironov. Hash functions: From Merkle-Damgård to Shoup. In Advances in Cryptology – EUROCRYPT '01, Lecture Notes in Computer Science. Springer-Verlag, 2001.
[12] M. Naor and M. Yung. Universal one-way hash functions and their cryptographic applications. In Proceedings of the Twenty-first ACM Symposium on Theory of Computing, pages 33–43, 1989.
[13] B. Preneel. Cryptographic hash functions. Katholieke Universiteit Leuven (Belgium), 1993.
[14] P. Rogaway and T. Shrimpton. Cryptographic hash-function basics: Definitions, implications and separations for preimage resistance, second-preimage resistance, and collision resistance. Full version of this paper, www.cs.ucdavis.edu/~rogaway, 2004.
[15] D. Stinson. Some observations on the theory of cryptographic hash functions. Technical Report 2001/020, University of Waterloo, 2001.
[16] Y. Zheng, T. Matsumoto, and H. Imai. Connections among several versions of one-way hash functions. In Special Issue on Cryptography and Information Security, Proceedings of IEICE of Japan, 1990.
A Brief History
It is beyond the scope of the current work to give a full survey of the many hash-function security notions in the literature, formal and informal, and the many relationships that have (and have not) been shown among them. We touch upon some of the more prominent work that we know. The term universal one-way hash function (UOWHF) was introduced by Naor and Yung [12] to name their asymptotic definition of second-preimage resistance. Along with Damgård [7, 6], who introduced the notion of collision freeness, these papers were the first to put notions of hash-function security on a solid formal footing by suggesting the study of keyed families of hash functions. This was a necessary step for developing a meaningful formalization of collision resistance. Contemporaneously, Merkle [10] describes notions of hash-function security: weak collision resistance and strong collision resistance, which refer to second-preimage and collision resistance, respectively. Damgård also notes that a compressing collision-free hash function has one-wayness properties (our Pre notion), and points out some subtleties in this implication. Merkle and Damgård [10, 7] each show that if one properly iterates a collision-resistant function with a fixed domain, then one can construct a collision-resistant hash function with an enlarged domain. This iterative method is now called the Merkle-Damgård construction. Preneel [13] describes one-way hash functions (those which are both preimage resistant and second-preimage resistant) and collision-resistant hash functions (those which are preimage, second-preimage, and collision resistant). He identifies four types of attacks and studies hash functions constructed from block ciphers. Bellare and Rogaway [3] give concrete-security definitions for hash-function security and study second-preimage resistance and collision resistance.
Their target collision resistance (TCR) coincides with a UOWHF (eSec) and their any collision resistance (ACR) coincides with Coll-security. Brown and Johnson [5] define a strong hash that, if properly formalized in the concrete setting, would include our ePre notion. Mironov [11] investigates a class of asymptotic definitions that bridge between conventional collision resistance and UOWHFs. He also looks at which members of that class are preserved by the Merkle-Damgård construction. Anderson [1] discusses some unconventional notions of security for hash functions that might arise when one considers how hash functions might interact with higher-level protocols. Black, Rogaway, and Shrimpton [4] use a concrete definition of preimage resistance that requires inversion of a uniformly selected range point. Two papers set out on a program somewhat similar to ours: [15] and [16]. Stinson [15] considers hash-function security from the perspective that the notions of primary interest are those related to producing digital signatures. He considers four problems (zero-preimage, preimage, second-preimage, collision) and describes notions of security based on them. He considers in some depth the relationship between the preimage problem and the collision problem.
Zheng, Matsumoto and Imai [16] examine some asymptotic formalizations of the notions of second-preimage resistance and collision resistance. In particular, they suggest five classes of second-preimage resistant hash functions and three classes of collision resistant hash functions, and then consider the relationships among these classes. Our focus on provable security follows a line that begins with Goldwasser and Micali [8]. In defining several related notions of security and then working out all relations between them, we follow work like that of Bellare, Desai, Pointcheval, and Rogaway [2].
B Proof of Theorem 2
Let H: K × {0,1}^m → {0,1}^n be a hash-function family and let H5: K × {0,1}^m → {0,1}^n be the function defined in Fig. 3. We show that

Adv^{eSec[m]}_{H5}(t) ≤ 2 · Adv^{eSec[m]}_H(t′)  and  Adv^{Coll}_{H5}(t′) = 1

where t′ ≤ t + c · Time_{H,m} for some absolute constant c. Let Pr_K denote probability taken over K ←$ K. Given H we define for every c ∈ {0,1}^m an n-bit string Y_c and a real number δ_c as follows. Let Y_c be the lexicographically first string that maximizes δ_c = Pr_K[H_K(c) = Y_c]. Over all pairs we select the lexicographically first pair c, c′ (when considered as the 2m-bit string c ∥ c′) such that c ≠ c′ and Y_c = Y_{c′} and δ_c is maximized (that is, Pr_K[H_K(c) = H_K(c′)] is maximized). Now let H5 = H5^c be defined according to Fig. 3.

We begin by exhibiting an adversary T that gains Adv^{Coll}_{H5}(T) = 1 and runs in time a · m for some absolute constant a. On input K ∈ K, let T output M = 1^{m−n} ∥ H_K(c) and M′ = 0^{m−n} ∥ H_K(c).

Now we show that if H is strong in the eSec sense then so is H5. Let A be a two-stage adversary that gains advantage δ = Adv^{eSec[m]}_{H5}(A) and runs in time t. Let second-preimage-finding adversaries B and C be constructed as follows:

Algorithm B
  [Stage 1] On input (): run (M, S) ←$ A(); return (M, S)
  [Stage 2] On input (K, S): run M′ ←$ A(K, S);
    if M′ ≠ M and M′ ≠ 1^{m−n} ∥ H_K(c) then return M′
    else return 0^{m−n} ∥ H_K(c)

Algorithm C
  [Stage 1] On input (): return (c, ε)
  [Stage 2] On input (K, S): return c′
The central claim of the proof is as follows:

Claim: Adv^{eSec[m]}_{H5}(A) ≤ Adv^{eSec[m]}_H(B) + Adv^{eSec[m]}_H(C)

Let us prove this claim. Recall that the job of A is to find an M and an M′ such that M ≠ M′ and H5(M) = H5(M′). Referring to the line numbers in Fig. 3,
we say that u-v is a collision if M caused H5 to output on line u ∈ {1, 2} and M′ ≠ M caused H5 to output on line v ∈ {1, 2}, and H5(M) = H5(M′). We analyze the three possible u-v collisions that A can create. (Note that 1-1 is not a collision, since then M = M′.)

[Case 2-2] Assume A wins by causing a 2-2 collision. In this case M ≠ M′ and M ≠ 1^{m−n} ∥ H_K(c) and M′ ≠ 1^{m−n} ∥ H_K(c). Thus H_K(M) = H_K(M′) and so B finds a collision under H. We have then that Pr_K[A wins by a 2-2 collision] ≤ Adv^{eSec[m]}_H(B).

[Case 1-2] Assume that A wins by creating a 1-2 collision. Then M ≠ M′ and M = 1^{m−n} ∥ H_K(c). We claim that in this case adversary C wins. To see this, note that Pr[M ←$ A(); K ←$ K : M = 1^{m−n} ∥ H_K(c)] = Pr_K[H_K(c) = Y] for some fixed Y ∈ {0,1}^n. By the way we chose c and c′ we have Pr_K[H_K(c) = Y] ≤ Pr_K[H_K(c) = Y_c] = Pr_K[H_K(c′) = Y_{c′}] = Pr_K[H_K(c) = H_K(c′)]; hence Pr[M ←$ A(); K ←$ K : M = 1^{m−n} ∥ H_K(c)] ≤ Pr_K[H_K(c) = H_K(c′)]. The conclusion is that Pr_K[A wins by a 1-2 collision] ≤ Pr[M ←$ A(); K ←$ K : M = 1^{m−n} ∥ H_K(c)] ≤ Adv^{eSec[m]}_H(C).

[Case 2-1] Assume that A wins by creating a 2-1 collision. Then M ≠ M′ and M′ = 1^{m−n} ∥ H_K(c), and so H_K(M) = H_K(0^{m−n} ∥ H_K(c)). We claim that in this case either adversary B wins, or C does. Let BAD be the event that M = 0^{m−n} ∥ H_K(c). If M ≠ 0^{m−n} ∥ H_K(c) then clearly B wins, so Pr_K[A wins by a 2-1 collision ∧ ¬BAD] ≤ Adv^{eSec[m]}_H(B). If M = 0^{m−n} ∥ H_K(c) then we have that Pr_K[A wins by a 2-1 collision ∧ BAD] ≤ Pr[M ←$ A(); K ←$ K : M = 0^{m−n} ∥ H_K(c)] ≤ Adv^{eSec[m]}_H(C) by an argument nearly identical to that given for Case 1-2.
Pulling together all of the cases yields the following:

Adv^{eSec[m]}_{H5}(A)
  = Pr_K[A wins by a 2-2 collision] · Pr_K[2-2 collision]
  + Pr_K[A wins by a 1-2 collision] · Pr_K[1-2 collision]
  + Pr_K[A wins by a 2-1 collision ∧ ¬BAD] · Pr_K[2-1 collision ∧ ¬BAD]
  + Pr_K[A wins by a 2-1 collision ∧ BAD] · Pr_K[2-1 collision ∧ BAD]
  ≤ Adv^{eSec[m]}_H(B) · Pr_K[2-2 collision] + Adv^{eSec[m]}_H(C) · Pr_K[1-2 collision]
  + Adv^{eSec[m]}_H(B) · Pr_K[2-1 collision ∧ ¬BAD] + Adv^{eSec[m]}_H(C) · Pr_K[2-1 collision ∧ BAD]
  ≤ Adv^{eSec[m]}_H(B) + Adv^{eSec[m]}_H(C)

where the last inequality is because of convexity. This completes the proof of the claim. Finally, since the running time of B is at most t + c(Time_{H,m} + m) for some absolute constant c, and this is greater than the running time of C, we are done.
The EAX Mode of Operation

Mihir Bellare¹, Phillip Rogaway²,³, and David Wagner⁴

¹ Dept. of Computer Science & Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA. [email protected], www-cse.ucsd.edu/users/mihir
² Department of Computer Science, University of California at Davis, Davis, California 95616, USA. [email protected], www.cs.ucdavis.edu/~rogaway/
³ Department of Computer Science, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
⁴ Department of Electrical Engineering and Computer Science, University of California at Berkeley, Berkeley, California 94720, USA. [email protected], www.cs.berkeley.edu/~daw/
Abstract. We propose a block-cipher mode of operation, EAX, for solving the problem of authenticated-encryption with associated-data (AEAD). Given a nonce N, a message M, and a header H, our mode protects the privacy of M and the authenticity of both M and H. Strings N, M, and H are arbitrary bit strings, and the mode uses 2⌈|M|/n⌉ + ⌈|H|/n⌉ + ⌈|N|/n⌉ block-cipher calls when these strings are nonempty and n is the block length of the underlying block cipher. Among EAX's characteristics are that it is on-line (the length of a message isn't needed to begin processing it) and a fixed header can be preprocessed, effectively removing the per-message cost of binding it to the ciphertext.

Keywords: Authenticated encryption, CCM, EAX, message authentication, CBC MAC, modes of operation, OMAC, provable security.
1 Introduction
An authenticated encryption (AE) scheme is a symmetric-key mechanism by which a message M is transformed into a ciphertext CT with the goal that CT protect both the privacy and the authenticity of M. The last few years have seen the emergence of AE as a recognized cryptographic goal. With this has come the development of new authenticated-encryption schemes and the analysis of old ones. This paper offers up a new authenticated-encryption scheme, EAX, and provides a thorough analysis of it. To understand why we are defining a new AE scheme, we need to give some background.

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 389–407, 2004. © International Association for Cryptologic Research 2004
Flavors of authenticated encryption. It is useful to distinguish two kinds of AE schemes. In a two-pass scheme we make two passes through the data, one aimed at providing privacy and the other, authenticity. One way of making a two-pass AE scheme is by generic composition, wherein one pass constitutes a (privacy-only) symmetric-encryption scheme, while the other pass is a message authentication code (MAC). The encryption scheme and the MAC each use their own key. Analyses of some generic composition methods can be found in [6, 20, 5]. In a one-pass AE scheme we make a single pass through the data, simultaneously doing what is needed to engender both privacy and authenticity. Typically, the computational cost is about half that of a two-pass scheme. Such schemes emerged only recently. They include IAPM, OCB, and XCBC [17, 25, 12].

Soon after the emergence of one-pass AE schemes it was realized that often not all the data should be privacy-protected. Changes were needed to the basic definitions and mechanisms in order to support the possibility that some information, like a packet header, must not be encrypted. Thus was born the notion of authenticated-encryption with associated-data (AEAD), first formally defined in [24]. The non-secret data is called the associated data or the header. Like an AE scheme, an AEAD scheme might make one pass or two.

Standardizing a two-pass AEAD scheme. Traditionally, it has been the designers of applications and network protocols who were responsible for combining privacy and authenticity mechanisms in order to make a two-pass AEAD scheme. This has not worked well. It turns out that there are numerous ways to go wrong in trying to make a secure AEAD scheme, and many protocols, products, and standards have done just that. (For example, see [11] for a wrong one-pass scheme, see [5] for weaknesses in the AEAD mechanism of SSH, and [6, 20] for attacks on some methods of popular use.)
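The two-pass generic composition approach described above can be sketched with two keys and two passes. The Python toy below is our illustration, not a vetted scheme: the "cipher" is an ad-hoc SHA-256 counter keystream standing in for a real symmetric-encryption scheme, while HMAC-SHA256 provides the authenticity pass in encrypt-then-MAC order.

```python
import hashlib, hmac

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy counter-mode keystream from SHA-256 (illustration only,
    # NOT a real encryption primitive).
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def encrypt_then_mac(k_enc: bytes, k_mac: bytes, nonce: bytes, msg: bytes):
    # Pass 1 (privacy): XOR the message with the keystream.
    ct = bytes(a ^ b for a, b in zip(msg, keystream(k_enc, nonce, len(msg))))
    # Pass 2 (authenticity): MAC the nonce and ciphertext under a
    # separate key, as generic composition requires.
    tag = hmac.new(k_mac, nonce + ct, hashlib.sha256).digest()
    return ct, tag

def decrypt(k_enc: bytes, k_mac: bytes, nonce: bytes, ct: bytes, tag: bytes):
    expected = hmac.new(k_mac, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return bytes(a ^ b for a, b in zip(ct, keystream(k_enc, nonce, len(ct))))

ke, km, n0 = b"enc-key", b"mac-key", b"nonce-01"
ct, tag = encrypt_then_mac(ke, km, n0, b"hello world")
assert decrypt(ke, km, n0, ct, tag) == b"hello world"
```

The two-key requirement visible here is exactly the minor inefficiency, and the freedom in choosing both components is the "excessive" degree of choice, that the following discussion attributes to generic composition.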
Nowadays, some standards bodies (including NIST, IETF, and IEEE 802.11) would like to standardize on an AEAD scheme. Indeed IEEE 802.11 has already done so. This is a good direction. Standardized AEAD might help minimize errors in mis-combining cryptographic mechanisms. So far, standards bodies have been unwilling to standardize on any of the one-pass schemes due to pending patents covering them. There is, accordingly, an established desire for standardizing on a two-pass AEAD scheme. The two-pass scheme should be as good as possible subject to the limitation of falling within the two-pass framework. Generic composition would seem to be the obvious answer. But defining a generic-composition AEAD scheme is not an approach that has moved forward within any of the standards bodies. There would seem to be a number of reasons. One reason is a relatively minor inefficiency—the fact that generic composition methods must use two keys. Probably a bigger issue is that the architectural advantage of generic composition brings with it an "excessive" degree of choice—after deciding on a generic composition method, one still needs two lower-level specifications, namely a symmetric encryption scheme and a MAC, for each of which numerous block-cipher based choices exist. Standards bodies
want something self-contained, as well as being a patent-avoiding, block-cipher based, single-key mechanism. So far, there has been exactly one proposal for such a method (though see the "contemporaneous work" section below). It is called CCM [26], and is due to Whiting, Housley, and Ferguson. CCM has enjoyed rapid success, and is now the required mechanism for IEEE 802.11 wireless LANs as well as 802.15.4 wireless personal area networks. NIST has indicated that it plans to put out a "Recommendation" based on CCM.

Our contributions. It is our view that CCM has a good deal of pointless complexity and inefficiency. It is the first contribution of this paper to explain these limitations. It is the second and main contribution of this paper to provide a new AEAD scheme, EAX, that avoids these limitations.

CCM limitations. A description of CCM, together with a detailed description of its shortcomings, can be found in the full version of this paper [8]. Some of the points we make and elaborate on there are the following. CCM is not on-line, meaning one needs to know the lengths of both the plaintext and the associated data before one can proceed with encryption. This may be inconvenient or inefficient. CCM does not allow pre-processing of static associated data. (If, for example, we have an unchanging header attached to every packet being authenticated, we would like that the cost of authenticating this header be paid only once, meaning header authentication should have no significant cost after a single pre-computation. CCM fails to have this property.) CCM's parameterization is more complex than necessary, including, in addition to the block cipher and tag length, a message-length parameter. CCM's nonce length is restricted in such a way that it may not provide adequate security when nonces are chosen randomly. Finally, CCM implementations could suffer performance hits because the algorithm can disrupt word alignment in the associated data.

EAX and its attributes.
EAX is a nonce-using AEAD scheme employing no tool beyond the block cipher E : Key × {0, 1}^n → {0, 1}^n on which it is based. We expect that E will often be instantiated by AES, but we make no restrictions in this direction. (In particular we do not require that n = 128.) Nothing is assumed about the nonces except that they are non-repeating. EAX provides both privacy, in the sense of indistinguishability from random bits, and authenticity, in the sense of an adversary's inability to produce a new but valid (nonce, header, ciphertext) triple. EAX is simple, avoiding complicated length-annotation. It is a conventional two-pass AEAD scheme, making a separate privacy pass and authenticity pass, using no known intellectual property.

EAX is flexible in the functionality it provides. It supports arbitrary-length messages: the message space is {0, 1}^∗. The key space for EAX is the key space Key of the underlying block cipher. EAX supports arbitrary nonces, meaning the nonce space is {0, 1}^∗. Any tag length τ ∈ [0..n] is possible, to allow each user to select how much security she wants from the authenticity guarantees. The only user-selectable parameters are the block cipher E and the tag length τ.
Mihir Bellare et al.
EAX has desirable performance attributes. Message expansion is minimal: the length of the ciphertext (which, following the conventions of [25], excludes the nonce) is only τ bits more than the length of the plaintext. Implementations can profitably pre-process static associated data. (If an unchanging header is attached to every packet, authenticating this header has no significant cost after a single pre-computation.) Key-setup is efficient: all block-cipher calls use the same underlying key, so that we do not incur the cost of key scheduling more than once. For both encryption and decryption, EAX uses only the forward direction of the block cipher, so that hardware implementations do not need to implement the decryption functionality of the block cipher. The scheme is on-line for both the plaintext M and the associated data H, which means that one can process streaming data on-the-fly, using constant memory, not knowing when the stream will stop.

Provable security. We prove that EAX is secure assuming that the block cipher that it uses is a secure pseudorandom permutation (PRP). Security for EAX means indistinguishability from random bits and authenticity of ciphertexts. The combination implies other desirable goals, like nonmalleability and indistinguishability under a chosen-ciphertext attack. The proof of security for EAX is surprisingly complex. The key-collapse of EAX2 destroys a fundamental abstraction boundary. Our security proof relies on a result about the security of a tweakable extension of OMAC (Lemma 3) in which an adversary can obtain not only a tag for a message of its choice, but also an associated key-stream.

Pragmatics. The main reason there is any interest in two-pass schemes, as we have already discussed, is that one-pass schemes would seem to be subject to patents. Motivated by this, standardization bodies have expressed the intent of standardizing on a conventional, two-pass scheme, even understanding the factor-of-two performance hit.
The merit of this judgment is debatable, but the pragmatic reality is that there has emerged a desire for a conventional scheme, like EAX, that is as good as possible subject to the two-pass constraint. Lack of a scheme like EAX will simply lead to an inferior scheme being standardized, which is to the disadvantage of the user community. Accordingly, EAX addresses a real and practical design problem. We took up work on this design problem at the suggestion of the co-Chair of the IRTF (Internet Research Task Force), which supports the standardization efforts of the IETF. We believe that EAX has the potential for widespread adoption and use.

Afterwards. One non-goal of EAX was to be parallelizable. Another recent two-pass design, CWC [19], is parallelizable. It pays for this advantage with a somewhat complex algorithm, based on Carter-Wegman hashing using polynomial evaluation over a prime field. More recent still is GCM [22], a parallelizable, two-pass design based on multiplication in the finite field with 2^128 elements. Other recent AEAD mechanisms include Helix [10] and SOBER-128 [13]. These are stream ciphers that aim to provide authenticity. The provable-security
methodology does not apply to these objects since they are built directly rather than from lower-level primitives.
2 Preliminaries
All strings in this paper are over the binary alphabet {0, 1}. For L a set of strings and n ≥ 0 a number, we let L^n and L^∗ have their usual meanings. The concatenation of strings X and Y is denoted X ‖ Y or simply XY. The string of length 0, called the empty string, is denoted ε. If X ∈ {0, 1}^∗ we let |X| denote its length, in bits. If X ∈ {0, 1}^∗ and ℓ ≤ |X| then the first ℓ bits of X are denoted X [first ℓ bits]. The set Byte = {0, 1}^8 contains all the strings of length 8, and a string X ∈ Byte^∗ is called a byte string or an octet string. If X ∈ Byte^∗ we let ‖X‖_8 = |X|/8 denote its length in bytes. For ℓ ≥ 1 a number, we write Byte^{<ℓ} for all byte strings having fewer than ℓ bytes. If X ∈ Byte^∗ and ℓ ≤ ‖X‖_8 then the first ℓ bytes of X are denoted X [first ℓ bytes]. When X ∈ {0, 1}^n is a nonempty string and t ∈ N is a number we let X + t be the n-bit string that results from regarding X as a nonnegative number x (binary notation, most-significant-bit first), adding x to t, taking the result modulo 2^n, and converting this number back into an n-bit string. If t ∈ [0..2^n − 1] we let [t]_n denote the encoding of t into an n-bit binary string (msb first, lsb last). If X and P are strings then we let X ⊕→ P (the xor-at-the-end operator) denote the string of length ℓ = max{|X|, |P|} bits that is obtained by prepending ||X| − |P|| zero-bits to the shorter string and then xoring this with the other string. (In other words, xor the shorter string into the end of the longer string.) A block cipher is a function E : Key × {0, 1}^n → {0, 1}^n where Key is a finite, nonempty set and n ≥ 1 is a number and E_K(·) = E(K, ·) is a permutation on {0, 1}^n. The number n is called the block length. Throughout this note we fix such a block cipher E.

In Figure 1 we define the algorithms CBC, CTR, pad, OMAC (no superscript), and OMAC^t (with superscript). The algorithms CBC (the CBC MAC) and CTR (counter-mode encryption) are standard. Algorithm pad is used only to define OMAC.
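For concreteness, the string conventions above (the n-bit encoding [t]_n, the modular increment X + t, and the xor-at-the-end operator ⊕→) can be phrased over byte strings as follows. This is a sketch assuming n is a multiple of 8; the function names are ours.

```python
def encode(t: int, nbytes: int) -> bytes:
    # [t]_n: t as an n-bit (here nbytes-byte) big-endian string, msb first.
    return t.to_bytes(nbytes, "big")

def increment(x: bytes, t: int) -> bytes:
    # X + t: regard X as a number, add t, reduce modulo 2^n, convert back.
    n = 8 * len(x)
    return ((int.from_bytes(x, "big") + t) % (1 << n)).to_bytes(len(x), "big")

def xor_at_end(x: bytes, p: bytes) -> bytes:
    # X xor-at-the-end P: left-pad the shorter string with zero bytes, then
    # XOR, i.e. xor the shorter string into the end of the longer one.
    L = max(len(x), len(p))
    x, p = x.rjust(L, b"\x00"), p.rjust(L, b"\x00")
    return bytes(a ^ b for a, b in zip(x, p))
```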
Algorithm OMAC [14] is a pseudorandom function (PRF) that is a one-key variant of the algorithm XCBC [9]. Algorithm OMAC^t is like OMAC but takes an extra argument, the integer t. This algorithm is a "tweakable" PRF [21], tweaked in the most simple way possible.

We explain the notation used in the definition of OMAC. The value of iL (line 40: i an integer in {2, 4} and L ∈ {0, 1}^n) is the n-bit string that is obtained by multiplying L by the n-bit string that represents the number i. The multiplication is done in the finite field GF(2^n) using a canonical polynomial to represent field points. The canonical polynomial we select is the lexicographically first polynomial among the irreducible polynomials of degree n that have a minimum number of nonzero coefficients. For n = 128 the indicated polynomial is x^128 + x^7 + x^2 + x + 1. In that case, 2L = L<<1 if the first bit of L is 0, and 2L = (L<<1) ⊕ 0^120 10000111 otherwise, where L<<1 means the left shift of L by one position (the first bit vanishing and a zero entering into the last bit).
Algorithm CBC_K(M)
10 Let M_1 ··· M_m ← M where |M_i| = n
11 C_0 ← 0^n
12 for i ← 1 to m do
13   C_i ← E_K(M_i ⊕ C_{i−1})
14 return C_m

Algorithm CTR^N_K(M)
20 m ← ⌈|M|/n⌉
21 S ← E_K(N) ‖ E_K(N+1) ‖ ··· ‖ E_K(N+m−1)
22 C ← M ⊕ S [first |M| bits]
23 return C

Algorithm pad(M; B, P)
30 if |M| ∈ {n, 2n, 3n, ...}
31   then return M ⊕→ B
32   else return (M ‖ 10^{n−1−(|M| mod n)}) ⊕→ P

Algorithm OMAC_K(M)
40 L ← E_K(0^n); B ← 2L; P ← 4L
41 return CBC_K(pad(M; B, P))

Algorithm OMAC^t_K(M)
50 return OMAC_K([t]_n ‖ M)

Fig. 1. Basic building blocks. The block cipher E : Key × {0, 1}^n → {0, 1}^n is fixed and K ∈ Key. For CBC, M ∈ ({0, 1}^n)^+. For CTR, M ∈ {0, 1}^∗ and N ∈ {0, 1}^n. For pad, M ∈ {0, 1}^∗ and B, P ∈ {0, 1}^n and the operation ⊕→ xors the shorter string into the end of the longer one. For OMAC, M ∈ {0, 1}^∗ and t ∈ [0..2^n − 1] and the multiplication of a number by a string L is done in GF(2^n)
The value of 4L is simply 2(2L). We warn that to avoid side-channel attacks one must implement the doubling operation in a constant-time manner. We have made a small modification to the OMAC algorithm as it was originally presented, changing one of its two constants. Specifically, the constant 4 at line 40 was the constant 1/2 (the multiplicative inverse of 2) in the original definition of OMAC [14]. The OMAC authors indicate that they will promulgate this modification [15], which slightly simplifies implementations.
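A branch-free rendering of the doubling operation for n = 128, using the polynomial x^128 + x^7 + x^2 + x + 1 (reduction constant 0x87), might look as follows. Masking with the dropped bit avoids the data-dependent branch the warning refers to. (In Python the constant-time concern is moot; the point is the branchless shape one would mirror in C or hardware.)

```python
def double(block: bytes) -> bytes:
    # 2L in GF(2^128): shift left one bit; if the dropped (first) bit was 1,
    # xor in 0^120 10000111, i.e. the constant 0x87.
    assert len(block) == 16
    x = int.from_bytes(block, "big")
    carry = x >> 127                       # the bit that falls off the left end
    y = ((x << 1) & ((1 << 128) - 1)) ^ (carry * 0x87)
    return y.to_bytes(16, "big")
```

4L is then simply double(double(L)).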
3 The EAX Algorithm
Algorithm. Fix a block cipher E : Key × {0, 1}^n → {0, 1}^n and a tag length τ ∈ [0..n]. These parameters should be fixed at the beginning of a particular session that will use EAX mode. Typically, the parameters would be agreed to in an authenticated manner between the sender and the receiver, or they would be fixed for all time for some particular application. Given these parameters, EAX provides a nonce-based AEAD scheme EAX[E, τ] whose encryption algorithm has signature Key × Nonce × Header × Plaintext → Ciphertext and whose decryption algorithm has signature Key × Nonce × Header × Ciphertext → Plaintext ∪ {Invalid}, where Nonce, Header, Plaintext, and Ciphertext are all {0, 1}^∗. The EAX algorithm is specified in Figure 2 and a picture illustrating EAX encryption is given in Figure 3. We now discuss various features of our algorithm and choices underlying the design.
Algorithm EAX.Encrypt^{N H}_K(M)
10 N′ ← OMAC^0_K(N)
11 H′ ← OMAC^1_K(H)
12 C ← CTR^{N′}_K(M)
13 C′ ← OMAC^2_K(C)
14 Tag ← N′ ⊕ C′ ⊕ H′
15 T ← Tag [first τ bits]
16 return CT ← C ‖ T

Algorithm EAX.Decrypt^{N H}_K(CT)
20 if |CT| < τ then return Invalid
21 Let C ‖ T ← CT where |T| = τ
22 N′ ← OMAC^0_K(N)
23 H′ ← OMAC^1_K(H)
24 C′ ← OMAC^2_K(C)
25 Tag ← N′ ⊕ C′ ⊕ H′
26 T′ ← Tag [first τ bits]
27 if T ≠ T′ then return Invalid
28 M ← CTR^{N′}_K(C)
29 return M

Fig. 2. Encryption and decryption under EAX mode. The plaintext is M, the ciphertext is CT, the key is K, the nonce is N, and the header is H. The mode depends on a block cipher E (that CTR and OMAC implicitly use) and a tag length τ
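Figure 2 translates almost line-for-line into code. The sketch below follows the figure but substitutes a toy SHA-256-based PRF for the block cipher E; since EAX uses only the forward direction of E, a PRF stand-in suffices for a functional illustration, though it is of course not a secure or interoperable instantiation. The OMAC and CTR here follow Figure 1.

```python
import hashlib

N_BYTES = 16  # block length n = 128 bits

def E(key: bytes, x: bytes) -> bytes:
    # Toy 128-bit "block cipher" (a PRF, not a permutation): forward direction only.
    return hashlib.sha256(key + x).digest()[:N_BYTES]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def double(block: bytes) -> bytes:
    # 2L in GF(2^128) with reduction constant 0x87.
    x = int.from_bytes(block, "big")
    return (((x << 1) & ((1 << 128) - 1)) ^ ((x >> 127) * 0x87)).to_bytes(N_BYTES, "big")

def omac(key: bytes, t: int, msg: bytes) -> bytes:
    # OMAC^t_K(M) = OMAC_K([t]_n || M)  (Figure 1, lines 40-41 and 50).
    msg = t.to_bytes(N_BYTES, "big") + msg
    L = E(key, bytes(N_BYTES)); B = double(L); P = double(B)
    if len(msg) % N_BYTES == 0:
        msg = msg[:-N_BYTES] + xor(msg[-N_BYTES:], B)
    else:
        msg += b"\x80" + bytes(-(len(msg) + 1) % N_BYTES)  # pad with 10...0
        msg = msg[:-N_BYTES] + xor(msg[-N_BYTES:], P)
    c = bytes(N_BYTES)
    for i in range(0, len(msg), N_BYTES):                  # CBC over E_K
        c = E(key, xor(msg[i:i + N_BYTES], c))
    return c

def ctr(key: bytes, iv: bytes, msg: bytes) -> bytes:
    out = bytearray()
    x = int.from_bytes(iv, "big")
    for i in range(0, len(msg), N_BYTES):
        s = E(key, ((x + i // N_BYTES) % (1 << 128)).to_bytes(N_BYTES, "big"))
        out += xor(msg[i:i + N_BYTES], s)
    return bytes(out)

def eax_encrypt(key, nonce, header, msg, taglen=N_BYTES):
    n_ = omac(key, 0, nonce)                               # lines 10-11
    h_ = omac(key, 1, header)
    c = ctr(key, n_, msg)                                  # line 12
    tag = xor(xor(n_, omac(key, 2, c)), h_)[:taglen]       # lines 13-15
    return c + tag                                         # line 16

def eax_decrypt(key, nonce, header, ct, taglen=N_BYTES):
    if len(ct) < taglen:
        return None  # Invalid
    c, t = ct[:len(ct) - taglen], ct[len(ct) - taglen:]
    n_ = omac(key, 0, nonce)
    h_ = omac(key, 1, header)
    tag = xor(xor(n_, omac(key, 2, c)), h_)[:taglen]
    if tag != t:
        return None  # Invalid
    return ctr(key, n_, c)
```

Note that rejection of an invalid ciphertext happens before the CTR pass ever runs, which is the "half the cost of decryption" property discussed below.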
No encodings. We have avoided any nontrivial encoding of multiple strings into a single one.¹ Some other approaches that we considered required a PRF to be applied to what was logically a tuple, like (N, H, C). Doing this raises encoding issues we did not want to deal with because, ultimately, there would seem to be no simple, efficient, compelling, on-line way to encode multiple strings into a single one. Alternatively, one could avoid encodings and consider a new kind of primitive, a multi-argument PRF. But this would be a non-standard tool and we didn't want to use any non-standard tools. All in all, it seemed best to find a way to sidestep the need to do encodings.

Why not generic composition? Why have we specified a block-cipher based (BC-based) AEAD scheme instead of following the generic-composition approach of combining a (privacy-only) encryption method and a message authentication code? In fact, there are reasonable arguments in favor of generic composition, based on aesthetic or architectural sensibilities. One can argue that generic composition better separates conceptually independent elements (privacy and authenticity) and, correspondingly, allows greater implementation flexibility [6, 20]. Correctness becomes much simpler and clearer as well. All the same, BC-based AEAD modes have some important advantages of their own. They make it easier for implementors to use a scheme without knowing a lot of cryptography, presenting a simpler abstraction boundary. They make it easier to obtain interoperability. They reduce the risk that implementors will choose insecure parameters. They can save on key bits and key-setup time, as generic-composition methods invariably require a pair of separate keys.
¹ One could view the prefixing of [t]_n to M in the definition of OMAC^t_K(M) as an encoding, but [t]_n is a constant, fixed-length string, and the aim here is just to "tweak" the PRF. This is very different from needing to encode arbitrary-length strings into a single string.
Fig. 3. Encryption under EAX. The message is M, the key is K, and the header is H. The ciphertext is CT = C ‖ T. (Diagram: N, H, and C = CTR_K(M) are fed through OMAC^0_K, OMAC^1_K, and OMAC^2_K respectively; the three results are XORed and truncated to give T.)
EAX can be viewed as having been derived from a generic-composition scheme we call EAX2, described in Section 4. Specifically, one instantiates EAX2 using CTR mode (counter mode) and OMAC, and then collapses the two keys into one. If one favors generic composition, EAX2 is a nice algorithm for it.

On-line. We say that an algorithm is on-line if it is able to process a stream of data as it arrives, with constant memory, not knowing in advance when the stream will end. Observe then that on-line methods should not require knowledge of the length of a message until the message is finished. A failure to be on-line has been regarded as a significant defect for an encryption scheme or a MAC. EAX is on-line.

Now it is true that in many contexts where one would be encrypting a string one does know the length of the string in advance. For example, many protocols
Functionality. CCM: AE with AD. EAX: AE with AD.
Built from. CCM: block cipher E with 128-bit blocksize. EAX: block cipher E with n-bit blocksize.
Parameters. CCM: block cipher E; tag length τ ∈ {4, 6, 8, 10, 12, 14, 16}; length λ ∈ [2..8] of the message-length field. EAX: block cipher E; tag length τ ∈ [0..n].
Message space. CCM: parameterized, 7 choices (λ ∈ [2..8]), each a subset of Byte^∗, from Byte^{<2^16} to Byte^{<2^64}. EAX: {0, 1}^∗.
Nonce space. CCM: parameterized, with a value of 15 − λ bytes; from 56 bits to 104 bits. EAX: {0, 1}^∗.
Key space. CCM: one block-cipher key. EAX: one block-cipher key.
Ciphertext expansion. CCM: τ bytes. EAX: τ bits.
Block-cipher calls. CCM: 2⌈|M|/128⌉ + ⌈|H|/128⌉ + 2 + δ, for δ ∈ {0, 1}. EAX: 2⌈|M|/n⌉ + ⌈|H|/n⌉ + ⌈|N|/n⌉.
Block-cipher calls with static header. CCM: 2⌈|M|/128⌉ + ⌈|H|/128⌉ + 2 + δ, for δ ∈ {0, 1}. EAX: 2⌈|M|/n⌉ + ⌈|N|/n⌉.
Key setup. CCM: block-cipher subkeys. EAX: block-cipher subkeys, 3 block-cipher calls.
IV requirements. CCM: non-repeating nonce. EAX: non-repeating nonce.
Parallelizable? CCM: no. EAX: no.
On-line? CCM: no. EAX: yes.
Preprocessing (/msg). CCM: limited (key stream). EAX: limited (key stream, header).
Memory requirements. CCM: small constant. EAX: small constant.
Provable security? CCM: yes (if E is a good PRP), with a bound of Θ(σ²/2^128). EAX: yes (if E is a good PRP), with a bound of Θ(σ²/2^n).
Patent-encumbered? CCM: no. EAX: no.

Fig. 4. A comparison of basic characteristics of CCM and EAX. The count on block-cipher calls for EAX ignores key-setup costs. We denote by τ the length of the EAX tag in bits, and by τ (boldface) the length of the CCM tag in bytes
will already have "packaged up" the string length at a lower level. In effect, such strings have been represented in the computing system as a sequence of bytes and a count of those bytes. But there are also contexts where one does not know the length of a message in advance of getting an indication that it is over. For example, a printable string is often represented in computer systems as a sequence of non-zero bytes followed by a terminal zero-byte. Certainly one should be able to efficiently encrypt a string which has been represented in this way.

Ability to process static AD. In many scenarios the associated data H will be static over the course of a communications session. For example, the associated data may include information such as the IP address of the sender, the receiver, and fixed cryptographic parameters associated to this session. In such a case one would like that the amount of time to compute Encrypt^{N H}_K(M) and Decrypt^{N H}_K(CT) should be independent of |H|, disregarding the work done in
Algorithm EAX2.Encrypt^{N H}_{K1,K2}(M)
10 N′ ← F^0_{K1}(N)
11 H′ ← F^1_{K1}(H)
12 C ← E^{N′}_{K2}(M)
13 C′ ← F^2_{K1}(C)
14 Tag ← N′ ⊕ C′ ⊕ H′
15 T ← Tag [first τ bits]
16 return CT ← C ‖ T

Algorithm EAX2.Decrypt^{N H}_{K1,K2}(CT)
20 if |CT| < τ then return Invalid
21 Let C ‖ T ← CT where |T| = τ
22 N′ ← F^0_{K1}(N)
23 H′ ← F^1_{K1}(H)
24 C′ ← F^2_{K1}(C)
25 Tag ← N′ ⊕ C′ ⊕ H′
26 T′ ← Tag [first τ bits]
27 if T ≠ T′ then return Invalid
28 M ← D^{N′}_{K2}(C)
29 return M

Fig. 5. Encryption and decryption under EAX2. The mode is built from a PRF F : Key1 × {0, 1}^∗ → {0, 1}^n and an IV-based encryption scheme Π = (E, D) having key space Key2 and message space {0, 1}^∗. The plaintext is M, the key is (K1, K2), and the header is H. By F^i_K we mean the function defined by F^i_K(M) = F_K([i]_n ‖ M)
a preprocessing step. The significance of this goal was already explained in [24]. EAX achieves this goal.

Additional features. Invalid messages can be rejected at half the cost of decryption. This is one of the benefits of following what is basically an encrypt-then-authenticate approach as opposed to an authenticate-then-encrypt approach. To obtain a MAC as efficient as the PRF underlying EAX, define MAC_K(H) = Encrypt^{0^n H}_K(ε).

Comparison with CCM. Figure 4 compares CCM and EAX along a few relevant dimensions. A description of CCM and an extended comparison can be found in the full version of this paper [8].
4 EAX2 Algorithm
To understand the proof of security of EAX and the approach taken for its design, we introduce EAX2, a generic composition method. EAX is EAX2 for the particular case of CTR encryption and OMAC authentication, but then collapsed to a single key.

EAX2 composition. Let F : Key1 × {0, 1}^∗ → {0, 1}^n be a PRF, where n ≥ 2. Let Π = (E, D) be an IV-based encryption scheme having key space Key2 and IV space {0, 1}^n. This means that E : Key2 × {0, 1}^n × {0, 1}^∗ → {0, 1}^∗ and D : Key2 × {0, 1}^n × {0, 1}^∗ → {0, 1}^∗ and Key2 is a set of keys and for every K ∈ Key2 and N ∈ {0, 1}^n and M ∈ {0, 1}^∗, if C = E^N_K(M) then D^N_K(C) = M. Let τ ≤ n be a number. Now given F and Π and τ we define
an AEAD scheme EAX2[Π, F, τ] = (EAX2.Encrypt, EAX2.Decrypt) as follows. Set F^t_K(M) = F_K([t]_n ‖ M). Set Key = Key1 × Key2. Then the encryption algorithm EAX2.Encrypt : Key × {0, 1}^∗ × {0, 1}^∗ × {0, 1}^∗ → {0, 1}^∗ and the decryption algorithm EAX2.Decrypt : Key × {0, 1}^∗ × {0, 1}^∗ × {0, 1}^∗ → {0, 1}^∗ ∪ {Invalid} are defined in Figure 5. Scheme EAX2[Π, F, τ] is provably secure under natural assumptions about Π and F. See Section 6.

EAX1 composition. Let EAX1 be the single-key variant of EAX2 where one insists that Key = Key1 = Key2 and where one keys F, E, and D with a single key K ∈ Key. One associates to F and Π the scheme EAX1[Π, F, τ] that is defined as with EAX2 but where the one key K keys everything. Notice that EAX[E, τ] = EAX1[CTR[E], OMAC[E], τ]. This is a useful way to look at EAX.
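As a concrete, hypothetical instantiation of EAX2, one can take the PRF F to be HMAC-SHA256 and the IVE scheme to be a per-IV keystream XOR; the tweak is realized exactly as F^t_K(M) = F_K([t]_n ‖ M). All primitive choices here are ours, for illustration only.

```python
import hashlib, hmac

TAGLEN = 16
N = 32  # output length of F, in bytes

def F(k1: bytes, t: int, m: bytes) -> bytes:
    # Tweaked PRF: F^t_K1(M) = F_K1([t]_n || M), with HMAC-SHA256 as F.
    return hmac.new(k1, t.to_bytes(N, "big") + m, hashlib.sha256).digest()

def ive_encrypt(k2: bytes, iv: bytes, m: bytes) -> bytes:
    # IV-based encryption: XOR with a per-(key, IV) keystream; here D = E.
    ks = b""
    ctr = 0
    while len(ks) < len(m):
        ks += hashlib.sha256(k2 + iv + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(a ^ b for a, b in zip(m, ks))

def eax2_encrypt(k1, k2, nonce, header, msg):
    n_ = F(k1, 0, nonce)                                   # lines 10-11
    h_ = F(k1, 1, header)
    c = ive_encrypt(k2, n_, msg)                           # line 12
    c_ = F(k1, 2, c)                                       # line 13
    tag = bytes(a ^ b ^ d for a, b, d in zip(n_, c_, h_))[:TAGLEN]
    return c + tag

def eax2_decrypt(k1, k2, nonce, header, ct):
    if len(ct) < TAGLEN:
        return None  # Invalid
    c, t = ct[:-TAGLEN], ct[-TAGLEN:]
    n_ = F(k1, 0, nonce); h_ = F(k1, 1, header); c_ = F(k1, 2, c)
    tag = bytes(a ^ b ^ d for a, b, d in zip(n_, c_, h_))[:TAGLEN]
    if not hmac.compare_digest(tag, t):
        return None  # Invalid
    return ive_encrypt(k2, n_, c)  # D = E for a keystream XOR
```

Collapsing k1 and k2 into a single key turns this EAX2 shape into the EAX1 shape, and proving that the collapse is safe is exactly where Lemma 3 enters.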
5 Definitions
AEAD schemes. A set of keys is a nonempty set having a distribution (the uniform distribution when the set is finite). A (nonce-based) authenticated-encryption with associated-data (AEAD) scheme is a pair of algorithms Π = (E, D) where E is a deterministic encryption algorithm E : Key × Nonce × Header × Plaintext → Ciphertext and D is a deterministic decryption algorithm D : Key × Nonce × Header × Ciphertext → Plaintext ∪ {Invalid}. The key space Key is a set of keys while the nonce space Nonce and the header space Header (also called the space of associated data) are nonempty sets of strings. We write E^{N H}_K(M) for E(K, N, H, M) and D^{N H}_K(CT) for D(K, N, H, CT). We require that D^{N H}_K(E^{N H}_K(M)) = M for all K ∈ Key and N ∈ Nonce and H ∈ Header and M ∈ Plaintext. In this note we assume, for notational simplicity, that Nonce, Header, Plaintext, and Ciphertext are all {0, 1}^∗ and that |E^{N H}_K(M)| = |M|. An adversary is a program with access to one or more oracles.

Nonce-respecting. Suppose A is an adversary with access to an encryption oracle E^{· ·}_K(·). This oracle, on input (N, H, M), returns E^{N H}_K(M). Let (N_1, H_1, M_1), ..., (N_q, H_q, M_q) denote its oracle queries. The adversary is said to be nonce-respecting if N_1, ..., N_q are always distinct, regardless of oracle responses and regardless of A's internal coins.

Privacy of AEAD schemes. We consider adversaries with access to an encryption oracle E^{· ·}_K(·). We assume that any privacy-attacking adversary is nonce-respecting. The advantage of such an adversary A in violating the privacy of AEAD scheme Π = (E, D) having key space Key is

    Adv^priv_Π(A) = Pr[K ←$ Key : A^{E^{· ·}_K(·)} = 1] − Pr[K ←$ Key : A^{$^{· ·}(·)} = 1]

where $^{· ·}(·) denotes the oracle that on input (N, H, M) returns a random string of length |M|.

Authenticity of AEAD schemes. This time we provide the adversary with two oracles, an encryption oracle E^{· ·}_K(·) as above and also a verification oracle
D^{· ·}_K(·). The latter oracle takes input (N, H, CT) and returns 1 if D^{N H}_K(CT) ∈ Plaintext and returns 0 if D^{N H}_K(CT) = Invalid. The adversary is assumed to satisfy three conditions, and these must hold regardless of the responses to its oracle queries and regardless of A's internal coins:
• Adversary A must be nonce-respecting. (The condition is understood to apply only to the adversary's encryption oracle. Thus a nonce used in an encryption-oracle query may be used in a verification-oracle query.)
• Adversary A may never make a verification-oracle query (N, H, CT) such that the encryption oracle previously returned CT in response to a query (N, H, M).
• Adversary A must call its verification oracle exactly once, and may not subsequently call its encryption oracle. (That is, it makes a sequence of encryption-oracle queries, then a verification-oracle query, and then halts.)
We say that such an adversary forges if its verification oracle returns 1 in response to the single query made to it. The advantage of such an adversary A in violating the authenticity of AEAD scheme Π = (E, D) having key space Key is

    Adv^auth_Π(A) = Pr[K ←$ Key : A^{E^{· ·}_K(·), D^{· ·}_K(·)} forges].

IV-based encryption. An IV-based encryption scheme (an IVE scheme) is a pair of algorithms Π = (E, D) where E : Key × IV × Plaintext → Ciphertext is a deterministic encryption algorithm and D : Key × IV × Ciphertext → Plaintext ∪ {Invalid} is a deterministic decryption algorithm. The key space Key is a set of keys and the plaintext space Plaintext and ciphertext space Ciphertext and IV space IV are all nonempty sets of strings. We write E^R_K(M) for E(K, R, M) and D^R_K(C) for D(K, R, C). We require that D^R_K(E^R_K(M)) = M for all K ∈ Key and R ∈ IV and M ∈ Plaintext. We assume, as before, that Plaintext = Ciphertext = {0, 1}^∗ and that |E^R_K(M)| = |M|. We also assume that IV = {0, 1}^n for some n ≥ 1 called the IV length.

Privacy of IVE schemes with random IVs.
Let Π = (E, D) be an IVE scheme with key space Key and IV space IV = {0, 1}^n. Let E^$ be the probabilistic algorithm defined from E that, on input K and M, chooses an IV R at random from {0, 1}^n, computes C ← E^R_K(M), and then returns C along with the chosen IV:

Algorithm E^$_K(M)   // The probabilistic encryption scheme built from IVE scheme E
    R ←$ {0, 1}^n ; C ← E^R_K(M) ; return R ‖ C

Then we define the advantage of an adversary A in violating the privacy of Π (as an encryption scheme using random IVs) by

    Adv^priv_Π(A) = Pr[K ←$ Key : A^{E^$_K(·)} = 1] − Pr[K ←$ Key : A^{$(·)} = 1]
where $(·) denotes the oracle that on input M returns a random string of length n + |M|. This is just the ind$-privacy of the randomized symmetric encryption scheme associated to Π. We comment that we have used a superscript of "priv" for an IVE scheme and "priv" (bold font) for an AEAD scheme.

Pseudorandom functions. A family of functions, or a pseudorandom function (PRF), is a map F : Key × D → {0, 1}^n where Key is a set of keys and D is a nonempty set of strings. We call n the output length of F. We write F_K for the function F(K, ·) and we write f ←$ F to mean K ←$ Key ; f ← F_K. We denote by R^∗_n the set of all functions with domain {0, 1}^∗ and range {0, 1}^n; by R^n_n the set of all functions with domain {0, 1}^n and range {0, 1}^n; and by R^I_n the set of all functions with domain I and range {0, 1}^n. We identify a function with its key, making R^n_n, R^∗_n and R^I_n pseudorandom functions. The advantage of adversary A in violating the pseudorandomness of the family of functions F : Key × {0, 1}^∗ → {0, 1}^n is

    Adv^prf_F(A) = Pr[K ←$ Key : A^{F_K(·)} = 1] − Pr[ρ ←$ R^∗_n : A^{ρ(·)} = 1]

A family of functions E : Key × D → {0, 1}^n is a block cipher if D = {0, 1}^n and each E_K is a permutation. We let P_n denote all the permutations on {0, 1}^n and define

    Adv^prp_E(A) = Pr[K ←$ Key : A^{E_K(·)} = 1] − Pr[π ←$ P_n : A^{π(·)} = 1]

Resources. If xxx is an advantage notion for which Adv^xxx_Π(A) has been defined, we write Adv^xxx_Π(R) for the maximal value of Adv^xxx_Π(A) over all adversaries A that use resources at most R. When counting the resource usage of an adversary, one maximizes over all possible oracle responses, including those that could not be returned by any experiment we have specified for adversarial advantage. Resources of interest are: t, the running time; q, the total number of oracle queries; q_e, the number of oracle queries to the adversary's first oracle; q_v, the number of oracle queries to the adversary's second oracle; and σ, the data complexity. The running time t of an algorithm is its actual running time (relative to some fixed RAM model of computation) plus its description size (relative to some standard encoding of algorithms). The data complexity σ is defined as the sum of the lengths of all strings encoded in the adversary's oracle queries, plus the total number of all of these strings.² In this paper the length of strings is measured in n-bit blocks, for some understood value of n. The number of blocks in a string M is defined as ‖M‖_n = max{1, ⌈|M|/n⌉}, so that the empty string counts as one block. As an example, an adversary that asks queries (N_1, H_1, M_1), (N_2, H_2, M_2) to its first oracle and query (N, H, M) to its second oracle has data complexity ‖N_1‖_n + ‖H_1‖_n + ‖M_1‖_n + ‖N_2‖_n + ‖H_2‖_n + ‖M_2‖_n + ‖N‖_n + ‖H‖_n + ‖M‖_n + 9. The name of a resource measure (t, t′, q, etc.) will be enough to make clear what resource it refers to.
² There is a certain amount of arbitrariness in this convention, but it is reasonable and simplifies subsequent accounting.
When we use big-O notation it is understood that the constant hidden inside the notation may depend on n. We write Õ(f(x)) for O(f(x) lg(f(x))). When F is a function we write Time_F(σ) for the maximal amount of time to compute the function F over inputs of total length σ. When Π = (E, D) is an AEAD scheme or an IVE scheme with key space Key we write Time_E(σ) for the time to compute a random element K ←$ Key plus the maximal amount of time to compute the function E_K on arguments of total length σ.
6 Security Results
We first obtain results about the security of EAX2 and then prove a result about the security of a tweakable-OMAC extension. These results are applied to derive results about the security of EAX. The notation and security measures referred to below are defined in Section 5.

Security of EAX2. We begin by considering the EAX2[Π, F, τ] scheme with F being equal to R^∗_n, the set of all functions with domain {0, 1}^∗ and range {0, 1}^n. In other words, we are considering the case where F_{K1} is a random function with domain {0, 1}^∗ and range {0, 1}^n. First we show that EAX2[Π, R^∗_n, τ] inherits the privacy of the underlying IVE scheme Π. The proof of the following is in the full version of this paper [8].

Lemma 1. [Privacy of EAX2 with a random PRF] Let Π be an IVE scheme with IV space {0, 1}^n and let τ ∈ [0..n]. Then

    Adv^priv_{EAX2[Π, R^∗_n, τ]}(t, q, σ) ≤ Adv^priv_Π(t′, q, σ)

where t′ = t + O(σ). □
We now turn to authenticity. The following shows that EAX2[Π, R^∗_n, τ] provides authenticity under the assumption that the underlying IVE scheme Π provides privacy. The proof is in the full version of this paper [8].

Lemma 2. [Authenticity of EAX2 with a random PRF] Let Π be an IVE scheme with IV space {0, 1}^n and let τ ∈ [0..n]. Then

    Adv^auth_{EAX2[Π, R^∗_n, τ]}(t, q, σ) ≤ Adv^priv_Π(t′, q, σ) + 2^{−τ}

where t′ = t + O(σ). □

Our definition of authenticity allows the adversary only one query to its verification oracle, meaning only one forgery attempt. A standard argument says that the advantage of an adversary making q_v verification queries can grow by a factor of at most q_v. As per the above this means it is at most q_v · [2^{−τ} + Adv^priv_Π(t′, q, σ)]. We believe that in fact the bound is better than this, namely that it is q_v 2^{−τ} + Adv^priv_Π(t′, q, σ). However, we do not have a proof of this stronger bound.
The above allows us to obtain results about the security of the general EAX2[Π, F, τ] scheme based on assumptions about the security of the component schemes. The proof of the following is in the full version of this paper [8].

Theorem 1. [Security of EAX2] Let F : Key1 × {0, 1}^∗ → {0, 1}^n be a family of functions, let Π = (E, D) be an IVE scheme with IV space {0, 1}^n, and let τ ∈ [0..n]. Then

    Adv^auth_{EAX2[Π, F, τ]}(t, q, σ) ≤ Adv^priv_Π(t2, q, σ) + Adv^prf_F(t1, 3q + 3, σ) + 2^{−τ}   (1)
    Adv^priv_{EAX2[Π, F, τ]}(t, q, σ) ≤ Adv^priv_Π(t2, q, σ) + Adv^prf_F(t3, 3q, σ)   (2)

where t1 = t + Time_E(σ) + Õ(σ + nq) and t2 = t + Õ(σ) and t3 = t + Time_E(σ) + Õ(σ). □

We remark that although "birthday" terms of the form σ²/2^n or q²/2^n do not appear explicitly in the bounds above, they may appear when we bound the terms Adv^priv_Π(·, ·, ·) and Adv^prf_F(·, ·, ·) in terms of their arguments.

Security of a tweakable-OMAC extension. This section develops the core result underlying why key-reuse "works" across OMAC and CTR modes. To do this, we consider the following extension of the tweakable-OMAC construction. Fix n ≥ 1 and let t ∈ {0, 1, 2} and ρ ∈ R^n_n and M ∈ {0, 1}^∗ and s ∈ N. Then define
R ← OMACtρ (M ) for j ← 0 to s − 1 do Sj ← ρ(R + j) return R S0 S1 · · · Ss−1
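The structure of this oracle is easy to model in code. The following Python sketch lazily samples a random function ρ ∈ R_n^n and answers (t, M, s) queries. The MAC core here is a simplified CBC-style stand-in for the actual tweakable-OMAC construction (whose full definition appears earlier in the paper), so only the key-reuse structure — one ρ for both the MAC and the CTR key stream — is being modeled, not the real OMAC.

```python
import os

N = 16  # block size in bytes (n = 128 bits)

class LazyRandomFunction:
    """Lazily sampled random function rho in R_n^n: {0,1}^n -> {0,1}^n."""
    def __init__(self):
        self.table = {}

    def __call__(self, x: int) -> bytes:
        x %= 2 ** (8 * N)  # the domain is n-bit strings; wrap the index
        if x not in self.table:
            self.table[x] = os.urandom(N)  # sample the value on first use
        return self.table[x]

def mac_core(rho, t: int, M: bytes) -> int:
    # Simplified CBC-style MAC over rho with the tweak prepended: a
    # stand-in for OMAC^t_rho, kept only to model key reuse.
    blocks = [t.to_bytes(N, "big")]
    for i in range(0, max(len(M), 1), N):
        blocks.append(M[i:i + N].ljust(N, b"\x00"))
    R = 0
    for B in blocks:
        R = int.from_bytes(rho(R ^ int.from_bytes(B, "big")), "big")
    return R

def omac_ext(rho, t: int, M: bytes, s: int) -> bytes:
    R = mac_core(rho, t, M)                          # R = OMAC^t_rho(M) (stand-in)
    stream = b"".join(rho(R + j) for j in range(s))  # S_j = rho(R + j): same rho
    return R.to_bytes(N, "big") + stream             # n(s + 1) bits in total

rho = LazyRandomFunction()
out = omac_ext(rho, 0, b"some message", 3)
assert len(out) == N * (3 + 1)
```

Note how `omac_ext` answers each query with n(s + 1) bits, exactly the shape of the $_n oracle it is compared against below.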
Thus an OMAC_ρ oracle, when asked (t, M, s), returns not only R = OMAC^t_ρ(M) but also a key stream S_0 S_1 ... S_{s−1} formed using CTR-mode and start-index R. We emphasize that the key stream is formed using the same function ρ (that is, the same key) that underlies the OMAC computation. Note too that we have limited the tweak t to a small set, {0, 1, 2}. We imagine providing an adversary A with one of two kinds of oracles. The first is an oracle OMAC_ρ(·, ·, ·) for a randomly chosen ρ ∈ R_n^n. The second is an oracle $_n(·, ·, ·) that, on input (t, M, s), returns n(s + 1) random bits. Either way, we assume that the adversary is length-committing: if the adversary asks a query (t, M, s) it does not ask any subsequent query (t, M, s'). As the adversary runs, it asks some sequence of queries (t_1, M_1, s_1), ..., (t_q, M_q, s_q). The resources of interest to us are the sum of the block lengths of the messages being MACed, σ_1 = Σ_i ⌈|M_i|/n⌉, and the total number σ_2 = Σ_i s_i of key-stream blocks that the adversary requests. We claim that a reasonable adversary will have little advantage in telling apart the two oracles, and we bound its distinguishing probability in terms of the resources σ_1 and σ_2 that it expends. Recall that
404
Mihir Bellare et al.
for oracles X and Y and an adversary A we measure A's ability to distinguish between oracles X and Y by the number Adv^dist_{X,Y}(A) = Pr[A^X = 1] − Pr[A^Y = 1]. The proof of the following is in the full version of this paper [8].

Lemma 3. [Pseudorandomness of OMAC] Fix n ≥ 2. Then, for length-committing adversaries,

    Adv^dist_{OMAC[R_n^n], $_n}(σ_1, σ_2) ≤ (σ_1 + σ_2 + 3)² / 2^n
Security of EAX. We are now ready to consider the security of EAX. The proof of the following is in the full version of this paper [8].

Theorem 2. [Security of EAX] Let n ≥ 2 and τ ∈ [0..n]. Then

    Adv^priv_{EAX[R_n^n, τ]}(σ) ≤ 9σ² / 2^n
    Adv^auth_{EAX[R_n^n, τ]}(σ) ≤ 10.5σ² / 2^n + 1/2^τ
Finally, we may, in the customary way, pass to the corresponding complexity-theoretic result where we start with an arbitrary block cipher E.

Corollary 1. [Security of EAX] Let n ≥ 2, let E : Key × {0,1}^n → {0,1}^n be a block cipher, and let τ ∈ [0..n]. Then

    Adv^priv_{EAX[E,τ]}(t, σ) ≤ 9.5σ² / 2^n + Adv^prp_E(t', σ)
    Adv^auth_{EAX[E,τ]}(t, σ) ≤ 11σ² / 2^n + 1/2^τ + Adv^prp_E(t', σ)

where t' = t + O(σ).
We omit the proof, which is completely standard.
References

[1] M. Bellare, A. Desai, E. Jokipii, and P. Rogaway. A concrete security treatment of symmetric encryption: Analysis of the DES modes of operation. Proceedings of the 38th Symposium on Foundations of Computer Science, IEEE, 1997.
[2] M. Bellare, R. Guérin, and P. Rogaway. XOR MACs: New methods for message authentication using finite pseudorandom functions. Advances in Cryptology – CRYPTO '95, Lecture Notes in Computer Science, vol. 963, D. Coppersmith ed., Springer-Verlag, 1995.
[3] M. Bellare, O. Goldreich, and H. Krawczyk. Stateless evaluation of pseudorandom functions: Security beyond the birthday barrier. Advances in Cryptology – CRYPTO '96, Lecture Notes in Computer Science, vol. 1109, N. Koblitz ed., Springer-Verlag, 1996.
[4] M. Bellare, J. Kilian, and P. Rogaway. The security of the cipher block chaining message authentication code. Journal of Computer and System Sciences (JCSS), vol. 61, no. 3, pp. 362–399, Dec 2000.
[5] M. Bellare, T. Kohno, and C. Namprempre. Authenticated encryption in SSH: Provably fixing the SSH binary packet protocol. Proceedings of the 9th Annual Conference on Computer and Communications Security, ACM, 2002.
[6] M. Bellare and C. Namprempre. Authenticated encryption: Relations among notions and analysis of the generic composition paradigm. Advances in Cryptology – ASIACRYPT '00, Lecture Notes in Computer Science, vol. 1976, T. Okamoto ed., Springer-Verlag, 2000.
[7] M. Bellare and P. Rogaway. Encode-then-encipher encryption: How to exploit nonces or redundancy in plaintexts for efficient encryption. Advances in Cryptology – ASIACRYPT '00, Lecture Notes in Computer Science, vol. 1976, T. Okamoto ed., Springer-Verlag, 2000.
[8] M. Bellare, P. Rogaway, and D. Wagner. The EAX mode of operation (a two-pass authenticated-encryption scheme optimized for simplicity and efficiency). Full version of this paper, available via http://www.cs.ucdavis.edu/∼rogaway
[9] J. Black and P. Rogaway. CBC MACs for arbitrary-length messages: The three-key constructions. Advances in Cryptology – CRYPTO '00, Lecture Notes in Computer Science, vol. 1880, M. Bellare ed., Springer-Verlag, 2000.
[10] N. Ferguson, D. Whiting, B. Schneier, J. Kelsey, S. Lucks, and T. Kohno. Helix: Fast encryption and authentication in a single cryptographic primitive. Fast Software Encryption (FSE 2003), Lecture Notes in Computer Science, vol. 2887, Springer-Verlag, pp. 330–346, 2003.
[11] V. Gligor and P. Donescu. Integrity-aware PCBC encryption. Security Protocols, 7th International Workshop, Lecture Notes in Computer Science, vol. 1796, Springer-Verlag, pp. 153–171, 1999.
[12] V. Gligor and P. Donescu. Fast encryption and authentication: XCBC encryption and XECB authentication modes. Presented at the 2nd NIST Workshop on AES Modes of Operation, Santa Barbara, CA, August 24, 2001.
[13] P. Hawkes and G. Rose. Primitive specification for SOBER-128. Cryptology ePrint Archive Report 2003/48, April 2003.
[14] T. Iwata and K. Kurosawa. OMAC: One-key CBC MAC. Fast Software Encryption (FSE 2003), Lecture Notes in Computer Science, vol. 2887, Springer-Verlag, pp. 129–153, 2003.
[15] T. Iwata and K. Kurosawa. Personal communications, January 2002.
[16] J. Jonsson. On the security of CTR + CBC-MAC. Proceedings of Selected Areas of Cryptography (SAC), 2002.
[17] C. Jutla. Encryption modes with almost free message integrity. Advances in Cryptology – EUROCRYPT '01, Lecture Notes in Computer Science, vol. 2045, B. Pfitzmann ed., Springer-Verlag, 2001.
[18] J. Katz and M. Yung. Unforgeable encryption and adaptively secure modes of operation. Fast Software Encryption '00, Lecture Notes in Computer Science, vol. 1978, B. Schneier ed., Springer-Verlag, 2000.
[19] T. Kohno, J. Viega, and D. Whiting. A high-performance conventional authenticated encryption mode. These proceedings.
[20] H. Krawczyk. The order of encryption and authentication for protecting communications (or: How secure is SSL?). Advances in Cryptology – CRYPTO '01, Lecture Notes in Computer Science, vol. 2139, J. Kilian ed., Springer-Verlag, 2001.
[21] M. Liskov, R. Rivest, and D. Wagner. Tweakable block ciphers. Advances in Cryptology – CRYPTO '02, Lecture Notes in Computer Science, vol. 2442, pp. 31–46, Springer-Verlag, 2002.
[22] D. McGrew and J. Viega. Flexible and efficient message authentication in hardware and software. Manuscript, 2003. Available from http://www.zork.org/
[23] E. Petrank and C. Rackoff. CBC MAC for real-time data sources. Journal of Cryptology, vol. 13, no. 3, pp. 315–338, 2000.
[24] P. Rogaway. Authenticated-encryption with associated-data. Proceedings of the 9th Annual Conference on Computer and Communications Security (CCS-9), pp. 98–107, ACM, 2002.
[25] P. Rogaway, M. Bellare, and J. Black. OCB: A block-cipher mode of operation for efficient authenticated encryption. ACM Transactions on Information and System Security (TISSEC), vol. 6, no. 3, pp. 365–403, Aug. 2003.
[26] D. Whiting, R. Housley, and N. Ferguson. Counter with CBC-MAC (CCM). June 2002. Available at http://csrc.nist.gov/encryption/modes/proposedmodes/
A   Definition of CCM
Since CCM [26] was a major motivation for our work, we recall its definition, writing it in a new form. First some notation. Write string constants in hexadecimal, as in 0xFFFE. When X ∈ {0,1}^ℓ is a nonempty string and i ∈ N is a number, we let X + i be the ℓ-bit string that results from regarding X as a nonnegative number x (binary notation, msb first), adding i to x, taking the result modulo 2^ℓ, and converting this number back into an ℓ-bit string. Now CCM depends on three parameters:

• E — the block cipher — where E : Key × {0,1}^128 → {0,1}^128
• τ — the tag length — where τ ∈ {4, 6, 8, 10, 12, 14, 16}
• λ — the length-of-the-message-length-field — where λ ∈ {2, 3, 4, 5, 6, 7, 8}

Once parameters (E, τ, λ) have been fixed, where E : Key × {0,1}^128 → {0,1}^128 is a block cipher, CCM is the AE scheme specified in Figure 6. The nonce space is Nonce = Byte^{15−λ}, the header space is Header = Byte^{<2^64}, and the message space is Plaintext = Byte^{<2^{8λ}}. There is a tradeoff between the length of nonces, η = |N| = 15 − λ bytes, and the longest permitted message, 256^λ − 1 bytes.
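The X + i operation defined above is just addition on the integer encoding of X, truncated back to the original length. A minimal sketch (operating on byte strings rather than the paper's bit strings, since CCM works on octets):

```python
def add_to_block(X: bytes, i: int) -> bytes:
    """Compute X + i: view X as a big-endian number, add i modulo
    2^(8*len(X)), and re-encode in the same number of bytes."""
    l = len(X)
    x = int.from_bytes(X, "big")
    return ((x + i) % (1 << (8 * l))).to_bytes(l, "big")

assert add_to_block(b"\x00\x01", 1) == b"\x00\x02"
assert add_to_block(b"\xff\xff", 1) == b"\x00\x00"  # wraps modulo 2^16
```

This is exactly the increment CCM (and CTR mode generally) applies to its counter block A_0.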
Algorithm CCM.Encrypt_K^{N H}(M)
100  B ← 0 ∥ (if H = ε then 0 else 1) ∥ [τ/2 − 1]_3 ∥ [λ − 1]_3 ∥ N ∥ [‖M‖]_{8λ}
101       ∥ if H = ε then ε elseif ‖H‖ < 65280 then [‖H‖]_{16}
102         elseif ‖H‖ < 2^32 then 0xFFFE ∥ [‖H‖]_{32} else 0xFFFF ∥ [‖H‖]_{64} endif ∥ H
103       ∥ if H = ε then ε elseif ‖H‖ < 65280 then 0^{8((14−‖H‖) mod 16)}
104         elseif ‖H‖ < 2^32 then 0^{8((10−‖H‖) mod 16)} else 0^{8((6−‖H‖) mod 16)} endif
105       ∥ M ∥ 0^{8((−‖M‖) mod 16)}
106  U ← CBC_K(B)
107  A_0 ← [λ − 1]_8 ∥ N ∥ 0^{8λ}
108  V ∥ C ← CTR_K^{A_0}(U ∥ M) where |V| = 128
109  T ← V[first τ bytes]
110  return CT ← C ∥ T

Algorithm CCM.Decrypt_K^{N H}(CT)
200  if ‖CT‖ < τ then return INVALID
201  Partition CT into C ∥ T where ‖T‖ = τ
202  if ‖C‖ > 2^{8λ} − 1 then return INVALID
210  A_0 ← [λ − 1]_8 ∥ N ∥ 0^{8λ}
211  M ← CTR_K^{A_0 + 1}(C)
220  B ← 0 ∥ (if H = ε then 0 else 1) ∥ [τ/2 − 1]_3 ∥ [λ − 1]_3 ∥ N ∥ [‖M‖]_{8λ}
221       ∥ if H = ε then ε elseif ‖H‖ < 65280 then [‖H‖]_{16}
222         elseif ‖H‖ < 2^32 then 0xFFFE ∥ [‖H‖]_{32} else 0xFFFF ∥ [‖H‖]_{64} endif ∥ H
223       ∥ if H = ε then ε elseif ‖H‖ < 65280 then 0^{8((14−‖H‖) mod 16)}
224         elseif ‖H‖ < 2^32 then 0^{8((10−‖H‖) mod 16)} else 0^{8((6−‖H‖) mod 16)} endif
225       ∥ M ∥ 0^{8((−‖M‖) mod 16)}
230  U ← CBC_K(B)
231  V ← E_K(A_0) ⊕ U
232  T' ← V[first τ bytes]
233  if T ≠ T' then return INVALID
234  return M

Fig. 6. Encryption and decryption under CCM[E, τ, λ]. Here ‖X‖ denotes the length of X in bytes, 0^{8j} denotes j zero bytes, and 65280 = 2^16 − 2^8.
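The three-way branch on the header length in Figure 6 is perhaps the least obvious part of CCM. The following sketch, which assumes byte-string inputs, shows just that header-length encoding in isolation:

```python
def encode_header_length(h_len: int) -> bytes:
    """Encode a CCM header's byte length: omitted when the header is
    empty, 2 bytes when short, and escaped with 0xFFFE/0xFFFF when long."""
    if h_len == 0:
        return b""
    if h_len < 65280:  # 65280 = 2^16 - 2^8
        return h_len.to_bytes(2, "big")
    if h_len < 2 ** 32:
        return b"\xff\xfe" + h_len.to_bytes(4, "big")
    return b"\xff\xff" + h_len.to_bytes(8, "big")

assert encode_header_length(0) == b""
assert encode_header_length(12) == b"\x00\x0c"
assert encode_header_length(70000) == b"\xff\xfe\x00\x01\x11\x70"
```

The escape values 0xFFFE and 0xFFFF cannot collide with a plain two-byte length, since plain encodings are only used for lengths below 65280.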
CWC: A High-Performance Conventional Authenticated Encryption Mode

Tadayoshi Kohno¹, John Viega², and Doug Whiting³

¹ UC San Diego, [email protected]
² Virginia Tech, [email protected]
³ Hifn, Inc., [email protected]
Abstract. We introduce CWC, a new block cipher mode of operation for protecting both the privacy and the authenticity of encapsulated data. CWC is the first such mode having all five of the following properties: provable security, parallelizability, high performance in hardware, high performance in software, and no intellectual property concerns. We believe that having all five of these properties makes CWC a powerful tool for use in many performance-critical cryptographic applications. CWC is also the first appropriate solution for some applications; e.g., standardization bodies like the IETF and NIST prefer patent-free modes, and CWC is the first such mode capable of processing data at 10Gbps in hardware, which will be important for future IPsec (and other) network devices. As part of our design, we also introduce a new parallelizable universal hash function optimized for performance in both hardware and software.
1   Introduction
An authenticated encryption with associated data (AEAD) scheme is a symmetric encryption scheme designed to protect both the privacy and the authenticity of encapsulated data. There has recently been a strong push toward producing block cipher-based AEAD schemes [13, 10, 12, 24, 28, 23, 5]. Despite this push, among the previous works there does not exist any AEAD scheme simultaneously having all of the following properties: provable security, parallelizability, high performance in hardware, high performance in software, and freedom from intellectual property concerns. Even though not all applications will require all five of these properties, almost all applications will require at least one of them, and may very likely have to be able to interoperate with an application requiring a different property. We thus view finding an appropriate scheme having all five of these properties as a very important research goal. Finding an appropriate balance between all five of the aforementioned properties is, however, not easy, because the most natural approaches to addressing some of the properties are actually disadvantageous with respect to other properties. We believe we have overcome these challenges and, in doing so, introduce a new mode of operation called CWC, or Carter-Wegman Counter mode.

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 408–426, 2004.
© International Association for Cryptologic Research 2004
Motivating example. One of the primary motivations for such a block cipher-based AEAD scheme is IPsec. From a pragmatic perspective, we note that many vendors and standardization bodies prefer patent-free modes over patented modes (the elegant OCB mode was apparently rejected from the IEEE 802.11 working group because of patent concerns). And, from a hardware performance perspective, we note that because none of the existing patent-free AEAD schemes are parallelizable, it is impossible to make existing patent-free AEAD schemes run faster than about 2Gbps using conventional ASIC technology and a single processing unit. Nevertheless, future network devices will be expected to run at 10Gbps. CWC addresses these issues, being both patent-free and capable of processing data at 10Gbps using conventional ASIC technology.

The CWC solution. Our new mode of operation, called CWC, has all five of the properties mentioned above. It is provably secure. Moreover, our provable security-based analyses helped guide our research and helped us reject other schemes with similar performance properties but with slightly worse provable security bounds. CWC is also parallelizable, which means that we can make CWC-AES run at 10Gbps using conventional ASIC technology. CWC is also fast in software. Our current implementation of CWC-AES runs at about the same speed as the other patent-free modes on 32-bit architectures (Table 1), and we anticipate significant performance gains on 32-bit CPUs when using more sophisticated implementation techniques (Section 6); we also see significantly better performance on 64-bit architectures. Of course, we do remark that patented modes like OCB are capable of running even faster in software, which would make them very attractive were they not also encumbered by intellectual property issues.
Like the other two unpatented block cipher-based AEAD modes, CCM [28] and EAX [5], CWC avoids patents by using two inter-related but mostly independent modules: one module to "encrypt" the data and one module to "authenticate" the data. Adopting the terminology used in [5], it is because of the two-module structure that we call CWC a "conventional" block cipher-based AEAD scheme. Although CWC uses two modules, it can be implemented efficiently in a single pass. By using the conventional approach, CCM, EAX, and CWC are very much like composition-based AEAD schemes [4, 15], or AEAD schemes composed from existing encryption schemes and MACs. Unlike composition-based AEAD schemes, however, by designing CWC directly from a block cipher, we eliminate redundant steps and fine-tune CWC for efficiency, again keeping in mind both our hardware and software goals. For example, we use only one block cipher key, which saves expensive memory access in hardware. The encryption core of CWC is essentially counter (CTR) mode encryption, which is well known to be efficient and parallelizable. Finding an appropriate algorithm for the authentication core of CWC proved to be more of a challenge. For authentication, we decided to base our design on the Carter-Wegman [27] universal hash function approach for message authentication. Part of the difficulty in the design came down to choosing the right type of universal hash function,
Table 1. Software performance (in clocks per byte) for the three patent-free block cipher-based AEAD modes on a Pentium III. Values are averaged over 50 000 samples

             Linux/gcc-3.2.2                  Windows 2000/VS6.0
             Payload lengths (bytes)          Payload lengths (bytes)
Mode         128    256    512   2048   8192  128    256    512   2048   8192
CWC-AES      105.5  88.4   78.9  72.2   70.5  84.7   70.2   62.2  56.5   55.0
CCM-AES      97.9   87.1   82.0  78.0   77.1  64.8   56.7   52.5  49.5   48.7
EAX-AES      114.1  94.9   86.1  79.1   77.5  75.2   61.8   55.3  50.4   49.1
with the right parameters. Since polynomial evaluation can be parallelized (if the polynomial is in x, one can split it into i polynomials in x^i), we chose to use a universal hash function consisting of evaluating a polynomial modulo the prime 2^127 − 1. We note that our hash function is similar to Bernstein's hash127 [6] except that Bernstein's hash function was optimized for software performance at the expense of hardware performance. To address this issue, we use larger coefficients than Bernstein uses. We believe our hardware- and software-optimized universal hash function to be of independent interest.

Notation. As part of our research, we first created a general approach for combining CTR mode encryption with a universal hash function in order to provide authenticated encryption. We shall refer to this general approach as CWC (note no change in font), and shall use CWC-BC to refer to a CWC instantiation with a 128-bit block cipher BC as the underlying block cipher and with the universal hash function described briefly above. We shall use CWC as shorthand for CWC-BC and use CWC-AES to mean CWC-BC with AES [8] as the underlying block cipher. Other instantiations of the general CWC approach are possible, e.g., for legacy 64-bit block ciphers. Since we are primarily targeting new applications, and since a mode using a 128-bit block cipher will never be asked to interoperate with a mode using a 64-bit block cipher, we focus this paper only on our 128-bit CWC instantiation. When we say that an AEAD scheme's encryption algorithm takes a pair (A, M) as input and produces a ciphertext as output, we mean that the AEAD scheme is designed to protect the privacy of M and the authenticity of both A and M. This will be made more formal in the body.

Performance. Let (A, M) be some input to the CWC encryption algorithm. The CWC encryption algorithm derives a universal hash subkey from the block cipher key.
Assuming that the universal hash subkey is maintained across invocations, encrypting (A, M) takes ⌈|M|/128⌉ + 2 block cipher invocations. The polynomial used in CWC's universal hashing step will have degree d = ⌈|A|/96⌉ + ⌈|M|/96⌉. There are several ways to evaluate this polynomial (details in Section 6). As noted above, we could evaluate it in parallel. Serially, assuming no precomputation, we could evaluate this polynomial using d 127×127-bit multiplies. As
another example, assuming n precomputed powers of the hash subkey, which are cheap to maintain in software for reasonable n, we could evaluate the polynomial using d − m 96×127-bit multiplies and m 127×127-bit multiplies, where m = ⌈(d + 1)/n⌉ − 1. In hardware using conventional ASIC technology at 0.13 micron, it takes approximately 300 Kgates to reach 10 Gbps throughput for CWC-AES. This is around twice as much as OCB, but avoids IP negotiation overhead and royalty payments to three parties. Table 1 relates the software performance, on a Pentium III, of CWC-AES to the two other patent-free AEAD modes CCM and EAX; the patented modes such as OCB are not included in this table, but are about twice as fast as the times given for the patent-free modes. The implementations used to compute Table 1 were written in C by Brian Gladman [9] and all use 128-bit AES keys; the current CWC-AES implementation does not use the above-mentioned precomputation approach for evaluating the polynomial. Table 1 shows that the current implementations of the three modes have comparable performance in software, the relative "best" depending on the OS/compiler and the length of the message. Using the above-mentioned precomputation approach and switching to assembly, we anticipate reducing the cost of CWC's universal hashing step to around 8 cpb, thereby significantly improving the performance of CWC-AES in software compared to CCM-AES and EAX-AES (since the authentication portions of CCM-AES and EAX-AES are limited by the speed of AES, but the authentication portion of CWC-AES is limited by the speed of the universal hash function). For comparison, Bernstein's related hash127, which also evaluates a polynomial modulo 2^127 − 1 but whose specific structure makes it less attractive in hardware, runs around 4 cpb on a Pentium III when written in assembly and using the precomputation approach.
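The evaluation strategies discussed above are easy to demonstrate over the prime p = 2^127 − 1. This sketch checks that serial Horner evaluation and the parallelizable even/odd split (two polynomials in x²) agree; the coefficient values are arbitrary test data:

```python
p = 2 ** 127 - 1  # the prime used by CWC's universal hash

def horner(coeffs, x):
    """Serial evaluation of c0*x^(d-1) + ... + c_{d-1} via Horner's rule."""
    r = 0
    for c in coeffs:
        r = (r * x + c) % p
    return r

def split_eval(coeffs, x):
    """Parallelizable form: split into even/odd-index polynomials in x^2,
    which could be evaluated on two independent units."""
    even, odd = coeffs[::2], coeffs[1::2]
    x2 = (x * x) % p
    if len(coeffs) % 2 == 0:
        # even-index coefficients sit at odd powers of x, odd-index at even
        return (x * horner(even, x2) + horner(odd, x2)) % p
    return (horner(even, x2) + x * horner(odd, x2)) % p

coeffs = [123456789 * (i + 1) % p for i in range(7)]
x = 987654321
assert horner(coeffs, x) == split_eval(coeffs, x)
```

The same idea extends to i-way splits (polynomials in x^i), and precomputing powers of the hash subkey trades multiplies of one width for another, as described in the text.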
On 64-bit G5s, our initial implementation of the hash function runs at around 6 cpb, thus showing that CWC-AES is very attractive on 64-bit architectures (when running the G5 in 32-bit mode, our implementation runs at around 15 cpb). We do not claim that CWC-AES will be particularly efficient on low-end CPUs such as 8-bit smartcards. However, our goal was not to develop an AEAD scheme for such low-end processors. The patent issue. The patent issue is a very peculiar one. While it may initially sound odd to let patents influence research, we note that it is also not uncommon, especially in other sciences. Indeed, we view this line of research as discovering the most appropriate solution given real-world constraints. And, just like performance constraints, intellectual property constraints are very real. Background and related work. The notion of an authenticated encryption (AE) scheme was formalized by Katz and Yung [13] and by Bellare and Namprempre [4] and the notion of an authenticated encryption with associated data (AEAD) scheme was formalized by Rogaway [23]. Bellare and Namprempre [4] and Krawczyk [15] explored ways to combine standard encryption schemes with MACs to achieve authenticated encryption. A number of dedicated AE and
AEAD schemes also exist, including RPC [13], XECB [10], IAPM [12], OCB [24], CCM [28], and EAX [5]. CWC is similar to the combination of McGrew's UST [20] and TMMH [19], where one of the main advantages of CWC over UST+TMMH is CWC's small key size, which, as the author of UST and TMMH noted, can be a bottleneck for UST+TMMH in hardware at high speeds. The integrity portion of CWC builds on top of the Carter-Wegman universal hashing approach to message authentication [27]. The specific hash function CWC uses is similar to Bernstein's hash127 [6], but is better suited for hardware. Shoup [26] and Nevelsteen and Preneel [22] also worked on software optimizations for universal hash functions. Rogaway and Wagner released a critique of CCM [25]. For each issue raised in [25], we find that we have addressed the issue (e.g., we designed CWC to be on-line) or we disagree with the issue (e.g., we feel that it is sufficient for new modes of operation to handle arbitrary octet-length, as opposed to arbitrary bit-length, messages; we stress, however, that, if desired, it is easy to modify CWC to handle arbitrary bit-length messages, see Section 5). CWC recently served as the starting point for GCM [21], another promising new conventional authenticated encryption mode.
2   Preliminaries
Notation. If x is a string then |x| denotes its length in bits. Let ε denote the empty string. If x and y are two equal-length strings, then x ⊕ y denotes the xor of x and y. If x and y are strings, then x∥y denotes their concatenation. If N is a non-negative integer and l is an integer such that 0 ≤ N < 2^l, then tostr(N, l) denotes the encoding of N as an l-bit string in big-endian format. If x is a string, then toint(x) denotes the integer corresponding to string x in big-endian format (the most significant bit is not interpreted as a sign bit). For example, toint(10000010) = 2^7 + 2 = 130. If b is a bit and n a non-negative integer, then b^n denotes b concatenated with itself n times; e.g., 1 0^7 is the string 10000000. Let x ← y denote the assignment of y to x. If X is a set, let x ←$ X denote the process of uniformly selecting at random an element from X and assigning it to x. If f is a randomized algorithm, let x ←$ f(y) denote the process of running f with input y and a uniformly selected random tape. When we refer to the time of an algorithm or experiment, we include the size of the code (in some fixed encoding). There is also an implicit big-O surrounding all such time references.

Authenticated encryption schemes with associated data. We use Rogaway's notion of an authenticated encryption with associated data (AEAD) scheme or mode [23]. An AEAD scheme SE = (Ke, E, D) consists of three algorithms and is defined over some key space KeySp_SE, some nonce space NonceSp_SE = {0,1}^n for a positive integer n, some associated data (header) space AdSp_SE ⊆ {0,1}^*, and some payload message space MsgSp_SE ⊆ {0,1}^*. We require that membership in MsgSp_SE and AdSp_SE can be efficiently tested and
that if M, M′ are two strings such that M ∈ MsgSp_SE and |M′| = |M|, then M′ ∈ MsgSp_SE. The randomized key generation algorithm Ke returns a key K ∈ KeySp_SE; we denote this process as K ←$ Ke. The deterministic encryption algorithm E takes as input a key K ∈ KeySp_SE, a nonce N ∈ NonceSp_SE, a header (or associated data) A ∈ AdSp_SE, and a payload message M ∈ MsgSp_SE, and returns a ciphertext C ∈ {0,1}^*; we denote this process as C ← E_K^{N,A}(M) or C ← E_K(N, A, M). The deterministic decryption algorithm D takes as input a key K ∈ KeySp_SE, a nonce N ∈ NonceSp_SE, a header A ∈ AdSp_SE, and a string C ∈ {0,1}^*, and outputs a message M ∈ MsgSp_SE or the special symbol INVALID on error; we denote this process as M ← D_K^{N,A}(C). We require that D_K^{N,A}(E_K^{N,A}(M)) = M for all K ∈ KeySp_SE, N ∈ NonceSp_SE, A ∈ AdSp_SE, and M ∈ MsgSp_SE. Let l(·) denote the length function of SE; i.e., for all keys K, nonces N, headers A, and messages M, |E_K^{N,A}(M)| = l(|M|). Under the correct usage of an AEAD scheme, after a random key is selected, the application should never invoke the encryption algorithm twice with the same nonce value until a new key is randomly selected. In order to ensure that a nonce does not repeat, implementations typically use nonces that contain counters. We use the notion of a nonce, rather than simply a counter, because the notion of a nonce is more general and allows the developer the freedom to structure the nonce as he or she desires.

Block ciphers. A block cipher E : {0,1}^k × {0,1}^L → {0,1}^L is a function from k-bit keys and L-bit blocks to L-bit blocks. We use E_K(·), K ∈ {0,1}^k, to denote the function E(K, ·) and we use f ←$ E as shorthand for K ←$ {0,1}^k; f ← E_K. Block ciphers are families of permutations; namely, for each key K ∈ {0,1}^k, E_K is a permutation on {0,1}^L. We call k the key length of E and we call L the block length.

We adopt the notion of security for block ciphers introduced in [17] and adopted for the concrete setting in [2]. Let E : {0,1}^k × {0,1}^L → {0,1}^L be a block cipher and let Perm(L) denote the set of all permutations on {0,1}^L. Let A be an adversary with access to an oracle and that returns a bit. Then

    Adv^prp_E(A) = Pr[f ←$ E : A^{f(·)} = 1] − Pr[g ←$ Perm(L) : A^{g(·)} = 1]

denotes the prp-advantage of A in distinguishing a random instance of E from a random permutation. Intuitively, we say that E is a secure prp, or a secure block cipher, if the prp-advantage of all adversaries using reasonable resources is small. Modern block ciphers, such as AES [8], are believed to be secure prps.
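The tostr/toint encodings and the b^n notation from the Notation paragraph can be sketched directly (here on strings of '0'/'1' characters standing in for bit strings):

```python
def tostr(N: int, l: int) -> str:
    """Encode 0 <= N < 2^l as an l-bit big-endian bit string."""
    assert 0 <= N < 2 ** l
    return format(N, "0{}b".format(l))

def toint(x: str) -> int:
    """Big-endian integer value of a bit string (no sign bit)."""
    return int(x, 2) if x else 0

assert toint("10000010") == 2 ** 7 + 2 == 130  # the paper's example
assert tostr(130, 8) == "10000010"
assert "1" + "0" * 7 == "10000000"             # b^n notation: 1 0^7
```

Note that tostr and toint are mutual inverses on l-bit strings, which CWC relies on when moving between the 96-bit hash chunks and their integer values.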
3   The CWC Mode of Operation
We now describe our new AEAD scheme. Let BC : {0,1}^kl × {0,1}^128 → {0,1}^128 be a 128-bit block cipher. Let tl ≤ 128 be the desired tag length in bits. Then the CWC mode of operation using BC with tag length tl, CWC-BC-tl =
(K, CWC-ENC, CWC-DEC), is defined as follows. The message spaces are:

    MsgSp_CWC-BC-tl   = { x ∈ ({0,1}^8)^* : |x| ≤ MaxMsgLen }
    AdSp_CWC-BC-tl    = { x ∈ ({0,1}^8)^* : |x| ≤ MaxAdLen }
    KeySp_CWC-BC-tl   = {0,1}^kl
    NonceSp_CWC-BC-tl = {0,1}^88

where MaxMsgLen and MaxAdLen are both 128 · (2^32 − 1). That is, the payload and associated data spaces for CWC-BC-tl consist of all strings of octets that are at most 2^32 − 1 blocks long. The CWC-BC-tl key generation, encryption, and decryption algorithms are defined as follows:

    Algorithm K
      K ←$ {0,1}^kl
      Return K

    Algorithm CWC-ENC_K(N, A, M)
      σ ← CWC-CTR_K(N, M)
      τ ← CWC-MAC_K(N, A, σ)
      Return σ∥τ

    Algorithm CWC-DEC_K(N, A, C)
      If |C| < tl then return INVALID
      Parse C as σ∥τ where |τ| = tl
      If A ∉ AdSp_CWC-BC-tl or σ ∉ MsgSp_CWC-BC-tl then return INVALID
      If τ ≠ CWC-MAC_K(N, A, σ) then return INVALID
      M ← CWC-CTR_K(N, σ)
      Return M
The remaining algorithms (CWC-CTR, CWC-MAC, CWC-HASH) are defined below. The CWC-CTR algorithm handles generating the encryption and decryption keystreams, and CWC-MAC handles the generation of an authentication tag, using CWC-HASH as the underlying universal hash function.

    Algorithm CWC-CTR_K(N, M)
      α ← ⌈|M|/128⌉
      For i = 1 to α do s_i ← BC_K(10^7 ∥ N ∥ tostr(i, 32))
      x ← first |M| bits of s_1 s_2 ··· s_α
      σ ← x ⊕ M
      Return σ

    Algorithm CWC-MAC_K(N, A, σ)
      R ← BC_K(CWC-HASH_K(A, σ))
      τ ← BC_K(10^7 ∥ N ∥ 0^32) ⊕ R
      Return first tl bits of τ

    Algorithm CWC-HASH_K(A, σ)
      Z ← last 127 bits of BC_K(110^126)
      K_h ← toint(Z)
      l ← min integer such that 96 divides |A ∥ 0^l|
      l′ ← min integer such that 96 divides |σ ∥ 0^{l′}|
      X ← A ∥ 0^l ∥ σ ∥ 0^{l′} ; β ← |X|/96
      Break X into chunks X_1, X_2, ..., X_β
      For i = 1 to β do Y_i ← toint(X_i)
      l_σ ← |σ|/8 ; l_A ← |A|/8
      Y_{β+1} ← 2^64 · l_A + l_σ
      R ← Y_1·K_h^β + ··· + Y_β·K_h + Y_{β+1} mod 2^127 − 1
      Return tostr(R, 128)
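To make the CWC-HASH definition concrete, here is a Python sketch of its structure. The block cipher is replaced by a keyed stand-in (built from SHA-256, purely so the sketch is self-contained and deterministic), so this shows the padding, length block, and polynomial evaluation only — it is not an interoperable CWC implementation:

```python
import hashlib

def bc_stand_in(key: bytes, block: bytes) -> bytes:
    # Stand-in for BC_K: NOT a real block cipher; used only so this
    # sketch runs without an external AES implementation.
    return hashlib.sha256(key + block).digest()[:16]

def cwc_hash(key: bytes, A: bytes, sigma: bytes) -> bytes:
    p = 2 ** 127 - 1
    # K_h <- last 127 bits of BC_K(1 1 0^126); 0xC0 0x00...00 is that block
    z = bc_stand_in(key, b"\xc0" + b"\x00" * 15)
    kh = int.from_bytes(z, "big") & ((1 << 127) - 1)

    def pad96(x: bytes) -> bytes:
        # zero-pad to a multiple of 96 bits (12 bytes)
        return x + b"\x00" * ((-len(x)) % 12)

    X = pad96(A) + pad96(sigma)
    Y = [int.from_bytes(X[i:i + 12], "big") for i in range(0, len(X), 12)]
    Y.append((2 ** 64) * len(A) + len(sigma))  # Y_{beta+1}: the length block
    r = 0
    for y in Y:  # Horner's rule: Y_1*Kh^beta + ... + Y_beta*Kh + Y_{beta+1}
        r = (r * kh + y) % p
    return r.to_bytes(16, "big")  # tostr(R, 128)

h = cwc_hash(b"k" * 16, b"header", b"payload")
assert len(h) == 16
```

The length block Y_{β+1} = 2^64·l_A + l_σ is what prevents (A, σ) pairs with different boundaries but identical padded concatenations from hashing alike.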
4   Theorem Statements
The CWC scheme is a provably secure AEAD scheme assuming that the underlying block cipher, e.g., AES, is a secure pseudorandom permutation. This is a quite reasonable assumption since most modern block ciphers, including AES,
are believed to be pseudorandom. Furthermore, all provably-secure block cipher modes of operation that we are aware of make at least the same assumptions we make, and some modes, such as OCB [24], require the stronger, albeit still reasonable, assumption of super-pseudorandomness. The specific results for CWC appear as Theorem 1 and Theorem 2 below, and are proven in the full version of this paper [14]. In [14] we also present results for the general CWC construction, from which Theorems 1 and 2 follow.

4.1   Privacy
We first show that if BC is a secure block cipher, then CWC-BC-tl will preserve privacy under chosen-plaintext attacks. For our notion of privacy for AEAD schemes, we use the strong definition of indistinguishability from [23]. Let SE = (Ke, E, D) be an AEAD scheme with length function l(·). Let $(·, ·, ·) be an oracle that, on input (N, A, M) ∈ NonceSp_SE × AdSp_SE × MsgSp_SE, returns a random string of length l(|M|). Let B be an adversary with access to an oracle and that returns a bit. Then

    Adv^priv_SE(B) = Pr[K ←$ Ke : B^{E_K(·,·,·)} = 1] − Pr[B^{$(·,·,·)} = 1]
is the ind$-cpa-advantage of B in breaking the privacy of SE under chosen-plaintext attacks; i.e., Adv^priv_SE(B) is the advantage of B in distinguishing between ciphertexts from E_K(·, ·, ·) and random strings. An adversary B is nonce-respecting if it never queries its oracle with the same nonce twice. Intuitively, a scheme SE preserves privacy under chosen-plaintext attacks if the ind$-cpa-advantage of all nonce-respecting adversaries using reasonable resources is small.

Theorem 1. [Privacy of CWC.] Let CWC-BC-tl be as in Section 3. Then given a nonce-respecting ind$-cpa adversary A against CWC-BC-tl one can construct a prp adversary C_A against BC such that if A makes at most q oracle queries totaling at most µ bits of payload message data, then

    Adv^priv_CWC-BC-tl(A) ≤ Adv^prp_BC(C_A) + (µ/128 + 3q + 1)² / 2^129.    (1)

Furthermore, the experiment for C_A takes the same time as the experiment for A, and C_A makes at most µ/128 + 3q + 1 oracle queries.
Let us elaborate on why Theorem 1 implies that CWC-BC will preserve privacy under chosen-plaintext attacks. Assume BC is a secure block cipher. This means that Adv^prp_BC(C) must be small for all adversaries C using reasonable resources and, in particular, this means that, for C_A as described in the theorem statement, Adv^prp_BC(C_A) must be small assuming that A uses reasonable resources. And if Adv^prp_BC(C_A) is small and µ, q are small, then, because of the above equations, Adv^priv_CWC-BC-tl(A) must also be small. I.e., any adversary A using reasonable resources will only be able to break the privacy of CWC-BC-tl with some small probability. As a concrete example, let us consider limiting the number of applications of CWC-BC-tl between rekeyings to some reasonable value such as q = 2^32, and let
416
Tadayoshi Kohno et al.
us limit the total number of payload bits between rekeyings to µ = 2^50. Then Equation 1 becomes

  Adv^priv_CWC-BC-tl(A) ≤ Adv^prp_BC(C_A) + 1/2^42 ,

which means that, assuming that the underlying block cipher is a secure PRP, an attacker will not be able to break the privacy of CWC-BC-tl with advantage much greater than 2^−42.
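To see the arithmetic behind this concrete bound, one can check the claim with exact integer math (a quick sketch of ours, not part of the paper's formal treatment):

```python
# Check that (mu/128 + 3q + 1)^2 / 2^129 < 2^-42 for q = 2^32 and mu = 2^50.
# Using exact integers: the inequality holds iff term * 2^42 < 2^129.
q = 2 ** 32    # block cipher applications between rekeyings
mu = 2 ** 50   # total payload bits between rekeyings

term = (mu // 128 + 3 * q + 1) ** 2
assert term * 2 ** 42 < 2 ** 129
print("additive term of Equation 1 is below 2**-42")
```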
4.2 Integrity/Authenticity
We now present our results showing that if BC is a secure block cipher, then CWC-BC-tl will protect the authenticity of encapsulated data. We use the strong notion of authenticity for AEAD schemes from [23]. Let SE = (Ke, E, D) be an AEAD scheme. Let F be a forging adversary and consider an experiment in which we first pick a random key K ←$ Ke and then run F with oracle access to E_K(·,·,·). We say that F forges if F returns a tuple (N, A, C) such that D^{N,A}_K(C) ≠ INVALID but F did not make a query (N, A, M) to E_K(·,·,·) that resulted in a response C. Then

  Adv^auth_SE(F) = Pr[ K ←$ Ke : F^{E_K(·,·,·)} forges ]

is the auth-advantage of F in breaking the integrity/authenticity of SE. Intuitively, the scheme SE preserves integrity/authenticity if the auth-advantage of all nonce-respecting adversaries using reasonable resources is small.

Theorem 2. [Integrity/authenticity of CWC.] Let CWC-BC-tl be as specified in Section 3. (Recall that BC is a 128-bit block cipher and that the tag length tl is ≤ 128.) Consider a nonce-respecting auth adversary A against CWC-BC-tl. Assume the execution environment allows A to query its oracle with associated data that are at most n ≤ MaxAdLen bits long and with messages that are at most m ≤ MaxMsgLen bits long. Assume A makes at most q − 1 oracle queries and the total length of all the payload data (both in these q − 1 oracle queries and the forgery attempt) is at most µ. Then given A we can construct a prp adversary C_A against BC such that

  Adv^auth_CWC-BC-tl(A) ≤ Adv^prp_BC(C_A) + (µ/128 + 3q + 1)²/2^129 + (n + m)/2^133 + 1/2^125 + 1/2^tl .   (2)

Furthermore, the experiment for C_A takes the same time as the experiment for A and C_A makes at most µ/128 + 3q + 1 oracle queries.
Let us elaborate on why Theorem 2 implies that CWC-BC will preserve authenticity. Assume BC is a secure block cipher. This means that Adv^prp_BC(C) must be small for all adversaries C using reasonable resources and, in particular, this means that, for C_A as described in the theorem statement, Adv^prp_BC(C_A) must be small assuming that A uses reasonable resources. And if Adv^prp_BC(C_A) is small and µ, q, m, and n are small, then, because of the above equations, Adv^auth_CWC-BC-tl(A) must also be small. I.e., any adversary A using reasonable resources will only be able to break the authenticity of CWC-BC-tl with some small probability.
CWC: A High-Performance Conventional Authenticated Encryption Mode
Let us consider some concrete examples. Let n = MaxAdLen and m = MaxMsgLen, which is the maximum possible allowed by the CWC-BC construction. Then Equation 2 becomes

  Adv^auth_CWC-BC-tl(A) ≤ Adv^prp_BC(C_A) + (µ/128 + 3q + 1)²/2^129 + 1/2^93 + 1/2^tl .

If we set q = 2^32 and µ = 2^50 as before, and if we take tl ≥ 43, then the above equation becomes

  Adv^auth_CWC-BC-tl(A) ≤ Adv^prp_BC(C_A) + 1/2^41 ,

which means that, assuming that the underlying block cipher is a secure PRP, an attacker will not be able to break the unforgeability of CWC-BC-tl with probability much greater than 2^−41.
Remark 1. [Chosen-ciphertext privacy.] Since CWC-BC-tl preserves privacy under chosen-plaintext attacks (Theorem 1) and provides integrity (Theorem 2) assuming that BC is a secure pseudorandom permutation, it also provides privacy under chosen-ciphertext attacks under the same assumption about BC. See [4, 23] for a discussion of the relationship between chosen-plaintext privacy, integrity, and chosen-ciphertext privacy; this relationship was also used, for example, by the designers of OCB [24].
5 Design Decisions
Finding an appropriate balance between provable security, hardware efficiency, and software efficiency, while simultaneously avoiding existing intellectual property issues, proved to be one of the biggest challenges of this research project. In this section we discuss how our diverse set of goals affected our design decisions.

The CWC-HASH universal hash function. We found that the best way to simultaneously achieve our parallelizability, hardware, and software goals was to base the authentication portion of CWC on the Carter-Wegman [27] universal hash function approach to message authentication. This is because universal hash functions, and especially the one we created for CWC, can be implemented in a multitude of ways, thus allowing different platforms and applications to implement CWC-HASH in the way most appropriate for them. For example, hardware implementations will likely parallelize the computation of CWC-HASH by splitting it into multiple polynomials in Kh^i for some i. In more detail, if the polynomial is

  Y_1 Kh^β + Y_2 Kh^(β−1) + Y_3 Kh^(β−2) + Y_4 Kh^(β−3) + ··· + Y_β Kh + Y_(β+1) mod 2^127 − 1 ,

then, setting i = 2 and y = Kh^2 mod 2^127 − 1, and assuming β is odd for illustration purposes, we can rewrite the above polynomial as

  (Y_1 y^m + Y_3 y^(m−1) + ··· + Y_β) Kh + (Y_2 y^m + Y_4 y^(m−1) + ··· + Y_(β+1)) mod 2^127 − 1 ,

where m = (β − 1)/2.
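The equivalence of the split form can be checked with a short Python sketch (our own illustration; the function names are ours, not from the specification):

```python
import random

P = 2 ** 127 - 1  # CWC-HASH evaluates its polynomial modulo this prime

def poly_eval(coeffs, kh):
    """Evaluate Y_1*kh^beta + ... + Y_beta*kh + Y_{beta+1} mod P (Horner)."""
    acc = 0
    for y in coeffs:
        acc = (acc * kh + y) % P
    return acc

def poly_eval_split(coeffs, kh):
    """Evaluate the same polynomial as two polynomials in y = kh^2, as a
    two-multiplier hardware implementation might (beta odd, so the number
    of coefficients is even)."""
    assert len(coeffs) % 2 == 0
    y = kh * kh % P
    odd = poly_eval(coeffs[0::2], y)   # Y_1, Y_3, ..., Y_beta  (one engine)
    even = poly_eval(coeffs[1::2], y)  # Y_2, Y_4, ..., Y_{beta+1} (other engine)
    return (odd * kh + even) % P       # recombine the two halves

random.seed(2004)
kh = random.randrange(P)
coeffs = [random.randrange(2 ** 96) for _ in range(10)]  # beta = 9 (odd)
assert poly_eval(coeffs, kh) == poly_eval_split(coeffs, kh)
```

The two halves can be evaluated independently, which is what lets the two multiply/accumulate engines mentioned in Section 6 run in parallel.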
After splitting the polynomial, hardware implementations will then likely compute each polynomial using Horner's rule (e.g., the polynomial a·Kh^(2i) + b·Kh^i + c would be evaluated as ((a·Kh^i + b)·Kh^i) + c). Software implementations on modern CPUs, for which memory is cheap, will likely precompute a number of powers of Kh and evaluate the CWC-HASH polynomial directly, or almost directly, using a hybrid between a precomputation approach and Horner's rule. We consider a number of possible implementation strategies in more detail in Section 6.

CWC-HASH is an instantiation of the classic polynomial universal hash approach to message authentication [27], and is closely related to Bernstein's hash127 [6], which also evaluates a polynomial modulo 2^127 − 1. Although hash127 is very fast in software, its structure makes it less suitable for use in high-speed hardware. In particular, Bernstein's choice of 32-bit coefficients, while great for software implementations with precomputed powers of Kh, means that hardware implementations using Horner's rule will be "wasting work." Specifically, even with 32-bit coefficients, incorporating each new coefficient using Horner's rule will require a 127x127-bit multiply because the accumulated value will be 127 bits long. By defining the CWC-HASH coefficients to be 96 bits long, we increase the performance of Horner's rule implementations by a factor of three. (Of course, we could have gone even further and made the coefficients 126 bits long, but doing so would have required considerable additional complexity to perform bit and byte shifting within the coefficients.) An alternative approach for increasing the performance of a serial implementation of Horner's rule would be to reduce the size of the CWC-HASH subkey Kh to 96 bits.
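Both evaluation strategies compute the same value; the following small Python sketch of ours (illustrative only, not implementation-quality code) contrasts Horner's rule with direct evaluation from precomputed powers of Kh:

```python
import random

P = 2 ** 127 - 1

def horner(coeffs, kh):
    """Serial evaluation: one multiply by kh per coefficient."""
    acc = 0
    for c in coeffs:
        acc = (acc * kh + c) % P
    return acc

def direct(coeffs, kh_powers):
    """Direct evaluation using kh^0..kh^(len-1) precomputed at key setup;
    each step multiplies a (small) coefficient by a precomputed power."""
    return sum(c * p for c, p in zip(coeffs, reversed(kh_powers))) % P

random.seed(6)
kh = random.randrange(P)
coeffs = [random.randrange(2 ** 96) for _ in range(8)]  # 96-bit coefficients

powers = [1]
for _ in range(len(coeffs) - 1):
    powers.append(powers[-1] * kh % P)

assert horner(coeffs, kh) == direct(coeffs, powers)
```

The point of the 96-bit coefficients is visible in `direct`: each product there is a 96x127-bit multiply, whereas each Horner step is a full 127x127-bit multiply.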
We discuss why we rejected this option in more detail later, but remark here that there are already more efficient strategies than Horner's rule for implementing CWC-HASH in software, and that in a parallelized approach the values Kh^i, i ≥ 2, will most often be full 127-bit values even if Kh is only 96 bits long.

On using a single key. From a security perspective, it would have been perfectly acceptable, and in fact more traditional, to make the CWC-CTR block cipher key and the two CWC-MAC block cipher keys independent. Like others [28, 5], however, we acknowledge that there are several important reasons for sharing keys between the encryption and authentication portions of modes such as CWC. One of the most important reasons is simplicity of key management. Indeed, fetching key material can be a major bottleneck in high-speed hardware, and minimizing key material is thus important. This fact is also why we derive the hash subkey from the block cipher key rather than use an independent hash subkey. We could, of course, have defined a mode that derived a number of essentially independent block cipher and hash keys from a single block cipher key, but doing so would either have required more memory or more computation and, because we have proofs that our construction works, would have been unnecessary. Sharing the block cipher key in the way described above and deriving the hash subkey from the block cipher key did, however, mean that we had to be very careful with our proofs of security. To facilitate our proofs, we took extra
care in our design to ensure that there would never be a collision in the plaintext inputs to the block cipher between the different usages of the block cipher. For example, by defining CWC-HASH to produce a 127-bit value as output, we know that the first application of BC to CWC-HASH_K(A, σ) in CWC-MAC will always have its first bit set to 0. To avoid a collision with the input to the keystream generator, the block cipher inputs in CWC-CTR always have the first two bits set to 10. When using the block cipher to create the hash subkey Kh, the first two bits of the input are set to 11.

On the choice of parameters. Part of this effort involved specifying the appropriate parameters for the CWC encryption mode. Example parameters include the nonce length and the way the nonce is encoded in the input to the block cipher. We chose to fix these parameters for interoperability purposes, but note that our general approach in [14] does not have these parameters fixed. We chose to set the nonce length to 88 bits in order to handle future IPsec sequence numbers. And we chose to set the block counter length to 32 bits in order to allow CWC to be used with IPsec jumbograms and other large packets. We also chose to use big-endian byte ordering for consistency purposes and to maintain compatibility with McGrew's ICM Internet-Draft [18] and the IETF, which strongly favors big-endian byte ordering.

Handling arbitrary bit-length messages. Since we do not believe that many applications will actually require the ability to encrypt arbitrary bit-length messages, we do not define CWC to take arbitrary bit-length messages as input. That said, we did design CWC in such a way that it will be easy to modify the specification to take arbitrary bit-length messages without affecting interoperability with existing implementations when octet-strings are communicated.
For example, one could augment the computation of Y_(β+1) in CWC-HASH as follows: r_A ← |A| mod 8 ; r_σ ← |σ| mod 8 ; Y_(β+1) ← 2^120 · r_A + 2^112 · r_σ + 2^64 · l_A + l_σ. Of course, a cleaner approach for handling arbitrary bit-length messages would be to compute l_A ← |A| and l_σ ← |σ| in CWC-HASH. We do not define CWC this way because we do not consider it a good trade-off to define a mode for arbitrary bit-length messages at the expense of octet-oriented systems.

64-bit block ciphers. We did not define CWC for use with 64-bit block ciphers because we are targeting future high-speed cryptographic applications. Nevertheless, the general CWC approach in [14] can be instantiated with 64-bit block ciphers. A 64-bit instantiation may, however, require several uncomfortable trade-offs; e.g., in the length of the nonce.

Some possible alternatives. Here we discuss some other possible alternatives to CWC and why we rejected these alternatives. First, as noted earlier, it is possible to improve the performance in some situations by using shorter
hash subkeys Kh, say of length 96 bits. Such an alternative will not increase the performance in high-speed hardware implementations that will parallelize the computation of CWC-HASH by evaluating a polynomial in (at least) Kh^2. A 96-bit hash subkey would have increased Horner's rule performance in software, but would still be comparable in speed to a software-based approach using precomputed powers of Kh (see Section 6), so reducing the size of Kh to 96 bits would not provide a significant advantage in software either. In [14] we also consider what happens to our provable security bounds when the length of the hash subkey is reduced to less than 96 bits.

There are a number of possible approaches for reducing the number of block cipher applications in the CWC-MAC algorithm by one. For example, one could use BC_K(h_K(N, A, σ)) as the tag, where h is a modified version of CWC-HASH designed to hash 3-tuples instead of pairs of strings. One could also use something like

  BC_K(N) + Y_1 Kh^(β+2) + ··· + Y_β Kh^3 + l_A Kh^2 + l_σ Kh mod 2^127 − 1

as the tag. In [14] we consider these and other alternatives and discuss why we chose to define CWC the way that we did instead of using an option with one fewer block cipher invocation. In the case of the two alternatives mentioned in this paragraph, we note that we rejected them because we were able to prove better bounds on the security of CWC as currently defined. Motivated by EAX2 [5], one possible alternative to CWC might be to use BC_K(1110^5 ∥ N) both as the value to encrypt R in CWC-MAC and as the initial counter to CTR mode-encrypt M (with the first two bits of the counter always set to 10). Other EAX2-motivated constructions also exist. For example, the tag might be set to BC_K(h(X_0 ∥ N)) ⊕ BC_K(h(X_1 ∥ A)) ⊕ BC_K(h(X_2 ∥ σ)), where X_0, X_1, X_2 are strings, none of which is a prefix of another, and h is a parallelizable universal hash function, like CWC-HASH but hashing only single strings (as opposed to pairs of strings).
Compared to CWC, these alternatives have the ability to take longer nonces as input and, from a functional perspective, can be applied to strings up to 2^126 blocks long. But we do not view this as a reason to prefer these alternatives over CWC. From a practical perspective, we do not foresee applications needing nonces longer than 11 octets, or needing to encrypt messages longer than 2^32 − 1 blocks. Moreover, from a security perspective, applications should not encrypt too many packets between rekeyings, implying that even 11-octet nonces are more than sufficient. We do comment, however, that we believe the alternatives discussed in this paragraph are still more attractive than EAX because, like CWC but unlike EAX, these alternatives are parallelizable.

We chose not to base the authentication portion of our new mode on XOR-MAC [3] or PMAC [7] because of patent concerns and our software performance requirements, and we chose not to base the authentication portion on software-efficient MACs such as HMAC [1] because of our hardware parallelizability requirement.
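For illustration only, the second EAX2-motivated composition above can be sketched as follows; the toy single-string hash, the SHA-256-based stand-in for the block cipher (not a real PRP), and the one-byte choices for X_0, X_1, X_2 are all our own placeholders, not from any specification:

```python
import hashlib

P = 2 ** 127 - 1

def uhash(kh, data):
    """Toy polynomial hash of a single string over 96-bit chunks, mod P."""
    acc = 0
    for i in range(0, len(data), 12):
        acc = (acc * kh + int.from_bytes(data[i:i + 12], "big")) % P
    return acc.to_bytes(16, "big")

def bc(key, block):
    """Placeholder 128-bit 'block cipher' (illustration only, not a PRP)."""
    return hashlib.sha256(key + block).digest()[:16]

def xor_tag(key, kh, nonce, ad, sigma):
    """tag = BC_K(h(X0||N)) xor BC_K(h(X1||A)) xor BC_K(h(X2||sigma)),
    where X0, X1, X2 are distinct bytes, so none is a prefix of another."""
    parts = [b"\x00" + nonce, b"\x01" + ad, b"\x02" + sigma]
    b0, b1, b2 = (bc(key, uhash(kh, p)) for p in parts)
    return bytes(x ^ y ^ z for x, y, z in zip(b0, b1, b2))

tag = xor_tag(b"k" * 16, 12345, b"nonce", b"header", b"ciphertext")
assert len(tag) == 16
```

The three hash-then-encipher branches are independent, which is what makes this composition, like CWC itself, parallelizable.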
6 Performance
Hardware. Since one of our main goals was to achieve high performance in hardware and, in particular, to provide a solution for future 10 Gbps IPsec (and other) network devices, let us focus first on hardware costs. As noted in the introduction, using 0.13 micron CMOS ASIC technology, it should take approximately 300 Kgates to achieve 10 Gbps throughput for CWC-AES. This estimate, which is applicable to AES with all key lengths, includes four AES counter-mode encryption engines, each running at 200 MHz and requiring about 25 Kgates. In addition, there are two 32x128-bit multiply/accumulate engines, each running at 200 MHz with a latency of four clocks, one each for the even and odd polynomial coefficients. Of course, simply keeping these engines "fed" may be quite a feat in itself, but that is generally true of any 10 Gbps path. Also, there may well be better methods to structure an implementation, depending on the particular ASIC vendor library and technology, but, regardless of the implementation strategy, 10 Gbps is quite achievable because of the inherent parallelism of CWC.

Since OCB is CWC's main competitor for high-speed environments, it is worth comparing CWC with OCB instantiated with AES (we do not compare CWC with CCM and EAX here since the latter two are not parallelizable). We first note that CWC-AES saves some gates because we only have to implement AES encryption in hardware. However, at 10 Gbps, OCB still probably requires only about half the silicon area of CWC-AES. The main question for many hardware designers is thus whether the extra silicon area for CWC-AES costs more than three royalty payments, as well as negotiation costs and overhead. With respect to negotiation costs and royalty payments, we note that despite significant demands, to date the relevant parties have not all offered publicly available IP fee schedules.
Given this fact, and given today's silicon costs, we believe that the extra silicon for CWC-AES is probably cheaper overall than the negotiation costs and IP fees required for OCB.

Software. CWC-AES can also be implemented efficiently in software. Table 1 shows timing information for CWC-AES, as well as CCM-AES and EAX-AES, on a 1.133 GHz mobile Pentium III dual-booting RedHat Linux 9 (kernel 2.4.20-8) and Windows 2000 SP2. The numbers in the table are the clocks per byte for different message lengths, averaged over 50 000 runs, and include the entire time for setting up (e.g., expanding the AES key schedule) and encrypting. All implementations were in C, were written by Brian Gladman [9], and use 128-bit AES keys. The Linux compiler was gcc version 3.2.2; the Windows compiler was Visual Studio 6.0. To be fair, we note that OCB does run at about twice the speeds given in Table 1. From Table 1 we conclude that the three patent-free modes, as currently implemented by Gladman, share similar software performance. The "best" performing one appears to depend on the OS/compiler and the length of the message being processed. On Linux, it appears that CWC-AES performs slightly better than EAX-AES for all message lengths that we tested, and better than CCM-AES for the longer messages, whereas Gladman's CCM-AES and EAX-AES
implementations slightly outperform his CWC-AES implementation on Windows for all the message lengths that we tested. Note, however, that all the implementations used to compute Table 1 were written in C. Furthermore, the current CWC-AES code does not make use of all of the optimization techniques (and in particular precomputation) that we describe below. By switching to assembly and using the additional optimization techniques, we anticipate the speed for CWC-HASH to drop to better than 8 clocks per byte, whereas the speed for the CBC-MAC portion of CCM-AES and EAX-AES will be limited by the speed of AES (the best reported speed for AES on a Pentium III is 14.1 cpb, due to a proprietary library by Helger Lipmaa; Gladman's free hand-optimized Windows assembly implementation runs at 17.5 cpb [16]). Returning to the speed of CWC-HASH, for reference we note that Bernstein's related hash127 [6] runs at around 4 cpb on a Pentium III when written in assembly and using the precomputation approach. Bernstein's hash127 also works by evaluating a polynomial modulo 2^127 − 1; the main difference is that the coefficients for hash127 are 32 bits long, whereas the coefficients for CWC-HASH are 96 bits long (recall Section 5, which discusses why we use 96-bit coefficients). We also note that the performance of CWC-HASH will increase dramatically on 64-bit architectures with larger multiplies; an initial implementation on a G5 using 64-bit integer operations runs at around 6 cpb (when running the G5 in 32-bit mode, the hash function runs at around 15 cpb). Since the implementation of CWC-HASH is more complicated than the implementation of the CWC-CTR portion of CWC, we devote the rest of this section to discussing CWC-HASH.

Precomputation. As noted in Section 5, there are two general approaches to implementing CWC-HASH in software. The first is to use Horner's rule.
The second is to evaluate the polynomial directly, which can be faster if one precomputes powers of the hash key Kh at setup time (here the powers of Kh can be viewed as an expanded key schedule). In particular, as noted in Section 5, evaluating the polynomial using Horner's rule requires a 127x127-bit multiply for each coefficient, whereas evaluating the polynomial directly using precomputed powers of Kh requires a 96x127-bit multiply for each coefficient. (We discuss elsewhere why we did not make the hash subkey 96 bits, which could have sped up a serial Horner's rule implementation.) The advantage of precomputation was first observed by Bernstein in the context of hash127 [6].

The above description of the precomputation approach assumed that if the polynomial is Y_1 Kh^(γ−1) + ··· + Y_(γ−1) Kh + Y_γ (i.e., the polynomial has γ coefficients), then we had precomputed Kh^i for all i ∈ {1, ..., γ−1}. The precomputation approach extends naturally to the case where we have precomputed the powers Kh^j, j ∈ {1, ..., n}, for some n ≤ γ−1. For simplicity, first assume that we know the polynomial has a multiple of n coefficients. For such a polynomial, one processes the first n coefficients (to get Y_1 Kh^(n−1) + ··· + Y_(n−1) Kh + Y_n), then multiplies the intermediate result by Kh^n (to get Y_1 Kh^(2n−1) + ··· + Y_(n−1) Kh^(n+1) + Y_n Kh^n). After that, one can continue processing data with the same precomputed values
(to get Y_1 Kh^(2n−1) + ··· + Y_(2n−1) Kh + Y_2n), and so on. Note that each chunk of n coefficients takes (n − 1) 96x127-bit multiplies, and all but the last chunk takes an additional 127x127-bit multiply. Now assume that the number of coefficients m in the polynomial is not necessarily a multiple of n. If m is known in advance, one could first process m mod n coefficients, multiply by Kh^n, then process in n-coefficient chunks as before. Alternately, as long as the end of the message is known n coefficients in advance, one could process n-coefficient chunks, and then finish off the final m mod n coefficients using Horner's rule. Or, if the number of coefficients in the polynomial is not known until the final coefficient is reached, one could process the message in n-coefficient chunks and then multiply by a precomputed power of Kh^(−1) once the end of the message has been reached. Naturally, precomputation requires extra memory, but that is usually cheap and plentiful in a software-based environment.

Using 32-bit multiplies, the precomputation approach requires twelve 32-bit multiplies per 96-bit coefficient, as well as 17 adds, all of which may carry. In assembly, most of these carry operations can be implemented for free, or close to it, by using a special variant of the add instruction that adds in the operand as well as the value of the carry from the previous add operation. But when implemented in C, they will generally compile to code that requires a conditional branch and an extra addition. An implementation using Horner's rule requires an additional four multiplies and three additions with carry per coefficient, adding about 33% overhead, since the multiplies dominate the additions. A 64-bit platform only requires four multiplies and four adds (which may all carry), no matter the implementation strategy taken, which explains why implementations of CWC-HASH for 64-bit architectures are much faster.

Exploiting the parallelism of some instruction sets.
On most 32-bit platforms, it turns out that the integer execution unit is not the fastest way to implement CWC-HASH. Many platforms have multimedia instructions that can be used to speed up the implementation. As another alternative, Bernstein demonstrated that, on most platforms, the floating point unit can be used to implement this class of universal hash functions far more efficiently than can be done in the integer unit. This is particularly true on the x86 platform where, in contrast to using the standard registers, two floating-point multiplies can be started in close proximity without introducing a pipeline stall. That is, the x86 can effectively perform two floating-point operations in parallel. The disadvantage of using floating-point registers is that the operands for the individual multiplies need to be small, so that the operations can be done without loss of precision. On the x86, Bernstein multiplies 24-bit values, allowing the sums of product terms to fit into double-precision values with 53 bits of precision without loss of information. Bernstein details many ways to optimize this sort of calculation in [6].

As noted before, there are only two main differences between the structure of the polynomials of Bernstein's hash127 and CWC-HASH. The first is that Bernstein uses signed coefficients, whereas CWC-HASH uses unsigned coefficients; this should not have an impact on efficiency. The other difference is that Bernstein uses 32-bit coefficients, whereas CWC-HASH uses 96-bit coefficients. While both solutions average one multiplication per byte when using integer math, Bernstein's solution requires only 0.75 additions per byte, whereas CWC-HASH requires 1.42 additions per byte, nearly twice as many. Using 32-bit multiplies to build a 96x127-bit multiplier (assuming precomputation), CWC-HASH should therefore perform no worse than half the speed of hash127. When using 24-bit floating-point coefficients to build a multiply (without applying any non-obvious optimizations), hash127 requires 12 multiplies and 16 adds per 32-bit word. CWC can get by with 8 multiplies per word and 12.67 additions per word. This is because a 96-bit coefficient fits exactly into four 24-bit values, meaning we can use a 6x4 multiply for every three words. With 32-bit coefficients, we need to use two 24-bit values to represent each coefficient, resulting in a single 6x2 multiply that needs to be performed for each word. Gladman's C implementation of CWC-HASH uses floating-point arithmetic, but uses Horner's rule instead of performing precomputation to achieve extra speed. Nothing about the CWC hash indicates that it should run any worse than half the speed of hash127, if implemented in a similar manner, in assembly, using the floating-point registers and precomputation. This upper bound paints an encouraging picture for CWC performance, because hash127 on a Pentium III runs around 4 cpb when implemented in assembly using the floating-point registers and precomputation. This indicates that a well-optimized software version of CWC-HASH should run no slower than 8 cycles per byte on the same machine. Finally, it may be possible to further improve the performance of CWC-HASH.
For example, literature from the gaming community [11] indicates that one can use both integer and floating-point registers in parallel. Although we have not tested this approach, it seems reasonable to conclude that one might be able to interleave integer and floating-point operations, and thereby obtain additional speedups.
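The limb decomposition underlying the multiply counts above can be checked with exact integer arithmetic. In the following sketch (our own illustration), each 24x24-bit partial product stays below 2^48, comfortably within a double's 53-bit mantissa, which is the property the floating-point trick relies on:

```python
def to_limbs(x, nlimbs, bits=24):
    """Split x into nlimbs little-endian limbs of the given width."""
    mask = (1 << bits) - 1
    return [(x >> (bits * i)) & mask for i in range(nlimbs)]

def limb_mul(a_limbs, b_limbs, bits=24):
    """Schoolbook multiply on 24-bit limbs.  Each partial product is at
    most 48 bits, and each column sums only a handful of them, so the
    intermediate values fit within 53 bits of precision."""
    prod = [0] * (len(a_limbs) + len(b_limbs) - 1)
    for i, a in enumerate(a_limbs):
        for j, b in enumerate(b_limbs):
            prod[i + j] += a * b
    return sum(p << (bits * k) for k, p in enumerate(prod))

coeff = (1 << 96) - 12345  # a 96-bit CWC-HASH coefficient: four 24-bit limbs
acc = (1 << 127) - 6789    # a 127-bit accumulated value: six 24-bit limbs
assert limb_mul(to_limbs(coeff, 4), to_limbs(acc, 6)) == coeff * acc
```

This is the 6x4 limb multiply mentioned above; with 32-bit coefficients (two limbs) the same routine degenerates to a 6x2 multiply per word.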
Acknowledgements We thank Peter Gutmann, David McGrew, Fabian Monrose, Avi Rubin, Adam Stubblefield, and David Wagner for their comments. Additionally, we thank Brian Gladman for helping to validate our test vectors and for working with us to obtain timing information. T. Kohno was supported by a National Defense Science and Engineering Fellowship.
References

[1] M. Bellare, R. Canetti, and H. Krawczyk. Keying hash functions for message authentication. In N. Koblitz, editor, CRYPTO '96, volume 1109 of LNCS, pages 1–15. Springer-Verlag, Aug. 1996.
[2] M. Bellare, A. Desai, E. Jokipii, and P. Rogaway. A concrete security treatment of symmetric encryption. In Proc. of the 38th FOCS, pages 394–403. IEEE Computer Society Press, 1997.
[3] M. Bellare, R. Guérin, and P. Rogaway. XOR MACs: New methods for message authentication using finite pseudorandom functions. In D. Coppersmith, editor, CRYPTO '95, volume 963 of LNCS, pages 15–28. Springer-Verlag, Aug. 1995.
[4] M. Bellare and C. Namprempre. Authenticated encryption: Relations among notions and analysis of the generic composition paradigm. In T. Okamoto, editor, ASIACRYPT 2000, volume 1976 of LNCS, pages 531–545. Springer-Verlag, Dec. 2000.
[5] M. Bellare, P. Rogaway, and D. Wagner. The EAX mode of operation. In W. Meier and B. Roy, editors, FSE 2004, LNCS. Springer-Verlag, 2004.
[6] D. Bernstein. Floating-point arithmetic and message authentication, 2000. Available at http://cr.yp.to/papers.html#hash127.
[7] J. Black and P. Rogaway. A block-cipher mode of operation for parallelizable message authentication. In L. Knudsen, editor, EUROCRYPT 2002, volume 2332 of LNCS. Springer-Verlag, 2002.
[8] J. Daemen and V. Rijmen. The Design of Rijndael. Springer-Verlag, 2002.
[9] B. Gladman. AES and combined encryption/authentication modes, 2003. Available at http://fp.gladman.plus.com/AES/index.htm.
[10] V. Gligor and P. Donescu. Fast encryption and authentication: XCBC encryption and XECB authentication modes. In M. Matsui, editor, FSE 2001, LNCS. Springer-Verlag, 2001.
[11] C. Hecker. Perspective texture mapping, part V: It's about time. Game Developer, Apr. 1996. Available at http://www.d6.com/users/checker/pdfs/gdmtex5.pdf.
[12] C. Jutla. Encryption modes with almost free message integrity. In B. Pfitzmann, editor, EUROCRYPT 2001, volume 2045 of LNCS, pages 529–544. Springer-Verlag, May 2001.
[13] J. Katz and M. Yung. Unforgeable encryption and chosen ciphertext secure modes of operation. In B. Schneier, editor, FSE 2000, volume 1978 of LNCS, pages 284–299. Springer-Verlag, Apr. 2000.
[14] T. Kohno, J. Viega, and D. Whiting. CWC: A high-performance conventional authenticated encryption mode, 2003. Full version of this paper, available at http://eprint.iacr.org/2003/106/.
[15] H. Krawczyk. The order of encryption and authentication for protecting communications (or: How secure is SSL?). In J. Kilian, editor, CRYPTO 2001, volume 2139 of LNCS, pages 310–331. Springer-Verlag, Aug. 2001.
[16] H. Lipmaa. AES/Rijndael: speed, 2003. Available at http://www.tcs.hut.fi/~helger/aes/rijndael.html.
[17] M. Luby and C. Rackoff. How to construct pseudorandom permutations from pseudorandom functions. SIAM J. Computation, 17(2), Apr. 1988.
[18] D. McGrew. Integer counter mode, Oct. 2002. Available at http://www.ietf.org/internet-drafts/draft-irtf-cfrg-icm-01.txt.
[19] D. McGrew. The truncated multi-modular hash function (TMMH), version two, Oct. 2002. Available at http://www.ietf.org/internet-drafts/draft-irtf-cfrg-tmmh-00.txt.
[20] D. McGrew. The universal security transform, Oct. 2002. Available at http://www.ietf.org/internet-drafts/draft-irtf-cfrg-ust-01.txt.
[21] D. McGrew and J. Viega. Galois/counter mode. Submission to NIST. Available at http://csrc.nist.gov/CryptoToolkit/modes/proposedmodes/, 2004.
[22] W. Nevelsteen and B. Preneel. Software performance of universal hash functions. In J. Stern, editor, EUROCRYPT '99, volume 1592 of LNCS, pages 24–41. Springer-Verlag, 1999.
[23] P. Rogaway. Authenticated encryption with associated data. In Proc. of the 9th CCS, Nov. 2002.
[24] P. Rogaway, M. Bellare, J. Black, and T. Krovetz. OCB: A block-cipher mode of operation for efficient authenticated encryption. In Proc. of the 8th CCS, pages 196–205. ACM Press, 2001.
[25] P. Rogaway and D. Wagner. A critique of CCM, Apr. 2003. Available at http://eprint.iacr.org/2003/070/.
[26] V. Shoup. On fast and provably secure message authentication based on universal hashing. In N. Koblitz, editor, CRYPTO '96, volume 1109 of LNCS, pages 313–328. Springer-Verlag, Aug. 1996.
[27] M. Wegman and L. Carter. New hash functions and their use in authentication and set equality. Journal of Computer and System Sciences, 22:265–279, 1981.
[28] D. Whiting, N. Ferguson, and R. Housley. Counter with CBC-MAC (CCM). Submission to NIST. Available at http://csrc.nist.gov/CryptoToolkit/modes/proposedmodes/, 2002.
New Security Proofs for the 3GPP Confidentiality and Integrity Algorithms

Tetsu Iwata¹ and Tadayoshi Kohno²

¹ Dept. of Computer and Information Sciences, Ibaraki University, 4–12–1 Nakanarusawa, Hitachi, Ibaraki 316-8511, Japan, [email protected]
² Dept. of Computer Science and Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA, [email protected]
Abstract. This paper analyses the 3GPP confidentiality and integrity schemes adopted by the Universal Mobile Telecommunication System, an emerging standard for third-generation wireless communications. The schemes, known as f8 and f9, are based on the block cipher KASUMI. Although previous works claim security proofs for f8 and f9′, where f9′ is a generalized version of f9, it was recently shown that these proofs are incorrect. Moreover, Iwata and Kurosawa (2003) showed that it is impossible to prove f8 and f9′ secure under the standard PRP assumption on the underlying block cipher. We address this issue here, showing that it is possible to prove f8′ and f9′ secure if we assume that the underlying block cipher is a secure PRP-RKA against a certain class of related-key attacks; here f8′ is a generalized version of f8. Our results clarify the assumptions necessary in order for f8′ and f9′ to be secure and, since no related-key attacks are known against the full eight rounds of KASUMI, lead us to believe that the confidentiality and integrity mechanisms used in real 3GPP applications are secure.
1 Introduction
Background. Within the security architecture of the 3rd Generation Partnership Project (3GPP) system there are two standardized constructions: a confidentiality scheme, f8, and an integrity scheme, f9 [1]. 3GPP is the body standardizing the next generation of mobile telephony. Both f8 and f9 are modes of operation based on the block cipher KASUMI [2]. f8 is a symmetric encryption scheme, a variant of the Output Feedback (OFB) mode with full feedback, and f9 is a Message Authentication Code (MAC), a variant of the CBC MAC.

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 427–445, 2004. © International Association for Cryptologic Research 2004

Provable Security. Provable security is a standard security goal for block cipher modes of operation. Indeed, many of the block cipher modes of operation
are provably secure assuming that the underlying block cipher is a secure pseudorandom permutation or a super-pseudorandom permutation [21]. For example, we have: CTR mode [3] and CBC encryption mode [3] for symmetric encryption schemes, PMAC [8] and OMAC [14] for message authentication codes, and IAPM [17], OCB mode [22], CCM mode [23, 16], EAX mode [6] and CWC mode [20] for authenticated encryption schemes. Therefore, it is natural to ask whether f8 and f9 are provably secure if the underlying block cipher is a secure pseudorandom permutation. Making this assumption, it was claimed that f8 is a secure symmetric encryption scheme in the sense of left-or-right indistinguishability [18] and that f9′ is a secure MAC [12], where f9′ is a generalized version of f9. However, these claims were disproven [15].

One of the remarkable aspects of f8 and f9 is the use of a nonzero constant called a “key modifier,” or KM. In the f8 and f9 schemes, KASUMI is keyed with K and K ⊕ KM. The paper [15] constructs a secure pseudorandom permutation F with the following property: for any key K, the encryption function with key K is the decryption function with key K ⊕ KM. That is, F_K(·) = F^{-1}_{K⊕KM}(·). It was then shown that f8 and f9′ are insecure if F is used as the underlying block cipher. This result shows that it is impossible to prove the security of f8 and f9′ even if the underlying block cipher is a secure pseudorandom permutation.

Our Contribution. Given the results in [15], it is logical to ask whether there are assumptions under which f8 and f9 are actually secure and, if so, what those assumptions are. The answers to these questions would give us greater insight into the security of these two modes.
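As a concrete illustration of the impossibility result, the counterexample idea from [15] can be sketched in a few lines of Python. The permutation family below (keyed invertible affine maps, a toy stand-in of our own, not the construction actually used in [15]) satisfies F_K(·) = F^{-1}_{K⊕KM}(·); plugging it into f8 makes the first keystream block equal the public initial value A, so the first plaintext block leaks:

```python
import hashlib

N_BITS = 64                     # block size n
K_BITS = 128                    # key size k
MASK = (1 << N_BITS) - 1
KM = 1 << (K_BITS - 1)          # toy key modifier: flips the top key bit

def _affine(seed: int):
    # Derive an invertible affine map x -> (a*x + b) mod 2^64 from a seed;
    # a is forced odd, hence invertible mod 2^64.
    d = hashlib.sha256(seed.to_bytes(K_BITS // 8, "big")).digest()
    return int.from_bytes(d[:8], "big") | 1, int.from_bytes(d[8:16], "big")

def F(key: int, x: int) -> int:
    """Toy permutation family with F_K = F^{-1}_{K xor KM}: keys sharing all
    bits except the top one use the same affine map, in opposite directions."""
    base = key & ~KM
    a, b = _affine(base)
    if key & KM == 0:
        return (a * x + b) & MASK
    return ((x - b) * pow(a, -1, 1 << N_BITS)) & MASK

def f8_first_keystream_block(key: int, iv: int) -> int:
    # f8 computes A <- F_{K xor KM}(IV) and Y[1] <- F_K(A xor [0]_n xor Y[0]);
    # with this F, Y[1] = F_K(F_K^{-1}(IV)) = IV, a public value.
    return F(key, F(key ^ KM, iv))
```

The first ciphertext block would then be M[1] ⊕ IV with IV public, so confidentiality fails outright. Our affine toy is of course not itself a secure PRP; the point of [15] is that a provably secure PRP can be built with the same related-key defect.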
Because of the constructions' use of keys related by fixed xor differences, the natural conjecture is that if the constructions are actually secure, then the minimum assumption on the block cipher must be that it is secure against some class of xor-restricted related-key attacks, as introduced in [7] and formalized in [5]. We prove that this conjecture is in fact correct and, in doing so, we clarify what assumptions are actually necessary in order for the f8′ and f9′ modes to be secure.

In more detail, we first consider a generalized version of f8, which we call f8′. f8′ is a nonce-based symmetric encryption scheme, and is the natural nonce-based extension of the original f8. We then show that f8′ is a secure nonce-based deterministic symmetric encryption mode in the sense of indistinguishability from random strings if the underlying block cipher is secure against related-key attacks in which an adversary is able to obtain chosen-plaintext samples of the underlying block cipher using two keys related by a fixed known xor difference.

We next consider a generalized version of f9, which we call f9′. f9′ is a deterministic MAC, and is a natural extension of f9 that gives the user, or adversary, more liberty in controlling the input to the underlying CBC MAC core. We then show that f9′ is a secure pseudorandom function, which provably implies a secure MAC, if the underlying block cipher resists related-key attacks in which
an adversary is able to obtain chosen-plaintext samples of the underlying block cipher using two keys related by a fixed known xor difference.

Since f8′ and f9′ are generalized versions of f8 and f9, and since the best known related-key attack against KASUMI breaks only six out of eight rounds [9], our results show that unless a novel attack is discovered against KASUMI, the 3GPP confidentiality and integrity mechanisms are actually secure. We view this as an important practical corollary of our research, since the 3GPP constructions are destined for use in future mobile telephony applications. Additionally, because our proofs explicitly quantify what properties of the underlying block cipher are necessary in order for f8′ and f9′ to be secure, our results can help others decide whether it is safe to instantiate the generalized 3GPP modes with block ciphers other than KASUMI. Of course, because the assumptions we make are stronger than the standard pseudorandomness assumptions, as proven necessary in [15], unless there is a significant reason to do otherwise, we suggest that future systems use more conventional modes such as CTR mode and OMAC.

For our proofs, rather than trying to find and re-use correct portions of the analyses in [18] and [12], we chose instead to prove the security of f8′ and f9′ directly. We did this in order to ensure the correctness of our results and to avoid presenting proofs covered with patches. We discuss some of the problems with the previous analyses in more detail in Appendices A.1 and B.1.

Related Work. An initial security evaluation of KASUMI, f8 and f9 can be found in [11]. Knudsen and Mitchell analyzed the security of f9 against forgery and key recovery attacks [19].
2 Preliminaries
Notation. If x is a string then |x| denotes its length in bits. If x and y are two equal-length strings, then x ⊕ y denotes the xor of x and y. If x and y are strings, then x∥y denotes their concatenation. Let x ← y denote the assignment of y to x. If X is a set, let x ←R X denote the process of uniformly selecting at random an element from X and assigning it to x. If F : {0,1}^k × {0,1}^n → {0,1}^m is a family of functions from {0,1}^n to {0,1}^m indexed by keys {0,1}^k, then we use the notation F_K(D) as shorthand for F(K, D). We say F is a family of permutations, i.e., a block cipher, if n = m and F_K(·) is a permutation on {0,1}^n for each K ∈ {0,1}^k. Let Rand(n, m) denote the set of all functions from {0,1}^n to {0,1}^m. When we refer to the time of an algorithm or experiment in the provable security sections of this paper, we include the size of the code (in some fixed encoding). There is also an implicit big-O surrounding all such time references.

PRP-RKAs. The PRP-RKA notion was introduced in [5], and is based on the pseudorandomness notions introduced in [21] and later made concrete in [4]. The notion was designed to model block ciphers secure against related-key attacks [7].
Let Perm(k, n) denote the set of all block ciphers with domain {0,1}^n and keys {0,1}^k. The notation G ←R Perm(k, n) thus corresponds to selecting a random block cipher, and comes down to defining G via

  For each K ∈ {0,1}^k do: G_K ←R Perm(n),

where Perm(n) is the set of all permutations on {0,1}^n. Given a family of functions F : {0,1}^k × {0,1}^n → {0,1}^n and a key K ∈ {0,1}^k, we define the related-key oracle F_rk(·,K)(·) as an oracle that takes two arguments, a function φ : {0,1}^k → {0,1}^k and an element M ∈ {0,1}^n, and that returns F_{φ(K)}(M), the encipherment of M under the key φ(K). In this context, we shall refer to φ as a related-key-deriving (RKD) function. The PRP-RKA notion, which we now describe, is parameterized by a set of RKD functions Φ. Let E : {0,1}^k × {0,1}^n → {0,1}^n be a family of functions and let Φ be a set of RKD functions over {0,1}^k. Let A be an adversary with access to a related-key oracle, restricted to queries of the form (φ, x) in which φ ∈ Φ and x ∈ {0,1}^n, and let A return a bit. Then

  Adv^{prp-rka}_{Φ,E}(A) = Pr[K ←R {0,1}^k : A^{E_rk(·,K)(·)} = 1] − Pr[K ←R {0,1}^k; G ←R Perm(k, n) : A^{G_rk(·,K)(·)} = 1]

is defined as the PRP-RKA-advantage of A in a Φ-restricted related-key attack (RKA) on E. Intuitively, we say that E is a secure PRP-RKA under Φ-restricted related-key attacks if the PRP-RKA-advantage of all adversaries using reasonable resources is small.

In this work we are primarily interested in keys that are related by some xor difference. For any ∆ ∈ {0,1}^k we let XOR_∆ : {0,1}^k → {0,1}^k denote the function which on input K ∈ {0,1}^k returns K ⊕ ∆. We define Φ^⊕_k as Φ^⊕_k = { XOR_∆ : ∆ ∈ {0,1}^k }. We briefly remark that modern block ciphers, e.g., AES [10], are designed to be secure PRP-RKAs under Φ^⊕_k-restricted related-key attacks. Additionally, the best-known Φ^⊕_k-restricted related-key attack against the block cipher KASUMI, which was designed for use with the 3GPP modes, only breaks six out of eight rounds [9].
3 Specifications of f8, f8′, f9 and f9′

3.1 3GPP Confidentiality Algorithm f8 [1]
f8 is a symmetric encryption scheme standardized by 3GPP.¹ It uses the block cipher KASUMI : {0,1}^128 × {0,1}^64 → {0,1}^64 as the underlying primitive. The f8 key generation algorithm returns a random 128-bit key K. The f8 encryption algorithm takes a 128-bit key K, a 32-bit counter COUNT, a 5-bit radio bearer

¹ The original specification [1] refers to f8 as a symmetric synchronous stream cipher. The specification presented here is fully compatible with the original one.
identifier BEARER, a 1-bit direction identifier DIRECTION, and a message M ∈ {0,1}^* and returns a ciphertext C of the same length as M. It also uses a 128-bit constant KM = (01)^64 (0x55...55 in hexadecimal) called the key modifier. In more detail, the encryption algorithm is defined as follows:

Algorithm f8-Encrypt_K(COUNT, BEARER, DIRECTION, M)
  m ← ⌈|M|/64⌉
  Y[0] ← 0^64
  A ← COUNT∥BEARER∥DIRECTION∥0^26
  A ← KASUMI_{K⊕KM}(A)
  For i = 1 to m do:
    X[i] ← A ⊕ [i − 1]_64 ⊕ Y[i − 1]
    Y[i] ← KASUMI_K(X[i])
  C ← M ⊕ (the leftmost |M| bits of Y[1]∥···∥Y[m])
  Return C

In the above description, [i − 1]_64 denotes the 64-bit binary representation of i − 1. The decryption algorithm, which takes COUNT, BEARER, DIRECTION, and a ciphertext C as input and returns a plaintext M, is defined in the natural way. Since we analyze and prove results about a variant of f8 whose encryption algorithm takes a nonce as input in lieu of COUNT, BEARER, and DIRECTION, we do not describe the specifics of how COUNT, BEARER, and DIRECTION are used in real 3GPP applications. We do note that 3GPP applications will never invoke the f8 encryption algorithm twice with the same (COUNT, BEARER, DIRECTION) triple, which means that our nonce-based variant is appropriate.
3.2 A Generalized Version of f8: f8′
f8′ is a nonce-based deterministic symmetric encryption scheme, which is a generalized (and weakened) version of f8. It uses a block cipher E : {0,1}^k × {0,1}^n → {0,1}^n as the underlying primitive. Let f8′[E, ∆] denote f8′ where E is used as the underlying primitive and ∆ is a non-zero k-bit key modifier. The f8′ key generation algorithm returns a random k-bit key K. The f8′[E, ∆] encryption algorithm, which we call f8′-Encrypt, takes an n-bit nonce N instead of COUNT, BEARER and DIRECTION. That is, the encryption algorithm takes a k-bit key K, an n-bit nonce N, and a message M ∈ {0,1}^* and returns a ciphertext C of the same length as M. The encryption algorithm proceeds as follows:
Algorithm f8′-Encrypt_K(N, M)
  m ← ⌈|M|/n⌉
  Y[0] ← 0^n
  A ← N
  A ← E_{K⊕∆}(A)
  For i = 1 to m do:
    X[i] ← A ⊕ [i − 1]_n ⊕ Y[i − 1]
    Y[i] ← E_K(X[i])
  C ← M ⊕ (the leftmost |M| bits of Y[1]∥···∥Y[m])
  Return C

In the above description, [i − 1]_n denotes the n-bit binary representation of i − 1. Decryption is done in the obvious way. Notice that we treat COUNT, BEARER and DIRECTION as a nonce. That is, as we will define in Section 4, we allow the adversary to choose these values. Consequently, f8′ can be considered a weakened version of f8, since it gives an adversary the ability to control the entire initial value of A, rather than only a subset of the bits as would be the case for an adversary attacking f8.
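The f8′ pseudocode above can be transcribed into a short script. This is only a sketch: the hash-based function `E` below is a stand-in of our own for the underlying block cipher (a PRF, not KASUMI, and not even a permutation), and all names are illustrative:

```python
import hashlib

BLOCK = 8  # n = 64 bits

def E(key: bytes, block: bytes) -> bytes:
    # Stand-in for the underlying block cipher (hash-based PRF, not KASUMI).
    return hashlib.sha256(key + block).digest()[:BLOCK]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def f8p_encrypt(key: bytes, delta: bytes, nonce: bytes, msg: bytes) -> bytes:
    """f8'[E, delta]: keystream Y[i] = E_K(A xor [i-1]_n xor Y[i-1]) with
    A = E_{K xor delta}(N); decryption is the same operation."""
    a = E(_xor(key, delta), nonce)          # A <- E_{K xor delta}(N)
    y = bytes(BLOCK)                         # Y[0] <- 0^n
    stream = b""
    for i in range(-(-len(msg) // BLOCK)):   # m = ceil(|M| / n) blocks
        x = _xor(_xor(a, i.to_bytes(BLOCK, "big")), y)
        y = E(key, x)
        stream += y
    return _xor(msg, stream[:len(msg)])      # leftmost |M| bits of keystream
```

Since the ciphertext is plaintext xor keystream, calling `f8p_encrypt` on a ciphertext with the same key, delta and nonce recovers the plaintext.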
3.3 3GPP Integrity Algorithm f9 [1]
f9 is a message authentication code standardized by 3GPP. It uses KASUMI as the underlying primitive. The f9 key generation algorithm returns a random 128-bit key K. The f9 tagging algorithm takes a 128-bit key K, a 32-bit counter COUNT, a 32-bit random number FRESH, a 1-bit direction identifier DIRECTION, and a message M ∈ {0,1}^* and returns a 32-bit tag T. It uses a 128-bit constant KM = (10)^64 (0xAA...AA in hexadecimal), called the key modifier.

Let M = M[1]···M[m] be a message, where each M[i] (1 ≤ i ≤ m − 1) is 64 bits. The last block M[m] may have fewer than 64 bits. We define pad_64(COUNT, FRESH, DIRECTION, M) as follows: it concatenates COUNT, FRESH, M and DIRECTION, and then appends a single “1” bit, followed by between 0 and 63 “0” bits so that the total length is a multiple of 64 bits. More precisely,

  pad_64(COUNT, FRESH, DIRECTION, M) = COUNT∥FRESH∥M∥DIRECTION∥1∥0^{63−((|M|+1) mod 64)}.

Then the tagging algorithm is defined as follows:
Algorithm f9-Tag_K(COUNT, FRESH, DIRECTION, M)
  M ← pad_64(COUNT, FRESH, DIRECTION, M)
  Break M into 64-bit blocks M[1]···M[m]
  Y[0] ← 0^64
  For i = 1 to m do:
    X[i] ← M[i] ⊕ Y[i − 1]
    Y[i] ← KASUMI_K(X[i])
  T ← KASUMI_{K⊕KM}(Y[1] ⊕ ··· ⊕ Y[m])
  T ← the leftmost 32 bits of T
  Return T

The f9 verification algorithm is defined in the natural way. As with f8, since we analyze and prove the security of a generalized version of f9, we do not describe how COUNT, FRESH, and DIRECTION are used in real 3GPP applications.
3.4 A Generalized Version of f9: f9′ [12, 19, 15]
The message authentication code f9′ is a generalized (and weakened) version of f9 that gives the user (or adversary) almost complete control over the input to the underlying CBC MAC core. It uses a block cipher E : {0,1}^k × {0,1}^n → {0,1}^n as the underlying primitive. Let f9′[E, ∆, l] denote f9′ where E is used as the underlying block cipher, ∆ is a non-zero k-bit key modifier, and the tag length is l, where 1 ≤ l ≤ n. The key generation algorithm returns a random k-bit key K. The tagging algorithm, which we call f9′-Tag, takes a k-bit key K and a message M ∈ {0,1}^* as input and returns an l-bit tag T.

Let M = M[1]···M[m] be a message, where each M[i] (1 ≤ i ≤ m − 1) is n bits. The last block M[m] may have fewer than n bits. In f9′, we use pad_n instead of pad_64. pad_n(M) works as follows: it simply appends a single “1” bit, followed by between 0 and n − 1 “0” bits so that the total length is a multiple of n bits. More precisely,

  pad_n(M) = M∥1∥0^{n−1−(|M| mod n)}.

Thus, we simply ignore COUNT, FRESH, and DIRECTION. Equivalently, we consider COUNT, FRESH, and DIRECTION as part of the message. The rest of the tagging algorithm is the same as with f9. In pseudocode,

Algorithm f9′-Tag_K(M)
  M ← pad_n(M)
  Break M into n-bit blocks M[1]···M[m]
  Y[0] ← 0^n
  For i = 1 to m do:
    X[i] ← M[i] ⊕ Y[i − 1]
    Y[i] ← E_K(X[i])
  T ← E_{K⊕∆}(Y[1] ⊕ ··· ⊕ Y[m])
  T ← the leftmost l bits of T
  Return T
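The f9′ tagging algorithm above can likewise be sketched in Python. As before, the hash-based `E` is a stand-in of our own for the block cipher, and the padding works at byte rather than bit granularity (a 0x80 byte plays the role of the single “1” bit); all names are illustrative:

```python
import hashlib

BLOCK = 8  # n = 64 bits

def E(key: bytes, block: bytes) -> bytes:
    # Stand-in for the underlying block cipher (hash-based PRF, not KASUMI).
    return hashlib.sha256(key + block).digest()[:BLOCK]

def pad_n(msg: bytes) -> bytes:
    # Byte-granularity analogue of pad_n: a 0x80 byte plays the role of the
    # single "1" bit, followed by zero bytes up to a block boundary.
    padded = msg + b"\x80"
    return padded + b"\x00" * (-len(padded) % BLOCK)

def f9p_tag(key: bytes, delta: bytes, msg: bytes, tag_len: int = 4) -> bytes:
    """f9'[E, delta, l]: CBC-MAC chain under K; the xor of all chaining
    values is enciphered under K xor delta and truncated to l bytes."""
    m = pad_n(msg)
    y = bytes(BLOCK)                    # Y[0] <- 0^n
    acc = bytes(BLOCK)                  # running Y[1] xor ... xor Y[m]
    for i in range(0, len(m), BLOCK):
        x = bytes(a ^ b for a, b in zip(m[i:i + BLOCK], y))
        y = E(key, x)
        acc = bytes(a ^ b for a, b in zip(acc, y))
    k2 = bytes(a ^ b for a, b in zip(key, delta))
    return E(k2, acc)[:tag_len]         # leftmost l bits
```

With `tag_len = 4` this mirrors f9's 32-bit tags; in the generalized scheme, COUNT, FRESH and DIRECTION would simply be folded into `msg` by the caller.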
The verification algorithm is defined in the natural way. As we will define in Section 5, our adversary is allowed to choose COUNT, FRESH, and DIRECTION, since f9′ treats them as part of the message. In this sense, f9′ can be considered a weakened version of f9.
4 Security of f8′
Definitions. Before proving the security of f8′, we must first formally define what we mean by a nonce-based encryption scheme, and what it means for such an encryption scheme to be secure. Mathematically, a nonce-based symmetric encryption scheme SE = (K, E, D) consists of three algorithms and is defined for some nonce length n. The randomized key generation algorithm K takes no input and returns a random key K. The stateless and deterministic encryption algorithm takes a key K, a nonce N ∈ {0,1}^n, and a message M ∈ {0,1}^* as input and returns a ciphertext C such that |C| = |M|; we write C ← E_K(N, M). The stateless and deterministic decryption algorithm takes a key K, a nonce N ∈ {0,1}^n, and a ciphertext C ∈ {0,1}^* as input and returns a message M such that |M| = |C|; we write M ← D_K(N, C). For consistency, we require that for all keys K, nonces N, and messages M, D_K(N, E_K(N, M)) = M.

We adopt the strong notion of privacy for nonce-based encryption schemes from [22]. This notion, which we call indistinguishability from random strings, provably implies the more standard notions given in [3]. Let $(·, ·) denote an oracle that on input a pair of strings (N, M) returns a random string of length |M|. If A is an adversary with access to an oracle, then

  Adv^priv_SE(A) = Pr[K ←R K : A^{E_K(·,·)} = 1] − Pr[A^{$(·,·)} = 1]

is defined as the PRIV-advantage of A in distinguishing the outputs of the encryption algorithm with a randomly selected key from random strings. We say that A is nonce-respecting if it never queries its oracle twice with the same nonce value. Intuitively, we say that an encryption scheme preserves privacy under chosen-plaintext attacks if the PRIV-advantage of all nonce-respecting adversaries A using reasonable resources is small.

Provable Security Results. Let p8′[n] be a variant of f8′ that uses random functions on n bits instead of E_K and E_{K⊕∆}.
Specifically, the key generation algorithm for p8′[n] returns two randomly selected functions R1, R2 from Rand(n, n). The encryption algorithm for p8′[n], p8′-Encrypt, takes R1 and R2 as “keys” and uses them instead of E_K and E_{K⊕∆}. The decryption algorithm is defined in the natural way.

We first upper-bound the advantage of an adversary in breaking the privacy of p8′[n]. Let (N_i, M_i) denote a privacy adversary's i-th oracle query. If the adversary makes exactly q oracle queries, then we define the total number of blocks for the adversary's queries as σ = Σ_{i=1}^{q} ⌈|M_i|/n⌉.
Lemma 4.1. Let p8′[n] be as described above and let A be a nonce-respecting privacy adversary which asks at most q queries totaling at most σ blocks. Then

  Adv^priv_{p8′[n]}(A) ≤ σ²/2^n.   (1)
A proof sketch is given in Appendix A, and a proof is given in the full version of this paper [13].

We now present our main result for f8′ (Theorem 4.1 below). At a high level, our theorem shows that if a block cipher E is secure against Φ-restricted related-key attacks, where Φ is a small subset of Φ^⊕_k, then the construction f8′[E, ∆] based on E will be a provably secure encryption scheme. In more detail, our theorem states that given any adversary A attacking the privacy of f8′[E, ∆] and making at most q oracle queries totaling at most σ blocks, we can construct a Φ-restricted PRP-RKA adversary B attacking E such that B uses similar resources as A and B has advantage Adv^{prp-rka}_{Φ,E}(B) ≥ Adv^priv_{f8′[E,∆]}(A) − (3σ² + q²)/2^{n+1}. If we assume that E is secure against Φ-restricted related-key attacks and that A (and therefore B) uses reasonable resources, then Adv^{prp-rka}_{Φ,E}(B) must be small by definition, and thus Adv^priv_{f8′[E,∆]}(A) must also be small. This means that under these assumptions on E, f8′[E, ∆] is provably secure.

Since many block ciphers, including AES and KASUMI, are believed to resist Φ^⊕_k-restricted related-key attacks, and since Φ is a small subset of Φ^⊕_k, this theorem means that f8′ constructions built from these block ciphers will be provably secure. Additionally, because Φ is a small subset of Φ^⊕_k, the f8′ construction actually requires a much weaker assumption on the underlying block cipher than resistance to the full class of Φ^⊕_k-restricted related-key attacks, meaning that it is more likely for the underlying block cipher to resist Φ-restricted related-key attacks than Φ^⊕_k-restricted related-key attacks. Of course, our results also suggest that if a block cipher is known to be insecure under Φ-restricted related-key attacks, that block cipher should not be used in the f8′ construction.
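To get a feel for the concrete security guarantee, the information-theoretic term (3σ² + q²)/2^{n+1} from the discussion above can be evaluated numerically; the parameters below are illustrative choices of our own:

```python
def f8p_priv_bound(sigma: int, q: int, n: int) -> float:
    # Information-theoretic part of the f8' privacy bound; the full bound
    # adds the PRP-RKA advantage of the constructed adversary B.
    return (3 * sigma**2 + q**2) / 2 ** (n + 1)

# Illustrative parameters (our choice): n = 64 as in KASUMI, q = 2^16
# messages totaling sigma = 2^24 blocks, i.e. 128 MB of traffic.
bound = f8p_priv_bound(sigma=2**24, q=2**16, n=64)
```

For these parameters the term is on the order of 2^-15, i.e. negligible for practical traffic volumes, and it grows quadratically in the number of processed blocks, as is typical for birthday-type bounds.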
Since f8′ is a weakened version of the KASUMI-based f8 encryption scheme, and since KASUMI is currently believed to resist Φ^⊕_k-restricted related-key attacks, our result shows that f8 as designed for use in the 3GPP protocols is secure. Our main theorem statement for f8′ is given below.

Theorem 4.1 (Main Theorem for f8′). Let E : {0,1}^k × {0,1}^n → {0,1}^n be a block cipher and let ∆ be a non-zero k-bit constant. Let f8′[E, ∆] be as described in Sec. 3.2. Let id be the identity function on {0,1}^k and let Φ = {id, XOR_∆} ⊆ Φ^⊕_k be a set of RKD functions over {0,1}^k. If A is a nonce-respecting privacy adversary which asks at most q queries totaling at most σ blocks, then we can construct a Φ-restricted PRP-RKA adversary B against E such that

  Adv^priv_{f8′[E,∆]}(A) ≤ (3σ² + q²)/2^{n+1} + Adv^{prp-rka}_{Φ,E}(B).   (2)

Furthermore, B makes at most σ + q oracle queries and uses the same time as A.
Proof. Let f8′-Encrypt denote the encryption algorithm for f8′[E, ∆] and let p8′-Encrypt denote the encryption algorithm for p8′[n]. Expanding the definition of Adv^priv_{f8′[E,∆]}(A), we get:

  Adv^priv_{f8′[E,∆]}(A) = Pr[K ←R {0,1}^k : A^{f8′-Encrypt_K(·,·)} = 1] − Pr[A^{$(·,·)} = 1]
    ≤ Adv^priv_{p8′[n]}(A) + Pr[K ←R {0,1}^k : A^{f8′-Encrypt_K(·,·)} = 1] − Pr[R1, R2 ←R Rand(n, n) : A^{p8′-Encrypt_{R1,R2}(·,·)} = 1].

Applying Lemma 4.1 we get

  Adv^priv_{f8′[E,∆]}(A) ≤ σ²/2^n + Pr[K ←R {0,1}^k : A^{f8′-Encrypt_K(·,·)} = 1] − Pr[R1, R2 ←R Rand(n, n) : A^{p8′-Encrypt_{R1,R2}(·,·)} = 1].
Let B be a Φ-restricted related-key adversary against E that runs A and that returns the same bit that A returns. Let F_rk(·,K)(·) denote B's related-key oracle. When A makes an oracle query (N, M) to its oracle, B essentially computes the f8′-Encrypt algorithm, except that it uses its related-key oracle in place of E_K and E_{K⊕∆}. In pseudocode,

Algorithm B^{F_rk(·,K)(·)}
  Run A, replying to A's oracle queries (N, M) as follows:
    m ← ⌈|M|/n⌉
    Y[0] ← 0^n
    A ← N
    A ← F_rk(XOR_∆,K)(A)
    For i = 1 to m do:
      X[i] ← A ⊕ [i − 1]_n ⊕ Y[i − 1]
      Y[i] ← F_rk(id,K)(X[i])
    C ← M ⊕ (the leftmost |M| bits of Y[1]∥···∥Y[m])
    Return C to A
  When A outputs b: output b

We now observe that
  Pr[K ←R {0,1}^k : A^{f8′-Encrypt_K(·,·)} = 1] = Pr[K ←R {0,1}^k : B^{E_rk(·,K)(·)} = 1],

since B, when given related-key oracle access to E with a randomly selected key K, responds to A exactly as the f8′-Encrypt_K(·, ·) oracle would respond with a randomly selected key K. Let Rand(k, n, n) denote the set of all functions from {0,1}^k × {0,1}^n to {0,1}^n. Then the equation

  Pr[R1, R2 ←R Rand(n, n) : A^{p8′-Encrypt_{R1,R2}(·,·)} = 1] = Pr[K ←R {0,1}^k; G ←R Rand(k, n, n) : B^{G_rk(·,K)(·)} = 1]
follows from the fact that when G is randomly selected from Rand(k, n, n), then regardless of the key K, and since we assume ∆ ≠ 0^k, G_K and G_{K⊕∆} are both randomly selected functions from Rand(n, n). Combining the above equations, we have that

  Adv^priv_{f8′[E,∆]}(A) ≤ σ²/2^n + Pr[K ←R {0,1}^k : B^{E_rk(·,K)(·)} = 1] − Pr[K ←R {0,1}^k; G ←R Rand(k, n, n) : B^{G_rk(·,K)(·)} = 1]
    = σ²/2^n + Pr[K ←R {0,1}^k : B^{E_rk(·,K)(·)} = 1] − Pr[K ←R {0,1}^k; H ←R Perm(k, n) : B^{H_rk(·,K)(·)} = 1]
      + Pr[K ←R {0,1}^k; H ←R Perm(k, n) : B^{H_rk(·,K)(·)} = 1] − Pr[K ←R {0,1}^k; G ←R Rand(k, n, n) : B^{G_rk(·,K)(·)} = 1].

Using the PRP-RKA definition and applying a variant of the PRF/PRP switching lemma from [5], we get

  Adv^priv_{f8′[E,∆]}(A) ≤ Adv^{prp-rka}_{Φ,E}(B) + σ(σ − 1)/2^{n+1} + q(q − 1)/2^{n+1} + σ²/2^n.

For the application of the PRF/PRP switching lemma, we note that B queries its related-key oracle with the RKD function id at most σ times and with the RKD function XOR_∆ at most q times. Rearranging the above equation and simplifying gives (2), as desired.
5 Security of f9′
Definitions. Before proving the security of f9′, we must first formally define what we mean by a MAC, and what it means for a MAC to be secure. Mathematically, a message authentication scheme, or MAC, MA = (K, T, V) consists of three algorithms and is defined for some tag length l. The randomized key generation algorithm K takes no input and returns a random key K. The stateless and deterministic tagging algorithm takes a key K and a message M ∈ {0,1}^* as input and returns a tag T ∈ {0,1}^l; we write T ← T_K(M). The stateless and deterministic verification algorithm takes a key K, a message M ∈ {0,1}^*, and a candidate tag T ∈ {0,1}^l as input and returns a bit b; we write b ← V_K(M, T). For consistency, we require that for all keys K and messages M, V_K(M, T_K(M)) = 1.

For security, we adopt a strong notion of security for MACs, namely pseudorandomness (PRF). In [4] it was proven that if a MAC is a secure PRF, then it is also unforgeable. If A is an adversary with access to an oracle, then

  Adv^prf_MA(A) = Pr[K ←R K : A^{T_K(·)} = 1] − Pr[g ←R Rand(∗, l) : A^{g(·)} = 1]
is defined as the PRF-advantage of A in distinguishing the outputs of the tagging algorithm with a randomly selected key from the outputs of a random function with the same domain and range. Intuitively, we say that a message authentication code is pseudorandom, or secure, if the PRF-advantage of all adversaries A using reasonable resources is small.

Provable Security Results. Let p9′[n] be a variant of f9′ that always outputs a full n-bit tag and that uses random functions on n bits instead of E_K and E_{K⊕∆}. Specifically, the key generation algorithm for p9′[n] returns two randomly selected functions R1, R2 from Rand(n, n). The tagging algorithm for p9′[n], p9′-Tag, takes R1 and R2 as “keys” and uses them instead of E_K and E_{K⊕∆}. The verification algorithm is defined in the natural way.

We first upper-bound the advantage of an adversary in attacking the pseudorandomness of p9′[n]. Let M_i denote an adversary's i-th oracle query. If an adversary makes exactly q oracle queries, then we define the total number of blocks for the adversary's queries as σ = Σ_{i=1}^{q} ⌈|M_i|/n⌉.

Lemma 5.1. Let p9′[n] be as described above and let A be an adversary which asks at most q queries totaling at most σ blocks. Then

  Adv^prf_{p9′[n]}(A) ≤ (σ² + q²)/2^{n+1}.   (3)
A proof sketch is given in Appendix B, and a proof is given in the full version of this paper [13].

We now present our main result for f9′ (Theorem 5.1), which we interpret as follows: our theorem shows that if a block cipher E is secure against Φ-restricted related-key attacks, where Φ is a small subset of Φ^⊕_k, then the construction f9′[E, ∆, l] based on E will be a provably secure message authentication code. In more detail, we show that given any adversary A attacking f9′[E, ∆, l] and making at most q oracle queries totaling at most σ blocks, we can construct a Φ-restricted PRP-RKA adversary B against E such that B uses similar resources as A and B has advantage Adv^{prp-rka}_{Φ,E}(B) ≥ Adv^prf_{f9′[E,∆,l]}(A) − (3q² + 2σ² + 2σq)/2^{n+1}. If we assume that E is secure against Φ-restricted related-key attacks and that A (and therefore B) uses reasonable resources, then Adv^{prp-rka}_{Φ,E}(B) must be small by definition. Therefore Adv^prf_{f9′[E,∆,l]}(A) must be small as well, proving that under these assumptions on E, f9′[E, ∆, l] is secure.

Since many block ciphers, including AES and KASUMI, are believed to resist Φ^⊕_k-restricted related-key attacks, and since Φ is a small subset of Φ^⊕_k, this theorem means that f9′ constructions built from these block ciphers will be provably secure. Furthermore, because f9′ is a weakened version of the KASUMI-based f9 message authentication code, our result shows that f9 as designed for use in the 3GPP protocols is secure. The precise theorem statement is as follows:

Theorem 5.1 (Main Theorem for f9′). Let E : {0,1}^k × {0,1}^n → {0,1}^n be a block cipher, let ∆ be a non-zero k-bit constant, and let l, 1 ≤ l ≤ n, be
a constant. Let f9′[E, ∆, l] be as described in Sec. 3.4. Let id be the identity function on {0,1}^k and let Φ = {id, XOR_∆} ⊆ Φ^⊕_k be a set of RKD functions over {0,1}^k. If A is a PRF adversary which asks at most q queries totaling at most σ blocks, then we can construct a Φ-restricted PRP-RKA adversary B against E such that

  Adv^prf_{f9′[E,∆,l]}(A) ≤ (3q² + 2σ² + 2σq)/2^{n+1} + Adv^{prp-rka}_{Φ,E}(B).   (4)
Furthermore, B makes at most σ + 2q oracle queries and uses the same time as A.

Proof. We first note that given any PRF adversary A against f9′[E, ∆, l], we can construct a PRF adversary C against f9′[E, ∆, n] such that the following equation holds:

  Adv^prf_{f9′[E,∆,l]}(A) ≤ Adv^prf_{f9′[E,∆,n]}(C).   (5)

This standard result follows from the fact that the extra bits provided to the adversary can only improve its chance of success. Our approach to upper-bounding Adv^prf_{f9′[E,∆,n]}(C) is similar to the approach
we used to upper-bound Adv^priv_{f8′[E,∆]}(A) in the proof of Theorem 4.1. Let f9′-Tag denote the tagging algorithm for f9′[E, ∆, n] and let p9′-Tag denote the tagging algorithm for p9′[n]. Expanding the definition of Adv^prf_{f9′[E,∆,n]}(C) and applying Lemma 5.1, we get:

  Adv^prf_{f9′[E,∆,n]}(C) = Pr[K ←R {0,1}^k : C^{f9′-Tag_K(·)} = 1] − Pr[g ←R Rand(∗, n) : C^{g(·)} = 1]
    ≤ (σ² + q²)/2^{n+1} + Pr[K ←R {0,1}^k : C^{f9′-Tag_K(·)} = 1] − Pr[R1, R2 ←R Rand(n, n) : C^{p9′-Tag_{R1,R2}(·)} = 1].

As in the proof of Theorem 4.1, let B be a Φ-restricted related-key adversary against E that runs C and that returns the same bit that C returns. Let F_rk(·,K)(·) denote B's related-key oracle. This time, when C makes an oracle query M to its oracle, B essentially computes the f9′-Tag algorithm, except that it uses its related-key oracle in place of E_K and E_{K⊕∆}. In pseudocode,
Algorithm B^{F_rk(·,K)(·)}
  Run C, replying to C's oracle queries M as follows:
    M ← pad_n(M)
    Break M into n-bit blocks M[1]···M[m]
    Y[0] ← 0^n
    For i = 1 to m do:
      X[i] ← M[i] ⊕ Y[i − 1]
      Y[i] ← F_rk(id,K)(X[i])
    T ← F_rk(XOR_∆,K)(Y[1] ⊕ ··· ⊕ Y[m])
    Return T to C
  When C outputs b: output b

We first observe that when B is given related-key oracle access to E with key K, it replies to C's oracle queries exactly as f9′-Tag_K(·) does. This means that the following equation holds:
$$\Pr(K \stackrel{R}{\leftarrow} \{0,1\}^k : C^{\text{f9-Tag}_K(\cdot)} = 1) = \Pr(K \stackrel{R}{\leftarrow} \{0,1\}^k : B^{E_{\mathrm{rk}(\cdot,K)}(\cdot)} = 1).$$
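This simulation step can be sanity-checked mechanically on a toy cipher. The sketch below is ours, not from the paper: it uses XOR by the key as a stand-in permutation family E, computes the f9-style tag directly under key K, and recomputes it through a Φ-restricted related-key oracle exactly as B does; the two computations must agree. All concrete values (DELTA, the key, the block size) are illustrative.

```python
import functools
import operator

DELTA = 0x5A          # toy key offset Delta (arbitrary illustrative value)

def E(key, x):
    """Toy stand-in 'block cipher': XOR by the key is a permutation per key."""
    return (x ^ key) & 0xFF

def f9_tag(key, blocks):
    """f9-style tag: CBC chain under E_K, then encrypt the XOR of the Y[i]
    under the related key K xor Delta."""
    y, ys = 0, []
    for m in blocks:
        y = E(key, m ^ y)          # X[i] = M[i] xor Y[i-1]; Y[i] = E_K(X[i])
        ys.append(y)
    return E(key ^ DELTA, functools.reduce(operator.xor, ys))

def f9_tag_via_rka(rk, blocks):
    """Adversary B's simulation: the same computation, but every cipher call
    goes through a related-key oracle rk(phi, x) = E(phi(K), x), with phi
    restricted to the identity or XOR-by-Delta."""
    ident = lambda k: k
    xor_delta = lambda k: k ^ DELTA
    y, ys = 0, []
    for m in blocks:
        y = rk(ident, m ^ y)
        ys.append(y)
    return rk(xor_delta, functools.reduce(operator.xor, ys))

K = 0x3C
rk_oracle = lambda phi, x: E(phi(K), x)
for msg in ([0x01], [0x01, 0xF0, 0x7E], [0xFF] * 5):
    assert f9_tag(K, msg) == f9_tag_via_rka(rk_oracle, msg)
```

The check works for any per-key permutation family substituted for `E`, since B's code is syntactically the f9-Tag computation with oracle calls in place of direct cipher calls.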
We also observe that when B is given related-key oracle access to G with key K, where G is a randomly selected function family from Rand(k, n, n), the functions GK (·) and GK⊕∆ (·) are both randomly selected from Rand(n, n). This means that B replies to C’s oracle queries exactly as p9 -TagR1 ,R2 (·) would with two randomly selected functions R1 , R2 from Rand(n, n). Consequently, the following equation holds:
$$\Pr(R_1, R_2 \stackrel{R}{\leftarrow} \mathrm{Rand}(n,n) : C^{\text{p9-Tag}_{R_1,R_2}(\cdot)} = 1) = \Pr(K \stackrel{R}{\leftarrow} \{0,1\}^k;\ G \stackrel{R}{\leftarrow} \mathrm{Rand}(k,n,n) : B^{G_{\mathrm{rk}(\cdot,K)}(\cdot)} = 1).$$
Combining these equations, we have that
$$\begin{aligned}
\mathbf{Adv}^{\mathrm{prf}}_{f9[E,\Delta,n]}(C) \le \frac{\sigma^2 + q^2}{2^{n+1}} &+ \Pr(K \stackrel{R}{\leftarrow} \{0,1\}^k : B^{E_{\mathrm{rk}(\cdot,K)}(\cdot)} = 1) - \Pr(K \stackrel{R}{\leftarrow} \{0,1\}^k;\ H \stackrel{R}{\leftarrow} \mathrm{Perm}(k,n) : B^{H_{\mathrm{rk}(\cdot,K)}(\cdot)} = 1) \\
&+ \Pr(K \stackrel{R}{\leftarrow} \{0,1\}^k;\ H \stackrel{R}{\leftarrow} \mathrm{Perm}(k,n) : B^{H_{\mathrm{rk}(\cdot,K)}(\cdot)} = 1) - \Pr(K \stackrel{R}{\leftarrow} \{0,1\}^k;\ G \stackrel{R}{\leftarrow} \mathrm{Rand}(k,n,n) : B^{G_{\mathrm{rk}(\cdot,K)}(\cdot)} = 1).
\end{aligned}$$
Applying the PRP-RKA definition and a variant of the PRF/PRP switching lemma from [5], we get
$$\mathbf{Adv}^{\mathrm{prf}}_{f9[E,\Delta,n]}(C) \le \mathbf{Adv}^{\mathrm{prp\text{-}rka}}_{\Phi,E}(B) + \frac{(\sigma+q)(\sigma+q-1)}{2^{n+1}} + \frac{q(q-1)}{2^{n+1}} + \frac{\sigma^2 + q^2}{2^{n+1}}.$$
For the application of the PRF/PRP switching lemma, we note that B queries its related-key oracle with the RKD function id at most σ + q times and the RKD function XOR_∆ at most q times. Combining the above with equation (5) and simplifying gives the theorem statement.
New Security Proofs for the 3GPP Confidentiality
441
Acknowledgements

T. Kohno was supported by a National Defense Science and Engineering Fellowship.
References [1] 3GPP TS 35.201 v 3.1.1. Specification of the 3GPP confidentiality and integrity algorithms, Document 1: f 8 and f 9 specification. Available at http://www.3gpp.org/tb/other/algorithms.htm. 427, 430, 432 [2] 3GPP TS 35.202 v 3.1.1. Specification of the 3GPP confidentiality and integrity algorithms, Document 2: KASUMI specification. Available at http://www.3gpp.org/tb/other/algorithms.htm. 427 [3] M. Bellare, A. Desai, E. Jokipii, and P. Rogaway. A concrete security treatment of symmetric encryption. Proceedings of The 38th Annual Symposium on Foundations of Computer Science, FOCS ’97, pp. 394–405, IEEE, 1997. 428, 434 [4] M. Bellare, J. Kilian, and P. Rogaway. The security of the cipher block chaining message authentication code. JCSS, vol. 61, no. 3, pp. 362–399, 2000. Earlier version in Y. Desmedt, editor, Advances in Cryptology – CRYPTO ’94, volume 839 of Lecture Notes in Computer Science, pages 341–358. Springer-Verlag, Berlin Germany, 1994. 429, 437 [5] M. Bellare, and T. Kohno. A theoretical treatment of related-key attacks: RKAPRPs, RKA-PRFs, and applications. In E. Biham, editor, Advances in Cryptology – EUROCRYPT 2003, volume 2656 of Lecture Notes in Computer Science, pages 491–506. Springer-Verlag, Berlin Germany, 2003. 428, 429, 437, 440 [6] M. Bellare, P. Rogaway, and D. Wagner. The EAX mode of operation. In W. Meier and B. Roy, editors, Fast Software Encryption, FSE 2004, Springer-Verlag, 2004. 428 [7] E. Biham. New types of cryptanalytic attacks using related keys. In T. Helleseth, editor, Advances in Cryptology – EUROCRYPT ’93, volume 765 of Lecture Notes in Computer Science, pages 398–409. Springer-Verlag, Berlin Germany, 1993. 428, 429 [8] J. Black and P. Rogaway. A block-cipher mode of operation for parallelizable message authentication. In L. R. Knudsen, editor, Advances in Cryptology – EUROCRYPT 2002, volume 2332 of Lecture Notes in Computer Science, pages 384–397. Springer-Verlag, Berlin Germany, 2002. 428 [9] M. Blunden and A. 
Escott. Related key attacks on reduced round KASUMI. In M. Matsui, editor, Fast Software Encryption, FSE 2001, volume 2355 of Lecture Notes in Computer Science, pages 277–285. Springer-Verlag, Berlin Germany, 2002. 429, 430 [10] J. Daemen and V. Rijmen. The Design of Rijndael. Springer-Verlag, Berlin Germany, 2002. 430 [11] Evaluation report (version 2.0). Specification of the 3GPP confidentiality and integrity algorithms, Report on the evaluation of 3GPP confidentiality and integrity algorithms. Available at http://www.3gpp.org/tb/other/algorithms.htm. 429 [12] D. Hong, J-S. Kang, B. Preneel and H. Ryu. A concrete security analysis for 3GPP-MAC. In T. Johansson, editor, Fast Software Encryption, FSE 2003, volume 2887 of Lecture Notes in Computer Science, pages 154–169. Springer-Verlag, Berlin Germany, 2003. 428, 429, 433, 443, 444, 445
[13] T. Iwata and T. Kohno. New security proofs for the 3GPP confidentiality and integrity algorithms. Full version of this paper, available at http://eprint.iacr.org/, 2004. 435, 438, 442, 443, 445 [14] T. Iwata and K. Kurosawa. OMAC: One-Key CBC MAC. In T. Johansson, editor, Fast Software Encryption, FSE 2003, volume 2887 of Lecture Notes in Computer Science, pages 129–153. Springer-Verlag, Berlin Germany, 2003. 428 [15] T. Iwata and K. Kurosawa. On the correctness of security proofs for the 3GPP confidentiality and integrity algorithms. In K. G. Paterson, editor, Cryptography and Coding, Ninth IMA International Conference, volume 2898 of Lecture Notes in Computer Science, pages 306–318. Springer-Verlag, Berlin Germany, 2003. 428, 429, 433 [16] J. Jonsson. On the Security of CTR + CBC-MAC. In K. Nyberg and H. M. Heys, editors, Selected Areas in Cryptography, 9th Annual Workshop (SAC 2002), volume 2595 of Lecture Notes in Computer Science, pages 76–93. Springer-Verlag, Berlin Germany, 2002. 428 [17] C. S. Jutla. Encryption modes with almost free message integrity. In B. Pfitzmann, editor, Advances in Cryptology – EUROCRYPT 2001, volume 2045 of Lecture Notes in Computer Science, pages 529–544. Springer-Verlag, Berlin Germany, 2001. 428 [18] J-S. Kang, S-U. Shin, D. Hong and O. Yi. Provable security of KASUMI and 3GPP encryption mode f 8. In C. Boyd, editor, Advances in Cryptology – ASIACRYPT 2001, volume 2248 of Lecture Notes in Computer Science, pages 255–271. Springer-Verlag, Berlin Germany, 2001. 428, 429, 443 [19] L. R. Knudsen and C. J. Mitchell. Analysis of 3gpp-MAC and two-key 3gpp-MAC. Discrete Applied Mathematics, vol. 128, no. 1, pp. 181–191, 2003. 429, 433 [20] T. Kohno, J. Viega, and D. Whiting. CWC: A high-performance conventional authenticated encryption mode. In W. Meier and B. Roy, editors, Fast Software Encryption, FSE 2004, Springer-Verlag, 2004. 428 [21] M. Luby and C. Rackoff. 
How to construct pseudorandom permutations from pseudorandom functions. SIAM J. Comput., vol. 17, no. 2, pp. 373–386, April 1988. 428, 429 [22] P. Rogaway, M. Bellare, J. Black, and T. Krovetz. OCB: a block-cipher mode of operation for efficient authenticated encryption. Proceedings of ACM Conference on Computer and Communications Security, ACM CCS 2001, ACM, 2001. 428, 434, 443 [23] D. Whiting, R. Housley, and N. Ferguson. Counter with CBC-MAC (CCM). Submission to NIST. Available at http://csrc.nist.gov/CryptoToolkit/modes/. 428
A Proof Sketch of Lemma 4.1
We sketch the proof of Lemma 4.1 here, leaving the details to [13]. The adversary has an oracle which is either p8 -EncryptR1 ,R2 (·, ·) or $(·, ·). Let (Ni , Mi ) denote the adversary’s i-th oracle query, and let Ci denote the answer from the oracle. Assume that the length of Mi (and Ci ) is mi blocks, where mi ≥ 1. We write Mi = Mi [1] · · · Mi [mi ] and Ci = Ci [1] · · · Ci [mi ]. We define a bad query-answer pair and a bad event.
Bad Query-Answer Pair. We say that (Ni, Mi, Ci) is a bad query-answer pair if some string is repeated in the multiset
$$[0]_n,\ M_i[1] \oplus C_i[1] \oplus [1]_n,\ \ldots,\ M_i[m_i - 1] \oplus C_i[m_i - 1] \oplus [m_i - 1]_n.$$
For the i-th query-answer pair (Ni, Mi, Ci), the input sequence of R2 is
$$A_i \oplus [0]_n,\ A_i \oplus M_i[1] \oplus C_i[1] \oplus [1]_n,\ \ldots,\ A_i \oplus M_i[m_i - 1] \oplus C_i[m_i - 1] \oplus [m_i - 1]_n,$$
where Ai = R1(Ni). Thus, for a bad query-answer pair, there is a collision among the input sequence of R2.

Bad Event. Let i < j. For (Ni, Mi, Ci) and (Nj, Mj, Cj), we say that a bad event occurs if
$$A_i \oplus [0]_n,\ A_i \oplus M_i[1] \oplus C_i[1] \oplus [1]_n,\ \ldots,\ A_i \oplus M_i[m_i - 1] \oplus C_i[m_i - 1] \oplus [m_i - 1]_n$$
and
$$A_j \oplus [0]_n,\ A_j \oplus M_j[1] \oplus C_j[1] \oplus [1]_n,\ \ldots,\ A_j \oplus M_j[m_j - 1] \oplus C_j[m_j - 1] \oplus [m_j - 1]_n$$
have some common element, where Ai = R1(Ni) and Aj = R1(Nj). If a bad event occurs, there is a collision between some input to R2 at the i-th query and some input to R2 at the j-th query. This implies that p8-Encrypt_{R1,R2}(·, ·) does not behave like $(·, ·). Intuitively, we show that if all the query-answer pairs are not bad and the bad event does not occur, then the adversary cannot distinguish between p8-Encrypt_{R1,R2}(·, ·) and $(·, ·). The proof is completed by upper-bounding the probability that some bad query-answer pair occurs or some bad event occurs.

A.1 Discussion of the Previous Work [18]
[18, p. 269, Lemma 7] might be seen to correspond to Lemma 4.1. However, there is a problem with the definition of their encryption scheme. Their encryption scheme, which we call p8[n], is described as follows: the key generation algorithm for p8[n] returns a randomly selected permutation P1 from Perm(n). The encryption algorithm for p8[n] takes P1 as a “key” and uses P1 and P2 instead of EK and EK⊕∆, but it is not defined how P2 is derived from P1. We note that [12, p. 166, Lemma 2] has a similar problem, which is described in Appendix B.1. We also adopt the strong notion of privacy, indistinguishability from random strings [22]. This security notion is strictly stronger than the left-or-right indistinguishability used in [18, p. 269, Lemma 7]. In [13], we present the full security proof for p8[n] in order to achieve this strong security notion and to establish a self-contained security proof.
B Proof Sketch of Lemma 5.1
To prove Lemma 5.1, we define p9-E[n], a variant of p9[n]. The tagging algorithm for p9-E[n] takes only messages whose length is a multiple of n, and it does not perform the final encryption. Specifically, the key generation algorithm for p9-E[n] returns a randomly selected function R1 from Rand(n, n). The tagging algorithm for p9-E[n], p9-E-Tag, takes R1 as a “key” and a message M such that |M| = mn for some m ≥ 1. In pseudocode:

Algorithm p9-E-Tag_{R1}(M):
  Break M into n-bit blocks M[1] ··· M[m]
  Y[0] ← 0^n
  For i = 1 to m do:
    X[i] ← M[i] ⊕ Y[i − 1]
    Y[i] ← R1(X[i])
  Return Y[1] ⊕ ··· ⊕ Y[m]

The verification algorithm is defined in the natural way. Let M1, . . . , Mq be any fixed and distinct bit strings such that |Mi| = mi·n, where mi ≥ 1. Then Lemma 5.1 is proved by deriving an upper bound on the collision probability among the outputs of p9-E-Tag_{R1}(Mi). We show the following lemma.

Lemma B.1. Let m1, . . . , mq, M1, . . . , Mq be as described above. Then
$$\Pr(R_1 \stackrel{R}{\leftarrow} \mathrm{Rand}(n,n) : \exists\, i, j,\ 1 \le i < j \le q,\ \text{p9-E-Tag}_{R_1}(M_i) = \text{p9-E-Tag}_{R_1}(M_j))$$
is at most (σ² + q²)/2^{n+1}, where σ = m1 + ··· + mq.

Given the above lemma, the proof of Lemma 5.1 is completed from the well-known fact that applying a random function to the output of an almost-universal hash function family is a PRF.

B.1 Discussion of the Previous Work [12]
[12, p. 162, Lemma 1] corresponds to our Lemma 5.1, so one might wonder if the relevant portion can be re-used. However, in the proof of [12, p. 162, Lemma 1], there is a flaw in the analysis of Game 5. We use our notation. Let q = 2 in Lemma B.1. Then [12, p. 166] says
$$\Pr(R_1 \stackrel{R}{\leftarrow} \mathrm{Rand}(n,n) : \text{p9-E-Tag}_{R_1}(M_1) = \text{p9-E-Tag}_{R_1}(M_2)) = \frac{1}{2^n},$$
since Y1[1] is a random string in {0, 1}^n, where Y1[1] = R1(M1[1]). However, if M1[1] = M2[1], then we have Y1[1] = Y2[1], where Y2[1] = R1(M2[1]), and their randomness disappears. This part needs to be fixed, which is done in Lemma B.1.
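This failure of randomness is easy to confirm by direct computation. The toy sketch below (ours, not from the paper) implements p9-E-Tag over n = 8 bits and checks that two distinct messages sharing a first block always share Y[1], so that chaining value contributes no randomness to the comparison of the two tags.

```python
import random

def p9_e_tag(R1, blocks):
    """p9-E tagging: CBC chain under a random function R1 (given as a table);
    returns the tag together with the intermediate Y values."""
    y, ys = 0, []
    for m in blocks:
        y = R1[m ^ y]          # X[i] = M[i] xor Y[i-1]; Y[i] = R1(X[i])
        ys.append(y)
    tag = 0
    for v in ys:
        tag ^= v
    return tag, ys

n = 8
rng = random.Random(0)
R1 = [rng.randrange(2 ** n) for _ in range(2 ** n)]   # one function from Rand(n, n)

M1, M2 = [0x11, 0x22], [0x11, 0x33]   # distinct messages with the same first block
_, ys1 = p9_e_tag(R1, M1)
_, ys2 = p9_e_tag(R1, M2)
assert ys1[0] == ys2[0]               # Y1[1] = Y2[1]: this value is common to both
```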
Also, [12, p. 166, Lemma 2] does not hold: there is a problem with the definition of their MAC. Their MAC, which we call p9[n], is described as follows: the key generation algorithm for p9[n] returns a randomly selected permutation P1 from Perm(n). The tagging algorithm for p9[n] takes P1 as a “key”, uses P1 and P2 instead of EK and EK⊕∆, and outputs a full n-bit tag, where P2 ∈ Perm(n) \ {P1} is determined from P1 by some means. The verification algorithm is defined in the natural way. Then [12, p. 159] says that the security of p9[n] does not depend on how P2 is derived from P1, which is not correct. For example, if P2 is chosen as P2 = P1^{-1}, then it is easy to make a forgery. In [13], we present the full security proof for p9[n] in order to avoid presenting a proof covered with patches, and to establish a self-contained security proof.
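The forgery when P2 = P1^{-1} is immediate: for a one-block message M the CBC chain gives Y[1] = P1(M), so the tag is P2(P1(M)) = M, i.e., tags of one-block messages are predictable without a single oracle query. A toy check (ours, n = 8):

```python
import random

n = 8
rng = random.Random(42)
P1 = list(range(2 ** n))
rng.shuffle(P1)                      # a random permutation P1 of {0,1}^n
P2 = [0] * (2 ** n)
for x, y in enumerate(P1):
    P2[y] = x                        # the bad choice P2 = P1^{-1}

def p9_tag(blocks):
    """p9[n] tagging: CBC chain under P1, final encryption under P2."""
    y, acc = 0, 0
    for m in blocks:
        y = P1[m ^ y]
        acc ^= y
    return P2[acc]

# Every one-block message M has tag P2(P1(M)) = M -- a trivial forgery.
assert all(p9_tag([m]) == m for m in range(2 ** n))
```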
Cryptanalysis of a Message Authentication Code due to Cary and Venkatesan Simon R. Blackburn and Kenneth G. Paterson Department of Mathematics Royal Holloway, University of London Egham, Surrey, TW20 0EX, U.K. {simon.blackburn,kenny.paterson}@rhul.ac.uk
Abstract. A cryptanalysis is given of a MAC proposal presented at CRYPTO 2003 by Cary and Venkatesan. A nice feature of the Cary–Venkatesan MAC is that a lower bound on its security can be proved when a certain block cipher is modelled as an ideal cipher. Our attacks find collisions for the MAC and yield MAC forgeries, both faster than a straightforward application of the birthday paradox would suggest. For the suggested parameter sizes (where the MAC is 128 bits long) we give a method to find collisions using about 2^48.5 MAC queries, and to forge MACs using about 2^55 MAC queries. We emphasise that our results do not contradict the lower bounds on security proved by Cary and Venkatesan. Rather, they establish an upper bound on the MAC's security that is substantially lower than one would expect for a 128-bit MAC.

Keywords: Message authentication, MAC, matrix groups, cryptanalysis, birthday paradox.
1 Introduction
This paper is concerned with a proposal for a Message Authentication Code (MAC) presented at CRYPTO 2003 by Cary and Venkatesan [1]. Their idea is to take a MAC construction of Jakubowski and Venkatesan [3] based on linear operations over a finite field, and alter it by replacing finite field operations by operations in the ring of integers modulo some power of 2 (as the latter operations are more efficient on the current generation of processors). Cary and Venkatesan [1] have proved a lower bound on the security of their MAC. This paper presents two attacks on the MAC, and so establishes a corresponding upper bound on the MAC’s security. The first attack shows that an adversary with access to a MAC oracle is able to find collisions of the MAC considerably faster than a straightforward application of the birthday paradox would suggest. For an introduction to MACs and birthday attacks on them, see [4]. The second attack does more: it derives most of the secret key material for the MAC (which enables MACs to be forged). This second attack works by exploiting certain
This author supported by the Nuffield Foundation NUF-NAL 02.
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 446–453, 2004. c International Association for Cryptologic Research 2004
Cryptanalysis of a Message Authentication Code
447
collisions in the MAC; these collisions are found almost as efficiently as in the first attack. Thus the proposal of Cary and Venkatesan [1], while efficient and offering some interesting provable security properties, does not offer the level of security that a MAC of its output length should aspire to. The next section describes the Cary–Venkatesan MAC. Sections 3 and 4 describe our two attacks on this MAC. The concluding section explains how our attacks impact on the practical level of security offered by the MAC when the suggested parameter sizes are used.
2 The Cary–Venkatesan MAC
Let ℓ, k and t be integers. The Cary–Venkatesan MAC operates on blocks consisting of t words x1, x2, . . . , xt, each word being of length ℓ bits. We regard the words xi as ℓ-bit integers. The MAC has a t(ℓ − 1) + k-bit secret key. This key is made up of odd ℓ-bit integers a1, a2, . . . , at together with a k-bit string K. The MAC consists of two parts, a compression function H and a block cipher E. The compression function H takes as input the vector a = (a1, a2, . . . , at) and a block x = x1 x2 . . . xt where xi ∈ {0, 1, . . . , 2^ℓ − 1}; it returns a 4ℓ-bit string h = Ha(x). The block cipher operates on 4ℓ-bit blocks. It takes as input the key K and the output h of the compression function; the cipher returns the 4ℓ-bit value EK(h) and this is the output of the MAC. Cary and Venkatesan allow the block cipher E to be any secure block cipher acting on 4ℓ-bit blocks with a k-bit key. They model E as an ideal cipher and concentrate their efforts on designing an efficient compression function H of the following form. Let A1, A2, . . . , At−1 be fixed 2 × 2 matrices, and let z0 and σ0 be fixed column vectors of length 2; suppose all the entries of these matrices and vectors lie in the ring Z_{2^ℓ} of integers modulo 2^ℓ. (These matrices and vectors are public, and some suggested examples are given in [1].) Vectors v1, v2, . . . , vt ∈ (Z_{2^ℓ})^2 are calculated as follows. Let i ∈ {1, 2, . . . , t} be fixed. Multiply the ℓ-bit integers ai and xi to produce a 2ℓ-bit integer; this product is then broken into two ℓ-bit integers, and the result vi is regarded as an element of (Z_{2^ℓ})^2. The way in which the product ai·xi is split to form vi is not specified in [1]; we assume that a natural choice of vi = (ai·xi mod 2^ℓ, ai·xi div 2^ℓ)^T is used. (Another natural choice would be vi = (ai·xi div 2^ℓ, ai·xi mod 2^ℓ)^T. Our results are unaffected if this choice is used instead.)
The output h of H is defined to be the pair (z, σ), where
$$z = z_0 + v_1 + A_1 v_2 + A_1 A_2 v_3 + \cdots + A_1 A_2 \cdots A_{t-1} v_t$$
and where
$$\sigma = \sigma_0 + v_1 + v_2 + \cdots + v_t.$$
Here all operations are over Z_{2^ℓ}.
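A direct toy implementation of H helps in following the attacks below. The sketch is ours (the paper specifies H mathematically but gives no code); it works over Z_{2^ℓ} for small ℓ, uses the first splitting convention for v_i, and takes the public matrices and vectors as arguments.

```python
def H(a, x, A, z0, s0, l):
    """Toy sketch of the Cary-Venkatesan compression function over Z_{2^l}.

    a: secret odd l-bit key words a_1..a_t; x: message words x_1..x_t;
    A: public 2x2 matrices A_1..A_{t-1}; z0, s0: public 2-vectors.
    Returns h = (z, sigma)."""
    mod, t = 1 << l, len(x)
    # v_i = (a_i*x_i mod 2^l, a_i*x_i div 2^l), an element of (Z_{2^l})^2
    v = [((a[i] * x[i]) % mod, (a[i] * x[i]) >> l) for i in range(t)]
    z, s = list(z0), list(s0)
    P = [[1, 0], [0, 1]]                 # running product A_1 * ... * A_{i-1}
    for i in range(t):
        # z += P * v_i ;  sigma += v_i
        z[0] = (z[0] + P[0][0] * v[i][0] + P[0][1] * v[i][1]) % mod
        z[1] = (z[1] + P[1][0] * v[i][0] + P[1][1] * v[i][1]) % mod
        s[0] = (s[0] + v[i][0]) % mod
        s[1] = (s[1] + v[i][1]) % mod
        if i < t - 1:                    # extend the product: P <- P * A_{i+1}
            M = A[i]
            P = [[(P[r][0] * M[0][c] + P[r][1] * M[1][c]) % mod
                  for c in range(2)] for r in range(2)]
    return tuple(z), tuple(s)

# tiny sanity check: with t = 2, A_1 = I and zero offsets, z = sigma = v_1 + v_2
I2 = [[1, 0], [0, 1]]
assert H([1, 1], [3, 5], [I2], (0, 0), (0, 0), 4) == ((8, 0), (8, 0))
```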
448
Simon R. Blackburn and Kenneth G. Paterson
Cary and Venkatesan propose two variants of their MAC: a way of chaining the compression function so that it can compress more than one block into 4ℓ bits (by making the ‘initial values’ z0 and σ0 used in the next block depend on z and σ above), and a method for doubling the length of the output of the compression function (by computing the compression function above twice on the same block, using different keys, and then concatenating the outputs). Our attacks below can be adapted to apply to these variants as well, although we will not discuss the straightforward modifications that are needed.
3 The First Attack
For MACs such as the one considered here, which consist of a relatively weak keyed compression function followed by a block cipher encryption, it is generally assumed that it is computationally infeasible to invert the block cipher E without knowledge of the secret key K. (If the block cipher can be inverted efficiently, the output of the compression function H is available to the cryptanalyst. The keys used in the compression function can then usually be derived from MACs of a few chosen messages. Once these keys are known, MACs of a wide variety of messages may be forged. This shows that, in practice, the security level offered by a MAC of this type cannot be greater than the length of the block cipher key. This is certainly the case with the proposal of [1].) The final cipher E is often modelled as an ideal cipher, namely a set of random permutations indexed by the key K. An adversary has access to an oracle that adds MACs to messages; the adversary aims to generate a valid MAC for any message that has not been sent as a query to the oracle. In this ideal cipher model it is intuitively clear (and indeed formally provable) that finding two messages that are compressed by H to the same value h (finding a collision) is a prerequisite to breaking the MAC. Since the block length of the cipher E in the proposal of [1] is 4ℓ bits, the birthday paradox implies that a collision will be found with reasonable probability after approximately 2^{2ℓ} oracle queries on random messages. A good scheme in this model should therefore have the property that it is impossible for an adversary to produce collisions with reasonable probability unless the number of oracle queries is close to this upper bound. However, we show that when the MAC proposed in [1] is used, an adversary can produce collisions using significantly fewer than 2^{2ℓ} messages.
The basic idea of our collision finding attack is to construct a large set of messages with the property that the compression function H maps each message into a small subset of inputs to the block cipher (irrespective of the key a1, a2, . . . , at used). Because the block cipher is a permutation, there is a collision in the MAC output if and only if there is a collision at the input to the block cipher, i.e. at the output of the compression function H. Let r be an integer that is as large as possible subject to the conditions that 0 ≤ r < ℓ and that 2^{t(ℓ−r)} is at least n = √(2^{4ℓ−r}). (For most choices of parameters, r = ℓ − 1 or r = ℓ − 2 will suffice.)
There are 2^{t(ℓ−r)} messages x = x1 x2 · · · xt with the property that 2^r divides xi for all i ∈ {1, 2, . . . , t}. We denote this set of messages by X and let Y be a subset of X consisting of n such messages: a subset of this size exists, by our choice of r. We obtain the MACs of all the messages in Y from the MAC oracle. Define B to be the set of all pairs (z, σ) ∈ ((Z_{2^ℓ})^2)^2 such that the first component of σ − σ0 is divisible by 2^r. Our condition on the elements xi for messages in X implies that the first component of each vector vi is divisible by 2^r, and so the same is true for the vector σ − σ0. Hence the image of X under H lies in B, whatever the value of the secret key a1, a2, . . . , at. Now, |B| = 2^{4ℓ−r} < n². Since we have requested the MACs of the n messages in the set Y ⊆ X, the birthday paradox implies that we will find a collision with reasonable probability (in fact, with probability about 0.63). Note that n is considerably less than 2^{2ℓ}. Indeed, when r = ℓ − 1 or r = ℓ − 2, we have that n is approximately 2^{3ℓ/2}. So we have found collisions for the function H, and hence for the MAC, considerably faster than a straightforward use of the birthday paradox would imply. We have assumed in this analysis that the image of X under H is uniformly distributed in B. A non-uniform distribution only enhances the probability of collisions.
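The structural fact driving the attack, that H maps X into B regardless of the key, can be checked on toy parameters. The sketch below is ours (ℓ = 8, t = 4, r = 4, arbitrary illustrative public constants): for messages whose words are divisible by 2^r, the first component of σ − σ0 is always divisible by 2^r, for every randomly drawn odd key.

```python
import random

L, T, R = 8, 4, 4                  # toy parameters: l = 8 bits, t = 4 words, r = 4
MOD = 1 << L

def H(a, x, A, z0, s0):
    """Compact toy version of the compression function H (same conventions as
    in Sect. 2, written out here so this snippet is self-contained)."""
    v = [((a[i] * x[i]) % MOD, (a[i] * x[i]) >> L) for i in range(T)]
    z, s, P = list(z0), list(s0), [[1, 0], [0, 1]]
    for i in range(T):
        z[0] = (z[0] + P[0][0] * v[i][0] + P[0][1] * v[i][1]) % MOD
        z[1] = (z[1] + P[1][0] * v[i][0] + P[1][1] * v[i][1]) % MOD
        s[0] = (s[0] + v[i][0]) % MOD
        s[1] = (s[1] + v[i][1]) % MOD
        if i < T - 1:
            M = A[i]
            P = [[(P[r][0] * M[0][c] + P[r][1] * M[1][c]) % MOD
                  for c in range(2)] for r in range(2)]
    return tuple(z), tuple(s)

rng = random.Random(0)
A = [[[rng.randrange(MOD) for _ in range(2)] for _ in range(2)] for _ in range(T - 1)]
z0 = (rng.randrange(MOD), rng.randrange(MOD))
s0 = (rng.randrange(MOD), rng.randrange(MOD))
key = [rng.randrange(MOD) | 1 for _ in range(T)]   # secret odd key words a_i

# Every message in X (all words divisible by 2^R) maps into the small set B:
# the first component of sigma - sigma0 is divisible by 2^R, whatever the key.
for _ in range(200):
    x = [rng.randrange(1 << (L - R)) << R for _ in range(T)]
    _, s = H(key, x, A, z0, s0)
    assert (s[0] - s0[0]) % (1 << R) == 0
```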
4 The Second Attack
Our first attack found collisions efficiently. However, it is not clear how knowledge of these collisions could be used to forge MACs. We now present a second method for finding collisions, almost as efficient as the method above, with the extra feature that collisions may be used to find the key words ai , and hence to forge MACs. We begin by choosing two integer parameters d and s, which affect the efficiency and probability of success of the attack. The value of d is the number of collisions we need to find in the compression function, and is chosen as follows. Let V be a Z2 -vector space of dimension t − 1. Then d is chosen so that the probability of a set of d randomly chosen vectors from V forming a spanning set is large (say at least 0.5). A choice of d = t or d = t + 1 will suffice in most situations. The parameter s relates to the size of a set Y which is analogous to ts the set Y in our first attack. We choose s to be a positive integer such that 2 3+s is just greater than w = 2dt(2 ). For most parameter sets, this means that s is small (s = 2 or s = 3, say). Our choice of the values of d and s will become clear in the description of the attack below. The attack proceeds in 3 stages. In stage 1, we find many collisions in the compression function H by requesting MACs of messages of a special form. In stage 2, we use data gathered from these collisions to form a system of linear equations in the unknown key words ai . Solving these equations and performing a moderate amount of additional computation allows us to find the key a for the compression function H. In stage 3, we use knowledge of a to quickly find
a collision in H without querying the MAC. This immediately leads to a MAC forgery.

Stage 1: We ask for the MACs of a subset Y taken from the set of messages X = {x1 x2 · · · xt | 0 ≤ xi < 2^s, ∀i ∈ {1, 2, . . . , t}}. There are 2^{ts} ≥ w messages in X and we take |Y| = w. Define B′ to be the set of all pairs (z, σ) ∈ ((Z_{2^ℓ})^2)^2 such that the second component of σ − σ0 lies between 0 and t·2^s. Our condition on the elements xi for messages in X implies that xi·ai < 2^{s+ℓ} for all i. So the second component of vi is less than 2^s and hence the second component of σ − σ0 is less than t·2^s. Thus the image of X under H is contained in B′, whatever the value of the secret key a1, a2, . . . , at. Notice that B′ is of size u = t·2^{3ℓ+s} and Y ⊂ X is of size w = √(2dt·2^{3ℓ+s}). Our choice of parameters implies that, by the birthday paradox, we will find collisions in the MAC function, and hence in the compression function, for the set Y. In fact, we have chosen d, s and w so that |Y| is a little larger than is needed for the birthday paradox to apply. This is so that we are likely to find many collisions. Indeed, applying results of [2], we have that for large w and u, the frequency distribution Q(x) of the number of collisions x can be approximated by a Poisson distribution Pλ(x) with parameter λ = w²/2u. The error in approximating Q(x) by Pλ(x) can be bounded using results of [2]. We have:
$$|Q(x) - P_\lambda(x)| \le \frac{5}{w}\,x^2 + \frac{3w}{u}\,x + \frac{w^2}{3u^2}. \tag{1}$$
For our choice of parameters and for x ≤ d, the error term in eqn. (1) can be bounded by 8d^{3/2}/u^{1/2}. In our situation, d will be much smaller than u. Thus the approximation by a Poisson distribution will be excellent when x ≤ d. For our choice of parameters, we have λ = d. So in the case of interest to us (where d is approximately equal to t and so is reasonably large), the Poisson distribution has mean d, and median approximately d.
Putting all of this together, we see that the probability that we find d or more collisions in Y is approximately equal to 0.5. The above analysis can of course be refined, but suffices for our purposes.

Stage 2: We now assume that at least d collisions of the above type have been found. We proceed by examining the first component of σ for each of these collisions. Each gives an equation of the form
$$x_1^{(j)} a_1 + x_2^{(j)} a_2 + \cdots + x_t^{(j)} a_t = x_1'^{(j)} a_1 + x_2'^{(j)} a_2 + \cdots + x_t'^{(j)} a_t \bmod 2^\ell,$$
for j ∈ {1, 2, . . . , d}. Writing $y_i^{(j)} = x_i^{(j)} - x_i'^{(j)}$, we obtain a system of d equations in t unknowns $a_1, \ldots, a_t \in \mathbb{Z}_{2^\ell}$:
$$y_1^{(j)} a_1 + y_2^{(j)} a_2 + \cdots + y_t^{(j)} a_t = 0 \bmod 2^\ell, \qquad 1 \le j \le d. \tag{2}$$
Define vectors $y^{(j)} \in (\mathbb{Z}_{2^\ell})^t$ by $y^{(j)} = (y_1^{(j)}, y_2^{(j)}, \ldots, y_t^{(j)})$. The number of solutions to the system (2) depends on the linear independence properties of the vectors $y^{(j)}$ considered modulo 2. Let $z^{(j)}$ denote $y^{(j)} \bmod 2$ and let V denote
the dimension-(t − 1) subspace of Z2^t that consists of all vectors of even parity. Because the ai are odd and the equations (2) hold, we have that z^(j) ∈ V for 1 ≤ j ≤ d. It is then elementary to show that the system (2) has a unique solution up to a Z_{2^ℓ} scalar multiple if and only if the d vectors z^(j) span the space V. The probability that the d vectors z^(j) span V is at least 1 − 1/2^{d−t+1}, assuming the vectors to be random. (To see this, notice that the vectors fail to span V if and only if they all lie in some subspace U of co-dimension 1 in V. There are exactly 2^{t−1} − 1 such subspaces U. Assuming the vectors z^(j) to be randomly distributed in V, the probability that they all lie in any given U is equal to 2^{−d}. Hence the probability that the vectors do not span V is at most (2^{t−1} − 1) · 2^{−d} ≤ 1/2^{d−t+1}.) This probability is close to 1 as soon as d is slightly greater than t − 1. Given that the vectors z^(j) do span V, a standard Gaussian elimination procedure over Z_{2^ℓ} can be used to produce b1, b2, . . . , bt ∈ Z_{2^ℓ} such that there exists an odd constant c with the property that ai = c·bi mod 2^ℓ for all i ∈ {1, 2, . . . , t}. Stage 1 of our attack has given us d pairs of messages that collide under the compression function. To find c, we simply try each of the 2^{ℓ−1} possibilities in turn and check whether the compression function with key ai = c·bi produces collisions for these pairs of messages. It is highly likely that a single value of c will produce all the correct collisions; this value will be the correct value of c. Thus we have recovered the value of the key words ai.

Stage 3: Finally, we produce a MAC forgery as follows. We search for collisions in the compression function as in Sect. 3; however, since we now know the key to the compression function, we do not need to query the MAC oracle to obtain these collisions. After about 2^{3ℓ/2} trials, we find a collision in the compression function: H(x) = H(x′) for distinct messages x and x′.
We then query the MAC oracle on the message x. The resulting MAC will be valid for the message x′, and so we have forged a MAC as required. To summarise, we have forged a MAC after making w + 1 = √(2dt·2^{3ℓ+s}) + 1 oracle queries, and a comparatively small amount of additional effort (which mainly consists of storing the oracle outputs, together with computing the compression function about 2^{3ℓ/2} times). The probability that our attack works is approximately (1 − 1/2^{d−t+1})/2. The probability of success can be made arbitrarily close to 1, firstly by increasing the value of d and secondly by taking a larger number of MAC queries to increase the probability that d pairs of collisions will result. We omit the routine details of these enhancements.
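The linear-algebra core of Stage 2 — Gaussian elimination over Z_{2^ℓ} using odd (hence invertible) pivots, recovering the key up to an odd scalar — can be prototyped directly. In this sketch (ours, not from the paper) we synthesise the difference vectors y^(j) from a known odd key instead of harvesting real collisions, which isolates the equation-solving step; all names and parameter values are illustrative.

```python
import random

def recover_key_direction(Y, l):
    """Given difference vectors y^(j) with y^(j) . a == 0 (mod 2^l), try to
    recover b such that a = c*b mod 2^l for some odd c, by Gaussian
    elimination over Z_{2^l} with odd (invertible) pivots.  Returns b, or
    None if the reduction does not leave exactly one free variable."""
    mod, t = 1 << l, len(Y[0])
    M = [row[:] for row in Y]
    pivots = {}                              # column -> pivot row index
    r = 0
    for c in range(t):
        pr = next((i for i in range(r, len(M)) if M[i][c] & 1), None)
        if pr is None:
            continue                         # no odd entry: column stays free
        M[r], M[pr] = M[pr], M[r]
        inv = pow(M[r][c], -1, mod)          # odd => a unit mod 2^l
        M[r] = [(e * inv) % mod for e in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(M[i][k] - f * M[r][k]) % mod for k in range(t)]
        pivots[c] = r
        r += 1
    free = [c for c in range(t) if c not in pivots]
    if len(free) != 1:
        return None
    b = [0] * t
    b[free[0]] = 1                           # set the single free variable to 1
    for c, row in pivots.items():
        b[c] = (-M[row][free[0]]) % mod      # back-substitute the pivot variables
    return b

# toy check: build equations from a known odd key and recover it up to a scalar
l, t, d = 8, 6, 10
mod = 1 << l
successes = 0
for seed in range(20):
    rng = random.Random(seed)
    a = [rng.randrange(mod) | 1 for _ in range(t)]
    Y = []
    for _ in range(d):
        y = [rng.randrange(mod) for _ in range(t - 1)]
        # choose the last entry so that y . a == 0 mod 2^l (a_t is odd)
        last = (-sum(yi * ai for yi, ai in zip(y, a)) * pow(a[-1], -1, mod)) % mod
        Y.append(y + [last])
    b = recover_key_direction(Y, l)
    if b is not None:
        i = next(k for k in range(t) if b[k] & 1)
        c = (a[i] * pow(b[i], -1, mod)) % mod
        assert all((c * bk) % mod == ak for bk, ak in zip(b, a))
        successes += 1
assert successes >= 1
```

The solver returns None exactly when the vectors fail to have mod-2 rank t − 1, matching the spanning condition on the z^(j) discussed above.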
5 Consequences for Suggested Parameter Sizes
In [1, Sect. 5], Cary and Venkatesan give details of an implementation of their MAC for the parameters ℓ = 32 and t = 50. They are able to prove that the resulting MAC offers 54 bits of security, which can be interpreted as meaning that collisions for the MAC should not be found until after at least 2^27 MAC queries have been made. Our attack in Sect. 3 shows that, for these parameters, MAC collisions can be found using about 2^48.5 MAC queries. (The attack will set r = 31; then the space B will be of size 2^97 and 2^48.5 MAC queries will be needed to obtain a collision with probability 0.63.) Taking d = t = 50, our attack in Sect. 4 uses s = 2 and finds MAC forgeries with probability about 0.25 using approximately 2^55 MAC queries. We have conducted experiments on cut-down versions of the MAC. Using t = 50 and 8 ≤ ℓ ≤ 14, collisions in the compression function occurred as frequently as our analysis predicts. While we have not implemented our attacks for ℓ = 32, we see no reason why our attacks should not scale as predicted. For both of our attacks, the complexity is significantly less than the 2^64 queries implied by a standard application of the birthday paradox, though a good deal greater than the level of security that has been established for the MAC. It seems fair to say that the MAC proposal of [1] does not offer the security levels that one would expect of a strong MAC algorithm with a 128-bit output. We expect that more sophisticated attacks than the ones presented here may reduce the complexity of finding and exploiting collisions further.
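The stated query counts follow from the formulas of Sects. 3 and 4 and can be re-derived in a few lines (our arithmetic check, not code from the paper):

```python
import math

l, t = 32, 50

# First attack (Sect. 3): r = l - 1 = 31 gives |B| = 2^(4l - r) = 2^97,
# so a birthday collision needs about sqrt(|B|) = 2^48.5 MAC queries.
r = l - 1
assert 4 * l - r == 97
assert math.isclose(0.5 * (4 * l - r), 48.5)

# Second attack (Sect. 4): d = t = 50 and s = 2 give a query count of
# w = sqrt(2*d*t*2^(3l+s)), which is close to 2^55.
d, s = t, 2
log2_w = 0.5 * (math.log2(2 * d * t) + 3 * l + s)
assert 55 < log2_w < 55.2      # log2(w) is about 55.14
```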
(The security level of this construction can be no greater than the 54 bits proved for the original MAC.) Note that the attacks presented in this paper do not carry over to this new scheme, since the overwhelming majority of collisions in the MAC will be caused by collisions in the final stage rather than by collisions in the compression function. Of course, a proof of security for this modified MAC can no longer rely on modelling a block cipher as a random permutation. Instead, this assumption would need to be replaced by an assumption concerning the collision properties of SHA-1 (or perhaps by modelling SHA-1 as a random function).
Acknowledgements The authors would like to thank Sean Murphy for his help with background to the birthday paradox arguments in Sect. 4, Bart Preneel for bringing reference [2] to our attention, and Chris Mitchell for his comments on an earlier draft of this paper. The authors would also like to thank the referees of the paper for their constructive comments and suggestions.
Cryptanalysis of a Message Authentication Code
References
[1] M. Cary and R. Venkatesan, “A message authentication code based on unimodular matrix groups”, in D. Boneh, editor, Advances in Cryptology – Proc. CRYPTO 2003, Lecture Notes in Computer Science Volume 2729, Springer, Berlin, 2003, pp. 500–512.
[2] M. Girault, R. Cohen and M. Campana, “A generalized birthday attack”, in C. G. Günther, editor, Advances in Cryptology – Proc. EUROCRYPT ’88, Lecture Notes in Computer Science Volume 330, Springer, Berlin, 1988, pp. 129–156.
[3] M. H. Jakubowski and R. Venkatesan, “The chain and sum primitive and its applications to MACs and stream ciphers”, in K. Nyberg, editor, Advances in Cryptology – Proc. EUROCRYPT ’98, Lecture Notes in Computer Science Volume 1403, Springer, Berlin, 1998, pp. 281–293.
[4] A. Menezes, P. C. van Oorschot and S. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, 1997.
Fast Software-Based Attacks on SecurID
Scott Contini¹ and Yiqun Lisa Yin²
¹ Macquarie University, Computing Department, NSW 2109, Australia, [email protected]
² Princeton University, EE Department, Princeton, NJ 08540, USA, [email protected]
Abstract. SecurID is a widely used hardware token for strengthening authentication in a corporate environment. Recently, Biryukov, Lano, and Preneel presented an attack on the alleged SecurID hash function [1]. They showed that vanishing differentials – collisions of the hash function – occur quite frequently, and that such differentials allow an attacker to recover the secret key in the token much faster than exhaustive search. Based on simulation results, they estimated that the running time of their attack would be about 2^48 full hash operations when using only a single 2-bit vanishing differential. In this paper, we present techniques to improve the attack of [1]. Our theoretical analysis and implementation experiments show that the running time of our improved attack is about 2^45 hash operations. We then investigate the use of extra information that an attacker would typically have: multiple vanishing differentials or knowledge that other vanishing differentials do not occur in a nearby time period. When using the extra information, we believe that key recovery can always be accomplished within about 2^40 hash operations.
1 Introduction
The SecurID, developed by RSA Security, is a hardware token used for strengthening authentication when logging in to remote systems, since passwords by themselves tend to be easily guessable and subject to dictionary attacks. The SecurID adds an “extra factor” of authentication: users must not only prove themselves by getting their password correct, but also by demonstrating that they have the SecurID token assigned to them. The latter is done by entering the 6- or 8-digit code that is being displayed on the token at the time of login. Each token has within it a 64-bit secret key and an internal clock. Every minute, or every half-minute in some tokens, the secret key and the current time are sent through a cryptographic hash function. The output of the hash function determines the next two authenticator codes, which are displayed on the LCD
B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 454–471, 2004. c International Association for Cryptologic Research 2004
screen. The secret key is also known to the “ACE/server”, so that the same authenticator can independently be computed and verified at the remote end. If ever a user loses their token, they must report it so that the current token can be deactivated and replaced with a new one. Thus, the user bears some responsibility in maintaining the security of the system. On the other hand, if the user were to temporarily leave his token in a place where it could be observed by others and then later recover it, then it should not be the case that the security of the device could be entirely breached, assuming the device is well-designed. The scenario just described was considered in a recent publication by Biryukov, Lano, and Preneel [1], where they showed that the hash function that is alleged to be used by SecurID [4] (ASHF) has weak properties that could allow one to find the key much faster than exhaustive search. The attack they describe requires recording all outputs of the SecurID using a PC camera with OCR software, and then later searching the outputs for indication of a vanishing differential – two closely related input times that result in the same output hash. If one is discovered, the attacker then has a good chance of finding the internal secret key using a search algorithm that they estimated to be equivalent to 2^48 hash function operations. On a 2.4 GHz PC, 2^48 hash operations take about 111 years.¹ It would require over 1300 of these PCs to find the key in a month. In this paper, we present three techniques to significantly speed up the filtering, which is the bottleneck of their attack. Our theoretical analysis and implementation experiments show that the time complexity can be reduced to about 2^45 hash operations when using only a single vanishing differential. We then investigate the use of extra information that an attacker would ordinarily have, in order to speed up the attack further.
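As a rough sanity check of these figures (the throughput of roughly 8 × 10^4 full hash operations per second on a 2.4 GHz PC is a rate we infer from the quoted numbers, not a measurement of ours):

```python
# Back-of-the-envelope check of the "111 years" and "over 1300 PCs" claims.
HASHES_PER_SEC = 8.0e4              # assumed throughput of an optimised implementation
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years = 2**48 / HASHES_PER_SEC / SECONDS_PER_YEAR
print(round(years))                 # about 111 years for 2^48 hash operations

pcs_for_one_month = years * 12      # spread the work over one month
print(round(pcs_for_one_month))     # over 1300 PCs
```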
This information consists of either multiple vanishing differentials, or knowledge that no other vanishing differentials occur in a nearby time period of the observed one. In either case, the running time can be reduced significantly. Our preliminary analysis suggests that after a vanishing differential is observed, the attacker would nearly always be able to perform the key search algorithm in 2^40 hash operations or less. On a typical PC, this can be done in about 5 months, making the computing power requirements for the search attainable by almost any individual. The success probability of all attacks (including [1]) depends upon how long the attacker must wait for a vanishing differential to occur. Simulations have shown that in any one-week period, 1% of the SecurID cards will have a vanishing differential; in any one-year period, 35% of the tokens will have a vanishing differential. According to these statistics, we mention two realistic scenarios in which the token could be compromised. In the first scenario, a user may be on vacation for one week and have left his token behind in a place where others could observe it, in which case there is a small but definitely non-negligible chance that a collision would happen. In the second scenario, success is much more likely. Since SecurID tokens are very expensive, they are often reassigned to new users when a previous owner leaves a company [5]. This is a bad idea,
¹ Requires some optimisations to Wiener’s code, such as re-ordering bytes to eliminate bswaps.
since the original user would have a high chance of being able to find the internal key, assuming he recorded many of the outputs while it was in his possession. In light of our new results, token reassignment becomes a very serious risk.
2 The SecurID Hash Function
We provide a high level description of the alleged SecurID hash function, following the same notation as in [1] wherever possible. More detailed descriptions can be found in [1, 4]. The function can be modeled as a keyed hash function y = H(k, t), where k is a 64-bit secret key stored on the SecurID token, t is a 24-bit time obtained from the clock every 30 or 60 seconds, and y is two 6- or 8-digit codes. The function consists of the following steps:
– an expansion function that expands t into a 64-bit “plaintext”,
– an initial key-dependent permutation,
– four key-dependent rounds, each of which has 64 subrounds,
– an exclusive-or of the output of each round onto the key,
– a final key-dependent permutation (same algorithm as the initial one), and
– a key-dependent conversion from hexadecimal to decimal.
Throughout the paper, we use the following notation to represent bits, nibbles, and bytes in a word: a 64-bit word b consists of bytes B0, ..., B7, nibbles B0, ..., B15 (distinguished from bytes by context), and bits b0 b1 ... b63. The nibble B0 corresponds to the most significant nibble of byte 0, and the bit b0 corresponds to the most significant bit. The other values are as one would expect. For our analysis, only the time expansion, key-dependent permutation, and key-dependent rounds are of interest. In the next three sections, we describe them in more detail.
2.1 Time Expansion
The time t is a 24-bit number representing twice the number of minutes since January 1, 1986 GMT. So the least significant bit is always 0, and if the token outputs codes every minute, then the expansion function will clear the 2nd least significant bit as well. Let the result be represented by the bytes T0 T1 T2, where T0 is the most significant. The expansion is of the form T0 T1 T2 T2 T0 T1 T2 T2. Note that the least significant byte is replicated 4 times, and the other two bytes are replicated 2 times each.
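The expansion step is small enough to transcribe directly (the function name is ours):

```python
def expand_time(t):
    """Expand the 24-bit time t into the 64-bit 'plaintext' T0 T1 T2 T2 T0 T1 T2 T2."""
    t &= 0xFFFFFE   # least significant bit is always 0
    # (tokens that output codes every minute would also clear the next bit: t &= 0xFFFFFC)
    T0, T1, T2 = (t >> 16) & 0xFF, (t >> 8) & 0xFF, t & 0xFF
    return bytes([T0, T1, T2, T2, T0, T1, T2, T2])
```

For example, the time 0x1c3ba8 used in Section 6 expands to the bytes 1c 3b a8 a8 1c 3b a8 a8.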
2.2 Key-Dependent Permutation
We give a more insightful description of how the ASHF key-dependent permutation really works. The original code, obtained by Wiener [4] (apparently by reverse engineering the ACE/server code), is quite cryptic. Our description is different, but produces an equivalent output to his code.
The key-dependent permutation uses the key nibbles K0 . . . K15 in order to select bits of the data for output into a permuted data array. The data bits are taken 4 at a time, copied to the permuted data array from right to left (i.e. higher indexes are filled in first), and then removed from the original data array. Every time 4 bits are removed from the original data array, its size shrinks by 4. Indexes within that array are always modulo the number of bits remaining. A pointer m is first initialised to the index K0. The first 4 bits that are taken are those right before the index m. For example, if K0 is 0x2, then bits 62, 63, 0, and 1 are taken. As these bits are removed from the array, the index m is adjusted accordingly so that it continues to point at the same bit it pointed to before the 4 bits were removed. The pointer m is then increased by a value of K1, and the 4 bits prior to it are taken, as before. The process is repeated until all bits have been taken. Note that once the algorithm gets down to the final 3 or fewer key and data nibbles, the number of data bits remaining is at most 12, yet the number of choices for each key nibble is 16. Hence, multiple keys will result in the same permutation, which we call “redundancy of the key with respect to the permutation.” This was used in the attack [2], and to a lesser extent in [1].
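Based on our reading of this description, the permutation can be sketched as follows (an illustrative reimplementation in the notation above, not Wiener's code; the function name is ours, and bit index 0 is the most significant bit b0):

```python
def keyed_permutation(data_bits, key_nibbles):
    """Apply the key-dependent permutation to a list of 64 data bits.

    data_bits: the 64 data bits (index 0 = most significant bit b0);
    key_nibbles: the 16 key nibbles K0..K15.
    """
    src = list(data_bits)
    out = []
    m = 0
    for k in key_nibbles:
        m = (m + k) % len(src)          # first iteration: m = K0
        # take the 4 bits just before index m, wrapping around
        taken_idx = [(m - 4 + i) % len(src) for i in range(4)]
        out = [src[i] for i in taken_idx] + out   # output is filled right to left
        # remove the taken bits; adjust m so it keeps pointing at the same bit
        for idx in sorted(taken_idx, reverse=True):
            del src[idx]
            if idx < m:
                m -= 1
        m %= max(len(src), 1)
    return out
```

With K0 = 0x2, the four rightmost output positions receive bits 62, 63, 0, and 1, matching the example above.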
2.3 Key-Dependent Rounds
Each of the four key-dependent rounds takes as inputs a 64-bit key k and a 64-bit value b0, and outputs a 64-bit value b64. The key k is then exclusive-ored with the output b64 to produce the new key to be used in the next round. One round consists of 64 subrounds. For i = 1, ..., 64, subround i transforms b^(i−1) into b^i using a single key bit k_(i−1). Depending on whether the key bit k_(i−1) is equal to the most significant bit b0 of b^(i−1), the value b^(i−1) is transformed according to one of two different functions, denoted by R and S. The details of R and S are not so important for our research, with the exception of two properties:
1. Both the R and the S functions are byte-oriented, that is, they update each of the eight bytes in b^i separately. After the update, only bytes B0 and B4 are modified, and the other six bytes remain the same.
2. The way R and S are used causes the hash function to have easy-to-find collisions after a small number of subrounds within the first round. At the end of each subround, all the bits are rotated one bit position to the left. So, up to subround N ≤ 25 of the first round, only 2N + 14 data bits have been involved in the computation. This property is used in the Biryukov, Lano, and Preneel attack.
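The round structure can be sketched as follows, with R and S left abstract (they are passed in as stubs here, since their details are not needed for our analysis; which of the two functions corresponds to which outcome of the key-bit test is also immaterial for this sketch):

```python
MASK64 = (1 << 64) - 1

def key_dependent_round(key_bits, b, R, S):
    """One key-dependent round: 64 subrounds driven by the 64 key bits.

    key_bits: list of 64 key bits; b: 64-bit state; R, S: callables mapping
    a 64-bit state to a 64-bit state (the real functions touch only bytes
    B0 and B4).  Returns b64; the caller would XOR it into the key.
    """
    for i in range(64):
        msb = (b >> 63) & 1                     # b0, the most significant bit
        b = R(b) if key_bits[i] == msb else S(b)
        b = ((b << 1) | (b >> 63)) & MASK64     # rotate all 64 bits left by one
    return b
```

With identity stubs for R and S, the 64 single-bit left rotations return the state to its starting value, which gives a quick check of the rotation bookkeeping.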
3 The Attack of Biryukov, Lano, and Preneel
The attack of Biryukov, Lano, and Preneel [1] can determine the full 64-bit secret key when given a single collision of the hash function. Suppose that two input times t and t′ get expanded and permuted to become 64-bit words b and b′, and
the two words collide in subround N of the first round. The collision from the pair (t, t′) is called a vanishing differential. In their key recovery attack, the attacker first guesses the subround N, and then uses a filtering algorithm for each N to search the set of candidate keys that make such a vanishing differential possible. According to their simulations, one only needs to do up to N = 12 to have a 50% chance of finding the key.² A summary of their description for N = 1 is given below. For simplicity, assume that a 2-bit vanishing differential is used, though this need not be the case.
A one-time cost precomputation table is needed before the filtering starts. The table contains entries of the form (k0, B0, B4, B0′, B4′), where k0 represents a key bit, (B0, B4) represent data bytes of b after the initial keyed permutation, and (B0′, B4′) represent data bytes of b′ after the permutation. The exact entries in the table are those where (B0, B4) differs from (B0′, B4′) in exactly 2 bits, known as the “difference bits,” and for which a vanishing differential occurs during the first subround. Since none of the other key bits or data bytes are involved in the first subround, whether a vanishing differential can happen or not for N = 1 is completely characterised by this table. For each entry in the table, the filtering proceeds in two phases, each of which contains two steps.
– First Phase. (process the first half of the key bits)
• First Step. Guess key bits k1, ..., k27. Together with k0, 28 key bits are set, which determines 28 bits of b and b′ after the initial key-dependent permutation. Since these bits overlap with the entries in the table in nibbles B9 and B9′, a key value that does not produce the correct nibbles for both b and b′ is filtered out.
• Second Step. Continue to guess key bits k28, ..., k31. Filtering is done using overlaps in nibbles B8 and B8′.
– Second Phase. (process the second half of the key bits)
• First Step. Continue to guess key bits k32, ..., k59. Filtering is done using overlaps in nibbles B1 and B1′.
• Second Step. Continue to guess key bits k60, ..., k63. Filtering is done using overlaps in nibbles B0 and B0′.
Finally, each candidate key that passes the filtering is tested by performing a full hash function to see if it is the correct key. For general N, the two phases of filtering each involve ⌈(7+N)/4⌉ data nibbles, so the phases each have ⌈(7+N)/4⌉ steps.
² Our own simulations suggest that one needs to search up to N = 16. The discrepancy is due to differences in the way the attack is viewed, which we elaborate on in Section 7.1. For larger values of N, the cost of the precomputation stage becomes prohibitive.
4 Analysis of the Biryukov, Lano, and Preneel Attack
Biryukov, Lano, and Preneel estimated the time complexity of their attack through simulation. They provided results for N = 1: step 1 of phase 1 reduced the number of possibilities to 2^27, step 2 of phase 1 further reduced the count to 2^25, step 1 of phase 2 increased the count to 2^45, and step 2 of phase 2 resulted in 2^41 true candidates. For larger values of N, they expect that the complexity of the attack would be lower due to stronger filtering. Here we analyse their algorithm, giving some mathematical justification for the simulation results they observed and also showing that their conjecture of the filtering improving for larger N appears to be correct. In our analysis, we sometimes treat probabilities as if they are independent, which is not always true, but it is assumed that this provides a reasonable approximation. Some properties of the precomputed tables are used in the analysis. For a given value of N, the table entries are of the following form:
– legal values for the key bits in indices 0, . . . , N − 1,
– legal values for the plaintext pairs after the initial permutation in bit indices 32, 33, . . . , 38+N, which we label as (W4, W4′) (we use the subscript 4 because the words begin at byte B4), and
– legal values for the plaintext pairs after the initial permutation in bit indices 0, 1, . . . , 6+N, which we label as (W0, W0′) (the words begin at byte B0).
Although this is one of our three main filtering speedups, we apply it to the analysis of the original [1] algorithm in order to keep things as clean as possible.
Analysis of Final Number of Candidates: Analysing the final step is equivalent to determining the true number of candidates that need to be tested with the full SecurID hash function. The expected number of true candidates can easily be determined, since anything that matches an entry in the precomputed table will result in a vanishing differential. In other words, the entries in the table are not only a necessary set of cases for a vanishing differential to occur, but also sufficient. For each entry in the precomputed table, we have:
– Only a portion of about 1/C(64, 2) of the 2^64 keys will permute the 2 difference bits into the locations corresponding to what is in that table entry (here C(n, r) denotes the binomial coefficient “n choose r”).
– With probability 1/2, the value of the two difference bits will match those in the table (recall, the 2 bits in b must be the same, and the corresponding bits in b′ are the complement).
– With probability 1/2^(2N+12), the remaining permuted data bits will match the table entry.
– With probability 1/2^N, the guessed key bits will match the entry of the table.
Hence, the expected number of final candidates is:

    table size × 2^64 × 1/C(64, 2) × 1/2 × 1/2^(2N+12) × 1/2^N .    (1)
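As a numerical check of equation 1 (using Python's math.comb for the binomial coefficient), the table sizes listed in Table 1 reproduce the "testing final candidates" column of that table:

```python
from math import comb, log2

def expected_final_candidates(table_size, N):
    # equation (1): table size x 2^64 x 1/C(64,2) x 1/2 x 1/2^(2N+12) x 1/2^N
    return table_size * 2**64 / comb(64, 2) / 2 / 2**(2 * N + 12) / 2**N

# table sizes for N = 1..6, taken from Table 1
sizes = [12, 152, 1130, 7292, 48212, 276788]
print([round(log2(expected_final_candidates(s, N)), 1)
       for N, s in enumerate(sizes, start=1)])
# -> [40.6, 41.3, 41.2, 40.9, 40.6, 40.1], the last-but-one column of Table 1
```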
Run Time Analysis of Phase 2, Step 1: Phase 2, step 1 of the Biryukov, Lano, and Preneel attack is typically the dominant cost. To analyse it, we must first determine the number of candidates passing phase 1. Define C0 to be the number of unique table entries of the form (k0, . . . , kN−1, W4, W4′) where W4 = W4′; define C1 similarly, except that W4 ⊕ W4′ has Hamming weight 1; and C2 similarly, except that W4 ⊕ W4′ has Hamming weight 2.
Among the 2^32 choices of key bits considered in phase 1, a fraction of C(57−N, 2)/C(64, 2) will put no difference in the tuple (W4, W4′). Of those, only a fraction of C0/2^(7+N) will match one of the C0 unique entries in the table for W4 (which is the same as W4′). With probability 1/2^N, the guessed key bits will match those in the table as well. Thus, the expected number of 32-bit keys resulting in no difference in (W4, W4′) that pass phase 1 is:

    2^32 × C(57−N, 2)/C(64, 2) × C0/2^(7+N) × 1/2^N = 2^(19−2N) × (3192 − 113N + N^2)/63 × C0 .

For 1-bit differences, the equation is

    2^32 × (57−N)/C(64, 2) × 1/2 × C1/2^(6+N) × 1/2^N = 2^(20−2N) × (57 − N)/63 × C1 .

For 2-bit differences, the equation is

    2^32 × 1/C(64, 2) × 1/2 × C2/2^(5+N) × 1/2^N = 2^(21−2N) × C2/63 .

The 1/2 in this last equation accounts for whether the two difference bits in the first plaintext match the table entry (the bits must be the same). Thus, the expected number of candidates to pass phase 1 is

    T = 2^(19−2N)/63 × ( (3192 − 113N + N^2) C0 + (114 − 2N) C1 + 4 C2 ) .    (2)
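Equation 2 can be checked the same way against the T column of Table 1, using the counts (C0, C1, C2) listed there:

```python
from math import log2

def candidates_after_phase1(N, C0, C1, C2):
    # equation (2)
    return 2**(19 - 2 * N) / 63 * ((3192 - 113 * N + N**2) * C0
                                   + (114 - 2 * N) * C1 + 4 * C2)

# (C0, C1, C2) for N = 1..6, from Table 1
stats = [(5, 2, 0), (11, 64, 44), (64, 362, 128),
         (453, 1750, 712), (2775, 10614, 3864), (15076, 52716, 19520)]
print([round(log2(candidates_after_phase1(N, *c)), 1)
       for N, c in enumerate(stats, start=1)])
# -> [25.0, 24.3, 24.8, 25.5, 26.0, 26.4], the T column of Table 1
```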
Table 1. Computing the running time estimates of algorithm [1] for N = 1..6

  N  Table size     C0     C1     C2     T        Time for          Time for testing   Total
                                                  phase 2, step 1   final candidates   time
  1      12          5      2      0     2^25.0   2^47.0            2^40.6             2^47.0
  2     152         11     64     44     2^24.3   2^43.2            2^41.3             2^43.5
  3    1130         64    362    128     2^24.8   2^43.6            2^41.2             2^43.9
  4    7292        453   1750    712     2^25.5   2^44.3            2^40.9             2^44.4
  5   48212       2775  10614   3864     2^26.0   2^44.9            2^40.6             2^44.9
  6  276788      15076  52716  19520     2^26.4   2^41.4            2^40.1             2^41.9

The first step in phase 2 involves guessing enough key bits so that the resulting permuted data array just begins to overlap with W0 and W0′. The exact number of key bits guessed in this step is 4 × ⌊(29−N)/4⌋. Under the assumption that the permutation is 5% of the time required to do the full SecurID hash, the running time is equivalent to

    T × 2^(4⌊(29−N)/4⌋) × (4⌊(29−N)/4⌋/64) × 2 × 0.05 × s    (3)

full hash operations, where s is the speedup factor that can be obtained by taking advantage of the redundancy in the key with respect to the permutation. The value of s is 96/256 for N = 1, 12/16 for N = 2..5, and 1 for all other values. We remark that in some cases, there is a chance that the second step of phase 2 may be a bit more time consuming than the first. A sufficient but not necessary condition for step 1 to be the most time consuming is if the fraction of values that remain is less than ⌊(29−N)/4⌋/16 of the values considered. This is usually the case. We shall ignore the exceptional cases for now, but will deal with them when we present our filtering speedups.
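Equation 3 can be checked against the "phase 2, step 1" column of Table 1 (the function name is ours; the T values are taken from equation 2 / Table 1):

```python
from math import log2

def phase2_step1_time(T, N):
    # equation (3): T x 2^(4*floor((29-N)/4)) x (4*floor((29-N)/4)/64) x 2 x 0.05 x s
    c = (29 - N) // 4
    s = 96 / 256 if N == 1 else (12 / 16 if 2 <= N <= 5 else 1)
    return T * 2**(4 * c) * (4 * c / 64) * 2 * 0.05 * s

print(round(log2(phase2_step1_time(2**24.95, 1)), 1))  # -> 47.0, as in Table 1
print(round(log2(phase2_step1_time(2**26.4, 6)), 1))   # -> 41.4, as in Table 1
```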
Combined Analysis: The running time of algorithm [1] for a particular value of N is expected to be approximately the sum of equations 3 and 1. For N = 1..6, these running times are given in Table 1. Again, we reiterate that the table sizes are different from [1] because of an extra condition due to the time expansion, which also gives a small improvement in the running time. The analysis for N = 1 closely matches the simulated results from [1].³ Even though the number of candidates T after the first phase is approximately the same as N goes from 1 to 2 and also from 5 to 6, the running times of phase 2, step 1 drop significantly. This is because one less nibble of the key is being guessed, and an extra filtering step is being added. In general, we see the pattern that larger values of N contribute less and less to the sum of the running times, which agrees with the conjecture from [1]. The total running time for N = 1 to 6 is 2^47.7, and larger values of N would appear to add minimally to this total. For vanishing differentials that involve ≥ 4 bits, which happens about one third of the time, preliminary analysis suggests that the run time is better.
³ A small discrepancy for T exists due to the fact that their simulations involved a precomputed table about twice as big as ours.
5 Faster Filtering
Table 1 illustrates that the trick to speeding up the key recovery attack in [1] is faster filtering. We have found three ways in which their filtering can be sped up:
1. Only include entries in the precomputed table that actually can be derived from the time expansion. In particular, the values of the two bits in b (or b′) where the differences are located must be the same.
2. In the original filter, a separate permutation is computed for each trial key. This is inefficient, since most of the permuted bits from one particular permutation will overlap with those from many other permutations. Thus, we can amortise the cost of the permutation computations.
3. We can detect ahead of time when a large portion of keys will result in “bad” permutations in step 1 of both phase 1 and phase 2, and the filtering process can skip past chunks of these bad permutations.
The first technique was already applied to the analyses in the previous section. Without this improvement, the running time would have been about 50% worse.
The second technique is aimed at reducing the numerator of the factor 4⌊(29−N)/4⌋/64 = ⌊(29−N)/4⌋/16 in equation 3. To do this, we view the key as a 64-bit counter, where k0 is the most significant bit and k63 is the least. In phase 2, step 1 of the filter, the bits k0, ..., k31 are fixed, and so are some of the least significant bits (the exact number depends upon N), so we can exclude these for now. The keys are tried in order via a recursive procedure that handles one key nibble at a time. At the jth recursive branch, each of the possibilities for nibble K7+j is tried. The part of the permutation for that nibble is computed, and then the (j+1)st recursive branch is taken. The recursion stops when key nibble K7+⌊(29−N)/4⌋ is reached. Thus, the ⌊(29−N)/4⌋ from equation 3 gets replaced with the average cost per permutation trial, which is the sum over i = 0, ..., ⌊(29−N)/4⌋−1 of 2^(−4i) ≈ 1.07. Observe that when N = 1, this results in a speedup factor of 7/1.07 ≈ 6.5. This trick alone knocks more than 2 bits off the running time.
The third speedup is dependent upon the second. It will apply in both phases of the filtering. During the process of trying a permutation, there will be large chunks of bad trial keys that can be identified immediately and skipped. In particular, whenever a difference bit is placed outside of the words (W0, W0′) and (W4, W4′), the key can be skipped because the difference is not in a legal position. Moreover, any other key with the same most significant bits (up to the key nibble that placed the difference bit) will also result in illegal values, implying that the entire recursive branch can be skipped. Heuristically, one would expect the number of keys that get tested for filtering in phase 2, step 1 to be a fraction of about C(14+2N, 2)/C(64, 2) of the number for the attack in [1]. However, this oversimplifies the analysis. A more proper analysis can be done similar to our analysis in the previous section.
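The average-cost figure is easy to verify numerically; for N = 1 there are ⌊(29−N)/4⌋ = 7 levels of recursion:

```python
# average number of nibble-permutation computations per trial key, for N = 1
levels = 7                                   # floor((29 - 1) / 4)
avg_cost = sum(2**(-4 * i) for i in range(levels))
print(round(avg_cost, 3))                    # -> 1.067
print(round(levels / avg_cost, 2))           # -> 6.56, the ~6.5x speedup
```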
Table 2. Running times using our improved filter, for N = 1..6

  N  Time for          Time for testing   Total
     phase 2, step 1   final candidates   time
  1  2^38.7            2^40.6             2^40.9
  2  2^36.4            2^41.3             2^41.3
  3  2^37.1            2^41.2             2^41.3
  4  2^37.9            2^40.9             2^41.1
  5  2^38.6            2^40.6             2^40.9
  6  2^35.7            2^40.1             2^40.2
The combined speedups give the run times in Table 2. In all cases, phase 2, step 1 has become faster than the time for testing the final candidates. The running time for N = 1..6 is 2^43.6, so we conjecture that the run time for N up to 16 is no more than 16/6 × 2^43.6 ≈ 2^45. We remark that the run times for the third speedup ignore the overhead time for rejecting keys in phase 2 where the difference bit gets put outside of (W0, W0′), but such overhead time is expected to make little difference. We have also ignored the time for other filtering steps of the algorithm. Of those, only step 2 of phase 2 is expected to have comparable cost to step 1 of phase 2. In fact, it can be more costly, especially when N ≡ 2 mod 4. However, there are several possible speedups for this step, particularly when N is small (this restriction is for practical reasons), where the run time becomes most relevant. Such speedups involve using additional precomputed lookup tables to determine valid keys from the remaining data bits, and testing whether the Hamming weight of the remaining data bits matches that of the precomputed table entries before blindly trying keys. Therefore, it seems fair to assume that the testing of final candidates will always be the dominant cost in the modified algorithm. Although it appears that we cannot do much better using only a single vanishing differential, we can improve the situation if we use other information that an attacker would have. In later sections we will show that we can improve the time greatly if we take advantage of multiple vanishing differentials, or if we take advantage of knowledge that no other vanishing differentials occur within a small time period of the observed one.
6 Software Implementation
The attack of Biryukov, Lano, and Preneel was specially designed to keep RAM usage low: only one of the precomputed table entries needs to be in program memory at a time. We tested our ideas only for N = 1 and 2-bit differences, and since the table size is small, we took the liberty of implementing a slight variant of their attack which keeps the whole precomputed table in memory at once. We programmed all filtering steps of both phases and the three main filtering speedups. In addition, we programmed an extra “table lookup” speedup that
would improve the running time by a factor of 8 for N = 1. The extra speedup is only applicable for small values of N due to the memory requirements. Thus, the running time is expected to be 8 times faster than the 2^38.7 listed in Table 2. On our 2.4 GHz PC, this translates to about 8 days of effort. Our code did the search in numerical order, with the key viewed as a counter as described in Section 5. The only thing we did not do was test the final candidates using the real function. Instead, we just stopped when we arrived at the target key. So our implementation was designed to test and time the filtering only, in order to confirm that filtering is significantly faster than testing of the final candidates. At the time of writing, we have not done the full key search yet. However, we have done a search that starts out knowing the correct first nibble of the key. The key we were searching for is 356b48b3ae15c271, which yields a vanishing differential when times 0x1c3ba8 and 0x1c3aa8 are sent in. We were able to find the key in 13.8 hours. If we assume that the full search will take at most 2^4 times longer, the full running time would be 9.2 days, which is in line with expectations.
7 Multiple Vanishing Differentials
There are two scenarios for multiple vanishing differentials: when they have the same difference and when they have different differences. The former is more likely to occur, but in either case we can speed up the attack.
7.1 Multiple Vanishing Differentials with the Same Difference
According to computer simulations, about 45% of the keys that had a collision over a two-month period will actually have at least 2 collisions. There is a simple explanation for this, and a way to use the observation to speed up the key search even more. Consider a vanishing differential which comes from times t = T0 T1 T2 and t′ = T0′ T1′ T2′. As we saw earlier, the only bits that determine whether the vanishing differential will occur at a particular subround are those that get permuted into words W0, W0′, W4, and W4′. Suppose we flip one of the bits in T2 and T2′ (the same bit in each). This bit will be replicated four times in the time expansion. If, after the permutation, none of those bits end up in W0, W0′, W4, or W4′, then we will witness another vanishing differential. The new vanishing differential will follow the same difference path and disappear in the same subround. Thus, new information is learned that can be used to speed up the key search, which we explain below. In the case that another vanishing differential does not occur, information is also learned which can improve the search, which is detailed in Section 8. Following the above thought process, it is evident that:
– Flipping time bits in T1, T1′ or T0, T0′ will only replicate the flipped bit twice in the expansion. Since there are only two bits that are not allowed to be
Table 3. Number of final candidates assuming the attacker became aware of z bits that do not get permuted into words W0, W0′, W4, or W4′

  N  Using only a       With z = 2   With z = 4   With z = 8
     single collision
  1  2^40.6             2^39.8       2^38.9       2^37.0
  2  2^41.3             2^40.3       2^39.3       2^37.2
  3  2^41.2             2^40.1       2^39.0       2^36.6
  4  2^40.9             2^39.7       2^38.4       2^35.7
  5  2^40.6             2^39.2       2^37.8       2^34.8
  6  2^40.1             2^38.6       2^37.0       2^33.6
in W0 , W0 , W4 , and W4 , the collision is more likely to occur. On the other hand, the time between the collisions is increased, since these are more significant time bits. – Multiple vanishing differentials are more likely to occur when the first collision happened in a small number of subrounds. This is because the words W0 , W0 , W4 , and W4 are smaller, giving more places where the flipped bits can land without interfering with the collision.4 – The converse of these observations is that when multiple vanishing differentials occur, it is most often the case that the collisions all happened in the same subround and followed the same difference path. Moreover, the collisions usually happen within a few subrounds. By simply eying the time data that caused the multiple vanishing differentials, one can determine with close to 100% accuracy whether this situation has happened. The signs of it are: 1) Same input difference for all vanishing differentials, 2) All input times differ in only a few bits, and 3) It is the same bits that differ in all cases. An example is given in Appendix B. The attacker learns z ≥ 2 bits which cannot be permuted to words W0 , W0 , W4 , or W4 . This new knowledge can be combined with our third filtering speedup to skip past more bad keys. The 64 number of final key expected / z of the values given candidates to be tested becomes a fraction of 50−2N z in Table 2. See Table 3 for a summary of these figures when z = 2, z = 4, and z = 8. The times can be further reduced using information about where certain related plaintexts did not cause a vanishing differential: see Section 8. 4
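This fraction is easy to check numerically. The sketch below (Python, using math.comb for the binomial coefficients) takes single-collision counts from the first column of Table 3 and reproduces the z-column entries to within their one-decimal rounding:

```python
from math import comb, log2

def log2_final_candidates(single_log2: float, N: int, z: int) -> float:
    # expected candidates = (single-collision count) * C(50-2N, z)/C(64, z):
    # all z learned bits must avoid the 2N+14 positions that feed
    # W0, W0', W4, W4' among the 64 expanded time bits
    return single_log2 + log2(comb(50 - 2 * N, z) / comb(64, z))

# spot checks against Table 3 (entries there are rounded to one decimal)
assert abs(log2_final_candidates(40.6, 1, 2) - 39.8) < 0.05
assert abs(log2_final_candidates(40.6, 1, 8) - 37.0) < 0.1
assert abs(log2_final_candidates(40.1, 6, 8) - 33.6) < 0.1
```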
⁴ This is the reason for the apparent discrepancy between our research claiming that one needs to precompute up to N = 16 in order to have a ≥ 50% chance of finding the key and [1] claiming 12. In our view, the attacker has a single token and will perform a key search once a single vanishing differential has occurred. In their view, the attacker has several tokens for a fixed period of time, and the attacker selects a vanishing differential randomly among all vanishing differentials that have occurred [3]. Since their view includes multiple vanishing differentials, the expected number of subrounds is less.
Scott Contini and Yiqun Lisa Yin

7.2 Multiple Vanishing Differentials with Different Differences
Given two vanishing differentials with different differences, the number of candidate keys can be reduced significantly by constructing more effective filters in each step. Denote the two vanishing differential pairs V1 and V2, and their N values N1 and N2. We first make a guess of (N1, N2). The number of guesses will be quadratic in the number of subrounds tested up to. The following is a simplified sketch of the new filtering algorithm.

– First Phase. Take V1 and guess the first 32 bits of the key. For each 32-bit key that produces a valid (W4, W4'), test it against V2 to see if it also produces a valid (W4, W4').
– Second Phase. For 32-bit keys that pass phase 1, do the same thing to guess the second 32 bits of the key.

The main idea here is to do double filtering within each stage so that the number of candidate keys is further reduced in comparison to when only a single vanishing differential is used. When N1 = N2 = 1, the probability that a 32-bit key passes phase 1 (see Table 1) is 2^25.0/2^32 = 2^-7.0 (assuming the original filter of [1]; it is even smaller with our improved filter), and the probability that a 64-bit key passes both phases is 2^40.6/2^64 = 2^-23.4. If the two vanishing differentials are indeed independent, we would expect the number of keys passing the first phase to be 2^32 × 2^-7.0 × 2^-7.0 = 2^18 and the number of keys passing both phases to be 2^64 × 2^-23.4 × 2^-23.4 = 2^17.2. Experimental results will reveal whether these figures are attainable in practice, but even if they are not, a big speedup is still expected. The situation should be better in the cases where differences with Hamming weights ≥ 4 are involved. We should mention the caveat that the chances of success using the above technique are lower, since we need both difference pairs to disappear within 16 subrounds.
On the other hand, the cost of trying this algorithm for two difference pairs is expected to be substantially cheaper than trying the previous algorithms for only one. Therefore, the double filtering should add negligible overhead to the search in the cases where it fails, and greatly speed up the search when it succeeds.
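The shape of the two-phase search can be sketched as follows. This is only an illustration: valid_half is a placeholder predicate standing in for the real test "this half-key produces a valid (W4, W4') pair for the given vanishing differential", tuned to the 2^-7 survival rate quoted above, and the half-key space is scaled down from 2^32 to 2^20.

```python
import zlib

def valid_half(key_half: int, vd: str) -> bool:
    # Placeholder for the real ASHF filter; a pseudo-random predicate
    # that passes with probability 2^-7 (the rate quoted in the text).
    return zlib.crc32(f"{vd}:{key_half}".encode()) & 0x7F == 0

SPACE = range(1 << 20)  # scaled-down stand-in for the 2^32 half-key space

# First phase: filter with V1 alone, then re-filter the survivors with V2.
survivors_v1 = [k for k in SPACE if valid_half(k, "V1")]
survivors_12 = [k for k in survivors_v1 if valid_half(k, "V2")]

# expected ~ 2^20 * 2^-7 = 8192 and ~ 2^20 * 2^-14 = 64 survivors
assert 5000 < len(survivors_v1) < 12000
assert len(survivors_12) < 400
```

The same intersection is then repeated for the second 32-bit half, so the double filter multiplies the survival rates rather than the workloads.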
8 Using Non-Vanishing Differentials with a Vanishing Differential
In Section 7.1, we argued that even if only a single vanishing differential occurs over some time period, the search can still be sped up if one takes advantage of knowing where related differentials do not vanish. Here, we give the details.
Assume a vanishing differential occurred at times t and t', but no vanishing differential occurred among the time pairs (t ⊕ 2^i, t' ⊕ 2^i) for i = 2, . . . , j. We start with i ≥ 2 because in the most typical case, where authenticators are displayed every minute, the two least significant bits of the time are 0 (see Section 2.1). For the values 2 ≤ i ≤ 7, the difference is replicated 4 times in the time expansion, and for i ≥ 8, it is replicated twice. For each value of i, we learn a set of 2 or 4 bits of which at least one in each set must be permuted into the words W0, W0', W4, or W4'. Let us label these sets U2, . . . , Uj. For simplicity, we will take j = 13, which corresponds to no other vanishing differential within a window of 2.8 days before or after the observed one. So, we are interested in the probability of at least one bit in each of these sets getting permuted into words W0, W0', W4, or W4'. We say a set Ui is represented with ci ≥ 1 bits if exactly ci bits from Ui get permuted into W0, W0', W4, or W4'. The number of ways 2N + 14 bits can be selected to end up in W0, W0', W4, or W4' is C(64, 2N + 14). The number of ways that exactly ci bits are represented in the selection for 2 ≤ i ≤ 13 is

    Π_{i=2..7} C(4, ci) × Π_{i=8..13} C(2, ci) × C(28, 2N + 14 − Σ_{i=2..13} ci).
The first product tells the number of ways of selecting ci bits from each set that has 4 bits, the second product is the same except for among sets with 2 bits, and the third product is the number of ways of selecting the remaining bits from the 28 bits that are not among any of the Ui . Thus, our desired probability is:
    Σ_{all valid (c2,...,c13)} [ Π_{i=2..7} C(4, ci) × Π_{i=8..13} C(2, ci) × C(28, 2N + 14 − Σ_{i=2..13} ci) ] / C(64, 2N + 14)    (4)

where valid (c2, . . . , c13) means that each value is at least 1, but the sum of all values is no more than 2N + 14. We have computed these probabilities using the Magma [6] computer algebra package. The probabilities, and the corresponding running times for the testing of final candidates, are given in Table 4. Monte Carlo experiments have been done to double-check the accuracy of these results. The fact that the probabilities are so small for low values of N is consistent with the argument in Section 7.1 that when a collision happens early, other collisions are likely to follow soon after. One should not assume that the times for testing the final candidates given in Table 4 are the dominant cost in applying this strategy. Unlike the filtering speedups given in Sections 5 and 7.1, the use of non-vanishing differentials seems to require more overhead in checking the conditions. So although we do not have an exact running time, we confidently surmise that the use of non-vanishing differentials will reduce the time to below 2^40 hash operations.
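The probability (4) can also be evaluated without Magma. The sketch below computes the same quantity by inclusion-exclusion over the sets U_i that are missed entirely (an equivalent reformulation of the sum over valid (c2, . . . , c13)), and reproduces two entries of Table 4:

```python
from math import comb, log2

def prob_all_sets_hit(N: int) -> float:
    # Probability that each of the six 4-bit sets U_2..U_7 and six 2-bit
    # sets U_8..U_13 contributes at least one of the 2N+14 selected
    # positions out of 64, by inclusion-exclusion over missed sets.
    k = 2 * N + 14
    total = comb(64, k)
    p = 0.0
    for a in range(7):      # number of 4-bit sets missed entirely
        for b in range(7):  # number of 2-bit sets missed entirely
            p += (-1) ** (a + b) * comb(6, a) * comb(6, b) \
                 * comb(64 - 4 * a - 2 * b, k) / total
    return p

# spot checks against Table 4
assert abs(log2(prob_all_sets_hit(1)) + 14.3) < 0.1
assert abs(log2(prob_all_sets_hit(6)) + 5.7) < 0.1
# the probability grows with N, as argued in Section 7.1
probs = [prob_all_sets_hit(N) for N in range(1, 7)]
assert all(p1 < p2 for p1, p2 in zip(probs, probs[1:]))
```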
Table 4. Assuming no more vanishing differentials occur within 2.8 days before or after a given vanishing differential, the final testing of candidates can be improved by the amounts given in this table

N  Fraction of keys having property  Time for testing final candidates
1  2^-14.3                           2^26.3
2  2^-11.7                           2^29.6
3  2^-9.7                            2^31.5
4  2^-8.1                            2^32.8
5  2^-6.7                            2^33.9
6  2^-5.7                            2^34.4
9 Conclusion
The design of the alleged SecurID hash function appears to have several problems. The most serious appears to be collisions that happen far too frequently and very early within the computation. The involvement of only a small fraction of bits in the subrounds exacerbates the problem. Moreover, the redundancy of the key with respect to the initial permutation adds an extra avenue of attack. Altogether, ASHF is substantially weaker than one would expect from a modern-day hash function. Our research has shown that the key recovery attack in [1] can be sped up by more than a factor of 8, giving an improved attack with time complexity about 2^45 hash operations. In practice, the attacker can actually obtain more information than just a single collision. We have shown that, with this extra information, the time complexity can be further reduced to about 2^40 hash operations, making the attack doable by anyone with a modern PC.
Acknowledgements We are grateful to Joe Lano for his insights and helpful comments, and for his hospitality while the first author of this document visited Brussels.
References

[1] A. Biryukov, J. Lano, B. Preneel. Cryptanalysis of the Alleged SecurID Hash Function. In Proceedings of SAC 2003, to appear in LNCS. A longer version of this paper is available online from http://eprint.iacr.org/2003/162.
[2] S. Contini. The Effect of a Single Vanishing Differential in ASHF. sci.crypt post, 6 Sep 2003.
[3] J. Lano. Private communication, 28 Oct 2003.
[4] I. C. Wiener. Sample SecurID Token Emulator with Token Secret Import. Post to BugTraq, http://archives.neohapsis.com/archives/bugtraq/200012/0428.html, 21 Dec 2000.
[5] Tips on Reassigning SecurID Cards and Requesting New SecurID Cards. AMS Newsletter, Issue No. 117, March 2002. Available at http://www.utoronto.ca/ams/news/117/html/117-5.htm.
[6] The Magma Computer Algebra Package. Information available at http://magma.maths.usyd.edu.au/magma/.
A Analysing Precomputed Tables
Using computer experiments, we were able to exhaustively search for valid entries in the precomputed table up to N = 6 for 2-bit vanishing differentials and up to N = 4 for 4-bit differentials. It was predicted in [1] that the size of the table grows by a factor of 8 as N grows, and that it may take up to 2^44 steps and 500 GB of memory to precompute the table for N = 12. Here we make an attempt to derive the entries in the table analytically when N = 1. If we could extend the method to N > 1, we might be able to enumerate the entries analytically without expensive precomputation and storage. We start with Equation (6) in [1]. Note that we are trying to find constraints for the values in subround i − 1, so for simplicity we omit the superscript i − 1 from now on, and Equation (6) becomes the following:

    B4' = ((((B0 >>> 1) − 1) >>> 1) − 1) ⊕ B4,    (5)
    B0' = 100 − B4'.

We first note that B0 and B0' have to differ in the msb. Therefore, there is at least one bit difference in (B0, B0'). The other bit difference can be placed either in the remaining 7 bits of (B0, B0') or in any of the 8 bits of (B4, B4'). Rewriting Equation 5, we have

    B0 = ((((B4 ⊕ B4') + 1) <<< 1) + 1) <<< 1.

Since there is at most one bit difference in (B4, B4'), the value B4 ⊕ B4' can only take on 9 possible values: 0 (for no bit difference) or 2^i (for one bit difference in bit i). Below, for each possible value of B4 ⊕ B4', we enumerate the possible values of (B0, B0'). During the enumeration, we also take into consideration the additional requirement that the two bits in b where the differences occur must be the same (see Section 4).

– If B4 ⊕ B4' = 0, then B0 = 0x06. Since there is no bit difference in (B4, B4'), we know that B0 and B0' differ in two bits: one of them must be the msb, and the other can be any of the remaining bits.

B4 ⊕ B4'  B0    B0'                           k0
0x00      0x06  0x87, 84, 82, 8e, 96, a6, c6  0
Table 5. Example of 16 vanishing differentials that happened within 1.3 days, using key b5 a9 f4 8c 16 23 a6 1a

First plaintext          Second plaintext
1e 80 8c 8c 1e 80 8c 8c  1e 90 8c 8c 1e 90 8c 8c
1e 81 8c 8c 1e 81 8c 8c  1e 91 8c 8c 1e 91 8c 8c
1e 82 8c 8c 1e 82 8c 8c  1e 92 8c 8c 1e 92 8c 8c
1e 83 8c 8c 1e 83 8c 8c  1e 93 8c 8c 1e 93 8c 8c
1e 84 8c 8c 1e 84 8c 8c  1e 94 8c 8c 1e 94 8c 8c
1e 85 8c 8c 1e 85 8c 8c  1e 95 8c 8c 1e 95 8c 8c
1e 86 8c 8c 1e 86 8c 8c  1e 96 8c 8c 1e 96 8c 8c
1e 87 8c 8c 1e 87 8c 8c  1e 97 8c 8c 1e 97 8c 8c
1e 88 8c 8c 1e 88 8c 8c  1e 98 8c 8c 1e 98 8c 8c
1e 89 8c 8c 1e 89 8c 8c  1e 99 8c 8c 1e 99 8c 8c
1e 8a 8c 8c 1e 8a 8c 8c  1e 9a 8c 8c 1e 9a 8c 8c
1e 8b 8c 8c 1e 8b 8c 8c  1e 9b 8c 8c 1e 9b 8c 8c
1e 8c 8c 8c 1e 8c 8c 8c  1e 9c 8c 8c 1e 9c 8c 8c
1e 8d 8c 8c 1e 8d 8c 8c  1e 9d 8c 8c 1e 9d 8c 8c
1e 8e 8c 8c 1e 8e 8c 8c  1e 9e 8c 8c 1e 9e 8c 8c
1e 8f 8c 8c 1e 8f 8c 8c  1e 9f 8c 8c 1e 9f 8c 8c
The additional requirement rules out two possible values of B0' (0x84, 0x82), leaving 5 possible combinations.

– If B4 ⊕ B4' = 2^i, then there is only one bit difference in (B0, B0'), which is the msb. In this case, there is only one choice of B0' for each B0.

B4 ⊕ B4'  B0    B0'   k0
0x01      0x0a  0x8a  0
0x02      0x0e  0x8e  0
0x04      0x16  0x96  0
0x08      0x26  0xa6  0
0x10      0x46  0xc6  0
0x20      0x86  0x06  1
0x40      0x07  0x87  0
0x80      0x08  0x88  0
The additional requirement rules out every combination above except the first one (B0 = 0x0a and B0' = 0x8a). Combining the above two cases, we have 5 + 1 = 6 pairs of (B0, B0'), each of which gives a valid tuple (k0, B0, B4, B0', B4'), where k0 is the msb of B0. Finally, note that if (k0, a, b, c, d) is a valid tuple, then (k0, c, d, a, b) is also a valid tuple. For example, if (0, 0x06, 0xdd, 0x87, 0xdd) is valid, then (0, 0x87, 0xdd, 0x06, 0xdd) is also valid. Therefore, the table consists of a total of 2 × 6 = 12 entries. These entries match the results from our simulation.
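The enumeration is small enough to verify mechanically. A minimal check of the rewritten Equation 5, with <<< taken as the left rotation on 8-bit bytes:

```python
def rotl8(x: int) -> int:
    # left rotation by 1 on an 8-bit byte
    return ((x << 1) | (x >> 7)) & 0xFF

def derive_b0(d: int) -> int:
    # B0 = ((((B4 ^ B4') + 1) <<< 1) + 1) <<< 1, where d = B4 ^ B4'
    return rotl8((rotl8((d + 1) & 0xFF) + 1) & 0xFF)

assert derive_b0(0x00) == 0x06          # the B4 ^ B4' = 0 case
table = {d: derive_b0(d) for d in (1 << i for i in range(8))}
assert table[0x01] == 0x0a              # first row of the second table
assert table[0x01] ^ 0x80 == 0x8a       # B0' differs from B0 in the msb
assert table[0x20] == 0x86              # the single row with k0 = 1
assert table[0x20] >> 7 == 1            # k0 = msb(B0)
```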
B Example of Multiple Vanishing Differentials
Table 5 is an example where 16 vanishing differentials happened within 1.3 days. All had the same difference path, which collided at N = 2. One can see that only the 4 least significant bits of time byte T1 differ. Since each of these bits is duplicated twice, the expected running time of the last steps is given by z = 8 in Table 3. Taking into consideration N = 2, the total time is expected to be on the order of 2^38 operations.
A MAC Forgery Attack on SOBER-128

Dai Watanabe and Soichi Furuya

Systems Development Laboratory, Hitachi, Ltd.
292 Yoshida-cho, Totsuka-ku, Yokohama, 244-0817, Japan
{daidai, soichi}@sdl.hitachi.co.jp
Abstract. SOBER-128 is a stream cipher designed by Rose and Hawkes in 2003. It can also be used for generating Message Authentication Codes (MACs). The developers claimed that it is difficult to forge MACs generated by SOBER-128, though the security model defined in the proposal paper is not realistic. In this paper, we examine the security of the MAC generation function of SOBER-128 under the security notion given by Bellare and Namprempre. As a result, we show that the MAC generation function of SOBER-128 is vulnerable to differential cryptanalysis. The success probability of this attack is estimated at 2^-6.

Keywords: Stream cipher, Message Authentication Code, Differential cryptanalysis, SOBER
1 Introduction
The desire to use a known cryptographic module for various applications has existed for a long time. The first realization was modes of operation of a block cipher: the OFB mode and the counter mode are usages of a block cipher for random number generation, and CBC-MAC provides a message authentication mechanism. On the other hand, Anderson and Biham presented the construction of a block cipher from a pseudo-random number generator (PRNG) and a hash function [1]. The security of these modes of operation is provable under the assumption that the underlying primitives are ideal cryptographic functions. Daemen considered the construction of elemental cryptographic functions (a block cipher, a PRNG, and a hash function) from unreliably weak functions, e.g., a round function of a block cipher [6]. The security of these constructions is not certain, but their processing speeds are often significantly faster than those of modes of operation. Daemen and Clapp proposed a cryptographic module Panama in 1998 [7]. Panama can be used as a PRNG and as a hash function. Ferguson et al. proposed an authenticated encryption algorithm Helix in 2003 [11]. Helix can also be used as a PRNG and for MAC generation. SOBER [18] is a stream cipher developed by Rose in 1998. SOBER adopts a linear feedback shift register (LFSR) defined over GF(2^8) as the update function of the internal state. This construction enables an efficient software implementation on 8-bit processors, so LFSRs defined over an extension field are employed not only by SOBER, but also by SSC2 [20], SNOW [8, 10], and so on.

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 472–482, 2004.
© International Association for Cryptologic Research 2004
Several variants of SOBER have been developed to strengthen its security or to suit 16-bit and 32-bit processors. SOBER-t16 [12], SOBER-t32 [13], Turing [14], and SOBER-128 [15] are the published algorithms. SOBER-t16 and t32 were submitted to the NESSIE project, but were rejected because some security flaws were reported [4, 5, 9]. A weakness of the initialization of Turing has also been reported [16]. SOBER-128 is the latest algorithm in the SOBER family, and is the modified version of SOBER-t32. In addition, SOBER-128 can be used not only as a stream cipher, but also for MAC generation. However, the security of the MAC generation algorithm of SOBER-128 has not been evaluated, so it is still unclear. In this paper we examine the security of the MAC generation function of SOBER-128. We show that the MAC generation algorithm of SOBER-128 is vulnerable to differential cryptanalysis and that forgery of a MAC generated by SOBER-128 succeeds with high probability, about 2^-4. The attack that we present in this paper is outside the expectations of the designers. Our attack follows the security notion defined by Bellare and Namprempre [2]. The rest of this paper is organized as follows. First we define the supposed attacker in Sect. 2. Second, we briefly describe the MAC generation algorithm of SOBER-128 in Sect. 3. Then we show how to forge a MAC generated by SOBER-128 in Sect. 4 and discuss the weakness of SOBER-128 in Sect. 5. Finally, we summarize the results in Sect. 6.
2 The Model of the Attack
The designers mentioned that recovery of the internal state (or the secret key) is necessary to forge a MAC generated by SOBER-128, and thus reduced the difficulty of a MAC forgery to that of recovering the internal state. In the attack to recover the internal state, the attacker can generate a MAC for a message with a secret key and an IV only once. Under this security notion, the attacker can make queries only to a (probabilistic) MAC generation oracle. In this paper, we consider a MAC forgery attack without any knowledge of the secret internal state. The attacker we suppose is a malicious person on a public communication channel. In our attack, there are a sender of messages, a receiver, and the attacker. The sender sends pairs of messages and the associated tags (M, MAC(M, K)) to the receiver. The attacker intercepts them, changes only the messages, then sends (M', MAC(M, K)) to the receiver. The attack is successful if the receiver does not detect the manipulation of a message. The security notion of this attack was given by Bellare and Namprempre [2]. Under this notion, the attacker can make queries not only to the MAC generation oracle, but also to the MAC verification oracle. For each query, the MAC verification oracle returns 1 if the attached tag is correct, and returns 0 otherwise. In real communication, the MAC verification oracle is the receiver, and he outputs a requirement of retransmission to the sender if he detects the manipulation.
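In code, the two-oracle setting looks like the sketch below, with HMAC-SHA256 standing in for the SOBER-128 MAC (the point here is the oracle interface, not the cipher):

```python
import hashlib
import hmac
import os

# Toy model of the Bellare-Namprempre setting; HMAC-SHA256 is a stand-in
# MAC, SOBER-128 itself is not implemented here.
key = os.urandom(16)

def mac_generate(msg: bytes) -> bytes:
    """The MAC generation oracle: tag a message under the secret key."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def mac_verify(msg: bytes, tag: bytes) -> int:
    """The MAC verification oracle: 1 if the tag is correct, else 0."""
    return 1 if hmac.compare_digest(mac_generate(msg), tag) else 0

tag = mac_generate(b"transfer 100 EUR")           # observed on the channel
assert mac_verify(b"transfer 100 EUR", tag) == 1  # genuine pair accepted
assert mac_verify(b"transfer 900 EUR", tag) == 0  # manipulation detected
```

The forgery of Sect. 4 succeeds exactly when the verification oracle returns 1 for a manipulated message with the original tag.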
Fig. 1. The structure of the non-linear filter NLF
There is no difference between these two attacks if the oracles are deterministic, as a MAC verification oracle can then be constructed from a MAC generation oracle. However, they are essentially different if the oracle is probabilistic (i.e., the oracle has a nonce as an input). In our model, the attacker can make queries to each oracle. On the other hand, the attacker whom the designers supposed can make queries only to a MAC generation oracle. The difference is critical for applying our attack given in Sect. 4.
3 The Algorithm of SOBER-128
SOBER-128 is a filtering generator which takes a secret key and an initial vector (IV) as inputs. The lengths of the inputs are at most 128 bits. The update function of the internal state is an LFSR whose feedback polynomial is defined over GF(2^32). The non-linear filtering function consists of additions modulo 2^32, XOR, a circular shift, and an 8 × 32-bit substitution box (S-box) S (see Figure 1 for detail).
3.1 The Linear Feedback Shift Register
Before giving the definition of the LFSR, we first define the bit representation of elements of a finite field GF(2^w). The extension field over GF(2) is defined by the residue field GF(2)[z]/(ϕ(z)), where ϕ(z) is an irreducible polynomial. An element of GF(2^w) is a polynomial whose degree is less than w. The bit representation of a polynomial a_{w-1} z^{w-1} + a_{w-2} z^{w-2} + · · · + a_0 is given by a_{w-1} || a_{w-2} || · · · || a_0, where the notation || means bit-concatenation. In the proposal of SOBER-128, the finite field GF(2^32) is defined as an extension field of a subfield GF(2^8). First, the subfield GF(2^8) is given by the residue field GF(2)[z]/(z^8 + z^6 + z^3 + z^2 + 1). Next, GF(2^32) is defined by the residue field GF(2^8)[y]/(g(y)), where g(y) = y^4 + 0xd0·y^3 + 0x2b·y^2 + 0x43·y + 0x67. The coefficients of the irreducible polynomial g are hexadecimal expressions of elements of GF(2^8). Now we can define the feedback polynomial p(x) of the LFSR of SOBER-128:

    p(x) = x^17 + x^15 + x^4 + α,    (1)

where α = 0x00000100 (hexadecimal expression) = y. The registers of SOBER-128 are denoted by R = (R[0], R[1], . . . , R[16]), where each R[i] is a register of 32-bit length. We use a subscript, as in Rt, if the time t should be clarified. The update function defined by the above feedback polynomial is calculated as follows:

    tmp = R[0],
    R[i] = R[i + 1],   0 ≤ i < 16,    (2)
    R[16] = R[14] ⊕ R[4] ⊕ α · tmp.
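A minimal Python sketch of this update follows. Two details are our reading of the text rather than stated facts: the leading y^4 term of g(y) (required for GF(2^8)[y]/(g(y)) to be GF(2^32), so y^4 reduces to 0xd0·y^3 + 0x2b·y^2 + 0x43·y + 0x67 in characteristic 2), and the feedback taps, which we take from the feedback polynomial (1) as s_{t+17} = s_{t+15} ⊕ s_{t+4} ⊕ α·s_t using the register values before the shift.

```python
def gf8_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) with modulus z^8 + z^6 + z^3 + z^2 + 1 (0x14d)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x14d
        b >>= 1
    return r

def mul_alpha(x: int) -> int:
    """alpha * x in GF(2^32), alpha = y = 0x00000100 (assumed g as above)."""
    msb = (x >> 24) & 0xff  # coefficient of y^3; y * (msb * y^3) needs reduction
    red = ((gf8_mul(msb, 0xd0) << 24) | (gf8_mul(msb, 0x2b) << 16)
           | (gf8_mul(msb, 0x43) << 8) | gf8_mul(msb, 0x67))
    return ((x << 8) & 0xffffffff) ^ red

def clock(R: list) -> list:
    """One LFSR step: s_{t+17} = s_{t+15} ^ s_{t+4} ^ alpha * s_t."""
    fb = R[15] ^ R[4] ^ mul_alpha(R[0])
    return R[1:] + [fb]

assert mul_alpha(1) == 0x00000100           # alpha * 1 = alpha = y
assert mul_alpha(0x01000000) == 0xd02b4367  # y * y^3 = y^4, reduced by g
```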
Furthermore, we can reduce the cost of the multiplication by the constant α in GF(2^32) on a 32-bit processor. The multiplication consists of only two operations: referring to a pre-computed table Multab indexed by the most significant byte, and a shift operation:

    α · x = (x << 8) ^ Multab[(x >> 24) & 0xFF],    (3)
where ^ is the XOR operation and <<, >> are the left and right shift operations respectively.

3.2 Non-linear Filtering NLF
The description of the non-linear filtering function NLF is not strictly necessary for explaining our attack, but we give a brief description. NLF is defined by the following equation (see Figure 1):

    NLF(R) = f(((f(R[0] ⊞ R[16]) ≫ 8) ⊞ R[1] ⊕ Konst) ⊞ R[6]) ⊞ R[13],

where ⊞ is the addition modulo 2^32 and ≫ is the rotation to the right in a 32-bit register. The 8 × 32-bit non-linear substitution f is defined by

    f(a) = S1[aH] || (S2[aH] ⊕ aL),
Fig. 2. The structure of function f
where aH and aL are the most significant 8 bits and the least significant 24 bits of the input a, respectively (see Figure 2). S1 and S2 in Figure 2 are substitution boxes that output 8-bit and 24-bit values respectively. Please refer to the proposal document of SOBER-128 [15] for more detail about the substitution table. Konst depends on the initial values and does not change after the initialization. Hence we can consider it a key-dependent constant.

3.3 The Message Feedback Function PFF
The message feedback function PFF is used for injecting a message into the LFSR during MAC generation. PFF is given by the following equation:

    PFF(R[4], p, Konst) = f((f(R[4] ⊞ p) ≫ 8) ⊞ Konst).

The value of the register R[4] is replaced by the output of PFF.
Fig. 3. The structure of the plaintext feedback function PFF
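A sketch of f and PFF is given below. The real S1/S2 tables are in the SOBER-128 proposal [15] and are not reproduced here; the placeholders below are arbitrary stand-ins, which is enough because the structural property checked at the end (f passes any difference confined to the low 24 bits through unchanged) holds for any table contents. This is exactly the property exploited in Sect. 4.1.

```python
def rotr32(x: int, r: int) -> int:
    return ((x >> r) | (x << (32 - r))) & 0xffffffff

# Placeholder stand-ins for the 8->8-bit S1 and 8->24-bit S2 tables.
S1 = [(7 * i + 3) & 0xff for i in range(256)]
S2 = [(0x9E3779 * i) & 0xffffff for i in range(256)]

def f(a: int) -> int:
    """f(a) = S1[aH] || (S2[aH] ^ aL): only the top byte drives the S-boxes."""
    aH, aL = a >> 24, a & 0xffffff
    return (S1[aH] << 24) | (S2[aH] ^ aL)

def pff(r4: int, p: int, konst: int) -> int:
    """PFF(R[4], p, Konst) = f((f(R[4] + p) >>> 8) + Konst) mod 2^32."""
    return f((rotr32(f((r4 + p) & 0xffffffff), 8) + konst) & 0xffffffff)

# f is transparent to any difference with a zero top byte, regardless of
# the S-box contents: f(a ^ d) ^ f(a) == d whenever d < 2^24.
for a in (0x00000000, 0x12345678, 0xfedcba98):
    for d in (0x000001, 0x00ff00, 0x7fffff):
        assert f(a ^ d) ^ f(a) == d
```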
3.4 MAC Generation
The MAC generation of SOBER-128 is divided into two phases: the first phase is called MAC accumulation and the second is called MAC finalization. Before starting MAC generation, the internal state is initialized with a secret key and an IV. In the MAC accumulation phase, the message words are injected into the LFSR by using the message feedback function PFF:

    Rt[4] ← PFF(Rt[4], pt, Konst) = f((f(Rt[4] ⊞ pt) ≫ 8) ⊞ Konst).    (4)

In the MAC finalization phase, the internal state is mixed by XORing the constant INITKONST into R[15] and applying the non-linear update function Diffuse 18 times. The non-linear function Diffuse is defined as follows:

Step 1. Update the LFSR.
Step 2. Apply NLF to the registers and replace R[4] with the value R[4] ⊕ ν, where ν is the output of NLF.

After the mixing, SOBER-128 outputs a MAC of arbitrary length. The last step is the same as random number generation using NLF.
4 The MAC Forgery
Our attack aims to find a collision of messages under the same secret key and the same IV. For that, we apply the differential cryptanalysis developed by Biham and Shamir [3]. Differential cryptanalysis observes the differential propagation of the target encryption function. The attacker first encrypts a number of plaintext pairs with a fixed differential and observes the distribution of the differentials of the ciphertexts. If the output differentials are not distributed uniformly, the attacker can recover some information about the secret key from the deviation of the distribution. The basic idea of our attack is finding a pair of message sequences P and P' which yield the same internal state value at a certain time T under a secret key and an IV. In a practical sense, the purpose of the attack is finding the differential sequence {Δpt}t with which any message sequence pair P = {pt}t and P' = {pt ⊕ Δpt}t makes a collision with high probability. The search can be divided into two parts: the search for a differential propagation of PFF with high probability, and the construction of the differential elimination equation in the LFSR. In the following two subsections, each topic is discussed in detail.

4.1 The Differential Propagation in the Message Feedback Function PFF
First, we discuss the differential propagation in the message feedback function PFF.
We denote the differential propagation in the function f by Δ →_f Δ'. The maximum differential characteristic probability of the non-linear permutation f is very large because the transformation is not uniform: only the most significant byte is transformed non-linearly, so any differential Δ whose most significant byte is zero is not influenced by f, i.e., Δ →_f Δ holds with probability 1. Hence, if the byte-wise expression of the differential of the register R[4] is given by (0, δ1, δ2, 0) and the differential of the message injection p is given by (0, δ1', δ2', 0), the following differential propagation of PFF holds with high probability:

    ((0, δ1, δ2, 0), (0, δ1', δ2', 0)) →_⊞ (0, δ1'', δ2'', 0)
                                       →_f (0, δ1'', δ2'', 0)
                                       →_≫ (0, 0, δ1'', δ2'')
                                       →_⊞ (0, 0, δ1''', δ2''')
                                       →_f (0, 0, δ1''', δ2''').    (5)

4.2 How to Eliminate Differentials in the LFSR
Next we discuss how to eliminate differentials in the LFSR. To simplify the discussion, we replace the message feedback function PFF by XORing the message directly into Rt[4]. In fact, Eq. 5 indicates that any differential of Rt[4] can be entirely overwritten by addition of a message block, hence the discussion with the simplified message injection is useful. To avoid confusion, we denote the message block at time t by qt and its differential by Δqt. The internal state initialized by a secret key K and an initial vector I is denoted by R0; we write R0(K, I) if the initial values should be clarified. We denote by A the matrix representation of the LFSR of SOBER-128 defined over GF(2^32). In the following discussion, we treat the internal state (registers) of the LFSR as a vector space over GF(2^32). Eq. 5 shows that the injection of the differential into the LFSR is identified with addition of the vector Dt given by Dt = (0, 0, 0, 0, Δqt, 0, . . . , 0)^T. The internal state corresponding to the message sequence qt ⊕ Δqt is given by

    RT ⊕ DT = A · (R_{T-1} ⊕ D_{T-1}) ⊕ DT
            = A^2 · (R_{T-2} ⊕ D_{T-2}) ⊕ A · D_{T-1} ⊕ DT
            ...
            = A^T · R0 ⊕ Σ_{t=0..T} A^t · D_{T-t}.    (6)
Eq. 6 indicates that the differential vector of the internal state becomes zero at time T iff the following equation is satisfied:

    Σ_{t=0..T} A^t · D_{T-t} = 0.    (7)
The differential vector Dt = (0, 0, 0, 0, Δqt, 0, . . . , 0)^T can be split into the elementary vector e4 = (0, 0, 0, 0, 1, 0, . . . , 0)^T and the coefficient Δqt: Dt = Δqt · e4. Multiplication by a constant commutes with any matrix, hence we can transform Eq. 7 as follows:

    0 = Σ_{t=0..T} A^t · D_{T-t}
      = Σ_{t=0..T} A^t (Δq_{T-t} · e4)
      = (Σ_{t=0..T} Δq_{T-t} A^t) e4.    (8)
(9)
where E is the identity matrix of dimension 17. Therefore the following differential sequence ∆qt satisfies Eq. 8: ∆q0 = ∆q2 = ∆q13 = a,
∆q17 = α · a.
(10)
On the other hand, we performed an exhaustive search on a PC for byte-wise truncated differentials of message sequences {Δqt}t which satisfy Eq. 8 with T ≤ 20. The experiment concludes that Eq. 10 gives the best differential set, i.e., both the time T and the number of non-zero differentials are smallest. Now we get back to the message injection by PFF. There are two different purposes for injecting a message differential Δpt into the LFSR: one is to inject a certain differential value into a register, and the other is to eliminate a differential in Rt[4]. For these cases, Δpt = Δqt ≪ 8 and Δpt = Δqt respectively yield the same result as in the discussion of the simplified message injection. Table 1 shows the propagation of the differential given in Eq. 10 in the internal state, where b = α · a. Eq. 3 implies a significant characteristic of α: if the byte-wise expression of the differential Δq0 is given by (0, x, y, z), then α · Δq0 = (x, y, z, 0) holds for every x, y, z. For example, the 32-bit differentials corresponding to the truncated differentials a and b in Table 1 are given by 0x00000001 and 0x00000100 respectively.

4.3 The Success Probability of the MAC Forgery
In the differential propagation of the message feedback function PFF given by Eq. 5, the differential changes probabilistically only at the additions modulo 2^32.

Table 1. The differential propagation in the LFSR

Time  Δpt    Δqt  Differential in the LFSR
0     a ≪ 8  a    0000a000000000000
1     0      0    000a000000000000a
2     a ≪ 8  a    00a0a0000000000a0
3     0      0    0a0a0000000000a00
4     0      0    a0a0000000000a000
5     0      0    0a0000000000a000b
6     0      0    a0000000000a000b0
7     0      0    0000000000a000b00
8     0      0    000000000a000b000
9     0      0    00000000a000b0000
10    0      0    0000000a000b00000
11    0      0    000000a000b000000
12    0      0    00000a000b0000000
13    a      a    00000000b00000000
14    0      0    0000000b000000000
15    0      0    000000b0000000000
16    0      0    00000b00000000000
17    b      b    00000000000000000
Lipmaa and Moriai presented an efficient algorithm for calculating the differential probability of an addition modulo 2^n for arbitrary n [17]. For example, if the differential a = 0x00000001 in Eq. 10 is chosen, the differential probabilities of PFF at times t = 0, 2, 13, 17 are 2^-2, 2^-2, 2^-1, 2^-1, respectively. Therefore the success probability of the MAC forgery against SOBER-128 is about 2^-6. We verified the attack with the sample code provided by the designers; the success probability of the experimental attack is about 2^-5.5.
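The calculation behind these per-step probabilities can be sketched in a few lines; the following is a hedged illustration of the Lipmaa–Moriai formula from [17] (the function name and interface are ours, not from the paper):

```python
def xdp_add(alpha, beta, gamma, n=32):
    """Probability over random x, y that
    ((x + y) ^ ((x ^ alpha) + (y ^ beta))) mod 2^n equals gamma,
    computed in O(1) word operations (Lipmaa-Moriai, FSE 2001)."""
    mask = (1 << n) - 1
    # eq(alpha, beta, gamma): bit i is 1 iff the three differences agree in bit i
    eq = ~(alpha ^ beta) & ~(alpha ^ gamma) & mask
    # Impossibility test: eq(alpha<<1, beta<<1, gamma<<1) & (alpha^beta^gamma^(beta<<1))
    # must vanish; the "| 1" accounts for bit 0 of the shifted triple being all-zero.
    if ((eq << 1) | 1) & (alpha ^ beta ^ gamma ^ (beta << 1)) & mask:
        return 0.0
    # Probability is 2^-w, where w counts the "not all equal" positions below the MSB
    w = bin(~eq & (mask >> 1)).count("1")
    return 2.0 ** -w
```

For instance, a single-bit difference on one addend passing through the addition unchanged, as for a = 0x00000001 above, evaluates to xdp_add(1, 0, 1, 32) = 2^-1.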
5 Discussion
Generally, the attack model assumed for MAC forgery is the adaptive chosen-plaintext attack. However, the attack applied in the previous section is a known-plaintext attack: no information about IVs or the secret key is necessary, as the attack is applicable even if the internal state is perfectly random. Besides, though SOBER-128 has an authenticated-encryption mechanism, encrypting messages does not strengthen the security of the message authentication function, because altering the ciphertext is equivalent to altering the plaintext when the encryption is by a stream cipher. In this section, we examine some ideas to improve the security of SOBER-128 and consider their efficiency. The vulnerability of the MAC generation algorithm of SOBER-128 derives mainly from the fact that the substitution f is not uniformly non-linear: from the viewpoint of differential propagation, only the transformation of the most significant byte is non-linear. Besides, a rotation has no diffusion effect. Hence
A MAC Forgery Attack on SOBER-128
at least four iterations of these functions are necessary to apply the non-linear transformation to all bytes, which reduces the performance of SOBER-128. Another idea to improve the security is to inject the output of PFF into several registers in order to increase the complexity. However, the transformation in Eq. 8 holds for any vector whose elements are of the form vi = xi · v over GF(2^32). Hence this change of the algorithm does not improve the security.
6 Conclusion
In this paper, we showed that the MAC generation algorithm of SOBER-128 is vulnerable under a practical assumption: the attacker intercepts a message, changes some of its bits, and sends it to the receiver. Under this assumption, the attacker can forge a message with probability about 2^-6.
References

[1] R. Anderson and E. Biham, "Two Practical and Provably Secure Block Ciphers: BEAR and LION," Fast Software Encryption, FSE '96, Springer-Verlag, LNCS 1039, pp. 113–120, 1996.
[2] M. Bellare and C. Namprempre, "Authenticated Encryption: Relations among Notions and Analysis of the Generic Composition Paradigm," Advances in Cryptology, Asiacrypt 2000, Springer-Verlag, LNCS 1976, pp. 531–545, 2000.
[3] E. Biham and A. Shamir, Differential Cryptanalysis of the Data Encryption Standard, Springer-Verlag, 1993.
[4] S. Babbage and J. Lano, "Probabilistic Factors in the Sober-t Stream Ciphers," Third Open NESSIE Workshop, proceedings, 2002.
[5] C. De Cannière, J. Lano, B. Preneel, and J. Vandewalle, "Distinguishing Attacks on Sober-t32," Third Open NESSIE Workshop, proceedings, 2002.
[6] J. Daemen, "Cipher and Hash Function Design Strategies Based on Linear and Differential Cryptanalysis," Doctoral Dissertation, K.U. Leuven, 1995.
[7] J. Daemen and C. Clapp, "Fast Hashing and Stream Encryption with Panama," Fast Software Encryption, FSE '98, Springer-Verlag, LNCS 1372, pp. 60–74, 1998.
[8] P. Ekdahl and T. Johansson, "SNOW – A New Stream Cipher," NESSIE project submission, 2000, available at http://www.cryptonessie.org/.
[9] P. Ekdahl and T. Johansson, "Distinguishing Attacks on SOBER-t16 and t32," Fast Software Encryption, FSE 2002, Springer-Verlag, LNCS 2365, pp. 210–224, 2002.
[10] P. Ekdahl and T. Johansson, "A New Version of the Stream Cipher SNOW," Selected Areas in Cryptography, SAC 2002, Springer-Verlag, LNCS 2595, pp. 47–61, 2002.
[11] N. Ferguson, D. Whiting, B. Schneier, J. Kelsey, S. Lucks, and T. Kohno, "Helix, Fast Encryption and Authentication in a Single Cryptographic Primitive," Fast Software Encryption, FSE 2003, pre-proceedings, pp. 345–362, 2003.
[12] P. Hawkes and G. Rose, "Primitive Specification and Supporting Documentation for SOBER-t16 Submission to NESSIE," First Open NESSIE Workshop, proceedings, 2000.
[13] P. Hawkes and G. Rose, "Primitive Specification and Supporting Documentation for SOBER-t32 Submission to NESSIE," First Open NESSIE Workshop, proceedings, 2000.
[14] P. Hawkes and G. Rose, "Turing, A Fast Stream Cipher," Fast Software Encryption, FSE 2003, pre-proceedings, pp. 307–324, 2003.
[15] P. Hawkes and G. Rose, "Primitive Specification for SOBER-128," IACR ePrint Archive, http://eprint.iacr.org/2003/81/, 2003.
[16] A. Joux and F. Muller, "A Chosen IV Attack against Turing," Selected Areas in Cryptography, SAC 2003, pre-proceedings, 2003.
[17] H. Lipmaa and S. Moriai, "Efficient Algorithms for Computing Differential Properties of Addition," Fast Software Encryption, FSE 2001, Springer-Verlag, LNCS 2355, pp. 336–350, 2001.
[18] G. Rose, "A Stream Cipher Based on Linear Feedback over GF(2^8)," Proc. Australian Conference on Information Security and Privacy, Springer-Verlag, 1998.
[19] R. Rueppel, Analysis and Design of Stream Ciphers, Springer-Verlag, 1986.
[20] M. Zhang, C. Carroll, and A. Chan, "The Software-Oriented Stream Cipher SSC2," Fast Software Encryption, FSE 2000, Springer-Verlag, LNCS 1978, pp. 31–48, 2000.
On Linear Approximation of Modulo Sum

Alexander Maximov

Department of Information Technology, Lund University,
Box 118, SE–22100 Lund, Sweden
[email protected]
Abstract. The general case of a linear approximation of the form "X1 + · · · + Xk mod 2^n" → "X1 ⊕ · · · ⊕ Xk ⊕ N" is investigated, where the variables and operations are n-bit based, and the noise variable N is introduced by the approximation. An efficient and practical algorithm of complexity O(n · 2^{3(k−1)}) to calculate the probability Pr{N} is given, and in some cases it can be reduced to O(2^{k−2}).
1 Introduction
Linear approximation of nonlinear blocks in a cipher is a common tool in cryptanalysis. One of the most typical approximations is the substitution of the arithmetic sum modulo 2^n (⊞) with the XOR operation (⊕) of the input variables. We introduce a noise variable N and write:

X1 ⊞ · · · ⊞ Xk = X1 ⊕ · · · ⊕ Xk ⊕ N.

For a distinguishing attack, the bias of a linear combination of noise variables can be calculated if their distributions are known. For the considered approximation the distribution of N can be calculated in two ways:

I.  for X1 = 0 . . . 2^n − 1                    ← O(2^{k·n})
      . . .
        for Xk = 0 . . . 2^n − 1
          DistN[(X1 ⊞ · · · ⊞ Xk) ⊕ (X1 ⊕ · · · ⊕ Xk)]++;

or

II. for C = 0 . . . 2^n − 1                     ← O(c · 2^n)
      DistN[C] = ProbOfN(C);

where the function ProbOfN(C) calculates the corresponding probability (see Section 2). Note that we deal with integer-valued distribution tables, i.e., Pr{N = C} = DistN[C] / 2^{k·n}.
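For small n and k, method I can be written out directly; a minimal sketch in Python (the function name is ours):

```python
from itertools import product

def dist_noise(n, k):
    """Method I: exhaustively tabulate DistN[C] over all (X1, ..., Xk),
    in O(2^(k*n)) time."""
    mask = (1 << n) - 1
    dist = [0] * (1 << n)
    for xs in product(range(1 << n), repeat=k):
        mod_sum = sum(xs) & mask      # X1 + ... + Xk mod 2^n
        xor_sum = 0
        for v in xs:
            xor_sum ^= v              # X1 ^ ... ^ Xk
        dist[mod_sum ^ xor_sum] += 1  # N = (modular sum) xor (xor sum)
    return dist                       # Pr{N = C} = dist[C] / 2**(k*n)
```

Note that dist[C] is zero for every odd C, since the least significant bits of the modular sum and of the XOR sum always coincide.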
2 The Function ProbOfN(C)
Let C = c_n . . . c_2 0 (note that Pr{N = c_n . . . c_2 1} = 0). Then:

ProbOfN(C) = (1 1 · · · 1) × ∏_{i=n}^{2} T_{c_i} × S0,

B. Roy and W. Meier (Eds.): FSE 2004, LNCS 3017, pp. 483–484, 2004.
© International Association for Cryptologic Research 2004
where T0, T1, and S0 are fixed matrices. The algorithm to construct T0, T1, and S0 is given below.

Initialization:
  S0 = (0), of size (2^{k−1} × 1)
  T0 = T1 = (0), of size (2^{k−1} × 2^{k−1})

Algorithm 1: S0 – construction
  1. for X = 0 to 2^k − 1
  2.   S0[⌊#X/2⌋] += 1

Algorithm 2: T0, T1 – construction
  1. for C = 0 to 2^{k−2} − 1
  2.   for X = 0 to 2^k − 1
  3.     T0[C + ⌊#X/2⌋][2C]++,
  4.     T1[C + ⌊(#X + 1)/2⌋][2C + 1]++;

where #X is the Hamming weight of X.
3 Example
Assume that n = 5 and k = 3, i.e., N = (X1 ⊞ X2 ⊞ X3) ⊕ (X1 ⊕ X2 ⊕ X3). Then:

T0 = ⎛4 0 0 0⎞   T1 = ⎛0 1 0 0⎞   S0 = ⎛4⎞
     ⎜4 0 4 0⎟        ⎜0 6 0 1⎟        ⎜4⎟
     ⎜0 0 4 0⎟        ⎜0 1 0 6⎟        ⎜0⎟
     ⎝0 0 0 0⎠        ⎝0 0 0 1⎠        ⎝0⎠

Let C = 10110. Then ProbOfN(C) = (1 1 1 1) × T1 × T0 × T1 × T1 × S0 = 1536, and therefore Pr{N = 10110} = 1536/2^{3·5} = 0.046875.
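The constructions of Section 2 and this worked example can be sketched in Python (a hedged illustration; the function names are ours):

```python
def build_matrices(k):
    """Algorithms 1 and 2: build S0 (a vector) and T0, T1 (matrices),
    all of side 2^(k-1), by counting Hamming weights of X in 0 .. 2^k - 1."""
    size = 1 << (k - 1)
    S0 = [0] * size
    T0 = [[0] * size for _ in range(size)]
    T1 = [[0] * size for _ in range(size)]
    for X in range(1 << k):
        S0[bin(X).count("1") // 2] += 1
    for C in range(1 << (k - 2)):
        for X in range(1 << k):
            w = bin(X).count("1")
            T0[C + w // 2][2 * C] += 1
            T1[C + (w + 1) // 2][2 * C + 1] += 1
    return T0, T1, S0

def prob_of_N(C, n, k):
    """ProbOfN(C) = (1 ... 1) x T_{c_n} x ... x T_{c_2} x S0, over 2^(k*n)."""
    if C & 1:
        return 0.0                    # Pr{N = c_n ... c_2 1} = 0
    T0, T1, S0 = build_matrices(k)
    v = S0
    for i in range(1, n):             # apply T_{c_2} first, T_{c_n} last
        T = T1 if (C >> i) & 1 else T0
        v = [sum(row[j] * v[j] for j in range(len(v))) for row in T]
    return sum(v) / 2 ** (k * n)      # left-multiply by the all-ones vector
```

For the example above, prob_of_N(0b10110, 5, 3) returns 1536/2^15 = 0.046875.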
4 Optimization Ideas
If n is not very large, say n = 32 bits, then an optimization can be done in the following way. Represent C = A‖B‖0, where A = c_32 . . . c_16 and B = c_15 . . . c_2. Then create two tables of vectors:

RLeft[A] = (1 1 · · · 1) × ∏_{i=32}^{16} T_{c_i},   RRight[B] = ∏_{i=15}^{2} T_{c_i} × S0,

for all A and B. The probability Pr{N = C} is then just the scalar product RLeft[c_32 · · · c_16] × RRight[c_15 · · · c_2], and the time complexity per query is O(2^{k−2}). This idea of partitioning can be extended to larger n as well.
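A minimal sketch of this partitioning, demonstrated at a small size (n = 5, k = 3) rather than n = 32; the helper names and the exact split point are ours:

```python
def build_matrices(k):
    # As in Section 2: S0 and the pair T[0], T[1], all of side 2^(k-1)
    size = 1 << (k - 1)
    S0 = [0] * size
    T = [[[0] * size for _ in range(size)] for _ in range(2)]
    for X in range(1 << k):
        S0[bin(X).count("1") // 2] += 1
    for C in range(1 << (k - 2)):
        for X in range(1 << k):
            w = bin(X).count("1")
            T[0][C + w // 2][2 * C] += 1
            T[1][C + (w + 1) // 2][2 * C + 1] += 1
    return T, S0

def build_split_tables(n, k):
    """Precompute RRight[B] = prod T_{c_i} x S0 over the low bits of C,
    and RLeft[A] = (1 ... 1) x prod T_{c_i} over the high bits."""
    T, S0 = build_matrices(k)
    size = 1 << (k - 1)
    lo = (n - 1) // 2                 # bit positions c_2 .. c_{lo+1}
    hi = (n - 1) - lo                 # bit positions c_{lo+2} .. c_n
    right = []
    for B in range(1 << lo):
        v = S0
        for i in range(lo):           # apply T_{c_2} first
            M = T[(B >> i) & 1]
            v = [sum(M[r][c] * v[c] for c in range(size)) for r in range(size)]
        right.append(v)
    left = []
    for A in range(1 << hi):
        v = [1] * size
        for i in reversed(range(hi)): # row vector: multiply by T_{c_n} first
            M = T[(A >> i) & 1]
            v = [sum(v[r] * M[r][c] for r in range(size)) for c in range(size)]
        left.append(v)
    return left, right, lo

def prob_of_N_split(C, n, k, tables):
    """One query = one scalar product of two length-2^(k-1) vectors."""
    left, right, lo = tables
    if C & 1:
        return 0.0
    B = (C >> 1) & ((1 << lo) - 1)
    A = C >> (1 + lo)
    return sum(l * r for l, r in zip(left[A], right[B])) / 2 ** (k * n)
```

With n = 5 and k = 3 this reproduces the example of Section 3: prob_of_N_split(0b10110, 5, 3, build_split_tables(5, 3)) = 0.046875.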
Author Index

Akkar, Mehdi-Laurent 332
Armknecht, Frederik 65
Bellare, Mihir 389
Bévan, Régis 332
Blackburn, Simon R. 446
Chee, Seongtaek 193
Cheon, Jung Hee 83
Cho, Joo Yeon 49
Clark, John A. 161
Contini, Scott 454
Dumas, Philippe 317
Englund, Håkan 127
Furuya, Soichi 472
Golić, Jovan Dj. 178
Goubin, Louis 332
Han, Jae Woo 34
Hell, Martin 127
Hong, Jin 34, 193
Hong, Seokhie 299
Iwata, Tetsu 427
Johansson, Thomas 127
Kang, Ju-Sung 299
Kim, Jaeheon 34
Klimov, Alexander 1
Ko, Youngdai 299
Kohno, Tadayoshi 408, 427
Lee, Dong Hoon 34, 83, 193
Lee, Sangjin 299
Lee, Wonil 299
Legat, Jean-Didier 279
Lipmaa, Helger 317
Lucks, Stefan 359
Maitra, Subhamoy 143, 161
Maity, Soumen 143
Maximov, Alexander 483
Molland, Håvard 109
Moon, Dukjae 34
Muller, Frédéric 94
Paterson, Kenneth G. 446
Paul, Souradyuti 245
Pieprzyk, Josef 49
Piret, Gilles 279
Preneel, Bart 245
Quisquater, Jean-Jacques 279
Rogaway, Phillip 348, 371, 389
Rouvroy, Gael 279
Sarkar, Palash 193
Shamir, Adi 1
Shibutani, Kyoji 260
Shirai, Taizo 260
Shrimpton, Thomas 371
Standaert, François-Xavier 279
Stănică, Pantelimon 161
Viega, John 408
Wagner, David 16, 389
Wallén, Johan 317
Watanabe, Dai 472
Whiting, Doug 408
Wu, Hongjun 226
Yin, Yiqun Lisa 454
Zoltak, Bartosz 210