The Theory of Response-Adaptive Randomization in Clinical Trials
Feifang Hu, University of Virginia, Charlottesville, VA
William F. Rosenberger, George Mason University, Fairfax, VA
WILEY-INTERSCIENCE
A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2006 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:
Hu, Feifang, 1964-
The theory of response-adaptive randomization in clinical trials / Feifang Hu, William F. Rosenberger.
p. cm.
Includes bibliographical references and index.
ISBN-13: 978-0-471-65396-7 (cloth)
ISBN-10: 0-471-65396-9 (cloth)
1. Clinical trials. I. Rosenberger, William F. II. Title.
R853.C55H8 2006
610.72'4-dc22
2006044048
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
To our teachers and our students
Contents

Dedication
Preface

1 Introduction
1.1 Randomization in clinical trials
1.1.1 Complete randomization
1.1.2 Restricted randomization procedures
1.1.3 Response-adaptive randomization procedures
1.1.4 Covariate-adaptive randomization procedures
1.1.5 Covariate-adjusted response-adaptive randomization procedures
1.2 Response-adaptive randomization in a historical context
1.3 Outline of the book
1.4 References

2 Fundamental Questions of Response-Adaptive Randomization
2.1 Optimal allocation
2.2 The relationship between power and response-adaptive randomization
2.3 The relationship for K > 2 treatments
2.4 Asymptotically best procedures
2.5 References

3 Likelihood-Based Inference
3.1 Data structure and likelihood
3.2 Asymptotic properties of maximum likelihood estimators
3.3 The general result for determining asymptotically best procedures
3.4 Conclusions
3.5 References

4 Procedures Based on Urn Models
4.1 Generalized Friedman's urn
4.1.1 Historical results on asymptotic properties
4.1.2 Assumptions and notation
4.1.3 Main asymptotic theorems
4.1.4 Some examples
4.1.5 Proving the main theoretical results
4.2 The class of ternary urn models
4.2.1 Randomized Pólya urn
4.2.2 Birth and death urn
4.2.3 Drop-the-loser rule
4.2.4 Generalized drop-the-loser rule
4.2.5 Asymptotic properties of the GDL rule
4.3 References

5 Procedures Based on Sequential Estimation
5.1 Examples
5.2 Properties of procedures based on sequential estimation for K = 2
5.3 Notation and conditions for the general framework
5.4 Asymptotic results and some examples
5.5 Proving the main theorems
5.6 References

6 Sample Size Calculation
6.1 Power of a randomization procedure
6.2 Three types of sample size
6.3 Examples
6.3.1 Restricted randomization
6.3.2 Response-adaptive randomization
6.4 References

7 Additional Considerations
7.1 The effects of delayed response
7.2 Continuous responses
7.2.1 Asymptotic variance of the four procedures
7.3 Multiple (K > 2) treatments
7.4 Accommodating heterogeneity
7.4.1 Heterogeneity based on time trends
7.4.2 Heterogeneity based on covariates
7.4.3 Statistical inference under heterogeneity
7.5 References

8 Implications for the Practice of Clinical Trials
8.1 Standards
8.2 Binary responses
8.3 Continuous responses
8.4 The effects of delayed response
8.5 Conclusions
8.6 References

9 Incorporating Covariates
9.1 Introduction and examples
9.1.1 Covariate-adaptive randomization procedures
9.1.2 CARA randomization procedures
9.2 General framework and asymptotic results
9.2.1 The procedure for K treatments
9.2.2 Main theoretical results
9.3 Generalized linear models
9.4 Two treatments with binary responses
9.4.1 Power
9.5 Conclusions
9.6 References

10 Conclusions and Open Problems
10.1 Conclusions
10.2 Open problems
10.3 References

Appendix A: Supporting Technical Material
A.1 Some matrix theory
A.2 Jordan decomposition
A.3 Matrix recursions
A.4 Martingales
A.4.1 Definition and properties of martingales
A.4.2 The martingale central limit theorem
A.4.3 Gaussian approximations and the law of the iterated logarithm
A.5 Cramér-Wold device
A.6 Multivariate martingales
A.7 Multivariate Taylor's expansion
A.8 References

Appendix B: Proofs
B.1 Proofs of theorems in Chapter 4
B.1.1 Proof of Theorems 4.1-4.3
B.1.2 Proof of Theorem 4.6
B.2 Proof of theorems in Chapter 5
B.3 Proof of theorems in Chapter 7
B.4 References

Author Index
Subject Index
Preface
Research in response-adaptive randomization developed as a response to a classical ethical dilemma in clinical trials. While clinical trials may provide information on new treatments that can impact countless lives in the future, the act of randomization means that volunteers in the clinical trial will receive the benefit of the new treatment only by chance. In most clinical trials, an attempt is made to balance the treatment assignments equally, but the probability that a volunteer will receive the potentially better treatment is only 1/2. Response-adaptive randomization uses accruing data to skew the allocation probabilities to favor the treatment performing better thus far in the trial, thereby mitigating the problem to some degree. Response-adaptive allocation has a long history in the biostatistical literature, and the list of researchers who have worked (at least briefly) in the area reads like a Who's Who of modern statistics: Anscombe, Chernoff, Colton, Cornfield, Flournoy, Greenhouse, Halperin, Louis, Robbins, Siegmund, Wei, Woodroofe, Zelen, and others. Largely because of the disastrous ECMO trial in the early 1980s, there was a general reluctance to use these procedures that has continued to this day. When the authors met in 1995, it was unclear whether these procedures were effective or could be adapted to modern clinical trials and whether certain fundamental questions could be answered. Our collaboration over the past 10 years has been an attempt to formalize the important questions regarding response-adaptive randomization in a rigorous mathematical framework and to systematically answer them. We had no idea that we were opening a can of worms that would require a demanding arsenal of mathematical tools. We set out to interest others in the problems, and this led to fruitful collaborations with many other investigators. This book is a result of
these collaborations. It represents what we now know about the subject, and it is our attempt to form a mathematically rigorous subdiscipline of experimental design involving randomization. Two individuals were particularly influential: Z. D. Bai of Singapore, whose collaborative work resulted in solutions to decades-old problems in urn models, largely forming the basis for Chapter 4; and L.-X. Zhang of China, whose collaborative work largely forms the basis of Chapter 5. This book is aimed at Ph.D. students and researchers in response-adaptive randomization. It provides answers to some of the fundamental questions that have been asked over the years: How does response-adaptive randomization affect power? Can standard inferential tests be applied following response-adaptive randomization? What is the effect of delayed response? Which procedure is most appropriate, and how can "most appropriate" be quantified? How can heterogeneity of the patient population be incorporated? Can response-adaptive randomization be performed with more than two treatments or with continuous responses? While the mathematics generated by these problems can sometimes be daunting, the response-adaptive randomization procedures themselves can be implemented in minutes by adding a loop to a standard randomization routine. Procedures can be simulated under various parameterizations to determine their appropriateness for use in clinical trials. Our hope is that any future objections to the use of response-adaptive randomization will not be based on logistical difficulties or the lack of theoretical justification of these procedures. Most of the book is written at the level of graduate students in a statistics program. The technical portions of the book are mostly relegated to appendices and to brief descriptions in Chapters 4 and 5. That material requires advanced probability and stochastic processes as well as matrix theory. Prerequisite material can be found in Appendix A for those wishing to pursue the technical details. In addition, it is recommended that readers new to the area of response-adaptive randomization begin by reading Chapters 10-12 of Rosenberger and Lachin (Randomization in Clinical Trials, Wiley, New York, 2002). We would like to thank our colleagues Z. D. Bai, W. S. Chan, Siu Hung Cheung, Steve Durham, Nancy Flournoy, Bob Smythe, L.-X. Zhang, and Jim Zidek. In addition, we thank our current and former doctoral students Liangliang Duan and Thomas Gwise (Hu); Anastasia Ivanova, Yevgen Tymofyeyev, and Lanju Zhang (Rosenberger). Parts of this book were tested in a short course at a summer school in Torgnon, Italy, organized by Pietro Muliere of Bocconi University. We thank him and his students; in particular, the asymptotic variance in Example 5.9 was derived by the students of the course. As we point out in Chapter 10, open problems abound in this area, and it is our sincere hope that more talented researchers will be attracted to the beauty of the complex stochastic structures encountered throughout this book. However, our greatest hope is that, by providing a firm theoretical underpinning to the concept of response-adaptive randomization in this book, clinical trialists will be motivated to apply these techniques in practice. Finally, much of our research career has benefited from generous funding from the United States government. This included grants from the Division of
Mathematical Sciences, National Science Foundation: Hu and Rosenberger 2002-2005, Rosenberger 2005-2008, Hu (Career Award) 2004-2009; and the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health: Rosenberger (FIRST Award) 1995-2000. These grants provided the opportunity to advance our research program, and we wish to recognize the importance of such funding for young researchers.

F. H.
Charlottesville, Virginia

W. F. R.
Fairfax, Virginia
1 Introduction
1.1 RANDOMIZATION IN CLINICAL TRIALS

We begin with a mathematical formulation of the problem. Consider a clinical trial of $n$ patients, each of whom is to randomly receive one of $K$ treatments. A randomization sequence is a matrix $T = (T_1, \ldots, T_n)'$, where $T_i = e_j$, $j = 1, \ldots, K$, $i = 1, \ldots, n$, and $e_j$ is a vector of zeroes with a 1 in the $j$-th position. We will typically be interested in exploring important properties of the randomization sequence, which usually involves deriving asymptotic properties of the allocation proportions, given by $N(n)/n$, where $N(n) = (N_1(n), \ldots, N_K(n))$ and $N_j(n) = \sum_{i=1}^n T_{ij}$. Necessarily $\|N(n)\| = \sum_{j=1}^K N_j(n) = n$. Let $X = (X_1, \ldots, X_n)'$, where $X_i = (X_{i1}, \ldots, X_{iK})$, be a matrix of response variables, where $X_i$ represents the sequence of responses that would be observed if each treatment were assigned to the $i$-th patient independently. However, only one element of $X_i$ will be observable. Throughout the book, we will consider only probability models for $X_i$ conditional on $T_i$. In some applications, there may be a set of covariate vectors $Z_1, \ldots, Z_n$ that are also of interest. In this case, we will consider probability models for $X_i$ conditional on $T_i$ and $Z_i$. Let $\mathcal{T}_n = \sigma\{T_1, \ldots, T_n\}$ be the sigma-algebra generated by the first $n$ treatment assignments, let $\mathcal{X}_n = \sigma\{X_1, \ldots, X_n\}$ be the sigma-algebra generated by the first $n$ responses, and let $\mathcal{Z}_n = \sigma\{Z_1, \ldots, Z_n\}$ be the sigma-algebra generated by the first $n$ covariate vectors. Let $\mathcal{F}_n = \mathcal{T}_n \otimes \mathcal{X}_n \otimes \mathcal{Z}_{n+1}$. A randomization procedure is defined by

$$\phi_{n+1} = E(T_{n+1} \mid \mathcal{F}_n) \in [0,1]^K,$$
where $\phi_{n+1}$ is $\mathcal{F}_n$-measurable. We can describe $\phi_n$ as the conditional probability of assigning treatments $1, \ldots, K$ to the $n$-th patient, conditional on the previous $n-1$ assignments, responses, and covariate vectors, and the current patient's covariate vector. We can describe five types of randomization procedures. We have complete randomization if

$$\phi_n = E(T_n \mid \mathcal{F}_{n-1}) = E(T_n);$$

restricted randomization if

$$\phi_n = E(T_n \mid \mathcal{T}_{n-1});$$

response-adaptive randomization if

$$\phi_n = E(T_n \mid \mathcal{T}_{n-1}, \mathcal{X}_{n-1});$$

covariate-adaptive randomization if

$$\phi_n = E(T_n \mid \mathcal{T}_{n-1}, \mathcal{Z}_n);$$

and covariate-adjusted response-adaptive (CARA) randomization if

$$\phi_n = E(T_n \mid \mathcal{T}_{n-1}, \mathcal{X}_{n-1}, \mathcal{Z}_n) = E(T_n \mid \mathcal{F}_{n-1}).$$
This book will primarily focus on response-adaptive randomization. The latter two classes of randomization procedures, covariate-adaptive randomization and covariate-adjusted response-adaptive randomization, have been poorly studied. Chapter 9 will give an overview of the current knowledge about CARA randomization procedures. The book by Rosenberger and Lachin (2002) gives a thorough discussion of properties of complete and restricted randomization as well as a less thorough treatment of covariate-adaptive and response-adaptive randomization. We now give examples of each of these types of procedures.

1.1.1 Complete randomization
The simplest form of a randomization procedure is complete randomization. For $K = 2$, $T_i = (1, 0)$ or $(0, 1)$ according to the toss of a coin. Then $T_{11}, \ldots, T_{n1}$ are independent and identically distributed Bernoulli random variables with the probability of assignment to treatment 1 given by $\phi_{i1} = E(T_{i1}) = 1/2$, $i = 1, \ldots, n$. This procedure is rarely used in practice because of the nonnegligible probability of treatment imbalances in moderate samples.
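As a quick illustration of that imbalance risk, the following simulation sketch (ours, not from the book; the trial size and imbalance threshold are hypothetical) estimates the chance of a sizable imbalance under coin-tossing.

```python
import numpy as np

def complete_randomization(n, rng):
    """Assign each of n patients to treatment 1 or 2 by a fair coin toss."""
    return rng.integers(1, 3, size=n)  # i.i.d. uniform on {1, 2}

rng = np.random.default_rng(2006)
n, reps = 100, 10_000
imbalances = np.empty(reps)
for r in range(reps):
    t = complete_randomization(n, rng)
    n1 = np.sum(t == 1)
    imbalances[r] = abs(n1 - (n - n1))  # |N_1(n) - N_2(n)|

# Estimated probability of an imbalance of 10 or more patients in a 100-patient trial
print("P(|N1 - N2| >= 10) ~", np.mean(imbalances >= 10))
```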
1.1.2 Restricted randomization procedures
Restricted randomization procedures are the preferred method of randomization for many clinical trials because it is often desired to have equal numbers of patients assigned to each treatment, or nearly so. This is usually accomplished by changing the probability of randomization to a treatment according to how many patients have already been assigned to that treatment. Rosenberger and Lachin (2002) describe four different restricted randomization procedures. The first is the random allocation rule, where, for $K = 2$ and $n$ even, the probability that the $i$-th subject is assigned to treatment 1 is given by

$$\phi_{i1} = \frac{n/2 - N_1(i-1)}{n - i + 1}.$$
The second is the truncated binomial design which, for $n$ even and $K = 2$, is given by

$$\phi_{i1} = \begin{cases} 1/2, & \text{if } \max\{N_1(i-1), N_2(i-1)\} < n/2, \\ 0, & \text{if } N_1(i-1) = n/2, \\ 1, & \text{if } N_2(i-1) = n/2. \end{cases}$$
The third is Efron's biased coin design (Efron, 1971). From the vector $N(i)$, let $D_i = N_1(i) - N_2(i)$ be the imbalance between treatments 1 and 2. Define a constant $\alpha \in (0.5, 1]$. Then the procedure is given by

$$\phi_{i1} = \begin{cases} 1/2, & \text{if } D_{i-1} = 0, \\ \alpha, & \text{if } D_{i-1} < 0, \\ 1 - \alpha, & \text{if } D_{i-1} > 0. \end{cases}$$
For the fourth procedure, Wei's urn design, Wei (1978) proposed that Efron's procedure be modified so that the degree of imbalance influences the randomization procedure. One technique for doing this is to establish an urn model whereby balls in the urn are labeled $1, \ldots, K$, and each represents a treatment to be assigned. Let $Y_n$ be the urn composition, where $Y_{nj}$ is the number of balls labeled $j$ in the urn after $n$ patients have been assigned, $j = 1, \ldots, K$. One begins with an initial urn composition $Y_0 = \mathbf{1}$. Each randomization is accomplished as follows: a ball is drawn and replaced, its label noted, the appropriate treatment is assigned, and one ball from each of the other $K - 1$ labels is added to the urn. Thus the restricted randomization procedure is given by

$$\phi_{n+1,j} = E(T_{n+1,j} \mid \mathcal{T}_n) = \frac{Y_{nj}}{\sum_{k=1}^K Y_{nk}}, \quad j = 1, \ldots, K.$$
A generalization of this procedure is found in Wei, Smythe, and Smith (1986). Let $\pi(\cdot) = (\pi_1(\cdot), \ldots, \pi_K(\cdot))$ be a continuous function such that the following relationship is satisfied: if

$$\frac{N_j(i-1)}{i-1} \geq \frac{1}{K}, \quad \text{then} \quad \pi_j\left(\frac{N(i-1)}{i-1}\right) \leq \frac{1}{K}, \quad j = 1, \ldots, K.$$

Then the restricted randomization procedure is given by

$$\phi_i = E(T_i \mid \mathcal{T}_{i-1}) = \pi\left(\frac{N(i-1)}{i-1}\right).$$
In practice, the random allocation rule and truncated binomial design are usually performed within blocks of subjects so that balance can be forced throughout the course of the clinical trial. Forcing balance within blocks of fixed or random size is called a permuted block design. Alternatively, Efron's biased coin design and Wei's urn design, while not forcing perfect balance, adaptively balance the treatment assignments. See Rosenberger and Lachin (2002) for details.
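The sketch below (our own illustration; the bias parameter and trial sizes are hypothetical choices) implements two of the restricted procedures just described, Efron's biased coin design and Wei's urn design, to show how little code the "loop around a randomization routine" actually requires.

```python
import numpy as np

def efron_bcd(n, alpha=2/3, rng=None):
    """Efron's biased coin design for two treatments with bias alpha in (0.5, 1]."""
    rng = rng or np.random.default_rng()
    assignments = np.empty(n, dtype=int)
    for i in range(n):
        d = np.sum(assignments[:i] == 1) - np.sum(assignments[:i] == 2)  # imbalance D_{i-1}
        if d == 0:
            p1 = 0.5
        elif d < 0:
            p1 = alpha          # favor the under-represented treatment 1
        else:
            p1 = 1 - alpha
        assignments[i] = 1 if rng.random() < p1 else 2
    return assignments

def wei_urn(n, K=2, rng=None):
    """Wei's urn design: draw a ball with replacement, assign that treatment,
    then add one ball of each of the other K-1 labels."""
    rng = rng or np.random.default_rng()
    urn = np.ones(K)            # initial composition Y_0 = 1
    assignments = np.empty(n, dtype=int)
    for i in range(n):
        j = rng.choice(K, p=urn / urn.sum())
        assignments[i] = j + 1
        urn += 1.0
        urn[j] -= 1.0           # net effect: one ball added to every label except the one drawn
    return assignments

rng = np.random.default_rng(1)
print("Efron imbalance:", abs(np.sum(efron_bcd(50, rng=rng) == 1) * 2 - 50))
print("Wei urn imbalance:", abs(np.sum(wei_urn(50, rng=rng) == 1) * 2 - 50))
```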
1.1.3 Response-adaptive randomization procedures
Response-adaptive randomization procedures change the allocation probabilities for each subject according to previous treatment assignments and responses in order to meet some objective. Two objectives that we will be principally concerned with for direct application in clinical trials are maximizing power and minimizing exposure to inferior treatments. Here we give two examples of response-adaptive randomization procedures: one a generalization of Wei's urn design and the other a generalization of Efron's biased coin design. One can generalize Wei's urn design to establish a broad family of response-adaptive randomization procedures based on a generalized Friedman's urn model (Athreya and Karlin, 1968). For response-adaptive randomization, balls are added to the urn based not only on the treatment assigned, but also on the patient's response. Formally, the randomization of patient $i$ is accomplished as follows: a ball is drawn from the urn composition $Y_{i-1}$ and replaced, its label noted (say $j$), and the $j$-th treatment is assigned. One then observes the variable $X_i$ and $D_{jk}$ balls are added to the urn, for $k = 1, \ldots, K$, where $D_{jk}$ is a measurable function on the sample space of $X_i$. The response-adaptive randomization procedure is then given by

$$\phi_{i+1,k} = E(T_{i+1,k} \mid \mathcal{F}_i) = \frac{Y_{ik}}{\sum_{j=1}^K Y_{ij}}, \quad k = 1, \ldots, K. \qquad (1.2)$$
Note that if $D_{jk} = 1 - \delta_{jk}$ with probability one, where $\delta_{jk}$ is the Kronecker delta, we have Wei's urn design. The use of the generalized Friedman's urn model for response-adaptive randomization procedures derives from Wei and Durham (1978) for binary response clinical trials with $K = 2$. Let $X_{ij} = 1$ if patient $i$ had a success on treatment $j$ and $X_{ij} = 0$ if patient $i$ had a treatment failure on treatment $j$. With this notation, if treatment $j$ was not assigned, then $X_{ij}$ is not observable. Define $Y_0 = (\alpha, \alpha)$ for some positive integer $\alpha$ and let

$$D_{jk} = \delta_{jk} X_{ij} + (1 - \delta_{jk})(1 - X_{ij}) \qquad (1.3)$$

for all $i$. When the procedure (1.2) is used, this is called the randomized play-the-winner rule.
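A minimal sketch of the randomized play-the-winner rule (our illustration; the success probabilities and urn parameter below are hypothetical): a success on the assigned treatment adds a ball of the same type, a failure adds a ball of the opposite type.

```python
import numpy as np

def randomized_play_the_winner(n, p_success, alpha=1, rng=None):
    """Randomized play-the-winner rule for two treatments with binary responses.

    p_success = (p_A, p_B) are the true success probabilities used to simulate
    responses; the urn is updated after each response is observed.
    """
    rng = rng or np.random.default_rng()
    urn = np.array([alpha, alpha], dtype=float)   # Y_0 = (alpha, alpha)
    n_assigned = np.zeros(2)
    n_success = np.zeros(2)
    for _ in range(n):
        j = rng.choice(2, p=urn / urn.sum())      # draw a ball (with replacement)
        x = rng.random() < p_success[j]           # observe the binary response
        n_assigned[j] += 1
        n_success[j] += x
        urn[j if x else 1 - j] += 1               # success: same ball; failure: opposite ball
    return n_assigned, n_success

n_assigned, n_success = randomized_play_the_winner(
    200, p_success=(0.7, 0.4), rng=np.random.default_rng(7))
print("allocation proportions:", n_assigned / n_assigned.sum())
# For large n the proportion on A should approach q_B/(q_A + q_B) = 0.6/0.9 = 2/3 here.
```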
Wei (1979) extended the randomized play-the-winner rule to $K > 2$ treatments as follows. Let $Y_0 = \alpha\mathbf{1}$ and

$$D_{jk} = (K-1)\delta_{jk} X_{ij} + (1 - \delta_{jk})(1 - X_{ij})$$

for all $i$ and again use the procedure (1.2). Note that $D = \{D_{jk},\ j, k = 1, \ldots, K\}$, which we call the design matrix, is assumed to be a homogeneous matrix over all patients $i = 1, \ldots, n$. This may not be a reasonable assumption in practice, and we expend some energy later in the book dealing with heterogeneous matrices. If we assume homogeneity, many of the asymptotic results that we develop will depend on the matrix $H = E(D)$, which we call the generating matrix.

As a second example, for $K = 2$, Eisele (1994) and Eisele and Woodroofe (1995) describe a doubly-adaptive biased coin design. Unlike the generalized Friedman's urn, the doubly-adaptive biased coin design is based on a parametric model for the response variable. Let the probability distributions of $X_1, \ldots, X_n$ depend on some parameter vector $\theta \in \Theta$. Let $\rho(\theta) \in (0, 1)$ be a target allocation, i.e., the target proportion of subjects desired to be assigned to treatment 1, which depends on the parameter vector $\theta$. Let $g$ be a function from $[0,1]^2$ to $[0,1]$ such that the following four regularity conditions hold: (i) $g$ is jointly continuous; (ii) $g(r, r) = r$; (iii) $g(p, r)$ is strictly decreasing in $p$ and strictly increasing in $r$ on $(0,1)^2$; and (iv) $g$ has bounded derivatives in both arguments. At the $i$-th allocation, the function $g$ represents the closeness of $N_1(i-1)/(i-1)$ to the current estimate of $\rho(\theta)$ in some sense. Hu and Zhang (2004) proposed the following function $g$ for $\gamma \geq 0$:
$$g(x, \rho) = \frac{\rho(\rho/x)^{\gamma}}{\rho(\rho/x)^{\gamma} + (1-\rho)\big((1-\rho)/(1-x)\big)^{\gamma}}, \qquad g(0, \rho) = 1, \quad g(1, \rho) = 0.$$

This function does not satisfy Eisele's regularity condition (iv), but it satisfies alternative conditions of Hu and Zhang (2004), which we will describe later in the book. Then, for $K = 2$,

$$\phi_{i1} = g\left(\frac{N_1(i-1)}{i-1},\ \rho(\hat\theta_{i-1})\right),$$

where $\hat\theta_{i-1}$ is some estimator of $\theta$ based on data from the first $i-1$ subjects. When $\gamma = 0$ and $\hat\theta_{i-1}$ is the maximum likelihood estimator of $\theta$, the procedure reduces to

$$\phi_{i1} = \rho(\hat\theta_{i-1}).$$
This is called the sequential maximum likelihood procedure, which has been studied by Melfi and Page (2000) and Melfi, Page, and Geraldes (2001). These two examples, one based on a class of urn models and the other on a class of adaptive biased coin designs, lead to an important distinction motivating the approach to response-adaptive randomization. The first approach is completely nonparametric and is not designed to target some specific allocation based on unknown parameters. The second approach begins with a parametric response model and a target allocation based on unknown parameters of that model and sequentially substitutes updated estimates of those parameters. This book will explore both approaches: the first approach in the context of various urn models and the second approach in the context of the doubly-adaptive biased coin design. In future chapters we will refer to the first approach as procedures based on urn models and the second approach as procedures based on sequential estimation.
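To make the sequential-estimation approach concrete, here is a sketch (ours, with hypothetical parameters) of the doubly-adaptive biased coin design for two treatments with binary responses, using the Hu-Zhang allocation function $g$ with tuning parameter $\gamma$ and, as the target, the allocation $\sqrt{p_A}/(\sqrt{p_A}+\sqrt{p_B})$ discussed in Chapter 2. The burn-in and the shrinkage applied to the estimates are implementation choices of ours, not part of the formal design.

```python
import numpy as np

def hu_zhang_g(x, rho, gamma):
    """Hu-Zhang allocation function g(x, rho) with tuning parameter gamma >= 0."""
    if x <= 0:
        return 1.0
    if x >= 1:
        return 0.0
    num = rho * (rho / x) ** gamma
    den = num + (1 - rho) * ((1 - rho) / (1 - x)) ** gamma
    return num / den

def dbcd(n, p_true, target, gamma=2, burn_in=5, rng=None):
    """Doubly-adaptive biased coin design for two treatments with binary responses.

    target(p_hat_A, p_hat_B) returns the desired allocation proportion to treatment A.
    The first 2*burn_in patients are assigned in alternation to obtain initial estimates.
    """
    rng = rng or np.random.default_rng()
    n_assigned = np.zeros(2)
    n_success = np.zeros(2)
    for i in range(n):
        if i < 2 * burn_in:
            j = i % 2
        else:
            p_hat = (n_success + 0.5) / (n_assigned + 1.0)   # shrunken estimates of p_A, p_B
            rho = target(p_hat[0], p_hat[1])
            phi = hu_zhang_g(n_assigned[0] / i, rho, gamma)  # prob. of assigning treatment A
            j = 0 if rng.random() < phi else 1
        x = rng.random() < p_true[j]
        n_assigned[j] += 1
        n_success[j] += x
    return n_assigned

rsihr = lambda pa, pb: np.sqrt(pa) / (np.sqrt(pa) + np.sqrt(pb))
alloc = dbcd(400, p_true=(0.8, 0.5), target=rsihr, rng=np.random.default_rng(11))
print("proportion on treatment A:", alloc[0] / alloc.sum())
print("target allocation        :", rsihr(0.8, 0.5))
```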
1.1.4 Covariate-adaptive randomization procedures
Clinical trialists are often concerned that treatment arms will be unbalanced with respect to key covariates of interest. To prevent this, covariate-adaptive randomization is sometimes employed. As one example, the Pocock-Simon procedure (Pocock and Simon, 1975) is perhaps the most widely used covariate-adaptive randomization procedure. Let $Z_1, \ldots, Z_n$ be the covariate vectors of patients $1, \ldots, n$. Further, we assume that there are $S$ covariates of interest (continuous or otherwise) and they are divided into $n_s$, $s = 1, \ldots, S$, different levels. Define $N_{sik}(n)$, $s = 1, \ldots, S$, $i = 1, \ldots, n_s$, $k = 1, 2$, to be the number of patients in the $i$-th level of the $s$-th covariate on treatment $k$. Let patient $n+1$ have covariate vector $Z_{n+1} = (r_1, \ldots, r_S)$. Define a metric $D_s(n) = N_{s r_s 1}(n) - N_{s r_s 2}(n)$, which is the difference between the numbers of patients on treatments 1 and 2 for members of level $r_s$ of covariate $s$. Let $w_1, \ldots, w_S$ be a set of weights and take the weighted aggregate $D(n) = \sum_{s=1}^S w_s D_s(n)$. Establish a probability $\pi \in (1/2, 1]$. Then the procedure allocates to treatment 1 according to

$$\phi_{i1} = E(T_{i1} \mid \mathcal{T}_{i-1}, \mathcal{Z}_i) = \begin{cases} 1/2, & \text{if } D(i-1) = 0, \\ \pi, & \text{if } D(i-1) < 0, \\ 1 - \pi, & \text{if } D(i-1) > 0. \end{cases}$$
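A sketch of the Pocock-Simon procedure as just described (our illustration; the number of covariates, their levels, the weights, and $\pi$ below are hypothetical).

```python
import numpy as np

def pocock_simon(covariates, weights, pi=0.8, rng=None):
    """Pocock-Simon covariate-adaptive randomization for two treatments.

    covariates: (n, S) integer array; entry (i, s) is patient i's level of covariate s.
    weights:    length-S vector w_1, ..., w_S.
    Returns a length-n vector of treatment labels in {1, 2}.
    """
    rng = rng or np.random.default_rng()
    n, S = covariates.shape
    n_levels = covariates.max(axis=0) + 1
    # counts[s][level, k] = patients at that level of covariate s on treatment k
    counts = [np.zeros((n_levels[s], 2)) for s in range(S)]
    assignments = np.empty(n, dtype=int)
    for i in range(n):
        z = covariates[i]
        # weighted aggregate imbalance D for the incoming patient's covariate levels
        D = sum(weights[s] * (counts[s][z[s], 0] - counts[s][z[s], 1]) for s in range(S))
        if D == 0:
            p1 = 0.5
        elif D < 0:
            p1 = pi             # treatment 1 under-represented for this covariate profile
        else:
            p1 = 1 - pi
        k = 0 if rng.random() < p1 else 1
        assignments[i] = k + 1
        for s in range(S):
            counts[s][z[s], k] += 1
    return assignments

# Hypothetical trial: two binary covariates (e.g., sex, disease stage), equal weights
rng = np.random.default_rng(3)
Z = rng.integers(0, 2, size=(60, 2))
t = pocock_simon(Z, weights=[1.0, 1.0], rng=rng)
print("treatment totals:", np.bincount(t)[1:])
```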
1.1.5 Covariate-adjusted response-adaptive randomization procedures
A relatively new concept in response-adaptive randomization is CARA randomization, in which the previous patient responses and covariate vectors and the current patient's covariate vector are used to obtain a covariate-adjusted probability of assignment to treatment. One approach to this, for binary responses ($X_{ij} = 0$ or $1$, $i = 1, \ldots, n$, $j = 1, \ldots, K$) using a logistic regression model, was given by Rosenberger, Vidyashankar, and Agarwal (2001). Consider the covariate-adjusted logistic regression model, given by

$$\mathrm{logit}\,\Pr(\text{success} \mid T_{i1}, Z_i) = \alpha + \beta T_{i1} + \gamma' Z_i + \delta' T_{i1} Z_i,$$

where $\alpha$ is the intercept, $\beta$ is the treatment main effect, $\gamma$ is an $S$-dimensional vector of covariate main effects, and $\delta$ is an $S$-dimensional vector of treatment-by-covariate interactions. The fitted model after $n$ patients yields maximum likelihood estimators $\hat\alpha_n$ of $\alpha$, $\hat\beta_n$ of $\beta$, $\hat\gamma_n$ of $\gamma$, and $\hat\delta_n$ of $\delta$. The covariate-adjusted odds ratio based on the data for $n$ patients and the covariate vector of the $(n+1)$-th patient is given by

$$\widehat{OR}_{n+1} = \exp\{\hat\beta_n + \hat\delta_n' Z_{n+1}\}.$$

Then we define the randomization procedure by

$$\phi_{n+1,1} = \frac{\widehat{OR}_{n+1}}{1 + \widehat{OR}_{n+1}}.$$
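The allocation step can be sketched as follows (our illustration). The logistic-model fitting is omitted: the function assumes current estimates of $\beta$ and $\delta$ are already available and maps the estimated covariate-adjusted odds ratio for the incoming patient into the assignment probability $\widehat{OR}/(1+\widehat{OR})$, as in the reconstruction above; the numerical values of the estimates are hypothetical.

```python
import numpy as np

def cara_assignment_prob(beta_hat, delta_hat, z_new):
    """Probability of assigning the incoming patient to treatment 1.

    beta_hat:  current estimate of the treatment main effect.
    delta_hat: current estimates of the treatment-by-covariate interactions (length S).
    z_new:     covariate vector of the incoming patient (length S).

    The covariate-adjusted odds ratio exp(beta_hat + delta_hat' z_new) is mapped to
    the allocation probability OR / (1 + OR), so patients whose covariates make
    treatment 1 look better are more likely to receive it.
    """
    log_or = beta_hat + np.dot(delta_hat, z_new)
    return 1.0 / (1.0 + np.exp(-log_or))        # equivalent to OR / (1 + OR)

# Hypothetical estimates after n patients: treatment 1 helps overall, more so at high z_2
beta_hat, delta_hat = 0.4, np.array([0.0, 0.9])
for z in ([0.0, 0.0], [1.0, 1.0]):
    p1 = cara_assignment_prob(beta_hat, delta_hat, np.array(z))
    print(z, "-> P(treatment 1) =", round(p1, 3))
```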
1.2 RESPONSE-ADAPTIVE RANDOMIZATION IN A HISTORICAL CONTEXT
In this book, the response-adaptive randomization procedures have three defining characteristics: (1) they are myopic; (2) they are fully randomized; (3) they require a fixed sample size. Let us examine these characteristics in a historical context. The response-adaptive randomization procedures we examine are myopic procedures in that they incorporate current data on treatment assignments and responses into decisions about treatment assignments for the next subject. After each subject, the data are updated, and a decision is made only about the next subject. This differs from the approach in multi-armed bandit problems (see Berry and Fristedt, 1986, for example) in which all possible sequences of treatment assignments and responses are enumerated, and the sequence that optimizes some objective is selected. The latter approach is computationally intensive, so myopic procedures are often preferable from a logistical consideration, although there is no guarantee that such procedures are globally optimal. Most of the work on multi-armed bandit problems has been in the context of nonrandomized studies, although recently there has been some work on randomized bandits by Yang and Zhu (2002). In addition to being myopic, the response-adaptive randomization procedures described are fully randomized in that each subject is assigned a treatment by random chance, the cornerstone of the modern clinical trial. Randomization is necessary to prevent selection bias and covariate imbalances, and to provide a basis for inference (Rosenberger and Lachin, 2002). Historically, response-adaptive treatment allocation designs were developed for the purpose of assigning patients to the better treatment with probability 1. The preliminary ideas can be traced back to Thompson (1933) and Robbins (1952) and then to a series of papers in the 1960s by Anscombe (1963), Colton (1963), Zelen (1969), and Cornfield, Halperin, and Greenhouse (1969). The idea of incorporating randomization in the context of response-adaptive treatment allocation designs stems from Wei and Durham (1978). Finally, the response-adaptive procedures in this book assume a fixed sample size. Historically, response-adaptive treatment allocation designs were often viewed in the context of sequential analysis, where a random number of patients $N$ would be determined according to an appropriate stopping boundary. Early papers in this context were by Chernoff and Roy (1965), Flehinger and Louis (1971), Robbins and Siegmund (1974), and Hayre (1979). However, most modern clinical trials in the
United States are conducted with a fixed, predetermined sample size set according to power and budgetary considerations. Interim monitoring plans are instituted to potentially stop the clinical trial early for safety or efficacy reasons. The imposition of an early stopping boundary is flexible, and in fact, many clinical trials that cross an early stopping boundary are not terminated early, usually for reasons unrelated to the primary outcome of the study. Thus we restrict our attention to myopic, fully randomized, fixed sample size procedures, which reflect very well the modern conduct of clinical trials. In fact, to implement these procedures in an actual clinical trial requires nothing more than a simple modification of the randomization sequence generator. (Of course, there are many interesting and important statistical questions that arise in the context of response-adaptive randomization, which will be the subject of this book.) The methodology for such response-adaptive randomization procedures thus begins in the late 1970s, and much of our current understanding is very recent. Many open problems still abound, and this is a fruitful area for future research.
1.3 OUTLINE OF THE BOOK
In Chapter 2, we identify fundamental questions of response-adaptive randomization, namely, what allocation should we target to achieve requisite power while resulting in fewer treatment failures, and how do we employ randomization to attain that target allocation? In Chapter 3, we describe the impact on likelihood-based inference resulting from response-adaptive randomization procedures and derive a benchmark for measuring the efficiency of a response-adaptive randomization procedure. We then detail the two approaches to response-adaptive randomization: in Chapter 4, we describe procedures based on urn models and derive their asymptotic properties; in Chapter 5, we describe procedures based on sequential estimation and derive their properties. In Chapter 6, we deal with the subtle concept of sample size computation in the context of random sample fractions. Chapter 7 discusses additional technical and practical considerations for response-adaptive randomization: delayed responses, heterogeneity, and more general response types and multiple treatments. Chapter 8 describes some practical considerations in selecting response-adaptive randomization procedures. Chapter 9 gives an overview of our current knowledge about CARA procedures. Two appendices describe certain mathematical background that forms a prerequisite for the more technical parts of the book and give exhaustive proofs of some of the most important theorems in Chapters 4 and 5.
1.4 REFERENCES
ANSCOMBE, F. J. (1963). Sequential medical trials. Journal of the American Statistical Association 58 365-384.
ATHREYA, K. B. AND KARLIN, S. (1968). Embedding of urn schemes into continuous time Markov branching processes and related limit theorems. Annals of Mathematical Statistics 39 1801-1817.
BERRY, D. A. AND FRISTEDT, B. (1986). Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London.
CHERNOFF, H. AND ROY, S. N. (1965). A Bayes sequential sampling inspection plan. Annals of Mathematical Statistics 36 1387-1407.
COLTON, T. (1963). A model for selecting one of two medical treatments. Journal of the American Statistical Association 58 388-400.
CORNFIELD, J., HALPERIN, M., AND GREENHOUSE, S. W. (1969). An adaptive procedure for sequential clinical trials. Journal of the American Statistical Association 64 759-770.
EFRON, B. (1971). Forcing a sequential experiment to be balanced. Biometrika 58 403-417.
EISELE, J. R. (1994). The doubly adaptive biased coin design for sequential clinical trials. Journal of Statistical Planning and Inference 38 249-261.
EISELE, J. R. AND WOODROOFE, M. (1995). Central limit theorems for doubly adaptive biased coin designs. Annals of Statistics 23 234-254.
FLEHINGER, B. J. AND LOUIS, T. A. (1971). Sequential treatment allocation in clinical trials. Biometrika 58 419-426.
HAYRE, L. S. (1979). Two-population sequential tests with three hypotheses. Biometrika 66 465-474.
HU, F. AND ZHANG, L.-X. (2004). Asymptotic properties of doubly adaptive biased coin designs for multi-treatment clinical trials. Annals of Statistics 32 268-301.
MELFI, V. F. AND PAGE, C. (2000). Estimation after adaptive allocation. Journal of Statistical Planning and Inference 87 353-363.
MELFI, V. F., PAGE, C., AND GERALDES, M. (2001). An adaptive randomized design with application to estimation. Canadian Journal of Statistics 29 107-116.
POCOCK, S. J. AND SIMON, R. (1975). Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics 31 103-115.
ROBBINS, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58 527-535.
ROBBINS, H. AND SIEGMUND, D. O. (1974). Sequential tests involving two populations. Journal of the American Statistical Association 69 132-139.
ROSENBERGER, W. F. AND LACHIN, J. M. (2002). Randomization in Clinical Trials: Theory and Practice. Wiley, New York.
ROSENBERGER, W. F., VIDYASHANKAR, A. N., AND AGARWAL, D. K. (2001). Covariate-adjusted response-adaptive designs for binary response. Journal of Biopharmaceutical Statistics 11 227-236.
THOMPSON, W. R. (1933). On the likelihood that one unknown probability exceeds another in the view of the evidence of the two samples. Biometrika 25 275-294.
WEI, L. J. (1978). An application of an urn model to the design of sequential controlled clinical trials. Journal of the American Statistical Association 73 559-563.
WEI, L. J. (1979). The generalized Pólya's urn design for sequential medical trials. Annals of Statistics 7 291-296.
WEI, L. J. AND DURHAM, S. D. (1978). The randomized play-the-winner rule in medical trials. Journal of the American Statistical Association 73 840-843.
WEI, L. J., SMYTHE, R. T., AND SMITH, R. L. (1986). K-treatment comparisons with restricted randomization rules in clinical trials. Annals of Statistics 14 265-274.
YANG, Y. AND ZHU, D. (2002). Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates. Annals of Statistics 30 100-121.
ZELEN, M. (1969). Play the winner and the controlled clinical trial. Journal of the American Statistical Association 64 131-146.
2 Fundamental Questions of Response-Adaptive Randomization
In this chapter we describe some fundamental questions that will be useful in making decisions about which response-adaptive randomization procedure to use. We begin with a discussion of optimal allocation to target certain objectives. Then we formulate the relationship between a response-adaptive randomization procedure and power. We then combine the two concepts to develop a framework with which to evaluate response-adaptive randomization procedures.

2.1 OPTIMAL ALLOCATION
In this section we introduce an optimization problem of the following form. Let $\mathbf{n} = (n_1, \ldots, n_K)$, where $\mathbf{n}\mathbf{1}' = n$, be the sample sizes on the $K$ treatments and let the probability distributions of responses depend on some parameter vector $\theta \in \Theta$. For a function $\eta$ and a constant $C$, we can consider the problem

$$\min_{n_1, \ldots, n_K} w(\theta)\mathbf{n}' \quad \text{subject to} \quad \eta(\mathbf{n}, \theta) = C, \qquad (2.1)$$

where $w(\theta) = (w_1(\theta), \ldots, w_K(\theta))$ is a set of positive weights (see Hayre, 1979; Jennison and Turnbull, 2000). It will be convenient to illustrate with the very simplest of clinical trials, where two treatments (A and B) will be compared with respect to a binary response. Let $p_A$ be the probability of success on treatment A and $p_B$ be the probability of success on treatment B ($q_A = 1 - p_A$, $q_B = 1 - p_B$). Suppose we have a fixed allocation of $n_A$ subjects on A and $n_B$ subjects on B, where $n_A + n_B = n$. Consider designing
the trial to test the hypothesis

$$H_0: p_A - p_B = 0 \quad \text{versus} \quad H_A: p_A - p_B \neq 0$$

using the Wald test, given by

$$Z = \frac{\hat p_A - \hat p_B}{\sqrt{\dfrac{\hat p_A \hat q_A}{n_A} + \dfrac{\hat p_B \hat q_B}{n_B}}}, \qquad (2.2)$$

where $\hat p_A$, $\hat q_A$, $\hat p_B$, $\hat q_B$ represent the estimators. (We could alternatively use a score test or likelihood ratio test.) In modern clinical trials, the first activity of the statistician is to determine a requisite sample size to achieve reasonable power, based on relevant assumptions about the treatment effect. Traditionally, many statisticians would fix the sample proportions as $n_A = n_B = n/2$ and find the minimum sample size $n$ that provides the desired power level. An alternate way of approaching the problem is to fix the variance of the test under the alternative hypothesis and find the allocation $\mathbf{n} = (n_A, n_B)$ to minimize the total sample size. Using (2.1) with $w_1(\theta) = w_2(\theta) = 1$, $\theta = (p_A, p_B)$, and

$$\eta(\mathbf{n}, \theta) = \frac{p_A q_A}{n_A} + \frac{p_B q_B}{n_B}, \qquad (2.3)$$
we can answer a classical question from clinical trials: for fixed variance of the test statistic under an alternative hypothesis, what allocation minimizes the total sample size? This question is equivalent to asking, for fixed sample size, what allocation maximizes power? Let $\rho(\theta)$ be a given proportion of patients assigned to treatment A. From the equation $\eta(\mathbf{n}, \theta) = C$ and dropping the dependence of $\rho$ on $\theta$, we have

$$\frac{p_A q_A}{\rho n} + \frac{p_B q_B}{(1-\rho)n} = C,$$

letting $n_A = \rho n$, $n_B = (1-\rho)n$. Solving for $n$ yields

$$n = \frac{1}{C}\left(\frac{p_A q_A}{\rho} + \frac{p_B q_B}{1-\rho}\right). \qquad (2.4)$$

Then (2.1) is equivalent to minimizing (2.4) with respect to $\rho$. Upon taking a derivative, we need to find the solution to the equation

$$-\frac{p_A q_A}{\rho^2} + \frac{p_B q_B}{(1-\rho)^2} = 0.$$

The solution is

$$\rho = \frac{\sqrt{p_A q_A}}{\sqrt{p_A q_A} + \sqrt{p_B q_B}} \qquad (2.5)$$
and is called Neyman allocation. Unfortunately, when $p_A + p_B > 1$, Neyman allocation allocates more subjects to the inferior treatment. This leads to a second fundamental question: for fixed variance of the test, what allocation minimizes the expected number of treatment failures? In this case, $\eta$ remains the same as in (2.3), and $w_1(\theta) = q_A$, $w_2(\theta) = q_B$. The solution to this problem is then

$$\rho = \frac{\sqrt{p_A}}{\sqrt{p_A} + \sqrt{p_B}} \qquad (2.6)$$
(Rosenberger et al., 2001). (For lack of a better name, we refer to this as RSIHR allocation, as an acronym of the authors of the original paper.) For the case where the responses are normally distributed, we can use the same approach with $\theta = (\mu_A, \sigma_A, \mu_B, \sigma_B)$, where

$$\eta(\mathbf{n}, \theta) = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}.$$
Neyman allocation results when $w_1(\theta) = w_2(\theta) = 1$, and we obtain $\rho(\theta) = \sigma_A/(\sigma_A + \sigma_B)$. One possible analog of RSIHR allocation would be to minimize the average response, where $w_1(\theta) = \mu_A$ and $w_2(\theta) = \mu_B$ for $\mu_A > 0$, $\mu_B > 0$. This results in

$$\rho = \frac{\sigma_A\sqrt{\mu_B}}{\sigma_A\sqrt{\mu_B} + \sigma_B\sqrt{\mu_A}}.$$

Unfortunately, this allocation does not always ensure that more patients are assigned to the better treatment, as in the binary case. Extending Neyman and RSIHR allocation to $K$ treatments is conceptually clear using (2.1) but has a number of subtleties that make it a much more difficult problem. We denote the optimal allocation $\rho(\theta)$, noting its dependence on unknown parameters. We have, in essence, described a classical nonlinear optimal design problem, in which the optimal solution depends on unknown parameters (see, for example, Atkinson and Donev, 1992). Traditionally such problems would be solved by substituting a local "best guess" of the parameters (locally optimal design), or by averaging over a prior distribution on the unknown parameters (Bayesian optimal design), or sequentially, by substituting data into parameter estimates as they accrue. The latter idea, of sequential design, is the basic premise of response-adaptive randomization. We wish to target an unknown optimal allocation by substituting accruing data into our design, where in our case the "design" is the randomization process.

We now return to the general formulation for $K$ treatments, given in (2.1). We modify it slightly and formulate the optimization as two equivalent problems (Tymofyeyev, Rosenberger, and Hu, 2006). Let $\phi(n_1, \ldots, n_K)$ be the noncentrality parameter of a suitable multivariate test statistic of interest under the alternative hypothesis. We assume that the noncentrality parameter is a concave function with nonnegative gradient. The first problem can be stated as follows:
$$\min_{n_1, \ldots, n_K} w\mathbf{n}' \quad \text{subject to} \quad \phi(n_1, \ldots, n_K) \geq C, \quad \frac{n_k}{\sum_{j=1}^K n_j} \geq B, \quad k = 1, \ldots, K, \qquad (2.7)$$
where $C$ is some positive constant, and $w = (w_1, \ldots, w_K)'$ is a vector with positive components. Here we minimize the weighted sum of sample sizes while fixing the value of the noncentrality parameter to be at least at the level $C$. The constant $B \in [0, 1/K]$, $KB \leq 1$, is a lower bound for the proportion $n_k/\sum_{j=1}^K n_j$ that allows us to control explicitly the feasible region of the problem. By selecting $B > 0$, one eliminates the possibility of having no patients assigned to a single treatment. The case when $B = 0$ is the least restrictive natural constraint, and $B = 1/K$ immediately generates the solution $n_k/\sum_{j=1}^K n_j = 1/K$ for all $k = 1, \ldots, K$, which is equal allocation. Note that problem (2.7) is a convex optimization problem. The second problem is

$$\max_{n_1, \ldots, n_K} \phi(n_1, \ldots, n_K) \quad \text{subject to} \quad w\mathbf{n}' \leq M, \quad \frac{n_k}{\sum_{j=1}^K n_j} \geq B, \quad k = 1, \ldots, K, \qquad (2.8)$$
where $M$ is some positive constant. In this problem, we maximize the noncentrality parameter of the test for the fixed value of the weighted sum of sample sizes. Again, this is a convex optimization problem, and we are interested in the region that is enforced by the same $B$. It can be shown that formulations (2.7) and (2.8) are equivalent with regard to specifying the same allocation proportions of patients to the treatments (Tymofyeyev, Rosenberger, and Hu, 2006); i.e., the solutions of (2.7) and (2.8) give the same values of $n_k/\sum_{j=1}^K n_j$, $k = 1, \ldots, K$.
If, for example, we let $w = \mathbf{1}$, problem (2.7) minimizes the total sample size subject to a constraint that the noncentrality parameter be at least $C$. Problem (2.8) maximizes the noncentrality parameter subject to a constraint that the total sample size does not exceed $M$. This solution (for $B = 0$) is the analog of Neyman allocation (2.5) for $K$ treatments. Our principal interest is in the analog of RSIHR allocation in (2.7), where $w = q$, for $q = (q_1, \ldots, q_K)$, so that $w\mathbf{n}'$ is the expected number of treatment failures. Unfortunately, a global solution for all $w$ to (2.7) and (2.8) has yet to be found. For Neyman allocation, there is a closed-form solution, and it is given as Theorem 2 in Tymofyeyev, Rosenberger, and Hu (2006). We state it here without proof. For some positive integers $s$ and $g$ such that $s + g < K$, let

$$p_1 = \cdots = p_s > p_{s+1} \geq \cdots \geq p_{K-g} > p_{K-g+1} = \cdots = p_K. \qquad (2.9)$$
Then the vector of optimal proportions when $w_k = 1$, $k = 1, \ldots, K$, given by

$$\rho^* = (\rho_1^*, \ldots, \rho_K^*),$$

with components given by (2.10), solves both optimization problems (2.7), (2.8) for $B \in [0, \bar B]$, where $\bar B = \min\{B_1, B_K, 1/K\}$ with $B_1$ and $B_K$ defined in (2.11).
When $B > \bar B$ and $\bar B = B_1$, the solution is $\rho^* = (B, \ldots, B, \rho^*_{K-g+1}, \ldots, \rho^*_K)$ with

$$\rho^*_{K-g+1} = \cdots = \rho^*_K = \frac{1 - (K-g)B}{g}.$$

When $B > \bar B$ and $\bar B = B_K$, the solution is $\rho^* = (\rho^*_1, \ldots, \rho^*_s, B, \ldots, B)$ with

$$\rho^*_1 = \cdots = \rho^*_s = \frac{1 - (K-s)B}{s}.$$
The assumption on the ordering of the $p_k$'s in (2.9) is given to specify the ordering and multiplicities of the largest and smallest values of the underlying probabilities by $s$ and $g$, respectively, for the nondegenerate case when all $p_k$'s are not the same. Note that if $B = 0$, we obtain a solution on the boundary which involves only the best and the worst treatments: $\rho^*_{s+1} = \cdots = \rho^*_{K-g} = 0$ and

$$\rho^*_{K-g+1} = \cdots = \rho^*_K = \frac{1}{g}(1 - s\rho^*_1).$$
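To make the two-treatment targets concrete, the following worked sketch (ours, with hypothetical success probabilities) evaluates Neyman allocation (2.5), RSIHR allocation (2.6), and the total sample size (2.4) required for a fixed variance $C$ of the test statistic, together with the implied expected number of failures.

```python
import numpy as np

def neyman(pA, pB):
    """Neyman allocation (2.5): proportion on A that minimizes n for fixed variance."""
    sA, sB = np.sqrt(pA * (1 - pA)), np.sqrt(pB * (1 - pB))
    return sA / (sA + sB)

def rsihr(pA, pB):
    """RSIHR allocation (2.6): minimizes expected failures for fixed variance."""
    return np.sqrt(pA) / (np.sqrt(pA) + np.sqrt(pB))

def total_n(rho, pA, pB, C):
    """Total sample size (2.4) so that pA*qA/nA + pB*qB/nB = C with nA = rho*n."""
    return (pA * (1 - pA) / rho + pB * (1 - pB) / (1 - rho)) / C

pA, pB, C = 0.9, 0.6, 0.01
for name, rho in [("equal", 0.5), ("Neyman", neyman(pA, pB)), ("RSIHR", rsihr(pA, pB))]:
    n = total_n(rho, pA, pB, C)
    failures = n * (rho * (1 - pA) + (1 - rho) * (1 - pB))
    print(f"{name:7s} rho={rho:.3f}  n={n:6.1f}  expected failures={failures:6.1f}")
```

For the values above, RSIHR allocation requires a somewhat larger total sample size than Neyman allocation but yields fewer expected failures, which is exactly the trade-off examined in the next section.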
2.2 THE RELATIONSHIP BETWEEN POWER AND RESPONSE-ADAPTIVE RANDOMIZATION
Although a particular allocation may be optimal in terms of power and other criteria, we cannot ensure that a fixed optimal allocation will result because the parameters are unknown. Using a response-adaptive randomization procedure to target a specific allocation induces correlation among treatment assignments that can lead to extrabinomial variability that can adversely affect power. This leads to a fundamental question: can we develop response-adaptive randomization procedures that result in fewer treatment failures without a loss of power? To answer this, we need to do a careful analysis of the relationship between power and variability of the procedure.

We now consider the relationship between power and some target allocation $\rho(\theta)$ (which may or may not be optimal in some sense) following the approach of Hu and Rosenberger (2003). For the test in (2.2), when $Z^2$ is asymptotically (for $n_A \to \infty$ and $n_B \to \infty$) chi-square with 1 degree of freedom, then under the alternative hypothesis, power can be expressed as an increasing function of the noncentrality parameter of the chi-square distribution for a fixed target allocation proportion $\rho(\theta)$. Using the simple difference measure, the noncentrality parameter can be expressed as follows:

$$\phi = \frac{(p_A - p_B)^2}{\dfrac{p_A q_A}{n_A} + \dfrac{p_B q_B}{n_B}}.$$
Now we define a function $f(x)$ equal to the noncentrality parameter evaluated at $n_A = n(\rho + x)$ and $n_B = n(1 - \rho - x)$. We have the following expansion:

$$f(x) = f(0) + f'(0)x + f''(0)x^2/2 + o(x^2).$$
After some calculation of $f'(0)$ and $f''(0)$, this yields

$$\phi = (\mathrm{I}) + (\mathrm{II}) + (\mathrm{III}) + o\big((n_A/n - \rho)^2\big), \qquad (2.12)$$

where $(\mathrm{I})$, $(\mathrm{II})$, and $(\mathrm{III})$ denote the constant, linear, and quadratic terms in $(n_A/n - \rho)$, respectively.
The first term $(\mathrm{I})$ is determined by $\rho$ and represents the noncentrality parameter for a fixed design with target allocation $\rho$. Note that Neyman allocation maximizes this term. We can use this term to compare the power of different target allocations. The second term $(\mathrm{II})$ represents the bias of the actual allocation from the optimal allocation. With the design shifting to a different side from the target proportion $\rho$, the noncentrality parameter will increase or decrease according to the coefficient $f'(0)$. It is interesting to see that this coefficient equals 0 if and only if $p_A q_A (1-\rho)^2 - p_B q_B \rho^2 = 0$, that is,

$$\rho = \frac{\sqrt{p_A q_A}}{\sqrt{p_A q_A} + \sqrt{p_B q_B}},$$

i.e., Neyman allocation.
Under certain response-adaptive randomization procedures, the test statistic $Z^2$, with $N_A(n)$ and $N_B(n)$ substituted for $n_A$ and $n_B$, still has the asymptotic chi-square distribution with 1 degree of freedom and the same noncentrality parameter under alternatives (with $N_A(n)$ and $N_B(n)$ replacing $n_A$ and $n_B$), despite the complex dependence structure of the treatment assignments and responses. The specific procedures for which the asymptotic properties hold and the conditions required will be the subject of Chapter 3.
) . we which is a direct function of the variability in the design V u r ( N ~ ( n ) / n So now have the precise link between power and the variability of the design. Thus we can use the variance of NA( n ) / nto compare response-adaptive randomization procedures with the same allocation limit. The above derivation determines a template for theoretically evaluating responseadaptive randomization procedures in terms of power. When the usual test statistic has an asymptotic chi-square distribution, power can be evaluated by examining three things: (1) the limiting allocation of the procedure; (2) the rate of convergence to the limiting allocation; and (3) the variability of the procedure.
18
FUNDAMENTAL QUESTIONS
2.3 T H E RELATIONSHIP FOR K
> 2 TREATMENTS
In this section, we explore the relationship between power and response-adaptive randomization for multivariate hypotheses of multiple ( K > 2) treatments in a clinical trial. The idea of optimizing the noncentrality parameter of a multivariate test to achieve a prespecified power derives from an early paper by Lachin (1977). Lachin (2000) presents the important alternative hypotheses for multivariate tests. The contrast test of homogeneity compares K - 1 treatments to a control. This is only one possible set of contrasts, which could also involve successive differences or each treatment versus the average over all treatments, among others. The most likely format for a clinical trial is the comparison of K - 1 treatments to a control, and this is the alternative on which we focus. We also explore an omnibus alternative that specifies that at least one treatment differs from a null value. Define p1, . . . , p ~to be the success probabilities for the K treatments and let p1 (0), ...,p ~ ( 0be) the optimal allocation proportions for each treatment. Provided that the noncentrality parameter i s convex with nonnegative gradient, we can use the formulation in (2.7) or (2.8) to derive the appropriate allocations. For the contrast test of homogeneity, define
P, = ( P I - P K , .-,Prc-1 - P K ) and@,, = ($1 - f i ~ ..., , #K-I
- fir<).
Let C, = Var($,) and kn be the estimator of C,. Then
A natural combination test statistic is
1-1
Under the null hypothesis that pi = p~ for i = 1,..., K - 1,we know that f i c X p: follows a chi-square distribution with K - 1 degrees of freedom. The noncentrality parameter is now
4 = P,X,'P:. By some matrix computations, we see that
THE RELATIONSHIP F O R K
>2
TREATMENTS
19
+
Now we can replace ni by n [ p , (ni/n- p i ) ] . By a Taylor expansion, the noncentrality parameter (divided by n ) is given by
K-1
where
and
Similar to the case with K = 2 , the noncentrality parameter is a decreasing or increasing function of the bias (ni/n- p i ) , depending on the coefficient. As in the case when K = 2, with a response-adaptive randomization procedure, we consider the expected noncentrality parameter. Again, this is only appropriate for those procedures in which the unconditional asymptotic distribution is chi-square with K - 1 degrees of freedom. If E ( N i ( n ) / n- p i ) = 0, then the expected noncentrality parameter is a function of
which is determined by the variance-covariancematrix of the treatment assignments. Now consider the omnibus test, given by
Ho : p l = p2 = . * . = p~ = po versus H A : pi # po for some i for somenull valuepo (40 = 1-PO). Nowwe letp = ( P I , ...,p K ) , P = ($1, The test statistic is given by
(P - 1Po)%:(P - IPO)’,
...,$ K ) .
20
FUNDAMENTAL QUESTIONS
The noncentrality parameter is given by
K
=
>:qpi -po)2
i=l
Piqi
Note that, in contrast to the homogeneity test, this does not depend on the variance of (Nl(n), ...,NK(n)), since we are not estimatingpo. 2.4
ASYMPTOTICALLY BEST PROCEDURES
We return to the simple binary response case with K = 2. Section 2.2 suggests a template for comparing response-adaptive randomization procedures with respect to power. For two procedures targeting the same allocation, one can simply compare the variability of the procedures. Following Hu, Rosenberger, and Zhang (2006), we now describe an asymptotically best response-adaptive randomization procedure for targeting a certain allocation p ( 8 ) as one that attains a lower bound on the asymptotic variance of N A(n)/n.
THEOREM 2.1. Suppose the following regularity conditions hold: 1. PA,PS E (0~1).
2. NA(n)/n converges to p ( p ~ , p ~E ) ( 0 , l ) almost surely for the particular response-adaptive randomization procedure. 3. fi(NA(n)/n - p ( p ~ , p ~converges )) in distribution to N ( 0 ,v) for the particular procedure.
Then a lower bound on the asymptotic variance of N A(n)/n is given by
We refer to a response-adaptive allocation procedure where v equals this lower bound as asymptotically best for that particular target allocation p ( p ~p, ~ ) .
+
EXAMPLE 2.1. For many urn models, NA(n)/n -+ qB/(qA qB) almost surely. When this is the case, the asymptotically best procedure for targeting this allocation
REFERENCES
21
will have asymptotic variance
EXAMPLE 2.2. Suppose we wish a procedure to converge to Neyman allocation, as defined in (2.5). Then an asymptotically best procedure for targeting Neyman allocation will have asymptotic variance
Note that any procedure targeting Neyman allocation with this asymptotic variance will be the most powerful procedure. EXAMPLE 2.3. Now suppose we wish to minimize expected treatment failures for fixed power. Any procedure where the allocation proportion converges almost surely to RSIHR allocation (7.5) will be asymptotically best if it has asymptotic variance 1
.In(&+
4
3
3
(% $8 +
*
Thus we have a benchmark for determining the best procedure for targeting a certain allocation. In Chapter 3 we will prove a very general version of this result. 2.5
REFERENCES
A N D DONEV,A. N.(1992). Optimum Experimental Design. Clarendon Press, Oxford. HAYRE, L. S. (1979). Two-population sequential tests with three hypotheses. Biometrika 66 465-474. Hu, F. AND ROSENBERGER, W. F. (2003). Optimality, variability, power: evaluating response-adaptive randomization procedures for treatment comparisons. Journal of the American Statistical Association 98 67 1-678. Hu, F., ROSENBERGER, W. F., AND ZHANG,L.-X. (2006). Asymptotically best response-adaptive randomization procedures. Journal of Statistical Planningand Inference 136 191 1-1922. JENNISON, C. AND TURNBULL, B. W. (2000). Group SequentialMethods with Applications to Clinical Trials. Chapman and HalVCRC, Boca Raton. LACHIN,J. M. (1977). Sample size determinations for T x c comparative trials. Biornetrics 33 3 15-324. LACHIN,J. M. (2000). Biostatistical Methods: The Assessment of Relative Risks. Wiley, New York. ROSENBERGER, W. F., STALLARD, N., IVANOVA, A., HARPER,C. N.,A N D RICKS,M. L. (2001). Optimal adaptive designs for binary response trials. Biornetrics 57 909-9 13. TYMOFYEYEV, Y.,ROSENBERCER, W. F., AND HU, F. (2006). Implementing optimal allocation in sequential binary response experiments, submitted.
ATKINSON,A. C.
This Page Intentionally Left Blank
3 Likelihood-Based Inference
In this chapter, we examine properties of maximum likelihood estimators following a response-adaptive randomization procedure. In general, if the allocation proportions converge almost surely to a constant limit as n + 00, the maximum likelihood estimators will have asymptotic properties similar to those arising from independent and identically distributed sequences. However, proving this for various procedures has been a major topic of the response-adaptive randomization literature. Here we give a very general result for exponential families. While we could establish the results for more general families of distributions, the result for exponential families will be sufficient for the response-adaptive randomization procedures described in this book. A consequence of the asymptotic distribution of the maximum likelihood estimators is that we can find a lower bound on the variability of the randomization procedure, as described in Section 2.4. We prove this in a very general theorem. We begin with a useful data structure for general I( treatments that will facilitate the theoretical developments (Hu, Rosenberger, and Zhang, 2006).
3.1
DATA STRUCTURE AND LIKELIHOOD
Using the data structure from Section 1.1, we assume that X and identically distributed, with
xlj
1,
...,X , are independent
f j ( . , e j ) ,=j 1,...,K ,
whereej E Oj. WethusassumethatX,isindependentofX1, ...,X m - 1 , but that T, depends on X I ,...,X,-1, 2'1, ...,Tm-l,rn = 1, ...,n.
2'1,
...,T,, 23
EXAMPLE 3.1. If $K = 2$ and we use the randomized play-the-winner rule from Section 1.1.3, then $f_1(\cdot, \theta_1)$ is Bernoulli($p_A$) and $f_2(\cdot, \theta_2)$ is Bernoulli($p_B$). Also $T_m = (1, 0)$ if treatment A was assigned to the $m$-th patient, and $T_m = (0, 1)$ if treatment B was assigned. Then $X_m = (X_{m1}, X_{m2})$, where $X_{m1} \sim f_1$ and $X_{m2} \sim f_2$. Clearly $X_m$ is independent of $X_1, \ldots, X_{m-1}$ and also of $T_1, \ldots, T_m$, but we only observe the element of $X_m$ corresponding to the element of $T_m$ that is 1. However, $T_m$ depends on all previous treatment assignments and responses according to the urn model.
EXAMPLE 3.2. Suppose $K = 2$ and we use the doubly-adaptive biased coin design from Section 1.1.3 with normal responses. Then $\theta_1 = (\mu_A, \sigma_A)$ and $\theta_2 = (\mu_B, \sigma_B)$, and $f_1(\cdot, \theta_1)$ is Normal($\mu_A, \sigma_A^2$) and $f_2(\cdot, \theta_2)$ is Normal($\mu_B, \sigma_B^2$). If we use Neyman allocation, then $\rho(\theta) = \sigma_A/(\sigma_A + \sigma_B)$. Again $T_m = (1, 0)$ if treatment $A$ was assigned to the $m$-th patient, and $T_m = (0, 1)$ if treatment $B$ was assigned. Then $X_m = (X_{m1}, X_{m2})$, where $X_{m1} \sim f_1$ and $X_{m2} \sim f_2$. Clearly $X_m$ is independent of $X_1, \ldots, X_{m-1}$ and also of $T_1, \ldots, T_m$, but we only observe the element of $X_m$ corresponding to the element of $T_m$ that is 1. However, $T_m$ depends on all previous treatment assignments and responses according to the doubly-adaptive biased coin design.
Since $X_1, \ldots, X_n$ are independent and identically distributed, we can therefore write the likelihood of the sample as
$$\mathcal{L}_n(\theta) = \prod_{m=1}^{n} \prod_{j=1}^{K} f_j(X_{mj}, \theta_j)^{T_{mj}}.$$
EXAMPLE 3.1 (CONTINUED). The likelihood is proportional to
$$p_A^{S_A(n)}\, q_A^{N_A(n) - S_A(n)}\, p_B^{S_B(n)}\, q_B^{N_B(n) - S_B(n)}.$$
The sufficient statistics are
$$S_A(n) = \sum_{m=1}^{n} T_{m1} X_{m1}, \quad S_B(n) = \sum_{m=1}^{n} T_{m2} X_{m2}, \quad N_A(n) = \sum_{m=1}^{n} T_{m1}, \quad N_B(n) = \sum_{m=1}^{n} T_{m2}.$$
Note that $(S_A(n), S_B(n))$ would be the resulting sufficient statistics from the usual independent Bernoulli sampling. In that case, $N_A(n)$ is independent of $(p_A, p_B)$ (although it may still be a random variable, such as following complete randomization or some restricted randomization procedures), and hence is ancillary. By Basu's theorem, $(S_A(n), S_B(n))$ are independent of $N_A(n)$, and it is appropriate to use conditional inference procedures. This is not the case following response-adaptive randomization, and conditional inference is never appropriate. In fact, we have seen in Chapter 2 that the power function depends on the statistic $N_A(n)$. This has an implication in calculating appropriate sample sizes for clinical trials, as we will see in Chapter 6.
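To make the data structure concrete, the following sketch simulates the randomized play-the-winner rule of Example 3.1 and computes the sufficient statistics and maximum likelihood estimators from the observed $(T_m, X_m)$ pairs. It is an illustrative simulation only; the function name, initial urn, and success probabilities are hypothetical choices, not part of the original text.

```python
import numpy as np

def simulate_rpw(n, p_A, p_B, urn=(1, 1), seed=0):
    """Simulate n patients under the randomized play-the-winner rule.

    Returns treatment indicators T (n x 2) and observed responses X (length n).
    A success on a treatment adds a ball of that type; a failure adds a ball
    of the other type.
    """
    rng = np.random.default_rng(seed)
    y = np.array(urn, dtype=float)        # urn composition (type A, type B)
    T = np.zeros((n, 2), dtype=int)
    X = np.zeros(n, dtype=int)
    for m in range(n):
        j = rng.choice(2, p=y / y.sum())  # draw a ball, assign that treatment
        T[m, j] = 1
        X[m] = rng.binomial(1, p_A if j == 0 else p_B)  # observe response
        y[j if X[m] == 1 else 1 - j] += 1               # update the urn
    return T, X

T, X = simulate_rpw(n=500, p_A=0.7, p_B=0.5)
N_A, N_B = T.sum(axis=0)                              # numbers assigned to A and B
S_A, S_B = (T[:, 0] * X).sum(), (T[:, 1] * X).sum()   # numbers of successes
print("MLEs:", S_A / N_A, S_B / N_B, " allocation proportion:", N_A / (N_A + N_B))
```

The maximum likelihood estimators are simply the within-arm success proportions, exactly as under independent sampling; what changes under response-adaptive randomization is their sampling distribution, which is the subject of the next section.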
3.2 ASYMPTOTIC PROPERTIES OF MAXIMUM LIKELIHOOD ESTIMATORS
We now present a very general theorem on the asymptotic properties of maximum likelihood estimators following a response-adaptive randomization procedure. This result was first proven for K treatments by Rosenberger, Flournoy, and Durham (1997), but their conditions were more restrictive and do not apply to several types of response-adaptive randomization procedures. Essentially all that is required in the theorem presented here is that the allocation proportions converge to a constant. While the proof is given for exponential families for convenience, it is possible to extend this result to more general models.
THEOREM 3.1. Assume the following regularity conditions:

1. The parameter space $\Theta_j$ is an open subset of $R^d$, $d \geq 1$, for $j = 1, \ldots, K$.

2. The distributions $f_1(\cdot, \theta_1), \ldots, f_K(\cdot, \theta_K)$ follow an exponential family.

3. For limiting allocation $\rho(\theta) = (\rho_1(\theta), \ldots, \rho_K(\theta)) \in (0, 1)^K$,
$$\frac{N_j(n)}{n} \rightarrow \rho_j(\theta)$$
almost surely for $j = 1, \ldots, K$.

Then $\hat{\theta}$ is strongly consistent for $\theta$ and
$$\sqrt{n}\,(\hat{\theta} - \theta) \rightarrow N(0, I^{-1}(\theta))$$
in distribution, where $I(\theta) = \mathrm{diag}(\rho_1(\theta) I_1(\theta_1), \ldots, \rho_K(\theta) I_K(\theta_K))$ and
$$I_j(\theta_j) = E\left[\left(\frac{\partial \log f_j(X_{1j}, \theta_j)}{\partial \theta_j}\right)\left(\frac{\partial \log f_j(X_{1j}, \theta_j)}{\partial \theta_j}\right)'\right]$$
is the Fisher information for a single observation on treatment $j = 1, \ldots, K$.
EXAMPLE 3.1 (CONTINUED). For the randomized play-the-winner rule, it is well known that
$$\frac{N_A(n)}{n} \rightarrow \frac{q_B}{q_A + q_B}$$
almost surely (this is shown in Chapter 4). By Theorem 3.1, we have
$$\sqrt{n}\left(\begin{array}{c}\hat{p}_A - p_A \\ \hat{p}_B - p_B\end{array}\right) \rightarrow N\left(0,\ \mathrm{diag}\left(\frac{p_A q_A (q_A + q_B)}{q_B},\ \frac{p_B q_B (q_A + q_B)}{q_A}\right)\right)$$
in distribution. This result was first noted by Wei et al. (1990).
EXAMPLE 3.2 (CONTINUED). For the doubly-adaptive biased coin design targeting Neyman allocation, we have
$$\sqrt{n}\left(\begin{array}{c}\hat{\mu}_A - \mu_A \\ \hat{\mu}_B - \mu_B\end{array}\right) \rightarrow N\left(0,\ \mathrm{diag}\left(\sigma_A(\sigma_A + \sigma_B),\ \sigma_B(\sigma_A + \sigma_B)\right)\right)$$
in distribution. Similarly we can give the asymptotic distributions of $\hat{\sigma}_A$ and $\hat{\sigma}_B$.

As a consequence of Theorem 3.1, since many test statistics are just functions of the maximum likelihood estimators, we can easily determine the asymptotic null distribution of the test. For example, in Chapter 2, we stated that under certain conditions the Wald test of the difference of two proportions will be standard normal following response-adaptive randomization. From Theorem 3.1, any response-adaptive randomization procedure for which the allocation proportions converge almost surely to a constant will admit an asymptotically standard normal test.

The standard way of proving asymptotic normality of maximum likelihood estimators is to expand the likelihood in a multivariate Taylor series. One then applies the multivariate martingale central limit theorem. Both of these techniques are described in Appendix A.

PROOF OF THEOREM 3.1. It is sufficient to show that
$$\sqrt{N_j(n)}\,(\hat{\theta}_j - \theta_j) \rightarrow N(0, I_j^{-1}(\theta_j)), \qquad (3.1)$$
$j = 1, \ldots, K$, as $n \rightarrow \infty$.
Let $\mathcal{F}_i = \sigma(X_1, \ldots, X_i, T_1, \ldots, T_{i+1})$. Then $\{Y_i = (Y_{i,1}, \ldots, Y_{i,K}), \mathcal{F}_i;\ i = 1, 2, \ldots\}$ is a martingale sequence with
Conditions of the martingale central limit theorem (see Appendix A) hold by regularity conditions 2 and 3, and it follows that
It remains to show (3.1). For fixed $j = 1, 2, \ldots, K$, define $\tau_i(j) = \min\{k : N_j(k) = i\} = \min\{k > \tau_{i-1}(j) : T_{kj} = 1\}$, where $\min\{\emptyset\} = +\infty$. Let $\{\eta_{i,j}\}$ be an independent copy of $\{X_{i,j}\}$, which is also independent of $\{T_i\}$. Define $\widetilde{X}_{i,j} = X_{\tau_i(j),j}\, I\{\tau_i(j) < \infty\} + \eta_{i,j}\, I\{\tau_i(j) = \infty\}$, $i = 1, 2, \ldots$. Then $\{\widetilde{X}_{i,j};\ i = 1, 2, \ldots\}$ is a sequence of i.i.d. random variables with the same distribution as $X_{1j}$. Let $\widetilde{L}_m(\theta_j)$ denote the likelihood based on $\widetilde{X}_{1,j}, \ldots, \widetilde{X}_{m,j}$, and let $\widetilde{\theta}_{m,j}$ maximize $\widetilde{L}_m(\theta_j)$. Then on the event $\{N_j(n) \rightarrow \infty\}$,
$$L_j(\hat{\theta}_j) = \widetilde{L}_{N_j(n)}(\hat{\theta}_j) \quad \text{and} \quad \hat{\theta}_j = \widetilde{\theta}_{N_j(n),j}. \qquad (3.2)$$
Notice that $\{\widetilde{X}_{i,j} : i = 1, 2, \ldots, n\}$ are independent and identically distributed. Under regularity condition 2, we have
$$\sqrt{m}\,(\widetilde{\theta}_{m,j} - \theta_j) \rightarrow N(0, I_j^{-1}(\theta_j))$$
in distribution as $m \rightarrow \infty$. By (3.2) and regularity condition 3, (3.1) follows. Strong consistency is an immediate consequence of regularity conditions 2 and 3. □
3.3 THE GENERAL RESULT FOR DETERMINING ASYMPTOTICALLY BEST PROCEDURES
Once we have the asymptotic normality of the maximum likelihood estimators, it is then straightforward to determine a lower bound on the asymptotic variance of the allocation proportions. Any procedure attaining this lower bound will be asymptotically best, as described in Section 2.4.
THEOREM 3.2. Assume that the regularity conditions of Theorem 3.1 hold. In addition, assume that regularity condition 4 holds:

4. For positive definite matrix $V(\theta)$,
$$\sqrt{n}\left(\frac{N(n)}{n} - \rho(\theta)\right) \rightarrow N(0, V(\theta))$$
in distribution.
Then there exists a $\Theta_0 \subset \Theta = \Theta_1 \otimes \cdots \otimes \Theta_K$ with Lebesgue measure 0 such that for every $\theta \in \Theta - \Theta_0$,
$$V(\theta) \geq \left(\frac{\partial \rho(\theta)}{\partial \theta}\right) I^{-1}(\theta) \left(\frac{\partial \rho(\theta)}{\partial \theta}\right)'.$$

We can now rigorously define an asymptotically best response-adaptive procedure as one in which $V(\theta)$ attains the lower bound
$$\left(\frac{\partial \rho(\theta)}{\partial \theta}\right) I^{-1}(\theta) \left(\frac{\partial \rho(\theta)}{\partial \theta}\right)'$$
for a particular target allocation $\rho(\theta)$.
PROOF OF THEOREM 3.2. From the proof of Theorem 3.1, we know that $\{\widetilde{X}_{i,j} : i = 1, 2, \ldots, m\}$ are independent and identically distributed. Therefore the Rao-Cramér lower bound of $\rho(\theta)$ is
$$\left(\frac{\partial \rho(\theta)}{\partial \theta}\right) I^{-1}(\theta) \left(\frac{\partial \rho(\theta)}{\partial \theta}\right)'.$$
Because $N(n)/n$ is an asymptotically unbiased estimator of $\rho(\theta)$ and satisfies regularity conditions 2 and 4, by Theorem 4.16 of Shao (1999, p. 249) the theorem follows directly. □
3.4 CONCLUSIONS
In Chapters 2 and 3 we have presented critical results that establish some of the properties necessary for a response-adaptive randomization procedure to be useful in practice. These properties emanate from a thorough understanding of the asymptotic properties of the allocation proportions. Provided they converge almost surely to constants between 0 and 1, we can then use standard inferential techniques based on an asymptotic standard normal null distribution following the clinical trial. Furthermore, we can determine an appropriate response-adaptive randomization procedure by choosing a procedure that converges quickly to the desired allocation with small variability. For the researcher in response-adaptive randomization, it is necessary to prove the asymptotic normality of the allocation proportions and find the asymptotic variance. This is usually a challenging task. One can then compare this asymptotic variance to that of other procedures, and to the lower bound provided in this chapter, to determine the value of that procedure. In the next two chapters we will evaluate two large families of response-adaptive randomization procedures in this context.
3.5 REFERENCES
HU, F., ROSENBERGER, W. F., AND ZHANG, L.-X. (2006). Asymptotically best response-adaptive randomization procedures. Journal of Statistical Planning and Inference 136 1911-1922.
ROSENBERGER, W. F., FLOURNOY, N., AND DURHAM, S. D. (1997). Asymptotic normality of maximum likelihood estimators from multiparameter response-driven designs. Journal of Statistical Planning and Inference 60 69-76.
SHAO, J. (1999). Mathematical Statistics. Springer, New York.
WEI, L. J., SMYTHE, R. T., LIN, D. Y., AND PARK, T. S. (1990). Statistical inference with data dependent allocation rules. Journal of the American Statistical Association 85 156-162.
4 Procedures Based on Urn Models
We now explore a large class of response-adaptive randomization procedures that are based on urn models. Urn models have a long history in the probability literature and induce many interesting stochastic processes. Our particular interest is in the subclass of randomized urn models, consistent with the terminology used in the classic book by Johnson and Kotz (1977). In randomized urn models, balls are replaced according to some probability distribution. A thorough review of randomized urn models for response-adaptive randomization can be found in Rosenberger (2002), but the paper only summarizes the main theoretical results. In addition, many important theoretical results have been published since that paper appeared. We explore two classes of urn models useful for response-adaptive randomization. The first is the generalized Friedman's urn, introduced in Section 1.1.3; the second is the class of ternary urns.

4.1 GENERALIZED FRIEDMAN'S URN

4.1.1 Historical results on asymptotic properties
We continue the discussion of the generalized Friedman's urn from Section 1.1.3. Athreya and Karlin (1968) describe an urn with balls labeled $1, \ldots, K$, with initial composition $Y_0$. A ball is drawn at random, its label noted, and the ball is replaced. If its label was $j$, an additional random number $D_{jk}$ of balls with label $k = 1, \ldots, K$ are added to the urn. Thus $D = ((D_{jk}))$ defines a random matrix, and we assume that it has expectation $H = E(D)$. Under the following regularity conditions:

1. $\Pr(D_{jk} = 0 \text{ for all } k = 1, \ldots, K) = 0$;
2. $E(D_{jk} \log D_{jk}) < \infty$ for all $j, k$;
3. $H$ is strictly positive.

Athreya and Karlin (1967, 1968) prove many important asymptotic properties of the generalized Friedman's urn. First-order asymptotic properties of the urn depend on the eigenstructure of $H$. Let $\lambda_1$ be the largest eigenvalue of $H$ and let $v = (v_1, \ldots, v_K)$ be the left eigenvector corresponding to $\lambda_1$, normalized so that $v \cdot 1 = 1$. Regularity condition 3 ensures that $\lambda_1$ is real and simple. Under regularity conditions 1-3, Athreya and Karlin (1968) prove that
$$\frac{Y_{nj}}{\sum_{k=1}^{K} Y_{nk}} \rightarrow v_j$$
almost surely as $n \rightarrow \infty$. In Athreya and Karlin (1967), they prove that
$$\frac{N_j(n)}{n} \rightarrow v_j$$
almost surely as $n \rightarrow \infty$.
EXAMPLE 4.1. (Wei and Durham, 1978.) For the randomized play-the-winner rule, with the same response distribution as in Example 3.1, we have
$$H = \left[\begin{array}{cc} p_A & q_A \\ q_B & p_B \end{array}\right].$$
The left eigenvector corresponding to $\lambda_1 = 1$ is
$$v = \left(\frac{q_B}{q_A + q_B},\ \frac{q_A}{q_A + q_B}\right).$$
Thus the proportion of balls labeled $A$ satisfies
$$\frac{Y_{n1}}{\sum_{k=1}^{2} Y_{nk}} \rightarrow \frac{q_B}{q_A + q_B}$$
almost surely, and the proportion of patients assigned to $A$ satisfies
$$\frac{N_1(n)}{n} \rightarrow \frac{q_B}{q_A + q_B}$$
almost surely, the limiting allocation given in Example 2.1. Athreya and Karlin (1967) also investigate the asymptotic distribution of the urn composition $Y_n$. Here they need to strengthen regularity condition 2:
2'. $E(D_{jk}^2) < \infty$ for all $j, k$.

Let $\lambda \neq \lambda_1$ be another eigenvalue with associated right eigenvector $\xi$. Then we have the following trichotomous limiting result on a linear eigenvector functional of the urn composition, $\xi \cdot Y_n$. Under regularity conditions 1, 2', and 3: if $2\mathrm{Re}(\lambda) < \lambda_1$, then
$$n^{-1/2}\, \xi \cdot Y_n \rightarrow N(0, c)$$
in distribution, where $c$ is some constant; if $2\mathrm{Re}(\lambda) = \lambda_1$, then
$$(n \log n)^{-1/2}\, \xi \cdot Y_n \rightarrow N(0, c_1)$$
in distribution, where $c_1$ is some constant; if $2\mathrm{Re}(\lambda) > \lambda_1$, then
$$n^{-\lambda/\lambda_1}\, \xi \cdot Y_n \rightarrow W$$
almost surely, where $W$ is an unknown random variable. Athreya and Karlin's methods offer no clues as to how to find $c$ and $c_1$. The form of the random variable $W$ remains an open problem.
EXAMPLE 4.1 (CONTINUED). For the randomized play-the-winner rule, we have the following trichotomous limiting result. Here $\lambda = p_A + p_B - 1$ and $\lambda_1 = 1$. If $p_A + p_B < 3/2$, then
$$n^{-1/2}\, \xi \cdot Y_n \rightarrow N(0, c)$$
in distribution. If $p_A + p_B = 3/2$, then
$$(n \log n)^{-1/2}\, \xi \cdot Y_n \rightarrow N(0, c_1)$$
in distribution. If $p_A + p_B > 3/2$, then
$$n^{-(p_A + p_B - 1)}\, \xi \cdot Y_n \rightarrow W$$
almost surely. Rosenberger (1992) showed that $c = q_A q_B/[(q_A + q_B)(2q_A + 2q_B - 1)]$. The random variable $W$ depends on the initial urn composition, and hence the limiting behavior makes the randomized play-the-winner rule unattractive in practice when $p_A + p_B > 3/2$.

Of more interest in determining important properties of the procedures based on the generalized Friedman's urn model, from the developments in Chapter 2, is the limiting distribution of the allocation proportions, $N(n)$. Athreya and Karlin (1968, p. 275) stated, "...It is suggestive to conjecture that $\{N_1(n), \ldots, N_K(n)\}$ properly normalized is asymptotically normal. This problem is open." Athreya and Karlin's conjecture was eventually proved by Smythe (1996). Smythe (1996) used a slightly more general setup, calling the model the extended Pólya urn (EPU). It assumes the following regularity conditions on $H$:

4. $h_{jk} \geq 0$ for all $j \neq k$ and $\sum_{k=1}^{K} h_{jk} = \beta > 0$;

5. $H$ has simple eigenvalues $\{\lambda_1, \ldots, \lambda_K\}$ and admits the Jordan decomposition $H = P'\,\mathrm{diag}(\lambda_1, \ldots, \lambda_K)\,P$ for some orthogonal matrix $P$.
Condition 4 allows for the ball drawn not to be replaced. In fact, one could remove additional balls labeled $i$ after drawing a ball labeled $i$, provided there are always balls to remove. We refer to urns where balls can be removed as the class of birth and death urns, and these are considered in Section 4.2. Condition 5 is restrictive for some practical problems and is difficult to verify. Under regularity conditions 2', 3, 4, and 5, if $2\mathrm{Re}(\lambda_2) < \lambda_1$,
$$n^{-1/2}(N(n) - nv) \rightarrow N(0, V)$$
in distribution. Furthermore, if $2\mathrm{Re}(\lambda_2) = \lambda_1$,
$$(n \log n)^{-1/2}(N(n) - nv) \rightarrow N(0, V_1)$$
in distribution. Smythe was not able to determine the general form of $V$ and $V_1$, although it has been found in certain special cases.
EXAMPLE 4.1 (CONTINUED). For the randomized play-the-winner rule, if $p_A + p_B < 3/2$, then $n^{-1/2}(N_A(n) - nq_B/(q_A + q_B))$ converges in distribution to a normal limit. If $p_A + p_B = 3/2$, then the same quantity, normalized instead by $(n \log n)^{-1/2}$, converges in distribution to a normal limit. Rosenberger (1992) showed the form of the asymptotic variance in the first case, and Matthews and Rosenberger (1997) showed it in the second; they also gave the form of the asymptotic variance in the (presumably) nonnormal case when $p_A + p_B > 3/2$. The latter is a messy function of the initial urn composition $Y_0$.

4.1.2 Assumptions and notation
Bai and Hu (2005) were able to relax Smythe's regularity condition 5 so that the matrix $H$ admits a standard Jordan decomposition. Using different techniques, they were able to find the form of $V$. Having the explicit form of the asymptotic variance-covariance structure of the allocation proportions thus allows a direct comparison
with other response-adaptive randomization procedures having limiting allocation $v$, as delineated in Chapter 2.

The basic recursive formula for the generalized Friedman's urn is given by
$$Y_n = Y_{n-1} + T_n D_n, \qquad (4.1)$$
which implies
$$E(Y_n \mid \mathcal{F}_{n-1}) = Y_{n-1}\left(I + \frac{H}{a_{n-1}}\right), \qquad (4.2)$$
where $H = E D_n$ and $a_{n-1}$ is the total number of balls in the urn after $n - 1$ draws.
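As an illustration of recursion (4.1), the following sketch simulates a generalized Friedman's urn for a user-supplied (random) addition rule. The function names, initial composition, and success probabilities are hypothetical; the addition rule shown is the randomized play-the-winner rule of Example 4.1.

```python
import numpy as np

def simulate_gfu(n, y0, draw_addition, seed=0):
    """Simulate n draws of a generalized Friedman's urn.

    y0            : initial composition Y_0 (length-K array of ball counts)
    draw_addition : function (j, rng) -> length-K array D_j, the (random) balls
                    added after a ball of type j is drawn
    Returns the urn composition Y_n and the assignment counts N_n.
    """
    rng = np.random.default_rng(seed)
    y = np.array(y0, dtype=float)
    counts = np.zeros(len(y0))
    for _ in range(n):
        j = rng.choice(len(y), p=y / y.sum())   # T_m: draw a ball, assign treatment j
        counts[j] += 1
        y = y + draw_addition(j, rng)           # Y_m = Y_{m-1} + T_m D_m, recursion (4.1)
    return y, counts

# Hypothetical example: randomized play-the-winner with p_A = 0.7, p_B = 0.5.
p = np.array([0.7, 0.5])
def rpw_addition(j, rng):
    success = rng.random() < p[j]
    d = np.zeros(2)
    d[j if success else 1 - j] = 1.0            # success keeps the type, failure switches
    return d

y_n, n_n = simulate_gfu(5000, [1.0, 1.0], rpw_addition)
print("urn proportions:", y_n / y_n.sum())      # tends to (q_B, q_A)/(q_A + q_B)
print("allocation     :", n_n / n_n.sum())
```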
EXAMPLE 4.1 (CONTINUED). Let $X_n = (X_{n1}, X_{n2})$. Then using (1.3), we have
$$T_n D_n = \begin{cases} (1, 0) & \text{if } T_n = (1, 0) \text{ and } X_{n1} = 1; \\ (1, 0) & \text{if } T_n = (0, 1) \text{ and } X_{n2} = 0; \\ (0, 1) & \text{if } T_n = (0, 1) \text{ and } X_{n2} = 1; \\ (0, 1) & \text{if } T_n = (1, 0) \text{ and } X_{n1} = 0. \end{cases}$$
Plugging into (4.1) completely defines the response-adaptive randomization procedure induced by the randomized play-the-winner rule.

The homogeneity of the addition rule and the generating matrix is often not appropriate to assume in clinical trials. Examples are given in Section 4.1.4. Here we consider a more general urn model, which allows for a heterogeneous addition rule $D_i$ as well as a random generating matrix $H_i = E(D_i \mid \mathcal{F}_{i-1})$. We will show that this general model includes the urn models in Section 4.1.1 as special cases. Here we state the three main theorems for the asymptotic properties of both $Y_n$ and $N_n$. Using the notation defined in the introduction, $Y_n$ is a sequence of random $K$-vectors of nonnegative elements which are adapted with respect to $\{\mathcal{F}_n\}$, satisfying
$$E(Y_i \mid \mathcal{F}_{i-1}) = Y_{i-1} M_i, \qquad (4.3)$$
where $M_i = I + H_i/a_{i-1}$ and $a_i = \sum_{j=1}^{K} Y_{ij}$. For generalized Friedman's urn models, the addition rule $D_i$ is independent and identically distributed and therefore does not depend on $\mathcal{F}_{i-1}$. Therefore, $H_i = H = E D_i$. In this case, it is homogeneous. Without loss of generality, we assume $a_0 = 1$ in the following. In general, the addition rule $D_i$ may depend on $\mathcal{F}_{i-1}$. In this case $H_i = E(D_i \mid \mathcal{F}_{i-1})$. In the sequel, we consider this general model and we require the following assumptions.
ASSUMPTION 4.1. The generating matrix $H_i$ satisfies $H_{qk}(i) \geq 0$ for all $k, q$ and
$$\sum_{k=1}^{K} H_{qk}(i) = c_1, \quad \text{for all } q = 1, \ldots, K, \qquad (4.4)$$
almost surely, where $H_{qk}(i)$ is the $(q, k)$-entry of the matrix $H_i$ and $c_1$ is a positive constant. Without loss of generality, we assume that $c_1 = 1$ throughout.
ASSUMPTION 4.2. The addition rule $D_i$ is conditionally independent of the drawing procedure $T_i$, given $\mathcal{F}_{i-1}$, and satisfies
$$E(D_{qk}^{2+\delta}(i) \mid \mathcal{F}_{i-1}) \leq C < \infty \quad \text{for all } q, k = 1, \ldots, K \text{ and some } \delta > 0. \qquad (4.5)$$
Also we assume that
$$\mathrm{cov}[(D_{qk}(i), D_{ql}(i)) \mid \mathcal{F}_{i-1}] \rightarrow d_{qkl} \quad \text{almost surely for all } q, k, l = 1, \ldots, K, \qquad (4.6)$$
where $d_q = (d_{qkl})_{k,l=1}^{K}$, $q = 1, \ldots, K$, are some $K \times K$ positive definite matrices.
REMARK 4.1. Assumption 4.1 defines the EPU model (Smythe, 1996), and it ensures that the expected number of balls added at each stage is a positive constant (here $c_1 = 1$). So after $n$ stages, the total number of balls in the urn, $a_n$, should be very close to $n$ ($a_n/n$ converges to 1). The elements of the addition rule are allowed to take negative values in the literature, which corresponds to the situation of withdrawing balls from the urn. But to avoid the dilemma that there are no balls to withdraw, only diagonal elements of $D_i$ are allowed to take negative values, which corresponds to the case of drawing without replacement.
ASSUMPTION 4.3. Assume there exists a generating matrix $H$ (fixed) such that the following condition holds almost surely:
$$\sum_{i=1}^{\infty} \frac{\|H_i - H\|}{i} < \infty. \qquad (4.7)$$
Further, the limit generating matrix $H$ is a $K \times K$ nonnegative and irreducible matrix.

Assumption 4.3 guarantees that $H$ has the Jordan form decomposition
$$t^{-1} H t = J = \mathrm{diag}(1, J_2, \ldots, J_s),$$
with
$$J_t = \left[\begin{array}{ccccc} \lambda_t & 1 & 0 & \cdots & 0 \\ 0 & \lambda_t & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_t & 1 \\ 0 & 0 & \cdots & 0 & \lambda_t \end{array}\right]$$
a $\nu_t \times \nu_t$ block, $t = 2, \ldots, s$.

Moreover, the irreducibility of $H$ also guarantees that the elements of the left eigenvector $v = (v_1, \ldots, v_K)$ associated with the positive maximal eigenvalue 1 are positive. Thus, we may normalize this vector to satisfy $\sum_{i=1}^{K} v_i = 1$. The matrix $t = (1', t_2, \ldots, t_s)$, where $1 = (1, \ldots, 1)$ and $t_h$ is the corresponding matrix of the Jordan form for $J_h$. Further, we define $t_- = (t_2, \ldots, t_s)$ and
$$R = \sum_{j=1}^{K} v_j d_j + H^*(\mathrm{diag}(v) - v'v)H.$$
REMARK 4.2. Condition (4.7) in Assumption 4.3 is very mild, just slightly stronger than $\|H_i - H\| \rightarrow 0$; for example, it holds if the nonhomogeneous generating matrices $H_i$ converge to a generating matrix $H$ at a rate of $(\log i)^{-1-c}$ for some $c > 0$, which is usually satisfied in most applications.
For some asymptotic results we need to strengthen the convergence condition in Assumption 4.3.

ASSUMPTION 4.4. Assume the following condition holds almost surely:
$$\sum_{i=1}^{\infty} \frac{\|H_i - H\|}{i^{1/2}} < \infty.$$

REMARK 4.3. Assumption 4.4 holds if the generating matrices $H_i$ converge to a generating matrix $H$ at a rate of $i^{-1/2-c}$ for some $c > 0$. However, in some applications, this condition is not satisfied. For example, for the urn model proposed in Bai, Hu, and Shen (2002), this condition is not satisfied. Generally, if the urn model depends on the estimation of parameters (see Zhang, Hu, and Cheung, 2006), then Assumption 4.4 is not satisfied.

4.1.3 Main asymptotic theorems
In this subsection, we use a very general Jordan form of the generating matrix $H$ to prove the consistency and asymptotic normality of both the urn composition, $Y_n$, and the allocation, $N_n$. The techniques used in the proofs allow us to relax the constraint of a diagonal Jordan form, assumed by Smythe (1996), and allow for the direct computation of the asymptotic variances. Sketches of the proofs are given in Section 4.1.5. Furthermore, our techniques are extremely general and allow for heterogeneous random generating matrices, which arise from certain response-adaptive randomization procedures. Some examples will be presented in the next subsection.

We first treat the problem of consistency. Define $\tau = \max\{\mathrm{Re}(\lambda_2), \ldots, \mathrm{Re}(\lambda_s)\}$, the maximal real part of the nonprincipal eigenvalues of $H$.
THEOREM 4.1. Under Assumptions 4.1-4.3, for any $\kappa > \tau \vee 1/2$ (here $a \vee b = \max(a, b)$), we have
$$n^{-\kappa}(Y_n - EY_n) \rightarrow 0 \quad \text{almost surely}$$
and
$$n^{-\kappa}(N_n - EN_n) \rightarrow 0 \quad \text{almost surely}.$$
Further, if Assumption 4.4 holds, then
$$\frac{Y_n}{n} - v \rightarrow 0 \quad \text{almost surely}$$
and
$$\frac{N_n}{n} - v \rightarrow 0 \quad \text{almost surely}.$$
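The limit $v$ in Theorem 4.1 is the normalized left eigenvector of the limiting generating matrix $H$ associated with its maximal eigenvalue 1, so it can be computed directly once $H$ is specified. The following sketch (with assumed success probabilities) does this for the randomized play-the-winner urn of Example 4.1 and for the three-arm urn of Wei (1979) discussed in Example 4.2, for which one can check that $v_k \propto 1/q_k$.

```python
import numpy as np

def limiting_allocation(H):
    """Left eigenvector of H for its largest eigenvalue, normalized to sum to 1."""
    eigvals, left = np.linalg.eig(H.T)   # left eigenvectors of H = right eigenvectors of H'
    v = np.real(left[:, np.argmax(np.real(eigvals))])
    v = np.abs(v)
    return v / v.sum()

# Randomized play-the-winner (Example 4.1), assumed p_A = 0.7, p_B = 0.5:
pA, pB = 0.7, 0.5
qA, qB = 1 - pA, 1 - pB
H_rpw = np.array([[pA, qA],
                  [qB, pB]])
print(limiting_allocation(H_rpw), "vs", [qB / (qA + qB), qA / (qA + qB)])

# Wei's (1979) three-arm urn (Example 4.2), assumed p = (0.7, 0.5, 0.4):
q = 1 - np.array([0.7, 0.5, 0.4])
H_wei = np.diag(1 - q) + (np.outer(q, np.ones(3)) - np.diag(q)) / 2.0
print(limiting_allocation(H_wei), "vs", (1 / q) / (1 / q).sum())
```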
We now give the main results on asymptotic normality.
THEOREM 4.2. Under Assumptions 4.1-4.3 with $\tau \leq 1/2$,
$$V_n^{-1}(Y_n - EY_n) \rightarrow N\big(0, (t^{-1})^* \Sigma\, t^{-1}\big)$$
in distribution, where $\Sigma$ is specified below. Furthermore, if Assumption 4.4 holds, then $EY_n$ can be replaced by $nv$.

We now describe the form of $\Sigma$. To do this, we need to define further notation regarding the Jordan form. Write $\Sigma$ in partitioned form as
$$\Sigma = \left[\begin{array}{cc} \sigma_{11} & \Sigma_{12} \\ \Sigma_{12}^* & \Sigma_{22} \end{array}\right].$$
When $\tau < 1/2$, $V_n = \sqrt{n}$,
$$\sigma_{11} = \sum_{q=1}^{K} \sum_{k=1}^{K} \sum_{l=1}^{K} v_q d_{qkl},$$
and $\Sigma_{12}$ is determined by a $K \times (K-1)$ matrix whose first row is 0 and whose remaining rows form a block diagonal matrix; the $t$-block is $\nu_t \times \nu_t$ and its $(h, h+e)$ element is $[1/(1 - \lambda_h)]^{e+1}$. If $\Sigma_{22}$ is split into blocks, then the $(w, t)$ element of the $(g, h)$ block ($\nu_g \times \nu_h$) of $\Sigma_{22}$ is a weighted functional of $(t_g)^* R\, t_h$. When $\tau = 1/2$, $\sigma_{11} = 0$ and $\Sigma_{12} = 0$. The $(w, t)$ element of the $(g, h)$ block ($\nu_g \times \nu_h$) of $\Sigma_{22}$ is given by
$$[(\nu - 1)!]^{-2}(2\nu - 1)^{-1}\left[(t_g)^* R\, t_h\right]_{(\nu, \nu)}$$
if $w = t = \nu_g = \nu_h = \nu$ and $\lambda_g = \lambda_h = 1/2$. All other elements are zero. In most applications, the generating matrix $H$ has a diagonal Jordan form and $\tau < 1/2$. In this case, we have the following simplified form.

COROLLARY 4.1. Suppose Assumptions 4.1-4.4 hold and $H$ has a diagonal Jordan form and $\tau < 1/2$; thus
$$t^{-1} H t = J = \left[\begin{array}{cccc} 1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_K \end{array}\right],$$
where $t = (1', t_2, \ldots, t_K)$. The variance-covariance matrix $\Sigma = (\sigma_{ij})_{i,j=1}^{K}$ has the following simple form:
K
K
and oij
= (1 - A2 - Xj)-'(t;)*Rt;,
where i, j = 2, ..., K. We now give the asymptotic normality of the allocation vector N,.
THEOREM 4.3. Under Assumptions 4.1-4.3 with 7 5 1/2, Vcl(N, - EN,)
+
N
(0,
(t-l)*zt-l)
in distribution, where 2 is specified below. Furthermore, if Assumption 4.4 holds, then EN, can be replaced by nu. We now describe how to compute 3 based on the Jordan form. Let s%E1+52+s;+2:3.
Whenr
< 1/2,
51 = t'_diag(v)t-, and
x2
= ttdiag(u)HtS,
PROCEDURES BASED ON URN MODELS
40
where 3 is defined after Theorem 4.2. If 2 3 is split into blocks, then the (w,t) element of the (9,h) block, C,,h (v, x vh), of 53 is given by
-
-
I
When T = 112, c1 = 0 and c2 = 0. For c3, we have EgJ, = 0 if A, # Ah or if Re(&) < 1/2. Otherwise, if A, = Ah and Re(&) = 1/2, Z,,h has only one nonzero element, which is the one on the right-lower comer of C , , h and given by IAJ2[(V
- 1)!]-2(2v - l)-'[(t;)"Rt~](l,l,.
When the matrix H has a diagonal Jordan form and following simplified form.
T
<
1/2, we have the
1,
COROLLARY 4.2. Suppose Assumptions 4.1-4.4 hold with 7 < 1/2 and 0
...
0
A2
0
0
AK
where t = (l',th, ...,t k ) .Now let azj
= (t,')'(tiiag(v) - v*vV)t;,b i j = A j ( 1 - Aj)-'(t:)*(diag(v) - v*v)tl,
and
+ (1 - A j ) - l ] ( l
czj = [(l- & ) - I
- s;i - A j ) - l ( t ; ) * R t l ,
for i, j = 2, ...,K . Then n-'12(Nn - nv) is- asymptotically normal with mean vector 0 and variance-covariance matrix ( t - ' ) * E t - l , where % = (L?ij)&=l has the following simple form:
~
1
-
-
=1 a i j = O i l = 0 and i j , j = aij
-
+ bij + b j i + c i j
f o r i , j = 2 ,..., K . Based on Corollary 4.2, we can then calculate the asymptotic distribution for most examples of practical interest, and we do this in the next subsection.
REMARK 4.4. Under Assumptions 4.1-4.3, we can obtain Theorems 4.1-4.3 for Y n - E Y n and N n - EN,. However, in applications, both E Y , and E N ,
GENERALIZED FRIEDMAN'S URN
41
are unknown and are usually very difficult to calculate. It is important to replace both of them by nv to use Theorems 4.1-4.3. Therefore, Assumption 4.4 plays a very important role in these theorems. As pointed in Remark 4.3, we can see that Assumption 4.4 is satisfied in most urn models in the literature.
4.1.4
Some examples
EXAMPLE 4.1 (CONTINUED).For the randomized play-the-winner rule, the generating matrix is
Further, we obtain
R= (qA
For the case qA have
-k
+ QB > 1/2 (T < l/2), we have v, = n. From Corollary 4.1, we
in distribution. For the asymptotic properties of N , , the values corresponding to Corollary 4.2 are
and
so
Thus. we have
42
PROCEDURES BASED ON URN MODELS
in distribution, where
These results were originally shown by Rosenberger (1992) using different techniques. For the case qA q B = 1/2, = n log n. We have
v,
+
in distribution. Also,
in distribution. When QA q B < 112, the asymptotic distribution is still unknown and is an open research problem. Matthews and Rosenberger (1997) found the exact variance, which depends on the initial urn composition.
+
EXAMPLE 4.2. (Wei, 1979.) For multi-arm clinical trials, Wei (1979) proposed the following urn model: starting from Y O= (YO', ...,YOIC), when a type k ball is drawn randomly from the urn, we assign the patient to the treatment k and observe the patient's response. A success on treatment k adds a ball of type k to the urn and a failure on treatment k adds 1 / ( K - 1 ) ball for each of the other Ii - 1 types. Let p k be the probability of success of treatment k , k = 1 , 2 , . . . ,K , and q k = 1 - p k . The generating matrix for this urn model is
After some calculation, we obtain X = 1, and its corresponding eigenvector is
From Theorem 4.1, we can show that
Y, n
almost surely.
---t
N, -+ v and n
GENERALIZED FRIEDMAN‘S URN
43
To obtain the asymptotic distribution of Y, and N,, we apply Theorem 4.2 and Theorem 4.3. First, we calculate the eigenvalues and eigenvectors of H. For simplicity, we consider K = 3 here. The generating matrix is then
H=
[
pi O.%i 0.591 p2 0.592 0.592 0.593 0.5q3 P3
1
.
After some tedious calculations, the eigenvalues of H are
+ +
where C = ,/q: 9; qg - qlq2 - 9193 - @q3. Here A2 = A 3 if and only if C = 0, which is the case q1 = q2 = 43. We consider the case that C # 0 and A2 < 112, so the generating matrix H has a simple Jordan decomposition, given by
t-’Ht=
[2 0
A2
0
0 0
0
A3
1,
with matrix
The left eigenvector (associated with the maximal eigenvalue 1) is
The corresponding di (i = 1,2,3)are d1=
d2
=
and d3 =
[ [ [
PlQl -Plq1/2 -P1@
-P191/2 Pl91/4 Pl91/4
-Pr41/2 Plq1/4 Pld4
1
>
-P2Q1/2 -P292/4
-~2q2/2 P Z Q ~ / ~ P292 -p2q2/2 -p2q2/2 P292/4
P343/4 P343/4 -P393/2
P343/4 P393/4 -P393/2
~2q2/4
-P3q3/2 -P393/2 P3Q3
1
44
PROCEDURES BASED ON URN MODELS
Now we calculate the matrix R:
R =~
+
+
+ N ' ( d i a g ( v ) - w'v)N.
i d l ~ 2 d 2 ~ 3 d 3
After some calculation, we can obtain
Here c 2
41 q2q3
= 4[3(qlq2
-k qlq3 -k q2q3)
+ 1 - 2(ql + Q2 + q3)](Qlqz+ q1(13 -k 4293)''
From Corollary 4.2, we can obtain the asymptotic normality of N , , which is
f i p n- v )
[
011
012
013
031
032
033
- + N (0,c3
where c3
=
Q14243
3[3(q1qz
+ q1Q3+ q2q3) + 1 - 2(ql -t92 -k q3)](qiq2 + Qiq3-k (12Q)3'
These results were first derived in Tymofyeyev, Rosenberger, and Hu (2004).
GENERALIZED FRIEDMAN’S URN
I
45
When XZ < 1 / 2 and C = 0, q1 = q2 = q3. The generating matrix is then pi 0.5qi 0 . 5 ~ p1 0-5qi 0.5%
0.591 0.5qi pi
In this case, the eigenvalues are 1
.
The generating matrix H still has a simple Jordan decomposition as 0 1 - 1.5ql 0
with matrix
1 -1
0 1 - 1.5ql 1
Thelefteigenvector(associated with themaximal eigenvalue 1)isv = ( 1 / 3 , 1 / 3 , 1 / 3 ) . Also, we have PlQl -Plq1/2 -Piq1/2 di= -Plq1/2 Plqi/4 Plq1/4 , -Piq1/2 Pld4 Pl%/4 d2 =
and d3 =
[ [ [
Plq1/4 -plq1/2 piq1/4
-P191/2 plqi -piq1/2
Plq1/4 -plq1/2
PIQI/~
~ 1 q 1 / 4 P I ~ I / ~-piq1/2 piqi/4 Piq1/4 -piqi/2 -Piq1/2 -P191/2 Pig1
Then we can obtain
From Corollary 4.1, we obtain
1
1
.
46
PROCEDURES BASED ON URN MODELS
The inverse matrix o f t is t-1
1/3 1/3 112 0 1/6 -113
=
113 -1/2] 1/6
.
Finally, we obtain
Similarly from Corollary 4.2, we obtain az2 = 2/3, u~~ = 2, bZ2 = 2(2 %1)/(9@),b33 = 2(2 - 3qi)/(3qi), czz = 2(3pi 1)/(9q1(3ql - l)), and c33 = 2(3pi 1)/(3qi(3ql - 1)). All other a i j , b i j , and cij equal 0. Therefore,
+
+
After some calculation, we have
Thus, we finish this example.
EXAMPLE 4.3. (Bai, Hu, and Shen, 2002.) We now explore an urn model that adds balls based on the relative success probabilities of each treatment. Under the same setting as in Example 4.2, a success on treatment k adds a ball of type k to the urn and a failure on treatment k adds pi/(A4 - p k ) balls of type i for all i # k, where A4 = p1 + . . . 4-p ~ In. this case, the generating matrix is
It is easy to see that A = 1 and the corresponding eigenvector is
Based on Theorem 4.I , we have
Y n n
4
v and Nn ---t v n
47
GENERALIZED FRIEDMAN'S URN
almost surely. To obtain the asymptotic distribution of Y, and N,, we need to calculate the eigenvalues and eigenvectors of H. For simplicity, we consider K = 3 here. The generating matrix is then
H =
[
91p?-
Pl
1.
PlfP3
JXL
Pz
-.B!ZL PI +PZ
After some calculations, the eigenvalues of H are
Where C4 = J1- ~ P ~ P Z P ~ / f( PPZ)(PZ I + ~ 3 ) ( p +i ~ 3 ) . First, we consider the case that Cd # 0 and X < 1/2, so the generating matrix H has a simple Jordan decomposition as
0
0
0
0
A3
The left eigenvector (associated with the maximal eigenvalue 1) is PI(PZ+PS)/Ql
PI (p2+P3)/41 + P a p +P2]/92
pa PL+PJ / q 2
+P2)/93
Pl(PZ+P3)/9l+Pz Pl+PZ /QZ+P3(Pl tPZ)/93
PI (pa fP3)lQl +PZ (PI +p2)/qZ
The corresponding d, (i = 1,2,3) are
dl=
dz=
[ [
PlQl
+ P3) -PlqlP3/(PZ + P3) PzqzP:/(Pl + P 3 Y - P Z Q 2 P l / ( P l + P3) P242PIP3/(Pl + P3)'
-P141P2/(Pz
+
P3)
Pl4lP;/(Pz + P3)Z
--Pl4lPZ/(PZ
PlqlP'ZP3/(PZ f p3)' -P242P1l(P1
P242 -mqZp3/(Pl
+P3) +p 3 )
+P3 (PI+PZ)/'?3
P l ~ l P ~ / ( P+p3)' Z
+ + +
PZQZPlP3/(Pl -P242P3/(P1
pZqZP:/@1
P3I2 P3)
p3)'
and P343P?/(Pl P3Q3PlPZ/(Pl -P343Pl/(pI
+ P2l2 +P d 2 + PZ)
P3Q3Pl?yPl P3Q3P2/(Pl -P343PZ/(Pl
+ P2I2
+ P$
f P2)
After some calculation, we can obtain
R=-
rii
r1z
r13
--P343Pl/(Pl --P343PZ/(Pl P3 43
1 1
+ P3) + P3)Z ,
-pl~lP3/(z)z PlqlPZP3/(Pz
+ PZ)
+Pz)
7
48
PROCEDURES BASED ON URN MODELS
- PZPJ(P1 + P3)(Pl 4243
+ PZ) +-.P?P2P3 41
We can then calculate the corresponding C and 5from Corollaries 4.1 and 4.2. The formulas are rather complex; however, we can obtain the results numerically. Here we outline two cases. For ( p l , p 2 , p 3 ) = (0.7,0.5,0.4), we have the corresponding values as 0.7000 0.1667 0.1333 0.3182 0.5000 0.1818 0.3500 0.2500 0.4000 d1=
[
0.2100 -0.1167 -0.0933
] [ ,t=
-0.1167 0.0648 0.0519
1.0000 0.9402 1.0000 -1.2284 1.0000 -0.7791 -0.0933 0.0519 0.0415
1
,
-0.1181 -0.8632 1.4970
1
,
GENERALIZED FRIEDMAN'S URN
d2 =
d3 =
R=
[ [ [
0.1012 -0.1591 0.0579
-0.1591 0.2500 -0.0909
0.0579 -0.0909 0.0331
0.0817 0.0583 -0.1400
0.0583 0.0417 -0.1000
-0.1400 -0.1000 0.2400
0.1884 -0.1152 -0.0732
-0.1152 0.1313 -0.0161
-0.0732 -0.0161 0.0893
1 1
49
,
,
v' = (0.5250,0.2750,0.2000),and ( X i , X2, X3) = (1,0.3718,0.2282). The corresponding matrices are X=
[1
0
0 2.969 0 0.!34]
andX=
-
[
"1
0 0 0 11.637 0 0 0 2.694
From Corollaries 4.1 and 4.2, we obtain
fi (n5- v)
---f
N
(0. [
0.7283 -0.4732 -0.2251
-0.4732 0.4204 0.0528
-0.2251 0.0528 0.2023
2.8565 -1.8476 -1.0089
- 1.8476 - 1.0089
and
&
($- v)
+N
(0. [
1.6746 0.1730
0.1730 0.8359
I)
1)
in distribution. When C4 = 0, that is, pl = p2 = p3, the generating matrix reduces to
Pi 0.5qi 0.5qi 0.5ql pi 0.5ql 0.5qi 0.5qi Pi In this case we have (from Example 4.2)
in distribution.
1
.
.
50
PROCEDURES BASED ON URN MODELS
As pointed in Bai, Hu, and Shen (2002), it is not realistic to assume that ( p 1 )...,p ~ ) are known. They proposed the following design: write N , = ( N , l , .,.,N n ~and ) S n = (s,l,...,S n ~ where ~ ) , N n k denotes the number oftimes that the k-th treatment is selected in the first n stages, and s n k denotes the number of successes of the kth treatment in the N , k trials, k = 1, ..., K . Define M , = ( M n l ,...,M n ~and ) K Mn = x k = l Mnk, where M n , k = ( s n k l)/(Nnk + 2 ) , k = 1,..., I { . The generating matrices are then given by
+
In this case, Hi are random matrices and converge to
+ -+
almost surely, where M = p1 . PK. The asymptotic distributions of Y, and N , can be obtained from Theorems 4.2 and 4.3 in Section 4.1.3. From Lemma 3 of Bai, Hu, and Shen (2002), we have llHi - Hllm = o ( i d 1 i 4almost ) surely, so Assumption 4.3 is satisfied. However, Assumption4.4 isnot true,because ~ ~ H ~ - -=H O p~( ~ z - 1w/ 2 Therefore, ). we cannot obtain the asymptotic distributions ofthe T L - ' / ~ (Y v), and n-'I2(N, - v ) based on Theorem 4.2 and 4.3. This problem has been studied in Zhang, Hu, and Cheung (2006) and there they obtain these asymptotic distributions using other techniques.
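The estimation-driven addition rule is easy to state algorithmically. The sketch below implements one plausible version of the Bai, Hu, and Shen (2002) design for binary responses, using the shrunken estimates $M_{n,k} = (S_{nk} + 1)/(N_{nk} + 2)$ described above in place of the unknown success probabilities; fractional numbers of balls are allowed for simplicity, and the function name, initial urn, and success probabilities are assumptions made for illustration.

```python
import numpy as np

def simulate_bhs_urn(n, p, y0=None, seed=0):
    """Estimation-driven urn of Bai, Hu, and Shen (2002), binary responses.

    On a failure on treatment k, add M_i/(M - M_k) (fractional) balls of each
    other type i, where M_k = (S_k + 1)/(N_k + 2) are the current estimates.
    """
    rng = np.random.default_rng(seed)
    K = len(p)
    y = np.ones(K) if y0 is None else np.array(y0, dtype=float)
    N = np.zeros(K)   # assignments per arm
    S = np.zeros(K)   # successes per arm
    for _ in range(n):
        k = rng.choice(K, p=y / y.sum())
        x = rng.random() < p[k]
        N[k] += 1
        S[k] += x
        M = (S + 1.0) / (N + 2.0)          # current estimates of p_1, ..., p_K
        if x:
            y[k] += 1.0                    # success: one ball of the drawn type
        else:
            add = M / (M.sum() - M[k])     # failure: split a ball among the other arms
            add[k] = 0.0
            y += add
    return N / n, y / y.sum()

alloc, urn = simulate_bhs_urn(20000, p=[0.7, 0.5, 0.4])
print("allocation proportions:", alloc)
```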
EXAMPLE 4.4. (Bai and Hu, 2005.) Quite often in clinical trials the probability of success may depend upon some observable covariates on the patients, that is, p i k = P k ( % i ) , where zi are covariates observed on the patient i and the result of the treatment at the i-th stage. Here p i k = P r ( X i = llTi = k, zt), for z = 1,...,n and k = 1, ...,I(, where Ti = k indicates that a type k ball is drawn at the i-th stage and X.' = 1 if the response of the subject i is a success and 0 otherwise. Thus, for a given z i , the addition rule could be LJ(zi)and the generating matrices H , = H ( t i )= E D ( z i ) . Assume that z l , ..., zn are independent and identically distributed random vectors and let H = E H ( z 1 ) . Based on the results in Theorems 4.2 and 4.3, we can compute the corresponding asymptotic results of the urn composition Y , as well as the allocation number of patients N , . Here we illustrate the results by considering the case K = 2. The following response-adaptive randomization procedure was described by Bai and Hu (1999). Let E ( p k ( % i ) ) = pk, k = 1 , 2 and the addition rule and generating
GENERALIZED FRIEDMAN'S URN
51
matrix are denoted by
where 0 5 dk(zi) 5 1 and q/c = 1 - p k for k = 1 , 2 . As before, X = 1, Xi = pi + p 2 - 1 , =~max(0, XI), and 2) = (qz/(q1 q2),q1/(q1 4 2 ) ) . Further we have
+
+
where U k = Var(dk(E1)).For the case corresponding to Corollary 4.1 are
T
< 1/2, we have V,
= n and the values
From Theorems 4.2 and 4.3, we have
n*
(% - v)
.+ 0 almost surely for any 6
< 1/42
and
in distribution, where
For the randomized play-the-winner rule (Wei and Durham, 1978), we have a k = k = 1,2. Then we have
pkqk,
522
= ( 5 - 2(q1 + q2))41q2* 2(q1
This result agrees with Example 4.1.
+ q2) - 1
52
PROCEDURES BASED ON URN MODELS For the case T = 1/2, V, = n logn and the value corresponding to (B.42) is 522
= 4[(a142
We have
+ a241)(41 + 42) + 4142(~1- q2?].
(nlogn)-"2(Nn - nw) --t N ( 0 , C2)
in distribution, where
For the case of the randomized play-the-winner rule, we have
EXAMPLE 4.5. (Anderson, Faries, andTamura, 1994.) In Example 4.2 and Example 4.3, instead of adding balls according to the success probabilities of each treatment, we may add balls according to the proportions of different types of balls at that time. Thus, at stage j + 1,a success on treatment k generates a ball of type k and a failure on treatment k generates q i / ( Y j l - q k ) balls of type i for all i # k. The generating matrices are then given by
i = 1 , 2 , .... However, we do not know the matrix H , and it is an open problem to find the asymptotic distributions. 4.1.5
Proving the main theoretical results
The main theoretical results in Theorems 4.1-4.3 were proved in a recent paper by Bai and Hu (2005). In this subsection, we sketch the basic principles of the proof techniques. The detailed proof can be found in Appendix B. The principles of the proof can be divided into four techniques: (a) Developing a matrix recursion (refer to Appendix A) that leads to a sum of several terms. (b) Deriving the Jordan decomposition of the generating matrix (refer to Appendix A) and using Jordan decomposition to get a higher-order expression. (c) Applying a multivariate martingale central limit theorem (refer to the martingale term of the terms in (a) and Appendix A). (d) Using the terms in (a) to derive the covariance structure.
GENERALIZED FRIEDMAN'S URN
53
We now outline the basic techniques in ( a H d ) above without giving the detailed calculations here. The interested reader can go through the calculations in Appendix B. To investigate the limiting properties of Y n ,we first derive a decomposition based on matrix recursion. Let Qi = Yi - E(YilFi-l), Gi = I Z-IH, and Bn,i = Gi+l ...Gn with the convention that Bn+ = I and FOdenotes the trivial a-algebra. From (4.3), we have
+
(4.10)
We further decompose S3 as follows:
s3
From the above decomposition, we can calculate the terms as follows. Term S1 is the product of matrices Gi, which depend on the same matrix H . Therefore, we
54
PROCEDURES BASED ON URN MODELS
can use the Jordan form to simplify S1 as follows: S1 = Yott-'Gitt-'G2
*
.*tt-'G,tt-' 0
j=1
..
0
J
0
= YrJt
t-l.
j=1
0 Then we can use some mathematical techniques to obtain its limit. Because E(Qil.Fi-l) = 0 and Bn,iis a constant matrix for i = 1,...,n, S2 is a martingale sum. To calculate Bn,i,we need to use the Jordan form again: 0
...
0
j=l+i
0
t-'B,,,t =
0 j=l+i
0
0
...
n
Based on this expression, we can see that all the elements of Bn,i are bounded by n/i (i = 1, ...,n). Then we can evaluate S2 by using properties of martingales. The term S31 depends on aL-llYi-1, Hi - H , and Bn,i. All elements of ai-llYi-1 are boundedby 1, becauseai-1 = Y + l , l + . . - + Y i - 1 , ~Assumptions . 4.3 and 4.4 are used to evaluate Hi - H . The term S32 depends on a;?lYi-l, ai and Bn,i.To evaluate the convergence rate of a,, the total number of balls after n stages, we have the following lemma.
-
LEMMA4.1. Under Assumptions 4.1 and 4.2, a,/n 4 1almost surely as n 4 oc, and n-&(an - n) 0 almost surely for any K. > 1/2. Then we show a general result for Y , by evaluating the asymptotic order of S2, S31, and S32.
LEMMA4.2. Under Assumptions 4.1-4.3, for some constant M , llYn - EYn112 5 M V 2 . Here IJY1I2= E ( Y Y ' ) for any random vector (row).
(4.12)
GENERALIZED FRIEDMAN'S URN
55
Based on Lemma 4.2, we can show the consistency of Y,.To show the consistency of N n ,we need the following decomposition:
Then, by using the consistency of the martingale sum n
and the consistency of Y,,we obtain the consistency of N n . Thus, we can show Theorem 4. I. To show the asymptotic normality of Y n - E Y , , we only need to show the asymptotic normality of ( Y n - E Y n ) t = y n - Byn, where y n = ( Y n i , ...,Y n K ) . To evaluate thitfurther, we need the following corollary from Theorem 4.1. Let Q, = Q,t and B,,,, = t-'B,,,t.
COROLLARY 4.3. Under the conditions of Theorem 4.1, we have n
Yn,-
- Eyn,- = CQagn,a,-+ op(Vn), a=1
-
-
(4.13)
-
where yn,- = ( Y n , 2 , * ,Y n , K ) and Bn,a,- = (&,t,2, * * . Bn,t,K). Furthermore, ifAssumption 4.4 is true, El',,- in (4.13) can be replaced by 0. It can be seen from Lemma 4.1 that n
Yn1
- EYnl = an
-n
- 1=
C(e,- E(e,lFt--1)). a=1
By using the martingale central limit theorem, we can show following result.
LEMMA4.3. Under Assumptions 4.1-4.3, n - 1 / 2 ( a n- n) is asymptotically normal K with mean 0 and variance u11, where 6 1 1 = vqdqki.
c,=l c,"==, c,"=,
Based on Corollary 4.3 and Lemma 4.3, we have
Then we can obtain Theorem 4.2 by using the martingale central limit theorem (Theorem A.6). For the asymptotic normality ofthe allocation N , , we can consider the asymptotic distributionof N , t . Since the first component of Nnt is anonrandom constant n,we
56
PROCEDURES BASED ON URN MODELS
only need to consider the other K - 1components. Then we can further approximate these K - 1 components by a martingale sum. Theorem 4.3 can then be shown by using the martingale central limit theorem. However, the variance-covariance matrix is very difficult to obtain. 0
REMARK4.5. For the randomized play-the-winner rule (Wei and Durham, 1978), both S31 = 0 and S32 = 0 from the decomposition (4.10) and (4.1 1). For the homogeneous generalized Friedman’s urn model (Athreya and Karlin, 1968) with homogeneous generating matrix H , (Hi= H , i = 1,2, ...,), then S31 = 0. For the nonhomogeneous urn models in Bai and Hu (1999), we have S32 = 0. Theorems 4.1-4.3 can be applied to most ofthe urn models proposed in the literature. REMARK 4.6. Assumption 4.4 plays a very important role in the proof of Theorems 4.1-4.3, when we substistute nu for E Y , and E N , . Without Assumption 4.4, we cannot obtain that s 3 1 = o(V,) and S32 = o(V,). As pointed out in Remark 4.4, Assumption 4.4 is not satisfied for one important case where the generating matrix Hi depends on sequential estimators of unknown parameters. In this case, ( [ H i- HI1 = O P ( i - ’ l 2 )therefore, ; Assumption 4.4 is not true. Recently, Zhang, Hu, and Cheung (2006) proposed a family of sequential estimation-adjusted urn (SEU) models. In this case Theorems 4.1-4.3 do not apply. Instead, they show the law of the iterated logarithm of the estimators and then prove the strong consistency and asymptotic normality of Y , and N , of the SEU models by using the results from the law of the iterated logarithm. The asymptotic variance-covariance matrices of the SEU model are different from those of Theorems 4.2 and 4.3. This is because the estimators of parameters also contribute further variability to the design.
4.2 THE CLASS OF TERNARY URN MODELS We now discuss a different class of urn models with important application to clinical trials. Ivanova and Flournoy (2001) refer to this class as the ternafy urn, and the class induces diagonal generating matrices. Suppose there are K treatments ( K types of balls in the urn) and there are three possible outcomes, A, B, and C. A ball is drawn and replaced, and that treatment is assigned to the corresponding patient. If the patient’s outcome is A at treatment i, i = 1,...,K , a type i ball is added to the urn; if the outcome is B, nothing is done; a type i ball is removed if the outcome is C. When the outcomes are only A and B, we obtain Durham, Flournoy, and Li’s (1998) urn, where the same type of ball is added if there is a success and the urn remains unchanged if there is a failure. When the outcomes are only A and C, we obtain the birth and death urn in Ivanova et af. (2000), where a ball is added for a success and a ball is removed for a failure. We obtain the drop-the-loser rule by Ivanova (2003) when the outcomes are B and C. Although we can use this class of urns to include the three different models, a general technique to prove asymptotic results for the entire class is unknown. So we define the class for convenience only, which allows us to distinguish it from the generalized Friedman’s urn. We now treat each of these models separately.
THE CLASS OF TERNARY URN MODELS
4.2.1
57
Randomized Pdlya urn
The randomized Pdya urn (RPU)procedure can be describe as follows: An urn contains at least one ball of each treatment type (totally K treatments) initially. A ball is drawn from the urn with replacement. If a type i ball is drawn, i = 1,...,K, then treatment i is assigned to the next patient. If the response is a success, a ball of type i is added to the urn. Otherwise, the urn remains unchanged. Asymptotically, this model provides an interesting result from an ethical point of view: we identify the best treatment with probability one and eventually assign patients to that treatment with probability one. The interesting theoretical problem remains the finite sample properties of this procedure. Durham, Flournoy, and Li (1998) use Atheya and Ney's (1972) technique of embedding the urn process in a continuous-time branching process. To do this, we simply note that at times 0 = to < t l < < tk < we can match up the urn composition and the proportion of splits of the branching process. So Yo,Y 1 , ...and Z(to),Z ( t 1 ) ,... are analogs, where Z(t,),i = 0,1,2, ..., is a discrete-time Markov chain with the same initial values Y O= Z(t0). Hence . a + ,
for i = 1, ...,K. Balls are then returned to the urn with a probability distribution which is the same for both Y Oand Z ( t 0 ) . Hence Y1 and Z(t1) have the same distribution, and this can be extended to any n = 1,2, .... Using this embedding, we can prove that, as t -+ 03,
C,"=lzj(t)
--t
1or o almost surely,
depending on whether pi is the maximum or not, respectively. We can use this to prove the following theorem for the urn proportions.
THEOREM 4.4. Suppose the maximum success probability is unique, say, p ( , ) > p ( i ) (where p ( ~ ) , p (. ~. .) ,,p ( ~are ) the ordered probabilities of p l , . . . , p ~ ) .Without loss of generality, assume that p(1) = p l . For all i = 2, ...,K, then as n + 00,
and where el = ( l , O ,
...,0).
n - l ~ ,-+el almost surely,
PROOF.Based on the embedding process, the probability of assignment at time t is
c;,
--$
1or 0 almost surely,
58
PROCEDURES BASED ON URN MODELS
depending on whether pi is the maximum or not, respectively. Therefore,
Now we consider the allocation proportion n-lN,. Let Fnbe the u-algebra generated by {YO, ...,Y n }We . know that
Therefore,
i= 1
Now because
Ti
a= 1
i= 1
is bounded, we have n
n-l x ( T , - E(TilFi-I))-+0 almost surely. i= 1
Also, we have
n-'
n
E(Ti1Fi-1)
--f
el
almost surely,
i= 1
because
E(TnlFn-l)-+
el
almost surely.
Thus, we complete the proof. 0 4.2.2
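A direct simulation makes the "winner takes all" behavior of Theorem 4.4 visible. The following sketch (with assumed success probabilities and initial composition) simulates the randomized Pólya urn and reports the proportion of patients assigned to each treatment; as $n$ grows, the proportion assigned to the arm with the largest $p_k$ moves toward one.

```python
import numpy as np

def simulate_rpu(n, p, seed=0):
    """Randomized Polya urn: add one ball of the drawn type on success, do nothing on failure."""
    rng = np.random.default_rng(seed)
    K = len(p)
    y = np.ones(K)               # one ball of each type initially
    counts = np.zeros(K)
    for _ in range(n):
        k = rng.choice(K, p=y / y.sum())
        counts[k] += 1
        if rng.random() < p[k]:  # success: reinforce treatment k
            y[k] += 1
    return counts / n

for n in (100, 1000, 10000):
    print(n, simulate_rpu(n, p=[0.7, 0.5, 0.4]))
```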
Birth and death urn
The birth and death urn removes balls from the urn; therefore, there is a positive probability that certain types of balls will become extinct. In order to continue the procedure, we need to introduce the concept of immigration balls, as in Ivanova et al. (2000). Initially an urn contains balls of K types and a immigration balls. A ball is drawn randomly with replacement. If it is an immigration ball, one ball of each type is added to the urn, no patient is treated, and the next ball is drawn. The procedure is repeated until a type i ball (i = 1, ...,IC) is drawn. Then the subject is assigned to treatment i . If a success occurs, a type i ball is added to the urn; if a failure occurs, a type i ball is removed. The birth and death urn has a complicated limiting theory that changes according to the magnitude of the maximal success probability ~ ( 1 ) . If the maximum is not
THE CLASS OF TERNARY URN MODELS
59
unique, limiting results are even more complicated. We state the main results. Proofs use the embedding techniques of the previous section, except that the urn process is embedded in a branching process with immigration. We have the following result from Ivanova et al. (2000).
THEOREM 4.5. Without loss of generality, we assume that P1 2
P2
2 * ' * 2 PK.
We have the following two limiting cases.
Case 1. Ifpl 2 0.5 and p = pl = p2 =
n-'N, where D = ( 0 1 , ...,&,o,
-
+D
= P h > Ph+l 2
-
* *
2 p ~then ,
in probability,
...,0) and
Here Wi(p)is a random variable which follows a Gamma(a/p, p / u ) distribution if p > 0.5. If p = 0.5, then W,(p)is a random variable which has characteristic function [cosh(T&i/u)1/21-2a.
Case 2. If pl < 0.5, then n-lN,
--*
v
in probability, where v = (211, ...,U K ) with
Asymptotic normality should follow for the p(l) < 0.5 case, but obviously it is not relevant ifp(,) > 0.5. This leads to a concern about this type ofurn model that, in practice, we do not know the magnitude of p(l), and hence the planning of a clinical trial based on this type of design is hampered. In particular, it is clear that having a random limit in the p(l) > 0.5 case should be of great concern due to the uncertainty of the final allocation. 4.2.3
Drop-the-loser rule
The drop-the-loser (DL) rule was proposed by Ivanova (2003) and is defined as follows. Consider an urn containing balls of three types. Balls of types $A$ and $B$ represent treatments $A$ and $B$; balls of type 0 are immigration balls, as in the previous section. Let $Y_0 = (Y_{0,0}, Y_{0,A}, Y_{0,B})$ be the initial urn composition. After $m$ ($m \geq 0$) draws, the urn composition is $Y_m = (Y_{m,0}, Y_{m,A}, Y_{m,B})$. When a subject arrives for randomization, one ball is drawn randomly. If a treatment ball $A$ (or $B$) is selected, then treatment $A$ (or $B$) is assigned to the subject and the response is observed. If it
is a failure, the ball is not replaced. That is, $Y_{m+1,A} = Y_{m,A} - 1$, $Y_{m+1,B} = Y_{m,B}$, and $Y_{m+1,0} = Y_{m,0}$ (when the ball drawn was of type $A$). If it is a success, then the urn remains unchanged. If an immigration ball (type 0) is drawn, no treatment is assigned, and the ball is returned to the urn together with one $A$ and one $B$ ball. That is, $Y_{m+1,A} = Y_{m,A} + 1$, $Y_{m+1,B} = Y_{m,B} + 1$, and $Y_{m+1,0} = Y_{m,0}$. Then this procedure is repeated until a treatment ball is drawn, and the subject is treated accordingly.
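The drop-the-loser mechanics translate directly into a few lines of code. Below is a minimal sketch of the two-treatment DL rule with one immigration ball, using assumed success probabilities; it also reports the allocation proportion, which converges to $q_B/(q_A + q_B)$ (see (4.15) below).

```python
import numpy as np

def simulate_dl(n, p_A, p_B, seed=0):
    """Drop-the-loser rule: urn holds (immigration, A, B) balls."""
    rng = np.random.default_rng(seed)
    y = np.array([1.0, 1.0, 1.0])          # (type 0 immigration, type A, type B)
    n_assigned = np.zeros(2)
    n_success = np.zeros(2)
    while n_assigned.sum() < n:
        k = rng.choice(3, p=y / y.sum())
        if k == 0:                          # immigration ball: add one A and one B ball
            y[1] += 1
            y[2] += 1
            continue
        j = k - 1                           # treatment index (0 = A, 1 = B)
        n_assigned[j] += 1
        success = rng.random() < (p_A if j == 0 else p_B)
        n_success[j] += success
        if not success:                     # failure: the drawn ball is not replaced
            y[k] -= 1
    return n_assigned, n_success

N, S = simulate_dl(2000, p_A=0.7, p_B=0.5)
q_A, q_B = 0.3, 0.5
print("allocation to A:", N[0] / N.sum(), " target:", q_B / (q_A + q_B))
```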
+
+
(4.15)
(Note that this is the same limiting urn allocation as for the randomized play-thewinner rule.) Ivanova (2003) also gave the asymptotic distribution of N A ( ~ ) : (4.16)
in distribution as t
4
03,
k = 1 , 2 , where (4.17)
is the asymptotic variance. The DL rule has two basic advantages: (i) it is fully randomized; (ii) compared to many other response-adaptive randomization procedures such as sequential estimation procedures or the randomized play-the-winner rule, it generates the minimum asymptotic variance, which means that it offers high power of the test of the difference ofproportions in the class of designs targeting Q B / ( q A q B ) (Hu, Rosenberger, and Zhang, 2006). To show this, we apply the result in Theorem 3.2.
+
EXAMPLE 4.6. From Theorem 3.2, the lower bound for targeting $q_B/(q_A + q_B)$ is $q_A q_B (p_A + p_B)/(q_A + q_B)^3$, which is exactly the asymptotic variance of the drop-the-loser rule.
Consequently, the DL rule is an asymptotically best procedure for targeting the allocation $\rho(\theta) = q_B/(q_A + q_B)$.
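A small numerical check of Example 4.6 (a sketch; the success probabilities are assumed) compares the lower bound of Theorem 3.2 for the target $q_B/(q_A + q_B)$, computed by numerical differentiation, with the drop-the-loser variance $q_A q_B (p_A + p_B)/(q_A + q_B)^3$, and with the randomized play-the-winner variance, here taken to be $q_A q_B [5 - 2(q_A + q_B)]/[(q_A + q_B)^2(2(q_A + q_B) - 1)]$ (valid when $q_A + q_B > 1/2$), which is strictly larger.

```python
import numpy as np

def lower_bound(p_A, p_B, eps=1e-6):
    """Theorem 3.2 bound for rho = q_B/(q_A+q_B): (d rho/d theta) I(theta)^{-1} (d rho/d theta)'."""
    def rho(pa, pb):
        return (1 - pb) / ((1 - pa) + (1 - pb))
    g = np.array([(rho(p_A + eps, p_B) - rho(p_A - eps, p_B)) / (2 * eps),
                  (rho(p_A, p_B + eps) - rho(p_A, p_B - eps)) / (2 * eps)])
    r = rho(p_A, p_B)
    # I(theta) = diag(rho_A/(p_A q_A), rho_B/(p_B q_B)) for Bernoulli responses (Theorem 3.1)
    I = np.diag([r / (p_A * (1 - p_A)), (1 - r) / (p_B * (1 - p_B))])
    return g @ np.linalg.inv(I) @ g

p_A, p_B = 0.7, 0.5
q_A, q_B = 1 - p_A, 1 - p_B
dl = q_A * q_B * (p_A + p_B) / (q_A + q_B) ** 3
rpw = q_A * q_B * (5 - 2 * (q_A + q_B)) / ((q_A + q_B) ** 2 * (2 * (q_A + q_B) - 1))
print("lower bound                :", lower_bound(p_A, p_B))
print("drop-the-loser variance    :", dl)
print("randomized play-the-winner :", rpw)
```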
4.2.4
Generalized drop-the-loser rule
Zhang et al. (2006) described the following generalization of Ivanova's DL rule, and derive several asymptotic results for which Ivanova's results are a special case. The generalized drop-the-loser (GDL) rule is described as follows. Consider an urn containing balls of $K + 1$ types. Balls of types $1, \ldots, K$ represent treatments; balls of type 0 will be called immigration balls. We start with $Y_{0,i}$ balls of type $i$, $i = 0, \ldots, K$. Let $Y_0 = (Y_{0,0}, \ldots, Y_{0,K})$ be the initial urn composition, and let $Y_w = (Y_{w,0}, \ldots, Y_{w,K})$ be the urn composition after $w$ draws. A ball is drawn at random. If the ball is of type 0 (i.e., an immigration ball), no subject is treated, and the ball is returned to the urn together with $A = a_1 + \cdots + a_K$ additional balls, $a_k$ of treatment type $k$, $k = 1, \ldots, K$. If a treatment ball is drawn (say, of type $k$, for some $k = 1, \ldots, K$), the next subject is given treatment $k$ and the ball is not replaced. We denote the outcome of this subject on treatment $k$ by $\xi_{w,k}$. When the outcome is available, we add $D_{w,k} = D(\xi_{w,k})$ balls of type $k$ to the urn. After $n$ subjects are assigned, we compute $N_{n,k}$, the number of subjects assigned to treatment $k$, $k = 1, \ldots, K$. We are interested in the behavior of the proportions $N_{n,k}/n$, $k = 1, \ldots, K$. Let $p_k = E(D_{w,k})$, $k = 1, \ldots, K$. We assume $0 < p_k < 1$, $k = 1, \ldots, K$, which means that, for each randomized subject, the expectation of the number of balls added according to its outcome is less than the number (one) of dropped balls. In the case where the outcomes are dichotomous, $D_{w,k} = 1$ if the outcome is a success and 0 if it is a failure, and $p_k$ is the success probability of a trial on treatment $k$, $k = 1, \ldots, K$. When the outcomes are not dichotomous, one may choose suitable adding rules $\{D_{w,k}\}$ to define a design. For example, if the outcomes have three stages, "not effective," "under control," and "cure," one may add a ball when the outcome is a "cure," add a ball with a given probability $p$ when the outcome is "under control," and add no balls when the outcome is "not effective." Under some suitable conditions, we have
+
.
+
Nn,k * V k := n
ak/qk
al/ql -b
* ' *
+aK/qK
+
almost surely k = 1 , . .. , K.
(4.18)
..
If a k , k = 1 , . . ., K are equal, then the limiting proportions V k , k = 1 , . ,K , are the same as those in (4.15). One can choose a,, to adjust the allocation proportions. Furthermore, a k , k = 1 , ...)K , can differ and be random in different draws. It is known that one of the main disadvantages of the DL rule as well as the randomized play-the-winner rule is that they can only target the limiting allocation in (4.15). By choosing the ak, k = 1,...,K , suitably, the GDL rule can target any desired allocation, theoretically. For example, if one wants to target RSIHR allocation in (7.5), one can choose a k = C O T k f i , where & and are current estimates of I)k and q k , respectively, k = 1,2. (Here we assume that the urn allows a fractional
62
PROCEDURES BASED ON URN MODELS
number ofballs.) To avoid fractional balls, one can simply choose the nearest integers or add balls of the same type k with a probability proportional to ak, k = 1, . . . , K , when an immigration ball is drawn. Another way to target the allocation rule in (7.5) is to define a design by choosing a k = co and D w , k E 0. In such a design, the balls are added only through immigration. In general, one may adjust both the a k and Dw,kto target the desired allocation proportions. Note that when a k = 1 for all k, we have Ivanova’s DL rule.
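As a sketch of how the immigration numbers steer the limiting allocation in (4.18), the following simulation (with assumed success probabilities and fractional immigration balls allowed) runs the generalized drop-the-loser rule with fixed $a_k$ and compares the realized allocation with $(a_k/q_k)/\sum_j (a_j/q_j)$.

```python
import numpy as np

def simulate_gdl(n, p, a, seed=0):
    """Generalized drop-the-loser rule with K treatments and immigration numbers a_k."""
    rng = np.random.default_rng(seed)
    K = len(p)
    y = np.ones(K + 1)                      # index 0: immigration ball; 1..K: treatment balls
    counts = np.zeros(K)
    while counts.sum() < n:
        k = rng.choice(K + 1, p=y / y.sum())
        if k == 0:                          # immigration: add a_k balls of each treatment type
            y[1:] += np.asarray(a, dtype=float)
            continue
        j = k - 1
        counts[j] += 1
        if rng.random() >= p[j]:            # failure: net loss of one ball (success nets zero)
            y[k] -= 1.0
    return counts / n

p = np.array([0.7, 0.5])
q = 1 - p
a = np.array([1.0, 2.0])                    # unequal immigration numbers
target = (a / q) / (a / q).sum()
print("realized:", simulate_gdl(20000, p, a), " target:", target)
```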
4.2.5
Asymptotic properties of the GDL rule
In this section we give some asymptotic properties of the GDL rule for K = 2. We only consider the case where the immigration balls a k , k = 1,2, are fixed. For the outcomes and addition rules we make the following assumption.
ASSUMPTION4.6. Let { D w , kw; 2 l}, k = 1,2, be two sequences of independent and identically distributed random variables with 0 < Pk = E ( D w , k ] < 1 and EIDw,klP < m f o r s o m e p > 2, k = 1,2. Denoteaz = Vur(D,,k)andqk = 1-pk, k = 1,2. THEOREM 4.6. Suppose Assumption 4.6 is satisfied. Denote N n = (Nn,l,Nn,z) and u = ( w ~ , w z ) ,where vk, k = 1,2, are defined in (4.18). Then there exists a standard Brownian motion { W ( t )t; 2 0 ) such that for any 6 > 0,
+ - nwz = - a W ( n ) + 0 ( n 2 / 5 + 6 US., )
ivn,l - nu1 = a W ( n ) ~ ( n ~ / ~ +a‘s). , ~ n , 2
where = alaz(.Zq*.L.T + a 1 4 1 4 (a241 + a1qd3 ’
(4.19)
By the properties of Brownian motion, the following is an immediate corollary of the theorem.
COROLLARY 4.4. Under the assumptions of Theorem 4.6, (4.20) and (4.21) in distribution, where
c = P’diag(a:wl,.. . ,U & W K ) P .
(4.22)
Equation (4.20) gives the strong consistency and its rate for the proportions N,,k/n, k = 1,.. . ,K . Comparing with (4.16), where the result is given through a “virtual” time, (4.21) gives the direct asymptotic distributions of the proportions.
THE CLASS OF TERNARY URN MODELS
63
We now sketch the proof of Theorem 4.6. Recall that Y n = (Yn,o,Yn,1,Yn,2) represents the number ofballs after n draws; Y = (Y20,Y;l, Y;J are nonnegative numbers, and IY: I = Y 2 0 + Y ~ l Because every immigration ball is replaced, Yzo= Yn,0= YO,O for all n. Let T nbe the result of the n-th draw, where Tn,k = 1 if the selected ball is of type k and Tn,k = 0 otherwise, k = 0,1,2. Further, denote N i = (N,+,o,N,+,l,N;,2) = EL=,Tm.So is the number of selected type k balls in the first n draws. Let un = max{rn : NA,l N;,2 5 n}. Then un is the total number of draws of treatment type balls in the first n assignments and Nn,k = N:n,k, k = 1,2. Hence, after the n-th draw and before the ( n 1)-th draw, the number of balls of each type added according to the outcomes is
+Y22.
+
+
The change in the number of type k balls after n draws from the time of the ( n- 1)-th draw is Y n , k - Y n - l , k = akTn,o - Tn,k W n , k l k = 1,2.
+
Recall that ak here is the number of added type k balls when an immigration ball i s drawn. So, for k = 1,2, the number of type k balls added after n draws is n
n
n
n
n
(4.23) That is, Ayn,k = akTn,O 4-
Tn,k(Dn,k- l ) ,
k = 1,2,
(4.24)
where A denotes the differencing operand of a sequence { zn}, i.e., Azn = zn - znFrom (4.23), it follows that
1.
n
yn,k
- YO,k = akN;,o - qkN:,k
m=l
Tm,k(Dm,k- E[Drn,k])
= akN;,o - qkN;,k -t Mn,k, k = 192,
(4.25)
where
We will prove Theorem 4.6 in Appendix B by showing that the major term Mn,k can be approximated by a Brownian motion (Theorem A.9) and the rest of the terms can be neglected.
64
PROCEDURES BASED ON URN MODELS
4.3
REFERENCES
ANDERSEN, J. S., FARIES, D. E., A N D TAMURA, R. N.(1994). Randomized play-the-winner design for multi-arm clinical trials. Communications in Statistics-Theory and Methods 23 309-323. ATHREYA, I<. B. AND KARLIN, S . (1967). Limit theorems for the split times of branching processes. Journal of Mathematics and Mechanics 17 257-277. ATHREYA,K. B. AND KARLIN,S. (1968). Embedding of urn schemes into continuous time Markov branching processes and related limit theorems. Annals of Mathematical Statistics 39 1801-1 8 17. ATHREYA, K . B. AND NEY,P. E. (1972). Branchingfrocesses. Physica-Verlag, Heidelberg. BAI,Z. D. AND Hu, F. (1999). Asymptotic theorem for urn models with nonhomogeneous generating matrices. Stochastic Processes and Their Applications 80 87-101. BAI, Z. D. A N D Hu, I?. (2005). Asymptotics of randomized urn models. Annals ofAppliedProbability 15 914-940. BAI, Z. D., Ilu, F., AND SHEN, L. (2002). An adaptive design for multi-arm clinical trials. Journal of Multivariate Analysis 81, 1-18. DURHAM, S. D., FLOURNOY, N.,A N D L1, w. (1998), Sequential designs for maximizing the probability of a favorable response. Canadian Journal of Statistics 3 479-495. Hu, F., ROSENBERGER, W. F., AND ZHANG, L.-x. (2006). Asymptotically best response-adaptive randomization procedures. Journal of Statistical Planning and Inference 136 1911-1922. IVANOVA,A. (2003). A play-the-winner type urn design with reduced variability. Metrika 58 1-13. IVANOVA, A. A N D FLOURNOY, N. (2001). A birth and death urn for ternary outcomes: stochastic processes applied to urn models. In Advances in Statistical Theory: A Volume in Honor of Theophilos Cacoulous (Charalambides, C., Koutras, M. V., and Balakrishnan, N., eds.). Chapman and HalKRC, Boca Raton, 583-600. IVANOVA, A., ROSENBERGER, F., DURHAM, s. D.,AND FLOURNOY, N. (2000). A birth and death urn for randomized clinical trials. Sankhya B 62 104-1 18. JOHNSON, N. L. AND KOTZ, S. (1977). Urn Models and Their Applications. Wiley, New York. MATTHEWS,P. c. AND ROSENBERGER, w. F. (1997). Variance in randomized play-the-winner clinical trials. Statistics and Probability Letters 35 233240. ROSENBERGER, W. F. (1992). AsymptoticInference Problems Arisingfrom Clinical Trials Using Response-AdaptiveTreatment Allocation. George Washington University, Washington (doctoral dissertation). ROSENBERGER, W. F. (2002). Randomized urn models and sequential design. Sequential Analysis 21 1-4 1 (with discussion).
w.
REFERENCES
65
SMYTHE, R. T. (1996). Central limit theorems for urn models. Stochastic Processes and Their Applications 65 1 15-1 37. TYMOFYEYEV, Y.,ROSENBERGER, W. F., AND Hu, F. (2004). Asymptotic properties of urn designs for three-arm clinical trials. mODa7-Advances in Model Oriented Design and Data Analysis (Di Bucchianico, A,, Lauter H., and Wynn, H. P., eds.). Physica-Verlag, Heidelberg, 159-166. WEI, L. J. (1979). The generalized Polya’s urn design for sequential medical trials. Annals of Statistics 7 291-296. WEI, L. J. A N D DURHAM, S. (1978). The randomized play-the-winner rule in medical trials. Journal of the American Statistics Association 73 840-843. ZHANG,L.-X., CHAN,W. S., CHEUNG,S. H., AND Hu, F. (2006). Generalized drop-the-loser rule with delayed response. Statistics Sinica, in press. ZHANG,L.-X., Hu, F., AND CHEUNG, S. H. (2006). Asymptotic theorems of sequential estimation-adjusted urn models. Annals of Applied Probability 16 340-369.
This Page Intentionally Left Blank
Procedures Based on Sequential Estimation
In Chapter 1 we introduced the doubly-adaptive biased coin design of Eisele (1994). With the modifications of Hu and Zhang (2004), the doubly-adaptive biased coin design yields a broad family of response-adaptive randomization procedures. This family encompasses any procedure with a target allocation based on unknown parameters of the response model and sequentially substitutes updated estimates of those parameters as the data accrue. We refer to these procedures as response-adaptive randomization procedures based on sequentiaf estimation. We begin by focusing on several examples of the sequential estimation procedures for K = 2. Then we describe conditions for the asymptotic normality of NA (n)/nfor general procedures with K = 2 and give the form of the asymptotic variance of N ~ ( n ) / nFollowing . these results for I< = 2, we generalize all the results to the multiple-treatment case, following the development in Hu and Zhang (2004). The generalization relies on results in Gaussian processes, and the prerequisite material can be found in Appendix A. For any procedure based on sequential estimation, some data must be available to compute the estimates. In practice, this necessitates beginning the trial with a certain number of patients assigned to each treatment before the response-adaptive randomization procedure can begin. One could, for example, begin the trial with a single permuted block of no patients, 4 2 assigned to A and n0/2 assigned to B, or use some other restricted randomization procedure (or complete randomization) for no patients.
67
68
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
5.1
EXAMPLES
In this section, we give ten examples of response-adaptive randomization procedures based on sequential estimation for K = 2 which have been proposed in the literature.
EXAMPLE 5.1. (Eisele and Woodroofe, 1995.) Let X I ,...,X,,be patient responses and Ti,...,T, be treatment assignments, Tj = 1 or 0 , j = 1, ...)n. We assume the following response model: given Tj = 1, Xj is normally distributed with mean / I , A and variance and given Tj = 0, X j is normally distributed with mean /I,B and variance a%,j = 1, ...)n. Suppose the target allocation is Neyman allocation, given by
05,
&A,
+
OA,PB,O B ) = ~ A / ( U A U B ) .
At the j-th stage, j > no, when we have data on j - 1 patients, we compute the updated estimators f i ~ , j - l ,6 ~ , j - 1f,i s , j - ~d, ~ j - 1and calculate p ( f i ~ , j - id, ~ , j 1-, F ~ , j - i ~,
1.
B -1 J
Define a function
Then the response-adaptive randomization procedure is defined by
EXAMPLE 5.2. (Efron, 1971.) Let p ( 6 ) = 1/2. Let (1 - p ) z Y + p ( l - z ) Y g(z,y)=
When y
+ co,we
z-Y+(l-z)r
have Efron’s biased coin design.
EXAMPLE 5.3. (Melfi and Page, 2000.) Assume the same response model and target allocation as in Example 5.1. The response-adaptive randomization procedure is defined by 4j = P ( f i A , j - l , ~ A , j - l , D , B , j - l , 6 B , j - l ) . EXAMPLE 5.4. (Bandyopadhyay and Biswas, 2001.) Assume the same response model as in Example 5.1. The target allocation is now defined as
where T is some positive constant and 9 is the probit function. The response-adaptive randomization procedure is defined by 4j
=~(P~,j-i,d~,j-i,/j~,j-i,~~,j-i).
EXAMPLES
69
EXAMPLE 5.5. (Rosenberger et al., 2001.) Assume the following response model: given Tj = 1, X j is Bernoulli with parameter P A ; given Tj = 0, X j is Bernoulli with parameter p ~with , qA = 1 - PA,QB = 1 - p ~ Let . the target allocation be either Neyman allocation, given by P(PA,PE) =
rn v9.Yia-m'
or RSIHR allocation, given by
~ update the target At the j-th stage, j > no, we compute f i ~ , j - land f i ~ , j -and allocation by calculating p ( f i ~ , j -f~i ~, , j - l ) . The response-adaptive randomization procedure is defined by dj = p(Tj~,j-i,f i ~ , j -1).
EXAMPLE 5.6. (Hu and Rosenberger, 2003.) Assume the same response model and same target allocations as in Example 5.5. A fbnction g(x, u) is defined as in Chapter 1 for y 1 0:
(5.2) Then the response-adaptive randomization procedure is defined by
EXAMPLE 5.7. (Hu and Zhang, 2004.) Assume the same response model and target allocation as in Example 5.1. Let g(x, 3) be as defined in (5.2). Then the response-adaptive randomization procedure is defined by
Note that the function g in (5.2)reduces to g = y if y = 0. Consequently, the response-adaptive randomization procedures in Examples 5.3 and 5.4 can be thought
where g is defined in (5.2) with y = 0 and the procedure in Example 5.5 can be thought of as
70
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
where g is defined in (5.2) with y = 0.
EXAMPLE 5.8. (Smith, 1984.) Let p(8) = 1/2 and let g be defined as in Example 5.7. We then obtain the allocation function
which is Smith’s general class of restricted randomization rules. When y = 0, we have complete randomization; when y = 1, we have Wei’s urn design; when y = 2, we have Atkinson’s (1982) DA-optimal design. Smith suggested y = 5.
EXAMPLE 5.9. (Baldi Antognini and Giovagnoli, 2005.) Assume a response model belonging to a regular exponential family, and let the allocation hnction dj be as in Example 5.5 with y = 0. Various target allocation functions can be determined by using the theory of optimal design of experiments (e.g.,Silvey, 1980). Consider the conditional Fisher’s information matrix M ( 8 I N A( n ) )and the unconditional Fisher’s information matrix M ( 8 ) = E ( M ( @ I N A ( n ) )The . optimal design is one that minimizes a convex function of the inverse of the Fisher’s information matrix. For example, A-optimality, which minimizes the trace of M - ’ , leads to Neyman allocation; D-optimality, which minimizes the log determinant of M-’, leads to p ( 8 ) = 1 / 2 ; and E-optimality, which minimizes the maximum eigenvalue of &I-’, leads to the allocation function (for normal random variables)
It is shown that this randomization procedure converges to the optimal allocation regardless of whether conditional or unconditional information is used, but that the asymptotic unconditional variance is greater than the asymptotic conditional variance for the sample means.
EXAMPLE 5.10. (Hu and Rosenberger, 2003.) Assume the same response model as in Example 5.5. The function g ( x , u ) is defined as in Example 5.6. The target allocation is urn allocation:
5.2
PROPERTIES OF PROCEDURES BASED ON SEQUENTIAL ESTIMATION FOR K = 2
All of the examples in the previous section appeared in different papers by different authors and used different techniques to explore properties of the specific procedures. However, all are special cases of the general procedure of Hu and Zhang (2004). In this section, we give the general procedure of Hu and Zhang for K = 2 and its asymptotic properties, and then apply them to our examples.
PROPERTIES OF PROCEDURES BASED ON SEQUENTIAL ESTIMATION F O R K
=2
71
We consider a vector-valued parameter 8 = ( 8 A , 8,) characterized by the response distribution of X I ,...,X,. Note that we assume here that the response distribution, conditional on treatment assignment, is homogeneous across patients. Consider an estimation procedure that computes at the j-th stage S j , where
and E A ( X 1 )and t B ( X 1 )are measurable functions of X I . For example, if X I given Ti = 1 is Bernoulli with parameter P A and X1 given T I = 0 is Bernoulli with parameter p ~ letting , ( A ( X 1 )= t B ( X 1 )= X I yields 6 A , j = p ~ , and j O B , j = ~ I $3 B' the maximum likelihood estimators of PA and p~ based on the first j observations, respectively. If X1 given TI = 1is N ( ~ Aui) , and X I given TI = 0 is N ( ~ L a;), B, then letting t A ( X = ~) E B ( X l )= ( X I ,X t ) yields 8 ~ , =j PA,^, ?~c: ii%,j) and 8 ~ ,=j ( P B , ~c?;,~ jik,j),where $ ~ , j C,? A , ~ ,F B , ~C?BJ , are the maximum likelihood estimators of p A , U A , p ~U B, based on the first j observations, respectively. Let p ( 8 ) E (0,l) be a target allocation. Define a function g(z, y) to be a function from [0,1] x [0,1] to (0,1],where g(z,y) and p ( 6 ) satis9 the following four conditions:
+
+
(i) g(z, y) is jointly continuous and g(z,z) = z for all z E (0,l);
(ii) g(z,y) is strictly decreasing in z and strictly increasing in y on ( 0 , l ) x (0,l); (iii) p ( 6 ) is a continuous function and it is twice continuously differentiable on a small neighborhood of 8; (iv) There exists a 6 > 0 such that g(z, 9)satisfies (P(e).P(e))
Under conditions (iHiv), we can establish the asymptotic normality of the allocation proportions arising from these response-adaptive randomization procedures, which, by the results in Chapter 2, is critical to understanding the power properties of the procedure and for comparison to other procedures. Assume that
for some E > 0. Let
72
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
in distribution, where
A@))
2 - P(W1u 1 - 2x,
+
2x;v (1 - XX)(1
- 2XY)'
(5.4)
Later in the chapter we prove this result in its full generality for K treatments.
EXAMPLE 5.1 (CONTINUED).As described in Section 1.1.3, Eisele and Woodroofe's g function, given in (5.l), does not satisfy their fourth regularity condition that g(x,y) have bounded partial derivatives in both x and y. This is because
as1 ax
y=x
=I--1 X
is not bounded. However, it does satisfy conditions (i)-(iv) given above. Conditions (i) and (ii) are trivially satisfied, and (iii) is satisfied for Neyman allocation. Condition (iv) holds because y(x,y) has continuous second derivatives at the point (2,y) = ( p ( 0 ) ,p ( 0 ) ) . Therefore,
in distribution. We can derive the asymptotic variance as follows. First,
We also compute
PROPERTIES OF PROCEDURES BASED ON SEQUENTIAL ESTIMATION FOR K = 2
73
Combining, we obtain
The asymptotic variance in (5.4) is computed as
Substituting A, and A, in (5.5) yields
EXAMPLE 5.3 (CONTINUED). Since g(s,y) = g, conditions (i), (ii), and (iv) hold, and we have A, = 0 and A, = 1 . Therefore
in distribution, where u2 = % A U B / ( o A
+
0 ~ ) ~ .
EXAMPLE 5.4 (CONTINUED). Now we consider p ( 0 ) = @ ( ( P A - ,uB)/T)for constant T , g(z,g) = y, and t A ( X 1 ) = t ~ ( X 1 = ) X I . Clearly, p ( 0 ) satisfies condition (iii). We compute
where 4 is the standard normal density. Then
Therefore
in distribution, where
(Zhang and Rosenberger, 2006).
74
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
EXAMPLE 5.5 (CONTINUED).As discussed above, for the Bernoulli case, we have E,l(X1) = tB(x1)= XI.The fhction g(x,y) = y and both Neyman and RSIHR allocation satisfy condition (iii). Therefore
in distribution. Thus it remains only to compute a’, which is Straightforward but tedious. For Neyman allocation, we have
For RSIHR allocation, we have
EXAMPLE 5.6 (CONTINUED).We use the same allocations as in Example 5.5, but here we use a more general g function, given in (5.2). (Recall that when y = 0, g(x,9) reduces to the function used in Example 5.4.) It is easy to see that g(x,y) satisfies conditions (i) and (ii). For condition (iv), we note that g(x,y) has continuous second derivatives at the point ( q y ) = ( p ( B ) , p ( O ) ) . We compute A, = -y and Av = 1 y.Therefore
+
in distribution. It remains only to compute u2,which again is straightforward but tedious. For Neyman allocation,
) [ ( p A q A > 3 / 2 ( 1- 2pB)’ + ( P B q B ) 3 / 2 ( 1 - %A)’] + (l + 72(1 +2 y ) ( m + m ) 3 @ A q A p B q B
For RSIHR allocation, u 2 --
312
P A (PB
+ ( l + Y ) q B / 2 ) + q;”(PA + (1 + y ) q A / 2 ) (1 + 27NdE‘i + f i ) 3 l / i Z E
Note that when y ---t 00, this procedure is the asymptotically best procedure in the sense of Theorem 3.2. However, it reduces to a deterministic procedure, originally proposed by Thompson (1 933).
NOTAJ/ON AND COND/T/ONS FOR THE GENERAL FRAMEWORK
75
EXAMPLE 5.7 (CONTINUED). The target allocation is given in Example 5.1 and the g function is given in Example 5.5. Therefore
in distribution, where 02
2+y UAUB =1+2y(UA+aB)2'
EXAMPLE 5.8 (CONTINUED). It can be shown that
in distribution.
EXAMPLE 5.9 (CONTINUED).The E-optimal design targets p(0) = ui),and this admits the following limiting result. We have
in distribution.
EXAMPLE 5.10 (CONTINUED). We can derive qAqB(2 - (qA
+ QB))
+
(1
(qA
+ 2y)(qA + 9 B ) 3
in distribution. Note that this limit differs substantially from the limits of the randomized play-the-winner rule and the drop-the-loser rule, both of which target the same allocation.
5.3
NOTATION AND CONDITIONS FOR T H E GENERAL FRAMEWORK
We now consider general K-treatment clinical trials. Suppose the patients are randomized sequentially and respond immediately. In general, after i patients are assigned and the responses observed, the (i 1)-th patient is assigned to treatment k with probability { & + l , k } , k = 1,. .. , K. The probabilities { & + l , k } may depend on both the treatments assigned to and the responses observed from the previous m patients. From the notation of Chapter 1, let Ti = (z,~, .. .,T,,K) be the result of the i-th assignment, Xi be the responses of i-th patient, and N , be the patient's allocation after the i-th patient. Suppose that the desired allocation proportion of patients assigned to each treatment is a function of some unknown parameters of the response X.
+
76
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
One main goal of the allocation scheme is to have N,/n v = p ( 0 ) as n + 00, where p ( r ) = ( P I ( % ) ,. . . , P K ( z ) ) : R d l x K -+ (0,l)" is a vector-valued function satisfying p(.z)1' = 1, O = ( e l , . . ,e K )is a vector in P l x K , and e k = (&I,. . . ,6 k d l ) is an unknown parameter of the distribution of X l , k , k = 1 , .. . , K . For simplicity of notation, we assume that d = d l and e k = E X l , k , k = 1,.. . , K . Otherwise, for example, if there exist functions f k such that 8 k = E f k ( X l , k ) , k = 1,.. . , K , we can use the transforms f k ( X l , k ) , k = 1,.. . , K , of the responses as the responses themselves. (This has been demonstrated in Section 5.2.) Choose a O0as the first estimate of 0.I f m patients are assigned and the responses are observed, we use the sample means to estimate the parameters e k , k = 1, . . . , K , -+
i.e.,
-
and we write Oi,k = (8i,l,.. . ,Bi,,c). Here, we add 1 in the denominator to avoid the case of O/O, and add @O,k in the numerator to use 6 0 , k to estimate O k when no patie_nt is assigned to the treatment k , k = 1 , . . . ,K . Usually 00is chosen to avoid & ( O m ) = 0, k = 1,.. . , K . In practice, 00is a guessed value of 0 or an estimator of 0 from other early trials. We now define the general doubly-adaptive biased coin design for I( treatments (Hu and Zhang, 2004). Let g(z,v) = ( g l ( z , y ) , . . . , s K ( z , y ):) [O,lIK x [o,lIK -, 10, lIK be the allocation rule with g(z, y ) l ' = 1. The first patient is allocated to each treatment with the same probability l / K. Let 6% be estimated as in (5.6)from the first i observations, i = 1 , 2 , .... Then the (rn + 1)-th patient is assigned to treatment k with probability = g k ( N , / i &), , Ic = 1,. . ,K , where A
.
is the sample estimate o f v = (211 , .. . ,z ) ~ = ) p ( 0 ) based on the responses observed from the first i patients. To study the properties of the doubly-adaptive biased coin design, we need following three groups of assumptions. We assume that 0 < 21k < 1, k = 1,.. . ,I< in this book. The following first group of conditions are on the response { X , = ( X n , l , - .,.X , , K ) }and'relatedparameters 0 = (01,..., O K ) = (@11,
*
,old,.
*
3
rokl,.
* *
18kd).
(Al) Assume that the response sequence { X , = ( X n , l , . . ,.X n , ~ n ) }= 1,2, ... are i i d . random vectors, and 0 = E X , as before. Also, Ellxl,kll < 03, k = 1 , ..., K . (A2) Further, for some E > 0, EIIXl,kll'+' < 00, k = 1,. , . , K . Conditions (Al) and (A2) are usually satisfied in most applications. Condition (A2) is stronger than (Al). Condition (Al) is used to ensure the consistency of the procedure. On the other hand, we need condition (A2) to ensure the asymptotic normality of the allocation proportions.
NOTATION AND CONDITIONS FOR THE GENERAL FRAMEWORK
77
The second group of conditions are about the allocation function g(z,y ) . The function of allocation rule g(m, y) satisfies following conditions: ( B I ) g ( v , v ) = v a n d g ( z , y ) - g ( z , v ) + O a s y + v a l o n g y l ' = luniformlyin z with zl' = 1. The latter can be implied by the condition that the function g(z,y ) is continuous on { ( e , v ) : 21' = I } . (B2) There exists a constant 0 5
A0
< 1such that for each k = 1,.. . ,K ,
(B3) For any 0 < 6 < 1 / K and each k = 1,.. . ,K, there exists a constant cg such that gk(1)y)l
Xk=O
2 C6 forail a,y with
21'
lim inf
>0
= 1 , y 1 ' = 1,y E [ ~ , I ) K ,
gk(2,y) +*--to+ min(x1,. . . ,X K }
c6
uniformly in z , y with 21' = 1, y l ' = 1,y E 16, (B4) There exists 6 > 0 for which the function g(2,y ) satisfies
REMARK5.1. Condition (B2) is satisfied with A0 = 0 if we assume that the ( m + 1)-th patient is assigned to the treatment k with a probability less than 'Uk
whenever Nm,k/m > V k , In such case, the biased coin design analyzed by Smith (1984) and Wei, Smythe, and Smith (1986) is a special case of this generalization. Their g ( z , p) does not depend on p.
By symmetry, condition (B2) can be replaced by one of the following conditions: (B2') There exists a constant 0 5
A0
< 1 such that for each k
.. ,K ,
= 1,.
(B2") There exist two constants 0 5 A0 < 1, Q # 0 and an invertible real matrix S = ( ~ 1 ,... , s/K) such that S1' = ~ l and ' for each k = 1,.. ., K ,
78
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
+
REMARK5.2. Condition (B3) is easily understood. At the stage m 1, if all estimated proportions & j , j = 1, . .. ,K , are not very small, but the sample proportion Nm,k/rn is very small, then the probability of the assignment of the ( m 1)-th patient to the treatment k should not be too small to avoid experimental bias. This condition can be simply replaced by the condition that
+
Nn,k
-t
cm almost surely,
k = 1 , . . . , K.
(5.8)
Ifg(z, y) = g ( z ) is only a function of 2,then condition (83) is not needed, since this condition or (B.82) is only used for ensuring the consistency of 6, (cf: Lemmas 5.4 and 5.5). Also, if condition (B2') is satisfied at any point (w,v ) and gk(v, w ) = u k , then (B3) is obviously satisfied. So conditions (ii) and (iii) of Eisele (1994, 1995) or Eisele and Woodroofe (1995) imply this condition. REMARK5.3. Conditions (Bl) and (B4) are usually satisfied. In practice, we can check these two conditions very easily.
The third group of conditions are on the proportion function p ( r ) . The proportion functionx = ( t l , . . . , % d ) = ( ~ 1 1 , . , z l d , . . . , Z K ~. , . . ,Z K ~ 3 ) p ( r ) : E d x K+ (0,l)" satisfies the following conditions:
..
(Cl) p ( 0 ) = w and p ( r ) is a continuous function. (C2) There exists S > 0 for which
For a function h(u,w) : RL x R M + R K , we denote V,(h) and V,(h) to be the gradient matrices related to the vectors u and w, respectively, i.e.,
are two K x I( matrices, and
ASYMPTOTIC RESULTS AND SOME EXAMPLES
79
is a ( d K ) x K matrix. Obviously, H1' = El' = 0' since g(z,y)l' = 1. So, H has an eigenvalue X1 = 0 and has the following Jordan decomposition:
t - ' H t = diag(0, J z , . . . , Js), where J , is a vt x vt matrix, given by -At
Jt=
0 0
.. .
0
1 At
0 1
...
00
...
At
...
..
.
... 0 . . .. ..
0
0
0
*.
We may select the matrix t so that its first column is 1'. Let X = max{Re(Xz), . . . , Re(&)}and v = maxj{vj : Re(Aj) = A}. Further, if condition (B2) is also satisfied, then due to an argument similar to that made in Section 3 of Smith (1984) or in the proof of Lemma 3.2 of Wei, Smythe, and Smith (1986),
t - ' H t = dzag(0, A,.
.. ,A)
and H = AH0 = X(I - l'u),
(5.10)
where Ho = tdiag(O,l,. ., l ) t - ' = I - l'u, and u is the first row oft-'. We will use conditions (Al), (B l)-(B3), and (Cl) to establish strong consistency, and conditions (A2), (B4), and (C2) to establish asymptotic normality. We now make some further remarks on the assumptions.
.
REMARK 5.4. If g(z,y) and p ( z ) are twice differentiable at points (v, v ) and 0, respectively, or the second partial derivatives of them are bounded in a neighborhood of the points (v, v ) and 0, respectively, then conditions (B4) and (C2) are satisfied with 6 = 314. REMARK5.5. Our conditions on the allocation rule are weaker than those used by Smith (1984). Conditions (BIHB3) are weaker than the conditions (i)-(iii) in Eisele (1994, 1995) or Eisele and Woodroofe (1995). Their (iv) is a global condition; our (B4) is a local one instead. Also, condition (C2) is weaker than their (vi). Any condition of the form of their condition (v) is not assumed in this paper. It is usually difficult to verify their (v) in applications.
5.4
ASYMPTOTIC RESULTS AND SOME EXAMPLES
Now we state the three main asymptotic theorems.
THEOREM 5.1. (Strong Consistency) Ifconditions (Al), (BI), (B2), (B3), and (Cl) are satisfied, then N,/n -+ v and &, + w almost surely.
80
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
THEOREM 5.2. (Rates ofConsistency) If conditions (A2), (Bl)-(B4), (Cl), and (C2) are satisfied and X < 1, then for any K. > (1/2) V A, n-'(N,
- nv) -+ 0 ass. and j3, - v = 0
({F) a.s.
Furthermore, if X < 1/2, then
THEOREM 5 . 3 . (Asymptotic Normality) Suppose conditions (A2), (B 1)-(B4), (C l), and (C2) are satisfied. If X < 1/2, then we have n'/'(N,/n
- v, Gn - v ) + N ( 0 ,A). W
In order to define A, we need additional notation. We can write A as a partitioned matrix (5.11) where All, A12, and X3 can be calculated by using martingales and Gaussian processes (Appendix A). We can obtain them as follows. Let
v,,= V m - ( X l , k )= [COV [Xl,kl,Xl,kj];i , j = 1,. . . ,d] ,
(5.12)
fork = 1,.. . , K . Define
v = X3
1
1
Vl
VK
diag(-Vl,...,- VK),
(5.13)
= = diag(v) - v'v,
X2 = E'X3E.
(5.15)
Also, we let Wt and Bt be two independent standard K-dimensional Brownian motions. Define the Gaussian process
to be the solution of the equation dGt = (dWt)C:'2
+ B t Ct : / 2 d t + -Hdt, Gt t
Go = 0,
ASYMPTOTIC RESULTS AND SOME EXAMPLES
81
and define uH to be
Further, let
and
The result in Theorem 5.3 can be strengthened, and we now present Theorem 5.3'. Theorem 5.3 is an immediate consequence of Theorem 5.3'.
THEOREM 5.3'. Suppose conditions (A2), (Bl)-(B4), (Cl), and (C2) are satisfied. If A < 1/2, then n-'12
pint] -
IntIu, IntIF,,t1 - [ntlv)
5 (Gt, &xy2)
in the space D[o,l~ with the Skorohod topology.
REMARK5.6. Ifcondition (B2) or (B2") is satisfied, then by (B.81), aH =
c-
j=O
(Xl0gu)j
j!
Ho = UXHo = UX(I - l'u).
Also, C11' = 0, El' = 0. So, H ~ C I H=OC1, HhC2Ho = C2 and EN0 = E . It follows that Aii =
and
N&Ho 1- 2 x
h x 2 H o - El 2x2 -+ ( 12-NX)(1 - 2 4 1- 2x + (1 - X)(1 - 2 4 '
82
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
The first part of Gt is a Gaussian process with covariance function tXs’-XX1/( 1 2X), which agrees with (3.1) of Smith (1984). If the desired allocation proportions are known, then the second part of Gt does not appear since E = 0 and C2 = 0.
REMARK 5.7. For K = 2, we have
REMARK 5.8. Theorem 5.1 shows that the allocation tends toward Efron’s biased coin design with the desired probabilities as the size of the experiment increases. Theorem 5.2 provides a law of the iterated logarithm for the procedure. This theorem also applies to the adaptive bias coin designs (Wei, 1978) and the designs in Smith (1984) and Wei, Smythe, and Smith (1986). The general variance-covariance formula in Theorem 5.3 is very important because it can be used when comparing the design with other sequential designs. Now we give two examples for the multi-treatment clinical trial.
EXAMPLE 5 . 1 1 . For K > 2, suppose the responses of patients on each treatment are also dichotomous (i.e. success or failure). Let pk = Pr(success1 treatment k ) and qk = 1 - p k , k = 1 , . ..,I<. Suppose 0 < p k < 1, k = 1 , . . . , K . Wei’s extension of the randomized play-the-winner rule in Example 4.2 also assigns more patients to better treatments and allows delayed response by the patient. By using his design, the limiting proportions of patients assigned are
However, the limiting distribution of the sample proportions N , strongly depends on the eigenvalues of the generating matrix of the model:
Let r = m a x { R e ( ~ 2 )., .. , Re(TK)}, where 71 = 1 , 7 2 , . . .,TI( are all the eigenvalues of M . Then the asymptotic normality of N , holds only when T 5 1/2 and the variances are very large when T equals or is close to 1/2. (cf: Smythe, 1996; Bai and Hu, 1999). However, when I( 2 3, the expression of r becomes very complex, and it is difficult or impossible to verify the condition T 5 1/2. Now we use the doubly-adaptive biased coin to assign the patients. We can choose the allocation function g(z,y) to be
83
ASYMPTOTIC RESULTS AND SOME EXAMPLES
where y 2 0 and L > 1 are constants. Here the function g depends on the constant L for technical reasons, but we can choose large L to reduce its influence. For this function,
..* l-vz... -v2
1-Vl -v1
-v2
-211
=: - T H O and E = (1 +-y)Ho,
* * .
where Ho = I - l‘v. Obviously, g(z,y ) satisfies conditions (BI) and (B4). Also,
so (B3) is satisfied. For verifying condition (B2), we let f k ( 5 k ) = T ( Z )=
k x k = l fk(2k).
{
2)4%)7)
A L,
Obviously, f k ( x k ) < v k if Z k > v k , k = 1,. .. ,K , and
It follows that T ( z )1 1. So,
Therefore condition (B2) is satisfied. Furthermore, since Hol‘ = 0, V(p)l’ = 0 and vl’ = 1, we have C2 = (l+y)2HbC3Ho= ( l + ~ ) ~ Z ’ and 3 C3E = ( l + y ) & H o = ( l + y ) & .
So if p ( z ) is chosen to satisfy conditions (Cl) and (C2), then by Theorems 5.1-5.3 and Remark 5.1,
and n’/2(N,/n - V , 6,
- V)
+
N(O,A)
in distribution, where (5.18)
84
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
Also, A l l + C3 as y -+ 00. If the desired allocation proportions are the same as that in Wei's design, then the proportion function p( a) is
and the estimate of p = ( p l , ,.. ,pk) is
where Sm,k is the number of successes of all the Nn,k patients on treatment k in the first m stages. In this case
where ej is the vector of which the j-th component is 1 and others are 0. Also, ifthe desired allocation proportions are ...! / CEl then we can choose
(m, m)
p(u) =
As y
4
m,
(d"GiT!...,d%iTGa)/g~-. j=l
co,we have an asymptotically best procedure for the specified target.
EXAMPLE 5.12. Consider a K-arm clinical trial with continuous responses. Suppose that the responses from k ( k = 1, ..., K )treatments follow a normal distribution with mean p k and variance u:, As an extension of Example 5.9 (Baldi Antognini and Giovagnoli, 2005), we can use the following limiting proportions:
We also use the allocation function (5.17). Based on Example 5.1 1, we know that this allocation function satisfies the conditions of Theorem 5.3. It is easy to see that the conditions of Theorems 5.1-5.3 are all satisfied. Then we can obtain the asymptotic distribution of N , as well as its asymptotic variance-covariance matrix. Here we derive the case K = 3 with details. Similar to Example 5.1, we can calculate the matrices V1, V z , and Vs as in (5.12). They are
PROVING THE MAIN THEOREMS
k = 1 , 2 , 3 . From (5.14), (5.15), and (5.16),
+ +u
= E3 = (61
02”
y
u:(u; + U S ) -u:u; -u:u3”
-0:u; -u:u; u f ( u : + u ~ ) -0;u; -I+; us(.: u;)
By Theorem 5.3,
in distribution. When y -+ this particular target.
5.5
00
+
85
I
.
we again have an asymptotically best procedure for
PROVING THE MAIN THEOREMS
In order to prove Theorems 5.1-5.3, we need to use several complicated formulations. The proofs are found in Hu and Zhang (2004). Here we sketch the basic ideas without giving the details. The interested reader can see the complete details in Appendix B. The basic techniques used in the proof are as follows: (a) Developing a matrix recursion (see Appendix A). (b) Using the Jordan decomposition of the matrix recursion (see Appendix A). (c) Determining the order of some terms in the matrix recursion and applying the martingale central limit theorem (see Appendix A) to the leading term. (d) Applying Gaussian approximations (see Appendix A) to calculate the covariance structure. Let Fi = u(T1,.. . , T , , X 1 , . . , X i ) be the sigma-algebra generated by the previous i stages. Under Fm-l, T mand X, are independent, and
.
Let M , = C:=, AM,, where A M , = T , - E[TmIFm-1].Then
Therefore
86
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
For both Theorems 5.1 and 5.2, we have to evaluate the asymptotic order of the Mm, three terms. We obtain strong consistency and asymptotic normality of because it is a martingale sum. However,
xi=,
depends on both
N,-1
and bn-l and
depends on N,-1. To evaluate these two terms, we need several lemmas. Lemma 5.1 and Lemma 5.2 deal with the matrix recursion. Lemma 5.3 is about a sequence recursion of numbers. Lemma 5.4and Lemma 5.5 are about the consistency and rate of convergence of the parameter estimators when N n , k + 00 almost surely for each k = 1, ...)K . LEMMA5.1. Let Bn,n= I and Bn,i = nyz:(I matrices Q, and P, satisfy Q, = P,
+ C YQ H n-l
k= 1
,
Q, = A P n + Qn-1
i.e.,
where A P 1 = P I ,AP, = P,
+ j - I H ) . If two sequences of
- Pn-l,n 2 2, are the differences of P,, then
(5.19)
Also,
llBn,mll 5 C(n/m)xlog"-'(n/m) for all m = 1 , . . . , n , n 2 1, where log%= ln(x V e ) . LEMMA5.2. If two sequences of matrices Q, and P, satisfy AQ, = A P n then for any 6 > 0,
+n-1
n-1
(5.20)
87
PROVING THE MAIN THEOREMS
LEMMA5.3. Let A0 2 0 and KO > 0 be two real numbers, and let {q,} be a sequence of nonnegative numbers and { p , } a sequence of real numbers for which qn
5
A0 + =)(qn-l
(1
V
K O )+ A p n ,
n 2 2,
where A p l = p l and A p , = p , - p , - ~ , n 2 2. Then there exists a constant which depends only on A0 such that
LEMMA5.4, For each k = l,.. . ,K, we have that
{&,k
+ m}
C >0
almost surely
implies
(47)
if (Al),
en& - e,,
=0
(5.21)
if(A2),
i = 1, ...,d.
LEMMA5.5. If conditions (Al), (B3), and (C1) are satisfied, then Nn,k almost surely, k = 1,.. . , I(, and
6 , + 0 and
p^,
-+
-,
00
v a.s.
Further, ifcondition (A2) is alsosatisfied, then &,ki--8ki = o(J(log log Nn,k)/Nn,k) almost surely, k = 1,. . . ,K , i = 1,. . . ,d. Under conditions (Al), (BI), (B2), (B3), and (CI),we have Nn,k
k = 1,...,K , a n d
6, * 0 and
j3,
-+
+ 00
as.,
v
almost surely from Lemma 5.5. Then by using Lemma 5.3, we can show the strong consistency of N , (Theorem 5.1). Theorem 5.2 can be shown similarly. Now we consider Theorem 5.3. First, we can represent both the parameter estimators and the target as martingale sums approximately. Thus, h
n and
a.s.
88
PROCEDURES BASED ON SEQUENTIAL ESTIMATION
where Q, is defined as Q, =
xi=, AQ,,
AQ, = (AQ,,I,. ..,AQ,,K)
with
= (AQ,,ki;i
= 1,.. . , d , k = l , . .. ! I ( )
Then Q, is a martingale sequence in R K x d , and Q, = O(d-) a.s. by condition (A2) and the law of the iterated logarithm from Theorem A.lO. Finally, the asymptotic normality of N , follows from the martingale central limit theorem from Theorem A.14. 0 5.6
REFERENCES
ATKINSON,A . C. (1982). Optimum biased coin designs for sequential clinical trials with prognostic factors. Biometrika 69 6 1-67. BAI,2. D. A N D Hu, F. (1999). Asymptotic theorem forurn modelswithnonhomogeneous generating matrices. Stochastic Processes and Their Applications 80 87-101. BALDIANTOGNINI, A. A N D GIOVAGNOLI, A. (2005). On the large sample optimality of sequential designs for comparing two or more treatments. Sequential Analysis 24 205-2 17. BANDYOPADHYAY, u. AND BISWAS,A. (2001). Adaptive designs for normal responses with prognostic factors. Biomefrika 88 409-4 19. EFRON,B. (1971). Forcing a sequential experiment to be balanced. Biometrika 62 347-3 52. EISELE,J. R. (1 994). The doubly adaptive biased coin design for sequential clinical trials. Journal of Statistical Planning and Inference 38 249-262. EISELE,J. R. (1995). Biased coin designs: some properties and applications. In Adaptive Designs (Flournoy, N . and Rosenberger, W. F., eds.). Institute of Mathematical Statistics, Hayward, 48-64. EISELE,J. R. AND WOODROOFE, M. (1995). Central limit theorems for doubly adaptive biased coin designs. Annals of Statistics 23 234-254. Hu, F. A N D ROSENBERGER, W. I?. (2003). Optimality, variability, power: evaluating response-adaptive randomization procedures for treatment comparisons. Journal of the American Statistical Associaiion 98 67 1-678. Hu, F. A N D ZHANG,L.-X. (2004). Asymptotic properties of doubly adaptive biased coin designs for multi-treatment clinical trials. Annals of Statistics 32 268-30 1. MELFI,v. AND PAGE, c.(2000). Estimation after adaptive allocation. Journal of Statistical Planning and Inference 29 107-1 16. ROSENBERGER, W. F., STALLARD, N., IVANOVA, A . , HARPER,C. N., A N D RICKS,M. L. (2001). Optimal adaptive designs for binary response trials. Biometrics 57 909-9 13. SILVEY,S. D. (1980). OpfimumDesign. Chapman and Hall, London.
REFERENCES
89
SMITH,R. L. (1984). Properties of biased coin designs in sequential clinical trials. Annals of Statistics 12 1018-1034. SMYTHE, R. T. (1996). Central limit theorems for urn models. Stochastic Processes and Their Applications 65 1 15-1 37. THOMPSON, W.R. (1933). On the likelihood that one unknown probability exceeds another in the review of the evidence of the two samples. Biometrika 25 275-294.
WEI, L. J . (1978). The adaptive biased coin design for sequential experiments. Annals of Statistics 6 92-100. WEI, L. J. (1979). The generalized P6lya’s urn design for sequential medical trials. Annals of Statistics 7 291-296. WEI, L. J., SMYTHE,R.T., AND SMITH,R. L. (1986). K-treatment comparisons with restricted randomization rules in clinical trials. Annals of Statistics 14 265-274. ZHANG,L. A N D ROSENBERGER, W. F. (2006). Response-adaptive randomization procedures in clinical trials with continuous outcomes. Biomerrics 62 562-569.
This Page Intentionally Left Blank
6 Sample Size Calculation
In the stages of planning a clinical trial, it is important to determine the number of subjects to be used in the trial. As pointed in Friedman, Furberg, and DeMets (1998):
Clinical trials should have sufficient statisticalpower to dctcct diffcrencesbctween groups considercd to be of clinical interest. Thercforc, calculation of sample size with provision for adequatc levels of significanceand power is an essential part of planning.
In the clinical trials literature, sample size is computed for fixed sample sizes and n = n1 nz. For example, both Eisele and Woodroofe (1995) and Rosenberger and Lachin (2002, page 26) assume that the allocations are fixed (not random) and predetermined. However, given a fixed sample size n of a randomization procedure, the number of subjects assigned to each treatment, n1 and n2, are random variables, unless the randomization procedure employed is a forced-balance design, such as the random allocation rule, truncated binomial design, or permuted block design (Rosenberger and Lachin, 2002). Therefore, as we have seen in Chapter 2, the power of a randomization procedure with a fixed sample size is also a random variable. If one uses a sample size based on the formula for a fixed design for a randomized clinical trial, then the clinical trial may not have sufficient statistical power with very high probability. Examples can be found in Section 6.3. In this chapter, we will study the random power function and then propose requisite sample size formulas for randomization procedures. n1, n2,
+
91
92
SAMPLE SIZE CALCULATION
6.1
POWER OF A RANDOMIZATION PROCEDURE
We now focus on the power function for comparing two treatments (1 and 2 ) in clinical trials. Suppose there are Nnl and Nn2 (Nnl Nn2 = n) patients on treatments 1 and 2 , respectively. Let X I , ...,X N , ~be the responses of patients on treatment 1 and Y1,...,Y N be ~ the ~ responses on treatment 2. For simplicity of illustration, we assume that x1 N ( P l , 4 andY1 " / 4 2 , &
+
'v
where both u? and uz are known and 1-11 and p2 are two unknown parameters. We only consider a one-sided hypothesis test, given by
Ho : p1 = p2 versus H 1 : p l > 1.12.
x
Without much difficulty, we can generalize the problem to a two-sided test. Let and y be the estimators of 1-11 and p2, respectively. For a given significance level a, we reject HOif
is defined as Pr(2 > qU))= where qOl) variable. The power function is now
cy
and 2 is a standard normal random
where Q is the cumulative distribution function ofthe standard normal distribution. It is clear that ,!?(PI,p2, Nnl, Nn2) is a random variable. Before we study the properties of this power function, we state the following condition.
ASSUMPTION6.1. For a given randomization procedure, we assume the following asymptotic results hold: N n l / n -+ v almost surely with 0 < v < 1
(6.3)
and
f i ( N , : / n - Y ) --$ N ( O , T ~ ) in distribution for some T~ > 0.
(6.4)
POWER O f A RANDOMIZATION PROCEDURE
93
REMARK6.1. Assumption 6.1 is usually true for most restricted randomization procedures. Theorem 4.3 and Theorem 5.3 ensure that Assumption 6.1 holds for most of the response-adaptive randomization procedures discussed in Chapters 4 and 5. To study properties of the power function, we define a function
After some simple calculations, we can obtain
where q!~is the density function of the standard normal distribution, and
We now state the following approximate result for the power function of a randomization procedure (Hu, 2006).
THEOREM 6.1. Under Assumption 6.1, we have the following approximation for large n:
94
SAMPLE SIZE CALCULATION
PROOF.From (6.6), we have
P( P I
1
C L 1~Nn 1 , Nn2
Now f,(lc) is adifferentiable function for small 1x1. When n is large, both @(a,) and an$(an) are bounded, because @(a,) converges to 0 much faster than a, converges to 03. Therefore fz(lc)is bounded for small 1x1. Based on Assumption 6.1 and a Taylor expansion,
+0.5f;(O)
(2
- u)
+ oP
((% ') - u)
When fL(0)# 0, we have
=
fi
(2
- u)
2
+ 0.5f~(O)(f~(O))-'fi (% n - U ) +~ ~ ( n - ' / ~ ) .
Because fL(0) # 0 and f i ( c l . 1 - p.2) is bounded by some constant, therefore, 0.5f;(O)(fA(O))-' is bounded. By Assumption 6.1,
n
REMARK 6.2. The function f,(O) is determined by u and represents the power for a fixed design (with u as the allocation proportion, that is, N,l/n = u). The value fn(0)is fixed for a given n. Also, fn(0) is maximized by taking v=-
01 01
+
u2
or Neyman allocation.
REMARK 6.3. When fA(0)# 0, the main random term is fA(0)(Nnl/n- u ) . The power function is mainly influenced by N,l/n - v, a random variable. The power will increase or decrease according the value of N,l/n. From Theorem 6.1, the power of a randomization procedure is approximately normally distributed for large n. To control the influence of this term, it is important to make fA(0) = 0, which is againNeymanallocation. When fA(0) = 0, the main random term is f:(0)(Nnl/n-
POWER OF A RANDOMIZATION PROCEDURE
95
v ) ~ .This is a second-order term and usually quite small. However, for most randomization procedures, f; (0) # 0.
For a randomization procedure, the average power is PO(P1, P2 1 n) =
EP(P1, P2, Nnl , K 2 ) .
Based on Theorem 6.1, we can obtain a result for the average power lost due to using a randomization procedure (Hu, 2006).
THEOREM 6.2. Under the assumptions of Theorem 6.1, if E(N,l/n) -v = o(n-I), then
P o ( P ~~,
2n ) , = f,(O)
+ 0.5ft(O)E (+- v)' + o (E (+- v)
')
When a, > 0, we have fl(0)< 0. Therefore, the average power lost from using a randomization procedure is -0.5f:(O)E(N,l/n - v )2 , which is a function of the variabililty of the randomization procedure.
PROOF. From Theorem 6.1, we have r3o(P1,
P2, n ) = E
m ,P2, Nn1, Nn2)
= fn(0) + fA(O)E( N n ~ / n- v)
(% - v) f,(O) + 0.5ft(O)E(*n +0.5fl(O)E
=
from the condition E(N,l/n) - u = .(TI-'). Because a, > 0, we need only show that
+o ( E (
-
- v)
')
.)'+ (* ') o(E
n - V)
Now we show that f " ( 0 ) < 0.
This is obtained as
-
(.a;
+ (1 - v)u:)(v3u; + (1 - v ) 3 4 ) - (v'u; - (1 - v) u2l )2 2
v4(1- 4 4
REMARK6.4. We are usually interested in the value n such that f,(O) is around power 0.8 or higher. In this case, the condition a, > 0 is satisfied. The condition E(N,l/n) - v = is weaker than the usual unbiasedness condition E ( N n l / n )- v = 0.
.(.-')
96
SAMPLE SIZE CALCULATION
REMARK6.5. By using a randomization procedure, the average loss of power is given by - 0 . 5 f ” ( 0 ) E ( N n l / n - v ) ~which , depends on the variability of the design. This agrees with results of Chapter 2. It is important to note that the average power lost has order n - l , and this order is not very significant in simulations. This may be the reason why the randomness is ignored in sample size calculations, because one usually only checks the average power of a fixed sample size by simulation. However, in practice, one only runs a clinical trial once; therefore, it is critical to consider the random power instead of the average power. Now we consider the following general case. Suppose f i 1 and li.2 are the corresponding estimators of p 1 and p2 based on the data. Let B: and 62”denote the corresponding variance estimates of f i f i . 1 and f i f i z , respectively. ASSUMPTION 6.2. Assume that as Nnl -+ 00 and Nn2 -+ co almost surely, (i) By -+ 0: and d; in probability and (ii) m(fi1 - p l ) -+ N ( O , o ; ) and m ( f i 2 - pz) -+ N ( O , o ; ) in distribution, where 0: and 0; are some positive constants. --$
02”
When the estimators f i 1 and f i z are maximum likelihood estimators, moment estimators, or estimators from some estimating functions, Assumption 6.2 is usually satisfied. Based on Assumption 6.2, for a given significance level a, we reject Ho if
We can then calculate the approximated power similar to Theorems 6.1 and 6.2. Details are provided in Hu (2006). 6.2
THREE TYPES OF SAMPLE SIZE
In this section, we consider the requisite sample size for a randomization procedure. Here we assume that a:, 0;and p1 - 112 are given. To achieve power p, it is required that
which, under Assumption 6.2, is approximately equivalent to
After some simple calculations, N,1 and Nn2 must satisfy
THREE TYPES
OF SAMPLE SIZE
97
For a fixed procedure, one has Nnl = nv and Nn2 = n(1 - v) (predetermined). The sample size no can then be calculated as
This is the requisite sample size for fixed design; here we call it the type I sample size.
REMARK6.6. The sample size formula in (6.8) is used in literature for both fixed procedures and randomization procedures. Eisele and Woodroofe (1 995) used this formula for doubly-adaptive biased coin designs. Rosenberger and Lachin (2002) also used it for randomization procedures. In the literature, the randomness of Nnl (of a randomization procedure) is ignored. As pointed out in Rosenberger and Lachin (2002, page 26), the randomness should not be ignored under response-adaptive randomization. But they did not study this problem further. For randomization procedures, both Nnl and Nn2 are random variables for a fixed n. Based on Assumption 6.1, we have Nnl + 00 and Nn2 00 almost surely as n -+ 00. Now from Assumption 6.2, we have -+
in distribution as both Nnl -+ 00 and Nnz under H1 can be approximated by
--f
00.
Therefore the power of the test
where 21 is a standard normal random variable and is the cumulative function of the standard normal distribution. From Assumption 6.1, we can replace Nnl and Nn2 by vn ~ f i Zand ( 1 v)n - r f i Z , respectively, where 2 is a standard normal random variable. The approximate power is then
+
in distribution. Thus, the mean power @&I, fixed n is approximately
p 2 , n) = E(P(p1,p2, Nn1 ,N n 2 ) ) for
POO(P~ ,P Z ,n) = ~ 8 ( ~P L1ZNn1, ,> ~ n 2 )
98
SAMPLE SIZE CALCULATION
To achieve a fixed power P on average, we just find the smallest n such that n ) 2 P. We refer to this as the fype 11sample size. From the above derivation, we have the following theorem (Hu, 2006).
&(pl,p2,
THEOREM 6.3. Under Assumptions 6.1 and 6.2, the sample size n1 (type 11) to achieve a fixed power P on average is the smallest n such that & ( p l , p 2 , n ) _> P, where the power function D o ( p l , p 2 ,n) is defined in (6.10).
REMARK6.7. In Theorems 6.1 and 6.2, we approximate the power function P ( p 1 , p2, N,1, Nn2) and the average power P o ( p I , p 2 ,n ) by Taylor expansion. In Theorem 6.3, we use the approximate normal distribution to calculate the corresponding average power and then calculate the requisite sample size. These two approximations are equivalent for large n. However, the ,&(PI, p2, n ) in (6.10) is much easier to implement. Also, the condition E ( N , l / n ) - v = o(n-') is not required in the calculation of sample size. From (6.9), we found that power depends on the proportion v and the variability 7 of a randomization procedure. When v is fixed, the power is a decreasing function of 7 . This has been demonstrated by simulation studies in Melfi and Page (1998) and Rosenberger et al. (2001). This agrees with Theorems 6.1 and 6.2. In practice, the clinical trial is only done once. Therefore, to achieve a certain power P on average is not enough. It is desirable to find a sample size n such that Pr(P(pI,p2, & I ,
N,z) 2
0)2 1 - P.
We refer to this as the type 111sample size. To achieve this, we have the following theorem (Hu, 2006).
THEOREM 6.4. Under Assumptions 6.1 and 6.2, the sample size 71.2 (type 111) to achieve a fixed power P with at least (1 - p)lOO% confidence is obtained (approximately) to be the smallest integer n which satisfies
and
where 7 is given in Assumption 6.1.
PROOF. We wish to find a sample size 122 such that for all n 2 np. By Assumption 6.2, this is approximately equivalent to
EXAMPLES
99
Thus, Nnl and Nnz must satisfy
If (6.11) and (6.12) are satisfied, then
is true for all
n v - z ( ~ / ~ ) T &< Nn1 < nu
+ ~ ( ~ 1 2 ) rand f i Nn2 = n - N n l .
Based on Assumption 6.1, we have approximately P r (nu - z ( ~ / ~ ) T &< Nnl < nu
+ Z(,,/~)T&)
= 1- p.
To ensure (6.13) approximately, we just need to find a sample size n such that both (6.1 1) and (6.12) are satisfied. The proof is now complete. 0
REMARK6.8. In Theorem 6.1, we obtained the approximate power function of a randomization procedure. When we calculated the requisite sample size (type 111) in Theorem 6.4, we did not use the result of Theorem 6.1 directly. This is because the approximate power in Theorem 6.1 depends on several terms; this makes it difficult to define a formula for the sample size. In Theorem 6.4, we derived the sample size formula directly from the definition of power, making it easier to understand and easier to calculate by using numerical methods. The results ofTheorems 6.3 and 6.4 are based on asymptotic properties of randomization procedures. In the following two sections, we use these results to calculate the requisite sample sizes and study their finite sample properties. For simplicity of notation, we will use no to represent the type I sample size from (6.8), n1 to represent the type I1 sample size from Theorem 6.3, and 722 to represent the type I11 sample size from Theorem 6.4. 6.3
EXAMPLES
In this section we calculate the sample sizes for restricted randomization procedures as well as response-adaptive randomization procedures. We will use the notation from Section 6.2. Also, we will use 1 - p = 0.9 in this section. 6.3.1
Restricted randomization
EXAMPLE6.1. COMPLETE RANDOMIZATION. Assign each patient to each treatment group with 1/2 probability. It is easy to see that N n l / n + 0.5 almost surely
100
SAMPLE SIZE CALCULATION
and in distribution. The sample size no is estimated as the smallest integer, which is (6.14)
The sample sizes n1 and n2 are defined in Theorems 6.3 and 6.4, respectively, with v = 1/2 and r = 1/2.
EXAMPLE 6.2. WEI’SURNDESIGN. Wei (1978) shows that 4(3b - a) in distribution. When a = 0 and b = 1, u = 1/2 and r2 = 1/12. We can then calculate sample sizes n1 and 122.
EXAMPLE 6.3. GENERALIZED BIASEDCOINDESIGN.Smith (1984) described a generalized biased coin design, which include Wei’s urn design as a special case. To describe the design, we first let N j l be the number of patients in the experimental treatment of the first j patients and Nj2 be the number of patients in the control treatment of the first j patients. Therefore N j l Njz = j . Assign the ( j 1)-th patient to treatment 1 with probability
+
+
N;2 N,71+ Nj’2* From Hu and Zhang (2004)’ we can show that
in distribution. Smith recommended the design with y = 5. In this case, v = 1/2 and the asymptotic variance is 1/44. The properties of the above three designs have been extensively studied in Chapter 3 of Rosenberger and Lachin (2002). Here we calculate the requisite sample sizes and then compare the results. The results in Table 6.1 show that n1 is greater than no by only 1 for both complete randomization and Wei’s urn design. For the generalized biased coin design, no and n1 are the same. This indicates that all three randomization procedures do not lose too much power on average compared to a fixed design. This confirms our finding that the average power lost has order n-l for a randomization procedure. We find that n 2 could be much larger than the sample size no. For example, for complete randomization with “1 = 1 and u2 = 2, no = 62, but n2 = 72 is required to achieve the fixed power (0.80) with probability 0.9. For this case, if a sample of
EXAMPLES
101
Table 6.1 Sample sizesfor complete randomization(CR), Wei 's urn design (UD). and Smith 's generalized biased coin design (GBC) (a = 0.05, p = 0.8, and 1.11 - 1.12 = 1).
n0
(CR) n2 (CR) nl (UD) nz (UD) 121 (GBC) 122 (GBC)
711
25 26 28 26 26 25 25
62 63 72 63 68 62 65
211 212 234 212 223 211 217
804 805 853 805 831 804 818
62 is used for a complete randomization, with 20% chance, the power is less than 0.76 when the target power is 0.80. This is due to the variation (1/4) of complete randomization. This disadvantage was pointed out by Efron (1971). For Wei's urn design and the generalized biased coin design (y = 5), the sample sizes 122 are 68 and 65, respectively. This is because both designs have much smaller variability. We did not calculate the sample sizes for Efron's biased coin design, because Assumption 6.1 is not satisfied. However, based on the simulation results on page 49 of Rosenberger and Lachin (2002), its sample sizes should be similar to that of the generalized biased coin design with y = 5. For the case that 6 1 = 02 = 1, fA(0) = 0 for Y = 1/2. In this case, the main random term in the power function is of order n-l. This explains the sample sizes n2 for both Wei's urn design and the generalized biased coin design.
6.3.2
Response-adaptive randomization
Following the notation in Section 6.2, when the variances 6: and 02" are known, then the optimal allocation for minimizing the total sample size and retaining preassigned power (Jennison and Turnbull, 2000) is Neyman allocation. In this case, u = al/(al 0 2 ) and fA(0) = 0. Therefore, the main random term of the power function has order n-l. However, 0: and 02" are usually unknown in practice. In these cases we can target the allocation using the methods in Chapter 5 .
+
EXAMPLE 6.4. DOUBLYADAPTIVE BIASED COIN DESIGN. HU and Zhang's (2004) randomization procedure is introduced in Example 5.7. Based on the asymptotic results of Theorem 5.3, we have
102
SAMPLE SIZE CALCULATION
Table 6.2 Samplesizesfor complete randomization(CR). doubly-adaptivebiasedcoin design (DBCD), and the sequential maximum likelihoodprocedure (SMLE) (a = 0.05, = 0.8, and pi - pa = 1).
no (CR) nl (CR) n2 (CR) no (DBCD) ~ . (SMLE) 1 n2 (SMLE) ~1 (DBCD,y = 1) 7 ~ (DBCD,y 2 = 1) 721 ( D B C D , y = 4 ) 122 (DBCD,y = 4)
25 26 28 25 28 31 26 28 26 27
62 63 72 56 58 63 57 59 57 58
211 212 234
155
157 163 156 159
156 158
804 805 853 501 504 510 502 506 502 505
almost surely and
in distribution. For fixed a and power p, no is estimated as the smallest integer larger than (01
+ O2Y(+) + q l - p ) ) 2 (111
- 112p
(6.15)
Then n1 and n2 are defined in Theorems 6.3 and 6.4, respectively, with
Note that when y = 0, we have the sequential maximum likelihood procedure of Melfi and Page (1998). For a given a = 0.05 and p = 0.8, Table 6.2 reports the required sample sizes (no,n1, and nz)for 11.1 - p2 = 1 and some different 0 1 and 0 2 values. Three designs (y = 0,1,4) are reported. From the results in Table 6.2, the sample size n1 is greater than no by 2 or 3 for each randomization procedure. This agrees with the theoretical results in Section 6.2. When 0 1 = 0 2 , the sample sizes no of the three designs are the same. The doublyadaptive biased coin design has substantial advantages over complete randomization when 01 is different from 6 2 . For example, when 01 = 1 and 0 2 = 8, the sample size
REFERENCES
103
no of the doubly-adaptive biased coin design is 501, which is significantly smaller than 804. So a well-chosen response-adaptive randomization procedure can reduce sample size significantly in clinical trials. From Table 6.2, we can find that 712 is slightly larger than no for the doublyadaptive biased coin design with y = 4. Therefore, without substantial change in sample size, a well-planned randomization procedure can still achieve a fixed power with high probability. Sample sizes in Table 6.2 are based on the power function (6.10), which depends on the asymptotic distribution of nl and 122. It is important to know their finite properties. Hu (2006) has compared the approximate average power function (6.10) with its simulated average power function. We consider the case with p1 = 1, p2 = 0, (TI = 1, I Y ~= 2, a = 0.95, and p = 0.80. The simulation shows that the average power function (6.10) provides a good approximation of the simulated average power. For randomization procedures, we can simulate the average power for each fixed sample size. Usually we can use Monte Carlo simulation to find the sample size nl to achieve a target power on average (as in Rosenberger and Hu, 2004). However, we cannot simulate the random power function for a fixedsample size. This is because we observe 1 (rejection) or 0 (acceptancc) from each simulation. This does not give us the distribution of the random power. Therefore, it is difficult to use Monte Carlo simulation to find the sample size 122.
6.4 REFERENCES
EFRON, B. (1971). Forcing a sequential experiment to be balanced. Biometrika 62 347-352.
EISELE, J. R. AND WOODROOFE, M. (1995). Central limit theorems for doubly adaptive biased coin designs. Annals of Statistics 23 234-254.
FRIEDMAN, L. M., FURBERG, C. D., AND DEMETS, D. L. (1998). Fundamentals of Clinical Trials. Springer, New York.
HU, F. (2006). Sample size and power of randomized designs. Unpublished manuscript.
HU, F. AND ZHANG, L.-X. (2004). Asymptotic properties of doubly adaptive biased coin designs for multi-treatment clinical trials. Annals of Statistics 32 268-301.
JENNISON, C. AND TURNBULL, B. W. (2000). Group Sequential Methods with Application to Clinical Trials. Chapman and Hall/CRC, Boca Raton.
MELFI, V. AND PAGE, C. (1998). Variability in adaptive designs for estimation of success probabilities. In New Developments and Applications in Experimental Design (Flournoy, N., Rosenberger, W. F., and Wong, W. K., eds.). Institute of Mathematical Statistics, Hayward, 106-114.
ROSENBERGER, W. F. AND HU, F. (2004). Maximizing power and minimizing treatment failures in clinical trials. Clinical Trials 1 141-147.
ROSENBERGER, W. F. AND LACHIN, J. M. (2002). Randomization in Clinical Trials: Theory and Practice. Wiley, New York.
ROSENBERGER, W. F., STALLARD, N., IVANOVA, A., HARPER, C. N., AND RICKS, M. L. (2001). Optimal adaptive designs for binary response trials. Biometrics 57 909-913.
SMITH, R. L. (1984). Properties of biased coin designs in sequential clinical trials. Annals of Statistics 12 1018-1034.
WEI, L. J. (1978). The adaptive biased coin design for sequential experiments. Annals of Statistics 6 92-100.
7 Additional Considerations
7.1 THE EFFECTS OF DELAYED RESPONSE
From a practical perspective, there is no logistical difficulty in incorporating delayed responses into a response-adaptive randomization procedure, provided some responses become available during the recruitment and randomization period. For urn models, the urn is simply updated when responses become available (Wei, 1988). For procedures based on sequential estimation, estimates can be updated when data become available. Obviously, updates can also be incorporated when groups of patients respond, not just individuals. But how does a delay in response affect the procedures? Early papers evaluate the effects of delayed response by simulation using a priority queue data structure (e.g., Rosenberger and Seshaiyer, 1997). Bai, Hu, and Rosenberger (2002) were the first to explore the effects theoretically. In this section we present the condition required for our asymptotic results to be unaffected by staggered entry and delayed response. We find this condition to be satisfied for reasonable probability models.

Bai, Hu, and Rosenberger (2002) assume a very general framework for delayed response under the generalized Friedman's urn with multinomial outcomes, and the delay mechanism can depend on the patient's entry time, treatment assignment, and response. We may not need that full generality in practice, but we state the general model for completeness. Assume a multinomial response model with responses ξ_n = l if patient n had response l, l = 1, ..., L. Let J_n be the treatment indicator for the n-th patient, i.e., J_n = j if patient n was randomized to treatment j = 1, ..., K, and let T_n = (T_{n1}, ..., T_{nK}) satisfying T_{nJ_n} = 1 and all other elements 0. Assume also that the entry time of the n-th patient is t_n, where {t_n − t_{n−1}} are independent
and identically distributed for all n. The response time of the n-th patient is denoted by τ_n(j, l), which has distribution g_{jl}, j = 1, ..., K, l = 1, ..., L, for all n, so that the distribution of the response times can depend on both the treatment assigned and the response observed. For the n-th patient randomized to treatment j, only one possible response, say l, can be observed; thus it is convenient to define an indicator function to keep track of when patients respond, M_{jl}(n, m), which takes the value 1 if t_n + τ_n(j, l) ∈ (t_{n+m}, t_{n+m+1}), m > 0, and 0 otherwise. We assume, for n = 1, 2, ..., that, given j and l, {M_{jl}(n, m)} are i.i.d. By definition, for every pair of n and j, there is only one pair (l, m) such that M_{jl}(n, m) = 1 and M_{jl'}(n, m') = 0 for all (l', m') ≠ (l, m). We can define p_{jlm} = E{M_{jl}(n, m)} as the probability that a patient on treatment j with response l will respond after m more patients are enrolled and before m + 1 more patients are enrolled. Assume that

    Σ_{l,m} p_{jlm} = 1   for j = 1, ..., K.
Bai, Hu, and Rosenberger (2002) introduce the following condition on the delayed responses.
ASSUMPTION 7.1. For some c ∈ (0, ∞),

    Σ_{i=m}^∞ p_{jli} = o(m^{-c}),   j = 1, ..., K, l = 1, ..., L.
Assumption 7.1 implies that the probability that at least m additional patients will arrive prior to a patient's response is of order o(m^{-c}). Hence, in practical examples, the delay cannot be very large relative to the entry stream. In practice, it is convenient to verify this assumption by examining the time-to-response variable τ_n(j, l) and the entry times t_n.
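The tail probability in Assumption 7.1 is easy to examine numerically. The following small Monte Carlo sketch (illustrative only, not from the original text) uses the exponential entry and response model of Example 7.1 below, for which the tail probability should equal (λ/(λ + θ))^m; the parameter values are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
lam, theta, reps = 1.0, 0.5, 200_000   # entry rate, response rate (assumed values)

for m in (1, 2, 5, 10):
    gaps = rng.exponential(1.0 / lam, size=(reps, m)).sum(axis=1)  # t_{n+m} - t_n
    tau = rng.exponential(1.0 / theta, size=reps)                  # response time
    estimate = np.mean(tau > gaps)                                 # Pr(tau > t_{n+m} - t_n)
    print(m, estimate, (lam / (lam + theta)) ** m)                 # geometric decay in m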
THEOREM 7.1. When (i) E|τ_n(j, l)|^{c_1} < ∞ for each j, l and some c_1 > c, and (ii) E(t_i − t_{i−1}) > 0, then Assumption 7.1 is satisfied.

PROOF. We have
almost surely. By the Chebyshev inequality, we have
As a simple example where Assumption 7.1 is easily satisfied, consider the following.
EXAMPLE 7.1. It is common to assume that patient entries are a random sample from a uniform distribution on [0, T]. From the well-known relationship between a random sample from a uniform distribution and the Poisson process, patient entry intervals can be considered to follow independent exponential distributions with parameter λ. If the response time is also exponentially distributed with parameter θ (and does not depend on the type of response), then

    Σ_{i=m}^∞ p_{jli} = Pr(τ_n(j, l) > t_{n+m} − t_n) = (λ/(λ + θ))^m,

which decays geometrically in m.
Assumption 7.1 is clearly satisfied. For the generalized Friedman's urn models, Bai, Hu, and Rosenberger (2002) and Hu and Zhang (2004b) have shown the following result.
THEOREM 7.2. Theorem 4.1 holds under Assumption 7.1. In addition, if c ≥ 2 in Assumption 7.1, then Theorems 4.2 and 4.3 hold.

For responses which are not multinomial outcomes, Hu et al. (2006) and Zhang et al. (2006) considered the following simple delay mechanism. Let t_n be the entry time of the n-th patient, where {t_n} is an increasing sequence of random variables. The response time of the n-th patient on treatment k is denoted by τ_n(k), k = 1, ..., K. Let M_k(n, m) = I{t_n + τ_n(k) ∈ (t_{n+m}, t_{n+m+1}]} be an indicator function that takes the value 1 if the response of the n-th patient on treatment k is observed during the period between the time when the (n + m)-th patient is assigned and the time when the (n + m + 1)-th patient arrives, k = 1, ..., K. Obviously, for every pair of n and k, there is only one m such that M_k(n, m) = 1 and M_k(n, m') = 0 for all m' ≠ m. We assume that for each fixed k and m, {M_k(n, m), n = 1, 2, ...} is a sequence of independent and identically distributed random variables. This condition is quite natural, as {t_{n+1} − t_n} and {τ_n(k)} are usually sequences of independent and identically distributed random variables for fixed k. Of course, in a clinical trial, only responses that correspond to nonzero T_{n,k} M_k(n, m), n = 1, 2, ..., m = 1, 2, ..., k = 1, ..., K, will be observed. Define μ_{m,k} = E(M_k(n, m)). Notice that Σ_m M_k(n, m) = 1, which means that the delayed responses will also be observed, even though it may take slightly longer to obtain some of them. Thus, Σ_m μ_{m,k} = 1 and Σ_{i≥m} μ_{i,k} → 0 as m → ∞. Similar to Assumption 7.1, the following assumption is made about the convergence rate of μ_{m,k}.
ASSUMPTION 7.2. For some c ∈ (0, ∞),

    Σ_{i=m}^∞ μ_{i,k} = Pr(τ_n(k) > t_{n+m} − t_n) = o(m^{-c}),   k = 1, ..., K.
For the DL rule and the GDL rule, Zhang et al. (2006) show the following result.
THEOREM 7.3. If c ≥ 2 in Assumption 7.2, then Theorem 4.6 holds.

Hu et al. (2006) have shown that Assumption 7.2 applies also to Theorems 5.1-5.3 for the doubly-adaptive biased coin design. Here we state the main result.
THEOREM 7.4. Theorem 5.1 holds under Assumption 7.2. In addition, if c ≥ 2 in Assumption 7.2, then Theorems 5.2 and 5.3 hold.

To prove Theorems 7.2, 7.3, and 7.4, one has to calculate the effect of the delay mechanisms. Basically, one has to show that the extra terms caused by delayed responses can be bounded by o(n^{-1/2}). However, the techniques used to prove the three theorems are very different. For example, one has to consider the effect of the delay mechanism on the estimators at each stage for the doubly-adaptive biased coin design (Theorem 7.4). In Appendix B, we prove only Theorem 7.2 and Theorem 7.4. For the proof of Theorem 7.3, refer to Bai, Hu, and Rosenberger (2002), Hu and Zhang (2004b), and Zhang et al. (2006).

In conclusion, for most of the response-adaptive procedures we have investigated, a moderate delay in response will not affect the asymptotic properties of the procedure. For moderate sample sizes, simulating a priority queue under a probabilistic delay and entry mechanism is a useful tool to determine the effect of delayed response on expected treatment failures, power, and other important factors.
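As a concrete illustration of the priority-queue simulation mentioned above, the following Python sketch (not from the original text; all names and parameter values are illustrative assumptions) uses a randomized play-the-winner urn, with the urn updated only when a pending response time precedes the next patient's arrival, as described for urn models at the start of this section.

import heapq
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, (0.7, 0.5)          # number of patients and success probabilities (assumed)
urn = [1.0, 1.0]                 # initial balls for treatments A and B (assumed)
pending, t, failures = [], 0.0, 0

for i in range(n):
    t += rng.exponential(1.0)                      # next entry time
    while pending and pending[0][0] <= t:          # process responses now available
        _, arm, success = heapq.heappop(pending)
        urn[arm] += 1 if success else 0            # success: add ball of same type
        urn[1 - arm] += 0 if success else 1        # failure: add ball of opposite type
    arm = int(rng.random() > urn[0] / sum(urn))    # draw: 0 = A, 1 = B
    success = rng.random() < p[arm]
    failures += not success
    heapq.heappush(pending, (t + rng.exponential(2.0), arm, success))  # delayed response

print("treatment failures:", failures)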
7.2 CONTINUOUS RESPONSES
Thus far, we have focused (except for a few examples) on clinical trials with binary responses. While this provides a convenient framework for developing the theory because of its simplicity, we can extend all of our results to the analogous case where there are continuous responses. One trivial approach would be to dichotomize the continuous outcomes and use the methodology for binary responses, but this is clearly unsatisfactory.

The literature on response-adaptive randomization dealing with continuous responses is much smaller and less developed. Rosenberger (1993) probably first explored a general response-adaptive randomization procedure for general continuous responses. This was in the context of a mapping of the linear rank statistic which served as the bias of a coin for randomization. This approach has no particular optimal properties but has the advantage of being completely nonparametric. Rosenberger and Seshaiyer (1997) use a similar approach for survival time responses. For normal responses, Eisele and Woodroofe (1995) and Melfi and Page (2000) focused on sequential estimation designs that target Neyman allocation. There has
been more focus on normally distributed responses recently (e.g., Bandyopadhyay and Biswas, 2001; Atkinson and Biswas, 2004; Biswas and Mandal, 2004). Several of the examples in Chapter 5 dealt with normally distributed responses, and, unlike with urn models, there is no inherent difficulty in applying the doubly-adaptive biased coin design with continuous responses, provided we are willing to make parametric assumptions. Zhang and Rosenberger (2006) take this approach in a recent paper, and we will summarize their results here. Zhang (2006) investigates standard probability models for survival time responses, but this is work in progress, complicated by the additional component of censoring in survival trials.

Assume that we wish to compare two treatments with responses X_A ~ N(μ_A, σ_A²) and X_B ~ N(μ_B, σ_B²), respectively. Consider the hypothesis

    H_0: Δ = 0   versus   H_A: Δ ≠ 0,

where Δ = μ_A − μ_B. If we use the usual Z-test statistic

    Z = (X̄_A − X̄_B) / √(σ̂_A²/n_A + σ̂_B²/n_B),
then Z² will be asymptotically χ² with 1 degree of freedom under H_0. The noncentrality parameter of the chi-square distribution under H_A is given by

    φ = Δ² / (σ_A²/n_A + σ_B²/n_B),
and we write this in the same way as in Section 2.2 as

    φ(x) = n Δ² / (σ_A²/(ρ + x) + σ_B²/(1 − ρ − x)),

where x = n_A/n − ρ and ρ is the targeted allocation proportion to treatment A. Expanding this function in a Taylor series about ρ yields

    φ(x) = φ(0) + φ′(0) x + (1/2) φ″(0) x² + o((N_A(n)/n − ρ)²).                  (7.1)
Equation (7.1) is then the analog of (2.6) in the binary case. We again consider the expectation of this expression, leading to the first term which depends on the
particular allocation target, the second term that is asymptotically unbiased, and the third term which is a function of Var(N_A(n)/n). Analysis then proceeds as in Chapter 2.

As mentioned in Section 2.1, we can derive allocation proportions as the counterpart of the optimal allocation of binary responses (Rosenberger et al., 2001). Neyman allocation is given by ρ = σ_A/(σ_A + σ_B) and, unlike in the binary case, is independent of the mean response. In the binary case, we minimized the expected number of treatment failures to obtain RSIHR allocation. This measure is no longer relevant for normal responses. For convenience, we will treat a smaller mean response as desirable, and we instead minimize the total expected response from all patients. We therefore have the following optimization problem:

    minimize  n_A μ_A + n_B μ_B   subject to   σ_A²/n_A + σ_B²/n_B = K,          (7.2)

where K is some constant. Solving this problem yields

    ρ = σ_A √μ_B / (σ_A √μ_B + σ_B √μ_A).                                        (7.3)
As we noted in Section 2.1, when μ_A < μ_B, there are possible values of σ_A and σ_B which make n_A/n less than 1/2. While this may maximize power for fixed expected treatment failures, it is not appropriate to allocate more patients to the inferior treatment. Define r = σ_A √μ_B / (σ_B √μ_A) and

    s = 1  if (μ_A < μ_B and r > 1) or (μ_A > μ_B and r < 1),
        0  otherwise.                                                            (7.4)

We modify the allocation so that it never favors the inferior treatment: use ρ in (7.3) when s = 1 and use 1/2 when s = 0, that is, the allocation becomes s ρ + (1 − s)/2.
We will refer to this as Zhang and Rosenberger allocation.

Recently, Biswas and Mandal (2004) generalized the binary optimal allocation to normal responses in terms of failures. Specifically, they minimized

    n_A Φ((μ_A − c)/σ_A) + n_B Φ((μ_B − c)/σ_B)

instead of n_A μ_A + n_B μ_B in (7.2), where c is a constant and Φ(·) is the cumulative distribution function of the standard normal distribution. This amounts to minimizing the total number of patients with response greater than c. The corresponding allocation rule is

    ρ = σ_A √Φ((μ_B − c)/σ_B) / (σ_A √Φ((μ_B − c)/σ_B) + σ_B √Φ((μ_A − c)/σ_A)),
which assigns the next patient to treatment A with this probability, evaluated at the current parameter estimates. We will refer to this as Biswas and Mandal allocation.

In Example 5.4, we noted an allocation by Bandyopadhyay and Biswas (2001). This derives as the limiting allocation from a response-adaptive randomization procedure which assigns the (j + 1)-th patient to treatment A with probability

    Φ((μ̂_B − μ̂_A)/T),

where T is a scaling factor and (μ̂_A, μ̂_B) are the estimates of (μ_A, μ_B) based on the responses of the first j patients. Note that this procedure is defined without regard to any formal optimal property. We will refer to this as Bandyopadhyay and Biswas allocation.
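The allocation targets defined above are simple to compute. The sketch below (illustrative Python, not from the original text; smaller mean response is treated as desirable) evaluates Neyman, Zhang and Rosenberger, and Biswas and Mandal allocations. The truncation to 1/2 in the Zhang and Rosenberger rule follows the s-based modification sketched above, and the example parameter values are taken from Table 8.5.

import math
from scipy.stats import norm

def neyman(sa, sb):
    return sa / (sa + sb)

def zhang_rosenberger(ma, mb, sa, sb):
    rho = math.sqrt(mb) * sa / (math.sqrt(mb) * sa + math.sqrt(ma) * sb)  # (7.3)
    r = sa * math.sqrt(mb) / (sb * math.sqrt(ma))
    s = 1 if (ma < mb and r > 1) or (ma > mb and r < 1) else 0            # (7.4)
    return s * rho + (1 - s) * 0.5   # never favor the inferior treatment

def biswas_mandal(ma, mb, sa, sb, c=0.0):
    pa, pb = norm.cdf((ma - c) / sa), norm.cdf((mb - c) / sb)  # Pr(response > c)
    return sa * math.sqrt(pb) / (sa * math.sqrt(pb) + sb * math.sqrt(pa))

print(neyman(4.0, 2.5), zhang_rosenberger(13, 15, 4.0, 2.5), biswas_mandal(13, 15, 4.0, 2.5))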
7.2.1 Asymptotic variance of the four procedures
Each of the four allocations in the previous section for normal responses, Neyman, Zhang and Rosenberger, Biswas and Mandal, and Bandyopadhyay and Biswas, can be implemented using sequential estimation procedures. In particular, Zhang and Rosenberger (2006) recommended that Zhang and Rosenberger allocation be used in conjunction with Hu and Zhang's (2004a) procedure and γ = 2. Biswas and Mandal (2004) and Bandyopadhyay and Biswas (2001) allocations are both implemented using Melfi and Page's (2000) approach, which are both a special case of Hu and Zhang's procedure with γ = 0. (There is no reason why they cannot also be used with γ = 2.) Using Hu and Zhang (2004a), we can now derive the asymptotic variance of the four procedures.

As in Chapter 5, define ξ_i = (X_i, X_i²)′, i = A, B, and let θ_i = E(ξ_i). Let V(ρ) = (ρ′_{A1}, ρ′_{A2}, ρ′_{B1}, ρ′_{B2})′ denote the gradient of the allocation ρ with respect to the components of (θ_A, θ_B), and define σ_ρ² := V(ρ)′ V V(ρ), where V is the asymptotic covariance matrix of the estimators. Then from Theorem 5.3, we know that √n(N_A(n)/n − ρ, θ̂_A − θ_A, θ̂_B − θ_B) converges in law to a multivariate normal distribution with covariance matrix Σ. So the asymptotic variance of N_A(n)/n is the upper left element of Σ. The asymptotic variance for Neyman allocation under Melfi and Page's procedure (Hu and Zhang's procedure with γ = 0) is given in Example 5.2.
For Zhang and Rosenberger allocation, we assume μ_A < μ_B; the case of μ_A > μ_B leads to the same result. We can write the allocation (7.3) as a function of the parameters (μ_A, σ_A², μ_B, σ_B²), and for convenience define D = σ_A √μ_B + σ_B √μ_A. For the first case, differentiation gives the components of V(ρ) explicitly, and it follows that σ_ρ² = V(ρ)′ V V(ρ); hence n × Var(N_A(n)/n) can be written in closed form in terms of μ_A, μ_B, σ_A, σ_B, ρ, D, and γ. For the second case, it is much easier, since V(ρ) = 0 and hence σ_ρ² = 0. Then n × Var(N_A/n) ≈ ρ(1 − ρ)/(1 + 2γ).

To obtain the asymptotic variance of Biswas and Mandal's procedure, with ρ = σ_A √Φ_B / (σ_A √Φ_B + σ_B √Φ_A), where Φ_i = Φ((μ_i − c)/σ_i), i = A, B, the same argument applies, and we obtain an expression of the form

    n × Var(N_A(n)/n) = [ρ(1 − ρ) + 2(1 + γ) σ_ρ²] / (1 + 2γ).                   (7.7)
The asymptotic variance of Bandyopadhyay and Biswas's procedure is found in Example 5.3.
7.3 MULTIPLE (K > 2) TREATMENTS
We now return to the formulation of optimization problems (2.7) and (2.8) when K > 2. Recall that we found a closed-form solution only for the case where w = 1, and this is the generalization of Neyman allocation. The solution is quite complex and is given in (2.10). If we wish to implement the optimal allocation using the doubly-adaptive biased coin design, we require the assumption from Section 5.2 that ρ ∈ (0, 1)^K and that ρ is continuous and twice continuously differentiable. We will therefore need to set B > 0. Unfortunately, this alone is not sufficient, because ρ in (2.10) is not continuous. Tymofyeyev, Rosenberger, and Hu (2006) propose to apply a smoothing transformation to ρ(p_1, ..., p_K) that would produce a new allocation function ρ̃(p_1, ..., p_K). The idea is that ρ̃(p_1, ..., p_K) resembles ρ(p_1, ..., p_K) and, at the same time, is a continuous function of the p_i's. We define
    ρ̃(p_1, ..., p_K) = (ρ̃_1(p_1, ..., p_K), ..., ρ̃_K(p_1, ..., p_K))

to be the convolution of ρ(p_1, ..., p_K) with some kernel function h(p_1, ..., p_K). The i-th component of ρ̃(p_1, ..., p_K) is given by

    ρ̃_i(p_1, ..., p_K) = ∫ ρ_i(x_1, ..., x_K) h(p_1 − x_1, ..., p_K − x_K) dx_1 ⋯ dx_K.
The choice of the smoothing kernel h(p_1, ..., p_K) allows some flexibility for practical implementation. It can be a common filtering function, for example, a multivariate Gaussian kernel, given by

    h(p_1, ..., p_K) = (2πα²)^{-K/2} exp(−(p_1² + ⋯ + p_K²)/(2α²)).
The parameter α tunes how aggressively the smoothing is done. The resulting smoothed allocation can then be applied in practice using Hu and Zhang's function for the doubly-adaptive biased coin design. Unfortunately, closed-form solutions are not available for the most important case, namely, w = q. We address this problem by simulation in the next chapter.
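The effect of the kernel smoothing is easy to visualize numerically. The following Python sketch (illustrative only, not from the original text) applies a discretized Gaussian convolution to a one-dimensional stand-in for a discontinuous allocation function; rho_step is a toy function, not the allocation in (2.10), and the bandwidth alpha plays the role of the tuning parameter α above.

import numpy as np

def rho_step(p):
    return np.where(p < 0.5, 0.3, 0.7)          # discontinuous toy allocation

def smooth(rho, p, alpha=0.05, grid=np.linspace(0, 1, 2001)):
    kernel = np.exp(-(p - grid) ** 2 / (2 * alpha ** 2))
    kernel /= kernel.sum()                       # normalized Gaussian weights
    return float(np.sum(rho(grid) * kernel))     # discretized convolution

for p in (0.45, 0.50, 0.55):
    print(p, smooth(rho_step, p))                # varies smoothly through p = 0.5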
7.4 ACCOMMODATING HETEROGENEITY
Throughout this book we have assumed that the probability distribution of the response is homogeneous among observations with the same covariate values. This assumption can be violated when observations are selected sequentially and the distribution of the response is time heterogeneous. Several examples of time heterogeneity have been described in Altman and Royston (1988). For example, they described one clinical trial where the characteristics of patients changed over the course of recruitment, and therefore the probability of response to treatments differed among patients recruited at different times. Examples of heterogeneous responses are also discussed in Coad (1991).

To accommodate heterogeneity of responses in response-adaptive randomization, one has to solve the following problems. What are the properties of response-adaptive randomization under heterogeneity of responses? How is statistical inference performed after using response-adaptive randomization with heterogeneous responses? In this section, we will try to answer these questions. First, we consider the properties of randomization procedures based on urn models and sequential estimation, respectively. Then we propose to analyze the data using weighted likelihoods.
7.4.1 Heterogeneity based on time trends
We assume the same data structure as in Section 1.1, except that we assume that the responses are not identically distributed. Thus

    X_{ij} ~ f_j(·, θ_j(i)),   i = 1, ..., n, j = 1, ..., K,

where θ_j(i) ∈ Θ_j. We thus assume that X_i is independent of X_1, ..., X_{i−1}. For the generalized Friedman's urn model from Chapter 4, θ_j(i) = p_j(i), where p_j(i) is the probability of success on treatment j of patient i. The distribution of the responses enters the model only through the generating matrix, given by H_i, which is a function of the p_j(i)'s, j = 1, ..., K. This heterogeneous generating matrix was introduced in Chapter 4. The key conditions on the generating matrix are
given in Assumptions 4.3 and 4.4. Under these two conditions, the usual asymptotic properties in Theorems 4.1-4.3 hold. Assumptions 4.3 and 4.4 hold when the p_j(i)'s converge to a constant p_j at a certain rate. In particular, Assumption 4.3 holds if

    Σ_{i=1}^∞ |p_j(i) − p_j| < ∞                                                 (7.9)
and Assumption 4.4 holds if a similar rate condition on the differences p_j(i) − p_j, given in (7.10), is satisfied. The theoretical proofs can be found in Bai and Hu (1999, 2005).

REMARK 7.1. In practice, conditions (7.9) and (7.10) mean that the success probabilities across treatments have to converge to a constant probability over the patient sequence. Thus these conditions may be satisfied by only a subset of clinical trials that exhibit a time trend.

REMARK 7.2. We conjecture that conditions similar to (7.9) and (7.10) apply also to the DL rule and to procedures based on sequential estimation (Chapter 5), but these are open problems.
7.4.2 Heterogeneity based on covariates
Now presume that there is a covariate vector of interest Z_i for patient i = 1, ..., n, where Z_1, ..., Z_n are independent and identically distributed. We now assume that

    X_{ij} ~ f_j(·, θ_j(Z_i)),   i = 1, ..., n, j = 1, ..., K,

where θ_j(Z_i) ∈ Θ_j. Thus the patient's response probabilities change according to the patient's covariate structure. Let θ_j = E(θ_j(Z_i)), where the expectation is taken with respect to Z_i. For the generalized Friedman's urn model, H_i is now a random variable that depends on θ_j(Z_i), j = 1, ..., K. We take H = E(H_i), where the expectation is taken with respect to Z_i. Then Assumptions 4.3 and 4.4 are satisfied. Remark 7.2 applies here as well.
7.4.3 Statistical inference under heterogeneity
We now describe weighted likelihood techniques to establish estimators under heterogeneity. The basis for weighted likelihood techniques is found in Hu (1997), and the paper also proves the asymptotic normality of maximum weighted likelihood estimators for independent data. Hu, Rosenberger, and Zidek (2000) derive similar results for dependent data. The application of weighted likelihood techniques to response-adaptive randomization with heterogeneity derives from Hu and Rosenberger (2000).
We consider only the generalized Friedman's urn here, but the conditions should also be satisfied by other response-adaptive randomization procedures, provided the allocation proportions converge to a constant limit; however, we have not investigated this yet. For the general theory of maximum weighted likelihood estimators, refer to Hu, Rosenberger, and Zidek (2000).

Let T_1, ..., T_n be the treatment assignments as defined in Chapter 1. Let p_j(i) be the probability of success of patient i on treatment j, i = 1, ..., n, j = 1, ..., K. Let X_1, ..., X_n be the responses, where each element X_{ij} takes the value 1 or 0 with probability p_j(i) or q_j(i) = 1 − p_j(i), respectively. The full likelihood is proportional to

    ∏_{i=1}^n ∏_{j=1}^K [p_j(i)^{X_{ij}} q_j(i)^{1 − X_{ij}}]^{T_{ij}}.
(7.11)
for some vector p . The general framework of weighted likelihoods then replaces p , by p incorporating relevance weights. Relevance weights are selected by downweighting the influence of a subject's parameter according to its distance from p . We denote them as wni, i = 1, ...,n, and we assume that each w,, > 0 and CZ,w,, = 1. The weighted likelihood is given by
This leads to the maximum weighted likelihood estimator, given by
Now we can apply the theorems of Hu, Rosenberger, and Zidek (2000) to give asymptotic properties of the maximum weighted likelihood estimators. If condition (7.9) or (7.10) holds, then JqTijIFi-1) + vj
(7.12)
in probability, where Fi and vj are defined in Section 4.1.2. Then by (7.12), we have n
(7.13) and n
n
(7.14)
ACCOMMODAJING HEJEROGENEITY
117
in probability. We can choose wn, to satisfy the following condition: max
WE,
k l ,...,n
CLl w:i
(7.15) *
Note that (7.15) is a Lindeberg-type condition. We can now state the following theorem.
THEOREM 7.5. Assume that either condition (7.9) or (7.10) holds and that the relevance weights are chosen such that condition (7.15) holds. Then
--$
N(O9Pjelvj)
in distribution. In addition, if the relevance weights are chosen in such a way that n
~ ~C7w'n i ( p j ( i )-pj)E(T:jIFi-l)
+
0
i=l
in probability, then
(gw:,)
-1/2
(#jb) - P j )
+
N(O,Pjqj/vj)
(7.16)
a= 1
in distribution. There are two approaches to selecting the relevance weights. The first approach is to minimize the mean squared error. While this solution is easy to obtain, it may not be particularly useful, as it depends on p j ( i ) - p j , which is generally unknown. An alternate method is to choose the weight function wni = fn,(A), where f is an increasing function of i and A is some parameter. Then the mean squared error is minimized with respect to A. Hu and Rosenberger (2000) suggest several forms of fitnctions f . For clinical trials with possible time trends, Altman and Royston (1988) recommend the use of CUSUM plots. The weighted likelihood methodology is particularly useful when the contribution of early patients needs to be downweighted. Time trends may be due to a drift in patient characteristics or a learning curve among study personnel. For a drift in patient characteristics, condition (7.1 1) assumes that the underlying success probabilities converge, and that we are interested in consistently estimating the underlying success probabilities at the point of convergence. These methods are not applicable if there is interest in estimating success probabilities for early patients and downweighting later patients. In this case, we could obtain an estimator, but Theorem 7.5 would not hold. In the case of a learning curve among study personnel in implementing the protocol or making diagnostic assessments,
118
ADDITIONAL CONSIDERATIONS
heterogeneity can be introduced in patient responses. Downweighting early patients will allow estimation for patients treated according to protocol. If heterogeneity is introduced via differing covariate structures among patients, one can use covariate information to obtain a more specific solution to the inference problem. In particular, relevance weights can be chosen according to some smoothing techniques across the covariates.
7.5
REFERENCES
ALTMAN,D. G . AND ROYSTON,J . P . (1 988). The hidden effect of time. Statistics in Medicine 7 629-637. ATKINSON,A . C. AND BISWAS,A. (2004). Adaptive biased-coin designs for skewing the allocation proportion in clinical trials with normal responses. Statistics in Medicine 24 2477-2492. BAI,Z. D. AND Hu, F. (1999). Asymptotic theoremforurnmodelswithnonhomogeneous generating matrices. Stochastic Processes and Their Applications 80 87-101. BAI, Z. D. A N D Hu, F. (2005). Asymptotics of randomized urn models. Annals ofApplied Probability 15 914-940. BAI,Z. D., Hu, F . , AND ROSENBERGER, W. F. (2002). Asymptotic properties of adaptive designs with delayed response. Annals ofstatistics 30 122-139. BANDYOPADHYAY, U . AND BISWAS,A . (2001). Adaptive designs for normal responses with prognostic factors. Biometrika 88 409-4 19. BISWAS,A. AND MANDAL,S. (2004). Optimal adaptive designs in phase 111 clinical trials for continuous responses with covariates. In mODa7-Advances in Model-Oriented Design and Analysis (Di Bucchianico, A., Lauter, H., and Wynn, H. P,, eds.). Physica-Verlag, Heidelberg, 5 1-58. COAD,D. S. (1991). Sequential tests for an unstable response variable. Biometrika 78 113-121. EISELE,J. R. A N D WOODROOFE, M. (1995). Central limit theorems for doubly adaptive biased coin designs. Annals of Statistics 23 234-254. Hu, F. (1997). The asymptotic properties of the maximum relevance weighted likelihood estimators. Canadian Journal of Statistics 25 45-59. Hu, F. AND ROSENBERGER, W. F. (2000). Analysis of time trends in adaptive designs with application to a neurophysiology experiment. Statistics in Medicine 19 2067-2075. HU, F., ROSENBERGER, w. F., AND ZIDEK,J. V. (2000). Relevance weighted likelihood for dependent data. Metrika 51 223-243. HU, F. AND ZHANG, L.-X. (2004a). Asymptotic properties of doubly adaptive biased coin designs for multitreatment clinical trials. Annals of Statistics 32 268-301. HU, F. AND Z H A N G , L.-x. (2004b). Asymptotic normality of adaptive designs with delayed response. Bernoulli 10 447-463. HU, F., ZHANG,L.-X., CHEUNG,S. H., AND CHAN, W. S. (2006). Doubly
REFERENCES
119
adaptive biased coin designs with delayed responses. Submitted. A N D PAGE, c. (2000). Estimation after adaptive allocation.Journal of Statistical Planning and Inference 81 353-363. ROSENBERGER, W. F. (1 993). Asymptotic inference with response-adaptive treatment allocation designs. Annals of Statistics 21 2098-2 107. ROSENBERGER, W. F. AND SESHAIYER, P. (1997). Adaptive survival trials. Journal of Biopharmaceutical Statistics 7 6 17-624. ROSENBERGER, W. F., STALLARD, N., IVANOVA, A . , HARPER,c. N., A N D RICKS,M. L. (2001). Optimal adaptive designs for binary response trials. Biometrics 57 909-9 13. TYMOFYEYEV, Y., ROSENBERGER, W. F . , AND Hu, F. (2006). Implementing optimal allocation in binary response experiments. Submitted. WEI, L. J. (1988). Exact two-sample permutation tests based on the randomized play-the-winner rule. Biometrika 75 603-606. ZHANG,L. (2006). Response Adaptive Randomization in Clinical Trials with Continuous and Survival Time Outcomes. University of Maryland Graduate School, Baltimore (doctoral thesis). ZHANG,L. A N D ROSENBERGER, W. F. (2006). Response-adaptive randomization for clinical trials with continuous outcomes. Biometrics 62 562-569. ZHANG,L.-X., CHAN,W. S., CHEUNG,S. H., AND Hu, F. (2006). Ageneralized drop-the-loser urn for clinical with delayed responses. Statistic Sinica, in press.
MELFI,v. F.
This Page Intentionally Left Blank
8 Implications for the Practice of Clinical Trials
8.1
STANDARDS
In this book, we have presented a complete theory of response-adaptiverandomization and have answered numerous questions that may have hindered the use of these procedures in the past. We now address the potential implications of these theoretical developments for the practice of clinical trials. Chapter 12 of Rosenberger and Lachin (2002) outlines ethical and logistical considerations in the implementation of response-adaptive randomization, and we feel that the chapter gives a thorough assessment to which this book has little to add. In particular, information technology for clinical trials has progressed to the point that there is no need to consider that aspect further in this context. The chapter in Rosenberger and Lachin also gives some historical context that is useful in understanding why there has been a general reluctance to use response-adaptive randomization in practice. We now contend that sufficient information has been given in this book to merit taking a fresh look at response-adaptive randomization. This book is full of compromises and trade-offs. What are our goals? We wish to assign more patients to the better treatment in a clinical trial. However, if we assign too many patients to the better treatment, we may sacrifice inferential power. To make up for a loss of power, we may have to increase the sample size in the clinical trial, which has implications in terms of ethical and monetary costs. We want a trial that is fully randomized to reduce bias, but the very act of randomization can induce additional variability that may lead to a loss of power. And randomization means necessarily that some patients will be assigned to the currently inferior treatment by chance alone. A quotation from Tamura el al. (1994, p. 775) summarizes the dilemma: 121
122
IMPLICATIONS FOR THE PRACTICE OF CLINICAL TRIALS
We belicve that because [rcsponse-adaptive randomization] represents a middle ground between the community benefit and the individual paticnt benefit, it is subject to attack
from eithcr sidc. Response-adaptive randomization is not relevant for many clinical trials, in particular, long-term clinical trials with limited recruitment periods. There obviously must be a substantial number of responses available before all patients are randomized. Let us assume that a clinical trial’s time scale does permit response-adaptive randomization. The guiding principle in this book is that the only statistical difference between a standard randomized clinical trial and a response-adaptive clinical trial should be a simple change in the allocation probabilities. By this we mean that important statistical conditions of the clinical trial should be preserved: I . Standard inferential tests can be used at the conclusion of the trial. We have shown that very simple conditions ensure asymptotic normality of maximum likelihood estimators, and that the asymptotic null distribution of test statistics is not affected by the response-adaptive randomization.
2. Power should be preserved. We have given a theoretical template for the comparison of response-adaptive randomization procedures and target allocations in terms of power.
3. The clinical trials should be fully randomized to protect from biases and provide a basis for inference. Assuming 1-3 hold, the response-adaptive randomization procedure should ensure that the expected number of treatment failures (or an analogous measure for other types of responses) be reduced over standard restricted and complete randomization. Note that restrictions 2 and 3 are quite stringent and will make the resulting skewing of the randomization conservative. One major consideration in determining whether response-adaptive randomization will be beneficial is the assessment of the expected number of treatment failures under restrictions 2 and 3. A modest reduction in failures may be critically important when the outcome of the trial is grave, and the decision should be calibrated according to the particular trial. This book provides a theoretical template for asymptotic analysis of responseadaptive randomization procedures. This template will be appropriate for many clinical trials with moderate to large numbers of patients. Theorem 7.1 gives us conditions on delayed response and patient entry that will allow the results to hold with delayed response asymptotically. For small to moderate-sized trials, simulation will be useful to determine properties of the response-adaptive randomization procedure under a wide variety of models to ensure robustness in case design assumptions are wrong. In this chapter, we will perform a thorough asymptotic comparison of procedures using this theoretical template and show how these results compare to simulated results with moderate sample sizes.
BINARY RESPONSES
8.2
123
BINARY RESPONSES
The basic template for theoretical comparison of response-adaptive randomization procedures, based on our theoretical development, rests on the three terms of the Taylor series expansion given in (2.12). Term (I) depicts the effect on the noncentrality parameter of the target allocation ρ. Term (II) depends largely on the rate of convergence to the target allocation, and term (III) represents the effect of the variability of the randomization procedure.

For binary responses, Hu and Rosenberger (2003) perform a graphical comparison of measures (I) and (III) and also of the expected number of treatment failures; these graphs essentially summarize the basic properties of the procedures we have discussed. They first compare the three target allocations: Neyman allocation, RSIHR allocation, and urn allocation (the limiting proportion in Example 2.1, given by ρ = q_B/(q_A + q_B)). Figure 8.1 shows the approximate value of the noncentrality parameter (term (I)) for p_B = 0.1, 0.4, 0.7, p_A ≥ p_B. It is clear that Neyman allocation maximizes (I), and the maximum is given by

    n (p_A − p_B)² / (√(p_A q_A) + √(p_B q_B))².

RSIHR allocation, which for fixed power minimizes expected treatment failures, is also quite powerful, particularly for smaller values of p_A and p_B, but urn allocation has severe decreases in power as p_A and p_B increase. Figure 8.2 gives the expected proportion of treatment failures for the three values of ρ (for the same values of p_A, p_B as in Figure 8.1), given by ρ q_A + (1 − ρ) q_B.
One can see that Neyman allocation is clearly undesirable when p_A + p_B > 1, and urn allocation is quite favorable, especially at levels where power is quite low. The most desirable allocation, in terms of preserving power and protecting patients, is RSIHR allocation, since it is based on a multiple-objective optimality criterion.

For fixed allocation proportions, assuming no bias, the power is completely determined (up to an order) by the asymptotic variance of N_A(n)/n. To directly compare the urn procedures with procedures based on sequential estimation, we must examine only ρ = q_B/(q_A + q_B), since the urn models cannot target other allocations. We examine four procedures for binary responses: the randomized play-the-winner (RPW) rule, the drop-the-loser (DL) rule, and the doubly-adaptive biased coin design (DBCD) (see Example 5.4) for γ = 0 and γ = 2. For the four examples, the variances are plotted in Figure 8.3 for the same values of p_A, p_B as in Figure 8.1 (the RPW is not plotted for p_A + p_B ≥ 3/2 since the Taylor series expansion does not apply). It is clear that the DL rule is the most powerful procedure. In many cases, its variance is smaller than that produced by complete randomization, which has a constant variance of 0.25. Also quite powerful is the DBCD with γ = 2. The RPW rule has little to recommend it in terms of increased variability. Clearly the DL rule is superior if one is interested only in the urn limiting allocation. But the DBCD has the clear advantage of targeting any desired allocation.
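The quantities plotted in Figures 8.1 and 8.2 are simple to compute. The following Python sketch (illustrative only) evaluates the per-patient noncentrality term (I) and the expected failure rate ρq_A + (1 − ρ)q_B for the three targets; the Neyman and RSIHR formulas used here are the standard binary-response targets from the literature and are assumptions not restated in this passage.

import numpy as np

def targets(pA, pB):
    qA, qB = 1 - pA, 1 - pB
    neyman = np.sqrt(pA * qA) / (np.sqrt(pA * qA) + np.sqrt(pB * qB))
    rsihr = np.sqrt(pA) / (np.sqrt(pA) + np.sqrt(pB))
    urn = qB / (qA + qB)
    return {"Neyman": neyman, "RSIHR": rsihr, "Urn": urn}

def noncentrality(rho, pA, pB):
    # per-patient term (I): (pA - pB)^2 / (pA qA / rho + pB qB / (1 - rho))
    qA, qB = 1 - pA, 1 - pB
    return (pA - pB) ** 2 / (pA * qA / rho + pB * qB / (1 - rho))

pA, pB = 0.7, 0.4
for name, rho in targets(pA, pB).items():
    failures = rho * (1 - pA) + (1 - rho) * (1 - pB)
    print(name, round(rho, 3), round(noncentrality(rho, pA, pB), 4), round(failures, 3))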
Fig. 8.1 Noncentrality parameter for fixed allocation targets ρ: Neyman allocation, RSIHR allocation, and urn allocation, for p_B = 0.1, 0.5, 0.7, p_A ≥ p_B. From Hu and Rosenberger (2003), reprinted with permission from The Journal of the American Statistical Association, copyright 2003 by the American Statistical Association, all rights reserved.
Fig. 8.2 Expected failure rate for fixed allocation targets ρ: Neyman allocation, RSIHR allocation, and urn allocation, for p_B = 0.1, 0.5, 0.7, p_A ≥ p_B. From Hu and Rosenberger (2003), reprinted with permission from The Journal of the American Statistical Association, copyright 2003 by the American Statistical Association, all rights reserved.
Fig. 8.3 Asymptotic variance of N_A(n)/n (multiplied by n) for the four response-adaptive randomization procedures in Table 8.1, p_B = 0.1, 0.5, 0.7, p_A ≥ p_B (RPW not plotted for p_A + p_B ≥ 3/2). From Hu and Rosenberger (2003), reprinted with permission from The Journal of the American Statistical Association, copyright 2003 by the American Statistical Association, all rights reserved.
Table 8.1 Asymptotic and simulated mean of the allocation proportions N_A(n)/n for the randomized play-the-winner (RPW) rule, drop-the-loser (DL) rule, and doubly-adaptive biased coin design (DBCD), each targeting ρ = q_B/(q_A + q_B). Simulations based on n = 100 and 1000 replications. From Hu and Rosenberger (2003), reprinted with permission from The Journal of the American Statistical Association, copyright 2003 by the American Statistical Association, all rights reserved.
(p_A, p_B)     ρ      RPW    DL     DBCD(γ = 0)  DBCD(γ = 2)
(0.8, 0.8)    0.50   0.50   0.50   0.50         0.50
(0.8, 0.7)    0.60   0.59   0.60   0.57         0.57
(0.7, 0.5)    0.63   0.61   0.62   0.61         0.60
(0.7, 0.3)    0.70   0.68   0.69   0.68         0.68
(0.5, 0.5)    0.50   0.50   0.50   0.50         0.50
(0.5, 0.2)    0.62   0.60   0.61   0.61         0.61
(0.2, 0.2)    0.50   0.50   0.50   0.50         0.50
Rosenberger and Hu (2004) also investigated the accuracy of the asymptotic approximations for moderate sample sizes using simulation. The results appear in Tables 8.1 and 8.2, and are based on n = 100 and 1000 replications. For the urn designs (DL and RPW) they started with five balls of each type in the urn. For the DBCD, they began with five patients assigned to each treatment and then implemented the rule for 90 more patients. Again, only the urn allocation was explored, to allow comparison of the urn designs to the DBCD. The DBCD with γ = 2 attained the target allocation quite accurately, and the simulated and asymptotic variances agreed quite well. While the simulated variance of the DL rule was substantially smaller than that of the DBCD in some cases, the simulated allocation was far from ρ, except when ρ = 0.5. The DL rule seemed quite sensitive to the initial urn composition. Convergence properties were quite good when the sample size was increased to n = 500. The simulated results for the RPW and the DBCD procedure with γ = 0 were quite far from the asymptotic values.

Table 8.3 demonstrates the simulated power of the usual test statistic

    Z = (p̂_A − p̂_B) / √( p̂ q̂ (1/n_A + 1/n_B) ),

where p̂ = (p̂_A + p̂_B)/2 and q̂ = 1 − p̂, with α = 0.05 (two-sided), for complete randomization, the RPW rule, and the DL rule. Also given are the simulated expected treatment failures and the standard deviation. The RPW rule was started with five balls of each type in the urn. The DL rule had five type A balls, five type B balls, and one type 0 ball. The DL rule is better than the RPW rule in every case, having slightly larger power and fewer expected treatment failures. We see that the DL rule preserves power quite adequately over complete randomization, and in every case
Table 8.2 Asymptotic (A) and simulated (S) variance of the allocation proportions N_A(n)/n (multiplied by n) for the procedures in Table 8.1. Simulations based on n = 100 and 1000 replications. From Hu and Rosenberger (2003), reprinted with permission from The Journal of the American Statistical Association, copyright 2003 by the American Statistical Association, all rights reserved.
               RPW            DL             DBCD(γ = 0)    DBCD(γ = 2)
(p_A, p_B)    A      S       A      S       A      S       A      S
(0.8, 0.8)   N/A    2.29    1.00   0.50    1.63   1.27    1.13   1.15
(0.8, 0.7)   N/A    1.90    0.72   0.45    1.20   1.14    0.82   0.79
(0.7, 0.5)   1.33   0.90    0.35   0.32    0.64   0.66    0.41   0.43
(0.7, 0.3)   0.63   0.51    0.21   0.18    0.42   0.45    0.25   0.27
(0.5, 0.5)   0.75   0.65    0.25   0.23    0.50   0.52    0.30   0.31
(0.5, 0.2)   0.35   0.34    0.13   0.12    0.31   0.32    0.16   0.17
(0.2, 0.2)   0.20   0.19    0.06   0.06    0.22   0.26    0.09   0.11
results in fewer expected failures, ranging from approximately 1 to 6 fewer expected failures. While these reductions may not be dramatic, they are desirable in clinical trials where treatment failures are particularly undesirable. It is clear from these results that there is little reason to use the RPW when the DL rule is available.

The DBCD targeting RSIHR allocation is simulated in Table 8.4. Simulation results show that γ = 2 is a good trade-off that yields almost the same results as γ = ∞ but is slightly better than the sequential maximum likelihood procedure (γ = 0). In every case, the DBCD with γ = 2 has the same or slightly better power than complete randomization, with a smaller expected number of treatment failures. The DBCD targeting RSIHR allocation is slightly more powerful than the DL rule, but the DL rule reduces the expected treatment failures slightly more when the success probabilities are relatively high. The DBCD has fewer expected treatment failures when success probabilities are low. In conclusion, it appears that when ethical considerations dictate that a modest 2-5% reduction in expected treatment failures is desirable, the DL rule and the DBCD (γ = 2) targeting RSIHR allocation are viable options. Both preserve power and reduce expected treatment failures accordingly.

We now return to the formulation in Section 7.3 for K > 2 treatments. We are most interested in a response-adaptive randomization procedure that will target the optimal allocation when w = q. While a closed-form solution is an open problem, Tymofyeyev, Rosenberger, and Hu (2006) simulate the smoothed solution and implement it with the doubly-adaptive biased coin design and γ = 2. These simulations (not shown here) on power and expected treatment failures show that the benefit is considerably more compelling for K = 3 than for K = 2.
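Simulations of the kind reported in Tables 8.3 and 8.4 are straightforward to reproduce. The following Python sketch (illustrative only, not the authors' code) runs the drop-the-loser rule with the initial urn composition described above (five type A, five type B, and one immigration ball) and applies the two-sided Z test; the replication count and random seed are arbitrary choices.

import numpy as np
from scipy.stats import norm

def dl_trial(n, pA, pB, rng):
    balls = [5, 5, 1]                        # type A, type B, immigration
    counts, successes = [0, 0], [0, 0]
    while sum(counts) < n:
        k = rng.choice(3, p=np.array(balls) / sum(balls))
        if k == 2:                           # immigration ball: add one ball of each type
            balls[0] += 1
            balls[1] += 1
            continue
        counts[k] += 1
        if rng.random() < (pA, pB)[k]:
            successes[k] += 1                # success: the ball is replaced
        else:
            balls[k] -= 1                    # failure: drop the ball
    return counts, successes

def z_reject(counts, successes):
    nA, nB = counts
    phatA, phatB = successes[0] / nA, successes[1] / nB
    pbar = (phatA + phatB) / 2
    se = np.sqrt(pbar * (1 - pbar) * (1 / nA + 1 / nB))
    return abs(phatA - phatB) / se > norm.ppf(0.975)

rng = np.random.default_rng(3)
rejections, fails = 0, 0
for _ in range(2000):
    counts, successes = dl_trial(162, 0.9, 0.7, rng)
    rejections += z_reject(counts, successes)
    fails += sum(counts) - sum(successes)
print("power:", rejections / 2000, "mean failures:", fails / 2000)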
Table 8.3 Simulated power and expected treatment failures (S.D.) for complete randomization and two response-adaptive randomization procedures (RPW: randomized play-the-winner; DL: drop-the-loser), 10,000 replications (α = 0.05, two-sided). The sample size was selected that yielded simulated power of approximately 0.90 under complete randomization. From Rosenberger and Hu (2004), reprinted by permission from Society for Clinical Trials.
                     Complete              RPW Rule              DL Rule
p_A   p_B   n      Power  Failures       Power  Failures       Power  Failures
0.9   0.3   24      90    10 (2.4)        87    7 (2.4)         90    7 (1.8)
0.9   0.5   50      90    15 (3.2)        87    12 (3.2)        89    12 (2.6)
0.9   0.7   162     90    32 (5.1)        88    28 (5.4)        89    27 (4.6)
0.9   0.8   532     90    80 (8)          89    75 (9)          89    73 (8)
0.7   0.3   62      90    31 (4.0)        88    28 (4.3)        89    27 (4.1)
0.7   0.5   248     90    99 (7.8)        89    94 (8.2)        89    93 (8.0)
0.5   0.4   1036    90    570 (16)        89    565 (16)        89    565 (16)
0.3   0.1   158     90    126 (5.1)       89    125 (5.4)       90    124 (5.3)
0.2   0.1   532     90    452 (8)         90    451 (8)         90    451 (8)
8.3 CONTINUOUS RESPONSES

Zhang and Rosenberger (2006) compare the four procedures described in Section 7.2 both theoretically (comparing the asymptotic variances) and by simulation. Three of the procedures, the sequential maximum likelihood procedure (i.e., the DBCD with γ = 0), Zhang and Rosenberger's (2006) procedure, and Biswas and Mandal's (2004) procedure, have similar asymptotic variances and are all directly comparable. However, Bandyopadhyay and Biswas's procedure has variability several magnitudes larger, and consequently has very low power. The lower power was noted, in fact, in their original paper (Bandyopadhyay and Biswas, 2001). They also compute the total expected response, n(ρ μ_A + (1 − ρ) μ_B), and compare the remaining three procedures to determine which is the most ethically appealing. In that regard, both Zhang and Rosenberger's procedure and Biswas and Mandal's procedure appear to be the best.

Based on these theoretical considerations, they then simulate the four procedures with moderate sample sizes to determine if the asymptotic relationships hold in practice. They chose different combinations of the parameters μ_A, μ_B, σ_A, σ_B. For Zhang and Rosenberger's procedure, γ = 2 was used; in Biswas and Mandal's procedure, c = 0 was chosen; and in Bandyopadhyay and Biswas's procedure, T = 2 was chosen, following the original papers. Table 8.5 shows the simulated total expected responses and corresponding power. Sample sizes are selected such that complete randomization gives 80% power. For total expected response, we can see that they agree with our theoretical discussion.
Table 8.4 Simulated power and expected treatment failures (S.D.) for complete randomization and the DBCD with γ = 0, 2, ∞, 10,000 replications (α = 0.05, two-sided). The sample size was selected that yielded simulated power of approximately 0.90 under complete randomization. From Rosenberger and Hu (2004), reprinted by permission from Society for Clinical Trials.
                     Complete             γ = 0                γ = 2                γ = ∞
p_A   p_B   n      Power  Failures      Power  Failures      Power  Failures      Power  Failures
0.9   0.3   24      90    10 (2.4)       89    8 (2.1)        91    8 (1.7)        92    8 (1.5)
0.9   0.5   50      90    15 (3.2)       90    14 (2.9)       91    13 (2.6)       91    14 (2.5)
0.9   0.7   162     90    32 (5.1)       90    31 (4.9)       90    31 (4.8)       91    31 (4.8)
0.9   0.8   532     90    80 (8)         90    79 (8)         91    79 (8)         91    79 (8)
0.7   0.3   62      90    31 (4.0)       90    29 (3.9)       90    28 (3.5)       90    28 (3.4)
0.7   0.5   248     90    99 (7.8)       90    97 (7.6)       90    97 (7.5)       90    97 (7.4)
0.5   0.4   1036    90    570 (16)       90    567 (16)       90    567 (16)       90    567 (16)
0.3   0.1   158     90    126 (5.1)      89    122 (5.5)      90    122 (5.4)      90    122 (5.3)
0.2   0.1   532     90    452 (8)        90    448 (9)        90    448 (9)        90    448 (8)
In all cases Procedure II has a smaller total expected response than Procedure V and, in most cases, has a smaller total expected response than Procedures I and III. Procedures I and III have a larger total expected response than Procedure V. Procedure IV reduces the total expected response most. On the other hand, the large loss of power of Procedure IV is obvious. For example, in the second row, while all other procedures have power of at least 80%, the power of Procedure IV is just 53%. The other three procedures have similar power and are usually more powerful than Procedure V.

We remark that in Procedures III and IV there is no clear guideline on how to choose the constants T and c, but these constants can have a dramatic effect on the performance of the procedure. For example, in Procedure III, if the standard deviations are small, then a small change in c will lead to a large change in the allocation probability.

In conclusion, the Zhang and Rosenberger and the Biswas and Mandal procedures provide the best balance between power and ethical concern. For the latter, careful selection of c is an important consideration. Both procedures increase power over complete randomization, which should provide for a smaller requisite sample size. Bandyopadhyay and Biswas's procedure should not be used because of the large loss of power.
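The sensitivity of the Biswas and Mandal allocation to the constant c, noted above, is easy to see numerically. The following short Python sketch (illustrative only) uses the allocation form sketched in Section 7.2, which is itself a reconstruction; with small standard deviations, moving c across the two means changes the target allocation sharply.

from scipy.stats import norm

def bm_target(ma, mb, sa, sb, c):
    pa, pb = norm.cdf((ma - c) / sa), norm.cdf((mb - c) / sb)  # Pr(response > c)
    return sa * pb ** 0.5 / (sa * pb ** 0.5 + sb * pa ** 0.5)

for c in (13.0, 14.0, 15.0):
    print(c, round(bm_target(13, 15, 0.5, 0.5, c), 3))  # roughly 0.59, 0.87, 0.99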
Table 8.5 Power and total expected response (in parentheses) for five procedures with 10,000 replications. I: sequential maximum likelihood; II: Zhang and Rosenberger; III: Biswas and Mandal; IV: Bandyopadhyay and Biswas; V: complete randomization. Reprinted from Zhang and Rosenberger (2006).
(n, μ_A, μ_B, σ_A, σ_B)        I              II             III            IV             V
(88, 13, 15, 4.0, 2.5)    0.82 (1211)    0.82 (1207)    0.82 (1210)    0.77 (1177)    0.80 (1232)
(88, 13, 15, 2.5, 4.0)    0.83 (1254)    0.80 (1231)    0.82 (1253)    0.53 (1169)    0.79 (1232)
(350, 14, 15, 4.0, 2.5)   0.82 (5034)    0.83 (5031)    0.82 (5034)    0.82 (5010)    0.79 (5075)
(350, 14, 15, 2.5, 4.0)   0.82 (5116)    0.79 (5076)    0.81 (5116)    0.72 (4993)    0.80 (5075)
(350, 16, 15, 4.0, 2.5)   0.83 (5466)    0.80 (5425)    0.82 (5466)    0.73 (5342)    0.80 (5425)
(350, 16, 15, 2.5, 4.0)   0.83 (5383)    0.83 (5382)    0.82 (5384)    0.82 (5362)    0.80 (5424)
(88, 17, 15, 4.0, 2.5)    0.82 (1429)    0.81 (1407)    0.82 (1429)    0.51 (1346)    0.80 (1408)
(88, 17, 15, 2.5, 4.0)    0.82 (1387)    0.82 (1383)    0.82 (1387)    0.77 (1352)    0.79 (1408)
8.4 THE EFFECTS OF DELAYED RESPONSE
Zhang and Rosenberger (2006) also compare the effects of delayed response on the power and total expected responses for their procedure. They assume that the times at which patients are randomized into the trial follow a uniform distribution on [0, 1] with arbitrary unit, and that the delays in responses are distributed as exponential with parameters θ_A and θ_B. Response-adaptive randomization starts when each treatment has at least two responses, for appropriate estimation of parameters. Several different combinations of θ's are selected. Table 8.6 gives the simulation results for μ_A = 13, μ_B = 15, σ_A = 4.0, σ_B = 2.5, n = 88. There is no delay when θ_i = 0, i = A, B.

We can see that the delay in responses has little effect on the power. Allocation proportions are slightly larger when there is no delay than when there is delay. On the other hand, the variance of the allocation proportion is also larger, because the allocation probability changes whenever a new patient enters. The total expected response is smaller, with a larger variance, when there is no delay than when there is delay. As the average delay increases, the variance of the allocation proportion and the total expected responses also increase. When the delays on the two treatments are different, the effect depends on the treatments. In the simulation, since μ_A < μ_B, if a patient on treatment A has a larger delay, then the allocation proportion is smaller with smaller variance, and the total expected response is larger with smaller variance.

The simulation indicates that when the delay is moderate (when 60% or more of already randomized patients' responses are available), the power of the trial is negligibly affected. The allocation proportion skewing is reduced with reduced variability. The total expected response is accordingly increased with reduced variability. But all these changes are marginal.
Table 8.6 Effect of delayed response on the allocation proportions and the total expected response (denoted TER), using the Zhang and Rosenberger procedure with 10,000 replications. Reprinted from Zhang and Rosenberger (2006).

θ_A   θ_B   Power   E(N_A(n)/n)   Var(N_A(n)/n)   TER       Var(TER)
0.0   0.0   0.82    0.65          0.0063          1205.43   194
0.1   0.1   0.82    0.64          0.0033          1207.31   101
0.2   0.2   0.82    0.64          0.0034          1208.03   106
0.3   0.3   0.82    0.63          0.0037          1208.27   113
0.4   0.4   0.82    0.63          0.0040          1208.88   123
0.1   0.2   0.82    0.64          0.0034          1207.18   104
0.2   0.3   0.82    0.64          0.0035          1207.74   109
0.2   0.1   0.82    0.64          0.0032          1208.04   100
0.3   0.2   0.82    0.63          0.0035          1208.46   108
0.2   0.4   0.82    0.64          0.0038          1207.69   119
0.2   0.6   0.82    0.64          0.0043          1207.51   133
0.4   0.2   0.82    0.63          0.0036          1209.33   112
0.6   0.2   0.83    0.63          0.0038          1210.36   119
8.5 CONCLUSIONS
We conclude, based on these limited simulation results, that response-adaptive randomization preserves power and often results in lower treatment failure rates. The resulting effect is conservative, and the relative severity of the disease and the outcome will certainly be factors in deciding whether to use response-adaptive randomization. When more than two treatments are compared, the benefits seem to be magnified, and more patients will benefit from the use of response-adaptive randomization. Finally, based on limited simulations, the effect of delayed response seems minimal and validates our theoretical results in Section 7.1.
8.6 REFERENCES
BANDYOPADHYAY, U. AND BISWAS, A. (2001). Adaptive designs for normal responses with prognostic factors. Biometrika 88 409-419.
BISWAS, A. AND MANDAL, S. (2004). Optimal adaptive designs in phase III clinical trials for continuous responses with covariates. In mODa 7-Advances in Model-Oriented Design and Analysis (Di Bucchianico, A., Lauter, H., and Wynn, H. P., eds.). Physica-Verlag, Heidelberg, 51-58.
HU, F. AND ROSENBERGER, W. F. (2003). Optimality, variability, power: evaluating response-adaptive randomization procedures for treatment comparisons. Journal of the American Statistical Association 98 671-678.
ROSENBERGER, W. F. AND HU, F. (2004). Maximizing power and minimizing treatment failures in clinical trials. Clinical Trials 1 141-147.
ROSENBERGER, W. F. AND LACHIN, J. M. (2002). Randomization in Clinical Trials: Theory and Practice. Wiley, New York.
TAMURA, R. N., FARIES, D. E., ANDERSEN, J. S., AND HEILIGENSTEIN, J. H. (1994). A case study of an adaptive clinical trial in the treatment of out-patients with depressive disorder. Journal of the American Statistical Association 89 768-776.
TYMOFYEYEV, Y., ROSENBERGER, W. F., AND HU, F. (2006). Implementing optimal allocation in sequential binary response experiments. Submitted.
ZHANG, L. AND ROSENBERGER, W. F. (2006). Response-adaptive randomization for clinical trials with continuous outcomes. Biometrics 62 562-569.
9 Incorporating Covariates
9.1 INTRODUCTION AND EXAMPLES
With the complexity of modern clinical trials, information on sometimes thousands of covariates is collected in addition to information on the primary outcome. The problem of heterogeneity in response-adaptive randomization has been pointed out (e.g., Rosenberger, 1996). It may not be acceptable to base the allocation probabilities only on responses of previous patients if those patients have different characteristics, particularly when those characteristics may be correlated with the primary outcome. For this reason, we describe the concept of covariate-adjusted response-adaptive (CARA) randomization in this chapter. These procedures calculate the allocation probabilities on the basis of previous responses and the current and past values of certain known covariates of the patients. CARA randomization differs from covariate-adaptive randomization in its goals. The latter is interested in equalizing the distribution of certain known covariates across treatments in order to improve the comparability of treatment groups. In this section, we will describe several examples of each type of randomization procedure.
Covariate-adaptive randomization procedures
The prime concern of the following examples is to balance the distribution of certain known covariates across treatment groups. These allocation procedures are not response-adaptive. The first three examples consider covariates that are divided into different strata. In general, let 2 1 , ...)2, be the covariate vectors ofpatients 1, ...)n. Further, we assume that there are I covariates of interest (continuous or otherwise) 135
136
INCORPORATING COVARIATES
'
and the are divided into ni levels, i = 1, ...,I . The total number of strata are = ni, which becomes quite large even for small I. Most of the procedures described focus on marginal balance, where balance is maintained at the C:=ln, separate levels rather than at the stratum-specific level.
s
n,=,
EXAMPLE 9.1. (Zelen, 1974.) Zelen's rule uses a preassigned randomization sequence (by either complete randomization or some restricted randomization design), ignoring covariates. Let N j k ( n )be the number of patients in stratum j = 1, ...,S on treatment k = 1 , 2 . When patient n in stratum j is ready to be randomized, one computes D j ( n - 1) = N j l (n- 1)- N j 2 (n- 1).For an integer c, if 1 0 3 (n- 1)1 < c, then the n-th patient is randomized according the preassigned randomization sequence, otherwise, the patient receives treatment 1 according to = E(T,1lTn-1,2,)= 1, if = 0, if
D j ( n - 1) 5 -c, D j ( n - 1) rc.
Zelen proposes that c = 2 , 3 , or 4. He also proposes randomizing the value of c for each new patient to increase the randomness of the allocation. For example, one can use the distribution Pr(c = i) = 1/4, i = 2 , 3 , 4 , 5 .
EXAMPLE 9.2. (Pocock and Simon, 1975.) See Section 1.1.4. EXAMPLE 9.3. (Wei, 1978.) Wei roposed a different procedure using urns. At the beginning of the trial, each of ki urns contain a1 balls of type I and a2 balls of type 2. Each urn represents a particular level of a particular covariate. Let Y , j k ( n ) be the number of balls of type k in the urn representing level j of covariate zi after n patients have been randomized. Let Dij(n) = Y,jl(n)- Kj2(n). Patient n 1 is to be randomized with observed covariate vector ( z 1 , ...,z k ) . Select the urn such that Dizi( n )is maximized. Draw a ball and replace. If it is a type 1 ball, assign the patient to treatment 1, otherwise to treatment 2. The procedures in Examples 9.1-9.3 are arbitrary in the sense that they are developed intuitively rather than based on some optimal criterion. Atkinson (1982) proposed an allocation rule based on optimizing the efficiency of estimating parameters from a covariate-adjusted linear model. The idea of using a linear regression model to relate the covariate and treatment effects and create a randomization procedure is the basis of C A M randomization procedures, which will be discussed shortly. EXAMPLE 9.4. (Atkinson, 1982.) For illustration, we only consider the linear model Xi= Zip' e i , where ei are independent with u2 = var(ei) and p is ap-vector. The variance of the least square estimator of p is
2
+
+
where 2 is the design matrix. Optimum design of experiments is concerned with choosing the design matrix Z which minimizes different functions of the variancecovariance matrix of 0 (e.g.,Atkinson and Donev, 1992).
INTRODUCTION AND EXAMPLES
137
One well-studied criterion is that of D-optimality in which the determinant 2‘2 is maximized. This minimizes the generalized variance of b. Now let represent the general design measure and define
<
d(z,<) = a2z(Z’Z)-’x’and J(<) = s u p r E z d ( x , < ) ,
where 2 is the design region of x . The G-optimal design is the one which minimizes
JW.
The General Equivalence Theorem of Kiefer and Wolfowitz (1960) shows the equivalence of D-optimality and G-optimality. This equivalence is used to generate sequential D-optimal designs. Given a design with m design points (with design matrix &), the (m 1)-th point is added at the point x with maximum d ( x , &). The resulting sequential design is nonrandomized. Atkinson (1982) suggests a randomized version of the procedure, although it is not optimal in a formal sense. Consider a K treatment clinical trial. The design region 2 consists of K points, the k-th of which corresponds to the k-th treatment. Let the corresponding value of d(z,<) be d(k, t).The D-optimum biased coin design is then to choose treatment k for the n-th patient with probability
+
In some applications, only some linear combinations of the parameters A‘P are of interest (see examples from Atkinson, 1982, and Atkinson, 1999), where A is an s x p matrix of rank s < p. The variance-covariance matrix of A’$ is then a2A’(Z’Z)-’A.The analogue of D-optimality is to maximize the determinant o - ~ ( A ’ ( Z ’ Z ) - ’ A ) -this ’ ; is the DA-optimality criterion. For DA-optimality, the analogue of d(z,<) is the quantity dA(x, <) = z( Z ’ Z )
-’A(A’(z’z)-’ A)-’ A’(2’2)-‘z’.
Sequential DA-optimum designs are generated by choosing the next design point with the maximum of dA(%,<). Let the corresponding value of dA(z,<) be d ~ ( k , [ ) The . DA-optimum biased coin design is then to assign treatment k to the n-th patient with probability
This example leads naturally to the concept of C A M randomization. With a linear model, the D-optimal design can be computed and implemented under the assumption of homoscedastic variances. The resulting design will force balance among the treatments with respect to the covariates, and hence the resulting designs will be analogous to that of Examples 9.1-9.3. Only the imposition of the different randomization procedures will result in potentially different allocations. Atkinson (1982) considered only homoscedastic linear models. But if one were to introduce
138
INCORPORATING COVARIATES
heteroscedasticity through a model that is nonlinear in the parameters, such as logistic regression, the resulting D-optimal design would no longer balance covariates across treatments. The resulting design would also be a function of the unknown regression parameters and could not be implemented in practice. Hence, to accommodate these designs, the unknown parameters must be sequentially estimated, and thus the procedure becomes response-adaptive because the parameter estimates will be functions of previously observed responses and covariate values. In fact, they are therefore covariate-adjustedresponse-adaptiverandomization procedures. Imposing a CAR4 procedure to optimize the efficiency of parameter estimates will not provide an optimal procedure because of the introduction of randomization. Also, the resulting allocation may result in assigning more patients to the inferior treatment, much like the analogous situation in the case without covariates, Neyman allocation. Consequently, we must again be concerned about the trade-offs among balance, efficiency, randomization, and ethics, which has essentially been the subject of this book. 9.1.2
CARA Randomization Procedures
We now give several examples of CARA randomization procedures. The key component of these procedures is that a patient’s covariate vector is observed, and then randomization is made according to an allocation function that depends on all previous responses, all previous patients’ covariates, and the current patient’s covariate vector. There have been only two or three papers dealing with this CARA randomization, and the theoretical results have been largely elusive.
EXAMPLE 9.5. (Rosenberger, Vidyashankar, and Aganval, 2001 .) See Section 1.1.5. EXAMPLE 9.6. (Bandyopadhyay and Biswas, 2001.) This example extends Example 5.4 to incorporate covariates. For comparing two treatments in a clinical trial, suppose the responses follow a linear model
X , = p1T2,1+ pz(1- T,J)
+ Z,P + el,
where e, are independent with cr’ = var(e2), is a p-vector, p1 and 112 are treament effects, and 2, are the corresponding covariates. Estimates fijl and ji,z are computed based on the first j observations. Then the n-th patient is assigned to treatment 1 with probability
4nl = 9
pn-1,1
- Pn-I,’
where T is some constant and fi,-l,j is the estimate of pg based on n - 1patients. It is easy to see that the allocation does not depend on Z,, although it does depend on &-I, Therefore it is not a CARA randomization procedure in the sense that we have defined it.
EXAMPLE 9.7. (Gwise, 2005.) In his thesis, Gwise has considered D-optimal C A M randomization procedures for comparing two treatments with a binary co-
GENERAL FRAMEWORK AND ASYMPTOTIC RESULTS
139
variate. Assume that
where zi is a binary variable taking the values 0 or 1. We assume that the errors ei(k, zi)are independent and normally distributed with mean 0 and variance u i l ,k = 1,2; 1 = 0 , l . When uil is known, then the D-optimal design can be computed exactly, but it is nonrandomized. Using Atkinson’s (1982) approach, we can define the following randomization procedure:
where N k ~ ( nis)the number ofpatients assigned to treatment k = 1 , 2 with covariate value 1 = 0 , l after n patients in the trial. When the ail’s are unknown, we can estimate them based on the n observations and substitute the appropriate estimates into (9.1). 9.2
GENERAL FRAMEWORK AND ASYMPTOTIC RESULTS
Zhang et al., (2006) propose a general framework for C A M randomization procedures. Consider a clinical trial with K treatments, and assume that a patient with a covariate vector 2 is assigned to treatment k, k = 1,. ,K, and the observed response is xk. Assume that the responses and the covariate vector satisfy the following model:
..
.
where q,+(.,.), k = 1,... ,K, are some given functions, O k , k = 1,. . ,K, are unknown parameters, and ~k c I R is ~ the parameter space of O k . Write 8 = (61,. . . , O K ) and 0 = O1 x . . x O K .This model is very general, and includes the generalized linear models of McCullagh and Nelder (1989) and certain nonlinear models as special cases. We will discuss the special case of generalized linear models, which will be the most useful in applications, in Section 9.3. As in Chapter 1, T, = (Tml,...,T m ~represents ) the assignment of treatment to the m-th subject. If the rn-th subject is allocated to treatment k,then all elements in T, are 0 except for the k-th component, T m k = 1. Let { Xmk, k = 1, , K , rn = 1 , 2 . . .} be the responses and {Z,, m = 1 , 2 . . .} be the corresponding covariates. Thus X,k is the response of the rn-th subject to treatment k, k = 1,. ,K and 2 , is the covariate of the rn-th subject. In practice, only Xmk with T m k = 1is observed. Assume that { (Xml,.. .,X,K, Z m ) ,m = 1 , 2 , . . .} is a sequence of i.i.d. random vectors, the distributions of which are the same as that of (XI,.. . ,X K ,2).Denote X m = (Xml, * ,XrnK).
.
... ..
--
140
INCORPORATING COVARIATES
9.2.1 The procedure for K treatments Now we rigorously define the C A M randomization procedure. To start, assign mo subjects to each treatment by using restricted randomization. Assume that m (m 2 Kmo) subjects have been assigned to treatments. Their responses { X j , j = 1,. . . ,m }and the corresponding covariates { Z j , j = 1,. . . ,m}are observed. Let 8, = (8,1, ...)6,~) be the estimator of 8 = (81,...)OK). Here, for each k = 1,.. . ,I(, Bmk = emk(xjk, zj: Tjk= 1 , j= 1,... ,m)is the estimator of 8 k based on the observed sample { ( X j k ,Z j ) : for which Tjk = 1 , j = 1 . .. , m } of size Nmk. When the (m 1)-th subject (with corresponding covariate Z,+1) is ready for randomization, we assign the (m 1)-th patient to treatment k with a probability of & + l , k = .rrk(8,, Zm+l),k = 1 , .. . ,K . Thus,
+
dh+i,k
=
+
(Tm+i,kIFm, z,+i)
= E (Tm+i,klXm,?;n,&+i)= ~ l i ( e mZ,m + i ) ,
(9.2)
k = 1,..., K , where X, = o(X1,...,X , ) , 7, = u (Tl,...,Tm), 2, = o(Z1,. , Z,), 3, = o ( X m , l m 2,), , and nk(-,.), k = 1,...,K , are some given
..
functions. Note that unlike the formulation in Chapter 1, for mathematical convenience later, we take the current patient’s covariate vector, Zm+l out of 3,. Given Fm,the response X m f l of the ( m 1)-th subject is assumed to be independent of its assignment T,+1. Define rr(-,-) = ( T I ( - ,.), . . . ,n~(.,.)) to be theallocationfunctionthatsatisfiesnl+. ..+KK = 1. Letgk(8’) = E[nk(B*,Z ) ] . From (9.2), it follows that
+
Pr(Tm+l,k =lIXm,?-,,&)
k = L...,K.
=gk(e,),
(9.3)
Z ) = &(elz’,. .. , 6 K Z ‘ ) , REMARK9.1. In general, we can choose .rrk(e, k = 1,.. . ,K , which includes a large class of functions. Here, 0 < & ( z ) < 1, k = 1,.. . ,K , are real functions that are defined in RK with K
& ( z ) = 1 and & ( z ) = R j ( z ) whenever
zi
= zj.
(9.4)
k=l
For simplicity, it is assumed that Z and O k , k = 1,. .. K , have the same dimensions. Otherwise, slight modifications are necessary (see Section 9.3 for an illustration). In practice, the functions Rk can be defined as )
where G is a smooth positive real function that is defined in R. For example, we can define Ilk(%) = e T z k / ( e T z l ... eTzK), k = 1,. . . , K .
+ +
REMARK 9.2. When K = 2, we can let I 2 1 (21) 2 2 ) = G(z1- 2 2 ) and n2 (21, z2) = G(22- zl), where G is a real function defined on R satisfying G(0) = 1/2, G(- 2 ) = 1 - G(z) and 0 < G ( z ) < 1 for all z.
GENERAL FRAMEWORK AND ASYMPTOTIC RESULTS
141
EXAMPLE 9.5 (CONTINUED).For the logistic regression model, Rosenberger, Vidyashankar, and Agarwal(200 I ) suggested using the estimated covariate-adjusted odds ratio to allocate subjects, which is equivalent to defining &( 21, 2 2 ) = e z k /( e z l e z z ) ,k = 1 , 2 (see allocation rule (9.21)).
+
EXAMPLE 9.6 (CONTINUED).For the normal linear regression model, Bandyopadhyay and Biswas (2001) suggested choosing G(z) = @ ( z / T )where , @(.)is the cumulative normal distribution and T is a tuning parameter. 9.2.2
Main theoretical results
In this section, we present the main theoretical results for Zhang et aL's (2006) treatment of CARA randomization. This representsjust a sketch of very recent work; details, including proofs of the main theorems, have been submitted for publication. We define the following notation. Let 8' be the parameter vector of interest and 8 be the true value. For fixed 2 = x , let r(@, z) = (.lrl(8*,z),.. . ,r K ( 8 * ,x ) ) be the allocation function. Defineg(8') = (gl(8*),.. .,g K ( 8 ' ) ) and let w k = g k ( 8 ) = E(nk(8,Z ) ]be the expectation ofthe allocation function over 2,k = 1,. . . ,K, and let v = (vl,.. . , v ~ ) .We assume that 0 < v k < 1, k = 1,.. . ,K . For further discussion, we need the following conditions about the allocation function r ( 8 * z), , g(8*) and the estimator 6. ASSUMPTION9.1. We assume that the parameter space @ k is a bounded domain in Rd and that the true value 8 k is an interior point Of @ k , k = 1,.. ,I ( .
.
.
1. For each fixed x , r k ( 8 * , x ) > 0 is a continuous function of 8*, k = 1,. .,K. 2. For each k = 1,. . . , K , .lrk(O*, 2)is differentiable with respect to 8' under the expectation, and there is a S > 0 such that
where dgk/d8* = (dg/d6il,. ..,dg/i%kd). ASSUMPTION 9.2. Suppose that
2
+ o(n-1'2) as., (9.5) where hk are K functions with E [ h k ( X k , Z ) l z ] = 0,k = 1,... , K . THEOREM 9.1. If E l l h k ( X k , Z)112+E< for some E > 0, k = 1,.. .,K, then 8,k
- ek = 1
+
T m k h k ( X m k , z,) (1 o(1))
m=l
00
under Assumptions 9.1 and 9.2, we have for k = 1,.. . , K , P r ( T n , k = 1)
and
-$
2)k;
P r ( T n , k = 1lFFm-1, 2, = %)
--$
? f k ( 8 ,%) a.s.
(9.6)
142
INCORPORATING COVARIATES
and C = El
+ 2x2. Then,
f i ( N , / n - v ) + N ( 0 ,C ) and
fi(k - 8 )
+
N(O,V )
(9.8)
in distribution.
BASICOUTLINE
OF THE PROOF.
From the definition of the randomization proce-
dure, we have Tm+l,k
= Tm+l,k - ~
+
[ ~ m + l , k ~ gmk (I8 r n )
for k = 1,.. . ,K. Therefore, Nn,k
- lZVk = E[Tl,kl3b]- wk +
n
(Tmk
- E[Tmk1Fm-1])
m=l II-
1
m=l
The term x:=l(Tmk - EITmkl.Frn-l])is a martingale sum, so we can apply asymptotic properties to this term. Now
m= 1
m=l
which can be approximated by another martingale sum. Then we can use the martingale central limiting theorem to obtain (9.8). 0
REMARK9.3. Assumption 9.1 is widely satisfied in application. If both r ( 8 * z, ) and g(O*) have second derivatives with respect to 8*, then Assumption 9.1 is true. Assumption 9.2 depends on different estimation methods. When 8, is a maximum likelihood estimator from a generalized linear mode!, this condition is usually satisfied. Examples can be found in Section 9.3. If 8, is an estimator from some estimating functions (including moments estimators), Assumption 9.2 is true under some standard conditions. Theorem 9.1 provides the asymptotic properties of the allocation proportions N n / n ,the overall proportion of patients to each treatment, regardless of covariate values. In many applications, one may want to know the allocation proportions for
GENERAL FRAMEWORK AND ASYMPTOTIC RESULTS
143
a given set of covariates. Given a covariate vector z, the proportion of subjects assigned to treatment k is given by
where Nn,klzis the number of subjects with covariate z randomized to treatment k, k = 1, . ..,K , in the n trials, and N n ( z )is the total number of subjects with covariate z. Write Nnlz = ( N n , l ~ z,, .,N n , ~ l Z )The . following theorem establishes the asymptotic properties of these proportions.
.
THEOREM 9.2. Given a covariate z, suppose that P r ( 2 = z ) > 0. Under Assumptions 9.1 and 9.2, we have
k = 1,. . . , K , Nn,klz/Nn(%) + n k ( 8 , r ) a.s. and
in distribution, where
BASICOUTLINE
OF THE PROOF.
Because P r ( Z = z ) > 0,
almost surely by the law of large numbers. Further,
c n
TmkI{Zm = z } =
m=l
The first term is a martingale sum. Now
C nk(bm-l,z ) Pr(Z, n
m= 1
= z ) - nnk(e), z )Pr(Zm = z )
can also be approximated by a martingale sum. 0
(9.10)
144
INCORPORATING COVARIATES
REMARK9.4. When P r ( 2 = z ) > 0, Theorem 9.2 ensures that the conditional allocation proportion has a predetermined limit r ( 0 ,z ) . In practice, one can choose this r ( 0 ,z ) based on the desired goal. For example, one can use different r ( 0 ,z ) for male and female patients. Some choices of r ( 0 ,z ) can be found in Examples 9.5-9.7. REMARK 9.5. From the results of Theorems 9.1 and 9.2, we can see that g ( 0 ) = E n ( @2). , The overall allocation proportion g ( 0 ) is difficult to control, because it depends on the function r and the distribution of the covariate 2. When the response-adaptive randomization does not involve covariates, we can control the target allocation. For the randomization in Bandyopadhyay and Biswas (200 I), since the allocation probability of the rn-th patient &j does not depend on Z m + l , it is not a C A M randomization procedure. Therefore, the asymptotic result does not depend on the covariates.
9.3
GENERALIZED LINEAR MODELS
In this section, we consider a special case when the responses are related to the treatment effect and covariates via a generalized linear model. This leads to particular studies of the standard logistic regression model and the linear regression model. Suppose, given 2, that the response Xk of treatment k = 1,...K has a distribution in the exponential family under a generalized linear model (e.g., McCullagh and Nelder, 1989) and takes the form
with link function Pk = hlc(ze;),where ek = (ekl,..., ekd),k = 1,...,K , are parameters. Assume that the scale parameter $Jk is fixed. With this model, E [ X k l Z ]= u k ( p k ) , Var(Xkl2) = ug(p,k)$~k.We compute the derivatives as
and
Thus, given 2, the conditional Fisher's information matrix is
GENERALIZED LINEAR MODELS
145
For the observed data up to stage rn, the likelihood function is
nn
K
k=l j = 1
k= 1
m
L(O) =
K
Ifk(XjklZj, ek>l*jk
j=1 k = l K
m
with
j=1
The maximum likelihood estimator
8,
= ( 8 m 1 , .. . , bmK)
of 8 = (el,. . . ,OK)is that for which 8, maximizes L ( 8 ) over 8 E O1x Equivalently, brnk maximizes L k over Ok E @ k t k = 1 , 2 , , . , K.
.
.. x O K .
.
COROLLARY 9.1. Let vk = E[nk(O,Z)],k = 1,.. ,K. Define I k
Z)Ik(8klZ)},k = 1,.. ., K .
= Ik(8) = E{A&
(9.13)
Suppose that a:, hg are continuous and EZ'Z < 00. Further, if the matrices Ik, k = 1,2 . . ,I(, are nonsingular and the maximum likelihood estimator 8, is unique, then under Assumption 9.1, we have (i) (9.6), (9.7), and (9.8) hold with vk = I;', k = 1,.. . , K; (ii) if P r ( Z = z) > 0 for a given covariate z,then (9.10) and (9.11) hold,
.
Corollary 9.1 is a special case of Theorems 9.1 and 9.2. The proof is given in Appendix B through the verification of Assumption 9.2. For both logistic regression and linear regression, the conditions in Corollary 9.1 are satisfied.
REMARK9.6. From Corollary 9.1, one can obtain
6@ n , k - O k )
+
N
( O , ~ k { E [ ~ k ( ~ , Z ) ~ k ( ~ k l Z ) l } (9.14) -l)
...
in distribution, k = 1, ,K. It is important to note that the asymptotic variances are different from those of generalized linear models with a fixed allocation procedure. For the fixed allocation case, we have
- ek) N (0, { -+
~ i ~ ~ ( ~ ~ i ~ ) i } - l )
(9.15)
in distribution, k = 1,. . . ,K . However, when the allocation functions n k ( 8 , Z ) do not depend on 2, then T k ( 8 , Z )= gk(8) = Vk, and so (9.14) and (9.15) are the same.
146
INCORPORATING COVARIATES
Our asymptotic variance-covariance matrix of 8, is also different from that in Theorem 2 of Baldi Antognini and Giovagnoli (2005), because the allocation probabilities in their study do not depend on the covariates. In Example 9.6, the design is not covariate-adjusted, so the asymptotic properties of 8 are the same. In Examples 9.5 and 9.7, the estimators have different asymptotic properties.
REMARK9.7. When the distribution of 2 and the true value of 8 are known, the values of v = +(e, Z ) ] ,d g / d e k = E[dsr(B,z)/a8k) and I k in (9.13) can be obtained by computing the expectations, and then the values of the asymptotic variance-covariance matrices V, C , and C , can be obtained. In most situations, the distribution of Z and the true value of 6 are unknown. However, we can obtain the estimates as follows.
(c"
(a)EstimateIk byi,,k = T then the estimator of V is V , = diag
1-1
ii,:,. . . ,i,$) .
m=l
(b) Estimate
EL=, T m k I k ( d n , k l Z m ) r
1~ = 1 , 2 , . . . , K ;
and a g / d 8 k , respectively, by
(d) For a given covariate z, we can estimate C , by
c,
= diag(n(B,, %)) - 7 r ( B , , 2#{m 5 n : Zm= z }
+
% ) ' X ( B , , %)
n
When $k is unknown, we can estimate I k in the same way after replacing ?,bk with its estimator &.
REMARK9.8. In some applications, one may want to test the homogeneity among treatmcnts, that is, HO : 81 = 0 2 = ... = OK versus H
1
: not all e
k
are equal.
(9.16)
To do this, we define
eC = (01 and
-0 K,...
..
- e K ) , e = ( e n , l - @ n , K , . .. , B , , K - 1 - 8 , , K ) ,
v c= d i a g ( I ; ' ,
*C
4 . .
,IKL1) + 1'1 8 I&
GENERALIZED LINEAR MODELS
vc= d i a g ( j ; l , . .. ,iiY1)+ 1'1
147
€3
where 1 = (1,. . . ,1). By (9.Q we have
fi(8" - ec)-+ N ( O ,vc) in distribution. Thus, a natural statistic for the test of homogeneity is
nBC{V C-}1
(8")'.
According to the above central limit theorem, we know that the asymptotic distribution of n8{Vc}-'(t?C)'is x ; ~ - under ~ ) ~HO and ~ ? ~ ~ under ~ )H ~ I . The ( q ) noncentrality parameter is = nec{vc}-l(ec)'.
By some computations, we can see that K-1
n-lp =
C(ek - eK)ik(ek - eK)'- C e ( I l + . . . + I ~ ) - ~ c ; ,(9.17)
k=l
where Ce = Cf=;'(Ok - 6 K ) I k . The likelihood ratio test is asymptotically equivalent to this test. We now consider two special cases, the logistic regression model and the linear model.
EXAMPLE 9.8. LOGISTICREGRESSION
MODEL (Zhang et a/.,2006.) Consider the case of dichotomous responses. Let Xk = 1 if a subject on treatment k is a success and 0 otherwise, k = 1,.. . ,K. Let pk = q k ( e k , 2) = Pr(Xk = 112) be the probability of success on treatment k for a given covariate 2, qk = 1 - p k , k = 1,. . . , K. Assume that
lOgit(pk) = f f k -I-&z',
k = 1 , . . . ,K.
(9.18)
Without loss of generality, we assume that f f k = 0, k = 1 , 2 , . ..,K , or alternatively, we can redefine the covariate vector to be (1,Z). For each k = 1,., . ,K , let Pjk = qk(Ok, 2,). With the observed data up to stage m,the maximum likelihood estimator hmk of 8 k (k = 1 , . . ,K) is that for which 6,k maximizes
.
(9.19) j=1
The logistic regression model is a special case of (9.12) with & = 1, pk = log(pk/qk), hk(X) = 2,b k ( X k , d k ) = 0, and a k ( P k ) = - logU - P k ) = M l + e p k ) . Thus, given 2, the conditional information matrix is I k ( e k l Z ) = a:(pk)Z'Z = pkqkZ'Z. From Corollary 9.1, we have the following result. Because this result is useful in applications, we state it as a corollary.
148
INCORPORATING COVARIATES
COROLLARY 9.2. Suppose that Condition 9.2.2 is satisfied, EIJ2112+E< 00 for some E > 0, and the matrix E[Z’Z]is nonsingular. Then (9.6), (9.7), and (9.8) and I k = &(e)= E { r k ( e ,z ) p k q k z ’ z } , k = 1,.. . , K . hold with v k = Iil, Moreover, if P r ( 2 = z)> 0 for a given covariate z, then (9.10) and (9.1 1) hold. EXAMPLE9.9. LINEARREGRESSION MODEL (Zhang et af., 2006.) Suppose that the response xk of a subject to treatment k , k = 1,. ,K , and its covariate 2 satisfies the linear regression model
..
E [ X ~ ~=Zqk(ek, J z ) = ekz’, k = 1,.. , , K . For the observed data up to stage m, let Pml; minimize the error sum of squares
sk(ek)= C T ~ ~ - ekz;l2 ( xover~t j k E~~ n
k
,
m= 1
k = 1,... ,K . Here, Pmk is the least-squares estimator (LSE) of &. If the responses are normally distributed, that is, X k I c N ( p k , u i ) with link function pk = B k Z ’ , then the linear model is a special case of (9.12) with & = u;, a k ( p k ) = pE/2 and h k ( 2 ) = 2. In such a case, the LSE and the maximum likelihood estimator are identical. Here, we consider general responses. We have the following asymptotic properties from Corollary 9.1.
-
COROLLARY 9.3. Suppose that the conditions in Corollary 9.2 are satisfied, and that EI(Xk2112+t < 03 for some E > 0, k = 1,.. . K . Then (9.6), (9.7), and (9.8) hold with vk = I F ~ I ~ ~ I~ ;, t; k= , qrk(e,Z)Z’Z] and I Y ~= E { r k ( e z , ) ( x-~ B k Z ’ ) * Z ’ Z } , k = 1,.. . ,K . Moreover, if P r ( 2 = z ) > 0 for given z , then (9.10) and (9.1 1) hold. This corollary follows from Theorems 9.1 and 9.2 directly, as Assumption 9.2 (9.5) is satisfied with hk = ( X k - OkZ’)ZI,k. In fact, notice that e m k is the solution to the normal equation as
Also, {TjkZiZj- E[T’:jkZ;Zj1Fj-1]}is a sequence of martingale differences. It follows from the law of large numbers for martingales that
which, together with (9.20), implies that
TWO TREATMENTS WITH BINARY RESPONSES
149
Notice that E [ ( X j k - O k Z i ) Z j l Z j ] = 0. Then (9.5) is satisfied by hk = (xk-
elcz;)ZI,;.
REMARK 9.9. Theorem 1 of Bandyopadhyay and Biswas (2001) is a special case of the results of Corollary 9.3, in which ell = pl, OZ1 = p2,e1j = Bzj, j = 2 , . .. ,d, and the first component of Z is 1. Bandyopadhyay and Biswas (2001) only studied the consistency of N , , l / n and Pr(Tn,l = 1).
9.4
TWO TREATMENTS WITH BINARY RESPONSES
In Sections 9.2 and 9.3, we described a general framework for CAR4 randomization and studied the asymptotic properties under some widely satisfied conditions. In this section, we will apply the idea of C A M randomization to simple clinical trials comparing two treatments with binary responses. Consider two treatments, 1and 2, with binary responses (success and failure). Suppose that for a given covariate 2, the conditional success probabilities satisfy the logistic model (9.18) (a1 = a2 = 0). After observing the responses of m subjects and when the (m 1)-th subject is ready for treatment allocation, let the covariate of the (rn 1)-th subject be Z m + l . As described in Section 1 . 1 S, Rosenberger, Vidyashankar, and Agarwal (200 1) suggested that this subject be allocated to treatment 1 or 2 with a probability that is proportional to the estimated covariate-adjusted odds ratio, comparing treatments 1 and 2, that is, evaluated at Z m + l :
+
Pl(hm1, zm+1)/q1(eml, z m + 1 )
~ z ( d m , zz ,m + l ) / q 2 ( a m , 2 , z m + 1 )
-
+
ex~{hmlzk+l) exp{k,zZL+l)
’
where 8,1 and hm,2 are the maximum likelihood estimators of 81 and 02, respectively, which are evaluated by fitting the logistic model with data from all m previous subjects. Therefore the allocation probability is
This allocation rule is a special case of (9.2) with (9.2I ) and T Z = 1 - ?TI.It can easily be seen that d n l / d O l = -dm/dOZ = ~ 1 ~ 2 2 . Let Nn,k be the number of subjects assigned to treatment k in n trials, Nn,klr be the number of subjects assigned to treatment k in n trials for a given covariate vector z , k = 1,2, and Nn(t) be the total number ofsubjects with covariate t.According to Corollary 9.2 and by some calculations, we have the following result.
COROLLARY 9.4 Let Ik = E[nkpkqkZ’Z], k = 1,2. Under the conditions in Corollary 9.2, the following are true.
INCORPORATING COVARIATES
150
(a) For the allocation proportion N n , ~ / n , J;I(Nn,l/n - 211)
3
“0, o f )
in distribution, where w1 = E[n1(81,82,Z ) ]and 0;
= Wl(1 - Wl)
+ 2 [E[n1n2Z](I;l + lY1) (E[n1n22])’] ;
To evaluate the advantage of our proposed procedure, we compare the covariateadjusted design with the corresponding design that ignores the covariate information. Before launching the comparative study, we need to provide the corresponding asymptotic properties for the latter design. Let p ; = EIpl(&, Z ) ]and p; = E[p2(82, Z)] be the average success probabilities of a subject who is being given treatments 1 and 2, respectively. Write q; = 1 - p ; and q; = 1- p;. Here, we describe the allocation procedure. For the (m+ 1)-th subject, the assignment of treatment is carried out according to the maximum likelihood estimators fiLk of p i , k = 1,2, based on the responses of the previous rn subjects. Let 4 i l = 1- Ftl and 4&:2 = 1 - Fm,2. The (rn l)-th subject is then assigned to treatment A with a probability of
+
where
is a function of p ; and p i .
THEOREM 9.3. Let n; = 1 - n;,V i = n;(p;,pz)and
Under the assumption of Corollary 9.4, we have the following results. , have Jii(Nn,~/n- w;) --t N ( 0 ,.y2) (a) For the allocation proportion N n , ~ / nwe in distribution, where aT2 = n;(l - n;) 20;.
+
(b) Given a covariate z with P r ( Z = z ) > 0, we have
TWO TREATMENTS WITH BINARY RESPONSES
151
in distribution, where w ; , ~= v;, (0;1~)~
+
= A;(I - 7 ~ ; ) 26; P r ( Z = z)
and p,qy = Pr(success 12 = z and treatment k), k = 1,2.
To compare the covariate-adjusted allocation rule (9.2) with the non-covariateadjusted rule (9.22), we consider the following simple case. Suppose that the covariate vector 2 has two possible values, (1,O)and ( 1 , l ) . Let p = P r ( 2 = (1,O)). In this case, the statistical model (3.1) is logit(pl(O1, 2))= 812’ and logit(p2(02,2))= 022’, whereel = (e11,e12) and02 = (e21,e22). Letplo = p l p l , z = (i,o)),pl1 = P I ( & , Z = ( 1 , 1 ) ) , ~ 2 0= 2482, Z = (LO)), andp21 = p 2 ( O 2 , Z = (1,l)). Write A1i
=
Pli(1 - P2i) Pl41 - P Z i ) PZi(1 - Pli) ’
+
i = 0,1,
where Pi = p l o p pii(1 - PI. P; = PZOP ~ 2 1 ( 1- P). When the covariate information is used in the response-adaptive design, we have the following results based on Corollary 9.1:
+
V I = x10p
+
+ rill( 1 - p ) and 0: = vl(1 - v1) + 2[~7;2~p + aT1(l- p ) ] .
Forthecovariate-adjusted case(fromCorollary9. I ) , w l l ~ = ( l , o ) = 7 ~ 1 0and o~Ic=(l,o) = nlo(1 - a l 0 ) + 2 a ~ ~Inaddition,vl(c,(l,l) . = 7 ~ 1 1anda$E=(lll)= 7r11(1 -rill)+ 2 4 , . Further, for the case without adjustment for covariates (from Theorem 9.3), vi = 7 ~ ;and oi2 = ni(1 - 7 ~ ; )+ 2 a i . In this case, v;lc=(l,o) = = 7ri, ,o) = 7 ~ (1 ; -7 ~ );+ 2 p 4 +2P, and .;(=( 1,1) = Ti (1 -Ti) +2( 1-P).; -2P1 where
a;i=(,
We use two randomized procedures to randomize subjects. The first is (9.2) with is given in (9.21) (referred to as C), in which the covariate information is
?rk, which
152
INCORPORATING COVARIATES
Table 9.I Asymptotic means and variances (muhiplied by N , (2)) of allocalionproportions Nn,IIz/Nn(2) of a given Z for the two designs (Pr( Z = (1,O)) = 1/2). = (1,0),NC Z = (1, I), C Z = (1, I), NC (pio,pzo,pii,pzi) Z = (1,0), C (0.9,0.6,0.8,0.6) (0.9,0.7,0.8,0.6) (0.9,0.7,0.6,0.8) (0.6,0.4,0.6, 0.4) (0.6,0.4,0.4, 0.6) (0.5,0.5,0.5,0.5) (0.4,0.2,0.1,0.3)
(0.86, 1.39) (0.79,2.15) (0.79,2.15) (0.69, 1.99) (0.69, 1.99) (0.50,2.25) (0.73,2.45)
(0.79, (0.75, (0.50, (0.69, (0.50, (0.50, (0.50,
1.00) 1.21) 1.65) 1.10) 1.25) 1.25) 1.65)
(0.73,2.08) (0.73,2.08) (0.27,2.08) (0.69, 1.99) (0.31, 1.99) (0.50,2.25) (0.21,3.37)
(0.79,0.97) (0.75, 1.1 I ) (0.50, 1.52) (0.69, 1.10) (0.50, 1.25) (0.50, 1.25) (0.50, 1.52)
used. The second is (9.22) (referred to as NC), in which the covariate information is not used. For the two allocation rules, Table 9.1 provides the asymptotic means and associated variance of the allocation proportions N n , l l z / N n (2 ) for a given covariate 2 = (1,O) or 2 = (1,l). CAR4 randomization assigns more subjects to the better treatments for each given covariate 2,whereas the other design does not display this characteristic. By using CARA randomization, we can allocate more subjects to the better treatment and therefore have a higher probability of success in the trial. For illustrative purposes, let us consider the case with p10 = 0.4, p20 = 0.2, p11 = 0.1, and p z l = 0.3. When 2 = (l,O), CARA randomization allocates about 73% of subjects to treatment 1, because treatment 1 (plo = 0.4, pzo = 0.2) is much better than treatment 2 for 2 = (1,O). At the same time, CARA randomization allocates only about 21% of the subjects with 2 = ( 1 , l ) to treatment 1. This reflects the fact that p11 = 0.1 and p21 = 0.3. The NC design allocates about 50% of the subjects to each treatment, regardless of the values of the covariates. We can calculate the average success rate as follows. For the CAR4 design, it is 0.5 * (0.73 * 0.4 0.27* 0.2) 0.5 * (0.21 * 0.1 0.79 * 0.3) = 0.302, and for the NC design, it is only 0.5 * (0.5 * 0.4 0.5 * 0.2 0.5 * 0.1 0.5 * 0.3) = 0.25. When Table 9.1 is repeated for different success probabilities and Pr( 2 = (1,0)), similar conclusions are obtained and are hence not reported here. The allocation rule of Rosenberger, Vidyashankar, and Aganval focuses mainly on the goal of assigning more subjects to the better treatment. However, in practice, there are many other goals in defining an allocation rule. The following two remarks provide some illustrations. For adaptive urn models, Coad and Ivanova (2001) discussed some other allocation functions.
+
+
+
+ +
+
REMARK9.10. For the example that is considered in Table 9.1 with a binary covariate (2= ( 1 , O ) or 2 = (1, l)), we can consider different allocation schemes. For example, we can use RSIHR allocation. Suppose that lj~o(m), ljll(m),ljzo(m), and p ~ (1m )are the estimators ofp10, p 1 1 , p20 and pzl based on the responses of the
TWO TREATMENTS WITH BINARY RESPONSES
153
first m subjects. We then assign the (m+ 1)-th subject to treatment 1with probability
By using this CARA randomization procedure, we can achieve the optimal allocation proportions for a given 2 = (1,O) or 2 = (1, l),respectively. Further, for a given 2 = (1,O) or 2 = (1, l), we can study the power of this design using the ideas in Hu and Rosenberger (2003).
REMARK 9.11. Baldi Antognini and Giovagnoli (2005) proposed a two-treatment @-optimaldesign for (9.12) when there are no covariates. They suggested allocating a subject to treatment 1 with a probability
whereM(/i1,p2,41,&) = d i a g ( r l l ( ~ i , ~ i ) , ~ l 2 ( ~ 1 2I,k~( ~2k),)d,k ) ,k = 1,2, are the Fisher’s information matrices and f i k and are maximum likelihood estimators of p,k and &, k = 1,2. Here, @ is a continuous bounded and strictly convex function defined on the set of variance-covariancematrices. A parallel idea can be applied to define a covariate-adjusted design by using the conditional Fisher’s information matrices instead. For binary responses following the logistic regression model, the conditional Fisher’s information matrices are I k ( & ( Z ) = pkqkZ‘Z 0: P k q k , k = 1,2. A covariate-adjusted @-optimaldesign can now be defiFed as follows. Allocate the (m 1)-th subject to treatment l with probability rl(Oml, Om,2, &+I), where
+
m/(m + m). Asymptotic
If the criterion is A-optimality, then 711 = properties can also be derived from Corollary 9.2.
9.4.1
Power
In this section, we compare the power of the allocation rules. Consider the null hypothesis Ho : p i 0 = p20, pi1 = p21, which can easily be derived from the general form (9.16). The likelihood ratio test is used with level of significance 0.05. The values of sample size, n (tabulated in the last column of Table 9.2), are chosen such that the simulated power of complete randomization is approximately 0.9. The simulation study was conducted with 10000 replications. In addition to simulated power, the average success proportions p are also tabulated in Table 9.2. Four different allocation procedures are used as follows: 1. The allocation rule of Rosenberger, Vidyashankar, and Agarwal(2001) (9.21).
2. The allocation rule defined in Remark 9.10.
154
INCORPORATING COVARIATES
Table 9.2 Simulated average success proportion (power)for the four procedures defined in Section 9.4.1. (PlO,PZO, Pll, PZl)
1
2
3
(0.9,0.6,0.8,0.6) (0.9,0.7,0.8,0.6) (0.9,0.3,0.8,0.3) (0.9,0.3,0.3,0.8) (0.6,0.4,0.6,0.4) (0.6,0.4,0.4,0.6) (0.6,0.3,0.4,0.2) (0.5,0.2,0.2,0.5) (0.4,0.2,0.1,0.3)
0.788(0.825) 0.795(0.844) 0.715(0.834) 0.7 16(0.832) 0.535(0.873) 0.535(0.879) 0.428(0.864) 0.416(0.858) 0.290(0.867)
0.736(0.900) 0.756(0.893) 0.616(0.902) 0.618(0.901) 0.5 lO(0.896) 0.5 lO(0.900) 0.393(0.900) 0.376(0.900) 0.268(0.897)
n
4 ~~
0.74 l(0.889) 0.762(0.885) 0.585(0.888) 0.586(0.885) OSOO(0.896) OSOO(0.899) 0.369(0.896) 0.339(0.894) 0.238(0.895)
~
~~~
0.726(0.896) 0.750(0.893) 0.574(0.900) 0.575(0.897) 0.500(0.898) 0.500(0.901) 0.375(0.903) 0.350(0.902) 0.250(0.899)
142 218 36 36 313 313 176 124 222
3. The A-optimal rule defined in Remark 9.1 1. 4. Complete randomization. From the simulation results in Table 9.2, the powers of Procedures 2,3, and 4 are quite similar. Among Procedures 2, 3, and 4, the success proportions are higher for Procedure 2. For instance, for (p10, p20,pll,p21) = (0.9,0.3,0.8,0.3), the success proportion of Procedure 2 is 0.616, which is 5.3% higher than that ofprocedure 3 and 7.3% higher than Procedure 4. Whenpli +p2i < 1, i = 0, 1, the success proportions of Procedure 3 are lower than those of Procedure 4. This agrees with the results in Rosenberger and Hu (2004) for response-adaptive randomization procedures without covariates. Procedure 1 has the highest success proportion for all cases, while its power is lower than that of the other three allocation procedures. However, its power is above 0.82 for all cases. As stated in Chapter 8, we believe that an important objective of a responseadaptive design is to skew the assignment probabilities to favor treatment performance and simultaneously to maintain an adequate level of power. Under this rationale, Procedure 2 is a better choice than Procedures 3 and 4 as its power is similar to that of Procedures 3 and 4, and thus the use of Procedure 2 means that it is possible to assign more subjects to the better treatment. Nevertheless, Procedure 1 is a good option if one desires to allocate more subjects to the better treatment. In exchange, the power is slightly lower.
9.5
CONCLUSIONS
In this chapter, we have described both covariate-adaptive randomization procedures and covariate-adjusted response-adaptive ( C A M ) randomization procedures. While Atkinson’s (1982) covariate-adaptive randomization procedure fits into the C A M procedure framework outlined here, the ad hoc procedures in Examples 9.1-9.3 do not. Those three examples attempt to balance marginally across levels of covariates,
REFERENCES
155
and this goal directly conflicts with the goal of targeting an optimal allocation for efficient estimation of regression parameters for a fixed covariate vector. In the latter formulation, the covariates are related in a very specific model form to the responses and a formal optimization problem is conducted. In this chapter, we provide a comprehensive framework for CARA randomization procedures as a method for treatment allocation procedures in clinical studies when covariates are available. It is a very general framework that allows a wide spectrum of applications to very general statistical models, including generalized linear models as special cases. Second, asymptotic properties are given to provide a statistical basis for inferences that are related to treatment efficacy. We then apply the CARA randomization to two treatments with binary responses (targeting the odds ratio). For this example, we calculate the asymptotic means and variances of both the overall allocation proportion and the allocation proportion with a given covariate. We find that CARA randomization allocates more subjects to the better treatment for a given covariate. Therefore, it has a higher expected number of successes than a responseadaptive randomization procedure that ignores the information of the covariates in this example. From the simulations in Section 9.2, we can also see the advantages of using CARA randomization procedures (Procedures 1 and 2). When responsc-adaptive randomization is used in a clinical trial, the usual asymptotic properties of the estimator are the same as those for fixed designs. For parameter estimation from a trial using CARA randomization, asymptotic normality is still true under some conditions. However, the asymptotic variance is different from that of fixed allocation. When covariate information is not being used in the treatment allocation scheme, an optimal allocation proportion is usually determined with the assistance of some optimality criteria, as noted in Chapter 2. For CARA randomization, the means to define and obtain an optimal allocation scheme are still unclear. The allocation function T I ( @ ,2) in Section 9.4 usually depends on the target allocation. For binary responses with two treatments, we use the odds ratio allocation, which is also used by Rosenberger, Vidyashankar, and Agarwal(2001). We can also use other allocation proportions (see the remarks in Section 9.4), such as the three proportions that were considered by Hu and Rosenberger (2003). It is important to study the behavior of the power function when a CAR4 procedure is used in clinical trials. For a simple case (discussed in Remark 9. I), it is not difficult to derive the power function, because we only have two covariates 2 = (1,O) and 2 = (1,l). For the general covariate 2,the formulation becomes very different, and it is an interesting topic for future research. 9.6
REFERENCES
ATKINSON, A. C. (1982). Optimal biased coin designs for sequential clinical trials with prognostic factors. Biometriku 69 61-67. ATKINSON,A. C. (1999). Optimal biased-coin designs for sequential treatment allocation with covariate information. Statistics in Medicine 18 1741-1752.
156
INCORPORATING COVARIATES
ATKINSON, A.
c. AND DONEV,A.
Clarendon Press, Oxford.
BALDIANTOGNINI, A.
N.(1992). Optimum Experimental Design.
A N D GIOVAGNOLI, A. (2005). On the large sample optimality of sequential designs for comparing two or more treatments. Sequential Analysis 24 205-2 17. BANDYOPADHYAY, U. AND BISWAS,A. (2001). Adaptive designs for normal responses with prognostic factors. Biometrika 88 4 0 9 4 19. COAD,D. S. A N D IVANOVA, A. (2001). Biascalculations foradaptiveurndesigns. Sequential Analysis 20 91-1 16. EFRON,B. (1971). Forcing a sequential experiment to be balanced. Biometrika 62 347-352. GWISE,T. (2005). Optimal Adaptive Biased Coin Designs in Clinical Trials. University of Virginia, Charlottesville (doctoral thesis). Hu, F. AND ROSENBERGER, W. F. (2003). Optimality, variability, power: evaluating response-adaptive randomization procedures for treatment comparisons. Journal of the American Statistical Association 98 67 1-678. KIEFER, J. A N D WOLFOWITZ,J. (1960). The equivalence of two extremum problems. Canadian Journal of Mathematics 12 363-366. MCCULLAGH, P. AND NELDER,J . A . (1989). GeneralizedLinearModels. Chapman and Hall, London. POCOCK,S . J. AND SIMON, R. (1975). Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics 31 103-115. ROSENBERGER, W. F. (1996). New directions in adaptive designs. Statistical Science 11 137-149. ROSENBERGER, W. F. AND Hu, F. (2004). Maximizing power and minimizing treatment failures. Clinical Trials 1 141-147. ROSENBERGER, W. F., VIDYASHANKAR, A. N., A N D AGARWAL, D. K. (200 1). Covariate-adjusted response-adaptive designs for binary response. Journal of Biopharmaceutical Statistics 11 227-236. WEI, L. J. (1978). An application of an urn model to the design of sequential controlled clinical trials. Journal of the American Statistical Association 73 559-563. ZELEN,M. (1974). The randomization and stratification ofpatients to clinical trials. Journal of Chronic Diseases 28 365-375. ZHANG,L.-X., HU, F., C H E U N G , s. H. A N D CHANW.S. (2006). Asymptotic properties of covariate-adjusted response-adaptive designs. Submitted.
10 Conclusions and Open Problems
10.1 CONCLUSIONS This book has attempted to put on a firm theoretical footing the concept of responseadaptive randomization. While we have spent extensive time describing procedures based on urn models, it should be clear that procedures based on sequential estimation are more flexible in that they can target any desired allocation. This flexibility leads to a large class of procedures that have yet to be explored fully. We feel that this is the approach researchers in this area should pursue. Pieces of the framework we have presented have been present in the literature but have never been put together. Hayre (1 979) was probably the first to propose looking at the allocation problem in terms of a formal optimization criterion. Eisele (1994) was the first to propose an allocation function that would minimize the variability of response-adaptive randomization procedures when substituting sequentially computed estimators. These foundational papers led to the ideas in three papers that have formed the basis for much of this book: Rosenberger et al. (2001); Hu and Rosenberger (2003); and Hu and Zhang (2004). What these papers were unable to answer is perhaps the most interesting open question of all: Is there a fully randomized procedure that targets any optimal allocation that is asympotically best? We have an asymptotically best procedure targeting urn allocation (Jvanova, 2003) and a deterministic procedure that can target any allocation (Hu, Rosenberger, and Zhang, 2006), but finding a fully randomized procedure that targets any allocation is elusive.
157
158
CONCLUSIONS AND OPEN PROBLEMS
10.2 OPEN PROBLEMS There are still many open problems in response-adaptive randomization, and this is a fertile ground for researchers. Chapter 9 presents an overview of covariate-adaptive and C A M randomization procedures. Little is known about these procedures, and there are few papers regarding their theoretical properties in the literature. Urn models, while perhaps not the best approach to the problem of clinical trials, have numerous interesting probabilistic properties that make them interesting in their own right. For the generalized Friedman’s urn, we have assumed that the expected number of balls added at each stage is constant. We do not know anything about limiting properties when it is not. We do not know anything about the limiting properties when the real part of the second largest eigenvalue is greater than half the size of the largest eigenvalue. The distribution of the random variable W in Example 4.1, for instance, has been an open problem for four decades. Ternary urns are more recent in the literature and have been less studied than the generalized Friedman’s Urn. We have discussed heterogeneity very briefly in this book, and that is principally because there has been very little work in this area. Yet it is critical if these designs are to be used in clinical trials. We have described some results that apply to the generalized Friedman’s urn; extending these results to sequential estimation procedures is a completely open problem. We have also described only one limited approach, the weighted likelihood, and this requires that any time trends in patient characteristics converge, which may be unrealistic in many clinical trials. A more general approach to dealing with heterogeneity is desirable. For K > 2 treatments, we have discussed only binary responses. The optimization framework should apply to continuous responses, and this is a completely open problem. We have not discussed other types of outcomes, such as survival outcomes or rates of change from longitudinal models. There has been limited work on response-adaptive survival trials, but not to our knowledge on longitudinal models. Chapter 6 describes the methodology to determine requisite sample sizes for randomized clinical trials for comparing two treatments. This formulation can be extended to more than two treatments using a multivariate test statistic. Most larger clinical trials impose a sequential monitoring procedure to allow for early stopping. The basic statistical formulation requires determining the distribution of sequentially computed test statistics. Under response-adaptive randomization, this is a difficult task. Numerical studies have been performed (e.g., Coad and Rosenberger, 1999, and Stallard and Rosenberger, 2002, for the randomized play-thewinner rule, and Coad and Ivanova, 2005, for urn models with K = 3 treatments.) There has been little theoretical work done to this point, nor has there been any evaluation of sequential monitoring in the context of sequential estimation procedures such as the doubly-adaptive biased coin design. Regarding the intersection of sequential analysis and response-adaptive randomization, Rosenberger (2002) states: Surprisingly, the link betwecn [rcsponse-adaptiverandomization] and sequential analysis has becn tenuous at best, and this is perhaps the logical place to search for open research topics.
REFERENCES
159
It is the sincere hope of the authors that this book will provide the impetus for future researchers and also the impetus for the use of these procedures in future clinical trials.
10.3
REFERENCES
A N D IVANOVA, A. (2005). Sequential urn designs with elimination for comparing K = 3 treatments. Stalistics in Medicine 24 1995-2009. COAD,D. S. A N D ROSENBERGER, W. F (1999). A comparison ofthe randomised play-the-winner rule and the triangular test for clinical trials with binary responses. Statistics in Medicine 18 761-769. EISELE,J. R. (1994). The doubly adaptive biased coin design for sequential clinical trials. Journal of Statistical Planning and Inference 38 249-261. HAYRE,L. S. (1979). Two-population sequential tests with three hypotheses. Biometrika 66 465-414. Hu, F. AND ROSENBERGER, W. F. (2003). Optimality, variability, power: evaluating response-adaptive randomization procedures for treatment comparisons. Journal of the American Statistical Association 98 671-678. Hu, F., ROSENBERGER, W. F.,AND Z H A N G , L.-X. (2006). Asymptotically best response-adaptive randomization procedures. Journal of Statistical Planning and Inference 136 191 1-1922. Hu, F. AND ZHANG, L.-X. (2004). Asymptotic properties of doubly-adaptive biased coin designs for multi-treatment clinical trials. Annals of Statistics 32 268-30 1. IVANOVA, A. (2003). A play-the-winner type urn design with reduced variability. Metrika 58 1-13. ROSENBERGER, W. F. (2002). Randomized urn models and sequential design. Sequential Analysis 21 1-4 1 (with discussion). ROSENBERGER, w. F., STALLARD, N.,IVANOVA, A., HARPER, N.,AND RICKS,M. L. (2001). Optimal adaptive designs for binary response trials. Biometrics 57 909-91 3. STALLARD, N. AND ROSENBERGER, w. F. (2002). Exact group-sequential designs for clinical trials with randomized play-the-winner allocation. Statistics in Medicine 21 467-480.
COAD,D. S.
c.
This Page Intentionally Left Blank
AppendixA Supporting Technical Material
This book assumes a certain mathematical level, including knowledge of advanced probability and matrix theory. In this appendix, we review some of the most important concepts that are used in the book, particularly in the proofs (Appendix B) and the discussion of the proofs throughout the book. We begin with some requisite matrix theory. A.1
SOME MATRIX THEORY
Throughout the book we refer to TOW vectors, denoted by an n-tuple ofreal or complex numbers, for example, z = (21,...,zn). An m x n matrix is an array of row vectors
A = [a']
z
am
with m rows and n columns, where the components aij i = 1, ...,m,j = 1, . . . t n may be real or complex numbers. 161
162
SUPPORTING TECHNICAL MATERlAl
If A has complex elements, the transpose of A is well defined. However, an often more useful matrix is the conjugate transpose, given by
where G i j is the complex conjugate of aij. The eigenstructure of a square matrix ( n x n) plays an important role throughout this book. The lefl eigenvector v corresponding to the eigenvalue X of A is the solution to the equation v ( A - X I ) = 0. Similarly, the right eigenvector u is the solution to the equation
( A - X ~ ) U= 0. In describing the asymptotic properties of stochastic processes, first-order properties are often determined by the left eigenvector corresponding to the maximal eigenvalue of the transition matrix. For urn models, we use the left eigenvector of the generating matrix. A stochastic matrix is a matrix where all row sums take the same value. The maximal eigenvalue of a stochastic matrix is the common row sum. For our purposes, the maximal eigenvalue of the generating matrix for urn models is 1, and we are interested in the left eigenvector corresponding to the maximal eigenvalue 1. For the stochastic processes we consider in this book, the rate of convergence depends on the size of the real part of the second largest eigenvalue relative to the maximal eigenvalue. Order the real part of the n eigenvalues, leading to a distinct ordered set of eigenvalues (XI, ...,A,) (some possibly with multiplicity). For the generalized Friedman's urn model, the asymptotic limiting distribution of both the urn composition and the proportion of patients assigned to each treatment differ according to a phase change occurring at Re(&) = 1/2. The generating matrix also has the property of strict positivity. Let a:;") be the (i,j)-th element of A". A matrix A is strictly positive if there exists m such that us'' > 0 for all i,j = 1,...,n (Athreya and Ney, 1972). A.2
JORDAN DECOMPOSITION
For both the generalized Friedman's urn model in Chapter 4 and the procedures based on sequential estimation in Chapter 5, the Jordan form of a matrix is used to prove asymptotic properties of the procedures. Let A and B be two n x n matrices. Matrices A and B are called similar if
B = P-'AP for some nonsingular n x n matrix P . An n x n matrix A is similar to a matrix of the form J = diag(J1, ...,Js),
MATRIX RECURSIONS
163
... 0 ... 0 0 At ... 0 .. .. .. . . . . . .
-At 0 Jt = 0
At
0 1
-0
0
0
1
9
...
.
At -
for t = 1,...,s, where Cf, is the combinatoric notation and, by convention, the elements are zero if m < 1. A.3
MATRIX RECURSIONS
One of the important techniques used in the asymptotic proofs of Chapters 4 and 5 is matrix recursion. Provided matrix dimensions allow appropriate multiplication, matrix recursions are solved exactly like scalar recursions. For example, consider a recursion of the form C n = An Cn-IBn, where A, and Cn are K-vectors and B , are K x K matrices, n = 1 , 2 , .... As in the scalar case, this has solution
+
n
C, = C A i i=l
n n
j=i+l
Bj.
164
SUPPORTING TECHNICAL MATERIAL
In our application, the Jordan form of B , is of the form I product in the solution to the recursion leads to the matrix
0 j=i+l
J
+ n-l J . Computing the
...
0
n
0
0 j=i+l
0
0
...
fi
(I+j-'Js)
j=i+l
For each of the diagonal elements, the Jordan form allows us to simplify as follows:
where $(n,i, A) is uniformly bounded (say, I $) and tends to 1 as i nonzero off-diagonal elements can be computed similarly. A.4 A.4.1
--$
00.
The
MARTINGALES
Definition and properties of martingales
In this section, we list some basic properties of martingales without proofs. One can find these results and their proofs in Hall and Heyde (1980). Let $(\Omega, \mathcal{F}, P)$ be a probability space, where $\Omega$ is a set, $\mathcal{F}$ is a $\sigma$-algebra of subsets of $\Omega$, and $P$ is a probability measure defined on $\mathcal{F}$. Let $\mathcal{F}_n$, $n = 0, 1, 2, \ldots$ be an increasing sequence of $\sigma$-algebras of $\mathcal{F}$ sets. Suppose that $S_n$, $n = 0, 1, 2, \ldots$ is a sequence of random variables on $\Omega$ satisfying

(i) $S_n$ is measurable with respect to $\mathcal{F}_n$,
(ii) $E|S_n| < \infty$, and
(iii) $E(S_n \mid \mathcal{F}_m) = S_m$ for all $m \leq n$.

The sequence $S_n$ is called a martingale with respect to $\mathcal{F}_n$, $n = 0, 1, 2, \ldots$; $X_n = S_n - S_{n-1}$ is called the martingale difference. We now state several properties of martingales that will be used in the proofs of asymptotic theorems in Appendix B.
THEOREM A.1. (Burkholder's Inequality) If $\{S_i, \mathcal{F}_i\}$, $i = 0, 1, 2, \ldots$ is a martingale and $1 < \gamma < \infty$, then there exist constants $C_1$ and $C_2$ depending only on $\gamma$ such that
$$C_1 E\left(\sum_{i=1}^{n} X_i^2\right)^{\gamma/2} \leq E|S_n|^{\gamma} \leq C_2 E\left(\sum_{i=1}^{n} X_i^2\right)^{\gamma/2},$$
where $X_i = S_i - S_{i-1}$.
THEOREM A.2. If $\{S_i, \mathcal{F}_i\}$, $i = 0, 1, 2, \ldots$ is a martingale and $\gamma > 0$, then there exists a constant $C$ depending only on $\gamma$ such that
THEOREM A.3. (Rosenthal's Inequality) If $\{S_i, \mathcal{F}_i\}$, $i = 0, 1, 2, \ldots$ is a martingale and $2 < \gamma < \infty$, then there exist constants $C_1$ and $C_2$ depending only on $\gamma$ such that
$$C_1\left\{E\left(\sum_{i=1}^{n} E(X_i^2 \mid \mathcal{F}_{i-1})\right)^{\gamma/2} + \sum_{i=1}^{n} E|X_i|^{\gamma}\right\} \leq E|S_n|^{\gamma} \leq C_2\left\{E\left(\sum_{i=1}^{n} E(X_i^2 \mid \mathcal{F}_{i-1})\right)^{\gamma/2} + \sum_{i=1}^{n} E|X_i|^{\gamma}\right\}.$$
THEOREM A.4. (Weak Law of Large Numbers) Let $\{S_i, \mathcal{F}_i\}$, $i = 0, 1, 2, \ldots$ be a martingale and $\{c_n\}$ be an increasing sequence of positive constants with $c_n \to \infty$ as $n \to \infty$. Then $c_n^{-1}S_n \to 0$ in probability as $n \to \infty$ if

(i) $\sum_{i=1}^{n} \Pr(|X_i| > c_n) \to 0$,
(ii) $c_n^{-1} \sum_{i=1}^{n} E(X_{ni} \mid \mathcal{F}_{i-1}) \to 0$ in probability, and
(iii) $c_n^{-2} \sum_{i=1}^{n} \left\{E(X_{ni}^2) - E\left[\left(E(X_{ni} \mid \mathcal{F}_{i-1})\right)^2\right]\right\} \to 0$,

where $X_{ni} = X_i I(|X_i| \leq c_n)$ and $I(\cdot)$ is an indicator function.
THEOREM A.5. (Strong Law of Large Numbers) Let $\{S_i, \mathcal{F}_i\}$, $i = 0, 1, 2, \ldots$ be a zero-mean, square-integrable martingale. Then $S_n$ converges almost surely if $\sum_{i=1}^{\infty} E(X_i^2 \mid \mathcal{F}_{i-1}) < \infty$.
A.4.2 The martingale central limit theorem

Let $\{S_{nj}, \mathcal{F}_{nj}, 1 \leq j \leq k_n\}$ be a zero-mean, square-integrable martingale for each $n \geq 1$, and let $X_{nj} = S_{nj} - S_{n,j-1}$, $1 \leq j \leq k_n$ ($S_{n0} = 0$) be the martingale differences. Here $k_n \to \infty$ as $n \to \infty$. The double sequence $\{S_{nj}, \mathcal{F}_{nj}, 1 \leq j \leq k_n\}$ is called a martingale array. Define $V_{nj}^2 = \sum_{i=1}^{j} E(X_{ni}^2 \mid \mathcal{F}_{n,i-1})$, the conditional variance of $S_{nj}$, and $U_{nj}^2 = \sum_{i=1}^{j} X_{ni}^2$, the squared variation.
THEOREM A.6. (Martingale Central Limit Theorem) Suppose that
$$\max_j |X_{nj}| \to 0 \ \text{in probability}, \qquad (A.1)$$
$$\sum_{j=1}^{k_n} X_{nj}^2 \to u^2 \ \text{in probability, where } u^2 \text{ is a positive constant}, \qquad (A.2)$$
$$E\left(\max_j X_{nj}^2\right) \ \text{is bounded in } n, \qquad (A.3)$$
and
$$\mathcal{F}_{nj} \subseteq \mathcal{F}_{n+1,j} \ \text{for } 1 \leq j \leq k_n, \ n \geq 1. \qquad (A.4)$$
Then
$$S_{nk_n} = \sum_{j=1}^{k_n} X_{nj} \to N(0, u^2) \ \text{in distribution}.$$
A . l . If COROLLARY kn
E[X:jT(IXnjI > €)[Fn,j-1]-+ 0 in probability for any c
> 0,
(A.5)
j=1
j=l
and condition (A.4) is true, then kn
snk,
=
j=1
X,j
-+
N ( 0 ,u 2 )in distribution.
This corollary is very useful in applications. In Theorem A.5 and Corollary A. 1, we assume that U i k nand V2kn converge to a constant u2. But we can generalize the result as follows.
THEOREM A.7. Suppose conditions (A.1), (A.3), and (A.4) are satisfied. Then
$$U_{nk_n}^{-1} S_{nk_n} \to N(0, 1) \ \text{in distribution}$$
if $U_{nk_n}$ converges to a positive random variable almost surely.

COROLLARY A.2. Suppose conditions (A.4) and (A.5) are satisfied. Then
$$V_{nk_n}^{-1} S_{nk_n} \to N(0, 1) \ \text{in distribution}$$
if $V_{nk_n}$ converges to a positive random variable almost surely.

In most applications, one deals with a single martingale $S_n$ instead of martingale arrays $S_{nj}$. However, one can apply the above results to the martingale $S_n$ in the following way: define $k_n = n$, $\mathcal{F}_{nj} = \mathcal{F}_j$, and $S_{nj} = s_n^{-1} S_j$, $1 \leq j \leq n$, where $s_n$ is the standard deviation of $S_n$. Based on Theorems A.6 and A.7, we can obtain the asymptotic normality of $S_n$.
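The following small simulation, added here as an illustration and not part of the original text, builds a simple martingale whose differences are not independent and compares the variance of $S_n/\sqrt{n}$ with the limit $u^2$ predicted by Theorem A.6; the particular difference sequence is chosen only for convenience.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_Sn(n):
    # Rademacher innovations; X_i = eps_i * (1 + 0.5 * eps_{i-1}) is a
    # martingale difference (its conditional mean given the past is 0),
    # but the X_i are not independent, so the classical CLT does not apply
    # directly while the martingale CLT does.
    eps = rng.choice([-1.0, 1.0], size=n + 1)
    X = eps[1:] * (1.0 + 0.5 * eps[:-1])
    return X.sum()

n, reps = 2000, 5000
draws = np.array([simulate_Sn(n) for _ in range(reps)]) / np.sqrt(n)

# Conditional variances average to u^2 = E[(1 + 0.5*eps)^2] = 1.25,
# so S_n / sqrt(n) should be approximately N(0, 1.25).
print(draws.var(), 1.25)
```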
A.4.3 Gaussian approximations and the law of the iterated logarithm
Asymptotic proofs in Chapter 5 require the use of Gaussian approximations (cf. Hall and Heyde, 1980, Appendix I). We first introduce the Skorohod representation of a martingale.
THEOREM A.8. (Skorohod Representation) Let $\{S_n, \mathcal{F}_n\}$, $n = 0, 1, 2, \ldots$ be a zero-mean, square-integrable martingale with martingale differences $X_i$. Then there exists a probability space supporting a standard Brownian motion $W$ and a sequence of nonnegative variables $\tau_1, \tau_2, \ldots$ with the following properties. If $T_n = \sum_{i=1}^{n} \tau_i$, $S_n^* = W(T_n)$, $X_1^* = S_1^*$, $X_i^* = S_i^* - S_{i-1}^*$ for $i \geq 2$, and $\mathcal{F}_n^*$ is the $\sigma$-algebra generated by $S_1^*, \ldots, S_n^*$ and by $W(t)$ for $0 \leq t \leq T_n$, then

(i) $\{S_n, n \geq 1\} = \{S_n^*, n \geq 1\}$ in distribution,
(ii) $T_n$ is $\mathcal{F}_n^*$-measurable.

We now state the main theorem that allows the embedding of a martingale process into a Brownian motion.
THEOREM A.9. Let $\{S_n, \mathcal{F}_n\}$, $n = 0, 1, 2, \ldots$ be a zero-mean, square-integrable martingale with martingale differences $X_i$. Let $U_n^2 = \sum_{i=1}^{n} X_i^2$ and $s_n^2 = E(S_n^2) = E(U_n^2)$. Define $\xi_n$ to be a random element of $C[0, 1]$ obtained by interpolating between the points $(0, 0), (U_n^{-2}U_1^2, U_n^{-1}S_1), \ldots, (1, U_n^{-1}S_n)$, namely,
$$\xi_n(t) = U_n^{-1}\left[S_i + \frac{tU_n^2 - U_i^2}{X_{i+1}}\right] \quad \text{if } U_i^2 \leq tU_n^2 \leq U_{i+1}^2.$$
If for any $\epsilon > 0$
$$s_n^{-2} \sum_{i=1}^{n} E\left[X_i^2 I(|X_i| > \epsilon s_n)\right] \to 0$$
as $n \to \infty$, and
$$s_n^{-2} U_n^2 \to \eta^2 > 0 \ \text{almost surely},$$
then $\xi_n \to W$ in distribution, where $W(t)$ is a standard Brownian motion.
Theorems A.8 and A.9 are used in the proof of Theorem 5.3 to embed the allocation proportions into a continuous-time process $N_n(t)/n$ that approximates a Brownian motion using linear interpolation, thus providing limiting results on the continuous process. When $t = 1$ the process behaves as $N_n/n$. We now state the law of the iterated logarithm for a martingale process with the Skorohod representation. Let $S_n$, $U_n$ and $\xi_n$ be defined as in Theorem A.9. Further, define $\eta_n = (2\log\log U_n)^{-1/2}\xi_n$ and $\phi(t) = (2t\log\log t)^{1/2}$. Let $W_n$ be a positive nondecreasing sequence of random variables and $Z_n$ a nonnegative sequence of random variables such that $W_n$ and $Z_n$ are $\mathcal{F}_{n-1}$-measurable for $n = 1, 2, \ldots$.
THEOREM A.10. If
$$\left[\phi(W_n^2)\right]^{-1} \sum_{i=1}^{n} \left\{X_i I(|X_i| > Z_i) - E\left[X_i I(|X_i| > Z_i) \mid \mathcal{F}_{i-1}\right]\right\} \to 0$$
and
$$W_n^{-2} \sum_{i=1}^{n} \left\{E\left[X_i^2 I(|X_i| \leq Z_i) \mid \mathcal{F}_{i-1}\right] - \left[E\left(X_i I(|X_i| \leq Z_i) \mid \mathcal{F}_{i-1}\right)\right]^2\right\} \to 1$$
almost surely,
$$\sum_{i=1}^{\infty} W_i^{-4} E\left[X_i^4 I(|X_i| \leq Z_i) \mid \mathcal{F}_{i-1}\right] < \infty \ \text{a.s.},$$
where $W_n^{-1}W_{n+1} \to 1$ and $W_n \to \infty$ almost surely, then
$$\limsup_{n \to \infty}\left[\phi(W_n^2)\right]^{-1} S_n = 1 \ \text{a.s.} \quad \text{and} \quad \liminf_{n \to \infty}\left[\phi(W_n^2)\right]^{-1} S_n = -1 \ \text{a.s.}$$
For our applications, the choice of $W_n$ is made in the proof of Theorem 5.2, where we give a law of the iterated logarithm for the doubly-adaptive biased coin design and use the result to show that higher-order terms tend to zero almost surely in Theorem 5.3.
A.5 CRAMÉR-WOLD DEVICE
A well-known and important property of characteristic functions is that they uniquely determine the probability measure. By using this property, we can obtain the Cramér-Wold device (cf. Billingsley, 1968), which reduces the convergence in distribution of random vectors to the convergence in distribution of one-dimensional random variables. We will use the Cramér-Wold device to develop a multivariate version of the martingale central limit theorem.
THEOREM A.11. (Cramér-Wold Device) For a sequence of random vectors $X_n = (X_{n1}, \ldots, X_{nK})$ and a random vector $Y = (Y_1, \ldots, Y_K)$, a necessary and sufficient condition for $X_n \to Y$ in distribution is that $cX_n' = c_1 X_{n1} + \cdots + c_K X_{nK} \to c_1 Y_1 + \cdots + c_K Y_K = cY'$ in distribution for each $c \in R^K$.

From the Cramér-Wold device, we can obtain the following useful result.

COROLLARY A.3. For a sequence of random vectors $X_n = (X_{n1}, \ldots, X_{nK})$, a necessary and sufficient condition for $X_n \to N(0, \Sigma)$ in distribution is that $cX_n' \to N(0, c\Sigma c')$ in distribution for each $c \in R^K$.
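As an added numerical illustration of how the device is applied (not part of the original text), the sketch below simulates normalized sums of non-normal random vectors and checks that arbitrary projections $cX_n'$ have variance close to $c\Sigma c'$; the covariance matrix and projection directions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Target covariance of the limiting multivariate normal (illustrative values).
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)

n, reps = 500, 4000
# Each X_n is n^{-1/2} times a sum of i.i.d. centered (uniform) vectors
# transformed to have covariance Sigma.
Z = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(reps, n, 2)) @ L.T
Xn = Z.sum(axis=1) / np.sqrt(n)

# Cramer-Wold reduction: each projection c X_n' should be close to N(0, c Sigma c').
for c in (np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, -2.0])):
    proj = Xn @ c
    print(c, proj.var(), c @ Sigma @ c)
```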
A.6 MULTIVARIATE MARTINGALES
The general technique common to the asymptotic proofs in Chapters 4 and 5 is the development of a matrix recursion and then careful analysis of the components of the recursion. This is facilitated by creating martingales for some terms and showing that other terms converge to a constant. These martingales are created from matrix expressions, and hence the usual martingale theory for scalar martingales does not apply directly. We will therefore state some theorems for multivariate martingales and prove them using the Cramér-Wold device and applying asymptotic properties of scalar martingales (Section A.4) to the arbitrary linear combinations used in the device. Let $(\Omega, \mathcal{F}, P)$ be a probability space, where $\Omega$ is a set (multidimensional), $\mathcal{F}$ is a $\sigma$-algebra of subsets of $\Omega$, and $P$ is a probability measure defined on $\mathcal{F}$. Let $\mathcal{F}_n$, $n = 0, 1, 2, \ldots$ be an increasing sequence of $\sigma$-algebras of $\mathcal{F}$ sets. For a random vector $X$, we define $\|X\|_2 = \sqrt{XX'}$. Suppose that $S_n = (S_{n1}, \ldots, S_{nK})$, $n = 0, 1, 2, \ldots$ is a sequence of random vectors ($K$-dimensional) on $\Omega$ satisfying:

(i) $S_n$ is measurable with respect to $\mathcal{F}_n$,
(ii) $E\|S_n\|_2 < \infty$, and
(iii) $E(S_n \mid \mathcal{F}_m) = S_m$ for all $m \leq n$.

The sequence $S_n$ is called a multivariate martingale with respect to $\mathcal{F}_n$, $n = 0, 1, 2, \ldots$, and $X_n = (X_{n1}, \ldots, X_{nK}) = S_n - S_{n-1}$ is the martingale difference. We can obtain the following corresponding results for multivariate martingales.
THEOREM A.12. (Weak Law of Large Numbers) Let $\{S_i, \mathcal{F}_i\}$, $i = 0, 1, 2, \ldots$ be a multivariate martingale and $\{c_n\}$ be an increasing sequence of positive constants with $c_n \to \infty$ as $n \to \infty$. Then $c_n^{-1}S_n \to 0$ in probability as $n \to \infty$ if

(i) $\sum_{i=1}^{n} \Pr(\|X_i\|_2 > c_n) \to 0$,
(ii) $c_n^{-1} \sum_{i=1}^{n} E(X_{ni} \mid \mathcal{F}_{i-1}) \to 0$ in probability, and
(iii) $c_n^{-2} \sum_{i=1}^{n} \left\{E(X_{ni}'X_{ni}) - E\left[E(X_{ni} \mid \mathcal{F}_{i-1})'E(X_{ni} \mid \mathcal{F}_{i-1})\right]\right\} \to 0$,

where $X_{ni} = X_i I(\|X_i\|_2 \leq c_n)$.
PROOF. To show $c_n^{-1}S_n \to 0$ in probability, we just have to show that $c_n^{-1}S_{nk} \to 0$ in probability for each $k = 1, 2, \ldots, K$. Based on the definition, we can see that $S_{nk}$, $k = 1, \ldots, K$ are martingales. Also, the conditions of Theorem A.4 are satisfied for each $S_{nk}$. □

THEOREM A.13. (Strong Law of Large Numbers) Let $\{S_i, \mathcal{F}_i\}$, $i = 0, 1, 2, \ldots$ be a zero-mean, square-integrable multivariate martingale. Then $S_n$ converges almost surely if $\sum_{i=1}^{\infty} E(X_i X_i' \mid \mathcal{F}_{i-1}) < \infty$.

PROOF. Based on the condition
$$\sum_{i=1}^{\infty} E(X_i X_i' \mid \mathcal{F}_{i-1}) < \infty,$$
we can obtain $\sum_{i=1}^{\infty} E(X_{ik}^2 \mid \mathcal{F}_{i-1}) < \infty$ for $k = 1, \ldots, K$. Based on Theorem A.5, $S_{nk}$ converges almost surely for each $k = 1, \ldots, K$. □
Now we define the multivariate martingale array. Let $\{S_{nj}, \mathcal{F}_{nj}, 1 \leq j \leq k_n\}$ be a zero-mean, square-integrable martingale for each $n \geq 1$, and let $X_{nj} = S_{nj} - S_{n,j-1}$, $1 \leq j \leq k_n$ ($S_{n0} = 0$) be the martingale differences. Here $k_n \to \infty$ as $n \to \infty$. The double sequence $\{S_{nj}, \mathcal{F}_{nj}, 1 \leq j \leq k_n\}$ is called a multivariate martingale array. Define
$$V_{nj} = \sum_{i=1}^{j} E\left(X_{ni}'X_{ni} \mid \mathcal{F}_{n,i-1}\right),$$
the conditional variance-covariance matrix of $S_{nj}$, and
$$U_{nj} = \sum_{i=1}^{j} X_{ni}'X_{ni},$$
the unconditional variance-covariance matrix. We now prove the multivariate martingale central limit theorem.
THEOREM A.14. (Martingale Central Limit Theorem) Let $\{S_{nj}, \mathcal{F}_{nj}, 1 \leq j \leq k_n\}$ be a zero-mean, square-integrable martingale array with differences $X_{nj}$, and let $\Sigma$ be a constant nonnegative definite matrix. Suppose that
$$\max_j \|X_{nj}\|_2 \to 0 \ \text{in probability}, \qquad (A.7)$$
$$\sum_{j=1}^{k_n} X_{nj}'X_{nj} \to \Sigma \ \text{in probability}, \qquad (A.8)$$
$$E\left(\max_j \|X_{nj}\|_2^2\right) \ \text{is bounded in } n, \qquad (A.9)$$
and
$$\mathcal{F}_{nj} \subseteq \mathcal{F}_{n+1,j} \ \text{for } 1 \leq j \leq k_n, \ n \geq 1. \qquad (A.10)$$
Then
$$S_{nk_n} = \sum_{j=1}^{k_n} X_{nj} \to N(0, \Sigma) \ \text{in distribution}.$$
PROOF. By the Cramér-Wold device, we just have to prove that $S_{nk_n}c'$ is asymptotically normal with mean 0 and variance $c\Sigma c'$ for any given constant vector $c = (c_1, \ldots, c_K)$. Let $\tilde{S}_{nj} = S_{nj}c'$. We first show that $\{\tilde{S}_{nj}, \mathcal{F}_{nj}, 1 \leq j \leq k_n\}$ is a zero-mean, square-integrable martingale array. For each fixed $n$, we have

(i) $\tilde{S}_{nj}$ is measurable with respect to $\mathcal{F}_{nj}$, because $S_{nj}$ is measurable with respect to $\mathcal{F}_{nj}$.

(ii) Because $\{S_{nj}\}$ is square-integrable, $E\tilde{S}_{nj}^2 = E(S_{nj}c')^2 \leq \|c\|_2^2\, E\|S_{nj}\|_2^2 < \infty$.

(iii) We have $E[\tilde{S}_{nj} \mid \mathcal{F}_{n,m}] = E[S_{nj} \mid \mathcal{F}_{n,m}]\,c' = S_{n,m}c'$. Therefore
$$E[\tilde{S}_{nj} \mid \mathcal{F}_{n,m}] = \tilde{S}_{n,m}$$
for any $m < j$, and also $E\tilde{S}_{nj} = 0$.

Now we check all the conditions of Theorem A.6 for $\tilde{S}_{nj}$. Condition (A.4) follows directly from (A.10). Let $\tilde{X}_{nj} = \tilde{S}_{nj} - \tilde{S}_{n,j-1}$; then $\tilde{X}_{nj} = X_{nj}c'$. Now (A.1), (A.2), and (A.3) follow from (A.7), (A.8), and (A.9) and the fact that $c$ is a constant vector. In (A.2), the corresponding $u^2 = c\Sigma c'$. □
Based on this theorem, we have the following corollary.

COROLLARY A.4. Let $\{S_{nj}, \mathcal{F}_{nj}, 1 \leq j \leq k_n\}$ be a zero-mean, square-integrable martingale array with martingale differences $X_{nj}$. If
$$\sum_{j=1}^{k_n} E\left[X_{nj}'X_{nj} I(\|X_{nj}\|_2 > \epsilon) \mid \mathcal{F}_{n,j-1}\right] \to 0 \qquad (A.11)$$
in probability for any $\epsilon > 0$, and
$$\sum_{j=1}^{k_n} E\left[X_{nj}'X_{nj} \mid \mathcal{F}_{n,j-1}\right] \to \Sigma \qquad (A.12)$$
in probability, and condition (A.10) holds, then
$$S_{nk_n} = \sum_{j=1}^{k_n} X_{nj} \to N(0, \Sigma) \ \text{in distribution}.$$

We will use Corollary A.4 to prove central limit results for both the generalized Friedman's urn and the doubly-adaptive biased coin design in Appendix B. In Theorem A.14 and Corollary A.4, we assume that $U_{nk_n}$ and $V_{nk_n}$ converge to a constant matrix $\Sigma$. Now we state some more general results.
THEOREM A.15. Let $\{S_{nj}, \mathcal{F}_{nj}, 1 \leq j \leq k_n\}$ be a zero-mean, square-integrable martingale array. Suppose that conditions (A.7), (A.9), and (A.10) are satisfied. Then
$$U_{nk_n}^{-1/2} S_{nk_n} \to N(0, I) \ \text{in distribution}$$
if $U_{nk_n}$ converges to a positive definite random matrix almost surely.

COROLLARY A.5. Suppose that conditions (A.10) and (A.11) are satisfied. Then $V_{nk_n}^{-1/2} S_{nk_n} \to N(0, I)$ in distribution if $V_{nk_n}$ converges to a positive definite random matrix almost surely.
A.7 MULTIVARIATE TAYLOR'S EXPANSION

We now state the multivariate extension of the well-known Taylor's expansion. Given a function $f(x)$, where $x$ is a $K$-vector, the expansion of $f(x)$ about 0 is given by
$$f(x) = f(0) + x \frac{\partial f}{\partial x'}\bigg|_{x=0} + \frac{1}{2}\, x \frac{\partial^2 f}{\partial x' \partial x}\bigg|_{x=0} x' + o\left(\|x\|_2^2\right). \qquad (A.13)$$

When $f(x) = (f_1(x), \ldots, f_S(x))$, each $f_i(x)$ is expanded separately according to (A.13), $i = 1, \ldots, S$.
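As an added worked example (not part of the original text), consider the allocation-type function $f(x_1, x_2) = x_1/(x_1 + x_2)$ and expand it about a point $v = (v_1, v_2)$ with $v_1 + v_2 = 1$. Keeping only the first-order term of (A.13),
$$f(v + x) = \frac{v_1 + x_1}{1 + x_1 + x_2} = v_1 + v_2\,x_1 - v_1\,x_2 + o(\|x\|_2),$$
which is the linearization used when a target allocation proportion is approximated around its limit by the delta method.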
A.8 REFERENCES

ATHREYA, K. B. AND NEY, P. E. (1972). Branching Processes. Physica-Verlag, Heidelberg.
BILLINGSLEY, P. (1968). Convergence of Probability Measures. Wiley, New York.
HALL, P. AND HEYDE, C. C. (1980). Martingale Limit Theory and Its Application. Academic Press, London.
Appendix B Proofs

B.1 PROOFS OF THEOREMS IN CHAPTER 4
B.1.1 Proof of Theorems 4.1-4.3

We now prove the main asymptotic theorems for the generalized Friedman's urn model. These proofs are adapted from Bai and Hu (2005), and refer extensively to conditions, assumptions, and notation introduced in Chapter 4 and to technical theorems in Appendix A. First, we prove Lemmas 4.1 and 4.2 stated in Section 4.1.5. We then prove the three main theorems stated in Section 4.1.3.
PROOF OF LEMMA 4.1. Let $e_i = a_i - a_{i-1}$ for $i \geq 1$. By definition, we have $e_i = T_i D_i 1'$, where $T_i$ is the result of the $i$-th draw, multinomially distributed according to the urn composition at the previous stages; i.e., the conditional probability that the $i$-th draw is a ball of type $k$ (the $k$-th component of $T_i$ is 1 and the other components are 0) given the previous status is $Y_{i-1,k}/a_{i-1}$. From Assumptions 4.1 and 4.2, we have
and
Therefore
$$\sum_{i=1}^{n}\left(e_i - E(e_i \mid \mathcal{F}_{i-1})\right)$$
forms a martingale sequence. From Assumption 4.2 and $\kappa > 1/2$, we have, by Theorem A.5, that the series
$$\sum_{i=1}^{\infty} \frac{e_i - E(e_i \mid \mathcal{F}_{i-1})}{i^{\kappa}}$$
converges almost surely. Then, by Kronecker's lemma (Hall and Heyde, 1980, p. 31),
$$n^{-\kappa}\sum_{i=1}^{n}\left(e_i - E(e_i \mid \mathcal{F}_{i-1})\right) \to 0$$
almost surely. This completes the proof for conclusion (b) of the lemma. Conclusion (a) is a consequence of conclusion (b). □
PROOF OF LEMMA 4.2. Without loss of generality, we assume $a_0 = 1$ in the following proof. For any random vector, write $\|Y\| := \sqrt{YY'}$. Define $y_n = (y_{n1}, \ldots, y_{nK}) = Y_n t$. Then (4.12) reduces to
$$\|y_n - Ey_n\| \leq M V_n. \qquad (B.4)$$
In Lemma 4.1, we have proved that $\|a_n - n\|^2 \leq CK^2 n$ (see (B.3) and (B.2)). Noticing that $Ea_n = n + 1$ from (B.1), the proof of (4.12) further reduces to showing that
$$\|y_{n,j} - Ey_{n,j}\| \leq M V_n \qquad (B.5)$$
for $j = 2, \ldots, K$. We shall prove (B.5) by induction. Suppose $n_0$ is an integer and $M$ a constant such that
and
M=
c1 +cz + c3 + cs + cs + (C3+2C5)MO 1 - 3E
where t < 1/4 is a prechosen small positive number,
and the constants $C$ are absolute constants specified later. Consider $m > n_0$ and assume that $\|y_n - Ey_n\| \leq MV_n$ for all $n_0 \leq n < m$. By (4.10) and (4.11), we have

and $B_{m,j}$ is the $j$-th column of the matrix $B_{m,*}$. In the remainder of the proof of the theorem, we shall frequently use the elementary fact that
$$\prod_{j=i+1}^{n}\left(1 + \frac{\lambda}{j}\right) = \left(\frac{n}{i}\right)^{\lambda}\psi(n, i, \lambda),$$
where $\psi(n, i, \lambda)$ is uniformly bounded (say, $\leq \bar{\psi}$) and tends to 1 as $i \to \infty$. In the sequel, we use $\psi(n, i, \lambda)$ as a generic symbol; that is, it may take different values at different appearances and is uniformly bounded (by $\bar{\psi}$, say) and tends to 1 as $i \to \infty$.
Based on this, one can find that the $(h, h+\ell)$-th element of the block matrix $\prod_{i=j+1}^{n}(I + i^{-1}J_t)$ is asymptotically equivalent to
e!- ( j / W
l o g e ( n / j ) $ ( % ~A,t ) ,
(B. 10)
where At is the eigenvalue of J t . By (B.7) and the triangle inequality, we have
(B. 1 1)
Consider the case where 1 (B. 10) we have
+ v1 + ...+ vt-1
< j 5 1 + v1 f
-
I(YoBrn,o,jII5 CIImTLx~IloguL-l m <_ ClVm.
- .. + vt. Then, by (B.12)
Since the elements of E ( Q ; a i ) are bounded, we have
5 GVm, for all m and some constant C2. Noticing that a;J1 (Iyi-l(1 is bounded, for T #
(B. 13)
f ,we have
(B.14) for all m and some constant C3.
Now we estimate this term for the case $\tau = 1/2$. We have
First, we have
Here we point out the fact that for any p > 1, there is a constant C,
> 0 such that
This inequality is an easy consequence of Burkholder's inequality (Theorem A.l). By using
-1- _ -1 +-2-ai-1 ai-1
and the above inequality, we have
i
zai-1
Combining the above four inequalities, we have proved (B. 15)
By Assumption 4.3 and the fact that
IIVi-lll is bounded, we have
5 5
L
(B. 16)
Next, we show that
I
(B. 17)
By Assumption 4.3 and the induction assumption that (Iyi-l - E y i - l ( l 5 M A ,
By Jensen's inequality, we have
5 (C5Mo + 6M)Vm. The estimate of the third term is given by
cs 1llWi - Ewcll(m/i)Re(X')log--'(m/i) m
5
5 <
:g; i= 1
{
c5vm.
3, 3
i-1/2(m/i)Re(Xt)logvi - 1 (m/i)9 ifr # i - w log-'/2(i + l)(m/z)R"(Xt) log"'-' (m/i), i f 7 = (B. 18)
The above three estimates prove assertion (B. 17). Substituting (B.12HB.17) into (B. 1 l), we obtain IlYn,j-EYn,jll
5 ( ~ E M + C ~ + C ~ + C ~ + C ~ + C ~ + ( C ~ + ~5CMVm ~)MO)V~
for j = 2, ...,I(. We complete the proof of (B.5) and thus of (B.4). From (B.4), we obtain (4.12). 0
PROOF OF THEOREM 4.1. Since $\kappa > \tau \vee 1/2$, we may choose $\kappa_1$ such that $\kappa > \kappa_1 > \tau \vee 1/2$. By (4.12), we have $\|Y_n - EY_n\|^2 \leq M n^{2\kappa_1}$. Now by using subsequence arguments and the Borel-Cantelli lemma, one can show that
$$n^{-\kappa}\left(Y_n - EY_n\right) \to 0$$
almost surely. To complete the proof of the theorem, it remains to show that the replacement of $EY_n$ by $nv$ is valid, i.e., to show that $\|Y_{n,j}\| \leq MV_n$ if (4.8) holds and that $\|Y_{n,j}\| = o(n)$ under (4.7). Here the latter is for the convergence with $\kappa = 1$. Following the lines of the proof for Lemma 4.2, we only need to replace $Ey_{n,j}$ on the left-hand side of (B.11) and replace $EY_{i-1}W_i$ on the right-hand side of (B.11) by 0. Checking the proofs of (B.12)-(B.17), we find that they remain true. Therefore, we need only show that
-
{ m Czl
3 ( m / z ) T - 1 ' logUt-'(m/i) 2 5 O(Vm), if(4.8) holds, (B.,9) logUL-'(m/i) 5 em, if (4.7) holds.
+(m/z)T-l
This completes the proof of the strong consistency of $Y_n$. Note that
$$N_n = \sum_{i=1}^{n} T_i = \sum_{i=1}^{n}\left(T_i - E(T_i \mid \mathcal{F}_{i-1})\right) + \sum_{i=1}^{n} \frac{Y_{i-1}}{a_{i-1}}. \qquad (B.20)$$
Since $(T_i - E(T_i \mid \mathcal{F}_{i-1}))$ is a bounded martingale difference sequence, we have (Theorem A.5)
$$n^{-\kappa}\sum_{i=1}^{n}\left(T_i - E(T_i \mid \mathcal{F}_{i-1})\right) \to 0$$
almost surely for any $\kappa > 1/2$. Based on (B.20), we just have to show the strong consistency of $\sum_{i=1}^{n} Y_{i-1}/a_{i-1}$. Based on the strong consistency of $Y_n$, we obtain the strong consistency of $\sum_{i=1}^{n} Y_{i-1}/i$ by using Kronecker's lemma. Now we consider the difference between $\sum_{i=1}^{n} Y_{i-1}/a_{i-1}$ and $\sum_{i=1}^{n} Y_{i-1}/i$. Based on the fact that

almost surely, we establish the strong consistency of $N_n$. □
PROOF OF COROLLARY 4.3. From the proof of Theorem 4.1, one finds that the term estimated in (B.12) is not necessary on the right-hand side of (B.11). Thus, to prove (4.13), it suffices to improve the right-hand sides of (B.15)-(B.17) regarding $EV_n$. The modification for (B.15) and (B.16) can be done without any further conditions, provided that the vector $Y_{i-1}$ in these inequalities is replaced by $(0, Y_{i-1,-})$. The details are omitted. To modify (B.17), we first note that (B.18) can be trivially modified to $cV_m$ if condition (4.7) is strengthened to (4.8). The other two estimates for proving (B.17) can be modified easily without any further assumptions. □
PROOF OF LEMMA 4.3. From Theorem 4.1, we have $Y_n/a_n \to v$ a.s. Similar to (B.2), we have

almost surely. Assumption 2.2 implies that $\{e_i - E(e_i \mid \mathcal{F}_{i-1})\}$ satisfies the Lyapunov condition. From the martingale central limit theorem (Theorem A.6), Assumptions 4.1-4.3, and the fact that
$$a_n - n = 1 + \sum_{i=1}^{n}\left(e_i - E(e_i \mid \mathcal{F}_{i-1})\right),$$
the theorem follows. □
- Ey,,l = a , - IE - 1 = x ( e , - E(e,lF,-l)). i=l
From Corollary 4.1, we have
Combining the above results, we get
Again, Assumption 4.2 implies the Lyapunov condition. Using the martingale multivariate central limit theorem (Theorem A.6), as in the proof of Theorem 2.3 of Bai and Hu (1999), from (B.21), one can easily show that V;’(Y, - EY,) tends to a K-variate normal distribution with mean zero and variance-covariance matrix
By Theorem 4.1, for the case r = 112, V, = f i l ~ g ” - ” ~ n 6, 1 1 = 0 and El2 = 0. When T < 1/2, V, = &, 6 1 1 = K V q d q k i . Now let us find El*. Write t = ( l ‘ , t i , . . ‘, t : ) = (l’,L),ti = ( t ; , , . . ., t i , ) and
zqZ1 c,“==, z,“=,
= (Bn,i,a,**. , B n , i , K ) . (Also, the matrices with a minus sign in the subscript denotc the submatrices of the last K - 1 columns of their corresponding matrices.) Then the vector E l 2 is the limit of I
I
I
Bn,i,- = t - ' B n , i t -
n
i= 1 n
i= 1
K
=
n - 1 2 1
( C v q d q + H * ( d i a g ( w-) w*w)H
i=l
= 1
q=l
(&$, q= 1
+H*(diag(v)- v*w)H tn-'-&,-
)
) c n
=
1 C V q d q tn-' (qI1
&,z,-
+op(l)
i= 1
+ op(l),
(B.22)
i=l
where the matrices dq are defined in (4.6). Here we have used the fact that lH'(diag(v) - w*w) = l(diag(w) - w * w ) = 0. By elementary calculation and the definition of B n , i , - , we get
n
n
...
0
0
.(B.23) ..
0
n
n
In the h-th block of the quasi-diagonal matrix n
n
i=l j=i+l
the (9, g
+ 1)-th element (0 5 l 5 vh - 1) has the approximation (B.24) i=l
Combining (B.22), (B.23), and (B.24), we get an expression for X 1 2 .
The variance-covariance matrix EZZof the second to the K-th elements of V;'(yn - Ey,) was first calculated in (2.17) of Bai and Hu (1999). For the case, r < 1/2, Vn = fi.Let /n-1
\
n-1
n
r
0
n
0
...
0
.*.
0
n
J-J
t'Rt
(I+j--'JZ)
j=i+l
n j=i+l
(B.25)
+o( 1).
To obtain the limit of (BZ),we need to consider each block (9,h) of the matrix according to the Jordan decomposition for 9,h = 2, ...,s. For given g and h, the block matrix is
c J-J n
n-'
n
i=l j=i+l
n
( I + j - ' J j ) ( t ; ) * R t ; , J-J ( I + j - ' J h ) , j=i+l
(B.26)
To calculate this, we need to use the following two results:
1’
za 10gy1/xpx =
Irn
ybe-(”+’)%y =
+
r(b 1) ( a + l)(b+’)
’
for Re(a) > -1, b > -1 and the limit n
as n --t
00.
The (w, t)-th element of (€3.26) can be approximated by
is the (w,t)-th element of the matrix [(tk)*Rti]. Further, the where [(tL)*Rt;E](W,t) (w,t)-th element of (B.26) converges to
When T = 1/2, we shall use the fact that n
z-l logb(l/z)dz
5 n-‘ X ( i / n ) - ’Iogb(i/n) i= 1
5 logb(n)+
l/n 1
2-1
logb(l/z)dz
to obtain (B.28)
If A, = Ah, vg = Vh = v and Re(&) = 1/2, then the corresponding block (9,h ) is
c l-J n
V,-2
n n
n
(I+j-’Jj)(tj)*Rt’h
i=l j=i+l
(I+PJh),
(B.29)
j=i+l
where V,” = n log2”-1n. The (w,t)-th element of (B.29) can be approximated by
By using (B.28), (B.3 1) converges to
((v - 1)!)-2(2v - l)-l[(t;)*Rt‘h](l,J)
(B.31)
if $w = t = \nu$; otherwise, (B.31) converges to 0. Combining $\Sigma_{11}$, $\Sigma_{12}$, and $\Sigma_{22}$ together and using the martingale multivariate central limit theorem (Theorem A.14), $n^{-1/2}(Y_n - EY_n)$ has an asymptotically joint normal distribution with mean 0 and variance-covariance matrix $\Sigma$. Thus, we have shown that
$$n^{-1/2}(Y_n - EY_n) \to N\left(0, (t^{-1})^* \Sigma\, t^{-1}\right)$$
in distribution. When (4.8) holds, $a_n - ne_1$ has the same approximation as the right-hand side of (B.21). Therefore, in the martingale multivariate central limit theorem (Theorem A.14), $EY_n$ can be replaced by $nv$. This completes the proof of the theorem. □
PROOF OF THEOREM 4.3. We have

We shall consider the limiting properties of $N_n$. First, we have
n
n
n-1
t=l
i=O
(B.32)
For simplicity, we consider the asymptotic distribution of N,t. Since the first component of N,t is a nonrandom constant n, we only need to consider the other K - 1 components. From (4.13) and (B.32), we obtain
i=l n
i=O n-1
i=l
i=O
i=O
n-1
n
(B.33)
-
+
n-1 where B,,J = t-'G,+1 . - G , t ,Bn,j = Cz=J B z , J / ( i 1). Here, in the fourth equality, we have used the fact that Y*,-(i 1 - uz)/[uz(i l)] = o p ( f i ) , which can be proven by the same approach used to show (B. 15) and (B. 19). In (B.33), we only have to consider the asymptotic distribution of the martingale A
3
n
xFz;
+
+
n-1
We now estimate the asymptotic variance-covariance matrix of V;'Un. To this end, we need only consider the limit of
n-1
(B.34)
since vT- = 0. This estimate implies that n
v12CE(Q;Q~IJE,-~) -+
{
j=l
-
c1=
;diug(w)t-,
if7 < 1/2, (B.35) if7 = 1/2,
as j
-+ 00.
Because Qj = [TjDj - ( Y j - l / ~ j - l ) H j ] t ,
( c )+
= ttdiag(v)Ht V,-2
n-1
gn,j,-
o(1).
(B.36)
j=1
From (B.8), we have
Based on (B.8), (B.9), and (B.101, the (h,h
+ l)-th element of the block matrix
has a limit obtained as
=
(&)e+l.
Substitutingthis into (B.37) and then (B.36), when V , = n, we obtain
j= 1
(B.38)
where is a K x ( K - 1) matrix whose first row is 0 and the rest is a block diagonal matrix, the t-block is vt x vt and its ( h ,h e)-th element is given by the right-hand side of (B.38). The matrix 52 is obviously 0 when V’, = n log2”-1n. Note that the third term in (B.34) is the complex conjugate transpose of the second term; thus, we also have the limit of the third term 5;. Now, we compute the limit 5, of the fourth term in (B.34). By Assumption 4.2, the matrices Rj in (B.34) converge to R. Then the fourth term in (B.34) can be approximated by
+
cin
n-1 X
l
i=j+l
n
(I-kT-lJh)
r=j+ 1
I’
(B.39)
g,h=l
Similar to (B.38), we can show that the (20, t)-th element of the (9,h)-th block of the matrix in (B.39) is approximately
(B.40) Here, strictly speaking, in the numerator of (B.40), there should be factors $ ( i , j , 20’) and +(m,j , t’). Since for any j o the total contributions of terms with j 5 j o is o(1) and the $ 3 tend to 1 as j 00, we may replace the $J’S by 1. For fixed w, w‘, t and t’, if A, # A h or Re(A,) < 1/2, we have
-
ccc
1 n-ln-l
- j=l
n-l
i = j m=j
(Z/j)Jg
(m/j)Xhl0gW’(i/j)logt’(m/j) (i l)(m l ) ( d ) ! ( t / ) !
+
+
Thus, when T < 1/2, if we split %3 into blocks, then the (w, t)-th element of the ( 9 ,h) block Eg,h (v, x vh) Of 5 3 is given by
(B.42)
I(t;)*Rt~l(u-w’,t-t’).
When T = 1/2, Z9,h = 0 if A, # Ah or if Re(A,) < 1/2. Now we consider Z,,h with A, = Ah and Re(&) = 1/2. If w’ t’ < 2v - 2, then
+
92-1 n-1 n-1
j=1
i=j
(i/j)Xg(l/j)Ag
logW$/j) 1ogtyq.j)
e=j
When w’= t‘ = v - 1which implies w = t = v = vg = vh, by Abelian summation we have n-ln-ln-1
-+
(XgI-2[(v - 1)!]-2(2v
- 1)-1.
(B.43)
Hence, for this case, Eg,h has only one nonzero element, which is the one on the right-lower corner of E9,hand is given by JA,
1-21
(v - 1)!]-2(2v - 1)-1 [(t;)*Rt)h](l,J).
(B.44)
Combining (B.34), (B.39, (B.38), (B.42), and (B.43), we obtain an expression of 5.By using the martingale multivariate central limit theorem (Theorem A.14), n - ’ / * ( N , - nv)t has an asymptotically joint normal distribution with mean 0 and variance-covariance matrix E.Thus, we have shown that n-”2(Nn - nu) -+ N ( 0 ,( t - 1 ) * 3 t - l )
in distribution. 0
B.1.2 Proof of Theorem 4.6

We now prove the main asymptotic result for the generalized drop-the-loser rule, as proved in Zhang et al. (2006). Before we prove the main theorem, we prove the following two lemmas.
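Before turning to the lemmas, the following added Python sketch simulates the basic two-treatment drop-the-loser urn with binary responses (a special case of the generalized rule treated here); the success probabilities and the number of immigration balls are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def drop_the_loser(n_patients, p_success, n_immigration=1):
    """Simulate the binary-response drop-the-loser urn for two treatments.

    Balls: type 0 = immigration, types 1..K = treatments.  A drawn treatment
    ball is removed after a failure and returned after a success; a drawn
    immigration ball is returned together with one new ball of each
    treatment type.
    """
    K = len(p_success)
    balls = np.ones(K + 1)
    balls[0] = n_immigration
    assignments = np.zeros(K, dtype=int)
    treated = 0
    while treated < n_patients:
        probs = balls / balls.sum()
        k = rng.choice(K + 1, p=probs)
        if k == 0:                    # immigration ball: add one ball of each type
            balls[1:] += 1.0
            continue
        assignments[k - 1] += 1       # assign a patient to treatment k
        treated += 1
        if rng.random() >= p_success[k - 1]:
            balls[k] -= 1.0           # failure: the drawn ball is not replaced
    return assignments

pA, pB = 0.8, 0.6
counts = drop_the_loser(20000, [pA, pB])
print(counts / counts.sum())          # close to (qB, qA)/(qA + qB) = (2/3, 1/3)
```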
LEMMA 4.8. Denote by $\mathcal{F}_n = \sigma(T_1, \ldots, T_n, Y_1, \ldots, Y_n)$. Let
$$V_{n,0} = \sum_{m=1}^{n}\left(T_{m,0} - E[T_{m,0} \mid \mathcal{F}_{m-1}]\right),$$
$$V_{n,k} = \sum_{m=1}^{n}\left\{T_{m,k}(D_{m,k} - 1) - E[T_{m,k}(D_{m,k} - 1) \mid \mathcal{F}_{m-1}]\right\}, \quad k = 1, 2.$$
Assume $E[|D_{m,k}|^p] < \infty$ for $p \geq 2$. Then there exists a constant $C_p > 0$ such that the martingales $\{V_{n,k}, \mathcal{F}_n;\, n \geq 1\}$, $k = 0, 1, 2$, satisfy
$$E\left[|V_{m+n,k} - V_{m,k}|^p\right] \leq C_p n^{p/2} \quad \text{for all } m \text{ and } n, \ k = 0, 1, 2. \qquad (B.45)$$
PROOF. Notice that $|\Delta V_{n,0}| \leq 1$ and
$$E\left[|\Delta V_{n,k}|^p \mid \mathcal{F}_{n-1}\right] \leq 2^{p-1}\left(1 + E[|D_{n,k}|^p]\right) \leq c_p, \quad k = 1, 2.$$
By Theorem A.2, we obtain (B.45) immediately. □

Let $U_{n,k} = a_k V_{n,0} + V_{n,k}$, $k = 1, 2$. Then $U_{n,k}$ is the sum of conditionally centered changes in the number of type $k$ balls in the first $n$ draws, $k = 1, 2$. It can be shown that $\{U_{n,k}, \mathcal{F}_n;\, n \geq 1\}$ is a martingale satisfying an inequality similar to that of (B.45), $k = 1, 2$. The next lemma gives the convergence rate of the urn proportions $Y_n$.
(B.46) (B.47) (B.48) (B.49)
PROOF.According to (4.24), it is obvious that
+
Yn,k
=
yn-l,k
+ akYn-l,o
4-
- qkYn-l,k
lY;t-ll
(B.50) Then
Let Sn = max{l 5 j 5 n : according to (B.5 l), Yn,k
5 5 5 5
q,k
yn-l,k YS,,k y0,k &,k
<
ak%,o/qk},
+Aun,k 5 -k A U s , , + l , k
191
where max(0) = 0. Then
* * '
+
*.*
-k A U n , k
v { a k & , O / q k ) -6 u n , k - u S , , k v { a k Y O , O / q k } -k U n , k - u S , , k -
(B.52)
Thus (B.46) is proved. Notice that Sn 5 n is a stopping time. It follows that E u n , k = E U s , , k . By (B.52) we conclude that EYn,k
5 o(n1/8+6).
Equation (B.47) is proved by the fact that Y n , k 2 -1 and I y n , k l = y n , k 4- 2Yn+ Next, we verify (B.49). Fix rn. By replacing q , k with Y m + j , k in the definition of the stopping time Sn, with similar arguments as in showing (B.46), we can show that
Now, for each p 2 2 and 0 < t < 1/2, if n 2 1/(4t), then by (B.45), (B.47), and (B.53), we have
5 c{t-ln'/R+6
+t l / a - l / ~ ~ 1 / 2
1,
where the sums and maxima are taken over {i 2 0 : i[nt]5 n}. Here C > 0 is a constant and does not depend on t and n. Notice that Y m , k 2 - 1 . Letting t = n-1/4, we have
Choose p such that (4p)-' 5 S yields (B.48) immediately. Equation (B.49) can be derived easily from (B.48) and the Borel-Cantelli lemma. The proof of the lemma is now complete. Equation (B.49) indicates that the terms Y n , k in (4.25) can be neglected.
m= 1
m=l
E[AMn,k . AMn,jIAn-l] = 0,
j
#k
and
E [IAMn,kIPIA,-1] 5 2’E [IDl,kI”]. According to the law of the iterated logarithm for martingales (Theorem A. lo), we have Mn,k = a.s. (B.55)
o(d s )
Combining (B.49) and (B.55) shows that, for any 6 > 0,
akN;,o - qkN;,k
= -Mn,k
+ o(n3/8+6/2)
= -Mn,k -k = O(d-)
which, together with the fact that N;,o N;,k
= n
-
(B.56)
O(n3/8+6)
as.,
k = 1,2,
+ N;,l + N;,2 = n, yields
aklqk adq1-t a d q z
nSVk
s+l
+ 1 + O(&z&$ + o(d-) a s . , k = 1,2
(B.57)
and
alqZN;,2 - azq1N;J
= a2(alN;,o - QIN;,l) - .l(azN;JJ - 4zN;,2) = alMn,z - azMn,l+ o(n3/8+6) a s . (B.58)
We consider the martingale { M n =: alMn,2 - azMn,l}. From (B.54) and (B.57), it follows that n m=l n
n
m= 1
m=l
- n ~ ( u : v 2 0 +; a;v,a:)
+
O( J-) as. s+l By the Skorohod representation (Theorem A.8), there exists an A,-adapted nondecreasing sequence of random variables { rn}and a standard Brownian motion B such that
EIArnlAn-l]= E[(AMn)’(An-1], ElArnlp/2ICpElAMnI” 5 c p l b > 2
M , = qqI n = 1 , 2 , . . . .
193
(B.59)
Note that {C",l(Arm- EIAr,Idm-l])} is also a martingale. According to the law of the iterated logarithm, we have
On the other hand, by (B.57), we have
+ O(d z )
+ N;,2 = &n
U.S .
It follows that
Substituting (B.59) and (B.60) into (B.58) and applying the properties of a Brownian motion (cJTheorem 1.2.1 of Csorgo and RBvisz, 1981), we have aiq2Nn,2
-~
q l N n= , ~ aiqzN:,,,2
+ +
- azqiN.l,,,l = B(rUn)+ O ( U ; ' ~ + ~ )
+
+
= ~ ( n ( u ~ w 2 u , u2 ~ w l a ~ ) ~((n~oglogn)'/~(logn)'/~) ) 0(n3/8+6 1 = B ( ~ ( u : w ~ u&,a:)) ; ~ ( n ~ / u.s., ~ + ~ )
+
which, together with the fact that Nn,1
+ Nn,2 = n,yields
Notice that 2
u =
u;wlu: (a1q2
+
U?W,U,2
+a2d2
'
thus agreeing with (4.19). Let 1 B(t(u:Wlu;
W ( t )= -U
alq2
+ U;wlg:))
+ a2q1
Then { W ( t ) t; 1 0} is a standard Brownian motion. The proof is now complete. 0
B.2 PROOFS OF THEOREMS IN CHAPTER 5
We now prove the main asymptotic theorems for the doubly-adaptive biased coin design, as originally proved in Hu and Zhang (2004a). We first prove the five lemmas stated in Section 5.5. Then we prove the three main theorems stated in Section 5.4.
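As an added illustration of the kind of procedure whose asymptotics are proved below (not part of the original text), the following Python sketch simulates a two-treatment doubly-adaptive biased coin design. The allocation function used is the commonly cited form $g_k \propto \rho_k(\rho_k/x_k)^{\gamma}$ with $\gamma = 2$, the target $\rho$ is the urn allocation $(q_2, q_1)/(q_1 + q_2)$, and the burn-in and estimator shrinkage are ad hoc choices made only so that the sketch runs; it is not intended as the exact procedure defined in Chapter 5.

```python
import numpy as np

rng = np.random.default_rng(4)

def dbcd_trial(n, p, gamma=2.0):
    assignments = np.zeros(2)   # patients assigned per treatment
    successes = np.zeros(2)     # observed successes per treatment
    for i in range(n):
        if assignments.min() < 1:
            k = i % 2                            # burn-in: one patient per arm
        else:
            q_hat = 1.0 - (successes + 0.5) / (assignments + 1.0)
            rho = q_hat[::-1] / q_hat.sum()      # estimated target allocation
            x = assignments / assignments.sum()
            w = rho * (rho / x) ** gamma         # Hu-Zhang-type allocation weights
            g = w / w.sum()
            k = int(rng.random() < g[1])         # assign treatment 1 with prob g[1]
        assignments[k] += 1
        successes[k] += rng.random() < p[k]
    return assignments / n

p = np.array([0.7, 0.5])
q = 1.0 - p
print(dbcd_trial(20000, p))                      # approx (q[1], q[0]) / q.sum() = (0.625, 0.375)
```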
PROOFOF LEMMA5.1. First, we show the result (5.19) by induction. It is easy to see that (5.19) holds for n = 1. Now suppose it holds for n - 1, that is,
Then
-At Jt
=
1 At
0 1
...
0 0 0
... *. .. .. .. . . . . . . 0 0 0 ... At 0 0
0
-
where llAnll -+ 0. Then
From (5.19), it follows that
Then by (5.20), there exists a constant GO2 1such that
Now we define a sequence of real numbers D , 2 1 such that
It is easy to define D, for m = 1,.. . ,9. Assume that n 2 10 and D, is defined form = 1,. . . , n - 1. Let n1 = and write
[a,
From (B.61) it follows that
where c k = k + 1 ( ' ) = 0.
NOW
define
Next, it suffices to show the boundedness of D,. Since llAnll + 0, there exists a n6 such that
Then
+
which, together with the induction, implies that D, 5 1 maxrn<,,-1 D, for all n 2 n6. 0
PROOF OF LEMMA5.3. Without loss of generality, we assume that KO= 1. Let
bn,n = 1,
Also,
Ibn,,,,l 5 C(n/m)'O, k = 1 , . .., n , n = 1 , 2 , . . . .
It follows that
Now, by (B.62) we conclude that
PROOF OF LEMMA5.4. For k = I , . . . , K , define rt = m h { j : N j , k = z}, where min{@}= +m. Let { r ] i , k } be an independent copy of { X , , k } ,which is also independent of {Ti}. Define =i,k
= x,,",k'{T!
< +m} + ~
i
,
k
=~ +m}, { ~
~
i = 1 , 2 , . . .. By using the same argument as Doob (1936), we can show that {Zm,krm = 1 , 2 , . . .} is a sequence of i.i.d. random vectors, with the common distribution the same as that of X1,k. So, if (Al) is true, n =m,k
7t-l
4
8k
m=l
almost surely. And if (A2) holds, then
m=l
almost surely. Now, by the fact that
on the event
{ N n , k -+ m}, (5.21) is proved.
0
PROOF OF LEMMA5.5. By Lemma 5.4 and the continuity of p ( y ) , it suffices to show that Nn,k -+ 00 almost surely, k = 1 , . .. , K. Note that for each k, k = 1,. .,K, on the event {limn-.+m Nn,k < m}, the sequence { o n , & } takes a finite number of values. Also, on the event {limn+m Nn,k =
-
.
m}, Bn,ki
h
almost surely. This shows that {On} is a relatively compact set almost surely. By the continuity of p ( y ) , p ( y ) E (0, l)Kfor any y on the closure of { 6,) almost surely. Therefore, there exists a 0 < 6 < 1 such that --$
f3ki
cn = p(&) E p, 1)K,
n = 1,2,.. ..
(B.63)
Note that P(Tn,k = 11Fn-1) = gk(Nn-l/n - l,pn-l), k = 1,-* . K . h
3
For each j = 1,.. . ,I<, on the event { Nn,j = 0, n = 1 , 2 , . . .}, we have
c 00
c 00
P(Tnj = llFn-l)2
cg
= +oo
U.S.
n=2
n=2
by (B.63) and condition (B3). This implies {Tn,j= 1,Lo.} = {Nn,j+ m} almost surely by the generalized Borel-Cantelli lemma ( c j Corollary 2.3 of Hall and Heyde, 1980). This is a contradiction. So,
2
2(n - 1)
for n largeenough
almost surely by (B.63) and condition (B4). Thus
c _-
M
P(Tn,k = 1lFn-1) = 3-m
U.S.,
n=2
which implies { X n , k = l,i.o.} = {Nn,k+ w} almost surely by the generalized Borel-Cantelli lemma. This is also a contradiction. Therefore,
almost surely k = 1,.. . ,K . 0
PROOF OF THEOREM 5.1. Recall that conditions (Al), (Bl), (B2”), (B3), and (CI) are assumed. By Lemma 5.5, Fn + v almostly surely. Let
By condition (BI),
almost surely. On the other hand, M n = o ( n ) almost surely by the law of large numbers from Theorem AS. Now we apply Lemma 5.3. For each k = 1 , . . ,K , let
.
and
Then
From Lemma 5.3, it follows that
since 0 5 XO < 1. Thus limsup($-v)s;
k=l,
as.,
n+x
..., K .
(B.64)
Suppose that is is a limit of one subsequence of { N n / n } .Then by (B.64), (V-V)S:
50,
k=l
All the above K inequalities must be the equalities, since otherwise, K
0 = (G - V)l-ICO = (G - vp1-I =
C(is- 21)s;
< 0,
k= 1
which is a contradiction. So, --I
VSk
= us;, k
= 1,.
. .,K ,
which implies $\tilde{v}S = vS$. It follows that $\tilde{v} = v$. Then $N_n/n \to v$ almost surely. □

PROOF OF THEOREM 5.2. Since $N_n/n \to v$ almost surely and $v_k > 0$, $k = 1, \ldots, K$, by Lemma 5.4 we have
(B.65)
On the other hand, M n = O(d Theorem A. 10. So
m
)by the law of the iterated logarithm from
Now, by condition (B4),
By Lemma 5.2,
O ( 4 W ) if X + 6 < 1 / 2 , if X + 6 > 1/2.
(B.66)
The proof of Theorem 5.2 is completed by noting that 6 > 0 can be chosen arbitrarily small. 0 PROOF OF THEOREM 5.3. Let Q, = C:=, AQ,, where AQ,=(AQ,,1
,...,A Q m , K ) = ( A Q m , k i ; i =...l ,, d , k = l , ...,K )
By definition,Q, is a sequence of martingale in R K x dand , Q, = O ( d w ) almost surely by (A2) and Theorem A. 10 (the martingale law of the iterated logarithm). By (B.66),we have
(B.67)
By (B.65)and condition (C2), we have
On the other hand, by (5.9) and (B.65)-(B.67), we have
Then, by Lemma 5.1 it follows that
Note that U, is a sum of martingale differences. We now check the Lindeberg condition on U n .For some 0 < c < 1 / X - 2,
By the martingale multivariate central limit theorem (Theorem A. 14), we obtain the asymptotic normality of N , . Now the main task is to calculate its asymptotic variance-covariance matrix. First, we have
+ diug{g(v,v ) }
- {g(v,w)}'g(v,v) = diug(v) - v'v = c1
almost surely and
G O V ( A M ~ , A Q , ~= F 0~ - U.S. ~} Second
)
are martingales. We would like to embed these two martingales in a multivariate Brownian motion by using Theorem A.9 and Theorem A. 1 1 (Cram&-Wold device). Based on the above conditional variance-covariance limits, it follows that for any o < s < t < 1, (8.70)
+ n i a d z [ pY (:)Hdy]'.2 Y
[p
Y ( Yy d Y ] + o ( n ) (8.71)
=
nsAl2
+ o(n),
(B.73)
and
where
This shows that the limiting variance-covariance function of
n-1/2 ( U [ n t ] r Q [ n t l v ( P ) I ~ ) agrees with the covariance function of (Gt, as defined in Theorem 5.3’. So, by the weak convergence of martingale (Theorem A.9),
n-lI2
(V,,t], q n t ] V ( P ) I , )
-+
(Gt,
my2)
in distribution. Thus, we have proved Theorem 5.3'. From (B.68) and (B.69), we can see that Theorem 5.3 is a special case of Theorem 5.3' by taking $s = t = 1$ in (B.70), (B.72), (B.73), and (B.74). □
B.3 PROOFS OF THEOREMS IN CHAPTER 7
In this section, we prove results on delayed responses stated in Chapter 7. For the generalized Friedman’s urn, the proofs are adapted from Bai, Hu, and Rosenberger (2002) and Hu and Zhang (2004b). For the doubly-adaptive biased coin design, proofs are adapted from a currently submitted manuscript of Hu et al. (2006).
PROOF OF THEOREM 7.2. If we can express the effect of delayed response mathematically and show that the term from the delay mechanism is negligible, then by Theorems 4.1-4.3, we obtain Theorem 7.2. To do this, we first express the effect of delayed response. For simplicity of presentation, we use $\xi_n$ to represent the response of the $n$-th patient. For patient $n$, after observing $\xi_n = l$, $J_n = j$ (treatment $j$), we add $d_{ji}(l)$ balls of type $i$ to the urn, where the total number of balls added at each stage is constant; i.e., $\sum_{i} d_{ji}(l) = \beta$, where $\beta > 0$. Without loss of generality, we can assume $\beta = 1$. Let
+ + +
So, for given n and m, if Mjl(n,m ) = 1, then we add balls at the ( n m)-th stage (i.e., after the ( n m)-th patient is assigned and before the ( n m 1)-th patient is assigned) according to T n D ( l ) Since . M j p ( n ,m ) = 0 for all 1’ # 1,
+
Consequently, the number of balls of each type added to the urn a..:r patient is assigned and before the ( n 1)-th patient is assigned is
+
the n-th
Wn m=O 1=1
m=l k 1
We can write (B.75)
By (B.79, we have n
n
n
n
L
m=l k = m 1=1
k
L
m = l k=O 1=1 n L c o m = l 1=1 k=O
1=1 m = l k=n-m+l
n
L
1 = 1 m = l k=n-rn+l n L
n
That is, (B.76) where L
n
00
(B.77) If there is no delay, i.e., M~,,,,l(rn,k ) = 0 for all k 2 1 and all m and 1, then R, = 0 and (B.76) reduces to (B.78) which is the same as the basic recursive form of the urn model in Chapter 4. The main task now is to show that R, can be neglected. Before we do that, it shall be noted that the distance between Y nin (B.76) and that in (B.78) (without delayed responses) is not just h. This is because the distributions of T , will change due to delayed responses. So the asymptotic properties of the model with delayed responses do not simply follow from those when delayed responses do not appear. For a vector z in R m , we let 1 1 ~ 1 be 1 ~ its Euclidean norm and define the norm of an rn x m matrix M by llMlle = sup{llzMll/llslle : z # 0 ) . For any vector P and matrices M, M 1 , we have
Now from (B.77), for some constant C,
n
M
because llTmlle= 1for all rn and the number of balls added is always bounded. So n
n
It also follows that
which is summable. By the Borel-Cantelli lemma, we have
$$R_n = o\left(n^{1-c'}\right) \ \text{a.s. for any } c' < c.$$
Now from the proofs of Theorem 4.1, we can show that $Y_n/n \to v$ almost surely, because $n^{-1}R_n \to 0$ almost surely. For $N_n$, we can prove strong consistency similarly. When $c \geq 2$, then

almost surely for some $c' > 1/2$. This indicates that $R_n$ does not contribute to the asymptotic distribution of $Y_n$. Thus we can prove Theorem 7.2 similarly to the proofs of Theorems 4.2 and 4.3. □
PROOFOF THEOREM 7.4. Let X i k , k , j = 1,.. . ,w,,k be all of the responses on treatment k that are observed up to the time when the (rn I)-th patient is ready for treatment allocation, where Wm,k 5 Nm,k is the number of responses observed, k = 1,. . . ,K . Let S m , k = Cy'ik Xik+ and let
+
be the sample mean that is based on the responses X z f , k j, = 1 , . . . ,wm,k, k = 1,.. . ,K . Write e m = ( e m , i , ern,^). The (m 1)-th patient is then assigned to treatment k with a probability
-
e .
+
1
(B.79) Based on Theorems 5.1-5.3, the proof of Theorem 7.4 can be completed by showing that * 0, = o(n-112-6) (B.80)
on
for some 6 > 0 under Assumption 7.2, where & is the estimator defined in Chapter 5 without delayed responses. To do this, we require three additional lemmas. From the time when the n-th patient is assigned to the time when the (n + 1)-th patient is to be allocated, all of the responses of patients on treatment k that we have observed correspond to the nonzero Tn-m,k&lk(n - m,m)-n- m,k , m = 0 , . . . ,n - 1, and the total number of such cases is
Let
Remember that Sn,k is the total sum of the responses on treatment k that are observed up to the time when the (n 1)-th patient is ready for treatment allocation, and W n , k is the number of responses observed. Then, for each k = 1,.. . ,K, we have
+
n
n
n
.i
n
m=l j=m
(B.81)
m=l j = O
and n
n n-m
(B.82) m= 1
m=l j = O
LEMMA7 . 1 . Let {?&,k = 1 , 2 , . . .} be a sequence of i.i.d. random variables with zero means and { ~ k k, = 1 , 2 , . ..} be another sequence of random variables that takes only two values, 0 or 1. Denote Cn = C;==,~ j Suppose . that for each n, qn is dependent on {ql ,... ,q n - l r € 1 , . . , E , } . Then, in the event that + m},
.
{c,
PROOF. The proof is similar to the proof of Lemma 5.4 in Section B.2.
0
LEMMA 7.2. For the doubly-adaptive biased coin design with delayed responses, if conditions (A1), (B1)-(B3), and (C1) are satisfied, then $N_{n,k} \to \infty$ almost surely, $k = 1, \ldots, K$, and $\hat{\theta}_n \to \theta$ almost surely.

PROOF. Based on (5.6) and Lemma 7.1, it suffices to show that, for each $k$, $N_{n,k} \to \infty$ almost surely, $k = 1, \ldots, K$, and
$$\{N_{n,k} \to \infty\} \ \text{implies} \ \hat{\theta}_{n,k} \to \theta_k \ \text{almost surely}, \quad k = 1, \ldots, K. \qquad (B.83)$$
As the former is implied by the latter (cf. the proof of Lemma 5.5 in Section B.2), it remains to prove (B.83). It is obvious that
$$\sum_{j=0}^{\infty} M_k(m, j) = 1 \quad \text{for } k = 1, \ldots, K. \qquad (B.84)$$
Fix k = 1 , . .,, K . By (B.84) and (B.82),
For each j and k , { M k ( m ,j ) ,m = 1 , 2 . . .} is a sequence of i.i.d. random variables. From Lemma 7. I , it follows that N,,k -+ 00 almost surely implies
almost surely, j = 0 , 1 , ... . Then, in the event that { N,,k
3
m},
almost surely as T
+ 00.
We therefore conclude that in the event that { Nn,k
211 + CQ},
(8.85) almost surely. Note that
in=lj=n-m+l
m=lj=n-m+l
m = l j=n-m+l
m= 1
m=l
By Lemma 7.1 and (B.85), in the event that { Nn,k
almost surely as A
-+ 00.
+ m},
Also, again by Lemma 7.1,
ck=l Tm'kXm'k
+ t9k a.s. in the event that
{ N n , k + 00).
Nn,k
Thus, by noticing (B.8 1) and (B.84), we conclude that in the event that { Nn,k
sn k Nn,k
+ CQ},
C\=iTm,kXm,k Nn,k
-
almost surely. Combining (B.85) and (B.86), we obtain (B.83) from the definition of &,k. 0 LEMMA7.3 For the doubly-adaptive biased coin design with delayed responses, suppose that N,/n -+v almost surely. If condition (A2) and Assumption 7.2 are satisfied, then for c' = (2 €0)/(2 €0 (1 E O ) C ) ,
+
+ + +
we have
en- 0, -
A
+
= o ( n ~ ' - ' ( ~ o g n ) ' ) o(n-(l+c~)/('+E~)) as.
In particular, if $c \geq 2$, then for some $\delta > 0$,
$$\hat{\theta}_n^* - \hat{\theta}_n = o\left(n^{-1/2-\delta}\right) \ \text{a.s.}$$
PROOF.Notice that
(B.87)
.
almost surely, k = 1,. . ,K . By (B.81) and (B.84), for each k = 1,.. . , K ,
m=l
m=l j=n-m+l
Let 0 < c’ < 1 be a number the value of which will be defined later. Let 1, = [nv’] and I k ( m , p ) = C,”=,+,M d m ,j ) . Then,
n-1,
00
m=n-l,+l
m=l n
For the second term, by Theorems 1.2.1 and 2.6.6 of Csörgő and Révész (1981),
m=n-l,+l
=
o(nC’)
’>
+ o n2+r0
(
a.s.
By choosing
ct =
2
2
+
213
€0
+ €0 + (1+
€O)C'
we have (B.90)
Combining (B.87)-(B.90), we have snk 2
Jn,k - ---
Nn,k
Nn,k
-
Rn,k Nn,k
-J n n k + 0 (nC'-'(logn)2)
+o
(n-e)
as.
Nn,k
Similarly,
Tn k = 1+ 0 (nC'-'(1ogn)2) + 0 ( n - 2 )
a.5.
Nn,k
Consequently, for each k = 1,. .., K ,
A
=
On,k
+ 0 (nC"'(logn)2) + o ( n - s )
as.
Now because c 1 2 in Assumption 7.2, we obtain Lemma 7.3 by d < 1/2. 0
B.4 REFERENCES
BAI, Z. D. AND HU, F. (1999). Asymptotic theorems for urn models with nonhomogeneous generating matrices. Stochastic Processes and Their Applications 80 87-101.
BAI, Z. D. AND HU, F. (2005). Asymptotics of randomized urn models. Annals of Applied Probability 15 914-940.
BAI, Z. D., HU, F., AND ROSENBERGER, W. F. (2002). Asymptotic properties of adaptive designs with delayed response. Annals of Statistics 30 122-139.
CSÖRGŐ, M. AND RÉVÉSZ, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, New York.
DOOB, J. L. (1936). Note on probability. Annals of Mathematics 37 363-367.
HALL, P. AND HEYDE, C. C. (1980). Martingale Limit Theory and Its Application. Academic Press, London.
HU, F. AND ZHANG, L.-X. (2004a). Asymptotic properties of doubly adaptive biased coin designs for multitreatment clinical trials. Annals of Statistics 32 268-301.
HU, F. AND ZHANG, L.-X. (2004b). Asymptotic normality of adaptive designs with delayed response. Bernoulli 10 447-463.
HU, F., ZHANG, L.-X., CHEUNG, S. H., AND CHAN, W. S. (2006). Doubly adaptive biased coin designs with delayed responses. Submitted.
ZHANG, L.-X., CHAN, W. S., CHEUNG, S. H., AND HU, F. (2006). A generalized drop-the-loser urn for clinical trials with delayed responses. Statistica Sinica, in press.
ZHANG, L.-X., HU, F., AND CHEUNG, S. H. (2006). Asymptotic theorems of sequential estimation-adjusted urn models. Annals of Applied Probability 16 340-369.