Biostatistics (2002), 3, 2, pp. 195–211 Printed in Great Britain
Multipoint linkage detection in the presence of heterogeneity

YEN-FENG CHIU∗
Department of Biostatistics, School of Public Health, CB #7420, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7420, USA

KUNG-YEE LIANG
Department of Biostatistics, School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA

TERRI H. BEATY
Department of Epidemiology, School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
SUMMARY

Linkage heterogeneity is common for complex diseases. It is well known that assuming complete homogeneity in the presence of linkage heterogeneity reduces the statistical power for detecting linkage. To address this, Smith (1963, Annals of Human Genetics 27, 175–182) proposed an admixture model to account for linkage heterogeneity. For this model, the conventional chi-squared approximation to the likelihood ratio test for no linkage is known not to apply, even when the sample size is large. For nuclear families analysed one marker at a time, and for genetic diseases with simple modes of inheritance, score-based test statistics (Liang and Rathouz, 1999, Biometrics 55, 65–74) and likelihood-ratio-based test statistics (Lemdani and Pons, 1995, Biometrics 51, 1033–1041) have been proposed which have a simple large-sample distribution under the null hypothesis of no linkage. In this paper, we extend their work to more practical situations that include information from multiple markers and multi-generational pedigrees while allowing for a class of general genetic models. Three different approaches are proposed to eliminate the nuisance parameters in these test statistics. We show that all three approaches lead to the same asymptotic distribution under the null hypothesis of no linkage. Simulation results show that the proposed test statistics have adequate power to detect linkage and that the performances of these two classes of test statistics are quite comparable. We have applied the proposed method to a family study of asthma (Barnes et al., 1996), in which the score-based test shows evidence of linkage with p-value < 0.0001 in the region of interest on chromosome 12. Additionally, we have implemented this score-based test within the frequently used computer package GENEHUNTER.

Keywords: Admixture model; Asymptotic; Genetic heterogeneity; Genetic linkage; Multipoint analysis.

∗ To whom correspondence should be addressed
© Oxford University Press (2002)
1. INTRODUCTION
Linkage analysis is a method of identifying the chromosomal location of the gene(s) for a trait through statistical testing. The term ‘linkage’ describes the phenomenon whereby alleles from two loci segregate together in a family: that is, they are passed as a single unit from a parent to a child. Intuitively, if a marker locus is linked to a trait locus, the inheritance pattern of the marker locus is expected to be consistent with the inheritance pattern of the trait. Linkage analysis therefore proceeds by examining the inheritance patterns of a sequence of genetic markers with known or estimated chromosomal locations and comparing them with the inheritance pattern of the trait within families, in order to map the trait locus. However, linkage heterogeneity hampers the identification of the unobserved disease gene, because a chromosomal region may be linked with a disease locus in some families but not in others. Smith (1963) postulated that using a statistical test that assumes homogeneity when heterogeneity is present will result in a considerable reduction in power to detect linkage. In addition, Faraway (1993) showed that when heterogeneity is not present, the power of the test allowing for heterogeneity is only slightly less than the power obtained under an assumption of homogeneity. Thus, it is helpful to allow for heterogeneity when testing for linkage, especially for complex diseases. In this paper the score-based test statistics (Liang and Rathouz, 1999) and the likelihood-ratio-based (LRB) test statistics (Lemdani and Pons, 1995) for detecting genetic linkage under heterogeneity are extended to multipoint linkage analysis, in which information from multiple markers and arbitrary family structures is used under broader genetic models.
In two-point linkage analysis, the degree of linkage between two points, one trait locus and one marker locus, can be expressed by the recombination fraction $\theta$, which is the probability that a recombination event occurs during meiosis. The further apart two loci are, the higher the probability that a recombination event occurs between them during meiosis. When two loci are not linked, $\theta = 1/2$; otherwise $0 \le \theta < 1/2$. The smaller $\theta$ is, the more closely the two loci are linked. In the presence of linkage heterogeneity, Smith (1963) considered the following mixture model for family $i$, $i = 1, \ldots, n$:

$$g_i(y_i; \alpha, \theta) = \alpha f_i(y_i; \theta) + (1 - \alpha) f_i(y_i; \theta_0), \tag{1.1}$$

where $y_1, y_2, \ldots, y_n$ are the numbers of recombination events, $y_i$ being the number observed out of $m_i$ meioses in the $i$th family; $\alpha$ is the unknown proportion of ‘linked’ families; $f_i$ is either Binomial$(m_i, \theta)$ if the phase is known or $\tfrac12\,$Binomial$(m_i, \theta) + \tfrac12\,$Binomial$(m_i, 1 - \theta)$ if the phase is unknown; and $\theta_0 = \tfrac12$. The likelihood function for $\theta$ and $\alpha$ is given by

$$L(\alpha, \theta) \propto \prod_{i=1}^{n} \{\alpha f_i(y_i; \theta) + (1 - \alpha) f_i(y_i; \theta_0)\}. \tag{1.2}$$
Based on this mixture model (1.1), the conventional chi-squared approximation to the likelihood ratio test for testing $H_0: \alpha = 0$ (or equivalently $H_0: \theta = 1/2$) is not directly applicable (Davies, 1977, 1987). Instead, Lemdani and Pons (1995) proposed the following family of LRB test statistics to test for linkage under heterogeneity:

$$R_\varepsilon = 2 \sup_{\alpha \in [\varepsilon, 1]} \log \frac{L(\alpha, \hat\theta_\alpha)}{L(\cdot, \theta_0)}, \tag{1.3}$$

where $\theta \in \Theta = [0, 1/2]$, and $\hat\theta_\alpha$ is the maximum likelihood estimate of $\theta$ for fixed $\alpha \in [\varepsilon, 1]$, where $0 < \varepsilon < 1$. Note that the test statistic is the conventional likelihood ratio test when $\varepsilon = 0$. Lemdani and Pons (1995) showed that under the null hypothesis ($H_0$), and for any $\alpha \in [\varepsilon, 1]$, $\hat\theta_\alpha$ converges in
probability to $\theta_0$. Consequently, $R_\varepsilon$ converges in distribution to $\tfrac12\chi_0^2 + \tfrac12\chi_1^2$, as $\theta_0$ is on the boundary of the parameter space $\Theta = [0, 1/2]$. Here $\chi_0^2$ denotes a distribution degenerate at zero with probability one. More recently, Liang and Rathouz (1999) considered the score function for $\alpha$ evaluated at $\alpha = 0$, i.e.

$$S(\theta) = \partial \log L(\alpha, \theta)/\partial\alpha \,\big|_{\alpha=0} = \sum_{i=1}^{n} S_i(\theta) = \sum_{i=1}^{n} \left\{ \frac{f_i(y_i; \theta)}{f_i(y_i; \theta_0)} - 1 \right\}, \tag{1.4}$$

and proposed the following test statistic for testing for linkage under heterogeneity, $H_0: \alpha = 0$:

$$T_\lambda = \lambda S(\hat\theta_\lambda), \qquad 0 < \lambda \le 1, \tag{1.5}$$

where $\hat\theta_\lambda$ is the maximizer of $L(\lambda, \theta)$, the likelihood (1.2) with $\alpha$ fixed at the given value $\lambda$. They showed that under $H_0$, $T_\lambda$ is asymptotically distributed as $\tfrac12\chi_0^2 + \tfrac12\chi_1^2$. Through simulations, Liang and Rathouz (1999) showed that both LRB and score-based test statistics performed well for large $\varepsilon$ and $\lambda$ values. Nonetheless, both of these test statistics applied only to the ideal situation where the disease follows simple Mendelian monogenic inheritance for each individual in the nuclear families. Furthermore, such linkage analysis is designed to test one marker at a time. In practice, investigators frequently use multi-generational family data to deal with multiple loci where the phenotype is a complex disease and very likely to be heterogeneous. A typical extended pedigree may comprise a hundred or so individuals covering three or more generations, and from a genetic point of view such pedigrees yield far more information than can be obtained from the same number of individuals divided into small unrelated pedigrees (Elston and Stewart, 1971). Moreover, to some extent the category of complex traits is all-inclusive. Even the simplest genetic disease is complex when examined closely (Lander and Schork, 1994). The complexities arise when the simple correspondence between genotype and phenotype breaks down, either because the same genotype can result in different phenotypes or because different genotypes can result in the same phenotype. In addition, the multipoint approach, using all the relevant marker data simultaneously to estimate the position and effects of a trait susceptibility gene, is known to be more efficient than single-marker searches (Lander and Green, 1987).
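To make the two-point statistics concrete, the following is a minimal sketch of (1.2), (1.4) and (1.5) for phase-known families, where each $f_i$ is Binomial$(m_i, \theta)$. The function names, the `(y_i, m_i)` data layout and the grid search over $[0, 1/2]$ are illustrative assumptions, not part of the original method description.

```python
import math

def binom_pmf(y, m, theta):
    """Phase-known f_i: Binomial(m, theta) probability of y recombinants."""
    return math.comb(m, y) * theta**y * (1.0 - theta)**(m - y)

def log_lik(lam, theta, data):
    """Log of (1.2) with alpha fixed at lam; data is a list of (y_i, m_i)."""
    total = 0.0
    for y, m in data:
        g = lam * binom_pmf(y, m, theta) + (1.0 - lam) * binom_pmf(y, m, 0.5)
        if g <= 0.0:  # data impossible at this theta (can only happen when lam = 1)
            return -math.inf
        total += math.log(g)
    return total

def score(theta, data):
    """S(theta) in (1.4): likelihood ratios against theta0 = 1/2, minus 1, summed."""
    return sum(binom_pmf(y, m, theta) / binom_pmf(y, m, 0.5) - 1.0 for y, m in data)

def t_lambda(lam, data, n_grid=2001):
    """T_lambda in (1.5); theta is maximised over an assumed grid on [0, 1/2]."""
    grid = [0.5 * k / (n_grid - 1) for k in range(n_grid)]
    theta_hat = max(grid, key=lambda th: log_lik(lam, th, data))
    return lam * score(theta_hat, data), theta_hat
```

When the maximizer falls on the boundary $\theta = 1/2$, every ratio in (1.4) equals one and the statistic is exactly zero, which corresponds to the $\chi_0^2$ component of the limiting distribution.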
The purpose of this paper is to extend the work by Lemdani and Pons (1995) and Liang and Rathouz (1999) to a more general situation in three directions: (1) from one marker to multiple markers; (2) from nuclear families to multi-generational pedigrees; and (3) from a classical Mendelian trait to a complex (non-Mendelian) trait.

2. THE PROPOSED TEST STATISTICS

2.1 The likelihood function
Consider a chromosomal region, $R$, of length $L$ centimorgans (cM) framed by $M$ markers located at $t_1, \ldots, t_M$. Suppose that $n$ independent pedigrees are ascertained, with $m_i$ pedigree members sampled from the $i$th pedigree and possibly varying pedigree structures. For each pedigree, let $Y_i$ be an $m_i \times 1$ random vector denoting the phenotypes and $X_i$ be an $m_i \times M$ random matrix characterizing the observable marker information for the $M$ genotyped markers. In general, the probability function for $(Y_i, X_i)$ is indexed by three sets of parameters, $\tau$, $\phi$ and $\gamma$, which are of dimensions 1, $p$ and $q$ respectively: i.e. $f(y_i, x_i) = f(y_i, x_i; \tau, \phi, \gamma)$. Here $\tau$ is the location of the trait locus which is hypothesized to be linked to the chromosomal region, $R$; $\phi$ is a set of trait-related parameters including penetrances and the allele frequency at the trait locus; and $\gamma$ is a set of marker-related parameters such as allele frequencies of the $M$
markers. More specifically,

$$f(y, x; \tau, \phi, \gamma) = \sum_{z} f(y, z, x; \tau, \phi, \gamma) = \sum_{z} f(y \mid z; \phi)\, f(z, x; \tau, \gamma), \tag{2.1}$$

where $Z$ denotes the set of genotypes for the trait locus. Implicitly, we assume that the marker and trait loci are in equilibrium in the population and there is no epistasis between them, so that $Y$ and $X$ are statistically independent conditional on the trait genotypes, $Z$. For general discussions on the construction of probability functions for linkage data and related computational issues see, for example, Kruglyak et al. (1996) and Whittemore (1996). Finally, to acknowledge the possibility of heterogeneity in the absence of additional knowledge, we consider an extended version of the mixture model of Smith (1963):

$$g_i(y_i, x_i; \alpha, \tau, \phi, \gamma) = \alpha f_i(y_i, x_i; \tau, \phi, \gamma) + (1 - \alpha) f_i(y_i, x_i; \tau_0, \phi, \gamma) \tag{2.2}$$

$$\phantom{g_i(y_i, x_i; \alpha, \tau, \phi, \gamma)} = \alpha f_i(y_i, x_i; \tau, \phi, \gamma) + (1 - \alpha) f_i(y_i; \phi)\, f_i(x_i; \gamma), \tag{2.3}$$
where $\tau_0 = \infty$ means that the postulated trait locus is unlinked to the region $R$. This could occur when the trait locus is either in another chromosomal region or is in the same chromosome but unlinked to region $R$. Thus, the likelihood function based on the data for $n$ pedigrees has the form

$$L(\alpha, \tau, \phi, \gamma) \propto \prod_{i=1}^{n} g_i(y_i, x_i; \alpha, \tau, \phi, \gamma) \tag{2.4}$$

and the hypothesis of interest, that there is no linkage between the postulated trait locus and this region, can be expressed as $H_0: \alpha = 0$, or equivalently $H_0: \tau = \tau_0$.

2.2 The score-based test statistics
The score function used by Liang and Rathouz (1999) for the simple case can now be extended to more general situations:

$$S(\tau, \phi, \gamma) = \partial \log L(\alpha, \tau, \phi, \gamma)/\partial\alpha \,\big|_{\alpha=0} = \sum_{i=1}^{n} \left\{ \frac{f(y_i, x_i; \tau, \phi, \gamma)}{f(y_i, x_i; \tau_0, \phi, \gamma)} - 1 \right\}. \tag{2.5}$$

Note that the base-10 logarithm of the ratio term in (2.5) is known as the LOD score, comparing an arbitrary $\tau$, $0 \le \tau < \infty$, with $\tau_0 = \infty$ through the probability function for $(Y, X)$. Note also that through a strictly monotonic function $\theta = h(\tau) = 0.5(1 - e^{-0.02\tau})$, known as the Haldane map function (Haldane, 1919), we can re-express the null hypothesis of $\tau = \infty$ as $\theta = \tfrac12$. The test statistic $S$ in (2.5) is not computable since it depends on $\tau$, $\phi$ and $\gamma$. Here we encounter the same problem as in the two-point approach: the conventional approach of replacing $(\tau, \phi, \gamma)$ by its maximum likelihood estimate under $H_0$, i.e. $\alpha = 0$, is not applicable since the likelihood function is independent of $\tau$ when there is no linkage: see (2.3). We now discuss some alternative ways to make $S$ computable, and which lead to a simple asymptotic distribution. Assuming for the moment that both $\phi$ and $\gamma$ are known, i.e. $(\phi, \gamma) = (\phi_0, \gamma_0)$, the true values, Liang and Rathouz (1999) have shown that for $0 < \lambda \le 1$, $\hat\tau_{\lambda,\phi_0,\gamma_0}$ converges, as $n \to \infty$, to $\tau_0$ under $H_0$ and is normally distributed with variance inversely proportional to $\lambda^2$. Furthermore, the test statistic
$T_\lambda = \lambda S(\hat\tau_{\lambda,\phi_0,\gamma_0}, \phi_0, \gamma_0)$ converges under $H_0$ to a $\tfrac12\chi_0^2 + \tfrac12\chi_1^2$ distribution. In practice, these trait and marker parameters are not known to investigators. However, they could be estimated either internally or externally through, for example, segregation analysis. Possible estimators include the maximizers, $\hat\phi$ and $\hat\gamma$ say, of $\prod_{i=1}^{n} f_i(y_i; \phi)$ and $\prod_{i=1}^{n} f_i(x_i; \gamma)$, respectively. As shown below, the asymptotic behavior of $T_\lambda(\hat\tau_{\lambda,\phi_0,\gamma_0}, \phi_0, \gamma_0)$ is preserved if one replaces $(\phi_0, \gamma_0)$ in $T_\lambda$ by any $\sqrt{n}$-consistent estimator such as $(\hat\phi, \hat\gamma)$ defined above. Indeed, the same asymptotic behavior is preserved if either $\phi$ or $\gamma$ is consistently estimated (see Williamson and Amos, 1995, and proofs in the Appendix). This result is especially important for linkage designs such as the affected sib pairs design, in which the genetic mechanism as characterized by the $\phi$ parameters, and hence the $\phi$ parameters themselves, may not be adequately estimated from the data at hand. Incidentally, it is typical in this situation that an ‘educated guess’ for $\phi$, which we term $\phi^*$, is utilized. The resulting scores are known as ‘wrong LOD scores’ (WROD) when $\phi^*$ is an incorrect estimate for $\phi$ (see, for example, Hodge and Elston (1994) and Liang et al. (1996)). The above discussion leads to the consideration of the following three possible approaches to handling $(\tau, \phi, \gamma)$ in $S$. In each approach, we assume that an arbitrary value of $\lambda$ in the range $0 < \lambda \le 1$ has been chosen in advance. The choice of $\lambda$ in practice will be discussed in Section 3.

APPROACH 1 Replace $(\phi, \gamma)$ in $S$ by $(\hat\phi, \hat\gamma)$ defined earlier, and $\tau$ by $\hat\tau_{\lambda,\hat\phi,\hat\gamma}$ which maximizes $L(\lambda, \tau, \hat\phi, \hat\gamma)$ with respect to $\tau$.

APPROACH 2 Replace $(\phi, \gamma)$ in $S$ by $(\phi^*, \hat\gamma)$ and $\tau$ by $\hat\tau_{\lambda,\phi^*,\hat\gamma}$ which maximizes $L(\lambda, \tau, \phi^*, \hat\gamma)$ with respect to $\tau$. Likewise, one can estimate $(\phi, \gamma)$ by $(\hat\phi, \gamma^*)$, where $\gamma^*$ is a ‘guesstimate’ for $\gamma$, and $\tau$ by the maximizer of $L(\lambda, \tau, \hat\phi, \gamma^*)$.

APPROACH 3 Replace $(\tau, \phi, \gamma)$ by $(\hat\tau_\lambda, \hat\phi_\lambda, \hat\gamma_\lambda)$ which simultaneously maximizes $L(\lambda, \tau, \phi, \gamma)$ with respect to $(\tau, \phi, \gamma)$.

LEMMA 1 Under some regularity conditions on $f(y, x; \tau, \phi, \gamma)$,

(a) if $\hat\tau$ is an estimator of $\tau$ using either approach 1 or 3, then asymptotically under $H_0$, $\hat\tau$ is distributed as $\tfrac12 N(\tau_0, 0) + \tfrac12 N(\tau_0, V_{\lambda,\phi_0,\gamma_0})$, where $(\phi_0, \gamma_0)$ are the true values of $(\phi, \gamma)$ and

$$V^{-1}_{\lambda,\phi_0,\gamma_0} = \lim_{n\to\infty} \frac{\lambda^2}{n}\, E_0 \left\{ \sum_{i=1}^{n} \left( \frac{\partial \log f(y_i, x_i; \tau_0, \phi_0, \gamma_0)}{\partial\tau} \right)^{2} \right\}, \tag{2.6}$$

where $E_0$ denotes the expectation at $H_0$, i.e. $(\tau, \phi, \gamma) = (\tau_0, \phi_0, \gamma_0)$.

(b) if $\hat\tau = \hat\tau_{\lambda,\phi^*,\hat\gamma}$ from approach 2, then $\hat\tau$ is, under $H_0$, asymptotically distributed as $\tfrac12 N(\tau_0, 0) + \tfrac12 N(\tau_0, V_{\lambda,\phi^*,\gamma_0})$, where $V_{\lambda,\phi^*,\gamma_0}$ is the same as $V$ in (2.6) with $\phi_0$ replaced by $\phi^*$. Likewise, if $\hat\tau = \hat\tau_{\lambda,\hat\phi,\gamma^*}$, then the only modification is to have $V_{\lambda,\phi_0,\gamma^*}$ as the asymptotic variance, with $\gamma^*$ replacing $\gamma_0$ in (2.6).

Lemma 1 establishes the asymptotic behavior of $\hat\tau$ when approaches 1, 2 or 3 are used to estimate $\tau$. This result is crucial for the main result stated in Proposition 1 below. Proofs for both Lemma 1 and Proposition 1 are given in the Appendix.

PROPOSITION 1 For $0 < \lambda \le 1$, let $T_\lambda(\tau, \phi, \gamma) = \lambda S(\tau, \phi, \gamma)$ where $S$ is defined in (2.5), and let $(\hat\tau, \hat\phi, \hat\gamma)$ be an estimator of $(\tau, \phi, \gamma)$ based on either approach 1, 2 or 3. Then under $H_0$, $T_\lambda(\hat\tau, \hat\phi, \hat\gamma)$ is asymptotically distributed as $\tfrac12\chi_0^2 + \tfrac12\chi_1^2$.

This proposition suggests that one can refer to a common and simple distribution to assess the statistical significance for testing $H_0$, as long as the parameters $\tau$, $\phi$ and $\gamma$ in $T_\lambda$ are computed using any one of the three approaches discussed earlier. This is the same distribution as that derived in the earlier work (Liang and Rathouz, 1999), where a single marker was available for analysis and there was no need to estimate extra parameters such as allele frequencies for both trait ($\phi$) and marker loci ($\gamma$).
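As an illustration of how the limiting distribution in Proposition 1 can be used in practice, the sketch below converts an observed statistic into an asymptotic p-value under $\tfrac12\chi_0^2 + \tfrac12\chi_1^2$. The function name is hypothetical; the only facts used are that $\chi_0^2$ is a point mass at zero and that $P(\chi_1^2 > t) = \mathrm{erfc}(\sqrt{t/2})$.

```python
import math

def mixture_pvalue(t):
    """P(T >= t) when T ~ 0.5*chi2_0 + 0.5*chi2_1 (chi2_0 is a point mass at 0)."""
    if t <= 0.0:
        return 1.0
    # For t > 0 only the chi2_1 half of the mixture can exceed t, and
    # P(chi2_1 > t) = erfc(sqrt(t / 2)).
    return 0.5 * math.erfc(math.sqrt(t / 2.0))
```

Under these assumptions a statistic of 7.18 gives a p-value of about 0.0037, in line with the value reported for the recessive model in Section 4.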
2.3 The likelihood-ratio-based test statistics
To complete the process, one can extend the work by Lemdani and Pons (1995) to the more general likelihood function as specified in (2.4). Recall that another way of expressing the null hypothesis is $H_0: \alpha = 0$. Accordingly, for $0 < \varepsilon < 1$ we define

$$R_\varepsilon(\hat\tau, \hat\phi, \hat\gamma) = 2 \sup_{\alpha \in [\varepsilon, 1]} \log \frac{L(\alpha, \hat\tau, \hat\phi, \hat\gamma)}{L(\alpha = 0, \tau_0, \hat\phi, \hat\gamma)}, \tag{2.7}$$

where $(\hat\tau, \hat\phi, \hat\gamma)$ correspond to any one of the three approaches introduced in Section 2.2, with $\alpha$ replacing $\lambda$.

PROPOSITION 2 The LRB test statistic defined in (2.7) converges under $H_0$ to $\tfrac12\chi_0^2 + \tfrac12\chi_1^2$ as $n \to \infty$.

A sketch of the proof for Proposition 2 is given in the Appendix. In the next section, we assess the finite sample performance, both in terms of nominal size and power, of the proposed test statistics through simulations.

3. A SIMULATION STUDY

3.1 The simulation design
Pedigree structure. We assume that all the pedigrees in the sample are nuclear families with equal numbers of offspring (m = 2 or 4). The number of sampled pedigrees is set as n = 50 and 100.

Marker loci. We assume that either two fully polymorphic markers (i.e. M = 2) at t1 = 0, t2 = 10 cM, or four fully polymorphic markers at either t1 = 0, t2 = 10, t3 = 20, t4 = 30 cM, or t1 = 0, t2 = 3, t3 = 6, t4 = 10 cM, have been genotyped for each subject. Note that the term fully polymorphic indicates that the marker is sufficiently polymorphic that we are able to distinguish which allele is from the paternal side and which is from the maternal side. In addition, under the full polymorphism situation at the marker loci and by conditioning on founders’ (parents in this case) information, there is no need to specify the γ parameters for the markers.

Trait locus. We assume a qualitative trait (affected or unaffected) following an autosomal dominant inheritance pattern with incomplete penetrance and phenocopies. In addition, we assume that parents’ genotypes at the trait locus are Dd × dd (Hodge and Elston, 1994; Maclean et al., 1993), where D is the disease allele and d is the normal allele. The penetrance for the genotype ‘Dd’ at the trait locus is φ1, and the phenocopy rate for the genotype ‘dd’ at the trait locus is φ2. That is, P(Y = 1|X = Dd) = φ1 and P(Y = 1|X = dd) = φ2, where X is the genotype for the trait and Y is the phenotype for the trait. The true values of φ1 and φ2 are set to be 0.9 and 0.1, respectively.

Map function. We use Haldane’s map function (Haldane, 1919) assuming no interference between recombination events, so that the inheritance vectors (Kruglyak et al., 1996) across the genome are assumed to follow a first-order Markov chain.

Heterogeneity. We assume, under the alternative, that α is equal to 0.1, 0.2 or 0.5. That is, we consider the situations where 10, 20 or 50% of the sampled pedigrees are linked to the trait locus at τ = 1, 5 or 15 cM.
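For orientation, the Haldane map function used in this design, θ = 0.5(1 − e^(−0.02τ)), can be evaluated with a short sketch such as the one below. The function names are illustrative, and reading τ as a distance in cM between two loci (for example, from a marker placed at 0 cM) is an assumption made here for the example.

```python
import math

def haldane_theta(tau_cm):
    """Recombination fraction for a map distance tau in cM: 0.5 * (1 - exp(-0.02 * tau))."""
    return 0.5 * (1.0 - math.exp(-0.02 * tau_cm))

def haldane_tau(theta):
    """Inverse map: distance in cM for a recombination fraction 0 <= theta < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * theta)

for tau in (1, 5, 15):
    print(tau, round(haldane_theta(tau), 4))
```

Under this map, distances of 1, 5 and 15 cM correspond to recombination fractions of roughly 0.010, 0.048 and 0.130, so even the most distant setting is still well below the unlinked value of 1/2.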
Table 1. Simulated levels (in %) of two test statistics for detecting linkage based on two markers

n    m   Nominal     φ1* = 0.9      φ1* = 0.7      φ1 = φ̂1        φ1 = φ̂1λ
         level (%)   T1     R0.8    T1     R0.8    T1     R0.8    T1     R0.8
50   2   1           0.97   1.02    1.09   1.00    1.10   1.07    1.06   1.04
         0.5         0.57   0.53    0.50   0.50    0.53   0.53    0.54   0.52
         0.1         0.08   0.09    0.10   0.09    0.09   0.09    0.13   0.10
50   4   1           0.98   1.04    1.15   1.04    1.06   1.03    1.06   1.02
         0.5         0.56   0.47    0.52   0.52    0.49   0.44    0.51   0.47
         0.1         0.23   0.16    0.13   0.15    0.23   0.17    0.21   0.16
100  2   1           0.94   1.04    1.11   1.05    1.07   1.00    1.02   1.04
         0.5         0.67   0.59    0.57   0.54    0.60   0.57    0.57   0.59
         0.1         0.16   0.11    0.08   0.12    0.10   0.10    0.12   0.11
100  4   1           1.21   0.93    1.08   0.98    1.11   0.97    1.14   0.93
         0.5         0.50   0.50    0.58   0.41    0.55   0.51    0.51   0.50
         0.1         0.09   0.12    0.09   0.13    0.09   0.12    0.08   0.12
For each configuration, under the null hypothesis, we simulated 10 000 replicates to derive the empirical levels and the critical values for nominal sizes of 0.01, 0.005 and 0.001. Another 10 000 replicates were generated under the alternative hypothesis to estimate the statistical power of the various testing procedures at the nominal level 0.005. In the two-point linkage analysis, under the null hypothesis of no linkage, T1 with λ set at 1 and R0.8 with ε set at 0.8 outperform their counterparts with smaller λ or ε (Liang and Rathouz, 1999). Here, we examine the performance of T1 and R0.8 under the null hypothesis when there are two or four markers. For each of the two testing procedures, we consider four ways of specifying φ1, with φ2 set at φ2∗ = 0.0: (i) set φ1 at 0.9, the true φ1 value; (ii) set φ1 at 0.7; (iii) estimate φ1 by maximizing f(yi; φ1, φ2 = 0.0) with respect to φ1; and (iv) estimate φ1 by maximizing L(λ, τ, φ1, φ2 = 0.0) with respect to τ and φ1 for a given λ.

3.2 The false-positive rate study
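As a point of reference for the empirical levels in Tables 1 and 2, the sketch below computes the asymptotic critical values implied by the ½χ0² + ½χ1² limit and the Monte Carlo standard error of an empirical rejection rate. The function names are illustrative; the critical-value formula follows from P(T > c) = ½ P(χ1² > c) = 1 − Φ(√c), and the standard error is the usual binomial one for a given number of replicates.

```python
import math
from statistics import NormalDist

def mixture_critical_value(level):
    """c with P(T > c) = level for T ~ 0.5*chi2_0 + 0.5*chi2_1, valid for 0 < level < 0.5."""
    # P(T > c) = 0.5 * P(chi2_1 > c) = 1 - Phi(sqrt(c)), so sqrt(c) = Phi^{-1}(1 - level).
    return NormalDist().inv_cdf(1.0 - level) ** 2

def monte_carlo_se(level, n_rep):
    """Binomial standard error of an empirical rejection rate from n_rep replicates."""
    return math.sqrt(level * (1.0 - level) / n_rep)

for level in (0.01, 0.005, 0.001):
    crit = mixture_critical_value(level)
    se_percent = 100 * monte_carlo_se(level, 10_000)
    print(level, round(crit, 2), round(se_percent, 3))
```

For nominal levels of 1, 0.5 and 0.1% the asymptotic critical values are about 5.41, 6.63 and 9.55, and with 10 000 replicates the standard errors of the empirical levels are roughly 0.10, 0.07 and 0.03 percentage points.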
Both T1 and R0.8 show excellent agreement with the ½χ0² + ½χ1² distribution at all three nominal levels, regardless of how φ1 was estimated; see Tables 1 and 2. In the four-marker scenario, the empirical levels are slightly higher than the nominal levels when one has 50 pedigrees with four offspring. This phenomenon vanishes when the number of pedigrees with the same structure increases to 100. This indicates that the rate of convergence to the asymptotic distribution under the null hypothesis of no linkage could depend on the number of markers and on pedigree structures.

3.3 The statistical power study

Table 3 shows the estimated statistical power for detecting linkage under heterogeneity for both T1 and R0.8. It does not appear that the method for estimating φ1 has much impact on the statistical power for either T1 or R0.8. Furthermore, both procedures are comparable regarding the statistical power to detect linkage. As expected, the statistical power for either procedure does depend on the number of pedigrees, pedigree size (m), the true location of the trait locus (τ) and the degree of heterogeneity (α). In particular, the statistical power increases rapidly when there are four offspring in each pedigree and/or 50% of the sample
Table 2. Simulated levels (in %) of two test statistics for detecting linkage based on four markers at 0, 10, 20, and 30 cM

n    m   Nominal     φ1* = 0.9      φ1* = 0.7      φ1 = φ̂1        φ1 = φ̂1λ
         level (%)   T1     R0.8    T1     R0.8    T1     R0.8    T1     R0.8
50   2   1           1.15   1.04    1.20   1.02    1.12   1.01    1.21   1.06
         0.5         0.50   0.50    0.55   0.48    0.67   0.57    0.72   0.51
         0.1         0.15   0.09    0.17   0.07    0.17   0.11    0.17   0.10
50   4   1           1.08   1.19    1.28   1.16    1.22   1.20    1.08   1.19
         0.5         0.74   0.68    0.63   0.71    0.67   0.69    0.73   0.68
         0.1         0.15   0.14    0.12   0.14    0.15   0.15    0.15   0.14
100  2   1           0.85   0.90    0.96   0.92    0.99   0.92    0.85   0.90
         0.5         0.57   0.51    0.53   0.48    0.52   0.47    0.57   0.51
         0.1         0.19   0.11    0.11   0.08    0.12   0.13    0.19   0.11
100  4   1           1.22   1.06    0.86   1.15    1.08   1.07    1.22   1.06
         0.5         0.43   0.51    0.58   0.54    0.46   0.50    0.43   0.51
         0.1         0.13   0.11    0.14   0.11    0.12   0.11    0.13   0.11
pedigrees are linked to the trait locus. It is also interesting to note that the statistical power to detect linkage is higher when one has 50 pedigrees with four offspring per pedigree compared to the design of 100 pedigrees with two offspring per pedigree. This is consistent with the conventional observation that larger family structures contain more information about genetic mechanisms. In the four-marker scenario, we assume that the proportion of linked pedigrees (α) is 50% and that the trait locus lies midway between the second and the third markers. With 50 two-offspring families, the statistical power decreases as we increase the number of markers from two to four. The reduction is around 8–20% when there are 50 two-offspring families (see Table 4); the loss of statistical power disappears completely when the number of sibs is 200. This loss of statistical power indicates that, with fully polymorphic markers, adding more markers outside the flanking markers does not provide more information about the trait. Instead, the ‘noise’ caused by the added markers decreases the statistical power. In contrast, having extra markers within the original flanking markers does increase the statistical power (see Table 5); the increment is about 8–15% with 50 two-offspring families when α is 50%. Nevertheless, the additional markers have little impact on statistical power when α is 10 or 20%. In the next section, we use an example with non-fully polymorphic markers to demonstrate the statistical power gained from additional markers. In summary, both T1 and R0.8 perform very well regarding the nominal size and statistical power under the admixture model (2.2). One advantage of using T1 is that it can be easily implemented within some commonly used programs due to its simple expression in terms of the LOD scores for each pedigree (Liang and Rathouz, 1999).
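The expression of the score statistic in terms of per-pedigree LOD scores can be sketched as follows. Because the ratio in (2.5) equals 10 raised to the LOD score of pedigree i at position τ, the log-likelihood (2.4) relative to the unlinked case is the sum over pedigrees of log{λ·10^LOD_i(τ) + (1 − λ)}. The input layout (a table of LOD scores on a grid of candidate positions), the function name, the grid maximization and the treatment of the unlinked position as a candidate are all assumptions made for illustration; this is not the interface of any particular linkage program, and φ and γ are taken as already fixed or estimated.

```python
import math

def t_lambda_from_lods(lod_curves, lam=1.0):
    """Score statistic T_lambda from per-pedigree LOD curves.

    lod_curves[i][k] is the LOD score of pedigree i at grid position k
    (a hypothetical layout). Returns (T_lambda, index of the chosen position),
    with index None when the unlinked position tau0 maximises the likelihood.
    """
    n_pos = len(lod_curves[0])
    best_obj, best_k = 0.0, None  # at tau0 every LOD is 0, so the objective is 0
    for k in range(n_pos):
        # Log-likelihood (2.4) with alpha = lam, relative to the unlinked case.
        obj = sum(math.log(lam * 10.0 ** lods[k] + (1.0 - lam)) for lods in lod_curves)
        if obj > best_obj:
            best_obj, best_k = obj, k
    if best_k is None:
        return 0.0, None
    # Score (2.5) at the chosen position: sum of (likelihood ratio - 1), times lam.
    t = lam * sum(10.0 ** lods[best_k] - 1.0 for lods in lod_curves)
    return t, best_k
```

With λ = 1 the position chosen is simply the one that maximises the summed LOD score, so the statistic can be obtained from standard multipoint output without refitting the model.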
Indeed, we have implemented T1 in the GENEHUNTER 1.1 (Kruglyak et al., 1996) program which is utilized in the next section.

4. AN EXAMPLE

We now demonstrate the use of the proposed score-based test statistic T1 , by applying it to a family study on asthma, known to be a complex disease with a large but poorly defined genetic component. In the family studies of asthma conducted by Dr Kathleen C. Barnes, Division of Clinical Immunology, Department of Medicine, the Johns Hopkins University, 33 asthmatic probands were selected from six
Table 3. Simulated statistical power (in %) of T1 and R0.8 for detecting linkage based on two markers. The nominal level is 0.005. ‘–’ indicates the configurations whose simulations were not performed as they are expected to have 1.00 simulated power

n    m   τ    α      φ1* = 0.9      φ1* = 0.7      φ1 = φ̂1        φ1 = φ̂1λ
                     T1     R0.8    T1     R0.8    T1     R0.8    T1     R0.8
50   2   1    0.1    3      2       3      2       3      3       3      4
              0.2    10     8       11     9       11     10      11     17
              0.5    82     77      82     77      83     81      83     96
50   2   5    0.1    3      2       3      3       3      3       3      5
              0.2    12     10      13     10      13     12      13     18
              0.5    90     86      89     84      90     89      90     95
50   4   1    0.1    6      3       5      4       6      5       6      11
              0.2    29     18      24     21      29     27      29     56
              0.5    100    99      100    99      100    100     100    100
50   4   5    0.1    7      4       6      5       7      7       7      11
              0.2    47     24      31     27      35     34      36     55
              0.5    100    100     100    100     100    100     100    100
100  2   1    0.1    5      3       5      4       5      5       5      8
              0.2    25     18      22     20      23     23      25     41
              0.5    99     99      98     98      99     99      99     100
100  2   5    0.1    6      3       5      4       5      5       5      8
              0.2    32     23      27     25      30     28      31     40
              0.5    100    100     100    100     100    100     100    100
100  4   1    0.1    26     9       14     11      25     11      25     34
              0.2    80     51      62     54      78     56      79     94
              0.5    –      –       –      –       –      –       –      –
100  4   5    0.1    31     11      16     13      29     13      29     32
              0.2    87     61      72     64      86     66      87     93
              0.5    –      –       –      –       –      –       –      –
Table 4. Simulated statistical power (in %) of T1 and R0.8 for detecting linkage when α = 0.50 based on four markers at 0, 10, 20, and 30 cM. The nominal level is 0.005

n    m   τ           φ1* = 0.9      φ1* = 0.7      φ1 = φ̂1        φ1 = φ̂1λ
                     T1     R0.8    T1     R0.8    T1     R0.8    T1     R0.8
50   2   15          68     69      61     68      69     70      68     70
50   4   15          99     96      97     95      99     96      99     96
100  2   15          99     95      95     93      99     95      99     95
100  4   15          100    100     100    100     100    100     100    100
Table 5. Simulated statistical power (in %) of T1 and R0.8 for detecting linkage when τ = 5 based on four markers at 0, 3, 6, and 10 cM. The nominal level is 0.005. ‘–’ indicates the configurations whose simulations were not performed as they are expected to have 1.00 simulated power

n    m   τ    α      φ1* = 0.9      φ1* = 0.7      φ1 = φ̂1        φ1 = φ̂1λ
                     T1     R0.8    T1     R0.8    T1     R0.8    T1     R0.8
50   2   5    0.1    3      2       2      2       3      2       3      2
              0.2    11     10      10     9       12     10      12     10
              0.5    97     89      96     86      97     89      97     89
50   4   5    0.1    7      5       5      4       6      5       7      5
              0.2    34     27      26     24      32     27      34     27
              0.5    100    100     100    100     100    100     100    100
100  2   5    0.1    5      4       5      4       5      4       5      4
              0.2    29     26      25     23      28     26      29     26
              0.5    100    100     100    100     100    100     100    100
100  4   5    0.1    26     12      11     11      26     12      26     12
              0.2    86     65      65     61      85     66      86     65
              0.5    –      –       –      –       –      –       –      –
polyclinics and two private clinics from the island of Barbados and from the Accident and Emergency Department of Queen Elizabeth Hospital. Families were extended by recruiting all available parents, siblings, and relevant family members of the 33 probands into the study. This gave a total of 527 Afro-Caribbean subjects and one Caucasian subject (the father of a proband), including 154 asthmatics (50% male) and their family members (Barnes et al., 1999). The phenotype is binary as defined in Barnes et al. (1996). Twenty-two markers on chromosome 12q (D12S390, D12S398, D12S335, IFNgCA, D12S313, D12S1052, D12S326, D12S1598, D12S1667, D12S379, D12S1719, D12S1678, D12S1064, D12S351, D12S311, D12S95, D12S58, D12S346, PAH, D12S78, D12S338, D12S360) were analysed for linkage under heterogeneity in this Barbados population. Various segregation analyses on asthma studies suggested different genetic models of asthma (Duffy, 1997), including both autosomal recessive and dominant models. We considered two different genetic models to compute the LOD scores. The first was an autosomal recessive model based on the segregation analysis performed by Mrazek et al. (1989) with incomplete penetrance (φ1∗ = 0.62) and phenocopies (φ2∗ = 0.0085 for heterozygote and φ3∗ = 0.0015 for homozygote), and with the frequency of the disease allele (q D ) set at 0.22. The other was an autosomal dominant model with φ1∗ = 0.90, φ2∗ = 0.70, φ3∗ = 0.001 and q D = 0.24. The allele frequency q D = 0.24 was reported by Duffy (1997), whilst the other parameters φ1∗ , φ2∗ and φ3∗ were arbitrarily chosen to give a prevalence value of 0.30 (Barnes et al., 1996). The order of the markers was assumed to be known and was taken from other studies (Krauter et al., 1995).
The positions of the markers were the maximum likelihood estimates obtained from the marginal distribution of the markers, calculated by the genetic software CRIMAP (Green et al., 1990), while the estimates of the marker allele frequencies were the moment estimates obtained from another genetic package, GCONVERT (Duffy, 1995). Under the assumption of heterogeneity and the autosomal dominant model, results from the proposed score-based test show evidence of linkage (T1 = 22.59 with p-value < 0.0001) in this region of interest on chromosome 12. As a side remark, Figure 1 shows the estimates of α, the proportion of linked families,
[Figure 1 plot omitted: curves for the recessive and dominant models against chromosome position (cM).]
Fig. 1. The estimates of the proportion α based on different genetic models in the asthma study.
across this chromosomal region. They range from 0.0001 to 0.34. Under the assumption of the autosomal recessive model, the proposed score-based test statistic, T1 , is computed as 7.18 (p-value = 0.0037). The range of α̂, also shown in Figure 1, is similar (0.006–0.42). It is interesting to note that when heterogeneity is ignored, i.e. when every family is assumed to be linked, either genetic model leads to negative LOD scores in the framed region (see Figure 2), suggesting wrongly that this region should be excluded. In summary, fitting the admixture model (2.3) to the data reveals evidence of heterogeneity in these asthmatic pedigrees from Barbados. Although sensitive to the specific genetic models, the use of the proposed test statistic T1 does suggest evidence of linkage of an asthma susceptibility gene in this targeted region of chromosome 12. In addition, most markers are not fully polymorphic in practice. Having more than two markers should help to estimate correct haplotypes, and the statistical power is therefore expected to be higher than that based on two markers. As shown in this example (see Table 6), under the dominant model, when we used only two markers (the first and last one), the p-value from this score-based test was 0.19; when seven markers were used, the p-value dropped to 0.06; when all 22 markers were used, the p-value was further reduced to 0.000001. This example illustrates that, with non-fully polymorphic markers, more markers do help to improve the statistical power of the score-based test statistic.

5. DISCUSSION

In this paper, we have extended the work by Lemdani and Pons (1995) and Liang and Rathouz (1999) to more practical situations with the aim of detecting linkage while allowing for linkage heterogeneity. We have suggested several different approaches to estimate parameters relevant to genetic mechanisms and frequencies of trait locus and marker loci.
Our results, both theoretical and empirical, suggest that choices of estimators for those parameters have little bearing on the performance of the proposed test statistics. On the other hand, our simulation study suggests that the score-based test statistic T1 performs
Fig. 2. LOD score under the assumption of homogeneity. [Figure: x-axis, chromosome position (cM).]
Table 6. Number of markers and statistical power in the example

Number of markers   Markers                    Test statistic   P-value
2                   1, 22                      0.797            0.1859
7                   1, 2, 3, 19, 20, 21, 22    2.32             0.0640
22                  1–22                       22.59            0.000001
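The p-values in Table 6, and the value 0.0037 quoted for $T_1 = 7.18$, appear consistent with referring the statistic to the boundary null distribution $\frac{1}{2}\chi_0^2 + \frac{1}{2}\chi_1^2$ of Proposition 1. The following is a minimal sketch under that assumption; the function names are illustrative and are not part of GENEHUNTER.

```python
import math

def chi2_1_sf(t: float) -> float:
    """Upper tail P(X > t) for X ~ chi-squared with 1 degree of freedom."""
    # For Z standard normal, P(Z^2 > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    return math.erfc(math.sqrt(t / 2.0))

def mixture_pvalue(t: float) -> float:
    """P-value under the 50:50 mixture of chi2_0 (point mass at 0) and chi2_1."""
    if t <= 0.0:
        return 1.0  # every outcome is at least as extreme as t = 0
    return 0.5 * chi2_1_sf(t)

# Test statistics quoted in Section 4 and Table 6.
for t in (0.797, 2.32, 22.59, 7.18):
    print(f"T = {t:6.3f}   p = {mixture_pvalue(t):.6f}")
```

Small discrepancies in the last digit are to be expected, since the tabulated test statistics are themselves rounded.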
very well in terms of both nominal size and power. This, along with its easy implementation in commonly used computer packages such as GENEHUNTER, prompts us to recommend that it be used whenever linkage heterogeneity is suspected. Indeed, our asthma example supports the conventional wisdom that the statistical power for detecting linkage may be impaired if one does not account for heterogeneity in the analysis.

ACKNOWLEDGEMENTS

The authors would like to thank Dr Barnes for access to the data used in Section 4. This work is supported in part by grant GM 49909 from the National Institutes of Health.
APPENDIX
Proof of Lemma 1. Define $\theta = h(\tau) = 0.5(1 - e^{-0.02\tau})$, the Haldane map function (Haldane, 1919). Since $h'(\tau) = 0.01e^{-0.02\tau} > 0$, $h(\cdot)$ is a strictly monotonic function of $\tau$, with range $0 \le \theta \le \frac{1}{2}$.
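As a small numerical illustration of the map function used in this proof, the sketch below evaluates $h(\tau)$ and its inverse. It assumes that $\tau$ is measured in centimorgans, which is what the factor 0.02 suggests; the function names are illustrative.

```python
import math

def haldane_theta(tau_cm: float) -> float:
    """Recombination fraction theta = h(tau) = 0.5 * (1 - exp(-0.02 * tau))."""
    return 0.5 * (1.0 - math.exp(-0.02 * tau_cm))

def haldane_tau(theta: float) -> float:
    """Inverse map: tau = -50 * log(1 - 2 * theta), for 0 <= theta < 1/2."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)

# h is strictly increasing, so each tau >= 0 gives a distinct theta below 1/2.
distances = [0.0, 1.0, 10.0, 50.0, 200.0]
thetas = [haldane_theta(t) for t in distances]
print(thetas)
```

Because $h$ is strictly monotonic, maximizing a likelihood over $\tau$ or over $\theta$ gives the same fitted value.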
The log likelihood function after plugging in λ, the specified value of α, is
\[
\ell_\lambda(\theta, \gamma, \phi) = \sum_{i=1}^{n} \log g_i(y_i, x_i; \lambda, \theta, \gamma, \phi)
= \sum_{i=1}^{n} \log\{\lambda f_i(y_i, x_i; \theta, \gamma, \phi) + (1 - \lambda) f_i(y_i, x_i; \theta_0, \gamma, \phi)\}.
\]
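To make the structure of $\ell_\lambda$ concrete, here is a minimal sketch under a deliberately simplified, hypothetical family likelihood: family $i$ contributes $k_i$ recombinants among $m_i$ phase-known meioses, so that $f_i(\theta) = \theta^{k_i}(1-\theta)^{m_i-k_i}$. The nuisance parameters $\gamma$ and $\phi$ are suppressed, and the data are made up for illustration.

```python
import math

THETA0 = 0.5  # recombination fraction under no linkage

def family_lik(theta: float, k: int, m: int) -> float:
    """Toy f_i(theta): k recombinants among m phase-known meioses (hypothetical)."""
    return theta ** k * (1.0 - theta) ** (m - k)

def admixture_loglik(theta: float, lam: float, families) -> float:
    """l_lambda(theta) = sum_i log{lam * f_i(theta) + (1 - lam) * f_i(theta0)}."""
    total = 0.0
    for k, m in families:
        mix = lam * family_lik(theta, k, m) + (1.0 - lam) * family_lik(THETA0, k, m)
        total += math.log(mix)
    return total

families = [(0, 6), (1, 6), (3, 6), (2, 5)]  # made-up (k_i, m_i) pairs

# At theta = theta0 the two mixture components coincide, so lam drops out.
print(admixture_loglik(THETA0, 0.3, families), admixture_loglik(THETA0, 0.9, families))
print(admixture_loglik(0.1, 0.5, families))
```

The first two printed values agree, which is the reason $\lambda$ carries no information under $H_0$ and can be fixed in advance.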
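For approach 1, suppose $\hat{\gamma}_{1 \times p}$ and $\hat{\phi}_{1 \times q}$ are the vectors of $\sqrt{n}$-consistent and asymptotically normal estimators for $\gamma_{1 \times p}$ and $\phi_{1 \times q}$, respectively, from sources other than the likelihood function (2.4). Let $\delta = (\gamma, \phi)$ be the $1 \times (p + q)$ vector of all the marker-related and trait-related parameters. Under $H_0$,
\[
\begin{pmatrix}
\dfrac{\lambda}{\sqrt{n}} \displaystyle\sum_{i=1}^{n} \partial \log f_i(y_i, x_i; \theta_0, \gamma_0, \phi_0)/\partial\theta \\[2ex]
\sqrt{n}\,(\hat{\delta} - \delta_0)^t
\end{pmatrix}
\xrightarrow{D}
MVN\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} I_{11} & \Sigma_{12} \\ \Sigma_{12}^t & \Sigma \end{pmatrix} \right),
\]
where $\theta_0$, $\gamma_0$ and $\phi_0$ are the true values of $\theta$, $\gamma$ and $\phi$, respectively. Then, according to Gong and Samaniego (1981) and Self and Liang (1987), under some regularity conditions on $f_i(\cdot; \theta, \gamma, \phi)$, the 'pseudo' estimator for $\theta$ also follows an asymptotic normal distribution, i.e. under $H_0$,
\[
\sqrt{n}\,(\hat{\theta}_{\lambda,\hat{\gamma},\hat{\phi}} - \theta_0) \xrightarrow{D}
\begin{cases}
N(0, V_{\lambda,\phi_0,\gamma_0}) & \text{if } \theta_0 \text{ is an interior point of } [0, \tfrac{1}{2}], \\[1ex]
\tfrac{1}{2} N(\theta_0, 0) + \tfrac{1}{2} N(\theta_0, V_{\lambda,\phi_0,\gamma_0}) & \text{if } \theta_0 \text{ is on the boundary of } [0, \tfrac{1}{2}],
\end{cases}
\]
where
\[
V_{\lambda,\phi_0,\gamma_0} = I_{11}^{-1}\,(I_{11} - 2 I_{12}\Sigma_{12}^t + I_{12}\Sigma I_{12}^t)\,I_{11}^{-1} = I_{11}^{-1},
\]
since
\[
I_{12} = \lim_{n\to\infty} E_0\left\{ \frac{\partial^2 \ell_{\lambda,\delta}(\theta)/\partial\theta\,\partial\delta}{n} \right\} = 0.
\]

For approach 3, let
\[
\begin{pmatrix} \hat{\theta}_\lambda \\ \hat{\gamma}_\lambda^t \\ \hat{\phi}_\lambda^t \end{pmatrix}
= \arg\max_{\theta,\gamma,\phi} \ell_\lambda(\theta, \gamma, \phi).
\]
According to Huber (1967), under some regularity conditions on $f_i(\cdot; \theta, \gamma, \phi)$,
\[
(\hat{\theta}_\lambda, \hat{\gamma}_\lambda, \hat{\phi}_\lambda) \xrightarrow{p} (\theta^*, \gamma^*, \phi^*),
\]
which maximizes
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} E\{\log g_i(y_i, x_i; \lambda, \theta, \gamma, \phi);\; g_i(y_i, x_i; \alpha_0, \theta_0, \gamma_0, \phi_0)\}.
\]
However, under $H_0$,
\[
g_i(y_i, x_i; \alpha, \theta_0, \gamma_0, \phi_0)
= \lambda f_i(y_i, x_i; \theta_0, \gamma_0, \phi_0) + (1 - \lambda) f_i(y_i, x_i; \theta_0, \gamma_0, \phi_0)
= f_i(y_i, x_i; \theta_0, \gamma_0, \phi_0)
= g_i(y_i, x_i; \lambda, \theta_0, \gamma_0, \phi_0),
\]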
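i.e. under $H_0$, the true probability density (mass) function is a member of the family of the probability density (mass) functions of $g_i(y_i, x_i; \lambda, \theta_0, \gamma_0, \phi_0)$. Hence, $(\theta^*, \gamma^*, \phi^*) = (\theta_0, \gamma_0, \phi_0)$. Results follow by the same arguments as in approach 1.

For approach 2, let $\ell_{\lambda,\hat{\gamma},\phi^*}(\theta)$ be the log likelihood function of $\theta$ after plugging in the specified parameters $\lambda$, $\hat{\gamma}$, $\phi^*$, where $\hat{\gamma}$ is a $\sqrt{n}$-consistent estimator of $\gamma$:
\[
\ell_{\lambda,\hat{\gamma},\phi^*}(\theta)
= \sum_{i=1}^{n} \log g_i(y_i, x_i; \lambda, \theta, \hat{\gamma}, \phi^*)
= \sum_{i=1}^{n} \log\{\lambda f_i(y_i, x_i; \theta, \hat{\gamma}, \phi^*) + (1 - \lambda) f_i(y_i, x_i; \theta_0, \hat{\gamma}, \phi^*)\}. \tag{A1}
\]
Let $\hat{\theta}_{\lambda,\hat{\gamma},\phi^*}$ be the maximizer of $\ell_{\lambda,\hat{\gamma},\phi^*}(\theta)$, i.e. $\hat{\theta}_{\lambda,\hat{\gamma},\phi^*} = \arg\max_{0 \le \theta \le \frac{1}{2}} \ell_{\lambda,\hat{\gamma},\phi^*}(\theta)$. According to Huber (1967), under some regularity conditions on $f_i(\cdot; \theta, \gamma, \phi)$, $\hat{\theta}_{\lambda,\gamma_0,\phi^*} \xrightarrow{p} \theta^*$, which maximizes
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} E\{\log g_i(y_i, x_i; \lambda, \theta, \gamma_0, \phi^*);\; g_i(y_i, x_i; \alpha_0, \theta_0, \gamma_0, \phi_0)\}.
\]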
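Apply the inequality $\log x < x - 1$, $\forall x > 0$, $x \ne 1$. Then, $\forall\, i = 1, \ldots, n$,
\[
E\{\log[g_i(y_i, x_i; \lambda, \theta, \gamma_0, \phi^*)];\; g_i(y_i, x_i; \alpha_0, \theta_0, \gamma_0, \phi_0)\}
- E\{\log[g_i(y_i, x_i; \lambda, \theta_0, \gamma_0, \phi^*)];\; g_i(y_i, x_i; \alpha_0, \theta_0, \gamma_0, \phi_0)\}
\]
\[
< E\left\{ \frac{g_i(y_i, x_i; \lambda, \theta, \gamma_0, \phi^*)}{g_i(y_i, x_i; \lambda, \theta_0, \gamma_0, \phi^*)};\; g_i(y_i, x_i; \alpha_0, \theta_0, \gamma_0, \phi_0) \right\} - 1.
\]
However, under $H_0$: $\theta = \theta_0 = \frac{1}{2}$ and assuming that there is no linkage disequilibrium and no epistasis,
\[
\text{RHS} = \sum_{y_i} \sum_{x_i} \frac{g_i(x_i \mid y_i; \lambda, \theta, \gamma_0, \phi^*) \cdot g_i(y_i; \phi^*)}{g_i(x_i; \gamma_0) \cdot g_i(y_i; \phi^*)} \cdot g_i(y_i; \phi_0) \cdot g_i(x_i; \gamma_0) - 1
\]
\[
= \sum_{y_i} g_i(y_i; \phi_0) \sum_{x_i} \{g_i(x_i \mid y_i; \lambda, \theta, \gamma_0, \phi^*)\} - 1 = 0.
\]
Hence, under $H_0$, $E\{\log(g_i(y_i, x_i; \lambda, \theta, \gamma_0, \phi^*));\; g_i(y_i, x_i; \alpha_0, \theta_0, \gamma_0, \phi_0)\}$ is uniquely maximized at $\theta = \theta_0$, $\forall\, i = 1, \ldots, n$. The results follow by the same arguments as stated in the other approaches. Note that this approach is applicable only when $\theta_0$ is equal to $\frac{1}{2}$.

Proof of Proposition 1. Following similar arguments in Gong and Samaniego (1981) and Liang and Rathouz (1999), applying the Taylor expansion to the score function considered based on approach 1,
\[
S_{\hat{\gamma},\hat{\phi}}(\hat{\theta}_{\lambda,\hat{\gamma},\hat{\phi}})
= S_{\hat{\gamma},\hat{\phi}}(\theta_0) + (\hat{\theta}_{\lambda,\hat{\gamma},\hat{\phi}} - \theta_0)\, \frac{\partial S_{\hat{\gamma},\hat{\phi}}(\theta_0)}{\partial\theta} + o_p(1)
\]
\[
= 0 + n^{-1} \sum_{i=1}^{n} \left\{ \frac{\partial \log f_i(y_i, x_i; \theta_0, \hat{\gamma}, \hat{\phi})}{\partial\theta} \right\}^2 \cdot \sqrt{n}\,(\hat{\theta}_{\lambda,\hat{\gamma},\hat{\phi}} - \theta_0) + o_p(1)
= O_p(1)
\]
and
\[
T_\lambda^* = \lambda S_{\hat{\gamma},\hat{\phi}}(\hat{\theta}_{\lambda,\hat{\gamma},\hat{\phi}}) \xrightarrow{D}
\begin{cases}
\chi_1^2 & \text{if } \theta_0 \text{ is an interior point of } [0, \tfrac{1}{2}], \\[1ex]
\tfrac{1}{2}\chi_0^2 + \tfrac{1}{2}\chi_1^2 & \text{if } \theta_0 \text{ is on the boundary of } [0, \tfrac{1}{2}].
\end{cases}
\]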
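The boundary case can be illustrated by simulation in a much simpler setting than the one treated here. The sketch below is not the statistic $T_\lambda^*$; it assumes a plain binomial model with $n$ phase-known meioses, true $\theta_0 = \frac{1}{2}$, and $\theta$ restricted to $[0, \frac{1}{2}]$, and it uses an ordinary likelihood ratio statistic. All names and settings are illustrative.

```python
import math
import random

def xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def lrt_stat(k: int, n: int) -> float:
    """Likelihood ratio statistic for H0: theta = 1/2, with theta in [0, 1/2]."""
    if 2 * k >= n:
        return 0.0  # constrained MLE sits on the boundary theta = 1/2
    theta_hat = k / n
    loglik_hat = xlogy(k, theta_hat) + xlogy(n - k, 1.0 - theta_hat)
    loglik_null = n * math.log(0.5)
    return 2.0 * (loglik_hat - loglik_null)

def simulate(reps: int = 2000, n: int = 200, seed: int = 0):
    """Draw recombinant counts under theta0 = 1/2 and return the statistics."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        k = sum(rng.random() < 0.5 for _ in range(n))
        stats.append(lrt_stat(k, n))
    return stats

stats = simulate()
frac_zero = sum(s == 0.0 for s in stats) / len(stats)
# 2.706 is the 0.90 quantile of chi2_1; under the mixture the tail is about 0.05.
frac_large = sum(s > 2.706 for s in stats) / len(stats)
print(frac_zero, frac_large)
```

Roughly half of the simulated statistics are exactly zero, which corresponds to the $\chi_0^2$ component, and the upper tail is about half that of a $\chi_1^2$ variable.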
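The same arguments apply to approaches 2 and 3, except that the score functions must be replaced by $S_{\hat{\gamma},\phi^*}(\hat{\theta}_{\lambda,\hat{\gamma},\phi^*})$ or $S_{\gamma^*,\hat{\phi}}(\hat{\theta}_{\lambda,\gamma^*,\hat{\phi}})$ in approach 2 and by $S(\hat{\theta}_\lambda, \hat{\gamma}_\lambda, \hat{\phi}_\lambda)$ in approach 3.

Proof of Proposition 2. Let $\delta = (\gamma, \phi)$, and under $H_0$ let $\hat{\delta}$ be the $\sqrt{n}$-consistent estimate of $\delta$. Under $H_A$, let the estimate of $\delta$ be $\hat{\delta}(\alpha)$. $\forall$ fixed $\alpha \in [\varepsilon, 1]$, $\varepsilon > 0$, the LRB test statistic is given by
\[
R(\hat{\theta}, \alpha, \hat{\delta}) = 2 \log \frac{L_\alpha(\hat{\theta}(\alpha), \alpha, \hat{\delta}(\alpha))}{L_0(\theta_0, \hat{\delta})}
= 2\{\ell_\alpha(\hat{\theta}(\alpha), \alpha, \hat{\delta}(\alpha)) - \ell_0(\theta_0, \hat{\delta})\},
\]
where $\theta$ is defined in Lemma 1.

Applying a Taylor series expansion of $R(\hat{\theta}, \alpha, \hat{\delta})$ at $\theta_0$,
\[
R(\hat{\theta}, \alpha, \hat{\delta}) = 2\left\{ (\hat{\theta}(\alpha) - \theta_0)\, \frac{\partial \ell_\alpha(\theta_0, \hat{\delta}(\alpha))}{\partial\theta} + \frac{1}{2} (\hat{\theta}(\alpha) - \theta_0)^2\, \frac{\partial^2 \ell_\alpha(\theta_0, \hat{\delta}(\alpha))}{\partial\theta^2} \right\} + o_p(1)
\]
\[
= \frac{\partial \ell_\alpha(\theta_0, \hat{\delta}(\alpha))}{\partial\theta} \left\{ -\frac{\partial^2 \ell_\alpha(\theta_0, \hat{\delta}(\alpha))}{\partial\theta^2} \right\}^{-1} \frac{\partial \ell_\alpha(\theta_0, \hat{\delta}(\alpha))}{\partial\theta} + o_p(1)
\]
\[
= \frac{\partial \ell_\alpha(\theta_0, \hat{\delta}(\alpha))/\partial\theta}{\sqrt{n}} \left\{ -\frac{\partial^2 \ell_\alpha(\theta_0, \hat{\delta}(\alpha))/\partial\theta^2}{n} \right\}^{-1} \frac{\partial \ell_\alpha(\theta_0, \hat{\delta}(\alpha))/\partial\theta}{\sqrt{n}} + o_p(1).
\]
Note that
(1) under $H_0$, $\ell_\alpha(\theta_0, \alpha, \hat{\delta}(\alpha)) = \ell_0(\theta_0, \hat{\delta})$, $\forall \alpha \in [\varepsilon, 1]$;
(2) as shown in the previous section, under $H_0$, $\hat{\delta}(\alpha) \xrightarrow{p} \delta_0$ as $n \to \infty$, $\forall \alpha \in [\varepsilon, 1]$, where $\delta_0$ is the true value of $\delta$;
(3) under $H_0$, $o_p(1)$ holds uniformly in $\alpha \in [\varepsilon, 1]$.

According to Liang and Self (1996), under $H_0$ and as $n \to \infty$,
\[
\frac{\partial \ell_\alpha(\theta_0, \hat{\delta}(\alpha))/\partial\theta}{\sqrt{n}} \xrightarrow{D} N(0, I_\alpha(\theta_0, \delta_0)),
\]
and
\[
\frac{-\partial^2 \ell_\alpha(\theta_0, \hat{\delta}(\alpha))/\partial\theta^2}{n} \xrightarrow{p} I_\alpha(\theta_0, \delta_0),
\]
where
\[
I_\alpha = \lim_{n\to\infty} \frac{\alpha^2}{n} \sum_{i=1}^{n} E_0\left\{ \frac{\partial \log f_i(y_i, x_i; \theta_0, \delta_0)}{\partial\theta} \right\}^2.
\]
Therefore, from the results above and in Self and Liang (1987),
\[
R_\varepsilon = 2 \sup_{\alpha \in [\varepsilon, 1]} R(\hat{\theta}, \alpha, \hat{\delta}) \xrightarrow{D}
\begin{cases}
\chi_1^2 & \text{if } \theta_0 \text{ is an interior point of } [0, \tfrac{1}{2}], \\[1ex]
\tfrac{1}{2}\chi_0^2 + \tfrac{1}{2}\chi_1^2 & \text{if } \theta_0 \text{ is on the boundary of } [0, \tfrac{1}{2}].
\end{cases}
\]
The same arguments apply to approaches 1 and 3, except that $\delta_0$ must be replaced by $\delta^*$ in approach 2. Note that $\delta^* = (\gamma_0, \phi^*)$ when $\hat{\delta} = (\hat{\gamma}, \phi^*)$ and $\delta^* = (\gamma^*, \phi_0)$ when $\hat{\delta} = (\gamma^*, \hat{\phi})$.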
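The supremum over $\alpha \in [\varepsilon, 1]$ can be approximated on a grid. The sketch below does so for a hypothetical toy likelihood in which family $i$ has $k_i$ recombinants among $m_i$ phase-known meioses; nuisance parameters are suppressed, so $\hat{\delta}(\alpha)$ plays no role. It returns the grid maximum of $2\{\ell_\alpha(\theta) - \ell_0(\theta_0)\}$ over $\theta$ and $\alpha$, with no further scaling. The grid sizes and the default $\varepsilon = 0.05$ are arbitrary choices.

```python
import math

THETA0 = 0.5  # recombination fraction under no linkage

def family_lik(theta: float, k: int, m: int) -> float:
    """Toy f_i(theta): k recombinants among m phase-known meioses (hypothetical)."""
    return theta ** k * (1.0 - theta) ** (m - k)

def loglik(theta: float, alpha: float, families) -> float:
    """Admixture log likelihood with linked proportion alpha."""
    return sum(
        math.log(alpha * family_lik(theta, k, m)
                 + (1.0 - alpha) * family_lik(THETA0, k, m))
        for k, m in families
    )

def lrb_statistic(families, eps: float = 0.05, steps: int = 200) -> float:
    """Grid approximation to sup over alpha in [eps, 1] and theta in (0, 1/2]."""
    null = loglik(THETA0, 1.0, families)
    best = 0.0  # theta = theta0 always attains 0, so the supremum is >= 0
    for a in range(steps + 1):
        alpha = eps + (1.0 - eps) * a / steps
        for j in range(1, steps + 1):  # skip theta = 0 to avoid log(0)
            theta = THETA0 * j / steps
            best = max(best, 2.0 * (loglik(theta, alpha, families) - null))
    return best

linked = [(0, 6), (1, 6), (0, 5)]   # made-up data with few recombinants
unlinked = [(3, 6), (3, 6)]         # made-up data with half recombinants
print(lrb_statistic(linked), lrb_statistic(unlinked))
```

For the second data set every family likelihood is maximized at $\theta = \frac{1}{2}$, so the statistic is zero up to rounding, in line with the point mass $\chi_0^2$ in the limiting distribution.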
REFERENCES
BARNES, K. C., NEELY, J. D. AND DUFFY, D. L. et al. (1996). Linkage of asthma and total serum IgE concentration to markers on chromosome 12q: evidence from Afro-Caribbean and Caucasian populations. Genomics 37, 41–50.
BARNES, K. C., FREIDHOFF, L. R., NICKEL, R., CHIU, Y.-F., JUO, S.-H., HIZAWA, N., NAIDU, R. P., EHRLICH, E., DUFFY, D. L., SHOU, C., LEVETT, P. N., MARSH, D. G. AND BEATY, T. H. (1999). Dense mapping of chromosome 12q13.12-q23.3 and linkage to asthma and atopy. Journal of Allergy and Clinical Immunology 104, 485–491.
DAVIES, R. B. (1977). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64, 247–254.
DAVIES, R. B. (1987). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74, 33–43.
DUFFY, D. L. (1995). GCONVERT. 'www.qimr.edu.au/davidD/sibpair.html#Gconvert'. Queensland Institute of Medical Research, Australia.
DUFFY, D. L. (1997). The genetic epidemiology of asthma. Epidemiologic Reviews 19, 129–143.
ELSTON, R. C. AND STEWART, J. (1971). A general model for the genetic analysis of pedigree data. Human Heredity 21, 523–542.
ELSTON, R. C. (1998). Linkage and association. Genetic Epidemiology 15, 565–576.
FARAWAY, J. J. (1993). Distribution of the admixture test for the detection of linkage under heterogeneity. Genetic Epidemiology 10, 75–83.
GREEN, P., FALLS, K. AND CROOKS, S. (1990). CRIMAP. St Louis: Washington University.
GONG, G. AND SAMANIEGO, F. J. (1981). Pseudo maximum likelihood estimation: theory and applications. Annals of Statistics 9, 861–869.
HALDANE, J. B. S. (1919). The combination of linkage values and the calculation of distances between the loci of linked factors. Journal of Genetics 8, 229–309.
HODGE, S. E. AND ELSTON, R. C. (1994). LODS, WRODS, and MODS: the interpretation of LOD scores calculated under different models. Genetic Epidemiology 11, 329–342.
HUBER, P. (1967). The behaviour of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium in Mathematical Statistics and Probability, Berkeley: University of California Press.
KRAUTER, K., MONTGOMERY, K., YOON, S. J., LEBLANC-STRACESKI, J., RENAULT, B., MARONDEL, I., HERDMAN, V., CUPELLI, L., BANKS, A. AND LIEMAN, J. (1995). A second generation YAC contig map of human chromosome 12. Nature 377, 321–333.
KRUGLYAK, L., DALY, M. J., REEVE-DALY, M. P. AND LANDER, E. S. (1996). Parametric and nonparametric linkage analysis, a unified multipoint approach. American Journal of Human Genetics 58, 1347–1363.
LEMDANI, M. AND PONS, O. (1995). Tests for genetic linkage and homogeneity. Biometrics 51, 1033–1041.
LANDER, E. S. AND GREEN, P. (1987). Construction of multilocus genetic maps in humans. Proceedings of the National Academy of Sciences USA 84, 2363–2367.
LANDER, E. S. AND SCHORK, N. J. (1994). Genetic dissection of complex traits. Science 265, 2037–2048.
LIANG, K.-Y. AND RATHOUZ, P. J. (1999). Hypothesis testing under mixture models: application to genetic linkage analysis. Biometrics 55, 65–74.
LIANG, K.-Y., RATHOUZ, P. J. AND BEATY, T. H. (1996). Determining linkage and mode of inheritance: mod scores and other methods. Genetic Epidemiology 13, 575–593.
LIANG, K.-Y. AND SELF, S. G. (1996). On the asymptotic behavior of the pseudo-likelihood ratio test statistic. Journal of Royal Statistical Society Series B 59, 785–796.
MACLEAN, C. J., BISHOP, D. T., SHERMAN, S. L. AND DIEHL, S. R. (1993). Distribution of lod scores under uncertain mode of inheritance. American Journal of Human Genetics 52, 354–361.
MRAZEK, D., PAULS, D., ANDERSON, I., BROWER, A. AND KLINNERT, M. (1989). Segregation analysis of 145 asthmatic families [abstract]. American Journal of Human Genetics 45 (suppl.), A245.
OTT, J. (1991). Analysis of Human Genetic Linkage. Baltimore: The Johns Hopkins University Press.
RISCH, N. (1990). Linkage strategies for genetically complex traits. I. Multilocus models. American Journal of Human Genetics 46, 222–228.
SELF, S. G. AND LIANG, K.-Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of American Statistical Association 82, 605–610.
SMITH, C. A. B. (1963). Testing for heterogeneity or recombination fraction values in human genetics. Annals of Human Genetics 27, 175–182.
WHITTEMORE, A. S. (1996). Genome scanning for linkage: an overview. American Journal of Human Genetics 59, 704–716.
WILLIAMSON, J. A. AND AMOS, C. (1995). Guess LOD approach: sufficient conditions for robustness. Genetic Epidemiology 12, 163–176.

[Received 1 March, 2000; revised 9 February, 2001; accepted for publication 9 April, 2001]