Modelling hierarchical clustered censored data with the hierarchical Kendall copula

This article proposes a new model for right‐censored survival data with multi‐level clustering based on the hierarchical Kendall copula model of Brechmann (2014) with Archimedean clusters. This model accommodates clusters of unequal size and multiple clustering levels, without imposing any structural conditions on the parameters or on the copulas used at various levels of the hierarchy. A step‐wise estimation procedure is proposed and shown to yield consistent and asymptotically Gaussian estimates under mild regularity conditions. The model fitting is based on multiple imputation, given that the censoring rate increases with the level of the hierarchy. To check the model assumption of Archimedean dependence, a goodness‐of test is developed. The finite‐sample performance of the proposed estimators and of the goodness‐of‐fit test is investigated through simulations. The new model is applied to data from the study of chronic granulomatous disease. The Canadian Journal of Statistics 47: 182–203; 2019 © 2019 Statistical Society of Canada


INTRODUCTION
Multi-level hierarchical clustered data arise in practice whenever the sampling scheme involves a hierarchical structure. For example, consider the study of chronic granulomatous disease (CGD) This article was published online on 25 January 2019. An error was subsequently identified and the authors have made minor additional corrections to the text to ensure its validity. This notice is included in the online and print versions to indicate that both have been corrected on 1 February 2019. Additional Supporting Information may be found in the online version of this article at the publisher's website. * Author to whom correspondence may be addressed. E-mail: johanna.neslehova@mcgill.ca from a multi-center placebo-controlled randomized trial which aimed to investigate the effect of gamma interferon on the disease (Fleming & Harrington, 1991, Section 4.4). There are two levels of clustering; the upper-level and lower-level cluster is formed, respectively, by patients from the same hospital and by the same patient's infection records.
Hierarchical clustered data are typically analyzed using frailty or copula models. In the frailty approach, the hierarchical dependence structure is accounted for through nested random effect models. For example, Sastry (1997) considers a proportional hazard model with two nested independent Gamma frailty variables, while a similar model of Yau (2001) relies on independent normal frailties. Rondeau, Filleul & Joly (2006) also use a model similar to that of Sastry (1997) but handle the baseline estimation nonparametrically through spline-smoothing. Shih & Lu (2009) consider a nested random effect proportional hazard model with an unspecified baseline hazard function, whereas Ha & Lee (2005) assume an accelerated failure time model with nested and independent frailty variables.
In the copula-based strategy, the dependence structure is modelled explicitly. At the heart of this approach is the decomposition of Sklar (1959), which states that the joint survival function of any random vector (X 1 , … , X d ) can be expressed in terms of its univariate survival functions F 1 , … ,F d , and a copula C, that is, a d-variate distribution function with standard uniform univariate margins. Specifically, for all x 1 , … , x d ∈ ℝ, Furthermore, for any copula C and marginsF 1 , … ,F d the expression on the right-hand side of (1) is a valid d-variate survival function with marginal survival functionsF 1 , … ,F d . Thus, a copula model consists of assuming that in Sklar's representation the copula C and the marginsF 1 , … ,F d belong to suitably chosen parametric (or semiparametric) families of copulas and univariate survival functions, respectively. The key advantage of this approach is that the dependence structure can be modelled, fitted, and validated independently of the margins (Nelsen, 2006;Joe, 2015;Durante & Sempi, 2016;Mai & Scherer, 2017). Although copula models have gained substantial popularity in various fields, their use in the analysis of hierarchical clustered data is less frequent. This is likely because more sophisticated copula models need to be constructed that reflect the hierarchical dependence specific to clustered data. In addition, inference is more complex when the data are incomplete. Shih & Lu (2007) applied hierarchical Archimedean copulas (HAC) of Joe (1993) with Clayton generators to two-level clustered survival data, and proposed a three-stage estimation procedure that can handle right censoring. The HAC model was also used by Andersen (2004) to model familial data, where the siblings and parents are assumed to have an exchangeable dependence structure. Familial data were also analyzed using Gaussian copula models by Zhao & Joe (2005) and Othus & Li (2010). Very recently, Prenen, Braekers & Duchateau (2017) studied a model for one-level clustered data with Archimedean dependence and variable cluster sizes. The copula approach also underlines the work of Romdhani, Lakhal-Chaieb & Rivest (2014), who propose the exchangeable Kendall's tau as a nonparametric measure of intracluster association.
In this article, we propose a new copula model for hierarchical clustered data using the hierarchical Kendall copula construction of Brechmann (2014) with Archimedean clusters. As in the HAC model, we use exchangeable Archimedean copulas to describe within-cluster dependence at the first level. However, to model dependence between clusters at subsequent levels of the hierarchy, a different strategy is adopted. Particular variables are chosen as cluster representatives and their dependence is captured by an Archimedean copula model. This construction has several advantages over the HAC model. First, clusters of unequal size are easily accommodated. Second, no conditions on the within-cluster dependence are needed, and no restrictions on the parameter space are required. Moreover, the model is easy to simulate from and is fitted sequentially, level by level, when the data are complete. Unlike Brechmann (2014), we allow the data to be right-censored, which makes inference more challenging as the amount of censoring of the cluster representatives increases with the level of the hierarchy. To overcome this problem, we obtain complete observations through an imputation algorithm, and base estimation and model validation on the imputed sample.
The article is organized as follows. In Section 2 we recall the hierarchical Kendall copula construction of Brechmann (2014) with Archimedean clusters. Using this approach, we then propose a new model for hierarchical clustered data in Section 3. Parameter estimation for right-censored observations and the accompanying imputation algorithm are discussed in Section 4, where the asymptotic distribution of the proposed estimators is also derived. Section 5 contains a new goodness-of-fit test that can be used to validate whether the proposed hierarchical Kendall dependence structure with Archimedean clusters is indeed well-suited. The performance of the estimators and the goodness-of-fit procedure is investigated in Section 6 via simulations. In the same section, the new model is applied to the two-level clustered CGD data. Section 7 contains concluding remarks. The Supplementary Material contains details about the marginal estimation, explicit formulas when the Archimedean generators are Clayton, additional simulations, outstanding proofs, R source code and implementation instructions.

HIERARCHICAL KENDALL COPULAS WITH ARCHIMEDEAN CLUSTERS
Let X = (X 1 , … , X d ) be a random vector with univariate marginal survival functionsF 1 , … ,F d . As stated in the Introduction, the joint survival function of X can be decomposed as indicated in (1); the copula C appearing therein is unique if and only ifF 1 , … ,F d are continuous. Continuity of the margins is assumed throughout this article; discontinuity of the marginals causes C to be unidentifiable, thereby invalidating copula inference procedures developed for continuous data; for discussion, see Genest & Nešlehová (2007), for example.
The class of Archimedean copulas plays a special role in the model proposed here. A d-variate copula is called Archimedean if it can be expressed, for all u 1 , … , u d ∈ [0, 1], as The Archimedean generator : [0, ∞) → [0, 1] must be nondecreasing and such that (0) = 1 and (x) → 0 as x → ∞; −1 (0) = inf{x ≥ 0 : (x) = 0} by convention. Furthermore C ,d is a bona fide copula if and only if is d-monotone (Malov, 2001;McNeil & Nešlehová, 2009). This property means that for all k ∈ {1, … , d − 2}, the kth derivative (k) of exists on (0, ∞) and satisfies (−1) k (k) ≥ 0, and further that (−1) (d−2) (d−2) is nonincreasing and convex. According to McNeil & Nešlehová (2009), the density c ,d of C ,d exists if and only if (d−1) exists and is absolutely continuous on (0, ∞). Then for all u 1 , … , u d ∈ (0, 1) In the context of clustered data, Archimedean copulas are particularly well-suited for modelling intracluster dependence. The latter is often assumed to be exchangeable, meaning that for any permutation of {1, … , d}, the survival copulas of X and (X (1) , … , X (d) ) are the same. Because C ,d is invariant with respect to any permutation of its arguments, it indeed induces exchangeable dependence. Another appealing property of Archimedean copulas is that when is a Laplace transform of some positive random variable, C ,d is the dependence structure of a multiplicative frailty model (Marshall & Olkin, 1988).
For the hierarchical Kendall copula construction, it is also important to recall that for any d-monotone Archimedean generator , C ,d is the survival copula of R × (S 1 , … , S d ), where the radial random variable R is strictly positive and independent of the random vector (S 1 , … , S d ), which is uniformly distributed on the unit simplex  d = {x ∈ [0, 1] d ∶ x 1 + · · · + x d = 1} (McNeil & Nešlehová, 2009;Genest, Nešlehová & Ziegel, 2011). The survival function F ,d of R is uniquely determined by , that is, for any r > 0, where (d−1) + denotes the right-hand derivative of (d−2) .
Definition 1. The hierarchical Kendall copula with Archimedean clusters is the distribution function of (U 1 , … , U d ) with the following properties: , … , n} are mutually independent, and for each i ∈ {1, … , n}, the distribution of (U , ∈  i ) given R 1 , … , R n depends only on R i . (iii) The survival copula of R 1 , … , R n is Archimedean with generator 0 .
Notice that the key idea behind the construction in Definition 1 is that the intercluster dependence is accounted for through the radial variables R 1 , … , R n , which can be thought of as cluster representatives.
Remark 1. For each i ∈ {1, … , n}, the survival function of the variable R i defined in part (i) of Definition 1 is as specified in (3) with replaced by i and d by m i . Consequently, the distribution of Thus the construction in Definition 1 is indeed a hierarchical Kendall copula as defined by Brechmann (2014). In his article the variables R 1 , … , R n in part (i) are replaced by W 1 , … , W n , and part (iii) is replaced by (iii*) which states that the copula of W 1 , … , W n is Archimedean with generator 0 . Note also that although Archimedean copulas are considered here, the construction of Brechmann (2014) allows the copulas in parts (i) and (iii*) to be arbitrary.
The density of the hierarchical Kendall copula with Archimedean clusters exists provided that the copulas C 0 ,n and C i ,m i for i ∈ {1, … , n} have densities; these densities are of the form indicated in (2). From Theorem 2.8 in Brechmann (2014), the density c of the hierarchical Kendall copula is as follows. For arbitrary u 1 , … , u d ∈ (0, 1) and each i ∈ {1, … , n}, let Definition 1 can be extended to multiple levels of hierarchy by assuming that the survival copula of R 1 , … , R n in part (iii) is again a hierarchical Kendall copula with Archimedean clusters; see also Remark 2.7 in Brechmann (2014).

Description of the Model
For simplicity, we restrict our attention to two-level clustered data, noting that the model could be extended to the multi-level case. Suppose there are n clusters at the upper level, for example, hospitals, and that for each i ∈ {1, … , n}, cluster i is divided into m i sub-clusters containing m ij subjects each, for example, patients' records from hospital i. The sizes of the clusters at these two levels need not be the same.
The failure time and the covariate vector of subject in sub-cluster j of upper-level cluster i are denoted by T ij and Z ij , respectively. The covariate vector can depend on time but we will write Z ij instead of Z ij (t) if no confusion can arise. We assume that for each i ∈ {1, … , n}, j ∈ {1, … , m i }, and ∈ {1, … , m ij }, the length of the covariate vector Z ij is the same and is equal to p. The vectors T ij = (T ij1 , … , T ijm ij ) defined for all i ∈ {1, … , n}, j ∈ {1, … , m i } form the clusters at the lower level, while the clusters at the upper level are given by T i = (T i1 , … , T im i ) for all i ∈ {1, … , n}. Also let T = (T 1 , … , T n ) be the entire vector of failure times.
As in the CGD clinical study, clustered survival times are typically right-censored. The observed failure times are thus ij denotes a censoring variable that is assumed independent of T ij conditional on Z ij . We also observe the censoring In a conditional copula model for T given the covariates, the univariate survival functions are specified first. For each t > 0, i ∈ {1, … , n}, j ∈ {1, … , m i }, and ∈ {1, … , m ij }, let F ij (t) = Pr{T ij > t|Z ij (t)} be the conditional survival function of subject in sub-cluster j of cluster i given the covariate vector Z ij (t). Typically,F ij is specified through a regression model. For example, Shih & Lu (2007) consider the Cox proportional hazard model with hazard , where is a parameter vector of length p and (t) denotes a baseline hazard function. Even though it is not considered here, the baseline hazard function can be allowed to depend on i (Spiekerman & Lin, 1998). Then, for each t > 0, the marginal Then the survival copula of T, given the covariates, is the distribution function of U. We assume the latter to be as follows.
(i) The vectors U 1 , … , U n are mutually independent. (ii) For each i ∈ {1, … , n}, the distribution function of U i is a hierarchical Kendall copula with Archimedean clusters as described in Definition 1; the Archimedean generators at the lower level are denoted by ij , j ∈ {1, … , m i }, while the generator at the upper level Note that because the cluster lengths need not be the same, the generators 1 and 2 specified in part (iii) of Assumption 1 have to be d-monotone for a sufficiently large d. In particular, for Example. In applications to survival data like the one considered in this article, the generators 1 and 2 specified in part (iii) of Assumption 1 are often taken to be Clayton with parameters 1 and 2 , respectively. For arbitrary ∈ [−1/(d − 1), ∞) and all t > 0, the generator of the d-dimensional Clayton copula is by (t) = {max(1 + t, 0)} −1/ ; the case = 0 corresponds to the independence copula generator e −t . When > 0, is the Laplace transform of a Gamma variable (1∕ , ) and hence completely monotone. The kth derivative of with > 0 is explicit (Hofert, Mächler & McNeil, 2012); the relevant formula may be found in Section 2 of Supplementary Material. From (3), the survival function of the radial variable R of the d-variate Clayton copula with > 0 is given, for all r > 0, by When 1 = 1 and 2 = 2 , the parameters 1 and 2 can typically be interpreted as intercluster and intracluster associations, respectively. For example, for the bivariate Clayton copula, is in one-to-one correspondence with Kendall's , viz. = /( + 2), so that 1 = 2 1 /(1 − 1 ), where 1 is Kendall's between two members of the same cluster at the lower level. Moreover, 2 is related to Kendall's for two members of different clusters at the lower level that belong to the same cluster at the upper level, but in this case no simple formula is available.

Comparison with the HAC Model
In this section we compare the model from Section 3.1 to the HAC model of Shih & Lu (2007). First note that the latter model also assumes mutual independence of U 1 , … , U n . Next, for each i ∈ {1, … , n}, the HAC model takes the distribution function of U i to be hierarchical Archimedean. This means that for each i ∈ {1, … , n}, the distribution function of the variables U ij with i ∈ {1, … , n} and j ∈ {1, … , m i }, is the m ij -variate Archimedean copula with generator 1 , as in the model proposed here; we denote the latter by C 1 for simplicity.
The difficulty with the HAC model is that this expression is not necessarily a copula, unless 1 and 2 satisfy further conditions; for example, see Joe (1993). When 1 and 2 are Clayton generators with parameters 1 > 0 and 2 > 0, respectively, as in Shih & Lu (2007), these conditions hold if 2 < 1 . This additional constraint on the parameters makes estimating their model more complex. Another disadvantage is that the HAC model treats the vectors U ij and U ik equally for j ≠ k, even though their dimensions may differ. In contrast, the model that we propose is fully flexible in that 1 and 2 can be any generators that exhibit a sufficient degree of monotonicity. Furthermore, the different lengths of U ij and U ik are accounted for through the corresponding radial variables, which have different distributions.
The advantage of the HAC model is the more tractable form of intracluster dependence. Let i ∈ {1, … , n}, 1 ≠ 2 ∈ {1, … , m i }, 1 ∈ {1, … , m ij 1 }, and 2 ∈ {1, … , m ij 2 }. The distribution function of U ij 1 1 and U ij 2 2 is the bivariate Archimedean copula with generator 2 , and the parameter of 2 is directly interpretable in terms of intracluster correlation. In the model proposed here, the dependence between U ij 1 1 and U ij 2 2 is more cumbersome (Brechmann, 2014, Eq. (2.13)). However, in his Example 2.13, Brechmann (2014) shows that it is close to the Archimedean copula C 2 ,2 when the generators are Gumbel.

ESTIMATION
To fit the hierarchical Kendall copula model with Archimedean clusters to two-level clustered data, first recall that the available data are the triplets ( Assume that 1 and 2 in part (iii) of Assumption 1 belong to single-parameter classes of generators Ψ 1 = { 1 , 1 ∈ Θ 1 } and Ψ 2 = { 2 , 2 ∈ Θ 2 }, respectively; this is the case for most commonly used Archimedean families. Thus, we need to estimate the marginal parameters and Λ, and the association parameters 1 and 2 . Since the density of a hierarchical Kendall copula with Archimedean clusters has the product form identified in (4), we propose the following step-wise approach.
The marginal parameters and Λ are fitted first as in Spiekerman & Lin (1998) under the working assumption that all subjects are independent; the required estimators are given in Section 1 of the Supplementary Material. For formulas for standard errors (SEs) that take clustering into account, see Spiekerman & Lin (1998).
The estimateŝandΛ then serve to obtain the estimatesF ij of the conditional marginal Because the marginal survival functions are estimated,Û = (Û 1 , … ,Û n ) is a noisy, left-censored observation on the vector U identified in Section 3.1. Nonetheless, it can be used to estimate 1 and 2 . Analogous to Algorithm 3.9 in Brechmann (2014), we proceed sequentially.

Estimation of 1
Under the working assumption that the lower-level clusters are independent, 1 can be estimated by maximizing the pseudo log-likelihood where L ij (Û ij ) is the contribution of sub-cluster j of cluster i, viz.
where d ij = ij1 + · · · + ijm ij denotes the number of complete observations in the sub-cluster. Note that this procedure for estimating 1 reduces to the method of Shih & Lu (2007) when the generators are Clayton, and to the one of Prenen, Braekers & Duchateau (2017) when m ij = 1 for all i ∈ {1, … , n} and j ∈ {1, … , m i }, which coincides with the approach of Andersen (2005) when m i = 2 for all i ∈ {1, … , n}. The asymptotic behaviour of̂1 that maximizes (5) follows. A proof, based on the ideas of Prenen, Braekers & Duchateau (2017), is given in the Appendix.
The asymptotic variance of̂1 specified in Theorem 1 can be estimated consistently bŷ 2 Φ ∕{I 1 (̂1)} 2 ; an expression for̂2 Φ can be found in the Appendix. Explicit formulas when the generator is Clayton are given in Section 3.2 of the Supplementary Material.

Estimation of 2
Turning to the estimation of 2 , for each i ∈ {1, … , n} and j ∈ {1, … , m i }, we define the . These are noisy, censored pseudo-observations of the radial variables from (i) of Definition 1 whose survival copula is assumed to be C 2 .
To estimate 2 , we first propose to transform the variablesR ij to the uniform scale, that is, for each , where for any generator and integer Because the upper-level clusters are assumed to be independent for each i ∈ {1, … , n}, one can, in principle, maximize a likelihood for censored data based on the pseudo-sample , i ∈ {1, … , n}, ∈ {1, … , m i } that is constructed analogously to the one identified in (5). However, according to a preliminary simulation that we do not report explicitly, the resulting estimator performs poorly. This is because for any i ∈ {1, … , n} and ∈ {1, … , m i }, R ij = 0 whenever at least one of the observations X ij1 , … , X ijm ij in sub-cluster j is censored. Thus, the censoring rate of the variablesV ij is much greater than that of the variables X ij .
Algorithm 1 1. Generateũ ij1 from the conditional distribution function The conditional distributions in Algorithm 1 are explicit and given in Section 3.1 of the Supplementary Material, together with the special case when 1 is Clayton. Because 1 is unknown, it is first replaced by its estimatê1 derived in Section 4.1.
To estimate 2 , we propose to maximize the pseudo log-likelihood where c 2 ,m i denotes the density of the m i -dimensional Archimedean copula with generator 2 specified in (2). Note that c 2 ,m i involves the derivative , which may be intractable for some i when m i is large. To avoid this problem, one can maximize the pair-wise pseudo log-likelihood (Cox & Reid, 2004), viz.
where c 2 ,2 is the density of the bivariate Archimedean copula with generator 2 . Maximizing (7) typically leads to a loss in efficiency; this can be partially remedied by adding a weight function such as 1/(m i − 1); see Joe & Lee (2009).
Theorem 2 establishes the asymptotic normality of the estimator̂2 based on the imputed sample. The proof is similar to that of Theorem 1 and is outlined in Section 2 of the Supplementary Material.
Theorem 2. Let̂2 be a value that maximizes either L 2 or L c 2 given in (6) and (7) In order to reduce the effect of randomness in the imputation procedure, the latter can be repeated K times (Rubin, 1987). Let̂2 ,k denote the estimator based on the kth imputed sample and the maximization of either L 2 or L c 2 given in (6) and (7), respectively. The final estimate of 2 is then the averagē2 Using the results of Rubin (1987), we obtain the following results concerning the variance and asymptotic behaviour of̄2 ,K .
Theorem 3. Let̄2 ,K be the estimator identified in (8)  In the expression for the variance of̄2 ,K given in Theorem 3, W K and B K reflect the variation among the estimatorŝ2 ,1 , … ,̂2 ,K and the variation among imputations, respectively, while the factor (1 + 1/K) accounts for the extra variability of̄2 ,K when the imputation number K is finite. The finite-sample properties of the estimators of the dependence parameters 1 and 2 proposed in this section are investigated in Section 6 via simulations.

A GOODNESS-OF-FIT TEST
Before proceeding with the aforementioned simulations, we propose a test to validate the assumption that the dependence structure of T is indeed a hierarchical Kendall copula with Archimedean clusters as proposed in Section 3.1. Because a fully consistent test would be too cumbersome, we propose to test the following two hypotheses: (a) For each i ∈ {1, … , n} and j ∈ {1, … , m i }, the distribution of the vector U ij is an Archimedean copula with generator 1 , 1 ∈ Θ 1 .
To construct a suitable test statistic, recall from Section 2 and McNeil & Nešlehová (2009) that if a vector (U 1 , … , U d ) is distributed as an Archimedean copula C ,d , R = −1 (U 1 ) + · · · + −1 (U d ) and S = { −1 (U 1 ), … , −1 (U d )}/R are independent, S is uniformly distributed on the unit simplex  d and R has the survival function specified in (3). In particular, R is independent of Q = −1 (U 1 )/R and Q is distributed as the ratio of two independent Erlang variables with parameters 1 and d − 1, respectively. Hence, for any t > 0,F Q (t) = Pr(Q > t) = (t + 1) 1−d . Since this expression is independent of , Y = (Q + 1) 1 − d is a pivotal statistic. Thus, to test that the distribution of (U 1 , … , U d ) is the Archimedean copula C ,d , we propose to test that Spearman's between R and Q is 0, where = corr{F Q (Q),F ,d (R)} = corr{Y,F ,d (R)}.
To turn these observations into a test of the hypothesis specified in part (a), suppose thatŨ is an imputed complete dataset obtained from Algorithm 1. As in Section 4, for each i ∈ {1, … , n} and j ∈ {1, … , m ij }, defineR ij = −1 The combined sample version of Spearman's is then Following the approach of Fisher (1915), we use the test statistic To test the hypothesis specified in part (b) based on a single imputed sampleŨ, for Then the sample version of Spearman's and the resulting Fisher test statistic arē If no censoring is present, √ nz 1 and √ nz 2 are approximately standard Normal under the hypotheses specified in parts (a) and (b). This is illustrated in Figure 1, which displays histograms of N = 2,000 values of √ nz 1 and √ nz 2 when n = 200; for each i ∈ {1, … , n} and j ∈ {1, … , m i }, m i ≡ m 1 ∈ {10, 30} and m ij ≡ m 2 ∈ {3, 5}. The generators are Clayton with parameters 1 = 2 and 2 = 0.86, which correspond to values of 0.5 and 0.3 in terms of Kendall's , respectively. The marginal model is as described in Section 6.1, and the marginal parameters are estimated as in Spiekerman & Lin (1998), given that the data are complete. The overlaid standard normal density curves show that the normality assumption seems reasonable even if the margins are estimated.
As we indicated in connection with (8), the variability introduced through the imputation procedure can be reduced by taking averages over K imputation samples. For each k ∈ {1, … , K}, letz (k) 1 andz (k) 2 denote the statistics specified in (9) and (10) corresponding to the kth imputed sample, respectively. Letz 1, 2 ∕K. According to Rubin (1987), The Canadian Journal of Statistics / La revue canadienne de statistique DOI: 10.1002/cjs the variance D 1,K ofz 1,K is while the variance ofz 2,K equals D 2, We can test the hypotheses specified in parts (a) and (b) using the statistics T 1,K =z 1,K ∕ √D 1,K and T 2,K =z 2,K ∕ √ D 2,K , respectively. Here,D 1,K is obtained by estimating var(z (k) 1 ) by the bootstrap procedure of Monaco, Cai & Grizzle (2005) in which clusters are sampled with replacement. If the hypotheses specified in parts (a) and (b) hold and K ≥ 2, Rubin (1987) implies that the distributions of T 1, K and T 2, K are approximately Student t with degrees of freedom 1 and 2 , respectively, where for ∈ {1, 2},

SIMULATION STUDY AND DATA ANALYSIS
In this section, we use simulations to evaluate the finite-sample performance of the parameter estimators from Section 4, and of the goodness-of-fit procedure from Section 5. We also use the hierarchical Kendal copula model to analyze data on CGD.

A Simulation Study of the Proposed Estimators
Throughout this section, the parameters of the hierarchical Kendall copula model were fixed as follows. For each i ∈ {1, … , n}, j ∈ {1, … , m i }, and ∈ {1, … , m ij }, p = 2 and Z ij 1 was uniform on (0, 1) and independent of Z ij 2 , which was Bernoulli(0.5). Given Z ij , T ij followed the Cox proportional hazard model with = (0.5, 1); for all t > 0 the Weibull baseline hazard function was (t) = (at a − 1 )/(b a ) with a = 0.91 and b = 52.57. The latter values were the parameter estimates that we obtained by fitting the Cox proportional hazard model with Weibull baseline hazard function to the CGD data. Finally, the generators were Clayton with 1 = 2 and 2 = 0.857, corresponding to values for Kendall's of 0.5 and 0.3, respectively. To generate the vector T of survival times, we first drew the vector U using Algorithms 3.1 and 3.7 of Brechmann (2014). Then, applying the marginal inverse survival functions . These lifetimes were then censored using independent variables T c ij , uniform on (0, c), where c was chosen to reach the targeted censoring rate of 10%, 30%, or 70%.
The censoring rate, the number of upper-level clusters n, and the cluster sizes typically influenced the performance of the estimatorŝ= (̂1,̂2),̂1,̄2 ,K , and̄c 2,K , where the last two estimates were obtained by maximizing the pseudo log-likelihood functions corresponding to (6) and (7), respectively. To investigate our proposed methods of estimation in detail, we considered the following three scenarios:  Table 1 shows an excerpt of the results observed under Scenario B; complete tables for all scenarios are provided in Tables S.1-S.5 in Section 4 of the Supplementary Material. For each estimator, we have reported the average bias, the empirical standard error (SE), the average estimated standard error (ESE), the root mean squared error (RMSE), and the empirical coverage rate (CR) of its large-sample 95% confidence interval (CI). We also have reported the average censoring rates R ij of the cluster representativesR ij . These values were indeed much higher than the censoring rates of the univariate lifetimes, and this phenomenon became worse as the size of the lower-level clusters increased. Overall, the performance of all estimators was reasonable, although the association parameter estimators seemed to be systematically negatively biased. Reassuringly, we observed that the values for SE and ESE were close, even when n = 50.
As expected, all estimators improved as n, the sample size, increased. The CR was surprisingly good, even for n = 50, except when the censoring rate was high and we used̄2 ,K . As the censoring rate increased, so did the SE of all the estimators. The bias of̄2 ,K and̄c 2,K also increased with the censoring rate. However, the impact of the latter on the bias of̂1,̂2, and̂1 was less clear; it may even seem that the bias decreased when the censoring rate was 70%. Although such an observation appears counterintuitive, the same phenomenon was observed in the simulation experiments of Andersen (2005), Shih & Lu (2007), and Prenen, Braekers & Duchateau (2017). Notice that the estimators of 1 and 2 performed better than those of 1 and 2 . Moreover, the estimator of 1 performed better than the estimators of 2 in terms of bias but not necessarily in terms of the SE; the RMSE was actually of a similar magnitude.
Scenario B explored the effect of the cluster size when n = 100. When the size of the lower-level clusters increased from m 2 = 3 to m 2 = 5, more information became available. This reduced the bias and especially the SE of the marginal parameter estimates. The bias and SE of

A Simulation Study of the Proposed Goodness-of-Fit Procedures
To investigate the goodness-of-fit procedures that we proposed in Section 5, we used the same marginal model as the one that we specified in Section 6.1. We used Clayton generators parametrized by Kendall's , viz. 1 = 1 /( 1 + 2) and 2 = 2 /( 2 + 2), respectively. In addition, m i ≡ m 1 and m ij ≡ m 2 for each i ∈ {1, … , n} and j ∈ {1, … , m i }. All tests were carried out at the 5% level and the results that we report were based on 500 simulation runs and 500 bootstrap replicates for the computation ofD 1,K . The test based on T 2,K used the estimator̄c 2,K .
The level of the tests based on T 1, K and T 2,K pertaining to the association in the lower-level and upper-level clusters, respectively, is summarized in Table S.7 of the Supplementary Material for various censoring rates, degrees of association, numbers of upper-level clusters, and cluster sizes. We assumed that the number of imputations was K = 0 if no observation was censored, and K = 30 if the censoring rate was 30%. Even though the margins are estimated, our proposed test held its level fairly well overall. To investigate the power, we chose the generators of the lower and upper level copulas to be both either in the Gumbel or the Frank family, parametrized by Kendall's . The upper left panel of Figure 2 displays the estimated power of the tests based on T 1,K and T 2,K for various values of n, as a function of 1 , when 2 = 0.3. As expected, power increased with n for each value of 1 . The power also increased with 1 , which makes sense given that when 1 = 0, the Frank and Gumbel copulas become the independence copula, which also belongs to the Clayton family. One can also see from these plots that the power depended on the alternative, and that it was negatively affected by the censoring rate, although this feature was less pronounced for the test based on T 2,K . We also noted that the test based on T 1,K exhibited greater power; this result was not surprising in view of the fact that this test is based on a larger number of observations. The lower panel of Figure 2 indicates the effect of the cluster sizes m 1 and m 2 on the estimated power of the test based on T 1,K ; as expected, the power increased with the cluster size.

Data Analysis
In this section, we report on our use of the hierarchical Kendall copula model with Archimedean clusters to analyze data from the study of CGD. The dataset contains 128 patients from n = 13 hospitals. The number m i of patients from a given hospital ranges from 4 to 26 with mean 9.8. The length m ij of a single patient's record ranges from 1 to 8 with mean 1.6. The censoring rate for a patient's record is around 62.5%. Given that n is small, we ran an additional simulation study using the same values of n, m i , and m ij as in the dataset, and the same degree of intercluster and intracluster association, as detailed in Section 4 of the Supplementary Material. The results reported in Table 8 of the Supplementary Material show that the performance of̂1 and̂2 is still reasonable, particularly when the pair-wise likelihood is used. This may seem surprising; the tolerable CR can be attributed to the fact that the association at both the lower and upper level is weak. Note also that while the entire dataset is exploited to estimate the marginal parameters, only patients who experienced at least one infection and one censored time (last observation) contributed to the psedo log-likelihood specified in (5). We first evaluated the effect of gamma interferon on the disease. From the upper section of Table 2, we observe that the treatment effect of -IFN was significant; a similar result was reported by Ha & Lee (2005) and Rondeau, Filleul & Joly (2006).
The results of the association analysis are summarized in the lower section of Table 2. The numbers of imputations and bootstrap replicates for the calculation ofD 1,K were K = 1,000 and B = 500, respectively. For both the lower and upper cluster levels, we chose the generators to be Clayton; from the P-values of the tests previously described in Section 5 and displayed in the last column of Table 2, this assumption seems reasonable. The penultimate column shows the P-values of the test of the hypothesis that the association is not significant, that is, that 1 = 2 = 0. We can see that the within-patient association is significant albeit weak, witĥ1 = 0.4447 corresponding to a value for Kendall's 1 of 0.18. The same conclusion was reached by Ha & Lee (2005) using a random-effects approach. However,̂1 is not significantly different from 0 when the covariates age and gender are also included in the Cox proportional hazard model; this is not surprising given the small sample size (results not reported here). The estimated association between patients treated at the same hospital is̄2 ,K = 0.157 and

CONCLUDING REMARKS
In this article, we adapted the hierarchical Kendall copula model with Archimedean clusters of Brechmann (2014) to clustered data with multiple levels of clustering. Our proposed model aims to reduce dimensionality through clustering. The way it is built reflects the cluster structure, which makes it natural and easy to interpret. Clusters can have different sizes and the model is easily expanded to multiple levels of hierarchy without imposing restrictions on the parameter space. As we discussed, these are clear advantages over the previously proposed hierarchical Archimedean model. Because of the way the model is constructed, association parameters can be estimated sequentially, level by level. We extended the work of Brechmann (2014) in that we developed an estimation procedure that can be used for right-censored observations. Because the censoring rate of the cluster representatives increases with the level of the hierarchy, we relied on imputation to estimate model parameters. The proposed estimators are shown to be asymptotically Gaussian and well-behaved in finite samples. Furthermore, we developed goodness-of-fit tests which can be used to check whether, at a given level of the hierarchy, the Archimedean generator indeed belongs to a specific parametric class. As we demonstrated through simulations, these proposed tests hold their level well and have reasonable power in finite samples. We used the proposed model to analyze two-level clustered data and reached conclusions that agreed with previous analyses of the CGD data that were based on random effects models.
Normality. Let S 1n ( 1 ;̂,Λ) be the score function, that is, the derivative of L 1 ( 1 ) with respect to 1 . Its Taylor series expansion around 10 is Thus it only remains to show the asymptotic normality of √ nS 1n ( 10 ;̂,Λ), which can be established as in the proof of Theorem 3 in Prenen, Braekers, & Duchateau (2017).