Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method

* Corresponding author. Centre for Health Sciences, Barts and The London School of Medicine and Dentistry, Abernethy Building, 2 Newark Street, Queen Mary, University of London, London E1 2AT, UK. E-mail: s.eldridge@qmul.ac.uk

International Journal of Epidemiology, Volume 35, Issue 5, October 2006, Pages 1292–1300, https://doi.org/10.1093/ije/dyl129

30 August 2006 25 May 2006 30 August 2006

Sandra M Eldridge, Deborah Ashby, Sally Kerry, Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method, International Journal of Epidemiology, Volume 35, Issue 5, October 2006, Pages 1292–1300, https://doi.org/10.1093/ije/dyl129

Abstract

Background: Cluster randomized trials are increasingly popular. In many of these trials, cluster sizes are unequal. This can affect trial power, but standard sample size formulae for these trials ignore this. Previous studies addressing this issue have mostly focused on continuous outcomes or methods that are sometimes difficult to use in practice.

Methods: We show how a simple formula can be used to judge the possible effect of unequal cluster sizes for various types of analyses and both continuous and binary outcomes. We explore the practical estimation of the coefficient of variation of cluster size required in this formula and demonstrate the formula's performance for a hypothetical but typical trial randomizing UK general practices.

Results: The simple formula provides a good estimate of sample size requirements for trials analysed using cluster-level analyses weighting by cluster size and a conservative estimate for other types of analyses. For trials randomizing UK general practices the coefficient of variation of cluster size depends on variation in practice list size, variation in incidence or prevalence of the medical condition under examination, and practice and patient recruitment strategies, and for many trials is expected to be ∼0.65. Individual-level analyses can be noticeably more efficient than some cluster-level analyses in this context.

Cluster randomized trials are trials in which groups or clusters of individuals, rather than individuals themselves, are randomized to intervention groups. This long-recognized trial design 1 has gained popularity in recent years with the advent of health services research where it is particularly appropriate for investigating organizational change or change in practitioner behaviour.

Many authors have described the statistical consequences of adopting a clustered trial design, 2–9 but most assume the same number of individual trial participants (cluster size) in each cluster 7, 8 or minimal variation in this number. 9 Researchers calculating sample sizes for cluster randomized trials also generally ignore variability in cluster size, largely because there has been no appropriate, easily usable sample size formula, in contrast to the simple formula for trials in which cluster sizes are identical. 3 In practice, however, cluster sizes often vary. For example, in a recent review of 153 published and 47 unpublished trials, the recruitment strategies in two-thirds of the trials led inevitably to unequal-sized clusters. 10 Imbalance in cluster size affects trial power.

The way in which cluster size imbalance affects trial power is intuitively obvious if we consider several trials with exactly the same number of clusters and exactly the same total number of trial participants. The most efficient design occurs when cluster sizes are all equal. If cluster sizes are slightly imbalanced then estimates from the smaller clusters will be less precise and estimates from the larger clusters more precise. There are, however, diminishing gains in precision from the addition of an extra individual as cluster sizes increase. This means that the addition of individuals to larger clusters does not compensate for the loss of precision in smaller clusters. Thus, as the cluster sizes become more unbalanced, power decreases.
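The diminishing-returns argument can be checked numerically. The sketch below is illustrative only: it assumes an exchangeable correlation model in which the mean of a cluster of size m has variance σ²[1 + (m − 1)ρ]/m, and it pools cluster means with inverse-variance weights, which may differ from the weighting used in any particular analysis. The function names, the ICC of 0.05 and the cluster sizes are hypothetical choices, not values taken from the paper.

```python
def cluster_precision(m, icc, sigma2=1.0):
    """Precision (1 / variance) of the mean of a single cluster of size m.

    Assumes Var(cluster mean) = sigma2 * (1 + (m - 1) * icc) / m.
    """
    return m / (sigma2 * (1.0 + (m - 1.0) * icc))


def variance_of_pooled_mean(sizes, icc, sigma2=1.0):
    """Variance of the inverse-variance-weighted mean across clusters."""
    return 1.0 / sum(cluster_precision(m, icc, sigma2) for m in sizes)


# Hypothetical designs: four clusters and 80 participants in every case.
icc = 0.05
designs = {
    "equal": [20, 20, 20, 20],
    "mildly unequal": [15, 25, 15, 25],
    "severely unequal": [5, 35, 5, 35],
}

for label, sizes in designs.items():
    variance = variance_of_pooled_mean(sizes, icc)
    print(f"{label:>16}: total n = {sum(sizes)}, variance = {variance:.5f}")
```

Under these assumptions the variance grows as the sizes become more unequal, even though the number of clusters and the total number of participants are fixed, because the precision of a cluster mean is a concave function of cluster size.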

Two recent studies report a simple formula for the sample size of a cluster randomized trial with variable cluster size and a continuous outcome, 11, 12 and a third study reports a similar result for binary outcomes. 13 Kerry and Bland 14 use a formula for binary and continuous outcomes, which requires more knowledge about individual cluster sizes in advance of a trial. These papers do not discuss the practical aspects of estimating the formulae in advance of a trial in detail. In addition, all the formulae strictly apply to analysis at the cluster level, whereas analysis options for cluster randomized trials have increased in recent years, including analysis at the level of the individual appropriately adjusted to account for clustering. In the present paper we (i) show how a simple formula can be used to judge the possible effect of variable cluster size for all types of analyses; (ii) explore the practical estimation of a key quantity required in this formula; and (iii) illustrate the performance of the formula in one particular context for individual-level and cluster-level analyses. We articulate our methods using cluster randomized trials from UK primary health care where these trials are particularly common. 10

Method

Judging the possible effect of variable cluster size

Full formulae for sample size requirements for cluster randomized trials are given elsewhere. 15 When all clusters are of equal size, m, sample size formulae for estimating the difference between means (or proportions) in intervention groups for a continuous (or binary) outcome differ from comparable formulae for individually randomized trials only by an inflation factor 1 + (m − 1)ρ, usually called the design effect (DE), 3 or the variance inflation ratio, because it is the ratio of the variance of an estimate in a cluster trial to the variance in an equivalently sized individually randomized trial. The intra-cluster correlation coefficient (ICC), ρ, is usually defined as the proportion of variance accounted for by between-cluster variation. 16
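As a worked illustration of the equal-cluster-size case, the following minimal sketch applies the inflation factor 1 + (m − 1)ρ to a sample size obtained for an individually randomized trial. The function names and the input values (200 participants per arm, clusters of 20, ICC of 0.05) are hypothetical and chosen only to show the arithmetic.

```python
import math


def design_effect(m, icc):
    """Design effect 1 + (m - 1) * icc for clusters of equal size m."""
    return 1.0 + (m - 1.0) * icc


def inflate_sample_size(n_individual, m, icc):
    """Inflate an individually randomized sample size for clustering.

    Returns the inflated number of participants per arm and the number of
    equal-sized clusters per arm, rounded up to a whole cluster.
    """
    n_clustered = n_individual * design_effect(m, icc)
    clusters = math.ceil(n_clustered / m)
    return n_clustered, clusters


# Hypothetical inputs, not taken from the paper.
n_clustered, clusters = inflate_sample_size(n_individual=200, m=20, icc=0.05)
print(f"design effect: {design_effect(20, 0.05):.2f}")
print(f"participants per arm: {n_clustered:.0f}, clusters per arm: {clusters}")
```

Rounding up to a whole number of clusters is a design choice made here for convenience; it means the achieved sample size is at least as large as the inflated requirement.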

More generally, the design effect is the factor by which the sample size required for an individually randomized trial must be multiplied to obtain the sample size required for a trial with a more complex design, such as a cluster randomized trial; it depends on both design and analysis. Here we assume unstratified designs and that clusters are assigned to each intervention group with equal probability. For continuous and binary outcomes, common appropriate analyses are: (i) cluster-level analyses weighting by cluster size, (ii) individual-level analyses using a random effect to represent variation between clusters, and (iii) individual-level marginal modelling (population averaged model) using generalized estimating equations (GEEs). 3 When cluster sizes vary, the usual design effect formulae for these analyses require knowledge of the actual cluster sizes in a trial, alongside the value of the ICC (Table 1). This information is often not known in advance of a trial.

Table 1

Design effects for analyses commonly undertaken in cluster randomized trials