Linking macrotrends and microrates: Re‐evaluating microevolutionary support for Cope's rule

Cope's rule, wherein a lineage increases in body size through time, was originally motivated by macroevolutionary patterns observed in the fossil record. More recently, some authors have argued that evidence exists for generally positive selection on individual body size in contemporary populations, providing a microevolutionary mechanism for Cope's rule. If larger body size confers individual fitness advantages as the selection estimates suggest, thereby explaining Cope's rule, then body size should increase over microevolutionary time scales. We test this corollary by assembling a large database of studies reporting changes in phenotypic body size through time in contemporary populations, as well as studies reporting average breeding values for body size through time. Trends in body size were quite variable with an absence of any general trend, and many populations trended toward smaller body sizes. Although selection estimates can be interpreted to support Cope's rule, our results suggest that actual rates of phenotypic change for body size cannot. We discuss potential reasons for this discrepancy and its implications for the understanding of Cope's rule.

Despite the above arguments, many organisms remain small, which suggests constraints or opposing selective forces (Blanckenhorn 2000;Purvis and Orme 2005;Kingsolver and Pfennig 2007). At the individual level, attaining larger size can require faster growth, which can lead to increased foraging risk and therefore higher mortality (Dibattista et al. 2007;Carlson et al. 2008). In addition, faster growth can lead to structural problems (Arendt 1997;Arendt and Wilson 1999) and reduced locomotory performance that can increase predation risk (Lankford et al. 2001). Furthermore, at the macroevolutionary level, there can be advantages to being smaller such as increased potential for adaptive evolution (Bromham et al. 1996;Dombroskie and Aarssen 2010). These reasons might explain why different studies have found either no change or a decrease in body size through time (Jablonski 1997;Alberdi et al. 1998;Knouft and Page 2003;Moen 2006;Churchill et al. 2014).
If Cope's rule is driven by individual large-size fitness benefits, the signatures of this mechanism should be evident on microevolutionary time scales. With this idea in mind, a few studies have tested the logical corollary that selection on body size should be generally positive in contemporary populations in nature (Kingsolver and Pfennig 2004;Kingsolver and Diamond 2011). These analyses reported that selection does tend to be, in general, directional for larger body size and stronger when compared to other types of traits. These results have been interpreted as supportive for the idea that individual, large-size fitness advantages could be a mechanism underlying the evidence for Cope's rule (Kingsolver and Pfennig 2004).
We suggest that a complementary and perhaps more direct test for Cope's rule would be to assess actual trait changes, instead of selection estimates, in contemporary populations. These trait changes might represent a microevolutionary pattern (response to selection). Such an analysis of trends in mean phenotype circumvents some limitations of selection estimates (see Discussion) and provides a more direct assessment. Specifically, if microevolutionary data support the idea that individual large-size advantages provide an explanation for Cope's rule, those data should generally show increases in body size in contemporary populations. A number of individual studies have reported data that could be used to test this expectation. For example, increasing body size has been reported for some contemporary populations of invertebrates (Huey et al. 2000;D'Amico et al. 2001). Conversely, evidence also exists that body size can decrease in relation to environmental perturbations such as climate change (Millien et al. 2006;Blois et al. 2008;Teplitsky and Millien 2014). However, general inferences require analyses across many populations, an endeavor now made possible by the assembly of a database of rates of phenotypic change in contemporary populations (Hendry and Kinnison 1999;Kinnison and Hendry 2001;Hendry et al. 2008).
We here use an updated version of this database to examine phenotypic trends that could be corollaries of Cope's rule, corollaries selected to be as similar as possible to those advanced based on previous analyses of selection estimates (Kingsolver and Pfennig 2004;Kingsolver and Diamond 2011). We first use the entire database to answer two questions: (1) Is body size generally increasing within populations? and (2) Are rates for body size change more positive (or less negative) than rates for other phenotypic traits? Given that body size changes could differ among taxonomic groups (Yom-Tov and Geffen 2011;Teplitsky and Millien 2014), sexes (Andersson 1994), or anthropogenic disturbances such as harvesting (Hendry et al. 2008;Darimont et al. 2009;Sharpe and Hendry 2009), we also ask (3) Does body size increase when accounting for structure in the database?
These analyses of the entire database include results for wild-caught individuals whose phenotypes can be influenced by both genetic and plastic effects (Rausher 1992;Mauricio and Mojonniner 1997;Stinchcombe et al. 2002). Thus, we finally ask: Is the genetically based component of body size generally increasing within populations? For this last question, analyses were based on a separate database of studies that used "animal model" methods to estimate temporal changes in mean breeding values for body size. This is important because the trait of interest, in our case body size, must be heritable as well as under selection, as dictated by the breeder's equation (Lush 1937). We recognize that our analyses focus on phenotypic changes rather than evolutionary changes, yet many of the existing micro- and macroevolutionary inferences about Cope's rule have been drawn from phenotypic data, and thus, our analyses are parallel to previous work emphasizing evolutionary changes.
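The breeder's equation cited above can be written out as a short worked example. The numerical values are hypothetical and chosen only for illustration; they are not taken from either database.

```latex
% Breeder's equation: response to selection R equals narrow-sense
% heritability h^2 times the selection differential S.
\[
R = h^{2} S
\]
% Hypothetical values: h^2 = 0.3 and S = 2 mm.
\[
R = 0.3 \times 2\,\text{mm} = 0.6\,\text{mm per generation}
\]
% If h^2 = 0, then R = 0 whatever the value of S, which is why
% positive selection alone does not guarantee an increase in body size.
```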

Materials and methods
We started from the published database of Hendry et al. (2008), who collated rates of phenotypic change from studies of contemporary populations: that is, over the last few hundred years. We then improved and modified the database in several ways. First, some minor errors were corrected, such as ensuring the timeframe for a given study system spanned at least one generation. Second, additional studies published up to 2012 were added as we discovered them. Third, we included only allochronic studies (data obtained from the same population at multiple times) and excluded synchronic studies because the latter cannot reveal the direction of change. Fourth, one author (MMT) used the Kingsolver and Diamond (2011) system to classify traits into different classes: body size, other morphology, physiological, phenology, and other life history. The database used in this study is archived at Dryad.
For body size, we followed previous analyses (Kingsolver and Pfennig 2004;Siepielski et al. 2009, 2013) in using only direct measurements, such as total length or mass, as opposed to morphological proxies, such as tarsus length in birds. Although trends for such proxies might be expected to be similar to those for body size, given their correlation with body size, our goal was to exactly parallel the approach used in selection analyses. However, we recognize that morphological traits are often used as proxies for body size, and we also re-ran analyses on a dataset that reclassified any "other morphological trait" that can scale with body size as "size." Because trait re-classification did not change our interpretation, we report these additional results for the first two questions in Resource S2. Data based on mass and volume, as opposed to a linear dimension, were cube-root transformed to allow for among-study comparisons (Amadon 1943;Uyeda et al. 2011).
For rates of phenotypic change, we calculated both Darwins, which quantify proportional change on an absolute time scale, and Haldanes, which quantify changes in standard deviation (SD) units on a generation time scale (reviewed in Gingerich 1993;Hendry and Kinnison 1999;Kinnison and Hendry 2001). Darwins were calculated as

$$\text{Darwins} = \frac{\ln(\bar{X}_2) - \ln(\bar{X}_1)}{\Delta t\ (10^6\ \text{years})},$$

where the difference between the natural logarithms of the mean trait values $\bar{X}_1$ and $\bar{X}_2$ is divided by the elapsed time $\Delta t$ in millions of years. Haldanes were calculated as

$$\text{Haldanes} = \frac{(\bar{X}_2 - \bar{X}_1)/SD_p}{g},$$

where the difference between the mean trait values $\bar{X}_1$ and $\bar{X}_2$, divided by the pooled SD of both populations $SD_p$, is divided by the number of elapsed generations ($g$). Both metrics were used because they have different properties and only one or the other can be calculated for some studies. In nearly all cases, we extracted data from the original papers, or obtained them from the authors, so as to calculate rates of change ourselves because rates reported in the literature are sometimes incorrect or the absolute values only are reported. Many studies in the database consisted of samples at only two different times, which were used for the rate calculations. For studies that were time series with measurements in multiple years, we calculated a linear regression from the time series data and used the endpoints of the best-fit regression line to obtain endpoints so as to provide a direct comparison with the studies having only two sampling times. The pooled SD to calculate Haldanes was calculated as the square root of the within mean square error from the linear regression. The number of time series systems is relatively small (N = 12), and future compilations of more time series would be useful as they can be used to assess nonlinear changes.
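The rate calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: all function names are hypothetical, the two-sample pooled SD formula is the standard textbook one (the text does not specify it), and the time-series helper fits the regression to whatever values it is given, which may differ from the data the original regressions were run on.

```python
import math

def cube_root(value):
    """Put mass or volume on a linear scale for among-study comparison."""
    return value ** (1.0 / 3.0)

def darwins(mean1, mean2, years):
    """Difference in ln(mean) divided by elapsed time in millions of years."""
    return (math.log(mean2) - math.log(mean1)) / (years / 1e6)

def pooled_sd(sd1, n1, sd2, n2):
    """Standard two-sample pooled SD (assumed; not specified in the text)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

def haldanes(mean1, mean2, sd_pooled, generations):
    """Change in pooled-SD units divided by elapsed generations."""
    return ((mean2 - mean1) / sd_pooled) / generations

def regression_endpoints(times, values):
    """Fit a least-squares line and return (fitted start, fitted end, residual SD).

    The residual SD is the square root of the residual mean square,
    used here in place of a two-sample pooled SD for time series.
    """
    n = len(times)
    t_bar = sum(times) / n
    v_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_bar - slope * t_bar
    rss = sum((v - (intercept + slope * t)) ** 2 for t, v in zip(times, values))
    resid_sd = math.sqrt(rss / (n - 2))
    return intercept + slope * min(times), intercept + slope * max(times), resid_sd

# Hypothetical two-sample example: mean length falls from 100 to 95 mm
# over 50 years (25 generations).
rate_d = darwins(100.0, 95.0, years=50)
rate_h = haldanes(100.0, 95.0, pooled_sd(8.0, 30, 7.0, 30), generations=25)
```

A negative value from either function indicates a decrease in body size; the Darwin and Haldane numerators analyzed separately in the paper are simply the same expressions before division by time or generations.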

STATISTICAL ANALYSES
Analyses for the first two questions were performed separately on each of four different metrics: Darwins, Darwin numerators, Haldanes, and Haldane numerators. The reason for using both rates and numerators is that phenotypic changes sometimes scale with time interval and sometimes do not (Kinnison and Hendry 2001;Westley 2011). The data did not meet assumptions of normality (Shapiro-Wilk test; 0.279 ≤ W ≤ 0.953; P < 0.001), and so nonparametric tests were performed to address the first question. The first two analyses we conducted were designed to be directly comparable to those used in Kingsolver and Pfennig's (2004) analysis of selection estimates.

Is body size generally increasing within populations?
We used a sign test to determine if change in body size was more commonly positive or negative. We also ran the analyses on subsets of the data divided by taxa (invertebrates, vertebrates, and plants) as well as "natural" versus human-perturbed situations. The latter specifically included climate change, fish ladder installation, introductions, and range expansion, as well as in situ anthropogenic disturbances including harvesting, landscape change, and pollution.
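As a minimal sketch of the sign test described above, the following computes an exact two-sided binomial P-value from the signs of the changes. The function name is hypothetical, and dropping zero changes is a common convention assumed here; the text does not say how ties with zero were handled.

```python
from math import comb

def sign_test(changes):
    """Exact two-sided sign test on the signs of the supplied changes.

    Returns (number positive, number negative, P-value). Zeros are
    dropped (assumed convention). Under the null hypothesis, positive
    and negative changes are equally likely.
    """
    pos = sum(1 for c in changes if c > 0)
    neg = sum(1 for c in changes if c < 0)
    n = pos + neg
    k = min(pos, neg)
    # Probability of a split at least this extreme in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2.0 * tail)

# Hypothetical rates of body size change for eight populations.
pos, neg, p_value = sign_test([-0.4, -0.1, 0.2, -0.3, -0.05, 0.0, -0.2, 0.1])
```

Because the test uses only the sign of each change, it makes no distributional assumption about the magnitudes, which suits data that fail normality tests.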
Are rates for body size change more positive (or less negative) than rates for other phenotypic traits?
Our first analysis was a one-tailed Wilcoxon rank-sum test to compare changes in body size to other phenotypic traits across the entire database. This analysis is akin to that performed on selection estimates by Kingsolver and Pfennig (2004) and was performed on the different classifications of phenotypic traits.
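The one-tailed rank-sum comparison can be illustrated with a small standard-library sketch. It uses the normal approximation without a continuity or tie correction, which is an assumption made for brevity; statistical packages typically offer exact and corrected variants. The function name and the example values are hypothetical.

```python
import math

def rank_sum_one_tailed(x, y):
    """One-tailed Wilcoxon rank-sum (Mann-Whitney) test of H1: x tends to exceed y.

    Returns (U statistic for x, approximate P-value). Ties receive
    average ranks; the variance is not tie-corrected (simplification).
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Upper-tail probability of the standard normal distribution.
    return u, 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical rates: body size versus another trait class.
size_rates = [-0.3, -0.1, 0.05, -0.2, 0.1]
other_rates = [-0.05, 0.2, 0.15, -0.1, 0.3, 0.25]
u_stat, p_one_tailed = rank_sum_one_tailed(size_rates, other_rates)
```

A small P-value here would indicate that body size rates tend to be more positive (or less negative) than rates for the comparison trait class.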
Does body size increase when accounting for structure in the database (taxa, disturbance, or sex)?
Given the heterogeneous nature of the dataset, we conducted a formal analysis based on a linear mixed-effect model framework (using the nlme package in R, Pinheiro et al. 2015). Plant and animal data were modeled separately (1) because plants and animals differ in growth patterns and selection regimes and (2) to avoid model overfitting because of a lack of data for the predictors "sex" and "disturbance" in plants. All models used square root transformed Darwin or Haldane numerators as the response variable, log-transformed "generations" as a covariate, and "study system" as the random structure. Some studies only reported the final rates, and not the generations, so these data were excluded for this analysis. The fixed-effect structure for the animal data model included "sex" (male, female, and both), "trait class" (physiology, phenology, other life history, other morphology, and size), "taxa" (vertebrates and invertebrates), and "disturbance" (disturbed and natural), whereas the fixed-effect structure for the plant model included only "trait class." Furthermore, to account for potential heteroscedasticity (i.e., unequal variances) in within-group errors, the mixed-effect models included specific variance functions (i.e., varFunc constructors in nlme; Pinheiro and Bates 2000) that were evaluated based on the Akaike Information Criterion (AIC) (i.e., lowest AIC indicates the best model; Burnham and Anderson 2002, Table S3). From these models we used the coefficients of fixed-effect predictors to assess relative strength and direction of evolutionary rates for the respective categories. Because our goal in these analyses was simply to assess relative differences in evolutionary rates for body size versus other predictor categories, while controlling for confounding factors, we did not include interactions.
Additional details regarding these analyses can be found in Resource 1.

Is genetically based body size increasing?
For this analysis, we focused on body size time series that presented mean breeding values, which are the additive effect of a genotype on a given trait (Lynch and Walsh 1998; Wilson et al. 2010). We reviewed the existing literature to identify studies that reported mean breeding values through time in natural populations. Breeding values were extracted from a figure in one study, whereas the others were provided by the original authors (see Acknowledgments). For each time series, we estimated linear regressions for mean breeding values through time. Although statistical analyses of breeding values have been criticized for failing to account for uncertainty (Hadfield et al. 2010;Wilson et al. 2010), this concern focuses on statistical confidence (downwardly biased errors) and not the slope estimates. Our conclusions were drawn with this point in mind.
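The per-population trend estimate can be sketched as an ordinary least-squares slope with its naive standard error. The function name and data are hypothetical. The standard error computed this way treats each mean breeding value as known without error, which is exactly the simplification criticized in the works cited above, so it should be read as a lower bound on the true uncertainty.

```python
import math

def breeding_value_trend(years, mean_breeding_values):
    """Least-squares slope of mean breeding value on year, with naive SE.

    Returns (slope, standard error, t statistic on n - 2 df). The SE
    ignores uncertainty in the breeding values themselves (simplification).
    """
    n = len(years)
    t_bar = sum(years) / n
    b_bar = sum(mean_breeding_values) / n
    sxx = sum((t - t_bar) ** 2 for t in years)
    sxy = sum((t - t_bar) * (b - b_bar)
              for t, b in zip(years, mean_breeding_values))
    slope = sxy / sxx
    intercept = b_bar - slope * t_bar
    rss = sum((b - (intercept + slope * t)) ** 2
              for t, b in zip(years, mean_breeding_values))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else float("inf")
    return slope, se, t_stat

# Hypothetical series of annual mean breeding values for body size.
slope, se, t_stat = breeding_value_trend(
    [2000, 2001, 2002, 2003, 2004], [0.0, 0.1, 0.15, 0.3, 0.35]
)
```

The sign of the slope gives the direction of the genetically based trend; the t statistic would be compared against a t distribution with n − 2 degrees of freedom.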

Results
The final database consisted of 1005 data points from 50 published studies representing 148 different species. We estimated 985 rates in Darwins (146 for body size) and 915 rates in Haldanes (70 for body size; Table 1). Some studies reported multiple populations, and we used the individual populations (N = 187) as our unit of replication for statistical inference.

IS BODY SIZE GENERALLY INCREASING WITHIN POPULATIONS?
Overall, body size changes through time were more often negative than positive and this was significant for Darwins (Fig. 1, Tables 1, S1, and S2). All taxonomic groups tended to show negative body size changes through time, with this change being significant for Darwins for vertebrates (Table 1). Both disturbed and natural populations also showed negative body size trends that were significant for Darwins (Table 1).

ARE RATES FOR BODY SIZE CHANGE MORE POSITIVE (OR LESS NEGATIVE) THAN RATES FOR OTHER PHENOTYPIC TRAITS?
Considering the entire database, changes in body size were not more positive (or less negative) overall than were those for other traits, except for other life-history traits in Darwins (Table 2, Fig. S1).

DOES BODY SIZE INCREASE WHEN ACCOUNTING FOR STRUCTURE IN THE DATABASE (TAXA, DISTURBANCE, OR SEX)?
Body size did not increase or decrease when accounting for taxa, disturbance, or sex in a linear mixed model and when correcting for potential heteroscedasticity (Fig. 2, Table S4). Although it appears that plants might be decreasing in size, only two data points contributed to this subset of data for both Darwins and Haldanes (Fig. 2).

IS GENETICALLY BASED BODY SIZE INCREASING?
Estimated trends for body size breeding values varied considerably among the 12 populations (Table S5), with only two populations showing a significant positive trend and one population showing a significant negative trend (Table S5, Fig. S2). Given that significance would be lower when accounting for uncertainty in the estimates (Hadfield et al. 2010;Wilson et al. 2010), we conclude that no convincing evidence exists for a general trend toward increasing genetically based body size.

Discussion
We are unable to report support for Cope's rule in the same manner as was possible for analyses of selection coefficients (Kingsolver and Pfennig 2004;Kingsolver and Diamond 2011). First, phenotypic body size is not generally increasing in contemporary populations (Fig. 1, Table 1). Second, trends are not more positive (or less negative) for body size than for other traits (Table 2, Fig. S1). Third, a mixed model analysis does not indicate that body size is increasing, even after accounting for structure in the database (i.e., sex, disturbance, and taxa, Fig. 2, Table S4). Fourth, time series of breeding values do not reveal a general tendency toward increasing genetically based body size (Fig. S2, Table S5). At face value, these results are not consistent with the earlier analyses of selection coefficients (Kingsolver and Pfennig 2004;Kingsolver and Diamond 2011). However, we note that many of the positive, directional selection estimates for body size are very weak, and many estimates were negative or very close to zero. We first consider potential reasons for the different outcomes of these two types of analyses (selection vs. phenotypic rates of change), and we then reconsider Cope's rule in general.
First, the selection and phenotypic change databases differ in the types of populations they include. The selection database excludes manipulated populations (Kingsolver et al. 2001), whereas the phenotypic change database does not. That is, the latter database includes introduced and harvested populations. Such disturbed populations, especially harvested ones, might be expected to experience particularly fast decreases in body size (Hendry et al. 2008;Darimont et al. 2009;Sharpe and Hendry 2009). However, even if we consider only undisturbed "natural" populations, our analyses do not find any evidence that body size is increasing (Tables 1, S1, and S2).
Second, selection estimates are often limited owing to small sample sizes, unmeasured confounding variables, spatiotemporal variation, and imperfect fitness surrogates (Kingsolver et al. 2001;Hereford et al. 2004;Hersch and (Kingsolver et al. 2001;Kingsolver and Pfennig 2004;Siepielski et al. 2009, 2013). Fourth, a fundamental disconnect can exist between selection and phenotypic change (Merilä et al. 2001;Haller and Hendry 2014) as a result of countergradient environmental changes (Larsson et al. 1998), environmental covariance between traits and fitness (Rausher 1992;Mauricio and Mojonniner 1997;Stinchcombe et al. 2002), and covariance between nonheritable traits and fitness (Price et al. 1988;Price and Liou 1989). For all of these reasons, and those we will add below, it is possible that estimates of phenotypic change are a better indicator of microevolutionary trends than are estimates of selection (Gotanda and Hendry 2014), although inferences based on phenotypic rates are not without their own caveats, which we also discuss below.
Given the above findings and assertions, it is appropriate to revisit the typical arguments summarized in the first paragraph of the introduction for why body size should be under positive selection. The more subtle reality is that a number of good reasons exist for why selection on body size should not be typically positive. In particular, selection estimates almost always use fitness components as opposed to total fitness, and positive selection acting through one component is expected to be often offset by negative selection acting through another component (Blanckenhorn 2000;Purvis and Orme 2005;Kingsolver and Pfennig 2007;Collar et al. 2011). Furthermore, larger body size can have a negative impact on several fitness components (see Introduction).
More generally, total selection on traits in well-adapted populations is expected to be stabilizing rather than directional, though the vast majority of estimates are close to zero and nonsignificant (Haller and Hendry 2014). We recognize that changing environmental conditions or high gene flow can impose directional selection, and that interpretation of analyses of selection estimate databases can vary, but no reason exists why such effects would generally favor larger body size.

Our analyses have their own set of caveats. First, we did not account for phylogenetic relationships due to the wide phylogenetic breadth of the species in the dataset. Second, although we did include sex in our full model, we did not focus specifically on sex-specific trends or any resulting changes in sexual dimorphism, although this would be an interesting avenue of future analysis. Third, our analyses were based on phenotypes, and so might not reflect genetic change. Traits that undergo evolutionary change must be both heritable and under selection. However, this caveat similarly applies to the previously analyzed phenotypic selection estimates and also to previous macroevolutionary analyses of Cope's rule. Size changes inferred from the fossil record could very well reflect genetic changes, but these data are very difficult to obtain, and all conclusions drawn have been based on phenotypic measurements. Our phenotypic perspective is therefore directly comparable to previous approaches. Lastly, our analysis of breeding values attempted to directly eliminate plastic effects of phenotypic change, and the results were consistent with our larger phenotype-based analyses. Phenotypic plasticity could have a genetic underpinning, which would suggest a genetic × environment (G × E) component to adaptive trait change (Scheiner 1993;Pigliucci 2001). It would be advantageous to obtain and analyze additional breeding value datasets to better separate genetic, plastic, and potentially G × E contributions.
How do we reconcile our lack of evidence for increasing body size on microevolutionary time scales with Cope's rule? One possibility is that the individual-level selection that leads to increased body size on macroevolutionary time scales is episodic, occurring only at specific time points. If so, these rare events would not often be captured on the relatively short time scales of microevolutionary studies (Gingerich 2001;Uyeda et al. 2011). For example, studies of an island population of silvereyes (Zosterops lateralis chlorocephalus) show that historically body size increased dramatically over a few hundred generations whereas directional selection on body size is currently absent (Clegg et al. 2008).
Figure caption (partial): ... subcategories of each categorical predictor from the respective linear mixed-effect models. Note that these coefficients are estimated relative to the first subcategory of each respective predictor (i.e., for trait class: physiology; for sex: both; for disturbance: disturbed; for taxa: invertebrates). Sample sizes are reported in Tables S1 and S2.
Alternatively, we might need to look beyond classic microevolutionary processes to explain Cope's rule. One such explanation is higher level selection (Fowler and MacMahon 1982;Brown and Maurer 1986). Specifically, species sorting in the broad sense can affect speciation and extinction rates at the species level, resulting in phenotypic differences among clades (macroevolutionary). For example, size increases in marine animals have been attributed to diversification among classes, not size increases within a given lineage (Heim et al. 2015). However, even this higher level selection can still be interpreted as resulting from organismal-level (microevolutionary) processes, such as individual-level fitness advantages for larger body size (Jablonski 2008).
In conclusion, we found that phenotypic rates of change do not match previous assertions of generally positive directional selection on body size (Kingsolver and Pfennig 2004;Kingsolver and Diamond 2011), nor do they provide microevolutionary support for Cope's rule. We suspect that these different outcomes reflect a fundamental disconnect between selection estimates and phenotypic change, and that well-adapted populations are more likely to be under stabilizing selection for body size than directional selection. We also suggest that, because of inherent differences in micro- and macroevolutionary time scales and selection at different levels (e.g., individual vs. populations vs. species), further attempts to seek a mechanistic explanation for Cope's rule on microevolutionary time scales by focusing only on phenotypes might not be the most profitable endeavor. Instead, we suggest that future studies should focus on untangling the phenotypic, plastic, and G × E contributions that would provide more conclusive microevolutionary support for Cope's rule.

ACKNOWLEDGMENTS
KMG was funded in the form of a Vanier Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council of Canada (NSERC), APH was funded by an NSERC Discovery Grant, and CC was funded by a Comisión Nacional de Investigación Científica y Tecnológica scholarship through the government of Chile. We thank A. Charmantier, D. Garant, A. Husby, C. Teplitsky, and A. Vøllestad for generously sharing their breeding values data with us. We also thank the associate editor and three anonymous reviewers for their constructive comments on a previous version of the manuscript.

DATA ARCHIVING
The doi for our data is 10.5061/dryad.22c9s.

Supporting Information
Additional Supporting Information may be found in the online version of this article at the publisher's website:
Resource 1: Linear mixed-effect model evaluation.
Resource 2: Supplemental tables with results after re-classifying other morphological traits that scale with body size as "size."
Table S1: Summary statistics for Darwins and Darwin numerators for all populations, and subset by taxa, disturbance, and sex.
Table S2: Summary statistics for Haldanes and Haldane numerators for all populations, and subset by taxa, disturbance, and sex.
Table S3: Summary of the variance function classes to determine the best-fit model to account for the stratified nature of the database.
Table S4: Summary results from linear mixed-effect models to see if body size changes are significant utilizing best-fit variance models to account for potential heteroscedasticity in the dataset.
Table S5: Results from linear regressions run on each population as a function of average breeding values and time.
Table S6: Summary statistics for Darwins and Darwin numerators for all populations, and subset by taxa and disturbance with body size trait classification defined as any "other morphological trait" that can scale allometrically with body size.
Table S7: Summary statistics for Haldanes and Haldane numerators for all populations, and subset by taxa and disturbance with body size trait classification defined as any "other morphological trait" that can scale allometrically with body size.
Table S8: Pairwise Wilcoxon signed-rank test results for size versus a different phenotypic trait (one-sided) to see if rates of evolution for body size were higher than other traits.
Figure S1: Density histograms for Darwins, Haldanes, and their numerators for the five different trait classes: body size, other morphology, phenology, other life history, and physiology.
Figure S2: Linear regressions of average breeding values for body size through time for 12 different populations of vertebrates.