Estimating mean rarity • MeanRarity

The MeanRarity package contains functions to compute, estimate, and visualize Hill numbers, using the framework that Hill numbers simply compute the abundance-weighted [generalized] mean species rarity of a community (rarity is the reciprocal of relative abundance). Additionally, the package contains functions for generating and sampling abundance vectors and generating plots. This vignette is not a comprehensive guide to the package. However, it includes data and code to replicate the quantitative main-text figures presented in Roswell, Dushoff, and Winfree (2021). Thus, the examples in the vignette are ordered according to the figures in the article.

The goal of [Hill] diversity metrics is to summarize the distribution of species’ [relative] abundances. The first figure in the “Conceptual guide” is a visualization of the observed bee abundances from the four communities in the example dataset beeObs, analyzed in the guide. The package also includes a summary of those data, beeAbunds. The function radplot() takes those data and returns a rank-abundance distribution plot. We love the new R package patchwork for assembling plots, and call that package here.


# to make RAD plot for communities comparable, use one maxab and one maxrich for all plots
maxab <- max(sapply(beeAbunds[2:5], max)) # the maximum observed species abundance in any community

maxrich <- max(sapply(beeAbunds[2:5], function(x){sum(x > 0)})) # maximum observed richness


myplots <- purrr::map(2:5, function(makedatind){
    radplot(beeAbunds[,makedatind], maxab = maxab, maxrich = maxrich
            , shape = 13 + makedatind # solid filled shapes
            , size = 1.5
            , fill = c("#1b9e77", "#d95f02", "#7570b3", "#e7298a")[makedatind - 1]) +
    ggplot2::labs(title = ggplot2::element_blank()) +
    ggplot2::theme(text = ggplot2::element_text(size = 14)
                   , axis.text = ggplot2::element_text(colour = "black"))
})


myplots[[1]] + # first community
  ggplot2::labs(x = "") +# syling
white_y(myplots[[3]]) + # second community 
  ggplot2::labs(x = "") + # styling
myplots[[2]] +  # third community
white_y(myplots[[4]]) # fourth community

The next figure in the guide shows that the expected value for both observed and estimated asymptotic richness increase with sample size. The code to generate expected diversity estimates is computationally expensive and we recommend not to run the code in this vignette. It is documented in the package in the file "data-raw/DATASET.R", and the result is an exported dataset mean_ests. This script uses the function subsam(), which samples without replacement, and the function obs_est(), which computes Hill diversity estimates, including asymptotic estimates (Chao and Jost 2015) using functions modified from the R package SpadeR


# this takes a long time to run so we're going to stash it but export the data it creates

sz<-1:745 #sample sizes from 1 to maximum observed total abundance at a site
nrep<-200 #Large enough because estimating means

#tictoc::tic() #time this
nc <- parallel::detectCores() - 1 #nc is number of cores to use in parallel processing
future::plan(strategy = future::multiprocess, workers = nc) #this sets up the cluster


div_ests <- purrr::map_dfr(1:nrep, function(k){
 furrr::future_map_dfr(sz, function(x){
    map(2:length(beeAbunds), function(sitio){
      f<-beeAbunds %>% dplyr::select(all_of(sitio))
      if(sum(f) > x){
        comm = dplyr::pull(f) %>% subsam(size = x)
        return(data.frame(obs_est(comm)
                     , site = names(f)))
      }
    })
  })
})

#tictoc::toc() #507 seconds with quad-core i7 processor, not bad.
mean_narm <- function(x){mean(x, na.rm = T)}
mean_ests <- div_ests %>%
  dplyr::group_by(site, individuals) %>%
  dplyr::summarize_all(mean_narm)


site_ab <- purrr::map_dfr(names(beeAbunds[,2:5]), function(site){
  data.frame(site, abun = sum(beeAbunds[,site]))
})

mean_ests %>%
  dplyr::left_join(site_ab) %>% 
  dplyr::mutate(cutrows = individuals > 0.8 * abun) %>%
  #truncate the dataset because literal bootstrapping gets funny and  the Chao1
  #levels off near observed abundance, didn't want to discuss this in guide
  #since it's obvious but not interesting and possibly very confusing.
  dplyr::filter(!cutrows) %>% 
  ggplot2::ggplot(aes(individuals, obsrich, color = site))+
  ggplot2::geom_point(alpha = 0.3, size = 0.5) +
  ggplot2::geom_point(aes(y = chaorich, color = site, shape = site)
             , size=0.8, alpha=0.5) +
  ggplot2::theme_classic() +
  ggplot2::labs(x = "\nIndividuals", y = "Species richness\n") +
  ggplot2::xlim(c(0, 400)) +
  ggplot2::ylim(c(0, 80)) +
  ggplot2::theme(text = ggplot2::element_text(size=14)
        , axis.text = ggplot2::element_text(colour = "black")) +
  ggplot2::theme(legend.position = "none") +
  ggplot2::scale_shape_manual(values = c(15:18)) +
  ggplot2::scale_color_brewer(palette = "Dark2")
#> Joining with `by = join_by(site)`
#> Warning: Removed 257 rows containing missing values or values outside the scale range
#> (`geom_point()`).
#> Removed 257 rows containing missing values or values outside the scale range
#> (`geom_point()`).

Figure 3 in the guide shows an empirical Hill diversity profile. The MeanRarity package has a function to produce such profiles, divpro(). This lightweight function does not produce asymptotic estimates. Because the guide focused on the mean rarity formulation for Hill diversities, parameterizing the scaling exponent with \(\ell\), the profile is plotted from \(\ell = -1\) to \(\ell =1\), whereas other software packages such as iNEXT and SpadeR plot the profile from \(q = 1-\ell =0\) to \(q = 1-\ell =2\).


site_pro <- purrr::map_dfr(names(beeAbunds[2:5]), function(x){
  data.frame(site = x, divpro(beeAbunds[, x]))}) #make profile for each community

site_pro %>%  ggplot2::ggplot(aes(ell, d, color = site)) +
    ggplot2::geom_line() +
    ggplot2::geom_point(data = site_pro %>% 
                          dplyr::filter(ell %in% c(-1,1,0))
               , aes(ell, d, color = site, shape = site)
               , size = 3, alpha = 0.5) +
    ggplot2::theme_classic() +
    ggplot2::theme(text = ggplot2::element_text(size = 14)
          , axis.text = ggplot2::element_text(colour = "black")
          , legend.position = "none") +
    #this is the unicode for the curly l
    ggplot2::labs(x = "Exponent \U2113", y = "Diversity") +
    ggplot2::scale_shape_manual(values = c(15:18)) +
    ggplot2::scale_color_brewer(palette = "Dark2")

A central idea in the guide is that researcher decisions can drive the results of biodiversity comparisons: how much to sample, how to standardize data, which diversity metric to use, and how to express uncertainty. Figure 4 illustrates how some of these choices could alter interpretation of the example dataset.

The code below does not highlight any of the novel tools in the MeanRarity package. However, it does use Chao_Hill_abu(), a copy of an internal function from the R package SpadeR. To produce fig 4 in the MS, we used the R package iNEXT, which is not imported by MeanRarity. It also uses our simplified version of the code for computing approximate CI from iNEXT, an internal function MeanRarity:::Bootstrap.CI_Df(). This computes the same intervals as iNEXT. Although elsewhere in the Oikos forum piece, we critique these intervals, we included them in this figure to demonstrate how the choice of tools would or would not drive results.

This code chunk will likely take almost a minute to run.


#tictoc::tic() #time this

iN_form <- as.list(beeAbunds[, -1])
by_size <- iNEXT::estimateD(iN_form
                                , base = "size")

by_coverage <- iNEXT::estimateD(iN_form
                                , base = "coverage")

by_effort <- purrr::map_dfr(lapply(iN_form, sum), function(size){
    iNEXT::estimateD(iN_form
                    , base = "size"
                    , level = size) %>% 
    dplyr::filter(Method == "Observed") 

})
    
iseveral <- dplyr::bind_rows(size = by_size
                             , coverage = by_coverage
                             , effort = by_effort
                             , .id = "standard") %>%
  dplyr::mutate(order = 1 - Order.q)
                   
#tictoc::toc()  #43 sec this time!?!?!

#tictoc::tic()
asyseveral <- purrr::map_dfr(2:5, function(sitio){
  purrr::map_dfr(-1:1, function(l){
    theboots <- MeanRarity:::Bootstrap.CI_df(pull(beeAbunds[, sitio]), l)
    est <- Chao_Hill_abu(dplyr::pull(beeAbunds[, sitio]), l)
    data.frame(Assemblage = names(beeAbunds[, sitio])
               , order = l
               , qD = est
               , qD.LCL = est - theboots[2]
               , qD.UCL = est + theboots[4])
  })
})
#tictoc::toc() # <2 seconds.

asyseveral$Method <- "asymptotic"
asyseveral$standard <- "asymptotic"
allseveral <- dplyr::bind_rows(iseveral, asyseveral)
#diversities, standardized by effort

#get some parameters of data
#individuals in sample with fewest
minm <- min(by_effort %>% 
              dplyr::filter(m == min(m)) %>% 
              dplyr::pull(m)) 

minc <- min(by_effort %>% 
              dplyr::group_by(Assemblage) %>% 
              dplyr::summarize(SC = max(SC)) %>% 
              dplyr::pull(SC)) # coverage of sample with lowest


tog <- allseveral %>% 
  dplyr::mutate(sm = paste(standard, Assemblage, sep=""))

inds <- data.frame("order" = c(1, 0, -1), divind = factor(c("richness"
                                                           , "Hill-Shannon"
                                                           , "Hill-Simpson")
                                                 , levels = c("richness"
                                                              , "Hill-Shannon"
                                                              , "Hill-Simpson")))

togn <- tog %>% 
  dplyr::left_join(inds) %>%  
  dplyr::ungroup()
#> Joining with `by = join_by(order)`

togn <- togn %>%     
  dplyr::mutate(divind = factor(divind
                         , levels = c("richness"
                                      , "Hill-Shannon"
                                      , "Hill-Simpson"))
          , site = factor(Assemblage
                          , levels = c("IAS_3"
                                     , "Cold Soil_5"
                                     , "Lord Stirling_4"
                                     , "Fox Hill_5")))
togn <- togn %>% 
  dplyr::mutate(methlab = factor(standard
                          , levels = c("effort"
                                       , "size"
                                       , "coverage"
                                       , "asymptotic")
                           , labels = c("effort (14 transects)"
                                        , "size (255 individuals)"
                                        , "coverage (95.7%)"
                                        , "asymptotic")))
ebsize<-0.6
ebw<-0.4
psize<-2.5

############################
# make figures, which get stuck together.
leftPlot <- togn %>% 
    dplyr::filter(methlab != "asymptotic") %>% 
    ggplot2::ggplot(aes(site, qD, color = site, shape = site)) +
    ggplot2::geom_errorbar(aes(ymin = qD.LCL, ymax = qD.UCL)
                  , linewidth = ebsize, width = ebw) +
    ggplot2::geom_point(size = psize) +
    ggplot2::facet_grid(divind ~ methlab, switch = "y", scales = "free_y") +
    ggplot2::theme_classic() +
    ggplot2::theme(legend.position = "none", axis.text.x = ggplot2::element_blank()
          , axis.ticks.x = ggplot2::element_blank()) +
    ggplot2::labs(y = NULL) +
    ggplot2::theme(panel.spacing = unit(0, "lines")
          , panel.border = ggplot2::element_rect(fill = NA)
          , strip.background = ggplot2::element_blank()
          , strip.text = ggplot2::element_text(face = "bold", size = 10)
          , axis.title.x = ggplot2::element_blank()
          , legend.position = "none"
          , strip.placement = "outside") +
    ggplot2::scale_y_continuous(limits = c(0,NA)
                       , expand = ggplot2::expansion(mult = c(0, .05))) +
    ggplot2::scale_shape_manual(values = c(17, 15, 18, 16)) +
    ggplot2::scale_color_manual(values = c("#7570b3"
                                , "#1b9e77"
                                , "#e7298a"
                                , "#d95f02")) 
rightPlot <- togn %>% 
  dplyr::filter(methlab == "asymptotic") %>% 
  ggplot2::ggplot(ggplot2::aes(site, qD, color = site, shape = site)) +
  ggplot2::geom_errorbar(ggplot2::aes(ymin = qD.LCL, ymax = qD.UCL)
                         , linewidth = ebsize, width = ebw) +
  ggplot2::geom_point(size = psize) +
  ggplot2::facet_grid(divind~methlab, switch = "y", scales = "free_y") +
  ggplot2::theme_classic() +
  ggplot2::theme(legend.position = "none"
                 , axis.text.x = ggplot2::element_blank()
                 , axis.ticks.x = ggplot2::element_blank()) +
  ggplot2::labs(y = NULL, x = NULL) +
  ggplot2::theme(panel.spacing = unit(0, "lines")
                 , panel.border = ggplot2::element_rect(fill = NA)
                 , strip.background = ggplot2::element_blank()
                 , strip.text.x = ggplot2::element_text(face = "bold"
                                                        , size = 10)
                 , strip.text.y = ggplot2::element_blank()
                 , axis.title = ggplot2::element_blank()
                 , legend.position = "none"
                 , strip.placement = "outside") +
  ggplot2::scale_y_continuous(limits = c(0, NA)
                     , expand = ggplot2::expansion(mult = c(0, 0.05))) +
  ggplot2::scale_shape_manual(values = c(17, 15, 18, 16)) +
  ggplot2::scale_color_manual(values = c("#7570b3"
                              , "#1b9e77"
                              , "#e7298a"
                              , "#d95f02" ))

# throwing errors 20230420
#leftPlot + rightPlot + plot_layout(widths = c(3, 1))

A key idea is that the amount of sampling itself (and not simply standardization) itself affects relative biodiversity estimates. We demonstrated this in the MS with Figure 5. Although the MeanRarity package exports sample_finite(), sample_finite(), and subcom(), all convenience functions for sampling community data, to produce Fig. 5 no tools from MeanRarity were used.

# this takes a long time to run so we're going to stash it but export the data
# in the package

# the idea here is to sample from 14 30-minute sampling bouts from each site and 
# compare coverage. While we resample 30-minute transects, we compute the
# estimated sample coverage and count individuals in each subsample. We use this
# to compare standardization methods. 

maxeff <- 14
reps <- 9999
#tictoc::tic() #time this
nc <- parallel::detectCores() - 1 #nc is  cores to use in parallel processing
future::plan(strategy = future::multiprocess, workers = nc) # sets up cluster

show_ests <- furrr::future_map_dfr(1:reps, function(x){
  purrr::map_dfr(1:maxeff, function(i){
    sam <- beeObs %>%
      dplyr::group_by(sr) %>%
      dplyr::do(dplyr::filter(., start %in%
                         sample(unique(.$start)
                                , size = maxeff - i + 1
                                , replace = F)))
    subcom <- sam %>% dplyr::group_by(bee, sr) %>%
      dplyr::summarize(abund = n()) %>%
      tidyr::spread(key = sr
                    , value = abund
                    , fill = 0) %>%
      as.data.frame()
    basicdat <- purrr::map_dfr(apply(subcom[, -1], MARGIN = 2, obs_est), rbind)
    return(data.frame(basicdat
                      , site = names(subcom[, -1])
                      , transects = maxeff - i + 1))
  })
})

#tictoc::toc() #Under 2 hours with only 4 cores

effort_means <- show_ests %>%
  dplyr::group_by(site, transects) %>%
  dplyr::summarize_at(.vars = c("n"
                              , "coverage"
                              , "obsrich"
                              , "chaorich"
                              , "obsshan"
                              , "chaoshan"
                              , "obssimp"
                              , "chaosimp"), .funs = mean)


#gather to make a prettier plot, will need to get fancy now that there's asy also
effort_rare <- effort_means %>% dplyr::rename(
  "richness_observed" = obsrich
  , "richness_asymptotic" = chaorich
  , "Hill-Shannon_observed" = obsshan
  , "Hill-Shannon_asymptotic" = chaoshan
  , "Hill-Simpson_observed" = obssimp
  , "Hill-Simpson_asymptotic" = chaosimp
  , size = n
  , effort = transects) %>%
  tidyr::gather(key = "ell"
                , value = "diversity"
                , "richness_observed"
                , "richness_asymptotic"
                , "Hill-Shannon_observed"
                , "Hill-Shannon_asymptotic"
                , "Hill-Simpson_observed"
                , "Hill-Simpson_asymptotic") %>%
  tidyr::gather(key = method
                , value =xax
                , effort
                , size
                , coverage) %>%
  tidyr::separate(ell, c("divind", "etype"), sep = "_")

effort_rare$divind <- factor(effort_rare$divind
                          , levels = c("richness"
                                       , "Hill-Shannon"
                                       , "Hill-Simpson")
                          , labels = c("richness"
                                     , "Hill-Shannon"
                                     , "Hill-Simpson"))
effort_rare$method <- factor(effort_rare$method
                          , levels = c("effort"
                                     , "size"
                                     , "coverage"))
effort_rare$etype <- factor(effort_rare$etype
                         , levels = c("observed"
                                      , "asymptotic"))


effort_rare %>% 
  dplyr::filter(etype == "observed") %>% 
  ggplot2::ggplot(aes(xax
             , diversity
             , color = site
             , shape = site)) +
  ggplot2::geom_point(size = 2) +
  ggplot2::geom_line() +
  ggplot2::theme_classic() +
  ggplot2::scale_alpha(range=c(0.4, 0.8)) +
  ggplot2::theme( legend.position = "none"
         , axis.text = ggplot2::element_text(colour = "black")
         , axis.title.y = ggplot2::element_text(size = 14)
         , panel.spacing = unit(1.3, "lines")
         , strip.placement.y = "outside") +
  ggplot2::labs(x = "    transects                            individuals                             % coverage    "
       , y = "diversity") +
  ggplot2::facet_grid(divind ~ method, scales = "free", switch = "y") +
  ggplot2::expand_limits(y = 0) +
  ggplot2::scale_shape_manual(values = c(15:18)) +
  ggplot2::scale_color_brewer(palette = "Dark2") + 
  ggplot2::scale_y_log10()
#> Warning in ggplot2::scale_y_log10(): log-10 transformation
#> introduced infinite values.

We examine whether the same sensitivities apply to asymptotic diversity estimates, which present an alternative to the data standardization approaches, in Appendix C. In fact, within the range of sampling completeness and unevennesses in our dataset, the asymptotic diversity estimators seem as unreliable as sample diversity in demonstrating relative differences in diversity between communities.

effort_rare %>% 
  ggplot2::ggplot(ggplot2::aes(xax, diversity, color = site, shape = site)) +
  ggplot2::geom_line(ggplot2::aes(alpha = etype), size = 1) +
  ggplot2::theme_classic() +
  ggplot2::scale_alpha(range = c(0.4, 0.8)) +
  ggplot2::theme( legend.position = "none"
         , axis.text = ggplot2::element_text(colour = "black")
         , axis.title.y = ggplot2::element_text(size = 14)
         , panel.spacing = unit(0.3, "lines")
         , strip.placement.y = "outside") +
  # shoudl fix the spacing between x-axis labels
  ggplot2::labs(x = "    transects                                     individuals                                     % coverage    "
       , y = "diversity") +
  ggplot2::facet_grid(divind ~ method + etype, scales = "free", switch = "y") +
  ggplot2::expand_limits(y = 0) +
  ggplot2::scale_color_brewer(palette = "Dark2") + 
  ggplot2::scale_y_log10() +
  ggplot2::scale_alpha_manual(values = c(1, 0.6))
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> Scale for alpha is already present.
#> Adding another scale for alpha, which will replace the existing scale.
#> Warning in ggplot2::scale_y_log10(): log-10 transformation
#> introduced infinite values.

In Box 5, we visualize the mean rarity by finding a physical balance point. We hope this helps clarify how rarity is scaled by different means, but always weighted by the abundance of individuals. The function rarity_plot() generates these figures. They can also be generated interactively in the shiny app . The key thing to notice is that while the scaling of rarity changes with the exponent \(\ell\), the weighting of each species stays the same. This is the simplest way to see how Hill diversities compute mean rarity from relative abundances, and we find the mean rarity framework simpler than imagining that the scaling parameter somehow adjusts species weighting.

The balance plots are fairly complex, and use several exported functions from the MeanRarity package. They rely, of course, on Hill diversity computations, with the function rarity(), which in turn calls pfun() and ipfun() to compute power transformations and their inverse. Additionally, the plots are built with ggplot, and the curious reader may wish to explore the code (and even suggest improvements!).


# plot some rarity "balance plots" for guide to measuring diversity (Roswell et al. 2021 http://onlinelibrary.wiley.com/doi/10.1111/oik.07202/abstract)
ab <- c(20,8,5,4,2,1)

rich_bal <- rarity_plot(ab, 1, base_size = 12) +
  ggplot2::labs(x = "") +
  scale_color_brewer(type = "qual", palette = "Dark2") 
#> 
#>      `rarity_plot()` expects a square viewport (likely issues in the RStudio
#>      plotting device) and resizes points based on `min(dev.size()` and ``noco`
#>      (for number of columns).
#> 
#>      Selecting `lines = TRUE` will plot stacks of individuals as a line
#>      element, which tends to be more robust to window size.
#> 
#>      Setting `lines = TRUE` may be the best way to deal with overplotting,
#>      which results from several species with similar but not identical
#>      rarities.
#> [1] "diversity = 6"            "community size = 40"     
#> [3] "max observed rarity = 40" "min observed rarity = 2"
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.

#do not return all the text this time
shan_bal <- white_y(rarity_plot(ab, 0, base_size = 12, verbose = F)) + 
  scale_color_brewer(type = "qual", palette = "Dark2")
#> 
#>      `rarity_plot()` expects a square viewport (likely issues in the RStudio
#>      plotting device) and resizes points based on `min(dev.size()` and ``noco`
#>      (for number of columns).
#> 
#>      Selecting `lines = TRUE` will plot stacks of individuals as a line
#>      element, which tends to be more robust to window size.
#> 
#>      Setting `lines = TRUE` may be the best way to deal with overplotting,
#>      which results from several species with similar but not identical
#>      rarities.
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.

simp_bal <- white_y(rarity_plot(ab, -1, base_size = 12, verbose = F)) + 
  ggplot2::labs(x = "") +
  scale_color_brewer(type = "qual", palette = "Dark2")
#> 
#>      `rarity_plot()` expects a square viewport (likely issues in the RStudio
#>      plotting device) and resizes points based on `min(dev.size()` and ``noco`
#>      (for number of columns).
#> 
#>      Selecting `lines = TRUE` will plot stacks of individuals as a line
#>      element, which tends to be more robust to window size.
#> 
#>      Setting `lines = TRUE` may be the best way to deal with overplotting,
#>      which results from several species with similar but not identical
#>      rarities.
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.

#assemble triptych with patchwork
rich_bal +
  shan_bal +
  simp_bal

In the appendices of the guide we explored confidence interval performance for diversity estimates via simulation. The code we used to do this requires substantial computational resources and we conducted much of it on a university high performance computing cluster. Thus, we did not include the code and data in this R package (it can be found at (GITHUB LINK), and do not recommend running on a home/office computer with the sample sizes we used. However, to explore CI performance, we designed a way to simulate species abundance distributions (SADs) directly based on their Hill diversity, demonstrated below with the function fit_SAD(). In the github repository, you may see how we took the simulated abundance distributions, took random samples (with replacement), and for the random samples we computed the estimated confidence intervals (CI) given by Chao and Jost (2015). We assessed what proportion of empirical CI contained the target value: the expected sample diversity or the true (simulated) diversity of the SAD.

First we generate SADs with the function fit_SAD(). fit_SAD() uses a target Simpson diversity and distributional family to set the relative abundance for a fixed number of species, where species relative abundances are given by evenly-spaced quantiles of the continuous [gamma or log-normal] distribution. The function uses three parameters: richness, evenness, and the parametric distribution family. Richness can be any integer >0. To parameterize evenness, we used , which is equal to 1 with a perfectly even species-abundance distribution, and 0 when Hill-Simpson diversity reaches its minimum, 1, for any richness value (Chao, Ricotta, and Aldo 2019). Hill-Simpson diversity is a positive number between 1 and richness. We used the log-normal distribution because it fits observed sample species abundance distributions well (Enquist et al. 2019; Baldridge et al. 2016; Thomas J. Matthews et al. 2019). We used the gamma distribution because it fits observed species abundance distributions as well (Thomas J. Matthews and Whittaker 2014; Thomas J. Matthews et al. 2019), but makes different assumptions about rare species. Generating many SADs with different diversities and distributional assumptions provides scope to test proposed confidence intervals (Béguinot 2018).


# simulate SADs for analysis
SADs_list<-map(c("lnorm", "gamma"), function(distr){
  map(c(100, 200), function(rich){
    map(c(0.05, .15,.25,.5,.75,.85), function(simp_Prop){ #simp_Prop is evenness
      fit_SAD(rich = rich, simpson = simp_Prop*(rich-1)+1, distr = distr)
    })
  })
})
#> Warning in fit_SAD(rich = rich, simpson = simp_Prop * (rich - 1) + 1, distr = distr): Rarest species has relative abundance  7.18739193244666e-47 . 
#> Consider implied community size.
#> Warning in fit_SAD(rich = rich, simpson = simp_Prop * (rich - 1) + 1, distr = distr): Rarest species has relative abundance  1.20738866916925e-15 . 
#> Consider implied community size.
#> Warning in fit_SAD(rich = rich, simpson = simp_Prop * (rich - 1) + 1, distr = distr): Rarest species has relative abundance  1.7925645196612e-09 . 
#> Consider implied community size.
#> Warning in fit_SAD(rich = rich, simpson = simp_Prop * (rich - 1) + 1, distr = distr): Rarest species has relative abundance  1.99403602229896e-53 . 
#> Consider implied community size.
#> Warning in fit_SAD(rich = rich, simpson = simp_Prop * (rich - 1) + 1, distr = distr): Rarest species has relative abundance  1.36083992426459e-17 . 
#> Consider implied community size.
#> Warning in fit_SAD(rich = rich, simpson = simp_Prop * (rich - 1) + 1, distr = distr): Rarest species has relative abundance  1.21155246166886e-10 . 
#> Consider implied community size.

The function fit_SAD() provides useful way to simulate species abundance distributions. Unlike many methods to simulate SADs that rely on finite communities, fit_SAD provides only relative abundances. This is particularly desirable for testing the statistical behavior of Hill diversities, as these also consider only relative abundances. Another feature of fit_SAD() is that the true diversity (richness and Hill-Simpson diversity), along with a parametric distributional assumption, are the inputs to the simulation, and not downstream effects of other, ancillary parameters.

fit_SAD(), like other methods, does require a sort of discrete approximation to continuous distributions. However, rather than rounding, this is simply achieved by taking the probability density of the selected distribution at evenly-spaced intervals as proportional to the relative abundances of each species.

To generate finite data for our research, samples from the simulated SADs are taken directly, without first simulating a finite pool or “community,” in contrast to other commonly used methods to simulate SADs (May et al. 2018). Here’s a quick visualization of the SADs we just generated (this is not a figure in the MS). Rather than showing all 24, we’ll look at a sample of 6: The most and least even ones, and one of the intermediate ones as well, for each of the two distributional families.

# function to remove subscripts; apparently this way of making a labeller is now
# deprecated
remsub <- function(variable, value){
  return(gsub("_"," ",value))}

#function to summarize frequencies
asab<-function(namevec){as.numeric(table(namevec))}

# make rank abundance distributions
myabs<-map_dfr(flatten(flatten(SADs_list)) #, function(x) data.frame(names(x)))
               , function(x){data.frame(ab=x$rel_abundances)}
               , .id="SAD")


myabs %>% dplyr::left_join(data.frame(
  SAD=as.character(1:24)
  , skew=factor(c("uneven", "int","int", "int","int", "even"), levels=c("uneven", "int", "even"))
  , dist=factor(c(rep("lognormal", 12), rep("gamma", 12)), levels=c("lognormal", "gamma"))
)) %>%
  dplyr::mutate(abD=paste(dist, skew)) %>%
  dplyr::group_by(SAD, abD, dist, skew) %>%
  dplyr::mutate(abrank = dplyr::min_rank(desc(ab))
                , log_relative_abundance = log(ab)
                , relative_abundance = ab) %>%
  #for plotting, gather the two scales to use `facet_grid`
  tidyr::gather(scl, rel_abund, relative_abundance, log_relative_abundance )%>%
  # grab the extreme SADs and one intermediate for each distributional
  # assumption, just the 200 spp versions (they look the same though)
  dplyr::filter(SAD %in% c("7","10","12","19","22","24")) %>%
  # plot the RADs to show difference between gamma and lognormal, more extreme
  # with skew
  ggplot2::ggplot(aes(abrank, rel_abund, color=dist))+
    ggplot2::geom_point(alpha=0.3, size=1)+
    ggplot2::geom_line(size=.4, alpha=0.3)+
    ggplot2::theme_classic()+
    # theme(text=element_text(size=16))+
    ggplot2::labs(x="abundance rank", y="", color="", shape="")+
      ggplot2::facet_grid(forcats::fct_rev(scl)~skew, scales="free", switch="y", labeller=remsub)
#> Joining with `by = join_by(SAD)`
#> Warning: The `labeller` API has been updated. Labellers taking `variable` and `value`
#> arguments are now deprecated.
#> ℹ See labellers documentation.

In the supplementary materials for (Roswell, Dushoff, and Winfree 2021), we demonstrate that the CI for diversity estimates based on the approach articulated in (Chao and Jost 2015) tend to be anti-conservative. The code and data for this are not exported by this package; the scripts to generate all relevant data and figures are available at [github link].

References

Baldridge, Elita, David J. Harris, Xiao Xiao, and Ethan P. White. 2016. “An extensive comparison of species-abundance distribution models.” PeerJ 4: e2823. https://doi.org/10.7717/peerj.2823.

Béguinot, Jean. 2018. “How to Extrapolate Species Abundance Distributions with Minimum Bias When Dealing with Incomplete Species Inventories.” Advances in Research 13 (4): 1–24. https://doi.org/10.9734/air/2018/39002.

Chao, Anne, and Lou Jost. 2015. “Estimating diversity and entropy profiles via discovery rates of new species.” Methods in Ecology and Evolution 6 (8): 873–82. https://doi.org/10.1111/2041-21.

Chao, Anne, Carlo Ricotta, and Piazzale Aldo. 2019. “Quantifying evenness and linking it to diversity, beta diversity, and similarity.” Ecology, no. 0: 1–15. https://doi.org/10.1002/ecy.2852.

Enquist, Brian J., Xiao Feng, Brad Boyle, Brian Maitner, Erica A. Newman, Peter Møller Jørgensen, Patrick R. Roehrdanz, et al. 2019. “The commonness of rarity: Global and future distribution of rarity across land plants.” Science Advances 5 (11): 1–14. https://doi.org/10.1126/sciadv.aaz0414.

Matthews, Thomas J., Jon P. Sadler, Yasuhiro Kubota, Christopher W. Woodall, and Thomas A. M. Pugh. 2019. “Systematic variation in North American tree species abundance distributions along macroecological climatic gradients.” Global Ecology and Biogeography 28 (5): 601–11. https://doi.org/10.1111/geb.12879.

Matthews, Thomas J, and Robert J Whittaker. 2014. “Fitting and comparing competing models of the species abundance distribution: assessment and prospect.” Frontiers of Biogeography 6 (2). https://doi.org/10.21425/f5fbg20607.

May, Felix, Katharina Gerstner, Daniel J. McGlinn, Xiao Xiao, and Jonathan M. Chase. 2018. “mobsim: An R package for the simulation and measurement of biodiversity across spatial scales.” Methods in Ecology and Evolution 9 (6): 1401–8. https://doi.org/10.1111/2041-210X.12986.

Roswell, Michael, Jonathan Dushoff, and Rachael Winfree. 2021. “A conceptual guide to measuring species diversity.” Oikos 130 (3): 321–38. https://doi.org/10.1111/oik.07202.