diversity.evaluate.core {EvaluateCore}  R Documentation 
Description
Compute the following diversity indices and perform corresponding statistical tests to compare the phenotypic diversity for qualitative traits between entire collection (EC) and core set (CS).
Simpson's and related indices
Simpson's Index (\(d\)) (Simpson 1949; Peet 1974)
Simpson's Index of Diversity or Gini's Diversity Index or Gini-Simpson Index or Nei's Diversity Index or Nei's Variation Index (\(D\)) (Gini 1912, 1912; Greenberg 1956; Berger and Parker 1970; Nei 1973; Peet 1974)
Maximum Simpson's Index of Diversity or Maximum Nei's Diversity/Variation Index (\(D_{max}\)) (Hennink and Zeven 1990)
Simpson's Reciprocal Index or Hill's \(N_{2}\) (\(D_{R}\)) (Williams 1964; Hill 1973)
Relative Simpson's Index of Diversity or Relative Nei's Diversity/Variation Index (\(D'\)) (Hennink and Zeven 1990)
Shannon-Weaver and related indices
Shannon or Shannon-Weaver or Shannon-Wiener Diversity Index (\(H\)) (Shannon and Weaver 1949; Peet 1974)
Maximum Shannon-Weaver Diversity Index (\(H_{max}\)) (Hennink and Zeven 1990)
Relative Shannon-Weaver Diversity Index or Shannon Equitability Index (\(H'\)) (Hennink and Zeven 1990)
McIntosh Diversity Index
McIntosh Diversity Index (\(D_{Mc}\)) (McIntosh 1967; Peet 1974)
Usage
diversity.evaluate.core(data, names, qualitative, selected, base = 2, R = 1000)
Arguments
data  The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
names  Name of column with the individual names as a character string.
qualitative  Name of columns with the qualitative traits as a character vector.
selected  Character vector with the names of individuals selected in core collection and present in the
base  The logarithm base to be used for computation of Shannon-Weaver Diversity Index (\(H\)). Default is 2.
R  The number of bootstrap replicates. Default is 1000. 
Value
A list with three data frames as follows.
simpson 

shannon 

mcintosh 

Details
The diversity indices and the corresponding statistical tests implemented in diversity.evaluate.core are as follows.
Simpson's and related indices
Simpson's index (\(d\)), which estimates the probability that two randomly selected accessions will belong to the same phenotypic class of a trait, is computed as follows (Simpson 1949; Peet 1974).
\[d = \sum_{i=1}^{k}p_{i}^{2}\]
Where, \(p_{i}\) denotes the proportion/fraction/frequency of accessions in the \(i\)th phenotypic class for a trait and \(k\) is the number of phenotypic classes for the trait.
The value of \(d\) can range from 0 to 1 with 0 representing maximum diversity and 1, no diversity.
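As an illustration of the definition above, the following is a minimal sketch of Simpson's index for a single qualitative trait. The helper name `simpson_d` and the toy vector `trait` are hypothetical and are not part of EvaluateCore.

```r
# Simpson's index d = sum(p_i^2) for one qualitative trait.
# `simpson_d` is a hypothetical helper, not an EvaluateCore function.
simpson_d <- function(x) {
  x <- x[!is.na(x)]                       # ignore missing observations
  x <- droplevels(as.factor(x))           # keep only observed phenotypic classes
  p <- as.vector(table(x)) / length(x)    # p_i: proportion of accessions per class
  sum(p^2)
}

# Toy trait with three phenotypic classes (2, 2 and 4 accessions)
trait <- c("red", "red", "white", "white", "pink", "pink", "pink", "pink")

simpson_d(trait)           # 0.375 = (2/8)^2 + (2/8)^2 + (4/8)^2
simpson_d(rep("red", 8))   # 1: a single class, no diversity
```

Missing values are dropped before the proportions are computed; that choice is an assumption of this sketch rather than documented behaviour of the package.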
\(d\) is subtracted from 1 to give Simpson's index of diversity (\(D\)) (Greenberg 1956; Berger and Parker 1970; Peet 1974; Hennink and Zeven 1990), originally suggested by Gini (1912, 1912) and described in literature as Gini's diversity index or Gini-Simpson index. It is the same as Nei's diversity index or Nei's variation index (Nei 1973; Hennink and Zeven 1990). The greater the value of \(D\), the greater the diversity, with a range from 0 to 1.
\[D = 1 - d\]
The maximum value of \(D\), \(D_{max}\), occurs when accessions are uniformly distributed across the phenotypic classes and is computed as follows (Hennink and Zeven 1990).
\[D_{max} = 1 - \frac{1}{k}\]
The reciprocal of \(d\) gives Simpson's reciprocal index (\(D_{R}\)) (Williams 1964; Hennink and Zeven 1990), which can range from 1 to \(k\). This was also described in Hill (1973) as \(N_{2}\).
\[D_{R} = \frac{1}{d}\]
Relative Simpson's index of diversity or Relative Nei's diversity/variation index (\(D'\)) (Hennink and Zeven 1990) is defined as follows (Peet 1974).
\[D' = \frac{D}{D_{max}}\]
Differences in Simpson's diversity index for qualitative traits of EC and CS can be tested by a t-test using the associated variance estimate described in Simpson (1949) (Lyons and Hutcheson 1978).
The t statistic is computed as follows.
\[t = \frac{d_{EC} - d_{CS}}{\sqrt{V_{d_{EC}} + V_{d_{CS}}}}\]
Where, the variance of \(d\) (\(V_{d}\)) is,
\[V_{d} = \frac{4N(N-1)(N-2)\sum_{i=1}^{k}(p_{i})^{3} + 2N(N-1)\sum_{i=1}^{k}(p_{i})^{2} - 2N(N-1)(2N-3) \left( \sum_{i=1}^{k}(p_{i})^{2} \right)^{2}}{[N(N-1)]^{2}}\]
The associated degrees of freedom is computed as follows.
\[df = (k_{EC} - 1) + (k_{CS} - 1)\]
Where, \(k_{EC}\) and \(k_{CS}\) are the number of phenotypic classes in the trait for EC and CS respectively.
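The related indices and the t-test described in this subsection can be sketched in a few lines of R. The helper names `simpson_indices` and `simpson_t_test` are hypothetical, the toy vectors are made up, and the two-sided p-value is an assumption of this sketch; the package's own implementation may differ.

```r
# Hypothetical helpers (not EvaluateCore functions) for one qualitative trait.
simpson_indices <- function(x) {
  x <- x[!is.na(x)]
  n <- as.vector(table(droplevels(as.factor(x))))   # class counts n_i
  N <- sum(n)                                       # number of accessions
  k <- length(n)                                    # number of phenotypic classes
  p <- n / N
  d <- sum(p^2)
  D <- 1 - d
  D_max <- 1 - 1 / k
  # Variance of d, term by term as in the formula for V_d above
  V_d <- (4 * N * (N - 1) * (N - 2) * sum(p^3) +
          2 * N * (N - 1) * sum(p^2) -
          2 * N * (N - 1) * (2 * N - 3) * sum(p^2)^2) / (N * (N - 1))^2
  list(d = d, D = D, D_max = D_max, D_R = 1 / d, D_rel = D / D_max,
       V_d = V_d, k = k)
}

simpson_t_test <- function(x_ec, x_cs) {
  ec <- simpson_indices(x_ec)
  cs <- simpson_indices(x_cs)
  t_stat <- (ec$d - cs$d) / sqrt(ec$V_d + cs$V_d)
  df <- (ec$k - 1) + (cs$k - 1)
  # Two-sided p-value from the t distribution (an assumption of this sketch)
  c(t = t_stat, df = df, p.value = 2 * pt(-abs(t_stat), df = df))
}

# Toy data: entire collection (EC) and core set (CS) for one trait
ec_trait <- rep(c("a", "b", "c"), times = c(50, 30, 20))
cs_trait <- rep(c("a", "b", "c"), times = c(10, 10, 10))

simpson_indices(cs_trait)$D_rel   # 1 for a uniformly distributed trait
simpson_t_test(ec_trait, cs_trait)
```

Note that \(D'\) is undefined (0/0) for a trait with a single phenotypic class, since \(D_{max}\) is then 0.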
Shannon-Weaver and related indices
An index of information, \(H\), was described by Shannon and Weaver (1949) as follows.
\[H = -\sum_{i=1}^{k}p_{i} \log_{2}(p_{i})\]
\(H\) is described as Shannon or Shannon-Weaver or Shannon-Wiener diversity index in literature.
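A minimal sketch of this index for one qualitative trait is given below. The helper `shannon_H` is hypothetical (it is not an EvaluateCore function); its `base` argument is modelled on the `base` argument listed under Usage.

```r
# Shannon-Weaver index H = -sum(p_i * log(p_i)) for one qualitative trait.
# `shannon_H` is a hypothetical helper, not an EvaluateCore function.
shannon_H <- function(x, base = 2) {
  x <- x[!is.na(x)]
  p <- as.vector(table(droplevels(as.factor(x)))) / length(x)  # p_i per class
  -sum(p * log(p, base = base))
}

# Four equally frequent phenotypic classes
trait <- rep(c("a", "b", "c", "d"), each = 5)

shannon_H(trait)                  # 2, i.e. log2(4)
shannon_H(trait, base = exp(1))   # same trait with the natural logarithm
```

Only observed classes enter the sum, so the term \(0 \log(0)\) never arises.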
Alternatively, \(H\) is also computed using natural logarithm instead of logarithm to base 2.
\[H = -\sum_{i=1}^{k}p_{i} \ln(p_{i})\]
The maximum value of \(H\) (\(H_{max}\)) is \(\log_{2}(k)\) or \(\ln(k)\), according to the logarithm used. This value occurs when each phenotypic class for a trait has the same proportion of accessions.
\[H_{max} = \log_{2}(k)\;\; \textrm{OR} \;\; H_{max} = \ln(k)\]
The relative Shannon-Weaver diversity index or Shannon equitability index (\(H'\)) is the Shannon diversity index (\(H\)) divided by the maximum diversity (\(H_{max}\)).
\[H' = \frac{H}{H_{max}}\]
Differences in Shannon-Weaver diversity index for qualitative traits of EC and CS can be tested by Hutcheson t-test (Hutcheson 1970).
The Hutcheson t statistic is computed as follows.
\[t = \frac{H_{EC} - H_{CS}}{\sqrt{V_{H_{EC}} + V_{H_{CS}}}}\]
Where, the variance of \(H\) (\(V_{H}\)) is,
\[V_{H} = \frac{\sum_{i=1}^{k}n_{i}(\log_{2}{n_{i}})^{2} - \frac{(\sum_{i=1}^{k}n_{i}\log_{2}{n_{i}})^2}{N}}{N^{2}}\]
\[\textrm{OR}\]
\[V_{H} = \frac{\sum_{i=1}^{k}n_{i}(\ln{n_{i}})^{2} - \frac{(\sum_{i=1}^{k}n_{i}\ln{n_{i}})^2}{N}}{N^{2}}\]
The associated degrees of freedom is approximated as follows.
\[df = \frac{(V_{H_{EC}} + V_{H_{CS}})^{2}}{\frac{V_{H_{EC}}^{2}}{N_{EC}} + \frac{V_{H_{CS}}^{2}}{N_{CS}}}\]
McIntosh Diversity Index
A similar index of diversity was described by McIntosh (1967) as follows (\(D_{Mc}\)) (Peet 1974).
\[D_{Mc} = \frac{N - \sqrt{\sum_{i=1}^{k}n_{i}^2}}{N - \sqrt{N}}\]
Where, \(n_{i}\) denotes the number of accessions in the \(i\)th phenotypic class for a trait and \(N\) is the total number of accessions so that \(p_{i} = {n_{i}}/{N}\).
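With the class counts \(n_{i}\) and \(N\) now defined, the count-based quantities of the last two subsections (the Hutcheson t-test and the McIntosh index) can be sketched together. All helper names below are hypothetical and not part of EvaluateCore. The variance uses the count form in which the second sum runs over \(n_{i}\log n_{i}\), and the two-sided p-value is likewise an assumption of this sketch.

```r
# Hypothetical helpers (not EvaluateCore functions).
class_counts <- function(x) {
  x <- x[!is.na(x)]
  as.vector(table(droplevels(as.factor(x))))   # n_i for observed classes
}

# Shannon-Weaver H, its maximum, relative form and Hutcheson variance
shannon_stats <- function(x, base = 2) {
  n <- class_counts(x)
  N <- sum(n)
  p <- n / N
  H <- -sum(p * log(p, base))
  H_max <- log(length(n), base)
  # Assumed count form: second sum taken over n_i * log(n_i)
  V_H <- (sum(n * log(n, base)^2) - sum(n * log(n, base))^2 / N) / N^2
  list(H = H, H_max = H_max, H_rel = H / H_max, V_H = V_H, N = N)
}

hutcheson_t_test <- function(x_ec, x_cs, base = 2) {
  ec <- shannon_stats(x_ec, base)
  cs <- shannon_stats(x_cs, base)
  t_stat <- (ec$H - cs$H) / sqrt(ec$V_H + cs$V_H)
  df <- (ec$V_H + cs$V_H)^2 / (ec$V_H^2 / ec$N + cs$V_H^2 / cs$N)
  # Two-sided p-value (an assumption of this sketch)
  c(t = t_stat, df = df, p.value = 2 * pt(-abs(t_stat), df))
}

# McIntosh index from class counts
mcintosh_D <- function(x) {
  n <- class_counts(x)
  N <- sum(n)
  (N - sqrt(sum(n^2))) / (N - sqrt(N))
}

# Toy data: entire collection (EC) and core set (CS) for one trait
ec_trait <- rep(c("a", "b", "c"), times = c(50, 30, 20))
cs_trait <- rep(c("a", "b", "c"), times = c(12, 10, 8))

hutcheson_t_test(ec_trait, cs_trait)
mcintosh_D(ec_trait)
mcintosh_D(letters[1:9])   # 1: every accession in its own class
```

When both sets are perfectly uniform this variance is zero and the t statistic is undefined, so the sketch is only meaningful for traits with unequal class frequencies in at least one set.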
Testing for difference with bootstrapping
Bootstrap statistics are employed to test the difference between the Simpson, Shannon-Weaver and McIntosh indices for qualitative traits of EC and CS (Solow 1993).
If \(I_{EC}\) and \(I_{CS}\) are the diversity indices with the original number of accessions, then random samples of the same size as the original are repeatedly generated (with replacement) \(R\) times and the corresponding diversity index is computed for each sample.
\[I_{EC}^{*} = \lbrace I_{EC_{1}}, I_{EC_{2}}, \cdots, I_{EC_{R}} \rbrace\]
\[I_{CS}^{*} = \lbrace I_{CS_{1}}, I_{CS_{2}}, \cdots, I_{CS_{R}} \rbrace\]
Then the bootstrap null sample \(I_{0}\) is computed as follows.
\[\Delta^{*} = I_{EC}^{*} - I_{CS}^{*}\]
\[I_{0} = \Delta^{*} - \overline{\Delta^{*}}\]
Where, \(\overline{\Delta^{*}}\) is the mean of \(\Delta^{*}\).
Now the original difference in diversity indices (\(\Delta_{0} = I_{EC} - I_{CS}\)) is tested against the mean of the bootstrap null sample (\(I_{0}\)) by a z test. The z score test statistic is computed as follows.
\[z = \frac{\Delta_{0} - \overline{I_{0}}}{\sqrt{V_{I_{0}}}}\]
Where, \(\overline{I_{0}}\) and \(V_{I_{0}}\) are the mean and variance of the bootstrap null sample \(I_{0}\).
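The resampling steps above can be sketched with base R alone. The function `boot_diversity_test`, the helper `simpson_D` and the simulated data are hypothetical illustrations, not the package's implementation. The p-value shown is a two-sided value from the standard normal distribution, which is an assumption of this sketch.

```r
# Hypothetical helper (not an EvaluateCore function). `index_fun` is any
# function returning a single diversity index for a vector of trait values.
boot_diversity_test <- function(x_ec, x_cs, index_fun, R = 1000) {
  # R bootstrap replicates: resample each set with replacement at its own size
  I_ec_star <- replicate(R, index_fun(sample(x_ec, replace = TRUE)))
  I_cs_star <- replicate(R, index_fun(sample(x_cs, replace = TRUE)))

  delta_star <- I_ec_star - I_cs_star
  I_0 <- delta_star - mean(delta_star)            # bootstrap null sample

  delta_0 <- index_fun(x_ec) - index_fun(x_cs)    # observed difference
  z <- (delta_0 - mean(I_0)) / sqrt(var(I_0))
  # Two-sided normal p-value (an assumption of this sketch)
  c(delta = delta_0, z = z, p.value = 2 * pnorm(-abs(z)))
}

# Simpson's index of diversity D = 1 - sum(p_i^2), used as the example index
simpson_D <- function(x) 1 - sum((table(x) / length(x))^2)

set.seed(1)
ec_trait <- sample(c("a", "b", "c"), 200, replace = TRUE,
                   prob = c(0.6, 0.3, 0.1))
cs_trait <- sample(ec_trait, 40)                  # toy core set drawn from EC

boot_diversity_test(ec_trait, cs_trait, simpson_D, R = 500)
```

Because the null sample is centred on its own mean, \(\overline{I_{0}}\) is zero up to rounding, so \(z\) is effectively the observed difference scaled by the bootstrap standard deviation.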
The corresponding degrees of freedom is estimated as follows.
\[df = (k_{EC} - 1) + (k_{CS} - 1)\]
References
Berger WH, Parker FL (1970). “Diversity of planktonic foraminifera in deep-sea sediments.” Science, 168(3937), 1345–1347.
Gini C (1912). Variabilita e Mutabilita. Contributo allo Studio delle Distribuzioni e delle Relazioni Statistiche. [Fasc. I.]. Tipogr. di P. Cuppini, Bologna.
Gini C (1912). “Variabilita e mutabilita.” In Pizetti E, Salvemini T (eds.), Memorie di Metodologica Statistica. Liberia Eredi Virgilio Veschi, Roma, Italy.
Greenberg JH (1956). “The measurement of linguistic diversity.” Language, 32(1), 109.
Hennink S, Zeven AC (1990). “The interpretation of Nei and Shannon-Weaver within population variation indices.” Euphytica, 51(3), 235–240.
Hill MO (1973). “Diversity and evenness: A unifying notation and its consequences.” Ecology, 54(2), 427–432.
Hutcheson K (1970). “A test for comparing diversities based on the Shannon formula.” Journal of Theoretical Biology, 29(1), 151–154.
Lyons NI, Hutcheson K (1978). “C20. Comparing diversities: Gini's index.” Journal of Statistical Computation and Simulation, 8(1), 75–78.
McIntosh RP (1967). “An index of diversity and the relation of certain concepts to diversity.” Ecology, 48(3), 392–404.
Nei M (1973). “Analysis of gene diversity in subdivided populations.” Proceedings of the National Academy of Sciences, 70(12), 3321–3323.
Peet RK (1974). “The measurement of species diversity.” Annual Review of Ecology and Systematics, 5(1), 285–307.
Shannon CE, Weaver W (1949). The Mathematical Theory of Communication, number v. 2 in The Mathematical Theory of Communication. University of Illinois Press.
Simpson EH (1949). “Measurement of diversity.” Nature, 163(4148), 688–688.
Solow AR (1993). “A simple test for change in community structure.” The Journal of Animal Ecology, 62(1), 191.
Williams CB (1964). Patterns in the Balance of Nature and Related Problems in Quantitative Ecology. Academic Press.
See Also
shannon, diversity, boot
Examples
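library(EvaluateCore)

# Load the example core collection (CC) and entire collection (EC) data
data("cassava_CC")
data("cassava_EC")

# Move genotype names from row names into a column
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

# Names of the individuals selected in the core set
core <- rownames(cassava_CC)

# Quantitative and qualitative trait columns
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

# Convert qualitative data columns to factor
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))

diversity.evaluate.core(data = ec, names = "genotypes",
                        qualitative = qual, selected = core)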
[Package EvaluateCore version 0.1.3 Index]