---
title: "Introduction to toolStability"
author: 'Tien-Cheng Wang'
date: "`r Sys.Date()`"
output:
  pdf_document:
    fig_caption: no
    toc: no
  html_document:
    df_print: paged
    toc: yes
header-includes:
- \usepackage{fancyhdr}
- \usepackage{wrapfig}
- \pagestyle{fancy}
- \fancyhead[LE,RO]{\slshape \rightmark}
- \fancyfoot[C]{\thepage}
- \usepackage{xcolor}
- \usepackage{hyperref}
- \hypersetup{colorlinks=true}
- \hypersetup{linktoc=all}
- \hypersetup{linkcolor=blue}
- \usepackage{pdflscape}
- \usepackage{booktabs}
- \newcommand{\blandscape}{\begin{landscape}}
- \newcommand{\elandscape}{\end{landscape}}
- \fancyhead[LO,RE]{The \texttt{toolStability} Package{:} A Brief Introduction}
vignette: >
  %\VignetteIndexEntry{Introduction to toolStability}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
bibliography: REFERENCES.bib
link-citations: yes
csl: https://tinyurl.com/apa6-meta-analysis
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

\tableofcontents
\clearpage
\begin{wrapfigure}{r}{0.35\textwidth}
  \vspace{-20pt}
  \begin{center}
    \includegraphics[width=0.25\textwidth]{`r system.file("extdata", "toolStability.png", package = "toolStability")`}
  \end{center}
    \vspace{-20pt}
\end{wrapfigure}

## Overview


The package `toolStability` accompanies the publication by @wang2023. It is a collection of functions that implement eleven methods for describing the `stability` of a `trait` in terms of `genotype` and `environment`. 


The goal of this vignette is to introduce users to these functions and to help them get started with analyzing data sets that contain interactions between `genotype` and `environment`. 

Further analyses for the original publication that use this package can be found [here](https://github.com/Illustratien/Wang_2023_TAAG).

## Installation
The package can be installed using the following functions:

```{r, eval=FALSE}
# Install from CRAN
install.packages('toolStability', dependencies=TRUE)

# Install development version from Github
devtools::install_github("Illustratien/toolStability")

```

The package can then be loaded with:
```{r}
library(toolStability)
```


## Welcome to toolStability
This is an R package for calculating parametric, non-parametric, and probabilistic stability indices.

## Structure overview of toolStability

toolStability contains different functions to calculate stability indices, including:

1.  adjusted coefficient of variation
2.  coefficient of determination
3.  coefficient of regression
4.  deviation mean squares
5.  ecovalence
6.  environmental variance
7.  genotypic stability
8.  genotypic superiority measure
9.  safety first index
10. stability variance
11. variance of rank
\clearpage

## Built-in data set

The default data set `Data` is a subset of the APSIM simulated wheat data set, which includes 5 genotypes in 4 locations for 4 years, with 2 nitrogen application rates, 2 sowing dates, and 2 $CO_2$ levels as treatments [@casadebaig2016]. The full data set used in the publication is available [here](https://zenodo.org/record/4729636).

`Data` in this package is a data frame with 640 observations and 8 variables.

: Data Structure

| Parameters | Number | Description |
| :--- | ---: | :--- |
| Trait |   | Wheat yield ($kg~ha^{-1}$).| 
| Genotype | 5 | varieties.|
| Environment | 128 | unique combination of environments for each genotype.|
| Year | 4 | years.|
| Sites | 4 | locations.|
| Nitrogen | 2 | nitrogen application levels.|
| $CO_2$ | 2 | $CO_2$ concentration levels.|
| Sowing | 2 | sowing dates.|

```{r plot, fig.cap = 'Example of environmental effects on wheat yield.', echo = FALSE, fig.height=4, fig.width=8, message=FALSE}

ggplot2::ggplot(Data,ggplot2::aes(x=Sites,y=Yield,col=Genotype))+
  ggplot2::geom_boxplot()+
  ggplot2::facet_grid(Sowing~Nitrogen,labeller =ggplot2::label_both)+
  ggplot2::ylab(bquote('Wheat yield (kg' %.%'ha'^'-1'*')'))+
   ggplot2::theme_test()

```
\clearpage

## Tutorial

1. Data preparation 

To calculate a stability index, prepare a data frame with 3 columns containing `trait`, `genotype`, and `environment`. <br /><br />

* `trait`:       numeric and continuous, trait value to be analyzed.
* `genotype`:    character or factor, labeling different genotypic varieties.
* `environment`: character or factor, labeling different environments.
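
The following is a minimal sketch of such a data frame, using made-up values. The column names `Yield`, `Genotype`, `Site`, `Year` and `Environment` are illustrative assumptions, and base R's `interaction()` is used here only as one possible way to build a single environment label from several environmental factors.

```r
# Toy data: two genotypes observed at two sites in two years (hypothetical values)
toy <- data.frame(
  Yield    = c(4.1, 5.3, 3.8, 6.0, 4.4, 5.1, 4.0, 5.6),
  Genotype = rep(c("G1", "G2"), each = 4),
  Site     = rep(c("S1", "S2"), times = 4),
  Year     = rep(rep(c("2001", "2002"), each = 2), times = 2)
)

# Combine the environmental factors into one environment label
toy$Environment <- interaction(toy$Site, toy$Year, sep = "_", drop = TRUE)

# Check the three required columns
stopifnot(is.numeric(toy$Yield),
          is.character(toy$Genotype) || is.factor(toy$Genotype),
          is.factor(toy$Environment))

# Each genotype should be observed in every environment
table(toy$Genotype, toy$Environment)
```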

2.  Input format of the functions

Most of the functions in the package work with the following format: 
```r
function(data = Data,
  trait = "Trait_Column_Name",
  genotype = "Genotype_Column_Name",
  environment = "Environment_Column_Name")
```

To calculate the probabilistic stability index `safety_first_index`, an additional parameter `lambda` is required.<br /><br />

`lambda`:      minimal acceptable value of `trait` that the user expects from the crop across `environment`. `lambda` should lie within the range of the `trait` values.<br /><br />

Under the assumption that `trait` is normally distributed, the safety first index is calculated as the probability that the trait falls below `lambda` across the `environment` for each `genotype`.<br /><br />


3. Function Features

The function `table_stability` generates a summary table containing all the stability indices in the package for every genotype, along with the mean `trait` value and the normality check results for the `trait` of each `genotype` across all the `environment`. <br /><br />

Users can select the combination of environments of interest by passing a vector of the column names that contain the environmental factors. The option `normalize = TRUE` allows users to compare different stability indices. The option `unit.correct = TRUE` returns the square root of those stability indices that have the squared unit of `trait`. For the function `ecovalence`, the option `modify = TRUE` takes the number of environments into account and makes the `modified ecovalence` comparable between data sets with different numbers of environments.

## Examples
```{r}
rm(list=ls())
library(toolStability)
### load data
data("Data")
### check the structure of sample dataset
### be sure that the trait is numeric!!!


dplyr::glimpse(Data)

### calculate ecovalence for all genotypes
single.index.ecovalence <- ecovalence(data = Data,
                                      trait = 'Yield',
                                      genotype = 'Genotype',
                                      environment = 'Environment',
                                      unit.correct = FALSE,
                                      modify = FALSE)
### check the structure of result
dplyr::glimpse(single.index.ecovalence)

### calculate modified ecovalence for all genotypes
single.index.ecovalence.modified <- ecovalence(data = Data,
                                      trait = 'Yield',
                                      genotype = 'Genotype',
                                      environment = 'Environment',
                                      unit.correct = FALSE,
                                      modify = TRUE)
### check the structure of result
dplyr::glimpse(single.index.ecovalence.modified)
```
\clearpage
```{r}
### calculate all stability indices for all genotypes
summary.table <- table_stability(data = Data,
                                 trait = 'Yield',
                                 genotype = 'Genotype',
                                 environment = 'Environment',
                                 lambda = median(Data$Yield),
                                 normalize = FALSE,
                                 unit.correct = FALSE)
#### a warning message here indicates that the trait is not normally distributed

#### check the structure of result
dplyr::glimpse(summary.table)

### calculate all normalized stability indices for all genotypes
normalized.summary.table <- table_stability(data = Data,
                                            trait = 'Yield',
                                            genotype = 'Genotype',
                                            environment = 'Environment',
                                            lambda = median(Data$Yield),
                                            normalize = TRUE,
                                            unit.correct = FALSE)
#### a warning message here indicates that the trait is not normally distributed

#### check the structure of result
dplyr::glimpse(normalized.summary.table)

### compare the result from summary.table and normalized.summary.table


### calculate the stability indices based only on CO2 and Nitrogen environments
summary.table2 <- table_stability(data = Data,
                                  trait = 'Yield',
                                  genotype = 'Genotype',
                                  environment = c('CO2','Nitrogen'),
                                  lambda = median(Data$Yield),
                                  normalize = FALSE,
                                  unit.correct = FALSE)

#### check the structure of result
dplyr::glimpse(summary.table2)

### compare the result from summary.table and summary.table2
### see how the choice of environments affects the results

```

\newpage

## Equation of stability indices
Let $X_{ij}$ represent the observed mean value of genotype $i$ in environment $j$, and let $\bar{X_{i.}}$, $\bar{X_{.j}}$ and $\bar{X_{..}}$ denote the marginal mean of genotype $i$, the marginal mean of environment $j$ and the overall mean, respectively.

### adjusted coefficient of variation

The adjusted coefficient of variation [@doering2018] is calculated from a regression function.
A variety with a low adjusted coefficient of variation is considered stable.
Under the linear model

$$v_{i} = a + b~m_{i}$$
where $v_{i}$ is the $log_{10}$ of the phenotypic variance and $m_{i}$ is the $log_{10}$ of the phenotypic mean, the adjusted coefficient of variation is expressed as:
$$\widetilde{c_{i}} = \frac{1}{\widetilde{\mu_{i}}}  \left [
10^{(2-b)~ m_{i} +(b-2)~ \bar{m}+v_{i}} 
\right ] ^{0.5} \times 100 \% $$

\newpage

### coefficient of determination
The coefficient of determination [@pinthus1973] is calculated from a regression function.
A variety with a low coefficient of determination is considered stable.
Under the linear model
$$Y_{ij} =\mu + \beta_{i}~ e_{j} + g_{i} + d_{ij}$$
where $Y_{ij}$ is the observed mean value of the $i^{th}$ genotype in the $j^{th}$ environment, and $g_{i}$, $e_{j}$ and $\mu$ denote the
genotypic mean, environmental mean and overall population mean, respectively.

The effect of GE-interaction may be expressed as:
$$(ge)_{ij} = \beta_{i}~ e_{j} + d_{ij}$$
where $\beta_{i}$ is coefficient of regression and $d_{ij}$ is deviation from regression.


For $s^{2}_{di}$ and $S_{xi}^{2}$, see the sections on deviation mean squares and environmental variance for details.


The coefficient of determination may be expressed as:
$$r_{i}^{2} = 1 - \frac{s_{di}^{2}}{S_{xi}^{2}} $$

where $X_{ij}$ is the observed phenotypic mean value of genotype $i$ (i = 1,..., G)
in environment $j$ (j = 1,...,E), with $\bar{X_{i.}}$ and  $\bar{X_{.j}}$ 
denoting marginal means of genotype $i$ and environment $j$, respectively. 
$\bar{X_{..}}$ denotes the overall mean of $X$.
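
As a hedged illustration, the base R sketch below evaluates this ratio on a small made-up table of genotype-by-environment means. The object names (`X`, `env_index`, `ge`, `s2_d`, `S2_x`) are assumptions for this example and are not part of the package; $s^{2}_{di}$ is computed with the divisor $E-2$ and $S_{xi}^{2}$ with the divisor $E-1$.

```r
# Toy genotype-by-environment table of mean yields (hypothetical values)
X <- matrix(c(4.0, 5.0, 6.5, 7.0,
              3.5, 5.5, 6.0, 8.0,
              4.5, 4.8, 5.5, 6.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("G1", "G2", "G3"), c("E1", "E2", "E3", "E4")))
E <- ncol(X)

env_index <- colMeans(X) - mean(X)                       # Xbar_.j - Xbar_..
ge <- sweep(sweep(X, 1, rowMeans(X)), 2, env_index)      # interaction terms
b  <- 1 + as.vector(ge %*% env_index) / sum(env_index^2) # regression coefficients

s2_d <- (rowSums(ge^2) - (b - 1)^2 * sum(env_index^2)) / (E - 2)  # deviation mean squares
S2_x <- apply(X, 1, var)                                          # environmental variance
r2   <- 1 - s2_d / S2_x
r2
```

With these two divisors, the ratio coincides with the adjusted $R^{2}$ that `summary(lm())` reports when each genotype is regressed on the environment means.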

\newpage

### coefficient of regression
The coefficient of regression [@finlay1963] is calculated from a regression function.
A variety with a low coefficient of regression is considered stable.
Under the linear model
$$Y_{ij} =\mu + \beta_{i}~ e_{j} + g_{i} + d_{ij}$$
where $Y_{ij}$ is the observed mean value of the $i^{th}$ genotype in the $j^{th}$ environment, and $g_{i}$, $e_{j}$ and $\mu$ denote the
genotypic mean, environmental mean and overall population mean, respectively.

The effect of GE-interaction may be expressed as:
$$(ge)_{ij} = \beta_{i}~ e_{j} + d_{ij}$$
where $\beta_{i}$ is coefficient of regression and $d_{ij}$ is deviation from regression.

Coefficient of regression may be expressed as:
$$ b_{i}=1 + \frac{\sum_{j}(X_{ij} -\bar{X_{i.}}-\bar{X_{.j}}+\bar{X_{..}}) ~
(\bar{X_{.j}}- \bar{X_{..}})}{\sum_{j}(\bar{X_{.j}}-\bar{X_{..}})^{2}}$$

where $X_{ij}$ is the observed phenotypic mean value of genotype $i$ (i = 1,..., G)
in environment $j$ (j = 1,...,E), with $\bar{X_{i.}}$ and $\bar{X_{.j}}$ 
denoting marginal means of genotype $i$ and environment $j$, respectively. 
$\bar{X_{..}}$ denotes the overall mean of $X$. $b_{i}$ is the estimator of $\beta_{i}$.
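
The expression for $b_{i}$ can be sketched in base R as follows. The data are made up and the object names are assumptions for this example only. The last line recomputes the slopes with an ordinary regression of each genotype on the environment means, which should give the same values.

```r
# Toy genotype-by-environment table of mean yields (hypothetical values)
X <- matrix(c(4.0, 5.0, 6.5, 7.0,
              3.5, 5.5, 6.0, 8.0,
              4.5, 4.8, 5.5, 6.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("G1", "G2", "G3"), c("E1", "E2", "E3", "E4")))

env_index <- colMeans(X) - mean(X)   # Xbar_.j - Xbar_..
# Interaction terms: X_ij - Xbar_i. - Xbar_.j + Xbar_..
ge <- sweep(sweep(X, 1, rowMeans(X)), 2, env_index)

# Coefficient of regression for each genotype
b <- 1 + as.vector(ge %*% env_index) / sum(env_index^2)
names(b) <- rownames(X)

# Cross-check: slope of each genotype regressed on the environment means
b_lm <- apply(X, 1, function(x) unname(coef(lm(x ~ colMeans(X)))[2]))
```

In a complete table, the $b_{i}$ values average to 1 across genotypes, so values above or below 1 indicate a stronger or weaker response to the environment than the average genotype.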

\newpage

### deviation mean squares
Deviation mean squares [@eberhart1966] is calculated from a regression function.
A variety with low deviation mean squares is considered stable.


Deviation mean squares may be expressed as:
$$
s^{2}_{di} = \frac{1}{E-2} \left [
\sum_{j}~(X_{ij} - \bar{X_{i.}}- \bar{X_{.j}} + \bar{X_{..}})^{2} - (b_{i} - 1)^{2} ~
\sum_{j}~(\bar{X_{.j}}- \bar{X_{..}})^{2} \right ]
$$

where $X_{ij}$ is the observed phenotypic mean value of genotype $i$ (i = 1,..., G)
in environment $j$ (j = 1,...,E), with $\bar{X_{i.}}$ and $\bar{X_{.j}}$ 
denoting marginal means of genotype $i$ and environment $j$, respectively. 
$\bar{X_{..}}$ denotes the overall mean of $X$. $b_{i}$ is the estimate of the coefficient of regression.
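
A minimal base R sketch of this calculation is given below, using made-up data and illustrative object names. It takes the sum of squared interaction terms, removes the part explained by the regression on the environment means, and divides by $E-2$. The final line computes the residual mean square of an ordinary regression for comparison, which should agree.

```r
# Toy genotype-by-environment table of mean yields (hypothetical values)
X <- matrix(c(4.0, 5.0, 6.5, 7.0,
              3.5, 5.5, 6.0, 8.0,
              4.5, 4.8, 5.5, 6.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("G1", "G2", "G3"), c("E1", "E2", "E3", "E4")))
E <- ncol(X)

env_index <- colMeans(X) - mean(X)                       # Xbar_.j - Xbar_..
ge <- sweep(sweep(X, 1, rowMeans(X)), 2, env_index)      # interaction terms
b  <- 1 + as.vector(ge %*% env_index) / sum(env_index^2) # regression coefficients

# Deviation mean squares for each genotype
s2_d <- (rowSums(ge^2) - (b - 1)^2 * sum(env_index^2)) / (E - 2)

# Cross-check: residual mean square of each genotype regressed on the environment means
s2_lm <- apply(X, 1, function(x) deviance(lm(x ~ colMeans(X))) / (E - 2))
```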

\newpage

### ecovalence
Ecovalence [@wricke1962] is calculated by squaring the genotype–environment
interaction and summing it over all environments.
A variety with low ecovalence is considered stable.
Ecovalence is expressed as:
$$W_{i} = \sum_{j}~(X_{ij} - \bar{X_{i.}} - \bar{X_{.j}} + \bar{X_{..}})^{2}$$
To make $W_{i}$ comparable between experiments, we also provide the modified ecovalence ($W_{i}'$), which takes the number of environments into account. Users can obtain $W_{i}'$ by setting `modify = TRUE`.

$$W_{i}' = \frac{\sum_{j}~(X_{ij} - \bar{X_{i.}} - \bar{X_{.j}} + \bar{X_{..}})^{2}}{E-1}$$
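where $X_{ij}$ is the observed phenotypic mean value of genotype $i$ (i = 1,..., G)
in environment $j$ (j = 1,...,E), with $\bar{X_{i.}}$ and $\bar{X_{.j}}$ denoting the marginal means of genotype $i$ and environment $j$, respectively, and $\bar{X_{..}}$ denoting the overall mean of $X$.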
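
Both quantities can be sketched in a few lines of base R. The table below is made up and the object names (`X`, `ge`, `W`, `W_mod`) are assumptions for this example, not package objects.

```r
# Toy genotype-by-environment table of mean yields (hypothetical values)
X <- matrix(c(4.0, 5.0, 6.5, 7.0,
              3.5, 5.5, 6.0, 8.0,
              4.5, 4.8, 5.5, 6.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("G1", "G2", "G3"), c("E1", "E2", "E3", "E4")))
E <- ncol(X)

# Interaction terms: X_ij - Xbar_i. - Xbar_.j + Xbar_..
ge <- sweep(sweep(X, 1, rowMeans(X)), 2, colMeans(X) - mean(X))

W     <- rowSums(ge^2)   # ecovalence
W_mod <- W / (E - 1)     # modified ecovalence
```

Summed over genotypes, $W_{i}$ gives the genotype-by-environment interaction sum of squares of the two-way table, so each $W_{i}$ can be read as the contribution of genotype $i$ to that interaction.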

\newpage

### environmental variance
Environmental variance [@roemer1917] is calculated by squaring and summing all deviations from the genotypic mean for each genotype.
The larger the environmental variance of a genotype, the lower its stability.

$$S_{xi}^{2} = \frac{\sum_{j}~(X_{ij} - \bar{X_{i.}})^{2}}{E-1} $$
where $X_{ij}$ is the observed phenotypic mean value of genotype $i$ (i = 1,..., G)
in environment $j$ (j = 1,...,E), with $\bar{X_{i.}}$ denoting marginal means of genotype $i$.

\newpage

### genotypic stability
Genotypic stability [@hanson1970] is calculated from a regression function.
A variety with a low genotypic stability value is considered stable.
Under the linear model
$$Y_{ij} =\mu + \beta_{i}~ e_{j} + g_{i} + d_{ij}$$
where $Y_{ij}$ is the observed mean value of the $i^{th}$ genotype in the $j^{th}$ environment, and $g_{i}$, $e_{j}$ and $\mu$ denote the
genotypic mean, environmental mean and overall population mean, respectively.


The effect of GE-interaction may be expressed as:
$$(ge)_{ij} = \beta_{i}~ e_{j} + d_{ij}$$
where $\beta_{i}$ is coefficient of regression and $d_{ij}$ is deviation from regression.

Genotypic stability:
$$ D_{i}^{2} = \sum_{j}~(X_{ij} - \bar{X_{i.}}- b_{min}~ \bar{X_{.j}} + b_{min}~ \bar{X_{..}})^{2}$$

where $X_{ij}$ is the observed phenotypic mean value of genotype $i$ (i = 1,..., G)
in environment $j$ (j = 1,...,E), with $\bar{X_{i.}}$ and $\bar{X_{.j}}$ 
denoting marginal means of genotype $i$ and environment $j$, respectively. 
$\bar{X_{..}}$ denotes the overall mean of $X$.

$b_{min}$ is the minimum value of the coefficient of regression $b_{i}$ among all genotypes.
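
The sketch below evaluates $D_{i}^{2}$ in base R on made-up data. It assumes that $b_{min}$ is the smallest $b_{i}$ among the genotypes in the table; this assumption and all object names are illustrative and not taken from the package source.

```r
# Toy genotype-by-environment table of mean yields (hypothetical values)
X <- matrix(c(4.0, 5.0, 6.5, 7.0,
              3.5, 5.5, 6.0, 8.0,
              4.5, 4.8, 5.5, 6.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("G1", "G2", "G3"), c("E1", "E2", "E3", "E4")))

env_index <- colMeans(X) - mean(X)                       # Xbar_.j - Xbar_..
ge <- sweep(sweep(X, 1, rowMeans(X)), 2, env_index)      # interaction terms
b  <- 1 + as.vector(ge %*% env_index) / sum(env_index^2) # regression coefficients

b_min <- min(b)   # assumption: minimum taken over genotypes

# D_i^2 = sum_j (X_ij - Xbar_i. - b_min * (Xbar_.j - Xbar_..))^2
D2 <- rowSums(sweep(sweep(X, 1, rowMeans(X)), 2, b_min * env_index)^2)
```

For the genotype whose own $b_{i}$ equals $b_{min}$, $D_{i}^{2}$ reduces to the residual sum of squares of its regression on the environment means.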

\newpage

### genotypic superiority measure
Genotypic superiority measure [@lin1988] is calculated as the mean square distance between the response of genotype $i$ and the maximum response in each environment $j$. A variety with a low genotypic superiority measure is considered stable.

$$P_{i} = \sum_{j}^{n} \frac{(X_{ij}-M_{j})^{2}}{2n}$$
where $X_{ij}$ is the observed trait value of genotype $i$ in environment $j$,
$M_{j}$ is the maximum response among all genotypes in the $j^{th}$ environment, and $n$ is the number of environments.
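
As a small worked example in base R (made-up data, illustrative object names), the maximum of each environment is subtracted from every genotype, and the squared distances are summed and divided by $2n$:

```r
# Toy genotype-by-environment table of mean yields (hypothetical values)
X <- matrix(c(4.0, 5.0, 6.5, 7.0,
              3.5, 5.5, 6.0, 8.0,
              4.5, 4.8, 5.5, 6.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("G1", "G2", "G3"), c("E1", "E2", "E3", "E4")))
n <- ncol(X)

M <- apply(X, 2, max)                      # maximum response per environment
P <- rowSums(sweep(X, 2, M)^2) / (2 * n)   # genotypic superiority measure
P
```

A genotype that reached the maximum in every environment would obtain $P_{i} = 0$.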

\newpage

### safety first index
The safety-first index [@eskridge1990] is calculated under the assumption that the trait is normally distributed over the environments.
Across environments, a trait value below a given critical level $\lambda$ is defined as a failure of the trait.
The safety-first index is the probability of trait failure over the environments. A variety with a low safety-first index is considered stable.

$$Pr~(Y_{ij} < \lambda) = \Phi \left[
(\lambda - \mu_{i})/ \sqrt{\sigma_{ii}}
\right]$$

where $\lambda$ is the minimal acceptable value of the trait that the user expects from the crop across environments; $\lambda$ should lie within the range of the trait values. $\Phi$ is the cumulative distribution function of the standard normal distribution, and $\mu_{i}$ and $\sigma_{ii}$ are the mean and variance of genotype $i$.
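
A hedged base R sketch of this probability follows. The helper `safety_first_sketch()` is hypothetical and written only for this example; it estimates $\mu_{i}$ and $\sigma_{ii}$ with the sample mean and the sample variance (divisor $E-1$), which is an assumption about the estimator and may differ from the package implementation.

```r
# Hypothetical helper: probability that a normally distributed trait falls below lambda
safety_first_sketch <- function(x, lambda) {
  pnorm((lambda - mean(x)) / sd(x))
}

# Toy genotype-by-environment table of mean yields (hypothetical values)
X <- matrix(c(4.0, 5.0, 6.5, 7.0,
              3.5, 5.5, 6.0, 8.0,
              4.5, 4.8, 5.5, 6.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("G1", "G2", "G3"), c("E1", "E2", "E3", "E4")))

# Use the overall median as the minimal acceptable yield
sfi <- apply(X, 1, safety_first_sketch, lambda = median(X))
sfi
```

When $\lambda$ equals the genotype mean, the index is 0.5, and it increases as $\lambda$ is raised.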

\newpage

### stability variance
Stability variance [@shukla1972] is calculated as a linear combination of the ecovalence and the mean square of the genotype-environment interaction.
A variety with low stability variance is considered stable.

$$
\sigma^{2}_{i} = \frac{1}{(G-1) ~ (G-2) ~ (E-1)} \left [
G ~(G-1)~ \sum_{j}~(X_{ij} - \bar{X_{i.}}-\bar{X_{.j}} + \bar{X_{..}})^{2}- \sum_{i} \sum_{j}~(X_{ij} - \bar{X_{i.}}-\bar{X_{.j}} + \bar{X_{..}})^{2}
\right ]
$$
where $X_{ij}$ is the observed phenotypic mean value of genotype $i$ (i = 1,..., G)
in environment $j$ (j = 1,...,E), with $\bar{X_{i.}}$ and $\bar{X_{.j}}$ 
denoting marginal means of genotype $i$ and environment $j$, respectively. 
$\bar{X_{..}}$ denotes the overall mean of $X$.

Negative values of stability variance are replaced with 0.
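
The formula can be sketched in base R as below, with made-up data and illustrative object names. The sum of squared interaction terms of genotype $i$ is the ecovalence $W_{i}$, so the stability variance is written here in terms of `W`.

```r
# Toy genotype-by-environment table of mean yields (hypothetical values)
X <- matrix(c(4.0, 5.0, 6.5, 7.0,
              3.5, 5.5, 6.0, 8.0,
              4.5, 4.8, 5.5, 6.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("G1", "G2", "G3"), c("E1", "E2", "E3", "E4")))
G <- nrow(X)
E <- ncol(X)

# Interaction terms and ecovalence
ge <- sweep(sweep(X, 1, rowMeans(X)), 2, colMeans(X) - mean(X))
W  <- rowSums(ge^2)

# Stability variance, before and after replacing negative values with 0
sigma2       <- (G * (G - 1) * W - sum(W)) / ((G - 1) * (G - 2) * (E - 1))
sigma2_trunc <- pmax(sigma2, 0)
```

Before truncation, the $\sigma^{2}_{i}$ values average to the interaction mean square, $\sum_{i} W_{i} / ((G-1)(E-1))$.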

\newpage

### variance of rank
Variance of rank [@nassar1987] is calculated from the ranks of the genotypes within each environment.
A variety with a low variance of rank is considered stable.

The trait value of each genotype $i$ is first corrected by subtracting the marginal genotypic mean $\bar{X_{i.}}$ and adding the overall
mean $\bar{X_{..}}$.
$$X_{corrected~ij} = X_{ij} - \bar{X_{i.}} + \bar{X_{..}}$$
Then all genotypes are ranked within each environment $j$.
$$r_{ij} = rank~(X_{corrected~ij})$$

Variance of rank is calculated with the following equation.
$$S_{i4} = \frac{\sum_{j}~(r_{ij}-\bar{r_{i.}})^{2}}{E-1}$$
where $r_{ij}$ is the rank of genotype $i$ in environment $j$ and $\bar{r_{i.}}$ is the mean rank of genotype $i$ over the environments,
based on the corrected $X_{ij}$ values.
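
The three steps can be sketched in base R as follows. The data are made up, the object names are assumptions for this example, and `rank()` is used with its default handling of ties.

```r
# Toy genotype-by-environment table of mean yields (hypothetical values)
X <- matrix(c(4.0, 5.0, 6.5, 7.0,
              3.5, 5.5, 6.0, 8.0,
              4.5, 4.8, 5.5, 6.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("G1", "G2", "G3"), c("E1", "E2", "E3", "E4")))

# Step 1: remove the genotypic effect, keep the overall level
X_corrected <- X - rowMeans(X) + mean(X)

# Step 2: rank the genotypes within each environment (one column per environment)
r <- apply(X_corrected, 2, rank)

# Step 3: variance of the ranks of each genotype across environments (divisor E - 1)
S <- apply(r, 1, var)
S
```

In this toy table, G2 alternates between the lowest and the highest corrected rank, so it receives the largest variance of rank.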

\newpage
## Citing `toolStability`
Wang, TC., Casadebaig, P. & Chen, TW. More than 1000 genotypes are required to derive robust relationships between yield, yield stability and physiological parameters: a computational study on wheat crop. Theor Appl Genet 136, 34 (2023). https://doi.org/10.1007/s00122-023-04264-7

## Relevant links

* Reproducible R code for the publication: https://github.com/Illustratien/Wang_2023_TAAG
* Data: https://doi.org/10.5281/zenodo.4729636

<!-- # ```{r, echo = FALSE, collapse = TRUE} -->
<!-- # # detach("package:toolStability", unload=TRUE) -->
<!-- # # suppressPackageStartupMessages(library(toolStability)) -->
<!-- # # cit <- citation("toolStability") -->
<!-- # #  -->
<!-- # # cit -->
<!-- # ``` -->

\newpage
## References