This is a brief introduction to AMBI index calculations with ambiR.
Structuring species observation data
Species counts (or abundance) should be organized in a data frame arranged in long format. That is, species names are in one column and species counts in another column. If the data represents several stations and/or if there are replicates, then the data should include columns with this information.
Looking at the test_data example dataset included with
the ambiR package, one can see how the data should be arranged:
library(ambiR)
head(test_data)
#> # A tibble: 6 × 4
#> station replicate species count
#> <dbl> <chr> <chr> <dbl>
#> 1 1 a Cumopsis fagei 2
#> 2 1 a Diogenes pugilator 2
#> 3 1 a Paradoneis armata 1
#> 4 1 b Bathyporeia elegans 1
#> 5 1 b Diogenes pugilator 5
#> 6 1 b Dispio uncinata 1If your data look like this, you can go directly to Calculate AMBI. If not, the following examples show how to reorganize your data.
Species names in columns
Here is an example dataframe where there are counts for species in separate columns:
head(wide_data_species)
#> # A tibble: 6 × 36
#> station replicate `Cumopsis fagei` `Diogenes pugilator` `Paradoneis armata`
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 1 a 2 2 1
#> 2 1 b 0 5 0
#> 3 1 c 0 3 0
#> 4 2 a 1 0 0
#> 5 2 b 1 0 0
#> 6 2 c 0 0 0
#> # ℹ 31 more variables: `Bathyporeia elegans` <dbl>, `Dispio uncinata` <dbl>,
#> # `Astarte sp.` <dbl>, `Glycera tridactyla` <dbl>,
#> # `Heteromastus filiformis` <dbl>, `Lekanesphaera hookeri` <dbl>,
#> # `Melita palmata` <dbl>, `Nassarius reticulatus` <dbl>, Nemertea <dbl>,
#> # `Prionospio fallax` <dbl>, `Scolaricia sp.` <dbl>,
#> # `Spio martinensis` <dbl>, `Tapes decussatus` <dbl>, `Tellina sp.` <dbl>,
#> # `Lekanesphaera rugicauda` <dbl>, `Lepidochitona cinerea` <dbl>, …To arrange the data in the correct form, use
tidyr::pivot_longer():
# columns 1 and 2 contain station and replicate information
# so, select all columns from 3 to be pivoted
long_data <- wide_data_species %>%
pivot_longer(cols = 3:ncol(wide_data_species),
names_to = "species",
values_to = "count")
head(long_data)
#> # A tibble: 6 × 4
#> station replicate species count
#> <dbl> <chr> <chr> <dbl>
#> 1 1 a Cumopsis fagei 2
#> 2 1 a Diogenes pugilator 2
#> 3 1 a Paradoneis armata 1
#> 4 1 a Bathyporeia elegans 0
#> 5 1 a Dispio uncinata 0
#> 6 1 a Astarte sp. 0Stations and replicates in columns
Here is an example where each column contains species counts for a station and replicate. The first row of the table contains the station ID 1, 2, … and the second row contains the replicate ID a, b, …. The first column of the table contains species names.
head(wide_data_stns)
#>
#> 1 <NA> 1 1 1 2 2 2 3 3
#> 2 <NA> a b c a b c a b
#> 3 Cumopsis fagei 2 0 0 1 1 0 0 0
#> 4 Diogenes pugilator 2 5 3 0 0 0 0 0
#> 5 Paradoneis armata 1 0 0 0 0 0 0 0
#> 6 Bathyporeia elegans 0 1 0 0 0 0 0 0Note that if the if the observation data only has stations or replicates, then rearranging the data can be done in one step, as in the previous example. But in this case, the station and replicate information are in separate rows. The restructuring process is a little more complicated.
Combine station and replicate values
Before we can use a pivot function, we need to combine the station ID
and replicate ID for each column into a single value. In this example,
each station ID and replicate ID are joined into a single character
value, with an underscore to separate them _. The
underscore will be used again after pivoting the table to identify where
to split the combined station/replicate values back into separate values
again.
If there are station names which already contain underscores, then another suitable character should be used when joining and splitting station/replicate IDs.
sep_character <- "_"
# get the station IDs from row 1, excluding column 1 (this contains species names)
stations <- wide_data_stns[1, 2:ncol(wide_data_stns)]
# get the replicate IDs from row 2
replicates <- wide_data_stns[2, 2:ncol(wide_data_stns)]
# combine the station and replicate for each column using an underscore
station_replicate <- paste(stations, replicates, sep=sep_character)
# now we have extracted the station/replicate information, we can drop the
# first two rows of the data frame
wide_data_stns <- wide_data_stns[3:nrow(wide_data_stns),]
# apply the station_replicates as column names
names(wide_data_stns) <- c("species", station_replicate)
head(wide_data_stns)
#> species 1_a 1_b 1_c 2_a 2_b 2_c 3_a 3_b
#> 3 Cumopsis fagei 2 0 0 1 1 0 0 0
#> 4 Diogenes pugilator 2 5 3 0 0 0 0 0
#> 5 Paradoneis armata 1 0 0 0 0 0 0 0
#> 6 Bathyporeia elegans 0 1 0 0 0 0 0 0
#> 7 Dispio uncinata 0 1 0 0 0 0 0 0
#> 8 Astarte sp. 0 0 1 0 0 0 0 0Transpose using the combined station/replicate variable
We can see that the column names now contain the combined station and replicate information. We are ready to transpose the data.
# column 1 contains species names
# so, select all columns from 2 to be pivoted
long_data <- wide_data_stns %>%
pivot_longer(cols = 2:ncol(wide_data_stns),
names_to = "stn_rep",
values_to = "count")
head(long_data)
#> # A tibble: 6 × 3
#> species stn_rep count
#> <chr> <chr> <chr>
#> 1 Cumopsis fagei 1_a 2
#> 2 Cumopsis fagei 1_b 0
#> 3 Cumopsis fagei 1_c 0
#> 4 Cumopsis fagei 2_a 1
#> 5 Cumopsis fagei 2_b 1
#> 6 Cumopsis fagei 2_c 0Retrieve station and replicate variables from the transposed data
Now we can split the stn_rep column into separate columns for station and replicate, We will use the underscore we introduced earlier to identify where the split should occur:
long_data <- long_data %>%
separate_wider_delim(cols="stn_rep",
delim = sep_character,
names=c("station", "replicate"))
head(long_data)
#> # A tibble: 6 × 4
#> species station replicate count
#> <chr> <chr> <chr> <chr>
#> 1 Cumopsis fagei 1 a 2
#> 2 Cumopsis fagei 1 b 0
#> 3 Cumopsis fagei 1 c 0
#> 4 Cumopsis fagei 2 a 1
#> 5 Cumopsis fagei 2 b 1
#> 6 Cumopsis fagei 2 c 0Now we are ready to calculate AMBI…
Calculate AMBI
We have now ensured that our species abundance/count data have the
correct structure, as in the example test_data
provided:
head(test_data)
#> # A tibble: 6 × 4
#> station replicate species count
#> <dbl> <chr> <chr> <dbl>
#> 1 1 a Cumopsis fagei 2
#> 2 1 a Diogenes pugilator 2
#> 3 1 a Paradoneis armata 1
#> 4 1 b Bathyporeia elegans 1
#> 5 1 b Diogenes pugilator 5
#> 6 1 b Dispio uncinata 1Call the AMBI() function:
res <- AMBI(test_data, by="station", var_rep="replicate",
var_species="species", var_count="count")Results
The AMBI() function returns a list of three
dataframes:
.$AMBI.$AMBI_rep.$matched
.$AMBI- the main results with the AMBI
index calculated for each unique combination of by
variables, in our case the results are per station.
In addition to the AMBI index, the results also include
the Shannon diversity index H' and the Species
richness S, the three metrics which are necessary to
calculate M-AMBI.
res$AMBI
#> # A tibble: 3 × 13
#> station AMBI AMBI_SD H S fNA N I II III IV V
#> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1.48 0.338 1.80 6 0 16 0.125 0.75 0.125 0 0
#> 2 2 1.89 0.238 3.54 22 0 80 0.4 0.138 0.3 0.15 0.0125
#> 3 3 4.12 0.884 2.50 9 0 24 0 0.125 0.292 0.0833 0.5
#> # ℹ 1 more variable: Disturbance <chr>.$AMBI_rep - if the observations include replicates,
then the function also returns results calculated for each replicate,
within each unique combination of by variables:
res$AMBI_rep
#> # A tibble: 8 × 11
#> station replicate AMBI S fNA N I II III IV V
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 a 1.8 3 0 5 0 0.8 0.2 0 0
#> 2 1 b 1.5 3 0 7 0.143 0.714 0.143 0 0
#> 3 1 c 1.12 2 0 4 0.25 0.75 0 0 0
#> 4 2 a 1.88 12 0 32 0.406 0.156 0.219 0.219 0
#> 5 2 b 2.13 12 0 19 0.316 0.158 0.368 0.105 0.0526
#> 6 2 c 1.66 10 0 29 0.448 0.103 0.345 0.103 0
#> 7 3 a 3.5 5 0 6 0 0.333 0.167 0.333 0.167
#> 8 3 b 4.75 6 0 18 0 0.0556 0.333 0 0.611.$matched - for each observation, this dataframe shows
which species in the AMBI list the observed species was
matched with, if any. It also shows the AMBI species group
assigned. This dataframe has the same number of rows as the input
data.
head(res$matched)
#> # A tibble: 6 × 7
#> station replicate species species_matched count group RA
#> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 a Cumopsis fagei Cumopsis fagei 2 2 0
#> 2 1 a Diogenes pugilator Diogenes pugilator 2 2 0
#> 3 1 a Paradoneis armata Paradoneis armata 1 3 0
#> 4 1 b Bathyporeia elegans Bathyporeia elegans 1 1 0
#> 5 1 b Diogenes pugilator Diogenes pugilator 5 2 0
#> 6 1 b Dispio uncinata Dispio uncinata 1 3 0For more information about the principles underlying the
AMBI calculations and the grouping of species according to
sensitivity to pollution, see vignette("background").
Calculate M-AMBI
Calculate M-AMBI the multivariate AMBI index, based on the three separate species diversity metrics:
- AMBI index
AMBI. - Shannon diversity index
H - Species richness
S.
All three indices required to calculate MAMBI()are
included in the results returned by the AMBI()
function.
Limit values
In addition to index values calculated from observed species data, the M-AMBI factor analysis requires values defining the limits for the three metrics, corresponding to the best and worst possible conditions.
See Muxika, Borja, and Bald (2007) for more details.
The default limit values used by MAMBI() are:
limits_AMBI <- c("bad" = 6, "high" = 0)
limits_H <- c("bad" = 0, "high" = NA)
limits_S <- c("bad" = 0, "high" = NA)By default, upper limit values for H and S
are not provided (= NA). If they are not provided
explicitly, then the maximum values found in the input data will be
used. The results returned by MAMBI() show the limits
used.
Status class boundaries
MAMBI() also returns the status class (Bad, Poor,
Moderate, Good or High) for each M-AMBI index value. To do
this, it compares the calculated M-AMBI index value with values defining
the boundaries between status classes.
-
PB- Poor / Bad -
MP- Moderate / Poor -
GM- Good / Moderate -
HG- High / Good
The default values for the class boundaries are:
bounds <-c("PB" = 0.2, "MP" = 0.39, "GM" = 0.53, "HG" = 0.77)Call MAMBI()
We call MAMBI() using the previously generated
AMBI results:
res_mambi <- MAMBI(res$AMBI, var_H = "H", var_S = "S", var_AMBI = "AMBI")M-AMBI results
In addition to retaining the values for AMBI.
H and S, the results include the following
information:
-
x,y,z- factor scores giving coordinates in the new factor space. -
MAMBI- the calculated M-AMBI index value -
Status- the status class for the M-AMBI index value -
EQR- the normalised EQR index
The dataframe returned contains 2 more rows than the input data. Our
input data contained 3 rows (1 for each station). The
results contain 5 rows. The 2 additional rows show the limit
values for each of the three metrics used in the M-AMBI
calculations.
res_mambi %>%
select(station, AMBI, H, S, x, y, z, MAMBI, Status, EQR)
#> # A tibble: 5 × 10
#> station AMBI H S x y z MAMBI Status EQR
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
#> 1 1 1.48 1.80 6 0.0366 -0.0556 0.584 0.518 Mod 0.467
#> 2 2 1.89 3.54 22 -0.246 -1.18 -1.78 0.893 High 0.907
#> 3 3 4.12 2.50 9 -0.00687 0.602 0.377 0.478 Mod 0.446
#> 4 NA 6 0 0 0.524 2.53 2.93 0 NA 0
#> 5 NA 0 3.54 22 -0.308 -1.90 -2.11 1 NA 1EQR values
As well as the status class, the function also returns a normalised
index value from 0 to 1 (EQR), calculated using the
bounds boundary values for the M-AMBI index. The following
EQR values correspond to status class boundaries:
-
0.2- Poor / Bad -
0.4- Moderate / Poor -
0.6- Good / Moderate -
0.8_ High / Good
For example, the M-AMBI value for station 3 is 0.478.
This lies between the M-AMBI value corresponding to the
Moderate/Poor status class boundary ("MP" = 0.39)
and the M-AMBI value corresponding to the Good/Moderate status
class boundary ("GM" = 0.53).
The normalised EQR value is given by linear interpolation: \[ EQR = 0.4 + (0.6 - 0.4) \frac{0.478 - 0.39}{0.53 - 0.39 } \]
