EXECUTIVE SUMMARY
Scope
The Aquatic Effects Technology Evaluation (AETE) program commissioned a technical evaluation of methods in benthic invertebrate data analysis and interpretation for biological monitoring at mine sites. The objective of the technical evaluation was to review the recent literature and recommend analytical approaches that are valid, objective, effective, and ecologically relevant for monitoring Canadian metal mines. The best analytical methods are those that extract the most useful information and provide the greatest sensitivity in a biomonitoring program at the lowest cost. Sensitivity, the ability to detect small or moderate changes in benthic invertebrate community structure against a background of natural spatial and temporal variability, is especially important in a biomonitoring program because sensitive methods can act as early warning systems of impending ecosystem damage and are more likely to detect subtle effects of chronic, low-level metal loadings.
The technical evaluation covers statistical analysis and ecological interpretation of quantitative data on benthic invertebrate densities derived from samples collected at approximately the same time and allocated according to a simple spatial design. In this design, replicate samples are collected at one or more reference sites upstream from, or otherwise outside, the zone of influence of the effluent outfall, and at a series of exposed sites downstream. Replication may be achieved either by taking multiple samples at individual sites or by sampling multiple sites once each within larger zones.
Analytical Approach
The favoured statistical approach rests on the premise that a biomonitoring study is essentially a test of a hypothesis, namely, that a mine is exerting biological effects on a particular water body at a particular time. The investigator begins with the null hypothesis that the mine effluent has no effect and tests it by comparing exposed sites against unaffected reference sites, while using careful design and analysis to minimize the possibility that sites differ for reasons unrelated to mining. The conclusion that a mine effect is or is not present rests on strong inference rather than proof, because in a spatial design the possibility of another source of downstream effects can never be completely eliminated. Thus, in most routine surveys the pollution source is implicated if the nature and spatial distribution of effects on the benthos are congruent with prior expectations based on the nature of the effluent, and no other disturbances are present to which the effects could reasonably be attributed.
Analysis of Variance (ANOVA) or its derivatives (ANCOVA, MANOVA) is the preferred method of testing for significant differences in species abundances or community metrics among sites in a biomonitoring study. Descriptive multivariate methods such as ordination and clustering may be useful for reducing the complexity of the data set or revealing major patterns, but they are not sufficient by themselves to establish an effluent effect. Only a statistical test of a hypothesis allows a statement to be made, with a known probability of error, that the mine is causing deleterious effects on the exposed water body.
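As a minimal sketch of the one-way ANOVA described above, the Python below computes the F statistic for a single response variable (for example, the density of one taxon) across several sites, each represented by its replicate samples. The log10(x + 1) transform is a common choice for stabilizing the variance of counts and is an assumption here, not a prescription of this report; the site names and counts in the demonstration are hypothetical. The p-value is not computed, to keep the example free of dependencies; where SciPy is available, `scipy.stats.f.sf(F, df_between, df_within)` gives it, and `scipy.stats.f_oneway` performs the whole test.

```python
import math


def log_transform(counts):
    """Apply log10(x + 1), a common variance-stabilizing transform for counts."""
    return [math.log10(c + 1) for c in counts]


def one_way_anova(groups):
    """One-way ANOVA across sites.

    groups: list of lists, one list of replicate values per site.
    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-site sum of squares: spread of site means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-site sum of squares: spread of replicates about their own site mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical counts of one taxon in five replicate samples per site.
    sites = {
        "reference": [120, 95, 140, 110, 130],
        "exposed_near": [40, 55, 30, 62, 48],
        "exposed_far": [90, 105, 80, 115, 99],
    }
    groups = [log_transform(v) for v in sites.values()]
    f_stat, df_b, df_w = one_way_anova(groups)
    print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) df")
```

The same function would be applied in turn to each taxon or community variable retained for analysis.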
Analysis of Covariance (ANCOVA) is a powerful means of removing variability due to habitat variables unrelated to the mine, thereby increasing the sensitivity of the analysis; careful use of ANCOVA should be encouraged. Multivariate Analysis of Variance (MANOVA) is preferred over a series of separate univariate ANOVAs because it tests several variables at once and thereby controls the overall risk of a Type I error (finding a difference where none exists), especially when the variables are correlated. However, the use of MANOVA in routine monitoring studies is limited by its requirement for large numbers of replicates. Modifications to sampling programs that would facilitate the use of these two methods, such as collecting habitat data at each sampling point and increasing replication while decreasing the size of individual samples, should be promoted.
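The ANCOVA adjustment can be sketched for the simplest case: one response variable, one habitat covariate measured at each sampling point (for example, substrate particle size or current velocity, both hypothetical choices here), and several sites. The sketch assumes a common within-site slope; in practice the homogeneity of slopes should be tested before the adjusted comparison is trusted. It returns the pooled slope, the site means adjusted to the overall covariate mean, and the F statistic for the site effect after the covariate has been removed.

```python
def _sums(xs, ys):
    """Return (Sxx, Sxy, Syy, mean_x, mean_y) for paired values."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy, mx, my


def ancova_one_covariate(groups):
    """One-way ANCOVA with a single covariate, assuming parallel slopes.

    groups: list of (covariate_values, response_values), one pair per site.
    """
    k = len(groups)
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    n = len(all_x)

    # Pooled within-site sums of squares and cross-products.
    within = [_sums(xs, ys) for xs, ys in groups]
    sxx_w = sum(w[0] for w in within)
    sxy_w = sum(w[1] for w in within)
    syy_w = sum(w[2] for w in within)

    # Totals ignoring site membership.
    sxx_t, sxy_t, syy_t, grand_x, _ = _sums(all_x, all_y)

    # Residual SS after fitting the common within-site slope.
    ss_error = syy_w - sxy_w ** 2 / sxx_w
    # Residual SS of a single regression line through all the data.
    ss_total_adj = syy_t - sxy_t ** 2 / sxx_t
    # Site effect remaining after adjustment for the covariate.
    ss_sites = ss_total_adj - ss_error

    df_sites, df_error = k - 1, n - k - 1
    f_stat = (ss_sites / df_sites) / (ss_error / df_error)

    slope = sxy_w / sxx_w
    adjusted_means = [my - slope * (mx - grand_x) for (_, _, _, mx, my) in within]
    return {
        "slope": slope,
        "adjusted_means": adjusted_means,
        "F": f_stat,
        "df": (df_sites, df_error),
    }
```

Because one degree of freedom is spent on the covariate, ANCOVA pays off only when the covariate explains a worthwhile share of the within-site variability.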
Simple graphs of species abundances, richness or other variables plotted against sites and distance from point sources are a straightforward and easily understood means of presenting benthic invertebrate data. Means and ranges or standard deviations should be included on the graphs, along with an indication of statistically significant differences. Large-scale site descriptors, such as canopy cover or land use, that cannot be compared statistically can still be included on graphs to illustrate broader differences among sites. Graphs from ordinations or clustering dendrograms can also be informative but should not displace simple scatterplots of the original data as the mainstay of data presentation.
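The per-site values needed for such graphs (mean, standard deviation and range of the replicates) can be tabulated with the Python standard library, as in the sketch below. Sites are kept in the order supplied, which is assumed to run from upstream to downstream; the site labels, distances and values are hypothetical, and the resulting table would be passed to whatever plotting tool is in use (for example, `matplotlib.pyplot.errorbar` for means with standard-deviation bars).

```python
import statistics


def site_summaries(data):
    """Summarize replicate values for each site, preserving input order.

    data: dict mapping site label -> list of replicate values.
    Returns dict mapping site label -> mean, SD, min and max.
    """
    summary = {}
    for site, values in data.items():
        summary[site] = {
            "mean": statistics.mean(values),
            # stdev needs at least two replicates.
            "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical taxon richness per sample, ordered by distance downstream.
    richness = {
        "REF (-1.0 km)": [24, 27, 22, 25],
        "E1 (0.2 km)": [12, 15, 10, 14],
        "E2 (2.0 km)": [19, 21, 18, 23],
    }
    for site, s in site_summaries(richness).items():
        print(f"{site:14s} mean={s['mean']:.1f} sd={s['sd']:.1f} "
              f"range={s['min']}-{s['max']}")
```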
The determination of mine effects is strongest when it is based on the composite results for many taxa and community variables combined in a weight-of-evidence argument. The thrust of this approach is to search for trends in taxon densities that are consistent with a hypothesized effect of the effluent or other disturbance. Results for any one taxon alone are not sufficient to reject the null hypothesis, but similar changes in other taxa are taken as confirmation that the observed site difference is real. The weight of evidence therefore rests on the number and kinds of taxa showing differences between sites, and on the strength of each response.
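The bookkeeping behind a weight-of-evidence argument can be sketched as a tally: each taxon's test result is classified as a significant change in the hypothesized direction, a significant change in the opposite direction, or no significant difference, and the consistent changes are ranked by magnitude. This is only an organizing aid under stated assumptions (a significance level of 0.05, and per-cent change relative to the reference mean as the measure of strength); no numerical decision rule is implied, and the final judgement remains an ecological one. Taxon labels and values are hypothetical.

```python
def tally_evidence(results, expected, alpha=0.05):
    """Classify per-taxon results against hypothesized directions of change.

    results: dict taxon -> (p_value, percent_change of exposed vs reference mean)
    expected: dict taxon -> +1 (expected increase) or -1 (expected decrease)
    """
    tally = {"consistent": [], "opposite": [], "no_difference": []}
    for taxon, (p_value, pct_change) in results.items():
        if p_value >= alpha:
            tally["no_difference"].append(taxon)
        elif (pct_change > 0) == (expected[taxon] > 0):
            tally["consistent"].append((taxon, pct_change))
        else:
            tally["opposite"].append((taxon, pct_change))
    # Rank consistent responses by strength (absolute per-cent change).
    tally["consistent"].sort(key=lambda item: abs(item[1]), reverse=True)
    return tally


if __name__ == "__main__":
    # Hypothetical ANOVA results and hypothesized directions for four taxa.
    results = {
        "taxon_A": (0.01, -60.0),
        "taxon_B": (0.03, 40.0),
        "taxon_C": (0.40, -10.0),
        "taxon_D": (0.02, -25.0),
    }
    expected = {"taxon_A": -1, "taxon_B": 1, "taxon_C": -1, "taxon_D": 1}
    for category, items in tally_evidence(results, expected).items():
        print(category, items)
```

Taxa in the "no_difference" list are kept rather than discarded, since they also form part of the evidence.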
Choice of Response Variable
Abundances of common taxa, aggregated into groups of similar organisms if numbers are low, are the keystone of the weight-of-evidence analysis for site differences. Individual species or genera are the most varied and sensitive indicators of environmental conditions. The parallel analysis of several taxa provides an opportunity to confirm the direction of observed trends and, when combined with knowledge of the biology of the organisms, provides valuable insight into the nature of the stresses affecting the community. Higher taxonomic levels such as insect orders should be used only where lower taxa are too rare or too variable to be useful and the members of the higher taxon have reasonably similar ecological requirements. To avoid a large number of redundant or unhelpful analyses, the raw data should be screened carefully and only those variables retained that are likely to show a statistically significant trend consistent with the expected effect of the disturbance. Nevertheless, all taxa contribute to the weight-of-evidence argument, including those that do not differ among sites.
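The pooling of low-abundance taxa into groups of similar organisms can be sketched as follows. The mean-count threshold (`min_mean`) and the taxon-to-group mapping are hypothetical inputs: no numerical cut-off is specified here, and the grouping should reflect ecological similarity, not taxonomy alone. Taxa at or above the threshold are kept at their original level; rarer taxa are summed into their assigned group, or kept as they are if no group is given.

```python
def aggregate_rare_taxa(samples, group_of, min_mean=5.0):
    """Pool rare taxa into higher groups.

    samples: list of dicts, one per replicate sample, taxon -> count
    group_of: dict taxon -> higher-level group (hypothetical mapping)
    min_mean: mean count per sample below which a taxon is pooled (hypothetical)
    """
    n = len(samples)
    taxa = {t for s in samples for t in s}
    mean_count = {t: sum(s.get(t, 0) for s in samples) / n for t in taxa}
    # Decide once, across all samples, which label each taxon is analysed under.
    label = {
        t: t if mean_count[t] >= min_mean else group_of.get(t, t) for t in taxa
    }
    pooled = []
    for s in samples:
        counts = {}
        for taxon, count in s.items():
            key = label[taxon]
            counts[key] = counts.get(key, 0) + count
        pooled.append(counts)
    return pooled
```

Deciding the label once for the whole data set, rather than sample by sample, keeps each variable defined identically at every site.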
Selected summary statistics should also be included in the analysis to measure the severity of effects on the community as a whole. Total abundance of all organisms and total number of taxa per sample are useful, well-established variables, but they may be unresponsive to slight degradation. Similarity indices should be included in site comparisons because: (1) they summarize the overall difference in community structure between reference and exposed sites as a single number; (2) they require no preconceived assumptions about the nature of a healthy community; and (3) they can change in only one direction (towards lower similarity), avoiding the interpretive problems that arise when a disturbance stimulates some taxa. The most reliable similarity indices appear to be the Bray-Curtis Index and the Per Cent Similarity Index.
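Both recommended similarity indices are simple to compute from two sets of taxon counts, for example a single sample and the mean reference community, as in the sketch below. Bray-Curtis similarity is expressed here as 1 minus the Bray-Curtis dissimilarity and works on absolute abundances; the Per Cent Similarity Index sums, over taxa, the smaller of the two relative abundances. Variants exist (for example, Bray-Curtis computed on transformed abundances), so the form used should be stated; the counts in the usage lines are hypothetical.

```python
def bray_curtis_similarity(a, b):
    """1 - sum|a_i - b_i| / sum(a_i + b_i); 1 = identical, 0 = no taxa shared."""
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    total = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return 1.0 - diff / total


def percent_similarity(a, b):
    """100 * sum of min(p_i, q_i), where p and q are relative abundances."""
    total_a = sum(a.values())
    total_b = sum(b.values())
    taxa = set(a) | set(b)
    return 100.0 * sum(min(a.get(t, 0) / total_a, b.get(t, 0) / total_b) for t in taxa)


if __name__ == "__main__":
    reference = {"taxon_A": 10, "taxon_B": 10}  # hypothetical counts
    exposed = {"taxon_A": 10, "taxon_B": 30}
    print(bray_curtis_similarity(reference, exposed))  # about 0.667
    print(percent_similarity(reference, exposed))  # 75.0
```

Note that the exposed sample here differs only by an increase in one taxon, yet both indices fall below their maximum, which is the one-directional behaviour described above.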
Diversity indices, such as the Shannon-Weaver Index, have been popular in pollution assessment, but they tend to be unresponsive to slight or moderate disturbance, especially when it does not involve organic enrichment, and are not recommended for biomonitoring at Canadian mine sites. Biotic indices assess water quality based on the presence or absence of indicator species of known tolerance and summarize conditions in a single number. Biotic indices should only be included in biomonitoring at mines when they are applicable to the geographic region and there is reason to expect organic or mixed effluents. Biotic indices must be calculated for each sample and subjected to statistical analysis in the same manner as other variables.
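Calculating an index for each sample, rather than once per site from pooled counts, is what allows it to be analysed by ANOVA like any other variable. The sketch below illustrates this with the Shannon-Weaver index, H' = -sum(p_i ln p_i), chosen only because its formula is standard, not because it is recommended; the same per-sample pattern applies to a regionally applicable biotic index. The logarithm base is a convention (natural logarithms by default here), and the replicate counts are hypothetical.

```python
import math


def shannon_index(sample, base=math.e):
    """Shannon-Weaver H' for one sample (dict taxon -> count)."""
    total = sum(sample.values())
    h = 0.0
    for count in sample.values():
        if count > 0:  # absent taxa contribute nothing
            p = count / total
            h -= p * math.log(p, base)
    return h


if __name__ == "__main__":
    # Hypothetical replicate samples from one site.
    replicates = [
        {"taxon_A": 40, "taxon_B": 35, "taxon_C": 25},
        {"taxon_A": 60, "taxon_B": 30, "taxon_C": 10},
        {"taxon_A": 50, "taxon_B": 45, "taxon_C": 5},
    ]
    # One value per replicate: these become the observations for the ANOVA.
    values = [shannon_index(s) for s in replicates]
    print([round(v, 3) for v in values])
```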
Functional feeding groups are guilds of invertebrate taxa that obtain food in similar ways, regardless of taxonomic affinity. Ecological studies of flowing waters suggest that the proportions of different feeding groups change in response to disturbances that affect the food base of the system, offering a means of assessing disruption of ecosystem function. The utility of functional feeding groups as variables for estimating impairment of benthic communities at mine sites is uncertain, because research to date has concentrated too heavily on severely impaired sites. More research is needed to test the sensitivity and reliability of feeding groups at moderately contaminated sites where the food base has not been directly altered.
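Given a table assigning each taxon to a functional feeding group, the density and proportion of each group in a sample follow directly, as sketched below. The feeding-group assignments and counts are hypothetical; real assignments would normally be taken from published trait tables for the region, and taxa without an assignment are reported as "unassigned" rather than silently dropped.

```python
def feeding_group_summary(sample, group_of):
    """Densities and proportions of functional feeding groups in one sample.

    sample: dict taxon -> count
    group_of: dict taxon -> feeding group (hypothetical assignments)
    """
    densities = {}
    for taxon, count in sample.items():
        group = group_of.get(taxon, "unassigned")
        densities[group] = densities.get(group, 0) + count
    total = sum(densities.values())
    proportions = {g: c / total for g, c in densities.items()}
    return densities, proportions


if __name__ == "__main__":
    sample = {"taxon_A": 30, "taxon_B": 10, "taxon_C": 60}  # hypothetical
    groups = {"taxon_A": "scraper", "taxon_B": "shredder", "taxon_C": "collector"}
    print(feeding_group_summary(sample, groups))
```

As with other variables, these values would be computed for each sample and then compared among sites.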
Rapid assessment procedures are intended for quick, qualitative assessments of water quality based on preliminary sampling and are not an adequate tool for biomonitoring at mines. Nevertheless, some metrics used in rapid assessment procedures may also be useful in quantitative biomonitoring, and research comparing the sensitivity and accuracy of different metrics should not be disregarded. However, the “multi-metric” approach to biomonitoring, in which a variety of unrelated metrics is combined into a single number to rank sites, is not sound biologically or statistically. All metrics based on ratios between two variables should also be avoided, because a ratio becomes unstable when its denominator is small and a change in the ratio cannot be attributed to either variable alone.
Power
Statistical power is the probability that a test will detect a difference between two treatments when they truly differ; it is the statistical analogue of sensitivity. Power is a key element of sound experimental design in biomonitoring that has not received the attention it deserves, and power analysis should be routinely incorporated into every biomonitoring study. During study design, power should be calculated from preliminary sampling or data from previous years to ensure that sampling intensity gives a reasonable probability of detecting site differences of a magnitude deemed ecologically significant. Power should also be calculated for every analysis of variance that fails to detect a significant difference among sites. This analysis should either demonstrate that the power of the test was reasonable or determine the smallest difference between sites that the test could have detected with reasonable power.
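Both uses of power analysis described above can be sketched for the simplest case: a comparison of two site means (one reference and one exposed site) with equal replication and a common standard deviation estimated from preliminary sampling or previous years. The sketch uses the normal approximation, which slightly overstates power for small numbers of replicates compared with an exact calculation based on the t or noncentral F distribution; comparisons among more than two sites would need the noncentral F form. The function names are illustrative only. If the data are log-transformed for analysis, the difference and standard deviation must be expressed on the same transformed scale.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def power_two_sites(difference, sd, n, alpha=0.05):
    """Approximate power of a two-sided test to detect `difference` between
    two site means, each based on n replicates with standard deviation sd."""
    se = sd * math.sqrt(2.0 / n)  # standard error of the difference in means
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    # The probability of rejecting in the opposite tail is negligible and ignored.
    return _Z.cdf(abs(difference) / se - z_crit)


def replicates_needed(difference, sd, alpha=0.05, power=0.80):
    """Replicates per site needed to reach the target power (design stage)."""
    z_a = _Z.inv_cdf(1 - alpha / 2)
    z_b = _Z.inv_cdf(power)
    return math.ceil(2.0 * ((z_a + z_b) * sd / difference) ** 2)


def minimum_detectable_difference(sd, n, alpha=0.05, power=0.80):
    """Smallest difference detectable with the target power, for reporting
    alongside a non-significant result."""
    z_a = _Z.inv_cdf(1 - alpha / 2)
    z_b = _Z.inv_cdf(power)
    return (z_a + z_b) * sd * math.sqrt(2.0 / n)
```

For example, under this approximation, detecting a difference equal to one standard deviation with 80% power at alpha = 0.05 requires `replicates_needed(1.0, 1.0)`, or 16 replicates per site; with 16 replicates the minimum detectable difference is about 0.99 standard deviations.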
Research Needs
More research on the effects of mine wastes on benthic invertebrates in lakes and rivers, especially their responses to low-level, chronic loading and to mixed metal-organic wastes, would help investigators formulate hypotheses of expected mine effects. Research to determine the occurrence and significance of stimulation responses at slightly contaminated sites is needed because bi-directional responses to disturbance complicate interpretation. Experiments establishing the toxicity of various metals to a variety of common benthic species in Canadian water bodies would also be useful. All raw data from each biomonitoring study should be archived in a safe, organized, accessible database for future studies of temporal trends and for possible integration into a network of regional reference sites.