Background & Introduction

Harmful algal blooms (HABs) are sudden, rapid overgrowths of algae that can damage coastal and lakeshore ecosystems, cause socioeconomic harm to areas dependent on them, and cause serious human health issues. HAB incidence has increased worldwide in recent decades, with escalating associated costs and damage. Accurately predicting HABs before they occur could significantly mitigate these costs; however, HABs depend on complex interactions among many variable factors. We investigated whether an artificial neural network could yield accurate HAB predictions. NeurAlgae is a system of Artificial Neural Networks (ANNs) trained on nine measurements from real-time ocean monitoring stations, taken at 15-minute intervals over three years. We used a first set of networks to expand our dataset, and then a second set of recurrent networks to generate more accurate predictions. We tested our system on a separate subset of the data to check for overfitting. Our system achieved losses approximately ten times lower than those of the only comparable machine learning-based HAB prediction tool we were able to find in the scientific literature (Zhang et al., 2016). NeurAlgae thus constitutes a novel and robust tool for HAB prediction that demonstrates the utility of ANNs in approaching a complex multivariate problem with profound human and ecological consequences.

Algal blooms are incidences of sudden and rapid overgrowth in microscopic cyanobacteria and algae in water bodies, typically caused by eutrophication. Eutrophication is the overabundance of nutrients on which the algae feed. It is often caused by the presence of “urban and rural wastewater, fertilizers applied to agricultural fields, combustion of fossil fuels, erosion of soil containing nutrients and sewage treatment plant discharges” (Sanseverino, Conduto, Pozzoli, Dobricic, & Lettieri, 2016, p. 11). In some cases these blooms proliferate to the point where they harm the surrounding aquatic ecosystem (Anderson, Glibert, & Burkholder, 2002). These harmful algal blooms (HABs) can damage coastal and lakeshore ecosystems, cause socioeconomic harm to areas dependent on them, and injure human health (Anderson, Hoagland, Kaoru, & White, 2000). Algal blooms grow in sheets that can cover large areas, restricting sunlight to aquatic vegetation below. Once the bloom dies off, its decomposition depletes the dissolved oxygen in the water, creating hypoxic, or “dead,” zones (Fields, 2004). In addition, the metabolic processes of many saltwater species of algae, such as Pseudo-nitzschia, produce toxins capable of harming humans and animals (Anderson et al., 2002, p. 705).

The prevalence of HABs is increasing, likely as a by-product of human activities (Sanseverino et al., 2016). According to Anderson et al. (2002), “virtually every coastal country is now threatened by multiple harmful or toxic algal species” (p. 706). Algal blooms grow rapidly; thus, although response mechanisms exist in some areas, remediation of the damage they cause is still very costly. Kudela et al. (2015) state that HABs conservatively cost several billion US dollars globally per year (p. 13). If a HAB could be predicted days or weeks in advance, remediation or even prevention could make the bloom far less costly.

The onset of HABs is difficult to predict, as they are dependent on a multitude of complex variables, including “algal species presence/abundance, degree of flushing or water exchange, weather conditions, and presence and abundance of grazers” (Anderson et al., 2002, p. 704). Although real-time oceanographic monitoring systems measuring many of these variables have recently come online, early response efforts are restricted by the ability to analyze these data (Pellerin et al., 2016). Prediction methods for algal blooms do exist, but generally suffer either from inaccuracy or from over-specificity (i.e., they are only useful in localized areas). We are unaware of any system that predicts HABs accurately and also generalizes to other datasets (that is, can be used to predict blooms in areas other than those in its training data). A system that is both accurate and versatile would be of enormous ecological and socioeconomic value.

Given that HAB outbreaks depend on many factors whose interactions are very complicated, machine learning, which denotes programs that behave in a way resembling human learning (Samuel, 1959, p. 535), is a candidate for solving this difficult problem. One attractive implementation of machine learning techniques is the Artificial Neural Network (ANN). ANNs are made up of mathematical operators that pass values through a synthetic network, similar to how neurons relay information via electrical impulses (Kleene, 1951). To use an ANN for a machine learning problem, a model for the problem is generated, and the network learns correlations between given inputs and desired outputs by modifying its own parameters. In this way, computers are able to process and find relationships in data that might be incomprehensibly vast or complicated to humans.
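
To make the idea of a network "modifying its own parameters" concrete, here is a minimal sketch in plain Python. It is not NeurAlgae's code: it trains a single linear unit on made-up data (points on the line y = 2x + 1) by gradient descent on the mean squared error. The function names, learning rate, and epoch count are illustrative assumptions.

```python
# Toy illustration only: one linear "neuron" with a weight w and a bias b.
# The data, learning rate, and epoch count are assumptions chosen for clarity.

def mean_squared_error(samples, w, b):
    """Average squared difference between the prediction w*x + b and the target y."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)

def train_neuron(samples, learning_rate=0.5, epochs=500):
    """Fit w and b to (x, y) pairs by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        # The "learning" step: nudge each parameter to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Eleven made-up points on the line y = 2x + 1, with x between 0 and 1.
samples = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]
w, b = train_neuron(samples)
print(f"learned w={w:.3f}, b={b:.3f}")
```

A full ANN applies the same principle to many such units arranged in layers, with nonlinear activations between them.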

We propose the use of a system of ANNs to identify the direct relationships between the environmental factors affecting a body of water and the probability of a HAB occurring there. Some of the language used here to describe the NeurAlgae system is technical, so full explanations will be provided by the authors upon request.

The Vision

Purpose & Hypothesis

The purpose of the NeurAlgae system is to accurately predict HAB occurrences, while generalizing its operation to other datasets. Since the system is designed for any user, it should also be readily accessible and built into an intuitive interface.

A neural network of adequate complexity, trained on a sufficient range of oceanographic data, will learn to accurately predict HABs based on these data. Such an ANN, combined with a user-friendly interface, will allow for robust predictions and mitigation of HABs.

Procedure

We developed NeurAlgae, a system of neural networks that aims to predict HAB occurrences in a given region using oceanographic data for that region. NeurAlgae parses nine separate water measurements (water surface level, pressure, chlorophyll concentration, temperature, electrical conductivity, oxygen and nitrate concentrations, salinity, and turbidity) taken at 15-minute intervals over three years. These measurements are correlated to three indicators of a HAB event: the probability of Pseudo-nitzschia diatoms being present in concentrations exceeding 10 000 cells per litre; the probability of the neurotoxin domoic acid (DA) being present within cells in concentrations exceeding 10 picograms per cell; and the probability of DA being present in the surrounding water in concentrations exceeding 500 nanograms per litre.

The water measurements are first fed through three densely connected ANNs of Rectified Linear Units (ReLUs), with dropout and L2 regularization, each trained to predict one of the three probabilities at the time the measurement is taken (a nowcast). The entire set of data points and the corresponding nowcast predictions are then fed through three more networks, each designed to make predictions for various future times. These networks take in sequential data using a Long Short-Term Memory (LSTM) layer. Combining separate neural network models allows accurate predictions to be made on many parameters while avoiding the potential difficulties of training one large model, which the training process might not recognize as having multiple components.

To enhance accessibility, a progressive web app (PWA) was created to host the NeurAlgae system. The web server can access and download data in real time, using a combination of client-side and server-side computation for fast and accurate results. The PWA combines the functionality of the NeurAlgae system with supplementary information to increase user understanding of the HAB issue and the NeurAlgae tool. Training data were taken from the Central and Northern California Ocean Observing System (CENCOOS) database, measured near Morro Bay, California. The source code for NeurAlgae is available at https://github.com/FlowBoat/Flow-Tech-NeurAlgae under the GNU GPL.
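
The three HAB indicators described above each rest on a concentration threshold. The sketch below shows one plausible way to express those thresholds in plain Python as 0/1 flags for a single water sample. The text does not say how the target probabilities were actually derived, so the `Sample` class, the `hab_indicators` function, and the field names are hypothetical; only the three threshold values come from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; the constant names are illustrative.
PN_CELLS_PER_LITRE = 10_000       # Pseudo-nitzschia abundance
CELLULAR_DA_PG_PER_CELL = 10      # domoic acid within cells
DISSOLVED_DA_NG_PER_LITRE = 500   # domoic acid in the surrounding water

@dataclass
class Sample:
    """One hypothetical lab measurement of the three HAB-related quantities."""
    pn_cells_per_litre: float
    cellular_da_pg_per_cell: float
    dissolved_da_ng_per_litre: float

def hab_indicators(sample):
    """Return three 0/1 flags, one per threshold, using 'exceeds' (strictly greater)."""
    return (
        int(sample.pn_cells_per_litre > PN_CELLS_PER_LITRE),
        int(sample.cellular_da_pg_per_cell > CELLULAR_DA_PG_PER_CELL),
        int(sample.dissolved_da_ng_per_litre > DISSOLVED_DA_NG_PER_LITRE),
    )

# Made-up sample: high cell count, low cellular DA, dissolved DA exactly at the limit.
example = Sample(25_000, 4.0, 500.0)
print(hab_indicators(example))
```

Flags of this kind could serve as targets for networks that output probabilities, since a network trained against 0/1 labels with an output in the range 0 to 1 can be read as estimating the chance that the threshold is exceeded.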

Analysis

Results and analysis

During validation, our models performed best with four layers in each network (including either an LSTM layer or an additional dense ReLU layer), dropout, and an L2 regularization strength of 0.0001. We trained each model using the Nadam optimizer, a variant of Adam (Kingma and Ba, 2015) that adds Nesterov momentum, with the model’s mean squared error (MSE) as the cost function, which measures how well the model fits historical data. Our data consisted of ~35 000 data points taken from the CENCOOS water monitoring database, which were split into training and testing datasets. When the networks were trained, the training costs dropped to between 0.004 and 0.01 within 50 epochs.

Note that results with some degree of noise should be expected, since biological outcomes are determined by far more parameters than the aforementioned nine. Even so, our models achieved an MSE between 0.001 and 0.01 using only nine parameters. Because the approach can be fitted to arbitrary datasets, accuracy would be expected to improve as more data become available. Our models taking in smaller amounts of data achieved good generalization, but their accuracy was limited by regularization. When given larger quantities of data, the models achieved very accurate fits for consistent data while still generalizing for noisy data, developing increasingly sophisticated models for extrapolating data.

We found one other machine-learning approach to the prediction of HABs: Zhang et al. (2016) created a model for HAB prediction in the East China Sea using stacked layers of Boltzmann machines. Compared with their model, which predicts different but closely correlated data, NeurAlgae achieved losses approximately ten times lower, a substantial improvement, and it includes regularization methods intended to improve generalization.
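
For readers unfamiliar with the quantities reported in this section, the following plain-Python sketch computes a mean squared error and an L2 penalty in their conventional forms. The targets, predictions, and weights are made-up numbers, and the function names are illustrative. Only the regularization strength of 0.0001 comes from the text, and whether the reported costs include the penalty term is not stated there.

```python
L2_STRENGTH = 0.0001  # regularization strength reported in the text

def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def l2_penalty(weights, strength=L2_STRENGTH):
    """Conventional L2 penalty: strength times the sum of squared weights."""
    return strength * sum(w * w for w in weights)

# Made-up values for illustration only.
targets = [0.0, 1.0, 1.0, 0.0]
predictions = [0.1, 0.9, 0.8, 0.1]
weights = [0.5, -1.5, 2.0]

data_loss = mean_squared_error(targets, predictions)
penalty = l2_penalty(weights)
total_cost = data_loss + penalty
print(f"MSE={data_loss:.4f}, L2 penalty={penalty:.5f}, total={total_cost:.5f}")
```

The penalty grows with the size of the weights, so minimizing the total cost discourages the large weights that tend to accompany overfitting.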



When given input data, the trained networks produced output values that were close to, but not exactly the same as, the given algal bloom probabilities. This is the desired result for this problem: because the empirical data can vary so much, the model needs a fairly wide margin to account for differences and prevent overfitting, yet its outputs are still accurate enough to give a good idea of the chance of an algal bloom in that region.

Conclusion

The NeurAlgae system was demonstrated to accurately predict the probability of HABs. The model can form predictions up to one week in advance, giving authorities time to respond. The model’s accuracy increases as it is given more data, managing to stay general when the data is limited but fitting precisely given a larger dataset. Our model can be generalized to other datasets, given the use of the L2 regularization and dropout, and can use an arbitrarily expanded set of parameters to improve accuracy. Overall, the NeurAlgae system constitutes a novel and robust tool in the prediction of HABs, and demonstrates the applicability of ANNs in approaching a complex multivariate problem with profound human and ecological consequences.

Next steps

Although our training algorithm is unlikely to improve much from this point, we plan to improve the accessibility of our web app by updating to the latest patch of Keras.js as soon as it is released. In addition, by implementing a Node module such as Synaptic.js or TensorFlow on the web, we will be able to train our network in real time with a constant stream of data from CENCOOS (or other databases). Currently our only limitations are our budget and the scope of our imaginations. With access to a dedicated server, management of the NeurAlgae database and operation of the neural networks would become much more stable. As of now, a standard Firebase or Heroku deployment would not fit our needs as data scientists. We hope to move to a dedicated server soon, and with it toward a safer future for coastlines globally.

Acknowledgements

Thank you to our parents for supporting our project throughout its development, and constantly allowing us the time to work on it. Thanks also to Mr. Menhennet, who provided guidance throughout the project on both formatting issues and geographical data.

Appendices

(Note that network performance improves as one moves down)

About Us

Hi, in all the rush of Neural Networks and their application in Algal Bloom prediction, we forgot to mention to you who we were! We're Atif and Zach, partners, long-time pals and best friends. We've been in the same class for 7 years (and counting) and it's been a blast! Our friendship goes way back and so does our love of science. This is our second year doing a joint project and it has been the most fun of all our years of science fair. If you have any questions, comments or concerns feel free to hit us up at any of our socials!

Atif Mahmud

Personal Email: atifmahmud101@gmail.com
School Email: mahma6337@googleapps.wrdsb.ca
Cell: (226) 606-9535

Zach Trefler

Personal Email: zmct99@gmail.com
School Email: trefz7495@googleapps.wrdsb.ca
Cell: (226) 972-0492

HAB Prediction


ALERT:

Unfortunately, breaking changes in Keras 2.0 have broken Keras.js, the Python-to-web model conversion library used in NeurAlgae's frontend. However, Keras.js is due for an update in the coming weeks, so the service should return shortly. Thank you for your patience.


Data Formatting:


The variables are defined as follows:
WL: Water level above (or below) average sea level
WP: Water pressure
CC: Chlorophyll concentration
TP: Water temperature
EC: Electrical conductivity
OC: Concentration of oxygen
NC: Concentration of nitrate
SL: Salinity
TB: Turbidity

Note that all input values should be 32-bit floating-point values,
normalized against the highest and lowest possible measurements
(e.g. a chlorophyll concentration of 15% = 0.15).

NeurAlgae takes a JavaScript array of 96*n data points (where n is the number of days over which the data were recorded), taken at
15-minute intervals, so that every 96 points make up one day's worth of data collection:

For Example:
[[WL, WP, CC, TP, EC, OC, NC, SL, TB],
.......(94 Data Points Later).......,
[WL, WP, CC, TP, EC, OC, NC, SL, TB]]
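
The normalization and the 96-points-per-day layout described above can be sketched as follows. The web app itself expects a JavaScript array; Python is used here only for illustration. The function names and the sensor range in the usage line are assumptions, not values taken from NeurAlgae.

```python
SAMPLES_PER_DAY = 96  # 24 hours * 4 readings per hour (15-minute intervals)
FIELDS = ("WL", "WP", "CC", "TP", "EC", "OC", "NC", "SL", "TB")

def normalize(value, lowest, highest):
    """Min-max scale a raw reading into [0, 1] given its lowest/highest possible values."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    return (value - lowest) / (highest - lowest)

def split_into_days(rows):
    """Check the 96*n layout and group rows of nine values into per-day chunks."""
    if len(rows) == 0 or len(rows) % SAMPLES_PER_DAY != 0:
        raise ValueError("expected a non-zero multiple of 96 data points")
    for row in rows:
        if len(row) != len(FIELDS):
            raise ValueError("each data point needs nine values: " + ", ".join(FIELDS))
    return [rows[i:i + SAMPLES_PER_DAY] for i in range(0, len(rows), SAMPLES_PER_DAY)]

# Hypothetical range: a chlorophyll reading of 15 on a sensor spanning 0 to 100.
cc = normalize(15.0, 0.0, 100.0)

# Two days of placeholder readings (all zeros) just to exercise the layout check.
days = split_into_days([[0.0] * len(FIELDS) for _ in range(2 * SAMPLES_PER_DAY)])
print(cc, len(days), len(days[0]))
```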
                
