March 3, 2026 · Open Access

Phytoclass, Now with a GUI: Point‐and‐Click Pigment Chemotaxonomy

Key Points

The phytoclass GUI enables efficient analysis of chlorophyll a contributions from various phytoplankton taxa.
Using simulated annealing, the program effectively explores the solution space for accurate pigment ratios.
Users can interactively configure input stages, including data checks and clustering, enhancing usability.
The software provides reproducible reports, ensuring transparency in analyses conducted across different environments.

Abstract

Phytoplankton pigments measured by high-performance liquid chromatography (HPLC) have been used for more than three decades as biomarkers of community composition. Marker pigments such as fucoxanthin, peridinin, zeaxanthin, or 19′-hexanoyloxyfucoxanthin can serve as indicators of particular groups (e.g., diatoms, dinoflagellates, cyanobacteria, or haptophytes), but they rarely correspond one-to-one with phytoplankton taxa. Instead, pigment concentrations must be interpreted through statistical inversion methods that partition total chlorophyll a (Chl a) among co-occurring classes. The most widely used tool for partitioning Chl a among phytoplankton taxa has been CHEMTAX (Mackey et al. 1996), which formalized the pigment-to-Chl a ratio approach to assessing phytoplankton physiology and taxonomy. This framework gave the community a powerful and accessible method, and it remains widely applied in field and modeling studies. However, the software has several limitations: it scales poorly to large numbers of samples, it is sensitive to the user's initial ratio setup, it can become trapped in local minima, and it offers limited diagnostic tools.

The phytoclass method (Hayward et al. 2023) was designed to address these concerns by using non-negative matrix factorization solved within a simulated annealing framework. Simulated annealing is a stochastic search algorithm inspired by the physical process of cooling metals: at high “temperature” the algorithm accepts both better and worse solutions, which helps it escape shallow local minima, while at lower “temperature” it gradually becomes more selective, converging on a stable solution. This allows phytoclass to explore the solution space more thoroughly than deterministic approaches, and the method has been adopted in diverse ecosystems, including the Southern Ocean (Hayward et al. 2024; Viljoen et al. 2025; Hayward et al. 2025), the Mediterranean (von Jackowski et al. 2024), the Monterey Bay Canyon (Hwang et al. 2025), and the South China Sea (Xu et al. 2025).

Despite these advantages, phytoclass was challenging for many users because it required familiarity with R programming, matrix formats, and script-based workflows. The GUI introduced here wraps the same inversion engine in an approachable, streamlined interface. It preserves the matrix framing that CHEMTAX and phytoclass users recognize, while adding diagnostics, clustering tools, and reproducible reporting. In this sense, phytoclass can be seen as both a continuation and an expansion of the CHEMTAX tradition, offering a modernized, open-source alternative. The phytoclass GUI allows users to study phytoplankton ecology in diverse environments worldwide without needing to be familiar with R, or with coding in general. This will provide users with tools to assess how phytoplankton taxa may be responding to a warming climate, or how changes in mixed layers, light availability, or shifting ice regimes may be affecting phytoplankton-mediated ocean ecology. We also hope that the website www.phytoclass.org will provide a forum for user discussion, debugging, and suggestions for additions to the software.

The interface begins by helping users get their inputs right. Once the pigment data and initial ratio matrices are uploaded, users can have the application run a matrix check on the S (sample pigment concentrations), F (initial pigment:Chl a ratios), and min_max (ratio bounds) matrices (Fig. 1). This step screens for mismatched names or dimensions and for ill-conditioned designs. Any issues are flagged with plain-language messages that point directly to the fix, reducing the chance of hidden errors before the analysis even starts. The GUI also offers an interactive table editor, so that problematic column or row names can be corrected within the interface without re-editing and re-uploading data files.
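The kinds of problems the matrix check screens for can be illustrated with a short Python sketch. It is hypothetical and is not the GUI's implementation: the function `check_matrices`, its list-of-lists inputs, the pigment names, and the 50% zero-fraction threshold are assumptions made for illustration, and the ill-conditioning check is not reproduced.

```python
def check_matrices(s_pigments, s_rows, f_pigments, f_taxa, f_rows, zero_fraction=0.5):
    """Return plain-language problems found in S (samples x pigments) and F (taxa x pigments).

    Hypothetical helper for illustration; it is not part of phytoclass.
    """
    problems = []

    # Duplicated names make it ambiguous which column or row a value belongs to.
    for label, names in (("S pigment", s_pigments), ("F pigment", f_pigments), ("taxon", f_taxa)):
        for name in sorted({n for n in names if names.count(n) > 1}):
            problems.append(f"Duplicated {label} name: {name}")

    # S and F must describe the same pigments, in the same column order.
    for p in f_pigments:
        if p not in s_pigments:
            problems.append(f"Pigment '{p}' is in F but not in S")
    for p in s_pigments:
        if p not in f_pigments:
            problems.append(f"Pigment '{p}' is in S but not in F")
    if sorted(s_pigments) == sorted(f_pigments) and list(s_pigments) != list(f_pigments):
        problems.append("S and F list the same pigments in a different order")

    # Each row needs exactly one value per named column, and F needs one name per row.
    for i, row in enumerate(s_rows):
        if len(row) != len(s_pigments):
            problems.append(f"S row {i} has {len(row)} values, expected {len(s_pigments)}")
    if len(f_rows) != len(f_taxa):
        problems.append(f"F has {len(f_rows)} rows but {len(f_taxa)} taxon names")
    for taxon, row in zip(f_taxa, f_rows):
        if len(row) != len(f_pigments):
            problems.append(f"F row '{taxon}' has {len(row)} values, expected {len(f_pigments)}")

    # Pigment columns in S that are mostly zero add little information to the inversion.
    for j, p in enumerate(s_pigments):
        column = [row[j] for row in s_rows if j < len(row)]
        if column and sum(v == 0 for v in column) / len(column) > zero_fraction:
            problems.append(f"S column '{p}' is more than {zero_fraction:.0%} zeros")
    return problems


# Hypothetical inputs: fucoxanthin, 19'-hex-fucoxanthin, and Chl a for three samples.
pigments = ["Fuco", "Hex", "Tchla"]
S = [[0.5, 0.0, 1.0], [0.4, 0.0, 0.9], [0.6, 0.1, 1.2]]
F = [[0.8, 0.0, 1.0], [0.3, 1.0, 1.0]]
for message in check_matrices(pigments, S, pigments, ["Diatoms", "Haptophytes"], F):
    print(message)  # S column 'Hex' is more than 50% zeros
```

Zeros in F are deliberately not flagged here, because in a ratio matrix they usually encode a pigment that a taxon does not contain.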
In the table editor, users can rename pigments and taxa, adjust headers, and set minimum–maximum bounds for pigment ratios without leaving the app, which makes refining inputs far more streamlined than repeated edits in external spreadsheets.

With inputs configured, the next, optional stage is clustering. Phytoclass determines the contribution of each taxon to total Chl a by first determining the pigment:Chl a ratios for each taxon. To estimate reliable values, phytoclass requires that these ratios are stable within the group of samples being analyzed. Yet environmental factors such as light, depth, and nutrients all affect these ratios (Higgins et al. 2011), so it is usually wise to divide the dataset into subgroups of samples based on cluster analysis of their pigment:Chl a ratios. The clustering uses the same approach as the phytoclass R package: Ward's hierarchical clustering on either Manhattan or Euclidean distances, combined with the Dynamic Tree Cut package (Langfelder et al. 2008). This setup can be changed interactively on the cluster setup page by editing the settings text (Fig. 2). Note that if the sample size is small, clustering may not be feasible, and this step can be omitted from the analysis.

After clustering, the annealing inversion itself can be run. At this point, the GUI produces convergence plots that trace how each pigment ratio evolves over the iterations. These plots are especially useful for diagnosing whether parameters have stabilized, whether constraints are too tight, or whether additional clustering or bound adjustments are needed.

Finally, the GUI emphasizes reproducibility. Alongside the estimated class biomasses and final ratio matrix, users can export residual diagnostics and a self-contained report that records the data, parameters, and results. This allows analyses to be repeated, shared across teams, or submitted with manuscripts in a transparent and auditable form.
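The clustering stage described above can be sketched in Python as follows. The sketch rests on several assumptions: Chl a is stored in the last column of each sample row, Ward's method is applied to squared Euclidean distances through the Lance–Williams update, and the tree is cut at a fixed number of clusters. Phytoclass instead offers Manhattan or Euclidean distances and uses dynamic tree cutting, neither of which is reproduced here. The helper names are hypothetical, and the 12-sample minimum follows the rule of thumb given in this article.

```python
def chla_ratios(samples):
    """Divide each pigment by the sample's Chl a, assumed here to be the last column."""
    ratios = []
    for row in samples:
        chla = row[-1]
        if chla <= 0:
            raise ValueError("Chl a must be positive to form pigment:Chl a ratios")
        ratios.append([value / chla for value in row[:-1]])
    return ratios


def ward_clusters(points, k):
    """Agglomerative Ward clustering, stopped when k clusters remain.

    Starts from squared Euclidean distances; the Lance-Williams update below then
    reproduces Ward's minimum-variance merge criterion.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of samples")
    clusters = {i: [i] for i in range(len(points))}
    dist = {
        (i, j): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        for i in range(len(points))
        for j in range(i + 1, len(points))
    }
    next_id = len(points)
    while len(clusters) > k:
        i, j = min(dist, key=dist.get)  # cheapest merge
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        updated = {}
        for m, members in clusters.items():
            nm = len(members)
            d_im = dist[(min(i, m), max(i, m))]
            d_jm = dist[(min(j, m), max(j, m))]
            updated[m] = ((ni + nm) * d_im + (nj + nm) * d_jm - nm * dist[(i, j)]) / (ni + nj + nm)
        # Drop distances involving the two merged clusters, then add the new cluster's.
        dist = {key: d for key, d in dist.items() if i not in key and j not in key}
        for m, d in updated.items():
            dist[(m, next_id)] = d
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


def undersized(groups, minimum=12):
    """Clusters with fewer samples than the rule-of-thumb minimum (about 20 for noisy data)."""
    return [g for g in groups if len(g) < minimum]


# Hypothetical samples: two pigments plus Chl a (last column), forming two ratio groups.
samples = [
    [0.5, 0.1, 1.0], [1.04, 0.22, 2.0], [0.245, 0.045, 0.5],
    [0.1, 0.6, 1.0], [0.24, 1.16, 2.0], [0.055, 0.305, 0.5],
]
groups = ward_clusters(chla_ratios(samples), k=2)
print(groups)              # [[0, 1, 2], [3, 4, 5]]
print(undersized(groups))  # both groups are flagged: 3 samples is far below 12
```

Clustering on ratios rather than raw concentrations is what groups samples by community composition instead of by biomass: the second and fifth samples have twice the Chl a of their neighbours but land in the same groups.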
The GUI is designed so that a complete analysis can be carried out in just a few minutes. A typical session moves through five stages: preparing files, uploading and checking, clustering, inversion, and inspection/export.

Open the “Upload Data Files” panel and load the S matrix. Then go to the “Taxa List/min_max Table” to enter or edit taxa names and bounds. Click “Run Matrix Check” to verify that the inputs are usable. The app automatically reports issues such as missing or duplicated pigment names, columns dominated by zeros or very low values, misalignments between the S and F matrices, and near-singular setups that could cause unstable results. Problems can be corrected directly in the interactive tables and the check repeated until the matrices are sound.

Move to the “Run Clustering” tab and click “Generate report.” The default settings are sufficient for a first pass. Clustering groups samples that are likely to share a common F matrix, a step that typically improves both fit and interpretability but, as mentioned above, can be omitted if there are too few samples (fewer than roughly 12–20).

Next, go to “Run Annealing” and select “Generate report.” The engine then normalizes and weights the inputs according to package defaults, carries out simulated annealing with tested iteration and step settings, estimates class abundances (the C matrix) along with an updated pigment-ratio matrix (F), and produces diagnostic plots showing per-pigment convergence and overall residuals.

Finally, inspect the convergence plots and residuals. If certain pigments fail to converge or remain pinned against their bounds, adjust the min–max table and repeat the inversion. When satisfied, download the results. The app provides CSV tables of class abundances and final ratios, residual diagnostics, and a self-contained HTML, PDF, or Quarto report capturing inputs, parameters, and outputs.
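The inversion stage can be illustrated with a toy version of the same idea. Simulated annealing perturbs the free entries of F within their min–max bounds, and for each candidate F the Chl a contribution of each taxon (the C matrix) is recovered by non-negative least squares. This is a sketch under stated assumptions, not the phytoclass engine. The coordinate-descent solver, Gaussian proposals, geometric cooling schedule, unweighted squared-error objective, parameter values, and synthetic data are all illustrative choices, and the package's normalization, weighting, and tuned settings are not reproduced. The returned trace is the raw material for a convergence plot, and the hypothetical helper `pinned` flags ratios that finished against a bound.

```python
import math
import random


def nnls(F, s, sweeps=30):
    """Coordinate-descent non-negative least squares for one sample.

    Finds c >= 0 (Chl a per taxon) minimizing ||s - sum_k c[k] * F[k]||^2, where F[k]
    holds taxon k's pigment:Chl a ratios. Returns (c, squared residual).
    """
    c = [0.0] * len(F)
    resid = list(s)
    for _ in range(sweeps):
        for k, row in enumerate(F):
            norm = sum(f * f for f in row)
            if norm == 0:
                continue
            # Exact minimizer along c[k], clipped at zero to keep abundances non-negative.
            new = max(0.0, c[k] + sum(f * r for f, r in zip(row, resid)) / norm)
            step = new - c[k]
            resid = [r - step * f for r, f in zip(resid, row)]
            c[k] = new
    return c, sum(r * r for r in resid)


def misfit(F, S):
    """Total squared residual when every sample (row of S) is fitted with ratio matrix F."""
    return sum(nnls(F, s)[1] for s in S)


def anneal_ratios(S, F0, bounds, n_iter=1000, t0=0.1, cooling=0.995, step=0.05, seed=1):
    """Simulated annealing over the F entries listed in bounds {(taxon, pigment): (lo, hi)}.

    A worse candidate is accepted with probability exp(-delta / t); t shrinks every
    iteration, so the search becomes steadily more selective. Returns the best F found,
    its misfit, and the current value of each free entry at every iteration.
    """
    rng = random.Random(seed)
    free = sorted(bounds)
    F = [row[:] for row in F0]
    cost = misfit(F, S)
    best_F, best_cost = [row[:] for row in F], cost
    trace = {e: [F[e[0]][e[1]]] for e in free}
    t = t0
    for _ in range(n_iter):
        i, j = rng.choice(free)
        lo, hi = bounds[(i, j)]
        cand = [row[:] for row in F]
        cand[i][j] = min(hi, max(lo, F[i][j] + rng.gauss(0.0, step)))  # stay within min-max
        cand_cost = misfit(cand, S)
        delta = cand_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            F, cost = cand, cand_cost
            if cost < best_cost:
                best_F, best_cost = [row[:] for row in F], cost
        for e in free:
            trace[e].append(F[e[0]][e[1]])
        t *= cooling
    return best_F, best_cost, trace


def pinned(F, bounds, tol=1e-3):
    """Free entries that ended within tol of their lower or upper bound."""
    return [e for e, (lo, hi) in bounds.items()
            if F[e[0]][e[1]] - lo < tol or hi - F[e[0]][e[1]] < tol]


# Toy problem with made-up numbers: pigments = [fucoxanthin, 19'-hex-fucoxanthin, Chl a];
# taxon 0 is diatom-like, taxon 1 haptophyte-like. Chl a ratios and the zero are held fixed.
S = [[1.9, 1.0, 3.0], [1.4, 2.0, 3.0], [0.55, 0.5, 1.0], [2.46, 0.2, 3.2]]
F0 = [[0.5, 0.0, 1.0], [0.5, 0.7, 1.0]]
bounds = {(0, 0): (0.4, 1.2), (1, 0): (0.1, 0.6), (1, 1): (0.5, 1.5)}
best_F, best_cost, trace = anneal_ratios(S, F0, bounds)
C = [nnls(best_F, s)[0] for s in S]  # Chl a attributed to each taxon, per sample
print(best_F, best_cost, pinned(best_F, bounds))
```

Because S was built from the ratios 0.8 (diatom fucoxanthin), 0.3 (haptophyte fucoxanthin), and 1.0 (haptophyte 19'-hex-fucoxanthin) without noise, the search should move the free ratios toward those values, but how closely it gets there depends on the seed and annealing settings. A non-empty result from `pinned` is the code-level analogue of a ratio flattening against the edge of its bounds in a convergence plot.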
The exported files make analyses transparent, repeatable, and ready to share with collaborators or reviewers. The CRAN vignette describes the file structures in more detail for those who wish to customize further.

Running pigment-based inversions is not only about pressing “run” on the software. The quality of the output depends on thoughtful choices at the setup stage, and these choices can be strengthened by combining pigment data with other lines of evidence.

A common temptation is to divide communities into many narrow groups. This often causes problems, because taxa with very similar pigment signatures produce nearly collinear pigment profiles. Such over-splitting inflates condition numbers and can make the inversion unstable or non-invertible. It is usually better to begin with broad groups and subdivide only if diagnostics, such as low residuals and stable pigment ratios, indicate that the data can support finer distinctions (Hayward et al. 2023). For example, although it would be useful to separate Haptophyte type 6 (similar to Gephyrocapsa huxleyi, formerly Emiliania huxleyi) from Haptophyte type 8 (similar to Phaeocystis antarctica), their pigment profiles overlap almost completely (apart from the minor pigment monovinyl Chl c3 in type 6). Robust inversion would therefore not be possible with CHEMTAX or phytoclass unless the limits of the pigment:Chl a ratios are known beforehand (e.g., Wright et al. (2010) were able to distinguish high-iron from low-iron haptophytes using ratios from DiTullio et al. (2007)) or are informed by microscopy or other techniques.

Equally important is to include taxa only when there is a clear pigment marker present in your dataset. For example, Synechococcus should only be specified if zeaxanthin is reliably measured, and peridinin-containing dinoflagellates only if peridinin is present above the noise level.
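Both recommendations in this passage, avoiding near-collinear taxa and including taxa only when their marker pigment is present, can be screened before running an inversion. The Python sketch below is hypothetical: the function names, detection limit, 0.99 similarity threshold, and all pigment values are invented for illustration, and cosine similarity between ratio profiles stands in as a simple proxy for the condition-number diagnostics mentioned above.

```python
import math


def supported_taxa(markers, pigment_names, samples, detection_limit):
    """Keep taxa whose marker pigment exceeds the detection limit in at least one sample."""
    keep = []
    for taxon, marker in markers.items():
        if marker not in pigment_names:
            continue  # marker not measured at all
        j = pigment_names.index(marker)
        if any(row[j] > detection_limit for row in samples):
            keep.append(taxon)
    return keep


def cosine(u, v):
    """Cosine similarity of two ratio profiles (1.0 means perfectly collinear)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def collinear_pairs(taxa, F, threshold=0.99):
    """Pairs of taxa whose marker-pigment profiles (rows of F, Chl a excluded) nearly coincide."""
    return [(taxa[i], taxa[j])
            for i in range(len(taxa))
            for j in range(i + 1, len(taxa))
            if cosine(F[i], F[j]) > threshold]


# Hypothetical samples: columns are peridinin, zeaxanthin, 19'-hex-fucoxanthin, Chl a.
names = ["Peri", "Zea", "Hex", "Tchla"]
samples = [[0.0, 0.02, 0.3, 1.0], [0.001, 0.03, 0.4, 1.2]]
markers = {"Dinoflagellates": "Peri", "Synechococcus": "Zea", "Haptophytes": "Hex"}
print(supported_taxa(markers, names, samples, detection_limit=0.005))
# ['Synechococcus', 'Haptophytes']

# Hypothetical ratios (fucoxanthin, 19'-hex-fucoxanthin, MV Chl c3) for three groups.
taxa = ["Haptophyte type 6", "Haptophyte type 8", "Diatoms"]
F = [[0.2, 1.0, 0.05], [0.2, 1.0, 0.0], [0.8, 0.0, 0.0]]
print(collinear_pairs(taxa, F))
# [('Haptophyte type 6', 'Haptophyte type 8')]
```

The Chl a column is left out of the profiles on purpose: every taxon has a Chl a ratio of 1, so including it would make all groups look more alike than their marker pigments warrant.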
External information can also guide the choice of taxa: microscopy counts, flow cytometry profiles, or regional ecological knowledge can help determine which groups are plausible candidates. Using these sources to narrow down the taxa list keeps the inversion realistic and reduces the chance of spurious results.

The robustness of any matrix inversion improves when more samples are included in each cluster. As a rule of thumb, 12 or more samples per cluster are sufficient for relatively clean datasets, while noisier datasets benefit from around 20 samples (see Hayward et al. 2023). The clustering method can have a significant impact on the results, so clustering should be informed both by the pigment data themselves and by context. For example, samples taken from similar water masses or ecological settings are more likely to share a common ratio structure. Users can choose either phytoclass's statistical clustering method or manual grouping by location, season, or another criterion.

The min–max ranges provided for pigment-to-Chl a ratios can influence the optimization. Literature-derived bounds are a sensible starting point, especially when supplemented by values measured in comparable environments. However, bounds should not be widened unnecessarily: doing so can make solutions underdetermined. The convergence plots produced by the GUI are a valuable guide (Fig. 3). If a ratio flattens at the edge of its bounds, the constraints may be too strict, or the group may not belong in the analysis. For example, if no solution is found for a particular group (e.g., zeaxanthin for Synechococcus), the software will drive up the pigment:Chl a ratio so that very little Chl a is attributed to that group. Parameters that wander or never stabilize may point to insufficient clustering, noisy input data, or an unrealistic combination of taxa. Iteratively refining the bounds and taxa list, guided by both diagnostics and supporting observations (e.g., microscopy), is usually the most productive path to stable and interpretable results.

The hosted version of the app requires no installation: it runs directly in a web browser. For offline use or integration into existing workflows, the GUI and underlying package can also be run locally in R (via RStudio). Installation from GitHub or CRAN is straightforward, but it is optional rather than essential. Still, some knowledge of R may help with advanced debugging.

For each pigment, the GUI shows how its ratio to Chl a changes over successive iterations of simulated annealing. A well-behaved fit typically settles into a plateau while the overall residual error decreases, and this behavior can be confirmed in the convergence plot. Ratios that jump excessively, remain pinned at the edge of their bounds, or fail to stabilize indicate that the setup may need adjustment, such as revised bounds, additional clustering, or reconsideration of the taxa included.

The method can also be applied beyond the ocean. While the default settings are tuned for marine systems, including the Southern Ocean, the method itself is not restricted by environment. For freshwater applications, it is important to supply an F matrix and bounds that reflect the taxa and pigments relevant to the lake or river in question. Supporting information from microscopy or flow cytometry can be particularly valuable for deciding which groups to include.

Users should cite both the methods paper (Hayward et al. 2023, published in Limnology and Oceanography: Methods) and the version of the package employed (CRAN release or GitHub commit). When analyses are carried out through the GUI, it is also good practice to cite this paper and the GUI repository or release tag to enable precise reproducibility.

Source code for the phytoclass-GUI application and the phytoclass R library is available on GitHub (github.com/phytoclass).

We are grateful to the Danish Meteorological Institute National Center for Climate Research (NCKF) for support.
We acknowledge NIWA for providing support to the project. We are also grateful to the New Zealand MBIE Endeavour Programme C01X1710 (Ross-RAMP), Antarctic Science Platform, Project 3 (MBIE contract no. ANTA1801), and MBIE NIWA SSIF (“Structure and function of marine ecosystems”) for support. We are grateful to New Zealand's eScience infrastructure for the use of their high-performance computer. We thank the PHYTO-CCI program for support. We acknowledge the contribution to this work by the R open-source collective and RStudio. We also gratefully acknowledge the support of the Institute for Marine and Antarctic Studies (IMAS) and the Australian Antarctic Division (AAD). We further acknowledge the College of Marine Science, Institute for Marine Remote Sensing (IMaRS), University of South Florida, St. Petersburg, USA, for their contribution to this work. Furthermore, we would like to acknowledge that AI tools (i.e., ChatGPT 5) were used in correcting grammar, typos, and language edits.

Cite this study: Hayward et al., https://doi.org/10.1002/lob.70012
