Development of the US Environmental Protection Agency’s Environmental Source Apportionment Toolkit (ESAT)

PHILIP K. HOPKE, Deron Smith, Michael Cyterski, John Johnston, Kurt Wolfe, Rajbir Parmar, U.S. Environmental Protection Agency

     Abstract Number: 435
     Working Group: Advancing Aerosol Science through Data Analysis Tools

Abstract
In environmental data analysis, source apportionment can be an important approach for extracting useful information that might otherwise remain hidden within the data. The United States Environmental Protection Agency (EPA) has developed an open-source Python package, the Environmental Source Apportionment Toolkit (ESAT), which enables source apportionment modeling and error estimation workflows. ESAT is intended to replace the Positive Matrix Factorization v5 (PMF5) model, which has substantial data size limitations. ESAT is in alpha testing, with planned development of enhanced functionality, support for large datasets, high-performance computing (HPC) execution through a command line interface (CLI), and a standalone desktop graphical user interface (GUI). The alpha product of ESAT offers a complete application programming interface (API) that replicates the workflows and functionality of PMF5, with examples provided as Jupyter Notebooks. The ESAT computing module currently contains two non-negative matrix factorization (NMF) algorithms for model training, least-squares NMF (LS-NMF) and weighted-semi NMF (WS-NMF), and is designed so that other algorithms can be added easily. Each algorithm offers different benefits depending on project or data requirements. The ESAT Python codebase has been optimized to run in a highly parallelized manner, with most of the numerical computations implemented in Rust, a low-level language comparable in performance to C. ESAT replicates the PMF5 bootstrap, displacement, and hybrid methods for error estimation. ESAT also contains a synthetic dataset generator/model simulator that can be used to evaluate how well ESAT recovers known synthetic factors and contributions. Additional features are added to the Python package on a regular basis. The alpha version of the ESAT Python package is available from PyPI at https://pypi.org/project/esat/.
Further testing and development will continue toward a full release in late 2025. Development of a GUI desktop application is currently planned to follow the full ESAT release.
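
To illustrate the kind of uncertainty-weighted factorization that an LS-NMF algorithm performs, the following is a minimal NumPy sketch. It is not ESAT's implementation and does not use the ESAT API. It assumes the commonly used multiplicative update rules for weighted NMF, which minimize Q = sum(((V - WH) / U)^2) while keeping W and H non-negative. The function name ls_nmf, the variable names (V for data, U for uncertainties, W for contributions, H for factor profiles), and the synthetic data are all hypothetical and chosen only for this example.

```python
import numpy as np


def ls_nmf(V, U, k, n_iter=500, seed=0, eps=1e-12):
    """Illustrative weighted NMF (V ~ W @ H, with W >= 0 and H >= 0).

    V: (samples x features) non-negative data matrix.
    U: matching matrix of strictly positive uncertainties.
    k: number of factors.
    Returns W, H and the weighted loss Q after each iteration.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    weights = 1.0 / np.square(U)  # inverse-variance weights
    W = rng.uniform(0.1, 1.0, size=(n, k))
    H = rng.uniform(0.1, 1.0, size=(k, m))
    wv = weights * V
    q_history = []
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity of W and H.
        H *= (W.T @ wv) / (W.T @ (weights * (W @ H)) + eps)
        W *= (wv @ H.T) / ((weights * (W @ H)) @ H.T + eps)
        q_history.append(float(np.sum(weights * np.square(V - W @ H))))
    return W, H, q_history


# Hypothetical synthetic test case: 3 known factors, 200 samples, 12 features.
rng = np.random.default_rng(42)
W_true = rng.uniform(0.0, 5.0, size=(200, 3))
H_true = rng.uniform(0.0, 1.0, size=(3, 12))
V_clean = W_true @ H_true
U = 0.05 * V_clean + 0.01                       # assumed uncertainty model
V = np.clip(V_clean + rng.normal(0.0, U), 1e-6, None)

W, H, q = ls_nmf(V, U, k=3)
print(f"Q after first iteration: {q[0]:.1f}, after last: {q[-1]:.1f}")
```

Generating data from known factors, as above, mirrors in a very simplified way the idea of checking a model against a synthetic dataset. A real workflow would also compare the recovered profiles with the true ones and run multiple random starts.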