Interpretable machine learning for analysing heterogeneous drivers of geographic events in space-time

This preprint version precedes the formal peer review and publication in IJGIS. Final version: https://doi.org/10.1080/13658816.2021.1965608

ABSTRACT: Machine learning (ML) interpretability has become increasingly crucial for identifying accurate and relevant structural relationships between spatial events and the factors that explain them. Methodologically aspatial ML algorithms with apparently high predictive power ignore characteristic non-stationary domain relationships in spatio-temporal data (e.g., dependence, heterogeneity), and can lead to incorrect interpretations and poor management decisions. This study addresses this critical methodological issue of ‘interpretability’ in ML-based modeling of structural relationships between spatio-temporal events and their corresponding bio-physical drivers. Specifically, we present and evaluate a spatio-temporally interpretable random forest (iST-RF) modeling framework using the example of drivers of wildfire characteristics across the United States. Our experiments show that the spatio-temporal sampling and weighted prediction approach can improve predictive accuracy (76%) compared to the aspatial RF approach (70%), while also enhancing interpretations of the ML model’s spatio-temporal relevance for its ensemble prediction. This novel approach can help balance prediction and interpretation with fidelity in a spatial data science life cycle. However, challenges exist for predictive modeling when the dataset is very small, because in such cases the prediction performance of locally optimized sub-models can be suboptimal. With that caveat, our proposed approach is an ideal choice for identifying drivers of spatio-temporal events in country- or regional-scale studies.
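
The weighted prediction idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' iST-RF implementation: the Gaussian distance-decay kernel, the bandwidths `h_space` and `h_time`, and the function names `st_weight` and `weighted_ensemble_predict` are all assumptions made for illustration, and the sub-models are stand-in callables rather than fitted random forests. The sketch only shows how predictions from locally fitted sub-models might be combined with weights that shrink with spatial and temporal distance from the query point.

```python
import math

def st_weight(query, center, h_space, h_time):
    """Hypothetical Gaussian weight that decays with space-time distance."""
    qx, qy, qt = query
    cx, cy, ct = center
    d_space = math.hypot(qx - cx, qy - cy)   # planar distance
    d_time = abs(qt - ct)                    # temporal lag
    return math.exp(-0.5 * ((d_space / h_space) ** 2 + (d_time / h_time) ** 2))

def weighted_ensemble_predict(query, sub_models, h_space=1.0, h_time=1.0):
    """Combine sub-model probabilities using normalized space-time weights.

    sub_models is a list of (center, prob_fn) pairs, where center is an
    (x, y, t) tuple and prob_fn(query) returns a class probability.
    Returns the combined probability and the normalized weights.
    """
    raw = [st_weight(query, center, h_space, h_time) for center, _ in sub_models]
    total = sum(raw)
    if total == 0.0:
        # Every sub-model is too far away: fall back to equal weights.
        weights = [1.0 / len(sub_models)] * len(sub_models)
    else:
        weights = [w / total for w in raw]
    prob = sum(w * fn(query) for w, (_, fn) in zip(weights, sub_models))
    return prob, weights

# Two stand-in sub-models, each "trained" around a different place and time.
sub_models = [
    ((0.0, 0.0, 0.0), lambda q: 0.9),    # nearby sub-model
    ((10.0, 10.0, 5.0), lambda q: 0.2),  # distant sub-model
]
prob, weights = weighted_ensemble_predict((0.0, 0.0, 0.0), sub_models)
```

In this sketch the normalized weights double as a simple relevance readout: they indicate which local sub-model dominated the prediction at a given location and time.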

Metadata

Work Title Interpretable machine learning for analysing heterogeneous drivers of geographic events in space-time
Access
Open Access
Creators
  1. Arif Masrur
  2. Manzhu Yu
  3. Prasenjit Mitra
  4. Donna Jean Peuquet
  5. Alan H Taylor
Keyword
  1. Spatial modeling
  2. Machine learning interpretability
  3. Spatial heterogeneity
  4. Random forest
  5. Decision tree
  6. Wildfire
License CC BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivatives)
Work Type Article
Acknowledgments
  1. We acknowledge the Penn State Center for Security Research and Education (CSRE) and Information Sciences and Technology (IST) for providing seed grants to conduct this study.
Publisher
  1. Taylor & Francis
Publication Date August 31, 2021
Subject
  1. Geographic Information Science
Language
  1. English
DOI doi:10.26207/mj2j-7391
Publisher Identifier (DOI)
  1. https://doi.org/10.1080/13658816.2021.1965608
Geographic Area
  1. Continental U.S.
Deposited September 05, 2021

Work History

Version 1
published

  • Created
  • Added Creator Arif Masrur
  • Added manuscript_non-anonymous.pdf
  • Added Supplementary information-iST-RF.pdf
  • Updated Publication Date, License
    Publication Date
    • 2021-09-05
    License
    • https://opensource.org/licenses/MIT
  • Added Creator Manzhu Yu
  • Updated Keyword, Geographic Area
    Keyword
    • Spatial modeling, Machine learning interpretability, Spatial heterogeneity, Random forest, Decision tree, Wildfire
    Geographic Area
    • United States
    • Continental U.S.
  • Updated Acknowledgments
  • Updated Description
  • Updated License
    License
    • https://opensource.org/licenses/MIT
    • https://creativecommons.org/licenses/by-nc-nd/4.0/
  • Updated Description
  • Updated Publication Date
    Publication Date
    • 2021-09-05
    • 2021-08-31
  • Published
  • Added Creator Prasenjit Mitra
  • Added Creator Donna Jean Peuquet
  • Added Creator Alan H Taylor
  • Updated