A multimodal attention-based model for tree species classification using LiDAR and satellite imagery (Papers Track)

Hadrien Sablon (PG&E); Rajen Bajgain (PG&E)

Topics: Forests; Computer Vision & Remote Sensing

Abstract

Accurate mapping of tree species is crucial for wildfire mitigation, biodiversity conservation, and sustainable forest management under climate change. While advances in remote sensing and deep learning have improved species classification, the scarcity of high-quality ground truth data, low-resolution sensors, and small study areas with limited species diversity hinder scalability and generalization. To address these limitations, we assembled a dataset of half a million data points from five distinct level-III ecoregions in California. Ground truth labels across more than 20 species were obtained from arborist-supported tree inventories. We developed three deep learning models: a LiDAR-derived depth-view model (DVM) that exploits structural characteristics, a satellite-based surface reflectance model (SRM) that leverages spectral information, and a novel multimodal framework (MXAT) built with attention mechanisms to learn interdependencies between these complementary data modalities. Tested across 20 tree taxa, the DVM achieves strong overall performance (mean sensitivity = 57.5%). In contrast, the SRM exhibits limited predictive capability across most species but excels at identifying specific species such as Monterey Pine and Coast Redwood. Despite this performance gap, our findings show that LiDAR-based representations benefit substantially from integrating multispectral data: MXAT surpasses the DVM baseline by nearly 5%. These results demonstrate the effectiveness of well-structured multimodal architectures in leveraging the complementary strengths of LiDAR and satellite imagery at scale.
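The abstract does not specify how MXAT's attention mechanism fuses the two modalities. The sketch below is a minimal, illustrative PyTorch implementation of one plausible reading: depth-view tokens from a LiDAR encoder query surface-reflectance tokens via cross-attention before classification. The class name `CrossModalAttentionFusion`, the token shapes, and all dimensions are assumptions for illustration, not the authors' architecture.

```python
# Hypothetical sketch of attention-based multimodal fusion; not the MXAT code.
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    """Fuse LiDAR depth-view features with satellite surface-reflectance
    features via cross-attention. Layer choices are illustrative assumptions."""

    def __init__(self, dim: int = 256, num_heads: int = 4, num_classes: int = 20):
        super().__init__()
        # Depth-view tokens attend to surface-reflectance tokens.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, depth_tokens: torch.Tensor, sr_tokens: torch.Tensor) -> torch.Tensor:
        # depth_tokens: (B, N_d, dim) from a LiDAR depth-view encoder
        # sr_tokens:    (B, N_s, dim) from a surface-reflectance encoder
        fused, _ = self.cross_attn(query=depth_tokens, key=sr_tokens, value=sr_tokens)
        fused = self.norm(depth_tokens + fused)   # residual connection
        pooled = fused.mean(dim=1)                # average-pool over tokens
        return self.classifier(pooled)            # per-species logits


# Example usage with random tensors standing in for encoder outputs.
model = CrossModalAttentionFusion()
depth = torch.randn(8, 16, 256)   # batch of 8, 16 depth-view tokens each
sr = torch.randn(8, 9, 256)       # 9 spectral tokens per sample
logits = model(depth, sr)         # shape: (8, 20)
```

Using the structural (LiDAR) stream as the query is one natural choice given the paper's finding that the DVM is the stronger single-modality baseline; the reverse direction, or bidirectional attention, would be equally valid variants under these assumptions.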