ATLAS: A spend classification benchmark for estimating scope 3 carbon emissions (Papers Track) Spotlight
Andrew Dumit (Watershed Technology, Inc.); Krishna Rao (Watershed Technology, Inc.); Travis Kwee (Watershed Technology, Inc.); Varsha Gopalakrishnan (Watershed Technology, Inc.); Katherine Tsai (Watershed Technology, Inc.); Sangwon Suh (Watershed Technology, Inc.)
Abstract
The majority (70%) of companies reporting their value chain emissions rely on financial spend ledgers and per-dollar emissions factors. Accurate classification of expenditures to emissions factors is critical but complex, given the sheer number of line items and the diversity of how they are categorized and described. This is an area where Large Language Models (LLMs) can play a key role. However, there is currently no benchmark dataset to evaluate the performance of LLM-based solutions. Here, we introduce the Aggregate Transaction Ledgers for Accounting Sustainability dataset, or ATLAS, along with initial evaluation results for four baseline models. ATLAS is the first spend classification benchmark and comprises 10,000 synthetic, labeled spend items reflecting the distribution of corporate expenditures. The best of the four models achieves a top-1 accuracy of 57.3% and a top-3 accuracy of 72.2%. ATLAS enables systematic evaluation of LLMs for spend classification. Our results provide a starting point for advancing automated carbon accounting and sustainability reporting for spend-based emissions.
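To make the reported metrics concrete, the following is a minimal sketch of how top-1 and top-3 accuracy can be computed for a spend classifier that returns a ranked list of candidate categories per line item. The function name `top_k_accuracy`, the category names, and the toy predictions are all hypothetical illustrations; they are not taken from ATLAS or from the authors' evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of items whose true label is among the first k predictions."""
    if len(ranked_predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one labeled item is required")
    hits = sum(
        1
        for preds, label in zip(ranked_predictions, labels)
        if label in preds[:k]
    )
    return hits / len(labels)


# Hypothetical ranked outputs for four spend line items (best guess first).
ranked = [
    ["software", "it_services", "telecom"],
    ["air_travel", "hotels", "rail"],
    ["office_supplies", "furniture", "paper"],
    ["legal_services", "consulting", "accounting"],
]
# Hypothetical ground-truth categories for the same four items.
gold = ["software", "hotels", "paper", "advertising"]

top1 = top_k_accuracy(ranked, gold, k=1)  # only the first item is correct at rank 1
top3 = top_k_accuracy(ranked, gold, k=3)  # the first three items are correct within rank 3
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

Top-3 accuracy is always at least as high as top-1 accuracy, since a correct first-ranked prediction also counts as a hit within the first three.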