Large Language Models as a New Modality for Generalizable Earth Data Monitoring (Papers Track)
Tong Nie (Tongji University); Junlin He (The Hong Kong Polytechnic University); Wei Ma (The Hong Kong Polytechnic University)
Abstract
Earth observation data are critical for monitoring progress toward the Sustainable Development Goals (SDGs), yet persistent challenges in data accessibility, multimodal data integration, and geographic bias hinder comprehensive global assessments. Satellite imagery paired with machine learning (SIML) offers cost-effective monitoring, but it struggles to capture socioeconomic indicators and suffers from data inequity and spatial biases. This paper presents a framework that uses large language models (LLMs) as a complementary modality to address these limitations. We extract geospatial knowledge from pretrained LLMs through structured prompting, which encodes geographic coordinates into rich, task-agnostic embeddings. Diverse Earth monitoring indicators can then be predicted from these embeddings with linear regression. Evaluated on 25 global tasks ranging from climate metrics (e.g., temperature) to socioeconomic variables (e.g., poverty rates), our method outperforms state-of-the-art SIML approaches in both accuracy and sample efficiency. Notably, LLM-derived representations exhibit less geographic bias than existing methods and inherently capture socioeconomic context, forming semantically meaningful clusters aligned with regional development patterns.
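The pipeline described in the abstract (coordinates, then a structured prompt, then an LLM embedding, then one linear regressor per indicator) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The prompt template in `build_prompt` is hypothetical, since the abstract does not give the actual prompt. `embed` is a placeholder for an LLM embedding (for example, a pooled hidden state). Here it returns deterministic pseudo-random features so the sketch runs without a model, and it carries no geographic knowledge. Ridge regression is used as the linear model, and the targets are synthetic.

```python
import numpy as np


def build_prompt(lat: float, lon: float) -> str:
    # Hypothetical structured prompt; the paper's actual template is not given here.
    return (
        f"Coordinates: ({lat:.4f}, {lon:.4f})\n"
        "Describe the geography, climate, and socioeconomic context of this location."
    )


def embed(prompt: str, dim: int = 64) -> np.ndarray:
    # PLACEHOLDER for an LLM embedding of the prompt. Features are pseudo-random
    # but deterministic (seeded from the prompt text), so identical prompts map
    # to identical vectors. Swap in a real model to obtain geographic knowledge.
    seed = sum(ord(c) * (i + 1) for i, c in enumerate(prompt)) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1e-3):
    # Closed-form ridge regression with an unpenalized intercept.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    return X @ w + b


def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Embed each location once; the embeddings are shared by every task.
rng = np.random.default_rng(0)
coords = rng.uniform([-60.0, -180.0], [70.0, 180.0], size=(200, 2))
X = np.stack([embed(build_prompt(lat, lon)) for lat, lon in coords])

# Two hypothetical indicators with synthetic, linearly generated targets.
true_w = rng.standard_normal((X.shape[1], 2))
Y = X @ true_w + 0.01 * rng.standard_normal((200, 2))

# Fit a separate cheap linear head per task on a train split; score on held-out points.
r2_scores = []
for t in range(Y.shape[1]):
    w, b = fit_ridge(X[:150], Y[:150, t])
    r2_scores.append(r2(Y[150:, t], predict(X[150:], w, b)))
```

The design point the sketch isolates is that the expensive step (querying the LLM) happens once per location and is independent of the task. Each new indicator then costs only a small linear fit on the shared embeddings.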