Composing Open-domain Vision with RAG for Ocean Monitoring and Conservation (Proposals Track)

Sepand Dyanatkar (OnDeck Fisheries AI); Angran Li (OnDeck Fisheries AI); Alexander Dungate (OnDeck Fisheries AI)

NeurIPS 2024
Computer Vision & Remote Sensing; Earth Observation & Monitoring; Ecosystems & Biodiversity; Unsupervised & Semi-Supervised Learning

Abstract

Climate change's destruction of marine biodiversity threatens communities and economies around the world that rely on healthy oceans for their livelihoods. Applying computer vision to niche, real-world domains such as ocean conservation is challenging because the environments are dynamic and diverse, and traditional top-down learning approaches struggle with long-tailed distributions, generalization, and domain transfer. Scalable species identification for ocean monitoring is particularly difficult because models must adapt to new environments and identify rare or previously unseen species. To overcome these limitations, we propose leveraging bottom-up, open-domain learning frameworks, specifically vision-language models (VLMs) combined with retrieval-augmented generation (RAG), as a resilient, scalable solution for image and video analysis in marine applications. We validate this approach through a preliminary application, classifying fish in video captured onboard fishing vessels, and demonstrate strong emergent retrieval and prediction capabilities without domain-specific training or prior knowledge of the task itself.
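The abstract does not describe the pipeline's internals. The following is only a minimal sketch, assuming a generic retrieval-augmented classification step, to illustrate how retrieval can support predictions without domain-specific training. It is not the authors' implementation. It assumes image embeddings come from some frozen, pretrained vision encoder (not shown). The names `Reference`, `retrieve`, `build_context`, and `vote`, the species labels, and the toy three-dimensional vectors are all hypothetical. The similarity-weighted `vote` function stands in for the step where a VLM would reason over the retrieved context.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Reference:
    """A labeled reference image, represented only by its embedding (hypothetical)."""
    label: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: list[float], gallery: list[Reference], k: int = 3) -> list[tuple[Reference, float]]:
    """Return the k gallery entries most similar to the query embedding, best first."""
    scored = [(ref, cosine(query, ref.embedding)) for ref in gallery]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_context(hits: list[tuple[Reference, float]]) -> str:
    """Format retrieved examples as text that could be added to a VLM prompt (assumed step)."""
    lines = [f"- similar reference: {ref.label} (similarity {score:.2f})" for ref, score in hits]
    return "Retrieved examples:\n" + "\n".join(lines)


def vote(hits: list[tuple[Reference, float]]) -> str:
    """Stand-in for the VLM's prediction: similarity-weighted vote over retrieved labels."""
    weights: Counter = Counter()
    for ref, score in hits:
        weights[ref.label] += score
    return weights.most_common(1)[0][0]


# Toy gallery with made-up embeddings; a real system would embed reference images.
gallery = [
    Reference("albacore tuna", [0.9, 0.1, 0.0]),
    Reference("albacore tuna", [0.8, 0.2, 0.1]),
    Reference("blue shark", [0.1, 0.9, 0.2]),
    Reference("mahi-mahi", [0.0, 0.2, 0.9]),
]
query = [0.85, 0.15, 0.05]  # embedding of a new, unlabeled video frame (made up)

hits = retrieve(query, gallery, k=3)
print(build_context(hits))
print("prediction:", vote(hits))
```

Because the reference gallery is just data, adding a newly encountered species in this sketch means appending labeled examples rather than retraining a model. This is the property the abstract attributes to retrieval-based, open-domain approaches.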