DeepMyco - Dataset Generation for Dye Mycoremediation (Proposals Track)

Danika Gupta (The Harker Upper School)

NeurIPS 2024
Heavy Industry and Manufacturing; Natural Language Processing

Abstract

Textile dyes are estimated to account for 20% of global water pollution. Mycoremediation, a promising approach that uses cheap, naturally growing fungi, has not yet been deployed at scale. Although numerous studies report benefits, it is difficult to apply each study's specific findings to the particular combination of environmental factors present at a given physical site. We believe machine learning can help fill this gap once suitable datasets become available. To drive machine learning research in mycoremediation, we propose contributing a comprehensive dataset, built by using advanced language models and vision transformers to extract and categorize experimental data from published research papers. This dataset would enable ML-driven work on matching fungi to specific dye types, optimizing remediation processes, and scaling up mycoremediation efforts.
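The abstract does not specify the dataset schema or extraction format. As a minimal sketch, assuming the language-model extraction step emits one JSON object per reported experiment, the Python below validates such an object, normalizes its temperature to degrees Celsius, and assigns a coarse dye class. The field names, the `DyeExperiment` record, the helpers `parse_extraction`, `categorize_dye`, and `to_celsius`, and the small dye-class lookup are all hypothetical illustrations, not the authors' pipeline.

```python
import json
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL record schema for one extracted experiment; field names are illustrative.
@dataclass
class DyeExperiment:
    fungus: str                      # species name as reported in the paper
    dye: str                         # dye name as reported in the paper
    dye_class: str                   # coarse chemical class, or "unknown"
    ph: Optional[float]              # initial pH, if reported
    temperature_c: Optional[float]   # incubation temperature in degrees Celsius
    decolorization_pct: float        # reported decolorization, 0-100

# HYPOTHETICAL, deliberately tiny lookup; a real dataset would need a curated dye table.
DYE_CLASS_KEYWORDS = {
    "azo": ["congo red", "methyl orange", "reactive black 5"],
    "anthraquinone": ["remazol brilliant blue r", "reactive blue 19"],
    "triphenylmethane": ["malachite green", "crystal violet"],
}

def categorize_dye(name: str) -> str:
    """Map a dye name to a coarse class by exact (case-insensitive) lookup."""
    key = name.strip().lower()
    for dye_class, names in DYE_CLASS_KEYWORDS.items():
        if key in names:
            return dye_class
    return "unknown"

def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature given in C, F, or K (optionally with a degree sign) to Celsius."""
    unit = unit.strip().upper().lstrip("°")
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0
    if unit == "K":
        return value - 273.15
    raise ValueError(f"unsupported temperature unit: {unit!r}")

def parse_extraction(raw_json: str) -> DyeExperiment:
    """Validate one JSON object assumed to come from a language-model extraction step."""
    data = json.loads(raw_json)

    # Temperature is optional; when present, it is assumed to be {"value": ..., "unit": ...}.
    temp = data.get("temperature")
    temperature_c = None
    if temp is not None:
        temperature_c = to_celsius(float(temp["value"]), temp.get("unit", "C"))

    # Reject values outside the physically meaningful percentage range.
    pct = float(data["decolorization_pct"])
    if not 0.0 <= pct <= 100.0:
        raise ValueError(f"decolorization_pct out of range: {pct}")

    ph = data.get("ph")
    return DyeExperiment(
        fungus=data["fungus"].strip(),
        dye=data["dye"].strip(),
        dye_class=categorize_dye(data["dye"]),
        ph=float(ph) if ph is not None else None,
        temperature_c=temperature_c,
        decolorization_pct=pct,
    )
```

Validating every extracted object against a fixed schema, rather than trusting raw model output, lets out-of-range or malformed values be caught before they enter the dataset. Normalizing units also makes experimental conditions reported in different papers directly comparable.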