Chemistry & Materials

Workshop Papers

Venue Title
NeurIPS 2024 AI-Driven Predictive Modeling of PFAS Contamination in Aquatic Ecosystems: Exploring A Geospatial Approach (Papers Track)

Abstract: Per- and polyfluoroalkyl substances (PFAS), a class of synthetic fluorinated compounds termed “forever chemicals”, have garnered significant attention due to their persistence, widespread environmental presence, bioaccumulative properties, and associated risks for human health. Their presence in aquatic ecosystems highlights the link between human activity and the hydrological cycle. They also disrupt aquatic life, interfere with gas exchange, and disturb the carbon cycle, contributing to greenhouse gas emissions and exacerbating climate change. Federal agencies, state governments and non-government research and public interest organizations have emphasized the need for documenting the sites and the extent of PFAS contamination. However, the time-consuming and expensive nature of data collection and analysis poses challenges. It hinders the rapid identification of locations at high risk of PFAS contamination, which may then require further sampling or remediation. To address this data limitation, our study leverages a novel geospatial dataset, machine learning models including frameworks such as Random Forest, IBM-NASA's Prithvi and UNet, and geospatial analysis to predict regions with high PFAS concentrations in surface water. Using fish data from the National Rivers and Streams Assessment (NRSA) dataset by the Environmental Protection Agency (EPA), our analysis suggests the potential value of machine learning based models for targeted deployment of sampling investigations and remediation efforts.

Authors: Jowaria Khan (University of Michigan); David Andrews (Environmental Working Group); Kaley Beins (Environmental Working Group); Sydney Evans (Environmental Working Group); Alexa Friedman (Environmental Working Group); Elizabeth Bondi-Kelly (MIT)

NeurIPS 2024 Multimodal AI framework for predicting candidate high temperature superconductors (Proposals Track)

Abstract: Materials science is at the forefront of addressing some of the most pressing challenges of our era, particularly in enhancing energy efficiency and sustainability. One of the most promising avenues in this field is the study of superconductors—materials that, when cooled below a critical temperature (Tc), exhibit zero electrical resistance. This unique property not only eliminates energy loss due to resistance but also enables a wide range of advanced technologies, such as MRI machines, magnetically levitating trains, and other high-efficiency systems. Superconductors can significantly reduce the carbon footprint of power transmission and other industrial applications. Given the complexity and importance of predicting candidate and practical high-temperature superconductors, we propose to develop a multimodal AI framework to predict new high-Tc superconducting materials. By integrating various material properties, including structural and compositional data, we seek to study patterns and relationships that could guide the discovery of new high-temperature superconductors. Success in this endeavor could significantly reduce energy losses in electrical systems, contributing to the fight against climate change.

Authors: Nidhish Sagar (Massachusetts Institute of Technology); Eslam G. Al-Sakkari (Polytechnique Montréal); Ahmed Ragab (Polytechnique Montréal)

ICLR 2024 Analyzing the secondary wastewater-treatment process using Faster R-CNN and YOLOv5 object detection algorithms (Papers Track)

Abstract: The activated sludge (AS) process is the most common type of secondary wastewater treatment, applied worldwide. Due to the complexity of microbial communities, imbalances between the different types of bacteria may occur and disturb the process, with pronounced economic and environmental consequences. Microscopic inspection of the morphology of flocs and microorganisms provides key information on AS properties and function. This is a time-consuming, highly skilled, and expensive process that is not readily available in all locations. Thus, most wastewater-treatment plants (WWTPs) do not carry out this essential analysis, resulting in frequent operational faults. In this study, we develop a novel deep learning (DL) object detection algorithm to analyze and monitor the AS process based on a unique microscopic image database of flocs and microorganisms. Specifically, we applied YOLOv5 and Faster R-CNN algorithms as tools for segmentation and object detection to analyze the wastewater. The mean average precision (mAP) of the YOLOv5 was 0.67, outperforming the Faster R-CNN by 15%. Histogram equalization preprocessing of both bright-field and phase-contrast images significantly improved the results of the algorithm in all classes. In the case of YOLOv5, the mAP increased by 16.67%, to 0.77, where the AP of protozoa, filaments, and open floc classes outperformed the previous model by over 20%. These results demonstrate the potential of leveraging DL algorithms to enhance the analysis and monitoring of WWTPs in an affordable manner, consequently reducing environmental pollution caused by contaminated effluent. The fundamental challenge addressed herein has important global relevance, especially in an era in which the demand for high-quality wastewater reuse is expected to increase dramatically.

Authors: Offir Inbar (Tel-Aviv University); Moni Shahar (Tel Aviv University); Jacob Gidron (Tel-Aviv University); Ido Cohen (Tel-Aviv University); Dror Avisar (Tel-Aviv University)
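
The histogram equalization preprocessing mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes an 8-bit grayscale image held in a NumPy array and applies plain global equalization, and the function name `equalize_histogram` is hypothetical.

```python
import numpy as np


def equalize_histogram(image: np.ndarray) -> np.ndarray:
    """Global histogram equalization for an 8-bit grayscale image.

    Assumption: the input is a uint8 array (e.g. one microscope frame
    converted to grayscale). The paper's exact preprocessing is not
    described in the abstract and may differ.
    """
    if image.dtype != np.uint8:
        raise ValueError("expected an 8-bit grayscale image")

    # Count how often each of the 256 intensity levels occurs.
    hist = np.bincount(image.ravel(), minlength=256)

    # Cumulative distribution of intensities.
    cdf = hist.cumsum()
    total = cdf[-1]
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest level present

    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return image.copy()

    # Map each level so that the output CDF is approximately linear.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)

    # Apply the lookup table to every pixel.
    return lut[image]


if __name__ == "__main__":
    # Synthetic low-contrast frame with intensities confined to 100..130.
    rng = np.random.default_rng(0)
    frame = rng.integers(100, 131, size=(64, 64), dtype=np.uint8)
    stretched = equalize_histogram(frame)
    print(frame.min(), frame.max(), "->", stretched.min(), stretched.max())
```

The lookup table spreads the intensities that are actually present over the full 0 to 255 range, which is why low-contrast bright-field or phase-contrast frames can become easier for a detector to separate into classes.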

ICLR 2024 Explaining Zeolite Synthesis-Structure Relationships using Aggregated SHAP Analysis (Papers Track)

Abstract: Zeolites, crystalline aluminosilicate materials with well-defined porous structures, have emerged as versatile materials with applications in carbon capture. Hydrothermal synthesis is a widely used method for zeolite production, offering control over crystallinity and pore size. However, the intricate interplay of synthesis parameters necessitates a comprehensive understanding to optimize the synthesis process. We train a supervised classification machine learning model on ZeoSyn (a dataset of zeolite synthesis routes) to predict the zeolite framework product given a synthesis route. Subsequently, we leverage SHapley Additive Explanations (SHAP) to reveal key synthesis-structure relationships in zeolites. To that end, we introduce an aggregation SHAP approach to extend such analysis to explain the formation of composite building units (CBUs) of zeolites. Analysis at this unprecedented scale sheds light on key synthesis parameters driving zeolite crystallization.

Authors: Elton Pan (MIT)

ICLR 2024 Literature Mining with Large Language Models to Assist the Development of Sustainable Building Materials (Papers Track)

Abstract: The concrete industry is one of the significant sources of carbon emissions, which makes its decarbonization urgent and requires a shift to alternative materials. However, the absence of a systematic knowledge summary remains a challenge for further development of sustainable building materials. This work offers a cost-efficient strategy for information extraction tasks in complex terminology settings using small (2.8B) large language models (LLMs) with well-designed instruction-completion schemes and fine-tuning strategies, introducing a dataset cataloging civil engineering applications of alternative materials. The Multiple Choice instruction scheme significantly improves model accuracies in entity inference from non-Noun-Phrase sources, with supervised fine-tuning benefiting from straightforward tokenized representations of choices. We also demonstrate the utility of the dataset by extracting valuable insights into promising applications of alternative materials from knowledge graph representations.

Authors: Yifei Duan (Massachusetts Institute of Technology); Yixi Tian (Massachusetts Institute of Technology); Soumya Ghosh (IBM Research); Richard Goodwin (IBM T.J. Watson Research Center); Vineeth Venugopal (Massachusetts Institute of Technology); Jeremy Gregory (Massachusetts Institute of Technology); Jie Chen (IBM Research); Elsa Olivetti (Massachusetts Institute of Technology)

NeurIPS 2023 Scaling Sodium-ion Battery Development with NLP (Papers Track)

Abstract: Sodium-ion batteries (SIBs) have been gaining attention for applications like grid-scale energy storage, largely owing to the abundance of sodium and an expected favorable $/kWh figure. SIBs can leverage the well-established manufacturing knowledge of Lithium-ion Batteries (LIBs), but several materials synthesis and performance challenges for electrode materials need to be addressed. This work extracts a large database of challenges restricting the performance and synthesis of SIB cathode active materials (CAMs) and pairs them with corresponding mitigation strategies from the SIB literature by employing custom natural language processing (NLP) tools. The derived insights enable scientists in research and industry to navigate a large number of proposed strategies and focus on impactful scalability-informed mitigation strategies to accelerate the transition from lab to commercialization.

Authors: Mrigi Munjal (Massachusetts Institute of Technology); Thorben Pein (TU Munich); Vineeth Venugopal (Massachusetts Institute of Technology); Kevin Huang (Massachusetts Institute of Technology); Elsa Olivetti (Massachusetts Institute of Technology)

NeurIPS 2023 Predicting Adsorption Energies for Catalyst Screening with Transfer Learning Using Crystal Hamiltonian Graph Neural Network (Proposals Track)

Abstract: As the world moves towards a clean energy future to mitigate the risks of climate change, the discovery of new catalyst materials plays a significant role in enabling the sustainable production and transformation of energy [2]. The development and verification of fast, accurate, and efficient artificial intelligence and machine learning techniques is critical to shortening time-intensive calculations, reducing costs, and improving computational feasibility. We propose applying the Crystal Hamiltonian Graph Neural Network (CHGNet) on the OC20 dataset in order to iteratively perform structure-to-energy and forces calculations and identify the lowest energy across relaxed structures for a given adsorbate-surface combination. CHGNet's predictions will be compared and benchmarked to corresponding values calculated by density functional theory (DFT) [7] and other models to determine its efficacy.

Authors: Angelina Chen (Foothill College/Lawrence Berkeley National Lab); Hui Zheng (Lawrence Berkeley National Lab); Paula Harder (Mila)

ICLR 2023 Graph Neural Network Generated Metal-Organic Frameworks for Carbon Capture (Proposals Track)

Abstract: The level of carbon dioxide (CO2) in our atmosphere is rapidly rising and is projected to double today's levels to reach 1,000 ppm by 2100 under certain scenarios, primarily driven by anthropogenic sources. Technology that can capture CO2 from anthropogenic sources, remove it from the atmosphere and sequester it at the gigaton scale by 2050 is required to stop and reverse the impact of climate change. Metal-organic frameworks (MOFs) have been a promising technology in various applications including gas separation as well as CO2 capture from point-source flue gases or removal from the atmosphere. MOFs offer unmatched surface area through their highly porous crystalline structure, and MOF technology has the potential to become a leading adsorption-based CO2 separation technology providing high surface area, structure stability and chemical tunability. Due to its complex structure, MOF crystal structure (atoms and bonds) cannot be easily represented in tabular format for machine learning (ML) applications, whereas graph neural networks (GNN) have already been explored in representation of simpler chemical molecules. In addition to difficulty in MOF data representation, an infinite number of combinations can be created for MOF crystals, which makes ML applications more suitable to alleviate dependency on subject matter experts (SME) than conventional computational methods. In this work, we propose training of GNNs in a variational autoencoder (VAE) setting to create an end-to-end workflow for the generation of new MOF crystal structures directly from the data within the crystallographic information files (CIFs) and conditioned by additional CO2 performance values.

Authors: Zikri Bayraktar (Schlumberger Doll Research); Shahnawaz Molla (Schlumberger Doll Research); Sharath Mahavadi (Schlumberger Doll Research)

NeurIPS 2022 AutoML for Climate Change: A Call to Action (Papers Track)

Abstract: The challenge that climate change poses to humanity has spurred a rapidly developing field of artificial intelligence research focused on climate change applications. The climate change ML (CCML) community works on a diverse, challenging set of problems which often involve physics-constrained ML or heterogeneous spatiotemporal data. It would be desirable to use automated machine learning (AutoML) techniques to automatically find high-performing architectures and hyperparameters for a given dataset. In this work, we benchmark popular AutoML libraries on three high-leverage CCML applications: climate modeling, wind power forecasting, and catalyst discovery. We find that out-of-the-box AutoML libraries currently fail to meaningfully surpass the performance of human-designed CCML models. However, we also identify a few key weaknesses, which stem from the fact that most AutoML techniques are tailored to computer vision and NLP applications. For example, while dozens of search spaces have been designed for image and language data, none have been designed for spatiotemporal data. Addressing these key weaknesses can lead to the discovery of novel architectures that yield substantial performance gains across numerous CCML applications. Therefore, we present a call to action to the AutoML community, since there are a number of concrete, promising directions for future work in the space of AutoML for CCML. We release our code and a list of resources at https://github.com/climate-change-automl/climate-change-automl.

Authors: Renbo Tu (University of Toronto); Nicholas Roberts (University of Wisconsin-Madison); Vishak Prasad C (Indian Institute Of Technology, Bombay); Sibasis Nayak (Indian Institute of Technology, Bombay); Paarth Jain (Indian Institute of Technology Bombay); Frederic Sala (University of Wisconsin-Madison); Ganesh Ramakrishnan (IIT Bombay); Ameet Talwalkar (CMU); Willie Neiswanger (Stanford University); Colin White (Abacus.AI)

ICML 2021 A multi-task learning approach to enhance sustainable biomolecule production in engineered microorganisms (Proposals Track)

Abstract: A sustainable alternative to sourcing many materials humans need is metabolic engineering: a field that aims to engineer microorganisms into biological factories that convert renewable feedstocks into valuable biomolecules (e.g., jet fuel, medicine). Microorganism factories must be genetically optimized using predictable DNA sequence tools; however, for many organisms, the exact DNA sequence signals defining their genetic control systems are poorly understood. To better decipher these DNA signals, we propose a multi-task learning approach that uses deep learning and feature attribution methods to identify DNA sequence signals that control gene expression in the methanotroph M. buryatense. This bacterium consumes methane, a potent greenhouse gas. If successful, this work would enhance our ability to build gene expression tools to more effectively engineer M. buryatense into an efficient biomolecule factory that can divert methane pollution into valuable, everyday materials.

Authors: Erin Wilson (University of Washington); Mary Lidstrom (University of Washington); David Beck (University of Washington)