TweetDrought: A Deep-Learning Drought Impacts Recognizer based on Twitter Data (Papers Track)
Beichen Zhang (University of Nebraska-Lincoln); Frank Schilder (Thomson Reuters); Kelly Smith (National Drought Mitigation Center); Michael Hayes (University of Nebraska-Lincoln); Sherri Harms (University of Nebraska-Kearney); Tsegaye Tadesse (University of Nebraska-Lincoln)
Abstract
A better understanding of drought impacts is becoming increasingly vital under a warming climate. Traditional drought indices mainly describe biophysical variables rather than impacts on social, economic, and environmental systems. We used natural language processing and transfer learning based on Bidirectional Encoder Representations from Transformers (BERT): we fine-tuned the model on data from the news-based Drought Impact Report (DIR) and then applied it to recognize seven types of drought impacts in filtered Twitter data from the United States. The model achieved a satisfactory macro-F1 score of 0.89 on the DIR test set. It was then applied to California tweets and validated against keyword-based labels, yielding a macro-F1 score of 0.58. Because keyword-based labels are limited, we also spot-checked tweets with conflicting labels; in that check, 83.5% of the BERT labels were correct compared with the keyword labels. Overall, the fine-tuned BERT-based recognizer provided reasonable predictions and valuable information on drought impacts. The interpretation and analysis of the model were consistent with experience-based domain expertise.
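Since both reported results are macro-F1 scores, a minimal sketch of how such a score can be computed may help in reading them. The sketch assumes that each text can carry several impact labels and that a class with no true or predicted instances scores 0; the abstract specifies neither. The category names and the toy data are hypothetical and are not taken from the DIR or the Twitter data.

```python
def macro_f1(true_labels, pred_labels, classes):
    """Unweighted mean of per-class F1 scores.

    true_labels, pred_labels: lists of sets, one set of labels per text.
    classes: the label names to average over.
    """
    scores = []
    for c in classes:
        pairs = list(zip(true_labels, pred_labels))
        tp = sum(1 for t, p in pairs if c in t and c in p)      # true positives
        fp = sum(1 for t, p in pairs if c not in t and c in p)  # false positives
        fn = sum(1 for t, p in pairs if c in t and c not in p)  # false negatives
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted instances scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical impact categories and toy labels, for illustration only.
classes = ["agriculture", "fire", "water"]
true = [{"agriculture"}, {"fire"}, {"water", "agriculture"}, {"fire"}]
pred = [{"agriculture"}, {"water"}, {"water"}, {"fire"}]

score = macro_f1(true, pred, classes)
print(f"macro-F1 = {score:.3f}")
```

Because every class contributes equally to the mean, a rare impact type that is recognized poorly lowers the score as much as a common one would.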