I work in Natural Language Processing with a particular interest in commonsense reasoning. My PhD research focuses on ways to diagnose current systems' reasoning capabilities beyond what a single performance score can tell us.
(2022) I created a crowdsourced, semi-structured explanation dataset for a commonsense reasoning benchmark, which can be used to train explanation-generating models or to compare against existing knowledge bases. Available here!
(2024) I also built a dataset of explanations with fine-grained quality ratings, both to aid the development of automatic explanation evaluation metrics and to analyze the behavior of large language models as evaluators. Available here!
Going forward, I hope to work on developing more robust, transparent, and reliable AI systems capable of reasoning, with increasingly sophisticated evaluation methods to match.