Network Externality Pricing
Dynamic pricing allows rideshare applications to vary prices based on spatio-temporal demand fluctuations. However, these prices do not account for network externalities: prices set too high or too low can have effects on utility that propagate across the network. I'm working on incorporating network externalities into rideshare pricing, using approximate online matching algorithms to maximize profit non-myopically.
Data collection for Noun Phrase Linking
Named entity linking and coreference resolution both describe relationships within a document. Noun phrase linking combines the two tasks, but it is more challenging because of anaphoric references. To address this, I'm collecting data to train a noun phrase linker, which could improve downstream tasks such as question answering. To make data collection more efficient, I'm leveraging tools from mechanism design to get the most out of each annotator.
Automatic Data Imputation
The rise of neural models has created a need for large volumes of high-quality data. However, large datasets are often noisy, with some entries incoherent or erroneous. I'm working on automatic methods to detect such data, specifically for discovering incongruous inputs in survey data.
Improved Human-AI Collaboration
Humans and AI work together in a variety of fields, including healthcare and autonomous driving. In these settings, it's important for the AI to understand the strengths and weaknesses of the human it works with. We worked on improving learning-to-defer algorithms by incorporating fine-tuning, so that the AI is adapted to a particular human's strengths and weaknesses.
Fairness in Rideshare
Rideshare platforms use AI-based matching algorithms to pair riders with drivers. These algorithms typically aim to maximize profit, which can have unintended consequences, such as underserving minority riders. We worked on developing improved matching algorithms that account for various definitions of fairness for both riders and drivers.
Toxicity in Open Source Communities
In open source communities such as GitHub, toxicity and burnout are major impediments to projects. We studied the prevalence of toxicity in these communities by designing a classifier that uses log-odds with a Dirichlet prior to account for jargon. We used the classifier to find patterns over time and across language-based subcommunities.
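The log-odds-with-Dirichlet-prior idea can be illustrated with the standard weighted log-odds ratio under an informative Dirichlet prior, which scores how strongly each word is associated with one corpus versus another. This is a minimal sketch, not the classifier we built: the function name `log_odds_dirichlet`, the toy "toxic" and "civil" corpora, the small `floor` pseudo-count, and the choice of prior (here the pooled counts, standing in for a background corpus of the community's own text so that common jargon is expected rather than flagged) are all assumptions.

```python
import math
from collections import Counter


def log_odds_dirichlet(counts_a, counts_b, prior, floor=0.01):
    """Weighted log-odds ratio with an informative Dirichlet prior.

    counts_a, counts_b: word -> count for the two corpora being compared.
    prior: word -> pseudo-count alpha_w (hypothetically, counts from a
        background corpus of the community's text, so frequent jargon
        is treated as expected).
    floor: small pseudo-count added to every word (an assumption) so
        words absent from the prior never produce log(0).
    Returns word -> z-score; positive means more associated with corpus A.
    """
    vocab = set(counts_a) | set(counts_b)
    alpha = {w: prior.get(w, 0) + floor for w in vocab | set(prior)}
    a0 = sum(alpha.values())
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())

    scores = {}
    for w in vocab:
        aw = alpha[w]
        ya, yb = counts_a.get(w, 0), counts_b.get(w, 0)
        # Log-odds of w in each corpus, smoothed by the prior.
        delta = (math.log((ya + aw) / (n_a + a0 - ya - aw))
                 - math.log((yb + aw) / (n_b + a0 - yb - aw)))
        # Approximate variance of delta; dividing gives a z-score.
        variance = 1.0 / (ya + aw) + 1.0 / (yb + aw)
        scores[w] = delta / math.sqrt(variance)
    return scores


# Hypothetical toy corpora.
toxic = Counter("you idiot this is idiot code".split())
civil = Counter("thanks this patch fixes the code".split())
scores = log_odds_dirichlet(toxic, civil, toxic + civil)
```

Here "idiot" receives a positive score and "thanks" a negative one, while "this", which appears equally often in both equally sized corpora, scores zero. Dividing by the standard deviation is what keeps rare words from dominating the ranking.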
Matrix Factorization for Mutational Signatures
Cancer develops through various processes, each of which can leave a different mutational imprint on the genome. Matrix factorization is one computational tool for extracting these processes: it decomposes mutation data into mutational signatures. We looked at incorporating external sources of information, such as patient metadata, into matrix factorization via graph regularization.
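One common way to add graph regularization to matrix factorization is graph-regularized NMF with multiplicative updates, which penalizes patients connected in a metadata graph for having dissimilar exposures. The sketch below minimizes ||X - S E^T||_F^2 + lam * tr(E^T L E), where L = D - W is the graph Laplacian of a patient affinity matrix W. It is an illustration under assumptions, not necessarily the formulation we used: the function name `graph_regularized_nmf`, the value of `lam`, the synthetic data, and building W from a shared metadata label (for example, a tissue type) are all hypothetical.

```python
import numpy as np


def graph_regularized_nmf(X, W, k, lam=1.0, n_iter=200, seed=0, eps=1e-10):
    """Graph-regularized NMF via multiplicative updates (a sketch).

    X: (n_categories, n_patients) nonnegative mutation counts.
    W: (n_patients, n_patients) symmetric nonnegative patient affinities
       (hypothetically derived from patient metadata).
    Returns signatures S (n_categories, k), exposures E (n_patients, k),
    and the objective value after each iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    S = rng.random((m, k)) + eps
    E = rng.random((n, k)) + eps
    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian

    losses = []
    for _ in range(n_iter):
        # Signature update: standard NMF multiplicative rule.
        S *= (X @ E) / (S @ (E.T @ E) + eps)
        # Exposure update: the W and D terms pull neighbours together.
        E *= (X.T @ S + lam * (W @ E)) / (E @ (S.T @ S) + lam * (D @ E) + eps)
        loss = (np.linalg.norm(X - S @ E.T, "fro") ** 2
                + lam * np.trace(E.T @ L @ E))
        losses.append(loss)
    return S, E, losses


# Synthetic data: 96 mutation categories, 20 patients, 3 processes.
rng = np.random.default_rng(1)
X = rng.random((96, 3)) @ rng.random((20, 3)).T
# Hypothetical metadata: patients sharing a group label are connected.
groups = np.repeat([0, 1], 10)
W = (groups[:, None] == groups[None, :]).astype(float)
np.fill_diagonal(W, 0.0)
S, E, losses = graph_regularized_nmf(X, W, k=3, lam=0.1)
```

Because every update multiplies by a ratio of nonnegative terms, S and E stay nonnegative. Setting `lam=0` recovers plain NMF, which makes it easy to check how much the metadata graph changes the recovered signatures.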