## About

I am a postdoctoral researcher in the Computing + Mathematical Sciences Department at the California Institute of Technology, working with Anima Anandkumar. I received my Ph.D. from Princeton University, where I was advised by Jia Deng and also worked with Olga Russakovsky and Danqi Chen.

My research aims to make machine learning capable of symbolic reasoning. I have approached the goal from two angles: (1) applying machine learning to symbolic reasoning tasks, such as automated theorem proving; (2) introducing symbolic components into machine learning models to make them more interpretable, verifiable, and data-efficient.

I also work on constructing and analyzing machine learning datasets, focusing on fairness, privacy, and mitigating dataset bias. Together, these efforts contribute to a future of machine learning we can trust, even in real-world, high-stakes applications.

[kaiyuy (at) caltech.edu] [Bio] [CV]

## Publications

Learning Symbolic Rules for Reasoning in Quasi-Natural Language

**Kaiyu Yang** and Jia Deng

*Under review*

[paper]

Generating Natural Language Proofs with Verifier-Guided Search

**Kaiyu Yang**, Jia Deng, and Danqi Chen

*Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022, Oral*

[paper] [code] [slides] [poster]

A Study of Face Obfuscation in ImageNet

**Kaiyu Yang**, Jacqueline Yau, Li Fei-Fei, Jia Deng, and Olga Russakovsky

*International Conference on Machine Learning (ICML), 2022*

[paper] [code] [slides] [talk] [poster] [project]

Strongly Incremental Constituency Parsing with Graph Neural Networks

**Kaiyu Yang** and Jia Deng

*Neural Information Processing Systems (NeurIPS), 2020*

[paper] [code] [slides] [talk] [poster]

Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D

Ankit Goyal, **Kaiyu Yang**, Dawei Yang, and Jia Deng

*Neural Information Processing Systems (NeurIPS), 2020, Spotlight*

[paper] [code]

Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy

**Kaiyu Yang**, Klint Qinami, Li Fei-Fei, Jia Deng, and Olga Russakovsky

*Conference on Fairness, Accountability, and Transparency (FAT\*), 2020*

[paper] [slides] [talk] [blog] [media]

Learning to Prove Theorems via Interacting with Proof Assistants

**Kaiyu Yang** and Jia Deng

*International Conference on Machine Learning (ICML), 2019*

[paper] [code] [slides] [poster]

SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition

**Kaiyu Yang**, Olga Russakovsky, and Jia Deng

*International Conference on Computer Vision (ICCV), 2019*

[paper] [code] [poster]

Stacked Hourglass Networks for Human Pose Estimation

Alejandro Newell, **Kaiyu Yang**, and Jia Deng

*European Conference on Computer Vision (ECCV), 2016*

[paper] [code]

## Research

**Generating Natural Language Proofs**: Deductive reasoning (drawing logical conclusions from assumptions) is a challenging problem in NLP. We address it with NLProofS (*N*atural *L*anguage *P*roof *S*earch), a novel method for generating natural language proofs: given a hypothesis and a set of supporting facts in natural language, the model generates a proof tree indicating how to deduce the hypothesis from the supporting facts.
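To make the notion of a proof tree concrete, here is a minimal Python sketch of one possible representation. The names `ProofNode`, `leaves`, and `is_well_formed`, as well as the example sentences, are hypothetical and chosen for illustration; this is not the NLProofS implementation, and the check below is purely structural (it does not judge whether each deduction step is logically valid).

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ProofNode:
    """One statement in a proof tree; `premises` are the statements it is deduced from."""
    sentence: str
    premises: List["ProofNode"] = field(default_factory=list)


def leaves(node: ProofNode) -> List[str]:
    """Collect the leaf sentences under `node`, left to right."""
    if not node.premises:
        return [node.sentence]
    collected: List[str] = []
    for premise in node.premises:
        collected.extend(leaves(premise))
    return collected


def is_well_formed(root: ProofNode, hypothesis: str, facts: Set[str]) -> bool:
    """Structural check only: the root is the hypothesis and every leaf is a supporting fact."""
    return root.sentence == hypothesis and all(s in facts for s in leaves(root))


# Hypothetical supporting facts and hypothesis.
facts = {
    "A sparrow is a bird.",
    "Birds have feathers.",
    "Animals with feathers are warm-blooded.",
}
hypothesis = "A sparrow is warm-blooded."

# An intermediate conclusion deduced from two facts.
intermediate = ProofNode(
    "A sparrow has feathers.",
    [ProofNode("A sparrow is a bird."), ProofNode("Birds have feathers.")],
)
# The hypothesis deduced from the intermediate conclusion and a third fact.
proof = ProofNode(
    hypothesis,
    [intermediate, ProofNode("Animals with feathers are warm-blooded.")],
)

print(is_well_formed(proof, hypothesis, facts))
```

In this representation, leaves correspond to supporting facts, internal nodes to intermediate conclusions, and the root to the hypothesis.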

**Learning Symbolic Rules for Reasoning**: Symbolic reasoning, rule-based symbol manipulation, is a hallmark of human intelligence. However, rule-based systems have had limited success competing with learning-based systems outside formalized domains such as automated theorem proving. We hypothesize that this is due to the manual construction of rules in past attempts. In this work, we ask how to build a rule-based system that can reason with natural language input but without the manual construction of rules. We propose *MetaQNL*, a “Quasi-Natural” language that can express both formal logic and natural language sentences, and *MetaInduce*, a learning algorithm that induces MetaQNL rules from training data.

**Face Obfuscation in ImageNet**: 997 out of 1000 categories in ImageNet are not people categories; nevertheless, many images contain incidental people, whose privacy is a concern. We first annotate faces in the dataset. Then we investigate how face blurring, a typical obfuscation technique, impacts image classification and transfer learning.

**Strongly Incremental Constituency Parsing**: Psycholinguistic research suggests that human parsing is strongly incremental: humans grow a single parse tree by adding exactly one token at each step. We propose a strongly incremental transition system for parsing named *Attach-Juxtapose*. It represents a partial sentence using a single tree, and each action adds exactly one token to the partial tree. Based on our transition system, we develop a strongly incremental parser that achieves state-of-the-art results on the Penn Treebank and Chinese Treebank.
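The idea of growing a single tree one token at a time can be illustrated with a simplified Python sketch. It assumes two actions in the spirit of the name Attach-Juxtapose: `attach` adds the new token under a node on the rightmost chain of the partial tree, and `juxtapose` places the new token beside such a node under a newly created parent. All function names, the `depth` parameter, and the toy sentence are assumptions made for illustration; the sketch omits part-of-speech tags and the learned model that chooses actions, and it should not be read as the parser's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Node:
    """An internal node of a partial parse tree; string children are tokens."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def rightmost_chain(root: Node) -> List[Node]:
    """Internal nodes on the path from the root down through last children."""
    chain: List[Node] = []
    node: Union[Node, str, None] = root
    while isinstance(node, Node):
        chain.append(node)
        node = node.children[-1] if node.children else None
    return chain


def wrap(token: str, parent_label: Optional[str]) -> Union[Node, str]:
    """Optionally put the token under a new single-child node."""
    return Node(parent_label, [token]) if parent_label else token


def attach(root: Optional[Node], depth: int, token: str,
           parent_label: Optional[str] = None) -> Node:
    """Add the token as the last child of the chain node at `depth`."""
    if root is None:  # empty tree: the first token starts the tree
        if parent_label is None:
            raise ValueError("the first token needs a parent label")
        return Node(parent_label, [token])
    rightmost_chain(root)[depth].children.append(wrap(token, parent_label))
    return root


def juxtapose(root: Node, depth: int, token: str, new_label: str,
              parent_label: Optional[str] = None) -> Node:
    """Replace the chain node at `depth` with a new node holding it and the token."""
    chain = rightmost_chain(root)
    combined = Node(new_label, [chain[depth], wrap(token, parent_label)])
    if depth == 0:
        return combined  # the new node becomes the root
    chain[depth - 1].children[-1] = combined
    return root


def to_brackets(node: Union[Node, str]) -> str:
    """Render the tree in bracket notation."""
    if isinstance(node, str):
        return node
    return "(" + node.label + " " + " ".join(to_brackets(c) for c in node.children) + ")"


# Each step consumes exactly one token of the toy sentence "The cat sleeps".
tree = attach(None, 0, "The", parent_label="NP")
tree = attach(tree, 0, "cat")
tree = juxtapose(tree, 0, "sleeps", new_label="S", parent_label="VP")
print(to_brackets(tree))  # (S (NP The cat) (VP sleeps))
```

After every action the partial sentence is still represented by one tree, which is the property the transition system is built around.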

**CoqGym**: We use machine learning to automatically prove theorems, including not only theorems in math but also theorems describing the behavior of software and hardware systems. Current theorem provers usually search for proofs represented at a low level, such as first-order logic and resolution. Therefore, they lack the high-level reasoning and problem-specific insights common to humans.

In contrast, we use a powerful set of tools called proof assistants (a.k.a. interactive theorem provers). These are software tools that assist human experts in proving theorems, and they provide a high-level framework that is close to human mathematical reasoning. We develop machine learning agents that interact with proof assistants in place of humans. Our agent can learn from human interactions by imitation learning, using a large amount of data available online. We use this data to construct a large-scale dataset for training and evaluating the agent. We also develop a baseline model that can prove many new theorems not provable by existing methods.

**Adversarial Crowdsourcing and SpatialSense**: Benchmarks in vision and language suffer from dataset bias—models can perform exceptionally well by exploiting simple cues without even looking at the image, which undermines the benchmark’s value in measuring visual reasoning abilities. We propose adversarial crowdsourcing to reduce dataset bias. Annotators are explicitly tasked with finding examples that are difficult to predict using simple cues such as 2D spatial configuration or language priors. Specifically, we introduce SpatialSense, a challenging dataset for spatial relation recognition collected via adversarial crowdsourcing.

**Minimally Contrastive Pairs and Rel3D**: We construct Rel3D: the first large-scale, human-annotated dataset for grounding spatial relations in 3D. It enables quantifying the effectiveness of 3D information in predicting spatial relations. Moreover, we propose minimally contrastive data collection—a novel crowdsourcing method for reducing dataset bias. The examples in Rel3D come in minimally contrastive pairs: two examples in a pair are almost identical but have different labels.

**Fairer and More Representative ImageNet**: Computer vision technology is being used by many but remains representative of only a few. People have reported misbehavior of computer vision models, including offensive prediction results and lower performance for underrepresented groups. Current computer vision models are typically developed using datasets consisting of manually annotated images or videos; the data and label distributions in these datasets are critical to the models’ behavior.

In this paper, we examine ImageNet, a large-scale ontology of images that has spurred the development of many modern computer vision methods. We consider three key factors within the `person` subtree of ImageNet that may lead to problematic behavior in downstream computer vision technology: (1) the stagnant concept vocabulary of WordNet, (2) the attempt at exhaustive illustration of all categories with images, and (3) the inequality of representation in the images within concepts. We seek to illuminate the root causes of these concerns and take the first steps to mitigate them constructively.

**Stacked Hourglass Networks**: We introduce the hourglass network: a novel convolutional network architecture for human pose estimation. It has become a standard component in many state-of-the-art methods for pose estimation.

## Teaching

I have been a teaching assistant for:

- COS484/584: Natural Language Processing, Princeton University
- Data Structures and Algorithms, Tsinghua University