Kaiyu Yang

I am a research scientist at Meta Fundamental AI Research (FAIR), New York. Before joining Meta, I was a postdoctoral scholar at Caltech, working with Pietro Perona, Yisong Yue, and Swarat Chaudhuri. I received my Ph.D. from Princeton University, where I was advised by Jia Deng and also worked with Olga Russakovsky and Danqi Chen.

杨凯峪 / kaiyuy [MASK] meta [MASK] com / CV / Bio / Google Scholar / GitHub

News

We're organizing the MATH-AI workshop at NeurIPS 2025 and are calling for papers. See you in San Diego!
We released Goedel-Prover-V2, the strongest open-source theorem proving model to date.
We introduced Verina: a benchmark for verifiable code generation in Lean.
We released a position paper, Formal Mathematical Reasoning: A New Frontier in AI.

Research

I work on AI for Mathematics (AI4Math), especially formal mathematical reasoning, i.e., mathematical reasoning grounded in formal systems such as Lean.

Large language models (LLMs) for formal theorem proving: [CoqGym], [LeanDojo], [LIPS], [Goedel-Prover]
Autoformalization: [LeanEuclid]
Verifiable code generation: [Verina]
AI as copilots for mathematicians: [Lean Copilot]
LLMs for solving problems in mathematics and sciences: [SciInstruct]
Test-time compute for reasoning: [NLProofS]
Neuro-symbolic machine learning: [MetaQNL], [LIPS]

Verina: Benchmarking Verifiable Code Generation
Zhe Ye, Zhengxu Yan, Jingxuan He, Timothe Kasriel, Kaiyu Yang, Dawn Song
Preprint, 2025
arXiv / project / data / code

We introduce Verina (Verifiable Code Generation Arena), a high-quality benchmark for verifiable code generation (joint generation of code, specifications, and proofs) in Lean.

Formal Mathematical Reasoning: A New Frontier in AI
Kaiyu Yang, Gabriel Poesia, Jingxuan He, Wenda Li, Kristin Lauter, Swarat Chaudhuri, Dawn Song
International Conference on Machine Learning (ICML), Position Papers Track, 2025, Spotlight
arXiv

This position paper advocates for formal mathematical reasoning, i.e., mathematical reasoning grounded in formal systems such as proof assistants. It is complementary to the informal approach (training LLMs on mathematical texts) and is arguably indispensable for advancing AI4Math to the next level.

Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning
Zenan Li*, Zhaoyu Li*, Wen Tang, Xian Zhang, Yuan Yao, Xujie Si, Fan Yang, Kaiyu Yang†, Xiaoxing Ma† († equal advising)
International Conference on Learning Representations (ICLR), 2025
arXiv / code

To prove inequalities in math competitions, we analyze and distill human techniques into scaling and rewriting, which are well-suited for symbolic methods and LLMs respectively. By integrating LLMs with domain-specific mathematical insights, our approach substantially outperforms existing methods.

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving
Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin
Conference on Language Modeling (COLM), 2025
arXiv / project (V1) / project (V2) / code

We introduce Goedel-Prover, an open-source, state-of-the-art LLM for automated theorem proving in Lean. The key is to synthesize 1.64 million formal statements through autoformalization.

Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean
Peiyang Song, Kaiyu Yang, Anima Anandkumar
International Conference on Neuro-symbolic Systems (NeuS), 2025
arXiv / code / demo / talk / media

We introduce a framework for running neural network inference directly in Lean. It enables programmers to build various LLM-based proof automation tools that integrate seamlessly into the workflow of Lean users, including tools for suggesting proof steps and completing intermediate proof goals using LLMs.

SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models
Dan Zhang, Ziniu Hu, Sining Zhoubian, Zhengxiao Du, Kaiyu Yang, Zihan Wang, Yisong Yue, Yuxiao Dong, Jie Tang
Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2024
arXiv / code

We curated SciInstruct, a diverse and high-quality dataset of college-level mathematics, physics, chemistry, and formal proofs. Using SciInstruct to finetune the ChatGLM family of LLMs, we introduce SciGLM, a suite of scientific language models for college-level mathematical/scientific reasoning.

A Survey on Deep Learning for Theorem Proving
Zhaoyu Li, Jialiang Sun, Logan Murphy, Qidong Su, Zenan Li, Xian Zhang, Kaiyu Yang, Xujie Si
Conference on Language Modeling (COLM), 2024
arXiv / code

We present the first comprehensive survey of deep learning for theorem proving and autoformalization.

Autoformalizing Euclidean Geometry
Logan Murphy*, Kaiyu Yang*, Jialiang Sun, Zhaoyu Li, Anima Anandkumar, Xujie Si (* equal contribution)
International Conference on Machine Learning (ICML), 2024
arXiv / code

We release LeanEuclid, a benchmark for testing autoformalization, consisting of Euclid's Elements (Book I) manually formalized in Lean. It is challenging for state-of-the-art LLMs like GPT-4V. Furthermore, the process of constructing LeanEuclid has uncovered intriguing ambiguities in Euclid's original works.

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar
Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2023, Oral presentation
arXiv / project / code / talk / slides / media

Can LLMs generate mathematical proofs that can be rigorously checked? We release LeanDojo: an open-source playground consisting of toolkits, benchmarks, and models for LLMs to prove formal theorems in the Lean proof assistant.

Infinite Photorealistic Worlds using Procedural Generation
Alexander Raistrick, Lahav Lipson, Zeyu Ma, Lingjie Mei, Mingzhe Wang, Yiming Zuo, Karhan Kayan, Hongyu Wen, Beining Han, Yihan Wang, Alejandro Newell, Hei Law, Ankit Goyal, Kaiyu Yang, Jia Deng
Conference on Computer Vision and Pattern Recognition (CVPR), 2023
arXiv / project / code

Data drives progress in computer vision. We introduce Infinigen: a generator of unlimited high-quality 3D data. 100% procedural, no external assets, no AI. Free and open source.

Learning Symbolic Rules for Reasoning in Quasi-Natural Language
Kaiyu Yang and Jia Deng
Transactions on Machine Learning Research (TMLR), 2023
arXiv / code

We propose MetaQNL, a symbolic "Quasi-Natural" language that can express both formal logic and natural language. Instead of manually constructing MetaQNL rules, we propose MetaInduce: an algorithm for learning rules from data.

Generating Natural Language Proofs with Verifier-Guided Search
Kaiyu Yang, Jia Deng, Danqi Chen
Empirical Methods in Natural Language Processing (EMNLP), 2022, Oral presentation
arXiv / code / slides

We introduce NLProofS (Natural Language Proof Search) for multi-step logical reasoning in natural language. Given a hypothesis and a set of supporting facts, it generates a proof tree indicating how to derive the hypothesis from supporting facts.

A Study of Face Obfuscation in ImageNet
Kaiyu Yang, Jacqueline Yau, Li Fei-Fei, Jia Deng, Olga Russakovsky
International Conference on Machine Learning (ICML), 2022
arXiv / code / slides / talk / project / media

We annotate human faces in ImageNet and obfuscate them for privacy protection. We show that face obfuscation does not hurt image classification and transfer learning.

Strongly Incremental Constituency Parsing with Graph Neural Networks
Kaiyu Yang and Jia Deng
Neural Information Processing Systems (NeurIPS), 2020
arXiv / code / slides / talk

We propose a novel transition-based constituency parser named Attach-Juxtapose, inspired by how humans perform parsing.

Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D
Ankit Goyal, Kaiyu Yang, Dawei Yang, Jia Deng
Neural Information Processing Systems (NeurIPS), 2020, Spotlight
arXiv / code

We propose Minimally Contrastive Data Collection: a novel crowdsourcing method for reducing dataset bias. We use it to construct Rel3D, the first large-scale, human-annotated dataset for grounding spatial relations in 3D.

Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy
Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, Olga Russakovsky
Conference on Fairness, Accountability, and Transparency (FAT*), 2020
arXiv / slides / talk / blog / media

We reveal and mitigate fairness issues of ImageNet, filtering its concept vocabulary and balancing its representation of various demographic groups in images.

Learning to Prove Theorems via Interacting with Proof Assistants
Kaiyu Yang and Jia Deng
International Conference on Machine Learning (ICML), 2019
arXiv / code / slides

We introduce CoqGym, one of the first and largest datasets for theorem proving in proof assistants, and ASTactic, a deep learning prover generating tactics as programs.

SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition
Kaiyu Yang, Olga Russakovsky, Jia Deng
International Conference on Computer Vision (ICCV), 2019
arXiv / code

We propose Adversarial Crowdsourcing to reduce dataset bias and use it to construct SpatialSense, a challenging dataset for recognizing spatial relations in images.

Stacked Hourglass Networks for Human Pose Estimation
Alejandro Newell, Kaiyu Yang, Jia Deng
European Conference on Computer Vision (ECCV), 2016
arXiv / code

We introduce Stacked Hourglass Networks, one of the most popular architectures for human pose estimation, object detection, and more.

Workshops & Tutorials

I'm a co-organizer of the following events:

The 5th Workshop on Mathematical Reasoning and AI @ NeurIPS 2025
The 3rd Workshop on Mathematical Reasoning and AI @ NeurIPS 2023 (Video)
Tutorial on Machine Learning for Theorem Proving @ NeurIPS 2023 (Video)

Media

My work has been covered by:

Mathematicians’ Newest Assistants Are Artificially Intelligent, Scientific American, 2024
Can LLMs Generate Mathematical Proofs that can be Rigorously Checked?, MarkTechPost, 2023
Exploring the Tradeoff Between Privacy and Algorithm Performance, Princeton Insights, 2022
Researchers Devise Approach to Reduce Biases in Computer Vision Data Sets, Princeton Engineering News, 2020
AI Is Biased. Here's How Scientists Are Trying to Fix It, Wired, 2019

Mentoring

I'm fortunate to have worked with many talented students and junior researchers:

Bartosz Piotrowski: Postdoc @ Meta FAIR
Zhaoyu Li: Ph.D. student @ University of Toronto
Jiacheng Chen: Undergrad @ South China University of Technology -> Incoming Ph.D. student @ University of Toronto
Peiyang Song: Undergrad @ UCSB -> Undergrad @ Caltech
Rahul Chalamala: Undergrad @ Caltech -> Researcher @ Together AI
Shixing Yu: Master's student @ UT Austin -> Ph.D. student @ Cornell
Gene Chou: Undergrad @ Princeton -> Ph.D. student @ Cornell
Jacqueline Yau: Master's student @ Stanford -> Ph.D. student @ UIUC

Teaching

Guest Co-instructor, Advanced Large Language Model Agents, UC Berkeley & MOOC
Guest Lecturer, AIST 5030: Generative Artificial Intelligence, Chinese University of Hong Kong
Guest Lecturer, CS 159: Large Language Models for Reasoning, Caltech
Teaching Assistant, COS 485/585: Natural Language Processing, Princeton University
Head Teaching Assistant, Data Structures and Algorithms, Tsinghua University

Service

Area Chair: ECCV 2024, ICML 2025
Reviewer: ICML, NeurIPS, ICLR, etc.

Website template credit: Jon Barron