Introduction

I am a Lecturer at the Department of Engineering Science, University of Oxford. My research focuses on scalable and robust sequential and strategic decision-making, where decisions must account for long-term consequences under uncertainty. This line of work is typically studied within, but is not limited to, reinforcement learning. My current research interests include:

computationally and sample-efficient reinforcement learning, especially through efficient planning with world models for physical systems with evolving dynamics;
robustness under distribution shift across space and time, including continual learning.

Email: yangchen DOT pan AT eng DOT ox DOT ac DOT uk

People

Qizhen Ying (2024-present, PhD, University of Oxford)

Publications

* indicates co-first authorship.

Preprints/Workshop/Work in progress

Variability measures for risk-averse RL.
…

Selected Refereed Publications

Temporal Difference Learning for Diffusion Models.
Qizhen Ying, Yangchen Pan, Victor Adrian Prisacariu, Junfeng Wen.
International Conference on Machine Learning (ICML), 2026.
An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models. [paper]
Yangchen Pan *, Junfeng Wen *, Chenjun Xiao, Philip Torr.
Journal of Artificial Intelligence Research (JAIR), 2025.
PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling. [paper]
Avery Ma, Yangchen Pan, Amir-massoud Farahmand.
International Conference on Machine Learning (ICML, spotlight), 2025.
Label Alignment Regularization for Distribution Shift. [paper]
Ehsan Imani, Guojun Zhang, Runjia Li, Jun Luo, Pascal Poupart, Philip Torr, Yangchen Pan.
Journal of Machine Learning Research (JMLR), 2024.
Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination. [paper]
Zhiyao Luo, Yangchen Pan, Peter Watkinson, Tingting Zhu.
International Conference on Machine Learning (ICML, spotlight), 2024.
A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization. [paper]
Yudong Luo, Yangchen Pan, Han Wang, Philip Torr, Pascal Poupart.
Reinforcement Learning Conference (RLC), 2024.
Understanding the robustness difference between SGD and adaptive gradient methods. [paper]
Avery Ma, Yangchen Pan, Amir-massoud Farahmand.
Transactions on Machine Learning Research (TMLR, featured certification), 2023.
An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient. [paper]
Yudong Luo, Guiliang Liu, Pascal Poupart, Yangchen Pan.
Conference on Neural Information Processing Systems (NeurIPS), 2023.
The In-Sample Softmax for Offline Reinforcement Learning. [paper]
Chenjun Xiao *, Han Wang *, Yangchen Pan, Adam White, Martha White.
International Conference on Learning Representations (ICLR, spotlight), 2023.
Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning. [paper]
Xutong Zhao, Yangchen Pan, Chenjun Xiao, Sarath Chandar, Janarthanan Rajendran.
Conference on Uncertainty in Artificial Intelligence (UAI), 2023.
Understanding and Mitigating the Limitations of Prioritized Experience Replay. [paper]
Yangchen Pan *, Jincheng Mei *, Amir-massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo.
Conference on Uncertainty in Artificial Intelligence (UAI), 2022.
Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online. [paper]
Yangchen Pan, Kirby Banman, Martha White.
International Conference on Learning Representations (ICLR), 2021.
An implicit function learning approach for parametric modal regression. [paper]
Yangchen Pan, Ehsan Imani, Martha White, Amir-massoud Farahmand.
Conference on Neural Information Processing Systems (NeurIPS), 2020.
Maxmin Q-learning: Controlling the Estimation Bias of Q-learning. [paper]
Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White.
International Conference on Learning Representations (ICLR), 2020.
Frequency-based Search-control in Dyna. [paper]
Yangchen Pan *, Jincheng Mei *, Amir-massoud Farahmand.
International Conference on Learning Representations (ICLR), 2020.
Hill Climbing on Value Estimates for Search-control in Dyna. [paper]
Yangchen Pan, Hengshuai Yao, Amir-massoud Farahmand, Martha White.
International Joint Conference on Artificial Intelligence (IJCAI), 2019.
Organizing experience: a deeper look at replay mechanisms for sample-based planning in continuous state domains. [paper]
Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White.
International Joint Conference on Artificial Intelligence (IJCAI), 2018.
Reinforcement learning with function-valued action spaces for partial differential equation control. [paper]
Yangchen Pan, Amir-massoud Farahmand, Martha White, Saleh Nabi, Piyush Grover, Daniel Nikovski.
International Conference on Machine Learning (ICML, long talk), 2018.
Adapting kernel representations online using submodular maximization. [paper]
Matthew Schlegel, Yangchen Pan, Jiecao Chen, Martha White.
International Conference on Machine Learning (ICML), 2017.
Effective sketching methods for value function approximation. [paper]
Yangchen Pan, Erfan Sadeqi Azer, Martha White.
Conference on Uncertainty in Artificial Intelligence (UAI), 2017.
Accelerated gradient temporal difference learning. [paper]
Yangchen Pan, Adam White, Martha White.
AAAI Conference on Artificial Intelligence (AAAI), 2017.

PhD thesis

Improving Sample Efficiency of Online Temporal Difference Learning. Yangchen Pan.

Code

You should be able to find links to the code repositories for the papers mentioned above. For those papers where the code is mine, you can access either the entire repository or the core parts of the code below.

Teaching

2025 Hilary: C25 Optimization, University of Oxford. [website]
2024-2025 Hilary: Machine learning lab, University of Oxford. [website]
2023-2025 Trinity: CWM, Artificial Intelligence and Machine Learning with Python, University of Oxford. [website]
2019 Fall: CMPUT 466/566, Machine Learning, Teaching Assistant, University of Alberta
2019 Spring: CMPUT 272, Formal Systems and Logic in Computing Science, Teaching Assistant, University of Alberta
2016 Spring: CSCI C343, Data Structures, Associate Instructor (aka TA), Indiana University at Bloomington
2015 Fall: CSCI B503, Algorithm Design and Analysis, Associate Instructor (aka TA), Indiana University at Bloomington
2014 Fall: CSCI 1311, Discrete Structures I, Teaching Assistant, George Washington University