Yong Zheng-Xin
I am a final-year PhD student at Brown University advised by Stephen Bach. I am fortunate to be supported by a grant for technical AI safety from Open Philanthropy (now Coefficient Giving).
I am currently an Astra Safety Research Fellow with OpenAI, mentored by Miles Wang and Olivia Watkins.
Research
I currently work on chain-of-thought monitorability as well as agentic failure mode elicitation.
I have also worked on the following (ordered by recency):
- Frontier risk evaluations, including for Kimi K2.5 (preprint 2026).
- Understanding why alignment does or does not generalize (EMNLP 2024; ICLR 2026).
- Adversarial robustness to jailbreaks (Best Paper, NeurIPS 2023 SoLaR; Best Paper, ACL 2024; NAACL 2025).
Personal site inspired by Tianyu Gao and Gregory Gundersen.