Yong Zheng-Xin
I am a final-year PhD student at Brown University advised by Stephen Bach. I am fortunate to be supported by the Open Philanthropy (now Coefficient Giving) grant for technical AI safety.
I am currently an Astra Safety Research Fellow with OpenAI, mentored by Miles Wang and Olivia Watkins.
Research
I work on AI safety, with a current focus on chain-of-thought monitorability and agentic failure mode elicitation.
My other relevant work includes:
- Frontier risk evaluations, such as the Kimi K2.5 safety and preparedness report (preprint 2026).
- Understanding why alignment does or does not generalize (EMNLP 2024; ICLR 2026).
- Adversarial robustness to (multilingual) jailbreaks (Best Paper @ NeurIPS 2023 SoLaR; NAACL 2025).
I previously contributed to massively multilingual models (ACL 2023; Best Paper @ ACL 2024).