Yong Zheng-Xin
I am a final-year PhD student at Brown University, advised by Stephen Bach. I am fortunate to be supported by the Open Philanthropy (now Coefficient Giving) grant for technical AI safety.
I am currently an Astra Safety Research Fellow with OpenAI, mentored by Miles Wang and Olivia Watkins.
Research
I spend a lot of time thinking about safe and beneficial AI, with a current focus on AGI preparedness research. I previously worked on multilingual LLMs so that existing AI systems benefit everyone equitably and safely, and now I want to make sure future AI systems are aligned.
My research interests include:
- Chain-of-thought (CoT) oversight, such as CoT monitorability and controllability.
- Frontier risk evaluations, such as the Kimi K2.5 safety and preparedness report (preprint 2026).
- Understanding why alignment does or does not generalize (EMNLP 2024; ICLR 2026).
- Adversarial robustness to (multilingual) jailbreaks (Best Paper @ NeurIPS 2023 SoLaR; NAACL 2025).