Yong Zheng-Xin
I am a final-year PhD student at Brown University advised by Stephen Bach. I am fortunate to be supported by the Open Philanthropy (now Coefficient Giving) grant for technical AI safety.
I am currently an Astra Safety Research Fellow with OpenAI, mentored by Miles Wang and Olivia Watkins.
Research
I spend a lot of time thinking about safe and beneficial AI. I used to work on multilingual LLMs so that everyone can benefit equally from frontier technology, and now I want to make sure AGI/ASI is aligned.
My interests include:
- CoT oversight, such as CoT monitorability and controllability.
- Frontier risk evaluations, such as the Kimi K2.5 safety and preparedness report (preprint 2026).
- Understanding why alignment does or does not generalize (EMNLP 2024; ICLR 2026).
- Adversarial robustness to (multilingual) jailbreaks (Best Paper @ NeurIPS 2023 SoLaR; NAACL 2025).