Yong Zheng-Xin
Starting a new role soon.
I finished my PhD at Brown University (advised by Stephen Bach), where I was supported by an Open Philanthropy (now Coefficient Giving) grant for technical AI safety.
Previous experience: Astra Safety Research Fellow (mentored by Miles Wang and Olivia Watkins), Research Scientist Intern at Meta, and Core Collaborator with Cohere Labs.
Research
I spend a lot of time thinking about safe and beneficial AI, with a current focus on AGI preparedness. I previously worked on multilingual LLMs so that AI systems can benefit everyone equitably and safely, and now I want to make sure future AI systems are aligned.
My current research interests include:
- Chain-of-thought (CoT) oversight, such as CoT monitorability and controllability.
- Frontier risk and preparedness evaluations, such as those for Kimi K2.5 (preprint 2026).
- Understanding why alignment does or does not generalize (EMNLP 2024; ICLR 2026).
- Adversarial robustness to (multilingual) jailbreaks (Best Paper @ NeurIPS 2023 SoLaR; NAACL 2025).
Some previous work on multilingual LLMs: ACL 2023; Best Paper @ ACL 2024; INTERSPEECH 2025.