Yong Zheng-Xin
I am joining OpenAI to do AI safety research.
I finished my PhD at Brown University (advised by Stephen Bach), where I was supported by an Open Philanthropy (now Coefficient Giving) grant for technical AI safety.
Previous experience: Astra Safety Research Fellow (mentored by Miles Wang and Olivia Watkins), Research Scientist Intern at Meta, and Core Collaborator with Cohere Labs.
Research
I spend a lot of time thinking about safe and beneficial AI, with a current focus on AGI/ASI preparedness. I previously worked on multilingual LLMs so that AI systems can benefit everyone equitably and safely, and now I want to make sure future AI systems are aligned.
I currently work on model organisms and scalable oversight, such as CoT monitorability. My past work includes:
- Frontier risk and preparedness evaluations, such as for Kimi K2.5 (preprint 2026).
- Generalization of alignment (EMNLP 2024; ICLR 2026).
- Adversarial robustness to (multilingual) jailbreaks (Best Paper @ NeurIPS 2023 SoLaR; NAACL 2025).
Some previous work on multilingual LLMs: ACL 2023; Best Paper @ ACL 2024; INTERSPEECH 2025.