Yong Zheng-Xin
CS PhD @ Brown University
I am a final-year PhD student at Brown University advised by Stephen Bach. My research interests are in AI safety and alignment, and I am fortunate to have my PhD study supported by an Open Philanthropy grant for technical AI safety.
I am currently an Astra Research Fellow working with OpenAI, mentored by Miles Wang. I work on CoT monitorability/obfuscation as well as agentic safety evaluation. My other relevant work includes:
- Safety for reasoning models: Emergent self-jailbreaking behaviors by open-source reasoning models (ICLR 2026).
- Safety for multilingual models: Jailbreaking GPT-4 with low-resource languages (Best Paper, NeurIPS 2023 SoLaR). Cross-lingual generalization studies of detoxification (EMNLP 2024) and finetuning attacks (NAACL 2025).
Previously, I was a research scientist intern at Meta AI and a research collaborator at Cohere Labs. I contributed to multilingual frontier models, including instruction-following LLMs such as Aya and BLOOM (Best Paper, ACL 2024; ACL 2023) and ASR speech models (INTERSPEECH 2025).
Selected Publications (see all)
- ACL, 2024 (Best Paper Award)
- NeurIPS Workshop: Socially Responsible Language Modelling Research (SoLaR), 2023 (Best Paper Award)