Yong Zheng-Xin
CS PhD @ Brown University
I am a final-year PhD student at Brown University advised by Stephen Bach. I am also an Astra Research Fellow working with OpenAI, mentored by Miles Wang. Previously, I was a research scientist intern at Meta AI and a research collaborator at Cohere Labs.
My research focuses on AI safety and alignment, and I am fortunate to be supported by the Open Philanthropy grant for technical AI safety. My relevant work includes:
- Safety for reasoning models: Emergent self-jailbreaking behaviors in open-source reasoning models (ICLR 2026).
- Safety for multilingual models: Jailbreaking GPT-4 with low-resource languages (Best Paper, NeurIPS 2023 SoLaR). Cross-lingual generalization studies of detoxification (EMNLP 2024) and finetuning attacks (NAACL 2025).
Previously, I contributed to multilingual frontier models, including instruction-following LLMs such as Aya and Bloom (Best Paper, ACL 2024; ACL 2023) and ASR speech models (INTERSPEECH 2025).
Selected Publications (see all)
- ACL, 2024 (Best Paper Award)
- NeurIPS Workshop: Socially Responsible Language Modelling Research (SoLaR), 2023 (Best Paper Award)