Yong Zheng-Xin

CS PhD @ Brown University

I am a final-year PhD student at Brown University advised by Stephen Bach. I am also an Astra Research Fellow working with OpenAI, mentored by Miles Wang. Previously, I was a research scientist intern at Meta AI and a research collaborator at Cohere Labs.

My research focuses on AI safety and alignment, and I am fortunate to be supported by an Open Philanthropy grant for technical AI safety. My relevant work includes:

  • Safety for reasoning models: Emergent self-jailbreaking behaviors in open-source reasoning models (ICLR 2026).
  • Safety for multilingual models: Jailbreaking GPT-4 with low-resource languages (Best Paper, NeurIPS 2023 SoLaR). Studies of cross-lingual generalization in detoxification (EMNLP 2024) and finetuning attacks (NAACL 2025).

Previously, I contributed to multilingual frontier models, including instruction-following LLMs such as Aya and BLOOM (Best Paper, ACL 2024; ACL 2023) as well as speech recognition (ASR) models (INTERSPEECH 2025).


Selected Publications

  1. Ahmet Üstün*, Viraat Aryabumi*, Zheng-Xin Yong*, and 14 more authors
     ACL, 2024 (Best Paper Award)
  2. Zheng-Xin Yong, Cristina Menghini, and Stephen Bach
     NeurIPS Workshop: Socially Responsible Language Modelling Research (SoLaR), 2023 (Best Paper Award)