Yong Zheng-Xin
Current: CS PhD @ Brown University, Astra Fellow
Past: Research Scientist Intern @ Meta AI, Research Collaborator @ Cohere Labs
I am a final-year PhD student at Brown University, advised by Prof. Stephen Bach. I am fortunate to have interned and collaborated at Meta (GenAI, FAIR) and Cohere Labs, and my research is funded by an Open Philanthropy grant for technical AI safety.
I work on making AI systems safe and helpful for everyone around the world. My recent research focuses on post-training, especially reasoning and safety alignment. In particular, I study surprising properties of reasoning chains-of-thought (CoTs), such as:
- Cross-lingual reasoning through test-time scaling (preprint).
- Self-jailbreaking, where models reason themselves out of safety guardrails after benign reasoning training (preprint).
- Predicting safety outcomes before models finish thinking (preprint).
A large part of my previous research was on multilingual LLMs and speech models, especially their alignment and capabilities in low-resource languages.
- I discovered that low-resource languages can jailbreak GPT-4 (⭑Best Paper Award, NeurIPS 2023 SoLaR Workshop; featured on New Scientist), with follow-up interpretability work on crosslingual detoxification (EMNLP 2024 Findings) and finetuning attacks (NAACL 2025 Findings). We also recently released a survey on multilingual AI safety (EMNLP 2025).
- I contributed to instruction-following models such as the Aya model (⭑Best Paper Award, ACL 2024) and studied how LLMs can learn low-resource languages through language adaptation (ACL 2023) and synthetic data (EMNLP 2024 Findings).
- I also worked on making speech models robust to new accents (INTERSPEECH 2025), which contributed to the Omnilingual ASR models (preprint).
Actively seeking full-time research roles in industry.
Featured work/preprints (see all)
- ACL, 2024 (Best Paper Award)
- NeurIPS Workshop: Socially Responsible Language Modelling Research (SoLaR), 2023 (Best Paper Award)