Yong Zheng-Xin
Computer Science Ph.D. student @ Brown University
Research Scientist Intern @ Meta AI (FAIR), Collaborator @ Cohere For AI
I am an incoming fourth-year Ph.D. student in Computer Science at Brown University, advised by Prof. Stephen Bach. I’m fortunate to have collaborated with amazing researchers at Cohere For AI and at Meta AI (FAIR and GenAI Trust Team). I am currently interning at Meta AI and will return to Brown in Spring 2025.
Lately, my research focuses on making multilingual LLMs safe. I currently study cross-lingual transfer of safety knowledge, covering toxicity (Findings of EMNLP 2024) and harmful finetuning (coming soon). My other notable work in AI safety includes:
- I discovered a cross-lingual vulnerability in LLM safety guardrails in “Low-Resource Languages Jailbreak GPT-4” (NeurIPS 2023 Socially Responsible Language Modeling Workshop, ⭑Best Paper Award). This discovery catalyzed a paradigm shift toward multilingual red-teaming and was highlighted by the UK Government and AI Safety Institute in the International Scientific Report on the Safety of Advanced AI 2024.
- I collaborated with Cohere For AI to perform multilingual safety red-teaming for Aya-101 (ACL 2024, ⭑Best Paper Award), which is the state-of-the-art open LLM that follows instructions in 101 languages.
I also work on helping foundation models overcome language barriers so AI can serve users around the world, including those who speak underrepresented languages. I have worked on both model-centric and data-centric solutions.
- I researched how to finetune multilingual LLMs to adapt to low-resource languages unseen during pretraining (BLOOM+1, ACL 2023).
- I proposed novel methods to generate training data for low-resource languages. For the first time, synthetic labeled data generated by my proposed LexC-Gen (Findings of EMNLP 2024) matches the performance of manually collected training data for very low-resource languages of Indonesia and Africa.
- I helped build massively multilingual LLMs and speech technology:
- Meta AI (FAIR): I worked on mitigating accent bias for the Massively Multilingual Speech model.
- Cohere For AI: I served as a language ambassador, coordinating the data collection efforts for the Malay language in the Aya dataset. In addition, I was part of the safety red-teaming team for the Aya models.
- BigScience: I contributed to LLMs such as BLOOM, T0, and mT0/BLOOMZ. In addition, I led the research efforts on adapting BLOOM to unseen languages.
As a Malaysian, I also contribute to NLP for Southeast Asian (SEA) languages. I’ve hosted *ACL tutorials, helped curate the SEACrowd data hub (EMNLP 2024), and studied how well LLMs handle SEA linguistic phenomena, such as code-switching (EMNLP 2023 CALCS Workshop), and understand culture in the SEA region (NeurIPS 2024).
Other Misc Stuff:
- If you want to chat or collaborate on any of the research directions above (or just talk about graduate school), feel free to email me at: contact [dot] yong @ brown [dot] edu
- My hobby and passion is dancing, especially salsa and bachata. I also dance a bit of Lindy Hop, Argentine Tango, and K-pop. I usually check out the dance scene in the city when I travel to conferences, so if you also enjoy dancing, hmu and we can check it out together.
- I went to Minerva University for undergrad, so I had the opportunity to travel and live in six different cities around the world: San Francisco, Seoul, Hyderabad, Berlin, Buenos Aires, and London.
selected recent publications (see all)
- Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model. ACL, 2024 (Best Paper Award)
- Low-Resource Languages Jailbreak GPT-4. NeurIPS Workshop: Socially Responsible Language Modelling Research (SoLaR), 2023 (Best Paper Award)
news
09 / 2024 | LexC-Gen and mechanistic explanations of multilingual AI safety generalization were accepted to Findings of EMNLP 2024. SEACrowd was also accepted to EMNLP 2024, and CVQA was accepted to NeurIPS 2024 Datasets & Benchmarks.
08 / 2024 | The Aya Model paper received the ⭑Best Paper Award at ACL 2024.
07 / 2024 | Gave a talk on multilingual AI safety at London Data Week (organized by The Alan Turing Institute and supported by the Mayor of London).
06 / 2024 | Meta AI: Started my research scientist internship at Meta AI (FAIR), working on Massively Multilingual Speech (MMS) models. Also collaborated with the GenAI Trust Team on a multilingual safety project.
02 / 2024 | The Aya model and dataset papers were released! I presented Aya multilingual safety research at the Aya Grand Finale.
11 / 2023 | Co-organized the tutorial on the Current Status of NLP in Southeast Asia at AACL 2023.
10 / 2023 | “Low-Resource Languages Jailbreak GPT-4” received the ⭑Best Paper Award at the NeurIPS 2023 Socially Responsible Language Modeling (SoLaR) workshop.
09 / 2023 | Cohere For AI: Joined the Responsible Deployment Team for Aya red-teaming.
05 / 2023 | Interviewed by Wired about our code-switching paper and grassroots research initiative for Southeast Asian (SEA) languages.
03 / 2022 | T0 was accepted to ICLR 2022 (Spotlight) and its blog post is out! PromptSource was also accepted to the ACL 2022 Demo track.
06 / 2021 | Started my Ph.D. at Brown University.