publications

(*) indicates co-first-authorship.

2024

  1. Preference Tuning For Toxicity Mitigation Generalizes Across Languages
    Xiaochen Li*, Zheng-Xin Yong*, and Stephen H Bach
    EMNLP Findings, 2024
  2. LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons
    Zheng-Xin Yong, Cristina Menghini, and Stephen H Bach
    EMNLP Findings, 2024
  3. SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
    Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, and 58 more authors
    EMNLP, 2024
  4. CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
    David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, and 72 more authors
    NeurIPS 2024 Datasets and Benchmarks, 2024 (Oral Presentation)
  5. Aya model: An instruction finetuned open-access multilingual language model
    Ahmet Üstün*, Viraat Aryabumi*, Zheng-Xin Yong*, and 14 more authors
    ACL, 2024 (Best Paper Award)
  6. A safe harbor for AI evaluation and red teaming
    Shayne Longpre, Sayash Kapoor, Kevin Klyman, and 20 more authors
    ICML, 2024

2023

  1. BLOOM: A 176B-parameter open-access multilingual language model
    BigScience Collaboration
    arXiv, 2023
  2. Current Status of NLP in South East Asia with Insights from Multilingualism and Language Diversity
    Alham Fikri Aji, Jessica Zosa Forde, Alyssa Marie Loo, and 8 more authors
    AACL-IJCNLP 2023 Tutorials, 2023
  3. Representativeness as a Forgotten Lesson for Multilingual and Code-switched Data Collection and Preparation
    A. Seza Doğruöz, Sunayana Sitaram, and Zheng-Xin Yong
    EMNLP Findings, 2023
  4. Low-Resource Languages Jailbreak GPT-4
    Zheng-Xin Yong, Cristina Menghini, and Stephen Bach
    NeurIPS Workshop: Socially Responsible Language Modelling Research (SoLaR), 2023 (Best Paper Award)
  5. Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages
    Zheng-Xin Yong, Ruochen Zhang, Jessica Forde, and 13 more authors
    EMNLP Workshop: Computational Approaches to Linguistic Code-Switching (CALCS), 2023
    Also featured in: WIRED
  6. The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges
    Genta Winata, Alham Fikri Aji, Zheng-Xin Yong, and 1 more author
    ACL Findings, 2023
  7. BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
    Zheng-Xin Yong, Hailey Schoelkopf, Niklas Muennighoff, and 12 more authors
    ACL, 2023
  8. Crosslingual Generalization through Multitask Finetuning
    Niklas Muennighoff, Thomas Wang, Lintang Sutawika, and 16 more authors
    ACL, 2023

2022

  1. What Language Model to Train if You Have One Million GPU Hours?
    Teven Le Scao, Thomas Wang, Daniel Hesslow, and 15 more authors
    EMNLP Findings, 2022
  2. PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
    Stephen Bach*, Victor Sanh*, Zheng-Xin Yong, and 24 more authors
    ACL Demo, 2022
  3. Frame Shift Prediction
    Zheng-Xin Yong, Patrick D. Watson, Tiago Timponi Torrent, and 2 more authors
    LREC, 2022
  4. Multitask Prompted Training Enables Zero-Shot Task Generalization
    Victor Sanh*, Albert Webson*, Colin Raffel*, and 37 more authors
    ICLR, 2022 (Spotlight Presentation)

2020

  1. Semi-supervised deep embedded clustering with anomaly detection for semantic frame induction
    Zheng-Xin Yong and Tiago Timponi Torrent
    LREC, 2020