Few Things I Observed after Working in Hyped AI Research Areas

09 Oct, 2024

It’s year 2024–––only three years since LLMs become (somewhat) general-purpose and can follow instructions. Who would have guessed how much AI has shaken the whole world to the point that Nobel Prize in Physics was awarded for “laying the foundation for today’s powerful machine learning”, and Nobel Prize in Chemistry was awarded for “using AI to predict proteins’ complex structures”?

It is kind of surreal for me being part of this revolution, especially how I started my NLP research in computational linguistics (in particular frame semantics), then had the opportunity to work on the OG work on instruction-following, and now doing research in AI safety. In my past three years of PhD research, I have worked on both niche/established topics (such as low-resource NLP, SEA language technology, and code-switching)¹ and hyped topics. In this post, I want to share some of my observations.

Personal Background (Feel free to skip this)

Before I started my PhD, I was dabbling with using machine learning to improve frame semantics. I was not really satisfied with my work, because in the back of my mind, I couldn’t see how my work that leads to linguistic-driven solutions² can scale up and solve any NLP problems in any language.³ Note that linguistic tasks also often don’t come with enough data, so training models (such as Autoencoders and GAT) does not give any significant performance gains–––in today’s terms, they don’t pass the vibe check.

Anyone who transitioned from computational linguistics into LLMs scaling during the time (around year 2021) must have come across the Stochastic Parrot paper and the hype around linguistic-driven language modeling. As an undergrad fresh out of college and joining PhD, who was deciding the research direction to spend the next 5 years on, I was really confused on whether I should work with LLMs. Many linguists were saying that “language models are simply regurgitating what they are trained on”, that “next-token prediction cannot learn the language (and therefore cannot do incredible things like humans do)”.

I want to be clear: I am really grateful to have started my NLP research on semantics and working with linguists like Prof. Tiago Torrent and Dr. Collin Baker (who had supported me so much and taught me a lot on the qualities of being a good NLP researcher.) It was the field, which antagonized scaling and LLMs, that had done more harms than goods to how I viewed AI when I started my PhD.⁴ I guess that’s why it is called the Bitter Lesson.

Ultimately, I decided to lean fully into LLMs and believe the magic of scaling. My decision eventually comes down to “if AlexNet can replace hand-engineered kernels, large language models can do the same”. You might think it is a no-brainer from today’s point of view⁵, but at that time, I was really conflicted when I made this conscious decision. And I am forever grateful for being supervised by my advisor Steve who came from pure machine learning background.

Observation 1: You will have to work on hyped stuff anyway, if the hype is in the right direction.

What is hyped stuff? Those stuff that you know can be easily classified as one of the viral topics on X that people cannot stop talking about. The cornerstone paper on the topic had blown up and gained thousands of citations in less than a year (think about T0/Flan for instruction-following, GPT-3 for in-context learning, chain-of-thought for reasoning, etc.)

Voices about on hyped stuff are often loud for a period of time, and things move really fast and are competitive. Hyped stuff usually catches attention of people who have never worked on it directly before, and yet they turned into huge proponents for it in academic social circle after a short while.

LLMs were hyped stuff in 2021 and are commodity now. Everyone now works with LLMs to some degree, regardless of how much they believed in LLMs in pre-ChatGPT era. I’d say now there are two groups of people: (1) those who are excited to work on it (usually early adopters or (2) those who begrudgingly have to work on it (because of reviewers #2 and where funding comes from).

This is my bitter observation. If the hype train is right, you will have to directly face the monster you despise (regardless of your ego) in one way or the other if you want to stay relevant in AI research. So my takeaway is to study the trend, think two steps ahead, and iterate fast (Omar Khattab has really good advice on this) in the direction of hype train (that you think is the right direction).

Observation 2: Hype train is never crowded. There’s always room for more research.

One common argument against working on hyped stuff is that it is very easy to get scooped. Yes that is inevitable. My multilingual toxicity work with Cohere For AI got scooped by Microsoft and AllenAi–––they released their work around the same time, and we were only halfway to completion so we had to halt our project entirely.

But I think people who complain about being scooped and hence shy away from hyped stuff miss the big picture. Hyped stuff is nascent by nature. There are exponentially more things remain unexplored and you can always find another research problem to pivot into.

In my case, we stopped the project on collecting multilingual prompts that elicit toxicity, and started thinking about what to work on. But the project had enabled me to fully understand what problems haven’t been studied and what resources are available to research on, so I quickly bang out one paper with Jacob (who is interested in the intersection of mechanistic interpretability and safety) under one month on why cross-lingual transfer can happen in LLM detoxification via preference tuning.

Observation 3: You cannot really predict whether a low-hanging fruit is really low-hanging.

In my opinion, in a well-studied area (usually opposite of hyped stuff), low-hanging fruit has capped value because high-value research problems have mostly been figured out. But in hyped area where many things are unexplored–––many people called the obvious next steps as low-hanging fruits–––there’s really no low-hanging fruit because doing those grunt work and promoting your findings in reality allow people to immediately explore the next bigger thing.

An analogy here is that you are being on an literally unchartered territory. Any direction that is well-motivated and well-reasoned, even if it seems to be the obvious low-hanging fruit to work on, is intrinsically valuable because now others know where to steer into. The most important aspect here is to move fast and communicate well.

I will use myself as an example. Multilingual jailbreaks are in my opinion low-hanging fruits–––somebody eventually gotta investigate⁶ if these safety guardrails hold up in non-English settings because more than half of the global population speaks languages aside English. I simply translated malicious prompts with Google Translate into non-English and we evaluated if GPT-4’s safeguards were robust. The method was so stupidly simple, and yet the implications for global AI users were significant because the safeguards are not robust across languages.

My biggest takeaway here is that do the obvious next steps (note that what’s obvious to you is not necessarily obvious to the others) and invest time in communicating the significance of the work⁷. Make sure the people who will build on your work see your work, so the field can move to the next (and hopefully less obvious) research questions.⁸

Observation 4: You might come across naysayers who put you down (to the point you feel personal.)

Sometimes you might come across people who make you feel like what you work on is meaningless. I personally came across one highly-regarded researcher at NeurIPS 2023 who told me AI safety redteaming is nonsense work⁹ when I asked for research career advice. On that day, I felt like shit, and it was really hard not to take it personally, especially because that conversation took place in front of a big group of people.

I’ve moved on since then because I’ve gotten enough positive feedback that what I work on is useful and more people/industry labs are investing in AI safety. I know, I should not let people’s opinions get to me, but hey I’m still a human slash a PhD student :). My biggest takeaway is that when you work on hyped stuff, you have significantly higher chances in encountering negative feedback and strong pushbacks because working on hyped stuff requires you to be both an early adopter and an active promoter of your work so as to feed the hype.

You encounter naysayers because you work in an extremely visible area. People will form extreme opinions and point it out to you (sometimes impolitely) because (1) either you are objectively wrong (well, not all hyped stuff will work) or (2) they didn’t see the same value you see in your pursuits. Often, you see this manifests as conversations spiraling into drama on social media. In either case, you need to develop mental fortitude and remember that nobody truly knows what works or fundamentally valuable (or we would have scaled language models much earlier).

Sometimes, people just don’t get it no matter how hard you try explaining the significance of your research direction. It is okay. AI research is to a great extent empirical in nature, so let results do the talk.

Don’t take things too personally. If you happen to hop on the wrong hype train, learn from your mistakes and see where your reasoning went wrong. If you are right, figure out where the naysayers came from and learn from their mistakes.

I am putting things into binary category here to make my point, even though it’s arguable that all these topics are not really niche. They are just not catalyzing any foreseeable paradigm shifts in how AI does things. ↩︎
I refer to “linguistic-driven” as in manually baking in linguistic rules (aka injecting explicit inductive bias) into language models. ↩︎
I firmly believe that for AI to be truly useful–––or to achieve AGI/ASI–––it shouldn’t have any language barrier. Advancement in technology in a small subset of langauge leads to socioeconomic divide at a global scale. ↩︎
Allow me to indulge in my what-ifs: If it were the opposite, I would have seen myself engaging with literature around data/model scaling much earlier. ↩︎
Felix’s notes: “If it’s possible for a human domain expert to discover from the data the basis of a useful inductive bias, it should be obvious for your model to learn about it too, so no need for that inductive bias.” ↩︎
During the same period, there were three other papers released and submitted to ICLR 2024 about the same multilingual jailbreaks [1, 2, 3] ↩︎
In my case, huge thanks to Cristina and Steve who helped sharpen the findings for the low-resource language jailbreak paper. ↩︎
(Footnote written on Oct 11th, 2024) I am not being crazy here. Just found out today that Terrence Tao made the same comments about what one should do (although with different motivation): “spend most of your time on more feasible “low-hanging fruit”. ↩︎
That’s what I recall from my fuzzy memory. Probably not verbatim but definitely along the line. ↩︎

#Hot-Takes #Phd