Alibaba staffer offers a glimpse into building LLMs in China

February 27, 2024 ndowd

Chinese tech companies are marshaling resources and talent to narrow their gap with OpenAI, and the experiences of researchers on both sides of the Pacific can be surprisingly similar. A recent X post from an Alibaba researcher offers a rare glimpse into what it is like to develop large language models at the e-commerce firm, one of a raft of Chinese internet giants striving to match the capabilities of ChatGPT.

Binyuan Hui, a natural language processing researcher at Alibaba’s large language model team Qwen, shared his daily schedule on X, mirroring a post by OpenAI researcher Jason Wei that went viral recently.

Their parallel accounts of a typical day reveal striking similarities, with wake-up times around 9 a.m. and bedtimes around 1 a.m. Both start the day with meetings, followed by stretches of coding, model training and brainstorming with colleagues. Even after getting home, they keep running experiments and thinking about ways to improve their models until bedtime.

The notable differences lie in how they describe their leisure time. Hui, the Alibaba employee, mentioned reading research papers and browsing X to catch up on “what is happening in the world.” And as one commenter pointed out, Hui doesn’t have a glass of wine after he gets home, as Wei does.

This intense work regime is not unusual in China’s current LLM space, where tech talent with degrees from top universities is joining tech companies in droves to build competitive AI models.

To a certain extent, Hui’s demanding schedule seems to reflect a personal drive to match, if not outpace, Silicon Valley companies in AI (or at least to appear to on social media). That seems different from the involuntary “996” schedule (9 a.m. to 9 p.m., six days a week) associated with more “traditional,” operations-heavy Chinese internet businesses such as video games and e-commerce.

My typical day as a Member of Technical Staff at Qwen (Just for myself):
[9:00am] Wake up, might stay in bed for an extra 15 mins.
[9:30am] Taking a cab to work, browsing X to catch up on what’s happening in the world, checking out @_jasonwei ‘s latest post.
[10:00am] Work… https://t.co/7o47EQrWcW

— Binyuan Hui (@huybery) February 21, 2024

Indeed, even renowned AI investor and computer scientist Kai-Fu Lee puts in an incredible amount of effort. When I interviewed Lee about his newly minted LLM unicorn 01.AI in November, he admitted that late hours were the norm, but employees were willingly working hard. That day, one of his staff messaged him at 2:15 a.m. to express his excitement about being part of 01.AI’s mission.

These outward displays of an intense work ethic speak to the urgency of the goals set by the country’s tech firms, and consequently to the speed at which those firms are now rolling out LLMs.

Qwen, for example, has open sourced a series of foundation models trained on both English and Chinese data. The largest of these has 72 billion parameters, the learned numerical weights that encode what a model picks up from its training data and shape its ability to generate contextually relevant responses. (For some context, OpenAI’s GPT-3 has 175 billion; GPT-4, its latest LLM, is reported to have about 1.7 trillion, though OpenAI has not disclosed the figure. However, what a particular LLM is built to do is arguably a better guide to its value than a high parameter count.)
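To make the idea of a parameter count more concrete, here is a minimal Python sketch that counts the weights and biases in a small, hypothetical fully connected network and roughly estimates the memory needed just to store a model's weights. The layer widths are made up for illustration and are not Qwen's architecture, and the storage estimate assumes 16-bit (2-byte) weights while ignoring everything else a running model needs.

```python
# A minimal sketch: counting learnable parameters (weights and biases).
# The layer widths are hypothetical and NOT Qwen's real architecture.


def dense_layer_params(n_in: int, n_out: int) -> int:
    """A dense layer has an n_in x n_out weight matrix plus one bias per output."""
    return n_in * n_out + n_out


def count_params(layer_sizes: list[int]) -> int:
    """Sum the parameters of each dense layer between consecutive widths."""
    return sum(
        dense_layer_params(n_in, n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def weights_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough storage for the weights alone, assuming 16-bit (2-byte) values."""
    return n_params * bytes_per_param / 1e9


toy_sizes = [512, 2048, 512]  # hypothetical layer widths
print(count_params(toy_sizes))  # parameters in the toy network
print(weights_gb(72e9))         # approx. GB to store 72 billion 16-bit weights
```

Real LLMs are transformers with more kinds of layers, but their headline figures are arrived at the same way: by adding up every learned weight. By this rough measure, storing 72 billion 16-bit weights alone takes about 144 GB.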

The team has also been quick to introduce commercial applications. Last April, Alibaba began integrating Qwen into its enterprise communication platform DingTalk and online retailer Tmall.

No definite leader has emerged in China’s LLM space so far, and venture capital firms and corporate investors are spreading their bets across multiple contenders. Besides building its own LLM in-house, Alibaba has been aggressively investing in startups such as Moonshot AI, Zhipu AI, Baichuan and 01.AI.

Facing competition, Alibaba has been trying to carve out a niche, and its multilingual push could become a selling point. In December, the company released an LLM for several Southeast Asian languages. Called SeaLLM, the model can process Vietnamese, Indonesian, Thai, Malay, Khmer, Lao, Tagalog and Burmese. Through its cloud computing business and its acquisition of e-commerce platform Lazada, Alibaba has established a sizable footprint in the region and could eventually bring SeaLLM to these services.
