Open-source LLMs catch up to private models
What a week for the open-source AI community! There were major announcements from Meta and Mistral. Meta released Llama 3.1 — a family of language models in a range of sizes — and, according to Meta, the largest model is competitive with state-of-the-art private models like GPT-4 and Anthropic's Claude 3.5 Sonnet. Dunking on OpenAI further, Meta also shared a 92-page research report on Llama that includes a healthy amount of detail on the development process. As for Mistral, the French startup's newest model, Mistral Large 2, is pitched as excelling at mathematical reasoning and code generation, approaching the performance of much larger private models and outperforming the similarly sized Llama 3.1 on some relevant benchmarks.
Open source is a boon for developers and Meta
With Llama in particular, there is now a viable alternative to the private models. With similar performance at a fraction of the cost (perhaps even 5-10x cheaper), there's a real case for developing with Llama. What's more, Meta relaxed the terms of their licence to allow developers to use the outputs of the Llama models to improve other models. It's possible we'll see these foundation models employed more extensively to train smaller, niche models that perform better on specific use cases and are substantially more cost-effective to run.
This is also a great move for Meta. The company has built a pipeline to drink from the oil wells of these private organisations, with few consequences for its own business, as Mark Zuckerberg pointed out in his accompanying letter:
That means openly releasing Llama doesn’t undercut our revenue, sustainability, or ability to invest in research like it does for closed providers.
Meta is also standing as the champion of open source, having contributed significant technology to the community, including the popular web framework React, the widely used machine learning framework PyTorch, and now Llama. It's fantastic for the company's image with developers and will undoubtedly be a competitive advantage in recruitment. Not only that, the choice to open-source can accelerate the development of a technology. We saw this play out in the battle of the machine learning frameworks, PyTorch vs. TensorFlow, which Meta has hands-down won. Googlers find themselves stuck with an unfavourable technology, and the company is likely spending significant sums to keep maintaining it.
Zuckerberg is also channelling his inner developer, and he clearly understands the power developers wield. He's desperate to avoid a repeat of the smartphone era, where Apple's stranglehold on the platform imposed serious constraints on application companies:
One of my formative experiences has been building our services constrained by what Apple will let us build on their platforms. Between the way they tax developers, the arbitrary rules they apply, and all the product innovations they block from shipping, it’s clear that Meta and many other companies would be freed up to build much better services for people if we could build the best versions of our products and competitors were not able to constrain what we could build. On a philosophical level, this is a major reason why I believe so strongly in building open ecosystems in AI and AR/VR for the next generation of computing.
Zuckerberg wants to ensure the next paradigm of computing is open. And at the moment, Meta, at least, continues to reap the majority of the rewards from widespread adoption and community innovation.
Should we continue to open-source AI models?
With this release, there will be renewed discussion about the risks of open-sourcing technology this powerful. There's the potential for individuals and larger actors (the West's eyes are firmly fixed on China) to cause widespread calamity and, in the extreme case, trigger an existential threat — although, let's be clear, I don't think this release carries any chance of that.
While there are undeniable risks, I side with Zuck. Firstly, I believe the benefits to society outweigh the risks. Secondly, I believe a unified community, educated in and familiar with this technology, stands a better chance of defending against future dangers than the alternative of placing our hopes in the custody of the tech giants. It will be especially important to empower the academic community to deepen humanity's understanding of these systems and the associated risks.
John
If you found this valuable, please like and share. And leave a comment; I'm especially interested in your thoughts and opinions on the content.
At a Glance
Other interesting things that happened this week in the world of AI.
DeepMind developed two new models that demonstrated a high level of mathematical reasoning on International Mathematical Olympiad problems, almost reaching gold-medal standard.
OpenAI previewed an AI search engine, fanning the flames in the steadily heating search industry.
What caught your attention this week? Leave a comment and let me know!