No More Hustleporn: Scale AI CEO Alexandr Wang on how AGI will be achieved incrementally

We pulled out the highlights from Scale AI CEO Alexandr Wang's recent interview on the No Priors podcast. Transcription and light editing by Anthropic's Claude, curation by Yiren Lu 😄

Special request: This newsletter is a hobby for me and I am trying to gauge long-term interest/increase readership. If you enjoy reading No More Hustleporn and would like it to continue, please share with a friend!

Highlights

Alexandr Wang: There's this very large mandate, I think, for our industry to actually figure out the means of production by which we're going to be able to generate more tokens to fuel the future of this industry. And I think there are a few sources, a few answers, to this.
Alexandr Wang: So the first is we need the best and brightest minds in the world to be contributing data. One of the things I think is actually quite interesting about this technology is that very smart humans, so PhDs or doctors or lawyers or experts in all these various fields, can actually have an extremely high impact on the future of this technology by producing data that ultimately feeds into the algorithms.

Alexandr Wang: I think philosophically, the question is not, is a model better than a human unassisted by a model? The question is, is a human plus a model together going to produce better output than a model alone? I think that'll be the case for a very, very long time: human intelligence is complementary to the machine intelligence we're building, and they're going to combine to do things that are strictly better than what the models can do on their own.

Alexandr Wang: I think it's never, because I think that the key quality of human intelligence, or biological intelligence, is this ability to reason and optimize over very long time horizons. This is biological: our goals as biological entities are to optimize over our lifetimes, optimize for reproduction, et cetera. We have the ability, as human intelligences, to set long-term goals and to keep optimizing, adjusting, and reasoning over very long time horizons.
Alexandr Wang: Current models don't have this capability, because they're trained on little nuggets of human intelligence. They're very good at almost a shot glass full of human intelligence, but they're very bad at sustaining that intelligence over a long time period or a long time horizon. This fundamental quality of biological intelligence is something that will only be taught to the models over time, through a direct transfer via data.

Alexandr Wang: My biggest belief here is that the path to AGI is one that looks a lot more like curing cancer than developing a vaccine. And what I mean by that is, I think the path to building AGI is going to be incremental. You're going to have to solve a bunch of small problems, where you don't get that much positive leverage from solving one problem to solving the next. It's like curing cancer: you have to zoom in on each individual cancer and solve them independently.
And eventually, over a multi-decade timeframe, we're going to look back and realize that we've built AGI, we've cured cancer. But the path to get there will be a quite plodding road of solving individual capabilities and building individual data flywheels to support this end mission. Whereas I think a lot of people in the industry paint the path to AGI as: eventually we'll just, boop, get there, we'll solve it in one fell swoop. And I think this has a lot of implications for how you actually think about the technology arc and how society is going to have to deal with it. I think it's actually a pretty bullish case for society adapting to the technology, because I think it's going to be consistent, slow progress for quite some time, and society will have time to fully acclimate to the technology as it develops.

Full Transcript

Sarah Guo: Hi, listeners, and welcome to No Priors. Today, I'm excited to welcome Alex Wang, who started Scale AI as a 19-year-old college dropout. Scale has since become a juggernaut in the AI industry. Modern AI is powered by three pillars: compute, data, and algorithms. While research labs are working on algorithms and AI chip companies are working on the compute pillar, Scale is the data foundry serving almost every major LLM effort, including OpenAI, Meta, and Microsoft.

Sarah Guo: This is a really special episode for me, given Alex started Scale in my house in 2016, and the company has come so far. Alex, welcome. I'm so happy to be talking to you today.

Alexandr Wang: Thanks for having me. Known you all for quite some time, so excited to be on the pod.

Sarah Guo: Why don't we start at the beginning? Just for a broader audience, talk a little bit about the founding story of Scale.

Alexandr Wang: Right before Scale, I was studying AI and machine learning at MIT, and this was the year when DeepMind came out with AlphaGo and Google released TensorFlow. So maybe the beginning of the deep learning hype wave, or hype cycle. And I remember, at college, I was trying to use neural networks, I was trying to train image recognition neural networks, and the thing I realized very quickly is that these models were very much just a product of their data.

Alexandr Wang: And I sort of played this forward and thought it through. These models, or AI in general, are the product of three fundamental pillars: the algorithms, the compute that powers them, and the data. At that time, it was clear there were companies working on the algorithms, labs like OpenAI or Google's labs, a number of AI research efforts. Nvidia was already a very clear leader in building compute for these AI systems. But there was nobody focused on data.

Alexandr Wang: And it was really clear that over the long arc of this technology, data was only going to become more and more important. And so in 2016, I dropped out of MIT, did YC, and started Scale to solve the data pillar of the AI ecosystem, to be the organization that would solve all the hard problems associated with actually producing and creating enough data to fuel this ecosystem. And really, this was the start of Scale as the data foundry for AI.

Sarah Guo: It's incredible foresight, because you describe it as the beginning of the deep learning hype cycle. I don't think most people had noticed that a hype cycle was going on yet. And I just distinctly remember you working through a number of early use cases, building this company in my house at the time, and discovering, I think, far before anybody else noticed, that the AV companies were spending all of their money on data. Talk a little bit about how the business has evolved since then, because it's certainly not just that use case today.

Alexandr Wang: AI is an interesting technology because it is, at the core mathematical level, such a general-purpose technology. It's basically functions that can approximate nearly any function, including intelligence. And so it can be applied to a very wide breadth of use cases. And I think one of the challenges in building in AI over the past eight years we've been at it has really been: what are the applications that are gaining traction, and how do you build the right infrastructure to fuel those applications?

Alexandr Wang: So, as an infrastructure provider, we provide the data foundry for all these AI applications. Our burden is to be thinking ahead as to where are the breakthrough use cases in AI going to be, and how do we basically lay down the tracks before the freight train of AI comes rolling through.

Alexandr Wang: When we got started in 2016, this was the very beginning of the autonomous vehicle cycle. I think it was right when we were doing YC that Cruise got acquired, and it was the beginning of the wave of autonomous driving becoming one of the key tech trends. We followed the early startup advice that you have to focus early on as a company. And so we built the very first data engine that supported sensor-fused data, that is, a combination of 2D data plus 3D data: the lidars plus cameras that were built onto the vehicles.

Alexandr Wang: That very quickly became an industry standard across all the players; we worked with folks like General Motors and Toyota and Stellantis and many others. The first few years of the company were just focused on autonomous driving and a handful of other robotics use cases. But that was the prime-time AI use case.

Alexandr Wang: And then starting in about 2019, 2020, there was an interesting moment where it was actually pretty unclear where the future of AI use cases, where AI applications, was going to come from. This was obviously pre-language-model, pre-generative-AI, and it was a period of high uncertainty. So we started focusing on government applications. That was one of the areas where it was clear there was high applicability, and it was becoming more and more important globally.

Alexandr Wang: So we built the very first data engines to support government data, mostly geospatial, satellite, and other overhead imagery. This ended up fueling the first AI program of record for the US Department of Defense and was the start of our government business. And that technology ended up being critical years later in the Ukraine conflict.

Alexandr Wang: And then also around that time was when we started working on generative AI. We partnered with OpenAI at that time to do the very first experiments on RLHF on top of GPT-2. Those were the primordial days of RLHF, and the models back then were really rudimentary; they truly did not seem like anything to us. But we figured: OpenAI is a bunch of smart people, we should work with them, we should partner with them.

Alexandr Wang: And so we partnered with the team that originally invented RLHF, and we basically continued innovating with them from 2019 onwards, though we didn't think that much about the underlying technological trend. They integrated all of this technology into GPT-3, and there was a paper, InstructGPT, which was the precursor to ChatGPT, that we worked with them on.

Alexandr Wang: And then ultimately, in 2022, DALL-E 2 and ChatGPT rolled around, and we ended up focusing a lot of our effort as a company on how we fuel the data for generative AI, how we can be the data foundry for generative AI. Fast forward to today: our data foundry fuels basically every major large language model in the industry. We work with OpenAI, Meta, Microsoft, and many of the other players, and partner with them very closely in fueling their AI development.

Alexandr Wang: And in that timeframe, the ambitions of AI have just totally exploded. I mean, we've gone from GPT-3, which I think was a landmark model, though there was a modesty to GPT-3 at the time, to now looking at building agents and very complex reasoning capabilities, multimodality, multilinguality. The infrastructure that we have to build to support all the directions that developers want to take this technology has been really staggering and quite incredible.

Elad Gil: Yeah, you've basically surfed multiple waves of AI, and one of the big shifts happening right now is that other types of parties are starting to engage with this technology. You're obviously now working with a lot of the technology giants, with government, with automotive companies. It seems like there's an emergence now of enterprise customers and a platform for that, and an emergence of sovereign AI. How are you engaging with these other massive use cases that are coming now on the generative AI side?

Alexandr Wang: It's quite an exciting time, because I think for the first time in maybe the entire history of AI, AI truly feels like a general-purpose technology that can be applied to a very large number of business use cases. I contrast this to the autonomous vehicle era, where it really felt like we were building a very specific use case that happened to be very, very valuable. Now it's general purpose and can be applied across a broad span.

Alexandr Wang: And as we think about what the infrastructure requirements are to support this broad industry, and what the broad arc of the technology is, it's really one where we think: how do we empower data abundance? There's this question that comes up a lot: are we going to run out of tokens, and what happens when we do? And I think that that's a choice. I think we as an industry can either choose data abundance or data scarcity, and we view our role and our job in the ecosystem as building data abundance.

Alexandr Wang: The key to the scaling of these large language models, and these language models in general, is the ability to scale data. And I think that one of the fundamental bottlenecks in the way of us getting from GPT-4 to GPT-10 is data abundance: are we going to have the data to actually get there? Our goal is to ensure that we have enough tokens to do that.

Alexandr Wang: As a community, we've had easy data, which is all the data on the Internet, and we've kind of exhausted all the easy data. Now it's about forward data production, data that has high supervisory signal and is very valuable. We think about this as frontier data production: the kinds of data that are really relevant and valuable to the models today.
Alexandr Wang: The quality requirements have just increased dramatically. It's no longer the case that these models can learn that much more from various comments on Reddit or whatnot. They need truly frontier data. And what does this look like? This is reasoning chains of thought from the world's experts, from mathematicians or physicists or biologists or chemists or lawyers or doctors. This is agent workflow data, of agents in enterprise use cases or consumer use cases, or even coding agents and other agents like that.
Alexandr Wang: This is multilingual data, data that encompasses the full span of the many, many languages spoken in the world. This includes all the multimodal data, to your point: how do we integrate video data and audio data, and start including more of the esoteric data types that exist within enterprises and within a lot of industrial use cases into these models?
Alexandr Wang: There's this very large mandate, I think, for our industry to actually figure out the means of production by which we're going to be able to generate more tokens to fuel the future of this industry. And I think there are a few sources, a few answers, to this.
Alexandr Wang: So the first is we need the best and brightest minds in the world to be contributing data. One of the things I think is actually quite interesting about this technology is that very smart humans, so PhDs or doctors or lawyers or experts in all these various fields, can actually have an extremely high impact on the future of this technology by producing data that ultimately feeds into the algorithms.
Alexandr Wang: If you think about it, their work is actually one of the ways that they can have a very scaled, society-level impact. There's an argument you can make that producing high-quality data for AI systems has near-infinite impact, because even if you improve the model just a little bit, if you were to integrate that over all of the future invocations of that model, that's a ridiculous amount of impact. So I think that's something that's quite exciting.

Elad Gil: It's kind of interesting, because Google's original mission was to organize the world's information and make it universally accessible and useful. And they would go and scan in books from library archives, and they were trying to find different ways to collect all the world's information. And effectively that's what you folks are doing, or helping others do. You're effectively saying: where is all the expert knowledge, and how do we translate that into data that can then be used by machines, so that people can ultimately use that information? And that's super exciting.

Alexandr Wang: It's exciting to the contributors in our network as well, because beyond the obvious monetary component, there's a very meaningful motivation: how do I leverage my expert knowledge and expert insight to fuel this entire AI movement? That's kind of the deepest scientific motivation, which is: how do I use my knowledge and capability and intelligence to fuel humanity's progress and knowledge going into the future?

Sarah Guo: I think the somewhat undervalued thing is that there was a decade or so where the biggest thing happening in technology was the digitization of different processes. There's actually some belief that that's happened: interactions are digital, and information is captured in relational database systems on customers and employees or whatever.
Sarah Guo: But one of the big discoveries as an investor in this field over the last five years has been that the data is not actually captured for almost any use case you might imagine for AI. The first six months of many companies is a question of where are we going to get this data? You go to many of the incumbent software and services vendors, and despite having done this task for years, they have not actually captured the information you'd want to teach a model. That knowledge capture era happening at scale is really important.

Alexandr Wang: To make a Dune analogy, I think data production is very similar to spice production. It will be the lifeblood of all of these future AI systems. The best and brightest people are one key source. Proprietary data is definitely a very important source as well.

Alexandr Wang: JPMorgan's proprietary dataset is 150 petabytes of data. GPT-4 is trained on less than one petabyte of data. So there's clearly so much data that exists within enterprises and governments that is proprietary data that can be used for training incredibly powerful AI systems.

Alexandr Wang: And then I think there's this key question of what's the future of synthetic data, and how synthetic data needs to emerge. Our perspective is that the critical thing is what we call hybrid human-AI synthetic data. How can you build hybrid human-AI systems such that the AI is doing a lot of the heavy lifting, but human experts, the smartest people, the best at reasoning, can contribute all of their insight and capability to ensure that you produce data of extremely high quality and high fidelity to ultimately fuel the future of these models?
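
To make the shape of that workflow concrete, here is a minimal, purely illustrative sketch in Python. Every name in it is a hypothetical placeholder of ours, not Scale's pipeline or any real API: a model drafts candidate answers, a human expert accepts or corrects each one, and only reviewed pairs enter the dataset.

```python
from typing import Callable, List, Tuple

def hybrid_synthesis(
    generate: Callable[[str], str],                         # model drafts an answer
    expert_review: Callable[[str, str], Tuple[bool, str]],  # human verifies or corrects it
    prompts: List[str],
) -> List[Tuple[str, str]]:
    """Toy hybrid human-AI data production loop: the AI does the heavy
    lifting, the expert gates quality, and only reviewed pairs are kept."""
    dataset: List[Tuple[str, str]] = []
    for prompt in prompts:
        draft = generate(prompt)
        accepted, final_answer = expert_review(prompt, draft)
        if accepted:
            dataset.append((prompt, final_answer))
    return dataset
```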

Sarah Guo: I want to pull this thread a little bit, because something you and I were talking about, both in the context of data collection and evals, is what do you do when the models are actually quite good, better than humans on many measured dimensions? Can you talk about that from both the data and evaluation perspectives?

Alexandr Wang: I think philosophically, the question is not, is a model better than a human unassisted by a model? The question is, is a human plus a model together going to produce better output than a model alone? I think that'll be the case for a very, very long time: human intelligence is complementary to the machine intelligence we're building, and they're going to combine to do things that are strictly better than what the models can do on their own.

Sarah Guo: I have this optimism. Elad and I had a debate at one point that was challenging for me philosophically, about whether or not centaur play works, whether machine and human intelligence are complementary.

Alexandr Wang: My simple case for this is when we look at the machine intelligence, the models that are produced, we always see things that are really weird. There's the rot13 versus rot8 thing, for example, where the models know how to do rot13 but don't know how to do rot8. There's the reversal curse. There's all these artifacts that indicate somehow that it is not like human intelligence, not like biological intelligence.
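
For context on that example: rot13 and rot8 are the same trivial letter-rotation cipher with a different shift. A short snippet (the rot_n helper is our own illustration, not anything from the interview) shows why a human finds them equally easy, while web text contains far more rot13 than rot8 for a model to learn from.

```python
import string

def rot_n(text: str, n: int) -> str:
    """Rotate each letter n places through the alphabet, preserving case."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(lower + upper,
                          lower[n:] + lower[:n] + upper[n:] + upper[:n])
    return text.translate(table)

print(rot_n("hello", 13))  # uryyb -- rot13, common in web training data
print(rot_n("hello", 8))   # pmttw -- rot8, the same operation, rarely seen online
```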

Alexandr Wang: I think that's the bull case for humanity: there are certain qualities and attributes of human intelligence which are somehow distinct from the very separate and very different process by which we're training these algorithms. In practice, if a model produces an answer or response, how can a human critique that response to improve it? How can a human expert highlight where there are factuality errors or reasoning errors to improve its quality? How can the human aid in guiding the model over a long period of time to produce reasoning chains that are correct and deep and able to drive the capability of these models forward?

Alexandr Wang: This is what we spend all of our time thinking about: what is the human expert plus model teaming that's going to help us keep pushing the boundary of what the models are capable of doing?

Elad Gil: How long do you think human expertise continues to play a role in that? If I look at certain models, Med-PaLM 2 would be a good example, where Google released a model and showed that its output was better than the average physician's. You could still get better output from a cardiologist, but if you just asked a GP a cardiology question, the model would do better, as ranked by physician experts. It showed that already, for certain types of capabilities, the model provided better insights or output than people who were trained to do some aspects of that. How far do you think that goes? When do you think human expertise will no longer be additive to these models? Is that never? Is it three years from now?

Alexandr Wang: I think it's never, because I think that the key quality of human intelligence, or biological intelligence, is this ability to reason and optimize over very long time horizons. This is biological: our goals as biological entities are to optimize over our lifetimes, optimize for reproduction, et cetera. We have the ability, as human intelligences, to set long-term goals and to keep optimizing, adjusting, and reasoning over very long time horizons.
Alexandr Wang: Current models don't have this capability, because they're trained on little nuggets of human intelligence. They're very good at almost a shot glass full of human intelligence, but they're very bad at sustaining that intelligence over a long time period or a long time horizon. This fundamental quality of biological intelligence is something that will only be taught to the models over time, through a direct transfer via data.

Sarah Guo: You don't think there's an architectural breakthrough in planning that solves it?

Alexandr Wang: I think there will be architectural breakthroughs that improve performance dramatically. But if you think about it inherently, these models are not trained to optimize over long time horizons in any way, and we don't have the environments to be able to get them to optimize for these amorphous goals over long time horizons. So I think this is a somewhat fundamental limitation.

Sarah Guo: Before we talk about some of the cool releases you guys have coming out and what's next for Scale, maybe we can zoom out, and let me just congratulate you on the fundraise that you guys just did: a billion dollars at almost 14 billion in valuation, with really interesting investors like AMD, Cisco, and Meta. I want to hear a little bit about the strategics.

Alexandr Wang: Our mission is to serve the entire AI ecosystem and the broader AI industry. We're an infrastructure provider; our role is to support, as much as possible, the entire industry flourishing.

And we thought an important part of that was building as much ecosystem as possible around this data foundry, which is going to fuel the future of the industry. That's one of the reasons why we wanted to bring along other infrastructure providers like Intel and AMD, folks who are also laying the groundwork for the future of the technology, but also key players in the industry like Meta, and folks like Cisco as well.

Our view is that ultimately there's a stack that we think about: there's the infrastructure, there's the technology, and there's the application. And our goal, as much as possible, is to leverage this data capability, this data foundry, to empower every layer of that stack, and to build a broader industry viewpoint around what's needed for the future of data.

I think this is an exciting moment for us. We see our role, going back to the framing of what's holding us back from GPT-10, what's in the way from GPT-4 to GPT-10, as investing into actually enabling that pretty incredible technology journey. There's tens of billions, maybe hundreds of billions, of dollars of investment going into the compute side of this equation. And one of the reasons we thought it was important to raise the money and continue investing is that there's real investment that's going to have to be made into data production to actually get us there.

Sarah Guo: With great power comes great responsibility. If these AI systems are what we think they are in terms of societal impact, trust in those systems is a crucial question. How do you guys think about this as part of your work at Scale?

Alexandr Wang: A lot of what we think about is how the data foundry can enhance the entire AI lifecycle. That lifecycle goes from ensuring that there's data abundance, as well as data quality, going into the systems, to being able to measure the AI systems, which builds confidence in AI and also enables further development and further adoption of the technology.

This is the fundamental loop that I think every AI company goes through: they get a bunch of data, or they generate a bunch of data, they train their models, they evaluate those systems, and they go around the loop again. And so evaluation and measurement of AI systems is a critical component of the lifecycle, but also a critical component, I think, of society being able to build trust in these systems.
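
As a rough, self-contained illustration of that loop (every callable below is a hypothetical stand-in of ours, not any lab's actual pipeline): evaluate the model, find the weak domains, target data collection there, retrain, and repeat.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # an (input, target) training pair

def data_flywheel(
    model: object,
    evaluate: Callable[[object], Dict[str, float]],    # per-domain scores in [0, 1]
    collect: Callable[[List[str]], List[Example]],     # produce data for weak domains
    train: Callable[[object, List[Example]], object],  # return an updated model
    rounds: int = 3,
    target: float = 0.9,
) -> object:
    """Toy version of the data -> train -> evaluate loop described above."""
    for _ in range(rounds):
        scores = evaluate(model)
        weak_domains = [d for d, s in scores.items() if s < target]
        if not weak_domains:
            break  # everything measured is above target; stop early
        new_data = collect(weak_domains)  # aim data production at the gaps
        model = train(model, new_data)
    return model
```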

How are governments going to know that these AI systems are safe and secure and fit for broader adoption within their countries? How are enterprises going to know that when they deploy an AI agent or an AI system, it's actually going to be good for consumers and not create greater risk for them? How are labs going to be able to consistently measure the intelligence of the AI systems that we build, and how are we going to make sure they continue to develop responsibly as a result?

Sarah Guo: Can you give our listeners a little bit of intuition for what makes evals hard?

Alexandr Wang: One of the hard things is that because we're trying to approximate and build human intelligence, grading one of these AI systems is not something that's easy to do automatically. You sort of have to build IQ tests for these models, which in and of itself is a fraught philosophical question: how do you measure the intelligence of a system?

And there are very practical problems as well. Most of the benchmarks that we as a community look at, the academic benchmarks the industry uses to measure the performance of these algorithms, are fraught with issues. Many of the models are overfit on these benchmarks; the benchmarks are in the training datasets of these models.

Sarah Guo: And so you guys just did some interesting research here, and published some of it.

Alexandr Wang: Yes. One of the things we did is we published GSM-1K, which was a held-out eval. We basically produced a new evaluation of the math capabilities of models, one that there's no way would ever exist in the training datasets, to really see what the reported performance of the models was versus their actual capability.

And what you notice is that some of the models performed really well, but some of them performed much worse than their reported performance. And so this whole question of how we as a society are actually going to measure these models is a really tough one.
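
A minimal sketch of the comparison being described, assuming you have a public benchmark and a freshly written held-out set of the same style and difficulty. The function names are ours, and GSM-1K's actual methodology involves far more care (matched difficulty, answer formats, expert-written problems) than this toy version.

```python
from typing import Callable, List, Tuple

QA = Tuple[str, str]  # (question, expected answer)

def accuracy(model: Callable[[str], str], items: List[QA]) -> float:
    """Fraction of questions the model answers exactly correctly."""
    return sum(model(q).strip() == a for q, a in items) / len(items)

def contamination_gap(model: Callable[[str], str],
                      public_set: List[QA],
                      held_out_set: List[QA]) -> float:
    """A large positive gap between public-benchmark accuracy and held-out
    accuracy suggests the model overfit to, or memorized, the public set."""
    return accuracy(model, public_set) - accuracy(model, held_out_set)
```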

And our answer is that we have to leverage the same human experts, the best and brightest minds, to do expert evaluations on top of these models, to understand where they are powerful, where they are weak, and what the risks associated with them are.

So one of the things that we're very passionate about is that there needs to be public visibility and transparency into the performance of these models. There need to be leaderboards, there need to be evaluations that are public, that demonstrate in a very rigorous, scientific way what the performance of these models is.

And then we need to build the platforms and capabilities for governments, enterprises, labs to be able to do constant evaluation on top of these models to ensure that we're always developing the technology in a safe way, and we're always deploying it in a safe way.

In the same way that our role as an infrastructure provider is to support the data needs of the entire ecosystem, we think that building this layer of confidence in these systems, through accurate measurement, is going to be fundamental to the further adoption and further development of the technology.

Sarah Guo: You want to talk about the state of AI at the application layer, because you have a viewpoint into that that very few people do.

Alexandr Wang: After GPT-4 launched, there was this frenzy of application build-out. There were all these agent companies, all this excitement around agents, and a lot of applications that were built out.

I actually think it's an interesting moment in the lifecycle of AI. GPT-4, as a model, was a little too early a technology for us to have this entire hype wave around. The community very quickly discovered all the limitations of GPT-4. But we all know GPT-4 is not the terminal model we're going to be using; there are better models on the way.

With the coming models, we're going to come out of this trough of disillusionment, because the future models are going to be so much more powerful. You're actually going to have all of the fundamental capabilities you need to build agents and all sorts of incredible things on top of them.

We are very passionate about how we empower application builders, whether that be enterprises, governments, or startups, to build self-improvement into the applications that they build. What we see from the large labs like OpenAI and others is that self-improvement comes from data flywheels. It's about how you have a flywheel by which you're constantly getting new data that improves your model, you're constantly evaluating that system to understand where there are weaknesses, and you're continually hydrating this workflow.

We think that fundamentally every enterprise, government, or startup is going to need to build applications that have this self-improvement loop and cycle, and it's very hard to build. So we built our GenAI Platform to lay the groundwork and enable the entire ecosystem to build these self-improvement loops into their products as well.

Elad Gil: I was just curious. One thing related to that, as you mentioned, is that JPMorgan has 150 petabytes of data. That's 150 times what some early GPT models trained on. How do you work with enterprises around those loops? What are the types of customer needs that you're seeing right now or application areas?

Alexandr Wang: One of the things that all the model developers understand well, but the enterprise doesn't understand super well, is that not all data is created equal. High quality data or frontier data can be 10,000 times more valuable than just any run-of-the-mill data within an enterprise.

A lot of the problems that we solve with enterprises are about going from this giant mountain of data, which is truly all over the place and distributed everywhere within the enterprise, and compressing and filtering it down to the high-quality data that you can actually use to fine-tune, train, or continue to enhance these models to drive differentiated performance.
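
A toy sketch of that compress-and-filter step, with a deliberately naive quality heuristic standing in for what would in practice be learned quality classifiers, deduplication, and expert review. None of this is Scale's actual pipeline; it only illustrates the shape of the filtering.

```python
from typing import Iterable, List

def quality_score(doc: str) -> float:
    """Naive stand-in for a learned data-quality classifier."""
    words = doc.split()
    if len(words) < 20:                   # too short to carry much supervisory signal
        return 0.0
    return len(set(words)) / len(words)   # penalize repetitive boilerplate

def filter_corpus(docs: Iterable[str], threshold: float = 0.5) -> List[str]:
    """Keep only the documents that clear the quality bar."""
    return [d for d in docs if quality_score(d) > threshold]

# Example: a repetitive log line is dropped, substantive prose is kept.
corpus = [
    "error error error " * 10,
    "The quarterly filing describes the company's revenue recognition policies "
    "for multi-year enterprise contracts, including deferred revenue, contract "
    "modifications, and the treatment of variable consideration under the "
    "applicable accounting standard.",
]
print(len(filter_corpus(corpus)))  # 1
```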

Elad Gil: There are some papers out of Meta which show that narrowing the amount of data that you use creates better models: the output is better, and the models are smaller, which means they're cheaper and faster to run. It's really interesting, because a lot of people are sitting on these massive datasets and think all that data is really important. It sounds like you're really working with enterprises to narrow that down to the data that actually improves the model. That was an information-theory question, in some sense. What are some of the launches that are coming from Scale?

Alexandr Wang: We're building evaluations for the ecosystem. One is that we're going to launch private, held-out evaluations, with leaderboards associated with these evals, for the leading LLMs in the ecosystem. We're going to rerun this contest periodically, every few months, to consistently benchmark and monitor the performance of these models, and continue adding more domains.

We're going to start with areas like math, coding, instruction following, and adversarial capabilities, and then over time, we're going to continue increasing the number of areas that we test these models on. We think of it as kind of like an Olympics for LLMs, but instead of every four years, it'll be every few months. That's one thing we're quite excited about.

We also have an exciting launch coming with some of our government customers. In the government space, as they're trying to use LLMs and these capabilities, there are a lot of cases where even the current agentic capabilities of the models can be extremely valuable, often in pretty boring use cases like writing reports, filling out forms, or pulling information from one place to another. But it's well within the capabilities of these models. So we're excited about launching some agentic features for our government customers with our Donovan product.

Sarah Guo: Are these applications you build yourselves or an application building framework?

Alexandr Wang: For our government customers, we basically build an AI staff officer. It's a full application, but it integrates with whatever model our customers think is appropriate for their use case.

Sarah Guo: Do you think Scale will invest in that for enterprise applications in the future?

Alexandr Wang: Our view for enterprises is fundamentally about how we help them build self-improvement into the applications that they are going to build. We think about it much more at the platform level for enterprises.

Sarah Guo: Does the new OpenAI or Google release change your point of view on anything fundamentally? Multimodality, the applicability of voice agents, et cetera?

Alexandr Wang: I think you tweeted about this, but one very interesting element is the direction that we're going in terms of consumer focus. And it's fascinating. Taking a step back, first off, I think it points to where there are still huge data needs. Multimodality as an entire space is one where, for the same reasons that we've exhausted a lot of the Internet data, there's a lot of scarcity of good multimodal data that can empower these personal agents and these personal use cases.

So as we want to keep improving these systems and these personal agent use cases, we think about this a lot: what are the data needs that are going to be required to actually fuel that? I think the other thing that's fascinating is the convergence, actually. Both labs have been working independently on various technologies, and Project Astra, which was Google's major headline release, as well as GPT-4o, are shockingly similar demonstrations of the technology. So I think that was very fascinating: the labs are converging on the same end use cases, the same visionary use cases for the technology.

Sarah Guo: I think there are two reads of that. One is: there is an obvious technical next step here, and very smart people have independently arrived at it. The other is: competitive intelligence is pretty good.

Alexandr Wang: Yeah, I think both are probably true. I think both are true.

Elad Gil: It's funny, because when I used to work on products at Google, we'd spend two years working on something, and then the week of launch, somebody else would come out with something, we'd launch, and people would claim that we copied them. So I do think a lot of this stuff just happens to be where the whole industry is heading. People are aware that multimodality is one of the really big areas, and a lot of these things have years of work going into them. So it's kind of interesting to watch as an external observer.

Sarah Guo: Yeah, I mean, this is also not a training run that is a one-week copy effort.

Alexandr Wang: Right. Well, and then the last thing that I've been thinking a lot about is: when are we going to get smarter models? We got multimodality capability, and that's exciting, but it's more of a lateral expansion of the models, and the industry needs smarter models. We need GPT-5, or we need Gemini 2, or whatever those models are going to be. And so to me, I was somewhat disappointed, because I just want much smarter models that are going to enable, as you mentioned before, way more applications to be built on top of them.

Sarah Guo: The year is long; maybe by end of year. Okay, so quickfire, and Elad, chime in if you have ones here. Something you believe about AI that other people don't?

Alexandr Wang: My biggest belief here is that the path to AGI is one that looks a lot more like curing cancer than developing a vaccine. And what I mean by that is, I think the path to building AGI is going to be incremental. You're going to have to solve a bunch of small problems, where you don't get that much positive leverage from solving one problem to solving the next. It's like curing cancer: you have to zoom in on each individual cancer and solve them independently.
And eventually, over a multi-decade timeframe, we're going to look back and realize that we've built AGI, we've cured cancer. But the path to get there will be a quite plodding road of solving individual capabilities and building individual data flywheels to support this end mission. Whereas I think a lot of people in the industry paint the path to AGI as: eventually we'll just, boop, get there, we'll solve it in one fell swoop. And I think this has a lot of implications for how you actually think about the technology arc and how society is going to have to deal with it. I think it's actually a pretty bullish case for society adapting to the technology, because I think it's going to be consistent, slow progress for quite some time, and society will have time to fully acclimate to the technology as it develops.

Sarah Guo: When you say solve a problem at a time, if we pull away from the analogy a little bit, should I think of that as: generality of multi-step reasoning is really hard? As: Monte Carlo tree search is not the answer that people think it might be? We're just going to run into scaling walls? What are the dimensions of solving multiple problems?

Alexandr Wang: I think the main thing, fundamentally, is that there's very limited generality that we get from these models. Even for multimodality, for example, my understanding is there's no positive transfer from learning in one modality to other modalities. So training off of a bunch of video doesn't really help you that much with your text problems, and vice versa.

Alexandr Wang: And so I think what this means is each niche of capabilities or each area of capability is going to require separate flywheels, data flywheels, to be able to push through and drive performance.

Sarah Guo: You don't yet believe in video as a basis for a world model that helps, for this reason?

Alexandr Wang: I think it's a great narrative. I don't think there's strong scientific evidence of that yet. Maybe there will be eventually, but I think the base case, let's say, is one where there's not that much generalization coming out of the models. And so we actually just need to slowly solve lots and lots of little problems to ultimately result in AGI.

Sarah Guo: One last question for you, as the leader of Scale, a scaling organization: what are you thinking about as a CEO?

Alexandr Wang: This will almost sound cliché, but just how early we are in this technology. It's strange, because on the one hand, it feels like we're so late: the tech giants are investing so much, there are a bajillion launches all the time, there's all sorts of investment into the space.

Sarah Guo: But markets look crowded in the obvious use cases.

Alexandr Wang: Yeah, exactly. Markets look super crowded. But I think fundamentally, we're still super early, because the technology is one-hundredth or one-thousandth of its future capability. And as we as a community and as an industry and as a society ride that wave, there are so many more chapters of the book. And so, if you think about any organization...

Alexandr Wang: What we think about a lot is nimbleness. How do we ensure that as this technology continues to develop, we're able to keep adapting alongside it?

Elad Gil: That's a great place to end. Thanks so much for joining us today.

Sarah Guo: Yeah, thanks, Alex.

Alexandr Wang: Thank you.

Sarah Guo: Find us on Twitter @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week. Sign up for emails or find transcripts for every episode at no-priors.com.