No More Hustleporn: Gen AI infra startups face competition from Databricks' AI play
We pulled out the highlights from Databricks' recent Data + AI Summit Keynote. Transcription and light editing by Anthropic's Claude, curation by Yiren Lu :-)
Highlights
Matei Zaharia: So, you know, if you've looked around, every company on the planet is looking at their difficult technical problems and just slapping on an LLM. How many of your bosses have asked you to do this? I assume it's pretty much everyone here. The problem is that in many domains, and especially the challenging domain we're in of enterprise data, just naively adding an LLM assistant doesn't really work. And the challenge is that in your domain, in your company, you have a lot of context: jargon, terminology, data structure, and so on that's unique to you. And to make sense of business questions and to answer them accurately, you need LLM-based features that actually understand that context.
So every organization has its own unique jargon, data structure, et cetera. Just as some examples here, I'm showing a software company. This is actually Databricks. So here are some of the terms we use internally. DBUs, if you're a customer, you know they're a unit of compute that we use for billing and pricing. Nephos: I'm not sure how many people know that Nephos is our internal code name for our serverless offering, serverless compute. MAC is a monthly active customer. Warehouse is a thing that runs SQL queries. Job is a scheduled job.
Now, if you look at a different company, like a retailer, they will also use some of the same words, but they'll mean something totally different. So warehouse for a retailer is like an actual warehouse. They have these terms like BORIS and BOPUS and POP that mean something very specific. And there's a lot of jargon that's only in retail, not to mention the unique stuff inside each company as it accumulates decades of growth and experience.
And finally, if you look at a telecom, they also have terms like MAC and POP and so on, but they mean something totally different. So even a simple question, you can imagine this question in our assistant, like: how many DBUs were there in Europe last quarter? It's actually really hard to tell what this question means unless you really understand how the company works. You have to know what DBUs are. You have to know what "Europe" means: how does that map to columns in a SQL table? And even the fiscal year is different at each company. Our fiscal year actually begins in February. So you have to know all these things. And even for DBUs, I just looked it up on Wikipedia, and it turns out "DBU" can mean many other things. It can be a chemical, it can be a university, it can be a unit of volume, all kinds of stuff.
So how does Lakehouse IQ solve this? Lakehouse IQ takes in a whole bunch of signals about how data is actually used in your organization. And we can do that partly because we have this product surface that goes all the way to end users, with things like dashboards and notebooks and just all the stuff that people build with their data. It's all centered on Unity Catalog. It takes all the metadata there, but it also takes in docs, dashboards, notebooks, your org chart and groups, popularity signals (maybe there are a thousand tables with "customer" in the name, but one of them is used much more often), lineage, and the actual queries running on them. And we use these to build models for your company, based on its use of data, to help support users in basically all aspects of our product.
Naveen Rao, the CEO and co-founder of Mosaic ML: Let me talk about those three points that Ali mentioned earlier. Control, privacy, and cost—how do we address those things?
Naveen Rao, the CEO and co-founder of Mosaic ML: We're seeing every company that uses data warehouses, analytics tools, data filtering, and ETL pipelines plugging into generative AI to produce a model and then serving that model back to their customers in their front-end application.
Naveen Rao, the CEO and co-founder of Mosaic ML: Our solution borrows some of the ideas Databricks pioneered, where you can run computation on behalf of a user in a user-secure environment, whether supplied through us or by the user. We make our customers' data completely invisible to us as a company. All model artifacts, weights, and other things are written back to the user's storage. We literally can't see it; it's a brick wall. This is critical and can be enforced through technology.
Cost was actually a deep technology problem we tried to innovate on before others. The problem is simple: I have a flop, a floating-point operation, the basic unit of training a neural network. How can I learn from data more quickly with one flop? A flop costs money. It runs on a processor like a GPU, and there are operational dollars to keep the GPU going. Every flop matters, and as we start throwing lots of flops at problems, it becomes very expensive. Many have heard GPT-3 cost $15 million to train. We wanted to bring those costs down to something more tractable for enterprises.
Full Transcript
Ali Ghodsi: Everyone, I'm super excited to be here. This is my favorite week of the year, every year. We're calling it Generation AI because I really feel like this is a special time. I got a text this morning that said, "What an amazing time we're living in. This is the best time ever to be alive." We called it Generation AI because I think we're all part of this generation that's going to shape the future of the planet, the future of technology, the future of generative AI and machine learning and data. This is a really special time.
We think every company in the future will be a data and AI company. The people here are representing those companies. You are shaping the future of the planet. With this technology, we can make the world much better. We can make everyone smarter. We can cure diseases and raise the standard of living. Of course, there are issues, and we'll discuss them. But overall, this will be game-changing. You here will make it happen.
I welcome you to the Data and AI Summit. We have 75,000 people online and 12,000 here. We have over 150 countries represented. We have 320 sessions, 100 data teams, and over 100 exhibitors. This is a global event, and I thank you for making it happen over the years. [applause]
This is our 10th year. It started as Spark Summit, then Spark+AI Summit, now Data and AI Summit. Spark now has 1 billion downloads per year. Delta Lake has half a billion downloads per year. MLflow has 120 million downloads per year. These projects represent huge communities, and you helped make them successful. Thank you.
Special thanks to our partners, especially AWS, Microsoft, and Prophecy. We have an awesome lineup of speakers. Today we'll talk about innovations from the last year. We'll hear from Larry Feinsmith of JPMC about how they use the lakehouse. We'll hear from JetBlue about using AI and language models. We'll hear from Rivian about using our tech for everything from optimizing batteries to avoiding collisions.
First, I'm excited to chat with Satya Nadella of Microsoft. Welcome, Satya.
Satya Nadella: I'm supposed to be in a San Francisco court later this afternoon, but I'm glad to be with you this morning.
Ali Ghodsi: Awesome. Thank you so much. Hey, Satya, I wanted to start by saying I'm so impressed. I mean, just seeing what you've done at Microsoft, the huge transformation, it's almost unbelievable. How did you do it? How did you see the importance of cloud, AI, and data so early, and make the investments that you made? What gave you those signals?
Satya Nadella: First of all, Ali, congratulations to you and the entire Databricks team. I mean, what you all have done over the last decade plus, it's just unbelievable. At some level, any of us who are trying to innovate, or even the customers you were just talking about, really have to have, before it becomes conventional wisdom, a real sense of vision of where the arc of technology is going. It's interesting you asked about this AI stuff, right? I mean, all of us have been working on AI for a long time. But this specific era of AI, the bet was, quite frankly, around the scaling laws on these foundation models, right? It was unclear, even a few years ago, whether they would work or not. But they seem to be working. And I'm not saying this is the last model architecture. There will be more breakthroughs, but at least they are working. And when they're working, as technologists, you take advantage of them.
And then the more important thing, though, is when I first came at it: one thing I learned growing up in infrastructure, Ali, is that I am always sensitive to new workloads. So in other words, take this training workload, for example (inference is also interesting, but the training workload is very data parallel). It's also very synchronous, so it's different from any data parallel workload I had seen in the past. So one of the things I said is that one of the no-regrets ways for Azure to get ahead was to think about compute, storage, network, and memory in a way that would help these workloads. And so that's kind of how we got started. And then, of course, we saw the progression from GPT-2.5 to 3 to 3.5. GitHub Copilot gave us a lot of confidence around the emergent capabilities of these foundation models. And by the way, I love open-source models and I love frontier models. And I want us to be able to run both very well so that application developers have the best choice of this next generation of AI.
Ali Ghodsi: That's amazing. Well, hats off to you. I mean, we use Databricks, all of our engineers use Copilot, and we're seeing huge boosts thanks to that prescience that you had. I'm curious. So with AI, there are huge, huge opportunities, okay? And there are some challenges as well. What are your thoughts about the challenges and the opportunities that we all face with this technology, and the companies and people that represent those companies here in the audience today?
Satya Nadella: Obviously, it's a great question to ask, Ali, and it's a good set of issues for us to deal with. Somebody gave me this analogy, which I think is very helpful. Let's just say, when the steam engine first came out in the late 1700s, if all of us had talked about the greatness of the steam engine but also dealt with all the unintended consequences, like, oh, we're going to have pollution, the issues of child labor, we would have had a better history, right? We would have avoided at least 200 years of horrible history. So in some sense, having a conversation about AI and responsible AI and the societal impact of AI all simultaneously, I think, is a good thing.
So I'll first acknowledge that, and then perhaps even think about it in three parallel tracks, right? One is what I'll call the here and now. Let's face it, right here and now, there are certain things, around, let's call it misinformation, which have real solutions: maybe watermarking, maybe some regulation around distribution, right? So there are real things one can do. After all, misinformation existed before gen AI. It may get accelerated with gen AI. So what do we do about it then?
Then there is probably more in the intermediate time frame. We will have more cyber risk, bioterrorism risk, or bias, right, that are real-world harms. And so we should really go and say: how do we ground these models in facts, how do we regulate them, how do we align them? Those are all things where, again, there are real engineering solutions. And then there is a third bucket, which is the AI takeoff, right? What if we lose control? And that one obviously is a science problem today, because in some sense we really need to solve the alignment problem.
And so I think we should be thinking about all three of these things while making engineering progress. One thing that I encourage our teams on is: hey, remember, we are responsible here as engineers to introduce new technology that is safe by design. We shouldn't abdicate; you and I and all of us at this conference cannot abdicate our responsibility to produce responsible AI. And some of the choices, Ali, that's why I like Copilot, even as a metaphor. After all, we know these generative models have hallucinations. One of the things we can do about hallucinations, of course, is ground them in facts and do retrieval-augmented generation or what have you. But even before that, you can put a human in the loop as a design choice. So I think there are a lot of choices we as developers of this technology can make that can make it safe in terms of its use today, while solving some of the harder problems and ensuring that there is never an AI takeoff that we're not in control of.
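To make the grounding idea Satya mentions concrete: retrieval-augmented generation fetches relevant trusted documents for a question, then constrains the model to answer only from them. Here's a minimal sketch; the toy retriever and prompt format are illustrative assumptions, not any particular product's implementation.

```python
# Minimal sketch of retrieval-augmented generation (RAG): retrieve trusted
# documents first, then ask the model to answer only from them.
# All names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    score: float = 0.0

def retrieve(query: str, corpus: list[str], top_k: int = 3) -> list[Doc]:
    """Toy lexical retriever: score docs by word overlap with the query."""
    q_words = set(query.lower().split())
    docs = [Doc(t, len(q_words & set(t.lower().split()))) for t in corpus]
    return sorted(docs, key=lambda d: d.score, reverse=True)[:top_k]

def build_grounded_prompt(query: str, corpus: list[str]) -> str:
    """Pack retrieved facts into the prompt so the model answers from them."""
    context = "\n".join(f"- {d.text}" for d in retrieve(query, corpus))
    return (
        "Answer using ONLY the facts below. If they are insufficient, "
        "say you don't know.\n"
        f"Facts:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

if __name__ == "__main__":
    corpus = [
        "Our fiscal year begins in February.",
        "A DBU is the unit of compute used for billing.",
    ]
    print(build_grounded_prompt("When does the fiscal year start?", corpus))
```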
Ali Ghodsi: That's amazing. I mean, that's a really thoughtful answer, and I have no doubt. I've seen Microsoft, and I know how you think about this. You've invested so much in responsible AI for so many years. I think Microsoft is the company that people can bet on to actually be very thoughtful about how to approach this, all the way from the research at MSR to the engineers that are building this stuff. So it's amazing to see. My final question for you. We've had an amazing partnership. With Microsoft, Azure Databricks has made thousands and thousands of customers worldwide super successful with these projects around data and AI for the last five to ten years. I'm curious, what's your vision for the next five years of the partnership?
Satya Nadella: First of all, I think of this as Azure Databricks. So as far as I'm concerned, to me it's one of the best partnerships you and I and Scott and others were able to form. And it's just been fantastic to see it grow over the years. So again, I really thank you and your leadership and your team for even taking a bet on us and really helping us grow. I think what has been fantastic is all our mutual customers, right? I mean, when I look at AT&T and its use, or T-Mobile and its use, or Swiss Re or TD Bank, it's just fantastic to see the type of intense usage of what you all have done on top of Azure. And I think going forward, they're really taking even this generative AI. For example, one of the things we announced at our developer conference was Databricks and Azure Confidential Computing. I think Azure Confidential Computing plus Databricks, in a world where people want to secure the weights, where they want to secure the models, could be a real thing for all the customers and partners at the conference. So that's a lovely area. The other one, of course, is Power BI on top of Databricks, right? So one of the things, perhaps the most exciting thing, is that a natural language interface finally comes to BI tools, to be able to make sense of data. So that power of analysis, I think, could be a big breakthrough. Or, the other day I looked at your VS Code extension. I mean, that's so beautifully done. And so to me, really thinking about GitHub Copilot, your VS Code extensions, and then what that does to developer productivity. So there are so many areas of integration. Also, I would love Azure OpenAI to be one of the computes on top of what is happening on Databricks. So I think we have a tremendous surface area for really doing practical, good product integration so that customers can do more with Databricks.
Ali Ghodsi: Thank you so much. Satya, you put it well. I mean, Azure Confidential Computing, VS Code integration, Power BI. Those are things that are top of mind for us. Thank you so much. You're such an inspiration. Good luck with the end of the year.
Satya Nadella: Thank you so much, Ali.
Ali Ghodsi: So that was Satya. All right, so we called the conference Generation AI, or that's the theme this year. It's top of mind for everyone - generative AI, large language models, AI data. But the question we have to ask ourselves, why is this happening now? We've been talking about Data and AI for ten years here at this conference. Why is it taking off now? Why is it becoming such a sort of breakthrough?
Well, we think at Databricks that innovations don't really become technology revolutions before they're democratized. Okay? So you're going to hear this word a lot - democratization. So we think that's why this is happening right now. And actually it makes sense. If you look historically at this, computing was actually invented already. I mean, frankly, in the 1940s computers existed, at least the construct of the Turing machine and then early computers and so on. But it wasn't until really the 1980s, when the personal computer appeared in everyone's home, that it was democratized.
Then suddenly everyone had a PC and they had spreadsheets. And with DOS, which Microsoft had developed, it became democratized. Same thing with the Internet. The Internet was actually a DARPA project in 1969, and none of us were using it, or none of our parents, probably, maybe someone in the audience. But it wasn't until the 1990s, when Marc Andreessen, who actually will be here on stage for a fireside chat tomorrow, invented the web browser, that it became really democratized. And everybody got access to it using Mosaic or eventually Netscape.
And now, same thing. So AI has been around for a very long time, since maybe the '50s or '60s. But this wave of deep learning really started, I would say, in the early 2010s. And it's not until this year that generative models have really taken off, because they're becoming democratized. Everybody has access to them, everybody's noticed them. So I think there's been a revolution in awareness around the planet about what this technology can do.
And at Databricks we have been saying that we want to democratize data and AI for a very long time. Actually, for almost a decade now, we've been saying this technology needs to be democratized. And the problem in the industry has been that these two worlds of data and AI have actually been separated in the past. There is the world of data, which primarily consists of data warehousing: structured data that you store, and then you do business intelligence on it and ask questions about the past. Its roots are in Oracle. It's great technology, but it doesn't have any AI whatsoever in it.
And then there's the world of AI, which is all about unstructured data: all this text that's all over the internet, the video, the audio, all this unstructured information that you can actually train models on and then start doing predictions with. And these worlds were incompatible. And our belief was you have to have them together, you have to merge them. And that's why we pushed so hard for the lakehouse, the data lakehouse, right? The lake representing the AI, the house representing the structured data warehouse, unifying them in one place so that you can have all your data, whether it's structured or unstructured, with unified governance. And that's actually the most important part of the lakehouse.
That's the most important part to get right, so that you can do governance not at the level of files, but at the level of data. And then on top of that, you can do all these things. You can do backwards-looking BI and data warehousing, ask questions about the past, and you can look into the future, and you can do AI, and you can predict the future. At Databricks, these are the actual technologies that we embraced and that we're excited about.
You can see it on the slides. The foundation of our lakehouse is Delta Lake, but there are other alternatives out there as well. We'll talk about them today. There's Unity Catalog for governance. We have lots of exciting announcements around that as well. There's Databricks Workflows, which actually is the bulk of what people do on Databricks. Believe it or not, it's all the data processing that they do to get the data into a shape where they can actually start doing interesting things with it.
Databricks SQL, which is our data warehouse: we have lots of announcements, and we actually have a talk on that. So I'm excited about that. And then finally, Delta Live Tables for real-time stream processing. So that's what this looks like. And this lakehouse is now adopted by over 10,000 customers. So many of you in the audience are actually using this technology. So we're very proud to announce this number for the first time here.
And I want to really welcome to the stage Larry Feinsmith, who leads all of tech strategy and innovation at JPMC, the largest bank in the United States, to talk about how they are using the data lakehouse for all their data and AI. So let's welcome Larry. Larry, welcome. Good to see you. Awesome. Very, very excited. So Larry, I'd love to start right at the top. JPMC has for a long time believed in data and AI. It's been a big investment. You started very early on that. Can you tell us a little bit about that evolution, and what role did data and AI play in this journey?
Larry Feinsmith: Sure. First let me start by saying what a fabulous conference this is. I'm thrilled to be here today. It was great to hear Satya's comments and your comments, and great to be here with all of you today. The energy in here is palpable. It's just absolutely fabulous.
So let me start, and you covered a couple of the points. JPMorgan Chase has always been a data-driven company. In fact, we have about 500 petabytes of data. And I used ChatGPT last night because I wanted to visualize what 500 petabytes of data actually looks like. So I'm going to give you what ChatGPT said to me as an analogy. It said if you took 500 petabytes of data and sliced it up into DVDs, five gigs at a time, the stack would be 100 times higher than Mount Everest, the tallest peak on Earth.
So if you want to talk about data-driven: we've been investing in data warehousing technologies, as you and Satya mentioned. We've been investing in dashboards, we've been investing in BI tools. And that has been the case since I joined JPMorgan Chase 15-plus years ago. So we've always been data-driven. But beyond the traditional analytics you talked about a little bit, we've also focused on AI/ML and predictive analytics. And that's where we're accruing a lot of value in AI and ML.
And the predictive side, we have about 300 use cases in production.
Ali Ghodsi: Wow.
Larry Feinsmith: And those use cases are yielding about a billion and a half dollars in value to our firm this year alone. That could be operational efficiency or revenue enablement. And let me bring it to life a little bit: fraud. Many of you in the room who are Chase customers maybe get a text when a transaction goes through; in milliseconds, we're able to run a model and determine whether it's fraudulent or a good transaction. And that's all been driven by AI/ML, and we think on average we have a benefit of about $150 million from that.
Ali Ghodsi: Wow. Can I tell you a personal story? I'm a chase customer. I have the credit card.
Larry Feinsmith: You're on a special list, so we look out for you. That's gen AI at its best.
Ali Ghodsi: Yeah. That's awesome. Usually when you travel, you have to tell your bank, I'm going to this country. Please don't cancel the card. So I was traveling to the Bahamas recently, and I had to put that in. And the bank said, you don't need to tell us anymore. That feature is disabled. We're okay. We know. We have the analytics. You're safe. You don't need to tell us where you're traveling.
Larry Feinsmith: And we have a tremendous focus on the other area, Ali, and that's personalization. What you just experienced was personalization. So during the pandemic, every one of our business lines had a digital engagement platform, and we want to make sure that their experience with us is the best user experience. So we've embedded AI/ML into things like personalization of all our digital properties. We think that's going to make a huge difference for how we service our clients and customers. And AI/ML was the primary reason we decided a couple of years back to go to the public cloud, for the elasticity and availability of computing resources. So that's a little bit about our journey and how we're thinking about things.
Ali Ghodsi: That's amazing. It's remarkable to see.
Larry Feinsmith: I do want to say one more thing. Look at our size and scale: 65 million digital customers, 45 million mobile customers, we move $10 trillion of payment transactions a day, and we have 6,000 applications in our fabric. We need to deploy AI/ML at scale, and the only way to do that is through platforms. And that's how we're addressing our entire stack: rolling out platforms. So in the AI/ML space, we have a platform we call Jade, which allows us to move and, more importantly, as you talked about, manage data. We got applause for Jade out there. And the other one is called Infinite AI, which I love because it's the AI that never ends, it's infinite, and that platform is for data scientists.
Ali Ghodsi: That's awesome. Platforms: we are big believers in those too. If you have a hodgepodge of things, it's very hard to build a sustainable future with data and AI. So I'm curious, and you touched on some of these: what are the capabilities that you require from those kinds of platforms?
Larry Feinsmith: You talk a lot about the importance of data. I've known you a long time, and you've been talking to me about the importance of data. But equally important, as you always say, are the capabilities that surround that data. So of course, the first thing is data discovery. After data discovery, you need to properly entitle data. For instance, at Databricks you probably have access to a lot of data that maybe other parts of your organization don't have. And then, being a highly regulated company, data lineage for JPMorgan Chase is key. Where did the data come from, where do you land it, and what are you using it for?
In fact, we have a Data Use Council, so that's incredibly important. And then you combine that with the capabilities for data scientists: all the capabilities around the model development lifecycle, so feature engineering, experimentation, training the models, which you want to do in an economical way, and the whole operations part of that.
So what I'm going to do for you is mark-to-market Databricks on that, because you're helping us solve a lot of those goals. And by the way, this is all happening, you mentioned streaming, this is all starting to happen in real time. And when you think about 500 petabytes and Mount Everest, that's a lot of data to move around the environment in real time.
So we're very excited about Unity Catalog in the use cases where we're using Databricks, for the ability to govern and manage and do data lineage. That has just been fabulous for us. By the way, you mentioned your SQL data warehousing. We have gotten great results and performance from your SQL data warehousing.
Ali Ghodsi: Thank you. Thank you.
Larry Feinsmith: So that's pretty good. We're incredibly excited about Delta Live Tables. We also, over time, want to share data with our clients and customers, and internally. So, your Delta Sharing. But I would just tell you this, and you hit the nail on the head: the lakehouse architecture is transformational. And the reason I say that is, for years, at JPMorgan Chase and other enterprises, it was two stacks, and everything was optimized for two stacks, and that was great then. But the economies of scale, the efficiency of moving the amount of data that I talked about into one environment, and being able to do different processing capabilities on it, either SQL or training or what have you, is incredibly differentiating. I will tell you this: it does need to be interoperable, because all of our data isn't going to end up in Databricks, no matter what discount you give me. So I just want you to know that interoperability is very important.
Ali Ghodsi: Yeah, that makes a lot of sense. I think Jamie really wants to talk to you. Yeah, that's amazing. Yeah, I remember actually when you told me many years back, you said, hey, it's a big deal that data warehousing and AI finally can be unified. Do you remember? It was in New York. This is a big deal for us.
Larry Feinsmith: And by the way, at first blush, you have to prove that. And that's what we've proved.
Ali Ghodsi: So this is awesome. Okay, so you've invested a lot in data and BI, and now you're using the lakehouse. What about generative AI? How do you guys look at generative AI, and what does it mean for the business and the future of JPMC?
Larry Feinsmith: So, listen, a lot of optimism, a lot of excitement. The businesses all know about it because you can read about it every day. So it's very exciting. And what I will say is the capabilities around summarization and Q&A. I asked a question last night about an analogy, and it came back right away with Mount Everest. It's making people more productive in functional areas like marketing that want to do marketing copy. And of course, as Satya mentioned, code. We're very much interested in GitHub Copilot and the ability to have gen AI help us with code. But there's still a human in the loop there; I just want to make that clear. But that is democratization of AI. By the way, this room is enormous; this is like a concert almost. But who here is a Python engineer? Who here uses C++ or Scala or Go? Raise your hand. Virtually all of you, which is great. But through gen AI, I'm the same now.
The way in which you're going to speak to computers is through English, and it's going to be through native languages in different countries. That is incredibly democratizing. But let me just tell you, we think there's a bifurcation here. Even though we're going to get value through LLMs, through vector databases and vector search and embeddings and fine-tuning and training our employees on prompt engineering, that's still models trained on public data, and we'll provide context through that in a highly secure way. But what we think is additive to that is training on those 500 petabytes, making Mount Everest valuable for all of our clients and customers. So we're going to use open-source models wherever. You're interested in Dolly; we're going to have to have a platform for that. Congratulations on Mosaic. Fabulous acquisition. And we knew that organization. I'm getting a lot of applause here, so I don't know what's going on.
Larry Feinsmith: In any event, I think it's going to be one plus one equals whatever that multiplier is. It's going to be way more than ten. But I will tell you this: we at JPMorgan Chase will not roll out generative AI until we can mitigate all of the risks. So you want to talk about responsible AI, you want to talk about hallucinations, you want to talk about misuse, you want to talk about having the right cyber capabilities so that the models aren't poisoned or tampered with. So we're excited. We're working through those risks as we speak, but we won't roll it out until we can do this in an entirely responsible manner. And it's going to take time, as others have said, to work through all of that.
Ali Ghodsi: Yeah, no, I mean, we've been discussing and you've kept me up to date, and it's very clear that you want to implement AI. You want to go all in.
Larry Feinsmith: It's the first inning, but not baseball.
Ali Ghodsi: And you want to do everything, but you're not just going to play it fast and loose and see where you end up. That makes a lot of sense. Yeah. I'm curious. So I'm a technologist, so to me all problems can be solved with technology alone. But the reality is you have people in an organization. JPMorgan Chase is a gigantic organization with lots of people. How do you get the people to come along on this journey with data and AI and generative AI so that you can actually bring about that change? Because if the people in JPMC do not embrace this, then it doesn't matter; all the technology is wasted, right? So how do you conduct change management in a large organization?
Larry Feinsmith: Yeah, it's a very timely question. And I agree with you, it starts with people and talent. And by the way, there are a lot of really bright minds in this room who are really adding value to this whole data and AI/ML ecosystem. So if you want to work for the best financial institution that's doing AI/ML, I encourage you to send your resume to AIR, which is Artificial Intelligence Research: recruiters@chase.com. Send them over.
Ali Ghodsi: You don't even need to know programming. You just need to know English.
Larry Feinsmith: So, organizationally, it all starts with the business. And the business has to be accountable for understanding how to apply AI/ML. And that's something that comes from Jamie Dimon on down: making the business accountable. We embed our data scientists and our ML engineers in the business. They know what problems to solve. They know the data, what aspects of the data. And by the way, we're moving to data products. So in the business, you'll own certain data products, which is why, again, Unity Catalog is incredibly important, to be able to store that metadata and provision data products. But it all starts in the business.
And to that, let me just tell you this. We made an announcement last week that we named a firmwide Chief Data and Analytics Officer. Now, the title chief data and analytics officer, and maybe some of you in the room here have that title, is typically in a line of business. This is firmwide, with four responsibilities. One: AI enablement. Two: an ML Center of Excellence, a SWAT team that works with the business. Three: AI research; horizon two, horizon three, horizon four: where is this technology going? But the fourth one, I think, is the most important. It's a responsible AI team. And there are ethicists on that team.
Now, this organization, which now has a firmwide CDAO, will work with the platform teams in AI. So it's not just about technology; it's about the business understanding how to use this technology. So they'll work with the Jade team and the Infinite AI team, with ethicists, data scientists, and ML engineers. And there'll be a lot of scrutiny on gen AI specifically. So we're really excited to mobilize the organization to figure out how to transform the business: digital engagement, better risk management, building responsible products, and so on and so forth.
And by the way, Ali, a whole industry is popping up around AI security right in front of us right now, of course. And I'm meeting with a bunch of folks in the AI security space over the next couple of days. So the fact that you can embed or interoperate with those is very, very important.
Ali Ghodsi: Actually, I have to say, it's so impressive to see this at JPMorgan Chase's scale. I mean, you're the largest bank in the United States. This is the thing I get asked all the time: what should we do about the people? How do much smaller companies get the data and AI strategy right? And my advice to them usually is: if you have lots of different fiefdoms and lots of different parts of the organization that are doing data and AI and shadow IT, and then they get into politics, fighting each other, you make no progress. And that's the thing that's stagnating organizations and slowing them down.
So if JPMorgan Chase can have one completely company-wide chief data and analytics officer who can actually run the operations at that scale, anyone should do that. I mean, anyone can do that, and it's so important. Yeah.
Larry Feinsmith: And Ali, you're right, everyone has an opinion on what platform to use here. But I will tell you, one of the advantages you have, and I evangelize this at our firm, is to efficiently move the data once, and then manage the environment and have as much capability against it as possible. Once you start moving data around, it's a nightmare to manage. It's highly inefficient, and you break data lineage. So yeah, I couldn't agree more with you.
Ali Ghodsi: Okay, so final question. You guys are investing so much now. You have a sort of centralized org that does this. There's so much innovation. It's central but federated; that's actually the right way, because otherwise it won't work either, because they won't have the domain knowledge that the different parts of the business have. Exactly. So I'm curious. There are lots and lots of practitioners here, young folks, people that are excited. And I strongly recommend it; I think JPMC has an amazing organization, amazing talent, investing heavily. Any advice to the audience on how they should think about it if they're interested in working on data and AI?
Larry Feinsmith: I'll do a little pitch for innovation at JPMorgan Chase. It starts with our senior leadership. I would encourage everyone in this room to take a little bit of time to read Jamie Dimon's shareholder letter that he writes every year. And you know what? AI/ML was mentioned 19 times in Jamie's shareholder letter. The only words that were mentioned more were "interest rates," of course, which is not surprising. And it starts from the top of the house. We also spend a great deal of time highlighting and making sure the businesses can innovate. So we have innovation teams in the business. I happen to run a firmwide technology innovation team, as you pointed out, and we spend a lot of time ensuring that we're connected to the ecosystem. So there's a great opportunity for anyone who joins to work with amazing entrepreneurs, amazing startups, all the way up through Microsoft, for that matter. And that's how you and I met. You and I met in 2016, before you were famous. And look where we are now. And the last thing I would say is we recognize and celebrate innovation. Just two weeks ago, we had an internal event we call JPMC Innovation Week. And even though we innovate every day, every week, every month, we take that week to celebrate our inventors and the things we're doing, and to share knowledge. We had 45 sessions on data and AI. Jamie and I kicked it off. And it's amazing to hear your CEO and your chairman talk about innovation as part of our DNA and part of our culture. He deeply cares.
Ali Ghodsi: Okay, so let's talk about the thing that we're here to talk about. Okay. The thing that excites us the most. We really think that with generative AI, we can take data and AI to a completely different level. So that's what we want to talk about.
All right, so what do we want to do? I want to basically talk about two things today. Just two things. The first thing is: how do we democratize data? Larry already kind of touched on it. We're excited because we think we can actually bring this technology to everyone in an organization, so that anyone who can speak English, or just use words in their mother tongue, can ask questions of the data. So we're going to talk a lot about that. It turns out that's actually way harder than you think, but we have great announcements there. That's probably the thing I'm most excited about. And then second, we're going to talk about how we democratize generative AI: how do we infuse AI into every product and service that exists out there? What we mean by that is, all of you represent organizations that have services and products. How do we infuse AI into them so that you can all make that revolution that's happening a reality?
Ali Ghodsi: Okay, so let's start with the left one. How do we democratize access to data so that everyone, not just people who know Scala, Java, Python, or SQL, can do this?
Ali Ghodsi: So I actually think this was a really, really prescient tweet by Andrej Karpathy, who actually, by the way, gave a keynote here a few years ago. You should check it out. It's the highest-rated keynote of all the Data and AI Summits. So check out Andrej's keynote at Data + AI Summit. But he said the hottest new programming language is English, or any other language that you're familiar with. And we're very excited about that.
Ali Ghodsi: So when we started Databricks, we wanted to democratize data and AI. And we said, look, how do we enable every organization to leverage data, analytics, and AI, and get insights from it and do predictions? We actually started with Scala and Java, and then we said, well, look, let's broaden that so we can reach bigger audiences, so more people can do this. So we added Python, and Python became a first-class citizen in Databricks and Spark and Delta and all these things we were doing. And then, I would say about five years ago, we said, let's broaden it even further. Now we can reach really broad masses with SQL, with our data warehouse and so on. But we really, really hope that with LLMs and generative AI, we can broaden it now to reach the whole enterprise: every organization, every person in an organization. As long as they can speak English or any other natural language, they should be able to ask questions of the data. So that's what we're excited about.
Ali Ghodsi: And that's why I'm really, really excited to announce something called Lakehouse IQ. Lakehouse IQ is a knowledge engine, and we'll explain what that is in a second. But here is the problem. Every product on the planet right now is basically adding an assistant you can chat with. There are announcements every day: someone's added an assistant, you can write things in English and ask questions, and they make for really cool demos on stage. So we'll do some of those here too. But the truth is, they don't work. They don't work because it turns out it's actually really hard to use them in reality, for real problems. So Lakehouse IQ is the knowledge engine that sits in the data platform and addresses this. And we're very excited.
Ali Ghodsi: I actually think this will be the future of Databricks, and it's something that we just started working on now and will work on for many, many years. So we're starting to make it available now, but we'll keep pushing the boundaries on this. But instead of hearing it from me, I want to welcome my co-founder, the creator of Apache Spark, and our CTO at Databricks, Matei Zaharia, to the stage.
Matei Zaharia: So, you know, if you've looked around, every company on the planet is looking at their difficult technical problems and just slapping on an LLM. How many of your bosses have asked you to do this? I assume it's pretty much everyone here. The problem is that in many domains, and especially the challenging domain we're in of enterprise data, just naively adding an LLM assistant doesn't really work. And the challenge is that in your domain, in your company, you have a lot of context: jargon, terminology, data structure, and so on that's unique to you. And to make sense of business questions and to answer them accurately, you need LLM-based features that actually understand that context.
So every organization has its own unique jargon, data structure, et cetera. Just as some examples here, I'm showing a software company. This is actually Databricks. So here are some of the terms we use internally. DBUs, if you're a customer, you know they're a unit of compute that we use for billing and pricing. Nephos: I'm not sure how many people know that Nephos is our internal code name for our serverless offering, serverless compute. MAC is a monthly active customer. Warehouse is a thing that runs SQL queries. Job is a scheduled job.
Now, if you look at a different company, like a retailer, they will also use some of the same words, but they'll mean something totally different. So warehouse for a retailer is like an actual warehouse. They have these terms like BORIS and BOPUS and POP that mean something very specific. And there's a lot of jargon that's only in retail, not to mention the unique stuff inside each company as it accumulates decades of growth and experience.
And finally, if you look at a telecom, they also have terms like MAC and POP and so on, but they mean something totally different. So even a simple question, you can imagine this question in our assistant, like: how many DBUs were there in Europe last quarter? It's actually really hard to tell what this question means unless you really understand how the company works. You have to know what DBUs are. You have to know what "Europe" means: how does that map to columns in a SQL table? And even the fiscal year is different at each company. Our fiscal year actually begins in February. So you have to know all these things. And even for DBUs, I just looked it up on Wikipedia, and it turns out "DBU" can mean many other things. It can be a chemical, it can be a university, it can be a unit of volume, all kinds of stuff.
So how does Lakehouse IQ solve this? Lakehouse IQ takes in a whole bunch of signals about how data is actually used in your organization. And we can do that partly because we have this product surface that goes all the way to end users, with things like dashboards and notebooks and just all the stuff that people build with their data. It's all centered on Unity Catalog. It takes all the metadata there, but it also takes in docs, dashboards, notebooks, your org chart and groups, popularity signals (maybe there are a thousand tables with "customer" in the name, but one of them is used much more often), lineage, and the actual queries running on them. And we use these to build models for your company, based on its use of data, to help support users in basically all aspects of our product.
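As a rough illustration of how such usage signals might combine, here's a sketch of scoring candidate tables by how heavily they're actually used. This is just the idea, not Databricks' actual ranking logic; the table names, signals, and weights are all invented.

```python
# Illustrative sketch: rank candidate tables for a question by combining
# usage signals from the catalog. Signals and weights are hypothetical.
from dataclasses import dataclass

@dataclass
class TableSignals:
    name: str
    query_count_90d: int   # popularity: how often the table is queried
    doc_mentions: int      # mentions in docs, dashboards, notebooks
    downstream_deps: int   # lineage: dashboards/jobs reading from it

def score(t: TableSignals) -> float:
    """Weighted combination of usage signals; the weights are made up."""
    return 1.0 * t.query_count_90d + 5.0 * t.doc_mentions + 2.0 * t.downstream_deps

# A thousand tables may have "customer" in the name, but one dominates usage.
candidates = [
    TableSignals("raw.customer_tmp_2019", 3, 0, 0),
    TableSignals("gold.customer_daily", 1200, 14, 9),
    TableSignals("staging.customer_copy", 40, 1, 1),
]

for t in sorted(candidates, key=score, reverse=True):
    print(f"{t.name:28s} score={score(t):.0f}")
```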
Weston Hutchins, Databricks: Good morning, everyone. All right, so for this demo, let's just say I'm a product manager and I work for a startup that builds medical device wearables. It's a lovely Wednesday morning. Maybe I'm cleaning out email. I'm at Data and AI Summit when all of a sudden I get a Slack from my CEO. They want updated numbers for a newly launched product, and they want it broken down by a number of dimensions and customer segments. Oh, and it's for a board meeting in exactly 15 minutes. Now, normally this is going to take me some time. I don't have any of these charts or graphs ready to go, so I'd have to create a bunch of this from scratch. Don't panic. Let's see if I can pull this off with the help of Lakehouse IQ.
Lakehouse IQ gives me an instant answer. So I can see that we sold 2,700 units in the last month for revenue of about $725,000. But there are a few other interesting things to note here. First off, it knew that HLS means health and life sciences. Second, it also knew that Apollo was the code name for the blood pressure sensor product that we launched. Now, Apollo is not mentioned anywhere in the table or column names, but Lakehouse IQ uses comments from queries and code snippets and descriptions in Unity Catalog to match the code name to the right field in our database. Lakehouse IQ also surfaces relevant tables for me as well.
Lakehouse IQ also understands the relationships between datasets and the people at my company. It knows who I frequently collaborate with and surfaces assets that they use when they have similar questions to mine. I can also see that this table is popular, so it's probably a pretty good place to get started. Let's explore it, and it'll open up a pre-populated query based on my question. But I'm going to refine this a bit more. Let's say: just give me the sales for premium customers in the last three months. Lakehouse IQ makes it really easy to refine queries with natural language.
I get a diff view that shows me what's changed. This looks pretty good. We'll go ahead and insert it into the editor, and now we'll rerun this and get some updated results. But you don't actually have to use the panel. We've integrated Lakehouse IQ directly into our editor, so I can start typing a comment. And let's say I only want to show customers with more than 1,000 employees. Lakehouse IQ pulls in common code snippets from my team and surfaces those directly in my editor. So in this example, we're going to join with our Salesforce data to limit results to just companies that have more than 1,000 employees.
All right, my query looks good. Now I need to build a viz. I'm a bit more comfortable in Python, so we're going to switch over to a notebook for this. We'll go into the notebook and add a new cell. And then we'll go over here and ask Lakehouse IQ to convert this query over to Python and create a line plot. Lakehouse IQ makes it really easy to switch back and forth between SQL and Python, so you can always use the language that works for you. I'm going to insert this into our cell, and we'll go ahead and run it. All right, well, we have a viz, but there appears to be a problem. I don't know what's causing this dip.
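The generated cell might look something like the sketch below. The monthly figures and column names are invented stand-ins for the demo's query result; in a real Databricks notebook the DataFrame would more likely come from spark.sql(...).toPandas() than be constructed inline.

```python
# Hedged sketch of the kind of Python a "convert to Python and plot" request
# might produce: load monthly sales for premium customers, draw a line plot.
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for the SQL query result (hypothetical numbers).
sales = pd.DataFrame({
    "month": pd.to_datetime(["2023-03-01", "2023-04-01", "2023-05-01"]),
    "revenue_usd": [610_000, 480_000, 725_000],  # note the dip in April
})

fig, ax = plt.subplots()
ax.plot(sales["month"], sales["revenue_usd"], marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (USD)")
ax.set_title("Premium customer sales, last three months")
fig.autofmt_xdate()
plt.show()
```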
Let's ask Lakehouse IQ to see if it can help us debug what caused the dip. Lakehouse IQ is integrated with Unity Catalog lineage, so it knows my upstream and downstream dependencies. It can tell me when tables need to be repaired and backfilled. So in this particular example, it knows that the orders gold table is running, but there appears to be a problem in one of the upstream jobs. Let's go ahead and click on the orders pipeline and see if we can debug what's going on. We've integrated Lakehouse IQ into a number of UI elements inside Databricks, so I can actually ask it to explain the error message for me and propose a fix.
It's a pretty common change. It looks like somebody updated the schema but forgot to update the pipeline code. It's a simple fix. We'll go and repair the runs and backfill the data. This can take a little bit of time, so we've gone ahead and done that for the sake of this demo. Let me go back into my notebook, and I'll change this to use the updated table. We'll rerun this and see if our graph gets updated. All right? Fantastic. There we go. Let's ask Lakehouse IQ to suggest a name for our notebook. And this looks ready to go. I can send this off to my CEO just in time for his board meeting. That's an overview of Lakehouse IQ, the knowledge engine for your company.
Matei Zaharia: Super, super excited about Lakehouse IQ. It's not an assistant. Okay, it's not an assistant, but it can power the assistants in Databricks. It also powers troubleshooting. It also powers, when you're running jobs or workflows, all these kinds of things. They hook into Lakehouse IQ, which has all the semantic information from Unity Catalog. So that's why Unity Catalog is so important. You're going to hear about Unity Catalog again and again today. Okay, so we're very excited about that. That's the strategy for many years to come, and we're just taking a first step toward it, right? It's not yet at a place where just English is enough to get any insight you like, but it's one big step forward in that direction.
Okay, so what about the right hand side? How do we democratize AI? How do we democratize generative AI? How do we get every organization on the planet to be able to infuse generative AI into their applications? And we really believe at Databricks firmly that you should own your own machine learning models, your own generative AI models, your own LLMs, and you should train them on your own data. And that's what's going to set you apart in your industry. We really believe in that.
You might ask: okay, but there are already models out there; we can just use them. Why should you build your own model? Well, the reason is that this gives you control over your intellectual property. And what do I mean by that? What I mean is that every company on the planet is going to be a data and AI company in the next 5-10 years. So the companies that you all represent will leverage data and AI, and that's how they're going to beat the competition. So the future of the organization you work at depends on all of you here, the models that you can actually build, and the competitive advantage you build into those models.
You don't want someone else to have that, right? You want to really own that IP. So that's number one. Number two, you want to really lock down your datasets. That's what's unique. The reason you're going to succeed in your organization, and you all will be promoted, is that you're going to leverage the data your organization has from working with customers. And you can actually lock that data down.
It can be private, it can be secure, and you can train your own models using that data. And then the final reason is that you can actually push down the cost, so you can do this much cheaper. We're going to be using generative AI everywhere, and it gets pretty expensive, so you want to be able to do it really efficiently on modern hardware at scale. Owning the model will actually change that price equation, rather than just relying on the two or three big models that just a few companies have.
So that's what we deeply believe is the future, and that's why we're really, really excited that we've agreed to acquire Mosaic ML. Why am I so excited about Mosaic ML? Because the team behind Mosaic ML is absolutely fantastic. We love the team. We had complete alignment in vision the moment we saw them. They truly have been pushing for democratization. They call it generative AI for all, and the team is amazing, especially the founder and CEO, Naveen. The first time I talked to him, I immediately knew I wanted to work really closely with him. So I'm really, really excited to welcome him to the stage. Naveen, take it away, man. Thank you.
Naveen Rao, the CEO and co-founder of Mosaic ML: Thank you, Ali. I'm super excited to be here. It's wonderful to have an introduction from Ali. I'm Naveen Rao, the CEO and co-founder of Mosaic ML. We're a relatively new company, about two and a half years old. We started with the premise that we want to bring all of the capabilities of generative AI and large-scale AI to many people. When we started thinking about joining forces with Databricks, it was interesting that beyond our obvious hair preferences, we share a lot in common. We see the world similarly in that when you democratize technologies, the world gets better. I love the analogy of the personal computer. I was one of those kids in the 80s programming a PC. All of the people who learned to program did so because of that revolution. I think we're on the cusp of something as big, if not bigger. How we see the entire world of data will be changed with AI. We need to put those capabilities in the hands of as many people as possible.
Naveen Rao, the CEO and co-founder of Mosaic ML: Let me talk about those three points that Ali mentioned earlier. These are front and center to what we've been building, which is why the alignment between our companies is so tight. Control, privacy, and cost—how do we address those things? This is something we've focused on from the very beginning, two and a half years ago. You're always a little bit crazy at first when you do something until it becomes obvious later. I think we focused on the right things early on.
On the control side, there's a great case study by one of our mutual customers, Replit. Replit is a shared integrated development environment, an IDE. It enables people to essentially pair program anywhere in the world together. Code completion and code generation are a big deal. Now, anyone who has written code knows that when you type, it gives suggestions, accelerating what you do and making you more productive. Replit really wanted to own this control and model behavior because they had insights from their customers on how they could best serve them rather than a centralized model built by somebody else.
They took Databricks, built an extract, transform, and load (ETL) pipeline, cleaned their data, and ended up with a high-quality dataset. They wanted latency to be very fast: while you're typing code or English, it produces code right away. They didn't want to wait seconds or tens of seconds. Making the model very small while having adequate performance was very important to them.
They came to us for help on how to do this. They plugged their data, along with some open-source data, into our training platform and built a model in a few days, about three days, that was state-of-the-art. Again, it came down to leveraging both Databricks' and Mosaic ML's capabilities. It kind of worked on the first try. That's amazing.
We love this use case because we believe this is the paradigm of the future. We're seeing every company that uses data warehouses, analytics tools, data filtering, and ETL pipelines plugging into generative AI to produce a model and then serving that model back to their customers in their front-end application.
Privacy was part of this Replit example. Really, what we focused on from the very start was respecting data privacy. We believe privacy is critical to scaling generative AI capabilities and democratizing them. If you don't have technologies that can enforce data privacy, you end up with misaligned incentives in the market. Giving people the ability to build solutions on their unique data and create their own intellectual property is very important. We do that by respecting data privacy.
Our solution borrows some of the ideas Databricks pioneered: you can run computation on behalf of a user in a user-secure environment, whether supplied through us or by the user. We make our customers' data completely invisible to us as a company. All model artifacts, weights, and other outputs are written back to the user's storage. We literally can't see it; it's a brick wall. This is critical, and it can be enforced through technology.
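To make that pattern concrete, here's a minimal sketch: training code running in the customer's environment writes checkpoints straight to storage the customer owns, so the vendor never holds the weights. The bucket URI and helper function are hypothetical illustrations, not Mosaic ML's actual implementation.

```python
# Illustrative sketch only: artifacts go to *customer-owned* storage, so the
# vendor never retains weights. URI and helper function are hypothetical.
import io

import fsspec  # resolves s3://, gs://, abfss:// using the customer's credentials
import torch


def save_checkpoint_to_customer_storage(model: torch.nn.Module, uri: str) -> None:
    """Serialize model weights directly to a customer-controlled location."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    with fsspec.open(uri, "wb") as f:
        f.write(buffer.getvalue())


# e.g. save_checkpoint_to_customer_storage(model, "s3://customer-bucket/ckpts/step-1000.pt")
```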
Cost was actually a deep technology problem we tried to innovate on before others. The problem is simple: I have a flop, a floating-point operation, the basic unit of training a neural network. How can I learn more from data with each flop? A flop costs money: it runs on a processor like a GPU, and there are operational dollars to keep that GPU going. Every flop matters, and once you start throwing lots of flops at a problem, it gets very expensive. Many have heard that GPT-3 cost $15 million to train. We wanted to bring those costs down to something more tractable for enterprises.
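To put numbers on "every flop matters": a common rule of thumb is that training a transformer takes roughly 6 * parameters * tokens floating-point operations. The sketch below plugs in illustrative values (a 7B-parameter model, 1T tokens, A100-class throughput); the utilization and price are assumptions, not Mosaic ML's published figures.

```python
# Back-of-the-envelope training-cost estimate using the common rule of thumb
# FLOPs ~= 6 * parameters * tokens. Hardware and price figures are assumptions.

params = 7e9                    # 7B-parameter model (MPT-7B scale)
tokens = 1e12                   # assumed 1T training tokens
total_flops = 6 * params * tokens            # ~4.2e22 FLOPs

peak_flops = 312e12             # A100 BF16 peak throughput, FLOPs/sec
utilization = 0.4               # assumed realized hardware efficiency
gpu_hours = total_flops / (peak_flops * utilization) / 3600  # ~93,000 GPU-hours

price_per_gpu_hour = 2.50       # assumed cloud rate, USD
print(f"~{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * price_per_gpu_hour:,.0f}")
# On these assumptions, the total lands in the low hundreds of thousands of
# dollars, consistent with the ~$250k MPT-7B figure quoted below.
```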
MPT-7B is one of the foundational models we put into open source. We did this to get builders and developers going with a small bite. It's amazing how quickly the community has built on it: it's the most downloaded large language model in history, and it's only a month and a half old. This model is state-of-the-art for its size, 7 billion parameters, and cost only $250,000 to build from scratch. Our platform made this simple: point it at the data and hit go. We publish all of these numbers transparently; this is a real number, not one padded with margin. For even bigger models, we released MPT-30B, a 30-billion-parameter model with more capabilities and reasoning ability, performing as well as or better than ChatGPT. It costs under $1 million, around $700,000, to train from scratch. Not everyone needs to do that; you can layer in your data, fine-tune, and take other approaches. But training costs are in the hundreds of thousands of dollars for most models, not hundreds of millions. I want to be clear: this is approachable and doable with our tools.
This is about building the best model for your application. Every application is different, necessitating different performance, behaviors, latencies, memory requirements, and so on. We see a world with millions of models solving interesting problems and building applications. We have customers leveraging this, and we're growing very fast. We've been fortunate to be in this industry at the right time; the last few months have been quite a ride. We're happy to see the great applications our customers are building.
I really want to thank the Databricks team for trusting us. We're excited to serve our customers better and get everyone to the forefront of AI. That's been our goal, and this accelerates it. We're super excited to be part of the Databricks family. Thank you all.
Zaheera Valani: Thank you, Ali. Good morning. I'm so excited to be here. This is a data-savvy audience, so I'm sure you're sharing data internally within your organizations, but raise your hand if you're sharing data externally with your customers or partners. I see hands going up. Now, raise your hand if you wish this were easier. Oh, I still see lots of hands up. We hear from our customers that building a strategy for data sharing and collaboration can seem like a massive mountain to climb.
Enterprises want open collaboration on data and AI, with trust. You want to be able to acquire that data easily, get value from it quickly, and be able to trust it. For an organization like Edmunds, data sharing and collaboration is at the core of their business. They share data with customers, people shopping for cars; they share data with their partners; and they even share with competitors for use cases like inventory reconciliation. All of this starts with trust. The data that flows through their business and their ecosystem is filled with PII and other sensitive information that must be secured and governed properly. And as they acquire second- and third-party data, they need to be able to procure that data easily and make sure that it's valid. The Databricks Lakehouse Platform is purpose-built for Edmunds and many other organizations to collaborate on data and AI. The platform provides a comprehensive set of tools to collaborate securely on data and AI: users can share data and AI assets across platforms and clouds, and this flexibility comes with strong security and governance, because Delta Sharing is integrated with Unity Catalog so that you can track, govern, and audit access to these shared datasets.
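As a hedged sketch of what that Delta Sharing plus Unity Catalog integration looks like on the provider side, the snippet below creates a share, adds a governed table, and grants a recipient access from a Databricks notebook. All share, table, and recipient names are placeholders, loosely inspired by the inventory example above.

```python
# Provider-side sketch: publish a governed table via Delta Sharing.
# All names are placeholders; `spark` is predefined in Databricks notebooks.
spark.sql("CREATE SHARE IF NOT EXISTS inventory_share")
spark.sql("ALTER SHARE inventory_share ADD TABLE main.retail.inventory")

# Create a recipient and grant read access; Unity Catalog can then track,
# govern, and audit every access to the shared dataset.
spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_co")
spark.sql("GRANT SELECT ON SHARE inventory_share TO RECIPIENT partner_co")
```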
Delta Sharing is the foundation that makes all of this possible. Two years ago, we announced the open-source Delta Sharing project, the industry's first open protocol for secure data sharing. With Delta Sharing, providers can easily share live datasets without replication, and consumers can connect to this data without being tied to a specific vendor solution. Since launch, we've seen incredible adoption from organizations: petabytes of data are processed and shared across organizations daily. Built on Delta Sharing are other collaboration capabilities, including the Databricks Marketplace.
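On the consumer side, the open-source delta-sharing Python client can read a shared table with no Databricks dependency at all. A minimal sketch, with placeholder profile and table names:

```python
# Minimal Delta Sharing consumer sketch using the open-source Python client
# (pip install delta-sharing). Profile path and table names are placeholders.
import delta_sharing

# The provider issues a .share profile file containing the sharing server
# endpoint and a bearer token.
profile = "config.share"

# List all tables this recipient can access.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one shared table into pandas: live data, no replication, no vendor lock-in.
df = delta_sharing.load_as_pandas(profile + "#my_share.my_schema.my_table")
print(df.head())
```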
So what differentiates the Databricks Marketplace? Many cloud providers offer data marketplaces, but when we asked data providers, they told us they run into limitations. One such limitation is that each marketplace is closed to its specific cloud or platform. So if you're a data provider and you've put in all this work to create a dataset, now you need to put in more effort to publish it to ten different platforms. Existing marketplaces are also limited to just datasets. The Databricks Marketplace is an open marketplace. For providers, you can reach users on any platform, you can monetize more than just data, and you can share that data securely. For consumers, you can discover more than just data, and you can evaluate data products faster because the assets include notebooks and visualizations. And you don't have to be a Databricks customer to take advantage of the Databricks Marketplace. It's truly open.
One of our marquee partners, the London Stock Exchange, is a provider in the marketplace; they're really excited to reach more consumers and help customers get insights faster. The Databricks Marketplace has been in preview for a few months now, and we have many providers across a variety of industries, everything from healthcare to life sciences to retail, with hundreds of listings already in the marketplace today. I'm really excited to announce that the Databricks Marketplace is generally available today.
In the marketplace, you can share datasets and notebooks, you can share and monetize data assets in the public marketplace, or you can set up a private exchange to share data securely with the organizations of your choice. But that's not all. We've seen so much demand for AI models that we are doubling down on our investments: coming soon, you'll be able to discover and share AI models in the marketplace. Let me show you how this will work.
So here we are in the Databricks Marketplace. Let me walk you through it from the perspective of an end consumer. Let's imagine that I'm an analyst in a large healthcare organization, and I'm looking for an AI tool to help me summarize medical notes. Here in the Databricks Marketplace, I can see there's a variety of data products available to me, everything from financial data products from providers like S&P to healthcare products from providers like IQVIA.
So let's search for "summarize medical notes." Let's take a look at this data product from John Snow Labs. John Snow Labs is an AI company that helps healthcare and life sciences organizations put AI to work faster. Here I can see that John Snow Labs has provided an overview of the data product. There are six models and a sample notebook to help me understand how to use the model. But where I can really speed up my understanding is by actually trying it. I'll select an example, and I can see the model output. Here I get to do exploratory analysis and evaluate the data product before I acquire it. Now that I'm confident I want it, I can click on Get Instant Access, and in less than a second, the data product is available to me. All of this is powered by Delta Sharing: behind the scenes, when I clicked on Get Instant Access, the provider automatically provisioned the six models and the notebook in my workspace. So now I can run the model directly from within the notebook, or I can invoke it from a model serving endpoint, as sketched below.
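For a sense of what that last step looks like, here's a minimal sketch of invoking a Databricks model serving endpoint over REST. The workspace URL, endpoint name, and payload shape are placeholders for this hypothetical medical-notes model, not John Snow Labs' actual interface.

```python
# Hypothetical invocation of a Databricks Model Serving endpoint after
# acquiring the model. Workspace URL, endpoint name, and payload are placeholders.
import os

import requests

workspace_url = "https://my-workspace.cloud.databricks.com"
endpoint_name = "medical-notes-summarizer"  # illustrative endpoint name
token = os.environ["DATABRICKS_TOKEN"]      # personal access token

response = requests.post(
    f"{workspace_url}/serving-endpoints/{endpoint_name}/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json={"inputs": ["Patient presents with intermittent chest pain ..."]},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # e.g. the generated summary of the note
```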
Someone's excited about that. I just walked you through discovering, evaluating, and acquiring a model in the Databricks Marketplace. Every organization wants to leverage AI as a catalyst for innovation, productivity, and even cost savings. The Databricks Marketplace helps democratize AI by distributing both open-source and proprietary models.
Now, what if you could discover and connect to prebuilt applications that already incorporate AI to solve your use cases, so that you don't have to find the trusted data, find the right AI model, and then figure out how to put all of that together? That's just too much work. Today I'm thrilled to introduce Lakehouse Apps. Lakehouse Apps are a new way to build, deploy, and manage applications on the Databricks platform.
There are a lot of amazing startups and software vendors out there solving important use cases. So let's say you're a startup and you've built a compelling data application that leverages AI. First, you need to promote your application and find potential customers. Once you've found a potential customer, you have to go through a lengthy legal and security review process before your application can be deployed in the customer's environment. You also have to figure out how the app is going to securely access the customer's data.
With Lakehouse Apps, data never leaves the customer's Databricks instance, which means no lengthy review process. And through the Databricks Marketplace, you get access to Databricks' more than 10,000 customers. For app developers, you can use the language and platform of your choice; our early development partners include platforms like Retool and Posit. And for organizations, you get incredible business value by being able to discover and connect to these prebuilt applications that solve your use cases.
So let me show you how Lakehouse Apps will work in the marketplace. Here we are back again in the Databricks Marketplace. Let's take a look at Kumo AI. Yes. So Kumo AI is a startup in the AI space, and their goal is to automate the end-to-end machine learning process directly on your enterprise data. So let's add the application, and I'll zoom in here a little. We'll select the