No More Hustleporn: Jensen Huang of Nvidia talks about how today, the data center is the computer

We pulled out the best insights from Jensen Huang's recent keynote at COMPUTEX 2023. Transcription and light editing by Anthropic's Claude, curation by Yiren Lu :-)

Highlights

The most important computer of our generation is unquestionably the IBM System 360. This computer revolutionized several things. It was the first computer in history to introduce the concepts of a central processing unit, virtual memory, expandable I/O, multitasking, and the ability to scale this computer for different applications across different computing ranges. And one of the most important contributions, and one of its greatest insights, was the importance of preserving software investment. The software ran across the entire range of computers, and it ran across multiple generations. IBM recognized the importance of software, recognized the importance of preserving your investment, and, very importantly, recognized the importance of installed base.
This computer revolutionized not only computing. Many of us grew up reading the manuals of this computer to understand how computer architecture worked, to even learn about DMA for the very first time. This computer not only revolutionized computing, it revolutionized the thinking of the computer industry. System 360 and its programming model have largely been retained until today. 60 years! In 60 years, a trillion dollars' worth of the world's data centers have all basically used a computing model that was innovated 60 years ago.
There are two fundamental transitions happening in the computer industry today. All of you are deep within it, and you feel it. The first trend is that CPU scaling has ended. The ability to get ten times more performance every five years has ended. The ability to get ten times more performance every five years at the same cost is the reason why computers are so fast today. The ability to sustain ten times more computing every five years without an increase in power is the reason why the world's data centers haven't consumed so much more of Earth's power. That trend has ended, and we need a new computing approach. Accelerated computing is the path forward.
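The scaling claim above is simple compounding. As a quick sanity check of the arithmetic (the ten-times-per-five-years figure is the one quoted in the talk):

```python
# "Ten times more performance every five years," expressed as a compound
# annual growth rate. The 10x/5yr figure is the one quoted in the talk.
annual = 10 ** (1 / 5)          # ~1.585x per year
five_year = annual ** 5         # compounds back to 10x
decade = annual ** 10           # the same rate sustained for ten years
print(f"{annual:.3f}x/year -> {five_year:.0f}x/5yr -> {decade:.0f}x/decade")
```

Sustained for six decades, as the System 360 era was, that rate compounds to factors in the trillions, which is why its end forces a new approach.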
It happened at exactly the time when a new way of doing software was discovered: deep learning. These two events came together and are driving computing today: accelerated computing and generative AI. This new way of doing software, this new way of doing computation, is a reinvention from the ground up. And it's not easy. Accelerated computing is a full-stack problem. It's not as easy as general-purpose computing. The CPU is a miracle: high-level programming languages, great compilers, and almost anybody could write reasonably good programs because the CPU is so flexible. However, its ability to continue to scale in performance has ended, and we need a new approach. Accelerated computing is full stack. You have to re-engineer everything from the top down and from the bottom up: from the chip, to the systems, to the system software, to new algorithms, and, of course, optimizing the applications.
The second is that it's a data-center-scale problem. And the reason why is that today, the data center is the computer. Unlike the past, when your PC was the computer or the phone was the computer, today your data center is the computer. The application runs across the entire data center, and therefore it's vital that you understand how to optimize the chips, the compute, and the software across the NIC, the switch, all the way to the other end, in a distributed-computing way.
And the third: accelerated computing is multidomain. It's domain-specific. The algorithms and software stack that you create for computational biology and the software stack you create for computational fluid dynamics are fundamentally different. Each one of these domains of science needs its own stack, which is the reason why accelerated computing has taken us nearly three decades to accomplish. However, the performance is incredible. And I'll show you.
After three decades, we realize now that we're at the tipping point. A new computing model is extremely hard to come by, and the reason is this: in order for there to be a new computing model, you need developers. But developers only come if there are end users, because developers create applications that end users will buy. Without end users, there would be no customers and no computer companies to build computers. Without computer companies like yourselves building computers, there would be no installed base. Without an installed base, there would be no developers. Without developers, there would be no applications.
This loop has been suffered by so many computing companies in the 40 years that I've been in this industry. This is really one of the first major times in history that a new computing model has been developed and created. We now have 4 million developers, 3,000-plus applications, and 40 million CUDA downloads in history, 25 million just last year. 15,000 startup companies in the world are built on Nvidia today, and 40,000 large companies and enterprises around the world are using accelerated computing. We have now reached the tipping point of a new computing era.
This new computing model is now enjoyed and embraced by just about every computer company and every cloud company in the world. There's a reason for that. It turns out that the benefit of every single computing approach, in the final analysis, is lower cost. The PC revolution that Taiwan enjoyed started in 1984, the year I graduated; that decade, the 80s, was the PC revolution. The PC brought computing to a price point nobody had ever seen before. And then, of course, mobile devices were convenient, and they also saved enormous amounts of money. We aggregated and combined the camera, the music player, your PC, and a phone; so many different devices were all integrated into one. As a result, not only are you able to enjoy your life better, it also saves a lot of money and offers great convenience. Every single generation provided something new and saved money.
Well, this is how accelerated computing works. This is accelerated computing used for large language models, basically the core of generative AI. This example is a $10 million server budget. And we costed everything: we costed the processor, we costed all the chips, we costed all the networking, we costed literally everything. And so $10 million gets you nearly 1,000 CPU servers. And to train, to process, this large language model takes eleven gigawatt-hours. Okay, this is what happens when you accelerate this workload with accelerated computing.
And so with that same $10 million, you buy 48 GPU servers. This is the reason why people say that GPU servers are so expensive. However, the GPU server is no longer the computer. The computer is the data center. Your goal is to build the most cost-effective data center, not the most cost-effective server. Back in the old days, when the computer was the server, that would be a reasonable thing to do. But today, the computer is the data center. And so what you want to do is create the most effective data center with the best TCO.
So for $10 million, you buy 48 GPU servers. They consume only 3.2 gigawatt-hours and deliver 44 times the performance. Let me just show it to you one more time: this is before, and this is after. We want dense, fast computers, not big ones. And so that's ISO-budget.
Let me show you something else. Okay, so this is $10 million again: 960 CPU servers. Now, this time, we're going to be ISO-power. We're going to keep this number, the amount of power, the same. This means your data center is power-limited. In fact, most data centers today are power-limited.
And so, being power-limited, using accelerated computing, you can get 150 times more performance at three times the cost. But why is that such a great deal? The reason is that it's very expensive and time-consuming to find another data center. Almost everybody is power-limited today.
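The two comparisons above reduce to simple ratios. A quick sketch using only the figures quoted in the keynote (none of these numbers are independent measurements):

```python
# ISO-budget: the same $10M spent on CPU servers vs. GPU servers,
# using the energy and performance figures quoted on stage.
cpu_energy_gwh, gpu_energy_gwh = 11.0, 3.2
gpu_speedup = 44.0
energy_saving = cpu_energy_gwh / gpu_energy_gwh      # ~3.4x less energy

# ISO-power: hold power constant instead of cost.
perf_gain, cost_multiple = 150.0, 3.0
perf_per_dollar = perf_gain / cost_multiple          # 50x more perf per dollar

print(f"ISO-budget: {gpu_speedup:.0f}x faster, {energy_saving:.1f}x less energy")
print(f"ISO-power:  {perf_gain:.0f}x faster at {cost_multiple:.0f}x cost, "
      f"{perf_per_dollar:.0f}x perf per dollar")
```

Either way the budget is held fixed or the power is held fixed, the per-dollar and per-watt ratios are what "the more you buy, the more you save" is gesturing at.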

Full Transcript

I am a translator, transforming text into creative discovery, turning movement into animation, and infusing words with emotion.

I am a healer, exploring the building blocks that make us unique, discovering new threats before they happen, and searching for the cure to keep them at bay.

I am a visionary, creating new medical miracles and unlocking the secrets of our sun to keep us safer here on Earth.

I am a navigator, finding a single moment in a sea of content.

We are announcing the next generation and.

A: The perfect setting for our most amazing story. I am a creator, adding new dimensions to creative expression and reimagining our virtual selves. I am a helper personalizing our surroundings, help me arrange the living room, harnessing the wisdom of a million programmers.

B: And.

A: Turning the real world into a virtual playground,

I even helped write the script, breathe life into the words, and compose the melody.

I am AI, brought to life by Nvidia, deep learning, and brilliant minds everywhere.

Ladies and gentlemen, please welcome Nvidia founder and CEO Jensen Huang.

We're back. Our first live event in almost four years. I haven't given a public speech in four years. Wish me luck. I have a lot to tell you, very little time. So let's get going.

Ray tracing simulating the characteristics of light and materials is the ultimate accelerated computing challenge. Six years ago, we demonstrated for the very first time rendering this scene in less than a few hours. After a decade of research, we were able to render this scene in 15 seconds on our highest-end GPU.
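Taken at face value, those two render times imply a speedup on the order of a few hundred times. A back-of-envelope check (reading "a couple of hours" as two hours is our assumption, not the keynote's):

```python
# Rough speedup implied by the rendering times above. "A couple of hours"
# is taken as two hours here; that assumption is ours, not the keynote's.
before_s = 2 * 60 * 60       # ~2 hours, in seconds
after_s = 15                 # 15 seconds on the highest-end GPU
print(before_s / after_s)    # 480.0
```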

And then we invented Nvidia RTX and combined three fundamental technologies: hardware-accelerated ray tracing, artificial intelligence processing on Nvidia Tensor Core GPUs, and brand-new algorithms. Let's take a look at the difference in just five years.

This is running on CUDA GPUs six years ago, rendering this beautiful image that would have otherwise taken a couple of hours on a CPU. So this was a giant breakthrough already, an enormous speed-up running on accelerated computing.

And then we invented the RTX GPU. The holy grail of computer graphics, ray tracing is now possible in real-time. This is the technology we have put into RTX.

And this, after five years, is a very important time for us, because for the very first time, we took our third-generation Ada architecture RTX GPUs and brought them to the mainstream with two new products that are now fully in production. I got that backwards. Everything looks different inside out and upside down.

This is our brand-new laptop right here. You're looking at an Ada GPU running ray tracing and artificial intelligence at 60 frames per second. It's 14 inches. It weighs almost nothing. It's more powerful than the highest-end PlayStation. And this is the RTX 4060 Ti for our core gamers.

Both of these are now in production. Our partners here in Taiwan are producing both of these products in very, very large volumes. And I'm really excited about them. Thank you very much. I can almost put this in my pocket.

AI made it possible for us to do that. Everything that you saw would have been utterly impossible without AI. For every single pixel we render, we use AI to predict seven others. The amount of energy we save and the amount of performance we get are incredible.
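The "one pixel rendered, seven predicted" claim means only an eighth of each frame is fully shaded. A toy calculation for a 4K frame (the resolution is our illustrative choice, not a figure from the talk):

```python
# If AI predicts seven pixels for every one the GPU shades, only 1/8 of
# the frame goes through the full rendering pipeline.
width, height = 3840, 2160          # a 4K frame (illustrative choice)
total = width * height
rendered = total // 8               # pixels fully shaded by the GPU
predicted = total - rendered        # pixels inferred by the AI model
print(rendered, predicted, total / rendered)   # shading ratio is 8.0
```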

Now, of course, I showed you the performance on those two GPUs, but it wouldn't have been possible if not for the supercomputer back at Nvidia running all the time, training the model so that we can enhance applications.

So the future is what I demonstrated to you just now. You can extrapolate almost everything I'm going to talk about for the rest of the talk from that simple idea: there will be a large computer writing, developing, and deploying software that is incredible, software that can be deployed on devices all over the world.

We used AI to render this scene. We're going to also use AI to bring it alive. Today we're announcing Nvidia ACE, the Avatar Cloud Engine, which is designed for animating, for bringing to life, a digital avatar. It has several capabilities: speech recognition, text-to-speech, and natural language understanding, basically a large language model. Using the sound you generate with your voice, it animates the face; using the sound and the expression of what you're saying, it animates your gestures. All of this is completely trained by AI.

We have a service that includes pre-trained models that developers can modify and enhance for their own application, for their own story, because every game has a different story. And then you can deploy it in the cloud, or deploy it on your device. It has a great back end, TensorRT. TensorRT is Nvidia's deep learning optimizing compiler. And you can deploy it on Nvidia GPUs, as well as output ONNX to industry-standard back ends, so that you can run it on any device.
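The pipeline just described, recognize speech, run it through a language model conditioned on the character's backstory, synthesize speech, and drive facial animation from the audio, can be sketched as a simple chain. Every function below is a trivial stand-in with a hypothetical name; none of them are real ACE or TensorRT APIs:

```python
# A minimal sketch of the avatar loop described above. Each stage is a
# trivial stand-in with a hypothetical name, not a real ACE/TensorRT API.
def speech_to_text(audio):
    # stand-in: pretend the "audio" already arrives as text
    return audio

def language_model(prompt, backstory):
    # stand-in for an LLM conditioned on the character's backstory
    return f"({backstory}) replying to: {prompt}"

def text_to_speech(text):
    return f"<audio:{text}>"

def animate_face(audio):
    # stand-in for audio-driven facial animation
    return f"<blendshapes driven by {audio}>"

def avatar_turn(player_audio, backstory):
    # one conversational turn: ASR -> LLM -> TTS -> animation
    text = speech_to_text(player_audio)
    reply = language_model(text, backstory)
    voice = text_to_speech(reply)
    return voice, animate_face(voice)

voice, anim = avatar_turn("Where can I find him?", "ramen shop owner named Jin")
print(voice)
print(anim)
```

The key design point from the keynote is that the same chain runs in the cloud or on-device, with the animation driven purely by the generated audio.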

Let's take a look at this scene in just a second, but let me first tell you about it. It is completely rendered with ray tracing. Notice the beautiful lights, so many different lights, each projecting light from its source. So you have all kinds of direct lights, you have global illumination, and you're going to see incredibly beautiful shadows and physics simulation. And notice the beautiful rendering of the character. Everything is done in Unreal Engine 5. We partnered with an avatar framework and avatar tool maker called Convai, and together we developed this demo you're about to see.

Everything is real time. Hey, Jen. How are you?

Unfortunately, not so good.

How come?

I am worried about the crime around here. It's gotten bad lately. My Ramen shop got caught in the crossfire.

B: Can I help?

G: If you want to do something about this, I have heard rumors that the powerful crime lord Kumon Aoki is causing all sorts of chaos in the city.

F: He may be the root of this violence.

B: I'll talk to him. Where can I find him?

I have heard he hangs out in the underground fight clubs on the city's east side.

None of that conversation was scripted. We gave this AI character, Jin, a backstory: the story of his ramen shop and the story of this game. All you have to do is go up and talk to this character. And because this character has been infused with artificial intelligence and large language models, it can understand your meaning and interact with you in a really reasonable way. All of the facial animation is completely done by the AI. We have made it possible for all kinds of characters to be generated. They each have their own domain knowledge. You can customize it, so everybody's game is different. And look how wonderfully beautiful and natural they are. This is the future of video games.

Not only will AI contribute to the rendering and the synthesis of the environment, AI will also animate the characters. AI will be a very big part of the future of video games.

The most important computer of our generation is unquestionably the IBM System 360. This computer revolutionized several things. It was the first computer in history to introduce the concepts of a central processing unit, virtual memory, expandable I/O, multitasking, and the ability to scale this computer for different applications across different computing ranges. And one of the most important contributions, and one of its greatest insights, was the importance of preserving software investment. The software ran across the entire range of computers, and it ran across multiple generations, so that the software you developed was preserved. IBM recognized the importance of software, recognized the importance of preserving your investment, and, very importantly, recognized the importance of installed base.
This computer revolutionized not only computing. Many of us grew up reading the manuals of this computer to understand how computer architecture worked, to even learn about DMA for the very first time. This computer not only revolutionized computing, it revolutionized the thinking of the computer industry. System 360 and its programming model have largely been retained until today. 60 years! In 60 years, a trillion dollars' worth of the world's data centers have all basically used a computing model that was innovated 60 years ago.
There are two fundamental transitions happening in the computer industry today. All of you are deep within it, and you feel it. The first trend is that CPU scaling has ended. The ability to get ten times more performance every five years has ended. The ability to get ten times more performance every five years at the same cost is the reason why computers are so fast today. The ability to sustain ten times more computing every five years without an increase in power is the reason why the world's data centers haven't consumed so much more of Earth's power. That trend has ended, and we need a new computing approach. Accelerated computing is the path forward.
It happened at exactly the time when a new way of doing software was discovered: deep learning. These two events came together and are driving computing today: accelerated computing and generative AI. This new way of doing software, this new way of doing computation, is a reinvention from the ground up. And it's not easy. Accelerated computing is a full-stack problem. It's not as easy as general-purpose computing. The CPU is a miracle: high-level programming languages, great compilers, and almost anybody could write reasonably good programs because the CPU is so flexible. However, its ability to continue to scale in performance has ended, and we need a new approach. Accelerated computing is full stack. You have to re-engineer everything from the top down and from the bottom up: from the chip, to the systems, to the system software, to new algorithms, and, of course, optimizing the applications.
The second is that it's a data-center-scale problem. And the reason why is that today, the data center is the computer. Unlike the past, when your PC was the computer or the phone was the computer, today your data center is the computer. The application runs across the entire data center, and therefore it's vital that you understand how to optimize the chips, the compute, and the software across the NIC, the switch, all the way to the other end, in a distributed-computing way.
And the third: accelerated computing is multidomain. It's domain-specific. The algorithms and software stack that you create for computational biology and the software stack you create for computational fluid dynamics are fundamentally different. Each one of these domains of science needs its own stack, which is the reason why accelerated computing has taken us nearly three decades to accomplish. However, the performance is incredible. And I'll show you.
After three decades, we realize now that we're at the tipping point. A new computing model is extremely hard to come by, and the reason is this: in order for there to be a new computing model, you need developers. But developers only come if there are end users, because developers create applications that end users will buy. Without end users, there would be no customers and no computer companies to build computers. Without computer companies like yourselves building computers, there would be no installed base. Without an installed base, there would be no developers. Without developers, there would be no applications.
This loop has been suffered by so many computing companies in the 40 years that I've been in this industry. This is really one of the first major times in history that a new computing model has been developed and created. We now have 4 million developers, 3,000-plus applications, and 40 million CUDA downloads in history, 25 million just last year. 15,000 startup companies in the world are built on Nvidia today, and 40,000 large companies and enterprises around the world are using accelerated computing. We have now reached the tipping point of a new computing era.
This new computing model is now enjoyed and embraced by just about every computer company and every cloud company in the world. There's a reason for that. It turns out that the benefit of every single computing approach, in the final analysis, is lower cost. The PC revolution that Taiwan enjoyed started in 1984, the year I graduated; that decade, the 80s, was the PC revolution. The PC brought computing to a price point nobody had ever seen before. And then, of course, mobile devices were convenient, and they also saved enormous amounts of money. We aggregated and combined the camera, the music player, your PC, and a phone; so many different devices were all integrated into one. As a result, not only are you able to enjoy your life better, it also saves a lot of money and offers great convenience. Every single generation provided something new and saved money.
Well, this is how accelerated computing works. This is accelerated computing used for large language models, basically the core of generative AI. This example is a $10 million server budget. And we costed everything: we costed the processor, we costed all the chips, we costed all the networking, we costed literally everything. And so $10 million gets you nearly 1,000 CPU servers. And to train, to process, this large language model takes eleven gigawatt-hours. Okay, this is what happens when you accelerate this workload with accelerated computing.
And so with that same $10 million, you buy 48 GPU servers. This is the reason why people say that GPU servers are so expensive. However, the GPU server is no longer the computer. The computer is the data center. Your goal is to build the most cost-effective data center, not the most cost-effective server. Back in the old days, when the computer was the server, that would be a reasonable thing to do. But today, the computer is the data center. And so what you want to do is create the most effective data center with the best TCO.
So for $10 million, you buy 48 GPU servers. They consume only 3.2 gigawatt-hours and deliver 44 times the performance. Let me just show it to you one more time: this is before, and this is after. We want dense, fast computers, not big ones. And so that's ISO-budget.
Let me show you something else. Okay, so this is $10 million again: 960 CPU servers. Now, this time, we're going to be ISO-power. We're going to keep this number, the amount of power, the same. This means your data center is power-limited. In fact, most data centers today are power-limited.
And so, being power-limited, using accelerated computing, you can get 150 times more performance at three times the cost. But why is that such a great deal? The reason is that it's very expensive and time-consuming to find another data center. Almost everybody is power-limited today.

All right.

Look at this. People love that, right? Nice to see you, Carol. Nice to see you, Spencer. Okay, so let's do that one more time. It's so delightful. Look at this. Before, after. The more you buy, the more you save. That's right. The more you buy, the more you save. That's Nvidia. You don't have to understand the strategy. You don't have to understand the technology. The more you buy, the more you save. That's the only thing you have to understand. Data center. Data center.

Now, why is it? You've heard me talk about this for so many years; in fact, every single time you saw me, I've been talking to you about accelerated computing, for well over two decades. So why is it finally the tipping point? Because the data center equation is very complicated. This is the cost of building a data center. The data center TCO is a function of, and this is the part where everybody messes up, the chips, of course, no question. It's a function of the systems, of course, no question. But because there are so many different use cases, it's also a function of the diversity of systems that can be created. This is the reason why Taiwan is the bedrock, the foundation, of the computer industry. Without Taiwan, why would there be so many different configurations of computers? Big, small, powerful, cheap, enterprise, hyperscale, supercomputing, so many different types of configurations. 1U, 2U, 4U, right? And all completely compatible. The ability of the hardware ecosystem of Taiwan to have created so many different versions that are software-compatible: incredible.
The throughput of the computer, of course, is very important, and it depends on the chip. But it also depends, as you know, on the algorithms, because without the algorithms and libraries, accelerated computing does nothing; it just sits there. So you need the algorithms, the software libraries. It's a data-center-scale problem, so networking matters. And if networking matters, distributed computing is all about software, so system software matters again. And before long, in order for you to present your system to your customers, you have to have a lot of applications that run on top of it. The software ecosystem matters.
Well, the utilization of a data center is one of the most important criteria of its TCO. Just like a hotel: if the hotel is wonderful but mostly empty, the cost is incredible. And so you need the utilization to be high. In order for the utilization to be high, you have to have many different applications, so the richness of the applications matters; again, the algorithms and libraries, and now the software ecosystem. You purchase a computer, but these computers are incredibly hard to deploy. From the moment you buy the computer to the time you put it to work making money, that difference can be weeks, if you're very good at it, incredibly good at it. We can stand up a supercomputer in a matter of a couple of weeks, because we build so many all around the world, hundreds around the world. But if you're not very good at it, it could take a year. That difference deprives you of a year of making money while incurring a year of depreciation. An incredible cost.
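The hotel analogy and the deployment-delay point can be put into one illustrative formula: cost per useful hour rises sharply as utilization falls or bring-up drags on. Every number below is invented for the example:

```python
# Illustrative TCO arithmetic (all figures are made up for this sketch):
# a data center's cost per *useful* hour depends heavily on utilization
# and on how long the system takes to stand up.
capex = 10_000_000                 # purchase price, $
lifetime_years = 4                 # depreciation horizon
power_mw, price_per_mwh = 1.0, 80.0

def cost_per_useful_hour(utilization, deploy_delay_weeks):
    # hours the machine is actually in service over its lifetime
    hours = lifetime_years * 365 * 24 - deploy_delay_weeks * 7 * 24
    useful = hours * utilization
    energy = hours * power_mw * price_per_mwh
    return (capex + energy) / useful

fast = cost_per_useful_hour(0.9, 2)    # well-run: 90% busy, 2-week bring-up
slow = cost_per_useful_hour(0.3, 52)   # poorly run: 30% busy, year-long bring-up
print(f"${fast:.0f}/useful hour vs ${slow:.0f}/useful hour")
```

Under these made-up numbers, the poorly run deployment pays several times more per useful hour for the same hardware, which is the point of the hotel analogy.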
Lifecycle optimization. Because the data center is software defined, there are so many engineers that will continue to refine and continue to optimize the software stack, because Nvidia's software stack is architecturally compatible across all of our generations, across all of our GPUs. Every time we optimize something, it benefits everybody. Every time we optimize something, it benefits everybody.
So: lifecycle optimization, and of course, finally, the energy that you use, the power. This equation is incredibly complicated. But because we have now addressed so many different domains of science, so many industries, in data processing, in deep learning, in classical machine learning; so many different ways to deploy software, from the cloud, to enterprise, to supercomputing, to the edge; so many different configurations of GPUs, from our HGX versions, to our Omniverse versions, to our cloud GPU and graphics versions, the utilization is now incredibly high. The utilization of Nvidia GPUs is so high that almost every single cloud is overextended, almost every single data center is overextended; there are so many different applications using them. So we have now reached the tipping point of accelerated computing. We have now reached the tipping point of generative AI. And I want to thank all of you for your support and all of your assistance and partnership in making this dream happen. Thank you.

Every single time we announced a new product, the demand for every single generation increased and increased and increased. And then one generation, it hockey-sticks. We stick with it. We stick with it. We stick with it. Kepler, and then Pascal, and then Volta, and then Ampere, and now this generation of accelerated computing. The demand is literally from every corner of the world. And we are so excited to be in full-volume production of the H100. I want to thank all of you for your support. This is incredible. The H100 is in full production, manufactured by companies all over Taiwan, used in clouds everywhere, enterprises everywhere.

And let's, let's take a look at a short video of how the H100 is produced. It's incredible, this computer: 35,000 components on that system board. Eight Hopper GPUs. Let me show it to you.

All right. I would lift this, but I still have the rest of the keynote I would like to give. This is 60, 65 pounds. It takes robots to lift it, of course, and it takes robots to insert it, because the insertion pressure is so high and has to be so perfect. This computer is $200,000. And as you know, it replaces an entire room of other computers. So I know it's a very, very expensive computer. It's the world's single most expensive computer of which you can say: the more you buy, the more you save. This is what a compute tray looks like. Even this is incredibly heavy. See that?

So this is the brand-new H100, the world's first computer with a Transformer Engine in it. The performance is utterly incredible. Hopper is in full production. We've been driving computing, this new form of computing, for twelve years. When we first met the deep learning researchers, we were fortunate to realize that not only was deep learning going to be a fantastic algorithm for many applications, initially computer vision and speech, but it would also be a whole new way of doing software: a fundamental new way of doing software that can use data to train a universal function approximator of incredible dimensionality. It can basically predict almost anything that you have data for, so long as the data has structure that it can learn from.
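The "universal function approximator" idea can be shown in miniature: below, a one-hidden-layer network trained by plain gradient descent learns sin(x) from samples. The architecture, data, and hyperparameters are arbitrary choices for illustration, not anything Nvidia ships:

```python
import numpy as np

# A tiny one-hidden-layer MLP fit to sin(x) by full-batch gradient
# descent: deep learning as a universal function approximator, in
# miniature. All sizes and hyperparameters are arbitrary.
rng = np.random.default_rng(0)
X = np.linspace(-np.pi, np.pi, 256).reshape(-1, 1)
y = np.sin(X)

H = 32                                   # hidden units
W1 = rng.normal(0, 1.0, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 1)); b2 = np.zeros(1)
lr = 0.02

def forward(X):
    h = np.tanh(X @ W1 + b1)             # hidden activations
    return h, h @ W2 + b2                # network output

_, pred = forward(X)
loss_before = float(np.mean((pred - y) ** 2))

for _ in range(5000):
    h, pred = forward(X)
    err = (pred - y) / len(X)            # dMSE/dpred (up to a constant)
    gW2 = h.T @ err; gb2 = err.sum(0)
    dh = err @ W2.T * (1 - h ** 2)       # backprop through tanh
    gW1 = X.T @ dh; gb1 = dh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(X)
loss_after = float(np.mean((pred - y) ** 2))
print(f"MSE: {loss_before:.3f} -> {loss_after:.3f}")
```

Nothing in the code knows what a sine wave is; the network recovers it purely from data, which is the point being made about learning any function whose data has structure.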

C: Hi, Computex. I'm here to tell you about how wonderful stinky tofu is. You can enjoy it right here in Taiwan. It's best from the night market.

B: The only input was words; the output was that video. Okay, here's another prompt, in Taiwanese. We tell this AI, a Google text-to-music model: traditional Taiwanese music, peaceful, like it's warm and raining in a lush forest at daybreak. Please. We send the text in, and the AI says okay: this music.

Okay, here's this one: "I am here at Computex. I will make you like me best. Sing it with me. I really like Nvidia." Okay, so these are the words. Hey, Voicemod.

Could you write me a song? These are the words. Okay, play it.

I am here. I come to you with text.

I will make you like me best. Yeah, sing with me. I really like Nvidia.

Generative AI is the most important computing platform of our generation. Everyone from first movers to Fortune 500 companies is creating new applications to capitalize on generative AI's ability to automate and co-create.

For creatives, there's a brand new set of tools that would have been simply impossible a few years ago. Adobe is integrating Firefly into their Creative apps. Ethically trained and artist-friendly, you can now create images with a simple text prompt, or expand your real photos to what lies beyond the lens.

Productivity apps will never be the same. Microsoft has created a copilot for Office apps. Every profession is about to change. Tabnine is democratizing programming by tapping into the knowledge base of a million developers to accelerate application development and reduce debugging time.

If you're an architect, or just thinking of remodeling your home, Planner 5D can instantly turn a 2D floor plan into a 3D model. And right here in Taiwan, Anhorn Medicines is targeting difficult-to-treat diseases and accelerating cures.

The future of wireless and video communications will be 3D, generated by AI. Let's take a look at how Nvidia Maxine 3D, running on the Nvidia Grace Hopper Superchip, can enable 3D video conferencing on any device without specialized software or hardware.

Starting with the standard 2D camera sensor found in most cell phones, laptops and webcams, and tapping into the processing power of Grace Hopper, Maxine 3D converts these 2D videos to 3D using cloud services. This brings a new dimension to video conferencing. With Maxine 3D visualization creating an enhanced sense of depth and presence, you can dynamically adjust the camera to see every angle in motion, engage with others more directly with enhanced eye contact, and personalize your experience with animated avatars, stylizing them with simple text prompts.

With Maxine's language capabilities, your avatar can speak in other languages, even ones you don't know.

B: Nvidia's AI life force is Chinta, our AI assistant. Chinta is now helping Nvidia be more productive.

F: AI will fundamentally transform computing. NVIDIA invented the GPU, which is the engine of the AI revolution.


Maxine 3D, together with Grace Hopper, brings immersive 3D video conferencing to anyone with a mobile device, revolutionizing the way we connect, communicate and collaborate.

Okay, so all of the words coming out of my mouth just now were, of course, generated by AI. So instead of compress, stream, and decompress, in the future communications will be perceive, stream, and reconstruct, regenerate. And it can be regenerated in all kinds of different ways. It can, of course, regenerate your speech in another language, so we now have a universal translator. This computing technology could, of course, be placed into every single cloud. But the thing that's really amazing is that Grace Hopper is so fast it can even run the 5G stack. A state-of-the-art 5G stack can just run in software on Grace Hopper, completely free. All of a sudden, a 5G radio runs in software, just like a video codec used to run in software. The Layer 1 PHY, the Layer 2 MAC, and the 5G core: all of that computation is quite intensive, and it has to be timing-precise, which is the reason we have a BlueField-3 in the computer, for that kind of precision-timed networking. But the entire stack can now basically run on Grace Hopper.
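The shift from "compress, stream, decompress" to "perceive, stream, regenerate" can be sketched as a toy contrast. Everything here is hypothetical and invented for illustration: a "frame" is just a list of numbers, and a real system would use neural encoders and generative models instead of these stand-ins.

```python
def classic_pipeline(frame):
    """Send every sample; the receiver decodes all of them."""
    payload = frame[:]                 # stand-in for a compressed bitstream
    return payload, len(payload)       # (received frame, units sent)

def generative_pipeline(frame):
    """Perceive a tiny description, stream it, regenerate on the far side."""
    lo, hi, n = min(frame), max(frame), len(frame)
    description = (lo, hi, n)          # the only thing that crosses the wire
    regenerated = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return regenerated, len(description)

frame = [float(i * i) for i in range(8)]
_, classic_cost = classic_pipeline(frame)
_, generative_cost = generative_pipeline(frame)
print(classic_cost, generative_cost)   # 8 3
```

The point is only the asymmetry: the generative path streams a small perceived description and spends compute on the receiving end to reconstruct, which is also what makes regeneration in another language possible.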

What's happening here is that this computer allows us to bring generative AI into every single data center in the world today. Because we have software-defined 5G, the telecommunication network can also become a computing platform, like the cloud data centers. Every single data center in the future could be intelligent. Every data center could be software-defined, whether it's internet-based, networking-based, or 5G-communications-based. Everything will be software-defined. This is really a great opportunity. And we're announcing a partnership with SoftBank to re-architect and implement generative AI and a software-defined 5G stack in SoftBank's data centers around the world. We're really excited about this collaboration.

I just talked about how we are going to extend the frontier of AI, and how we're going to scale out generative AI to advance it. But the number of computers in the world is really quite magnificent: data centers all over the world, and over the next decade all of them will be recycled and reengineered into accelerated, generative-AI-capable data centers. But there are so many different applications in so many different areas: scientific computing, data processing, the large language model training we've been talking about, the generative AI inference we just talked about, cloud video and graphics, EDA, SDA, generative AI for enterprise, and of course the edge. Each one of these applications has different configurations of servers, different application focus, different deployment methods. The security is different, the operating system is different, how it's managed is different, where the computers sit is different. And so each one of these diverse application spaces will have to be reengineered with a new type of computer. This is just an enormous number of configurations. And so today we're announcing, in partnership with so many companies here in Taiwan, the Nvidia MGX. It's an open, modular server design specification, and it's designed for accelerated computing. Most servers today are designed for general-purpose computing; their mechanical, thermal and electrical design is insufficient for a very highly dense computing system. Accelerated computers take, as you know, many servers and compress them into one. You save a lot of money, you save a lot of floor space. But the architecture is different. And we designed it to be multi-generation standardized.
So once you make an investment, our next-generation GPUs, next-generation CPUs, and next-generation DPUs will continue to configure easily into it, giving the best time to market and the best preservation of your investment. It can be configured into hundreds of configurations for diverse applications, and integrated into cloud or enterprise data centers.

So you could have either bus bars or power regulators. You could have cabling in the hot aisle or cabling in the cold aisle. Different data centers have different requirements, and we've made this modular and flexible so that it can address all of these different domains.

Now, this is the basic chassis. Let's take a look at some of the other things you could do with it. This is the Omniverse OVX server: x86, four L40s, BlueField-3, two CX-7s, six PCI Express slots. This is the Grace Omniverse server: Grace, the same four L40s, BlueField-3, and two CX-7s. Okay, this is the Grace cloud graphics server. This is the Hopper NVLink generative AI inference server. And we need sound effects like that. And then the Grace Hopper 5G Aerial server for telecommunications, software-defined telco, and the short Grace Hopper 5G Aerial server. And of course, liquid-cooled Grace Hopper for very dense servers. And then this one is our dense, general-purpose Grace Superchip server. This is just CPU, and it has the ability to accommodate four Grace CPUs, or two Grace Superchips. Enormous amounts of performance. And if your data center is power-limited, this CPU has incredible capabilities in a power-limited environment. There are all kinds of benchmarks you can run, but we ran PageRank at ISO performance. At ISO performance, Grace consumes only 580 watts for the whole server, versus 1,090 watts for the latest-generation x86 servers. It's basically half the power at the same performance, or, said another way, at the same power, if your data center is power-constrained, you get twice the performance. Most data centers today are power-limited, and so this is really a terrific capability.
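The power claim quoted above reduces to simple arithmetic, which we can sanity-check against the quoted numbers (580 W versus 1,090 W at equal, "ISO" performance):

```python
# Server power at equal performance, as quoted in the keynote.
grace_watts = 580
x86_watts = 1090

power_fraction = grace_watts / x86_watts        # ~0.53: about half the power
perf_at_same_power = x86_watts / grace_watts    # ~1.9x within the same power budget
print(f"{power_fraction:.0%} of the power, or {perf_at_same_power:.1f}x the performance")
```

So "half the power at the same performance" and "twice the performance at the same power" are the same ratio read in two directions, and the quoted numbers give roughly 1.9x.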

There are all kinds of different servers being made here in Taiwan. Let me show you one of them. Getting my exercise in today. I am the sound effect. Okay, you've got BlueField-3, you've got the CX-7, you've got Grace Hopper. There are so many systems; let me show you some of them, from all of our partners. I'm so grateful you're working on Grace, Grace Hopper, Hopper, L40, BlueField-3. Just about every single processor we're building is configured into these servers of all different types. And so this is Supermicro. This is Gigabyte; they have, I think, something like 70 different server configurations. This is Ingrasys. This is ASRock Rack, Wistron, Inventec. Just beautiful servers. Pegatron. We love servers. I love servers. They're beautiful. They're beautiful to me. QCT, ASUS, Wiwynn, ZT Systems. And this ZT system, what you're looking at here, is one of the pods of our Grace Hopper AI supercomputer. So I want to thank all of you for your great support. Thank you. We're going to expand AI into a new territory.

If you look at the world's data centers, the data center is now the computer, and the network largely defines what that data center does. There are two types of data centers today. There's the data center used for hyperscale, where you have application workloads of all different kinds. The number of CPUs and GPUs you connect together is relatively low, the number of tenants is very high, the workloads are heterogeneous, and the workloads are loosely coupled. And you have another type of data center: supercomputing data centers, AI supercomputers, where the workloads are tightly coupled, the tenants are far fewer, sometimes just one, and its purpose is high throughput on very large computing problems. It's basically standalone.

And so supercomputing centers and AI supercomputers, and the world's hyperscale clouds, are very different in nature. Ethernet is based on TCP. It's lossy, and it's very resilient. Whenever there's a packet loss, it retransmits; there's error correction. It knows which packets were lost and requests the sender to retransmit them. The ability of Ethernet to interconnect components from almost anywhere is the reason the world's Internet was created; if it required too much coordination, how could we have built today's Internet? That is Ethernet's profound contribution: its lossy, resilient design means it can connect almost anything together. However, a supercomputing data center can't afford that. You can't interconnect random things together, because on a billion-dollar supercomputer, the difference between achieving 95% networking throughput versus 50% is effectively $500 million. The cost of that one workload running across the entire supercomputer is so high that you can't afford to lose anything in the network. InfiniBand relies very heavily on RDMA.
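The "$500 million" figure above is roughly this back-of-envelope calculation (the keynote rounds up; the exact gap comes out to $450M): on a $1B supercomputer, the difference between 95% and 50% achieved network throughput is capacity you paid for but cannot use.

```python
# Value of lost effective capacity on a billion-dollar cluster when the
# network delivers 50% of achievable throughput instead of 95%.
cluster_cost = 1_000_000_000
wasted_value = cluster_cost * (0.95 - 0.50)
print(f"${wasted_value / 1e6:.0f}M of effective capacity lost")  # $450M
```

This is why a tightly coupled supercomputer pays for lossless, RDMA-based fabrics, while a loosely coupled hyperscale cloud happily tolerates lossy Ethernet.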

That is so cool. This is in California, 6,264 miles away or something like that; 34 milliseconds at the speed of light, one way, and it's completely interactive. Everything is ray traced. No art is necessary. You bring everything, the entire CAD, into Omniverse. Open up a browser, bring your data in, bring your factory in. The lighting just does what lighting does. Physics does what physics does. If you want to turn physics off, you can. If you want to turn it on, you can. And multiple users, as many as you like, can enter Omniverse at the same time and work together on one unified source of data across your entire company.

You could virtually design, build and operate your factory before you break ground, and avoid the mistakes that, usually at the beginning of the integration, create a lot of change orders, which cost a lot of money. Thank you very much, Sean. Good job.

Notice that just now, it was humans interacting with Omniverse. In the future, Sean will even have a generative AI interacting with him in Omniverse. We could, of course, imagine that Jin, from the very beginning, could be a character, one of the users of Omniverse, interacting with you, answering questions, helping you.

We can also use generative AI to help us create virtual worlds. So, for example, this is a bottle rendered in Omniverse. It can be placed in a whole bunch of different types of environments and render beautifully, physically. You can place it just by giving it a prompt, by saying: I would like to put these bottles in a lifestyle-photograph-style backdrop for a modern, warm farmhouse bathroom. Change the background, and everything is integrated and rendered again.

Okay, so generative AI will come together with Omniverse to assist the creation of virtual worlds.

Today, we're announcing that WPP, the world's largest advertising agency and advertising services company, is partnering with Nvidia to build a content generation engine based on Omniverse and generative AI. It integrates tools from so many different partners: Adobe Firefly, for example, Getty, Shutterstock. And it integrates into this entire environment, making it possible for them to generate unique content for different users, for ad applications, for example.

So in the future, whenever you engage a particular ad, it could be generated just for you, and yet the product is precisely rendered, because, of course, product integrity is very important. Today, when you engage information, the computing model is retrieval: the content is retrieved. In the future, when you engage information, much of it will be generated. Notice the computing model has changed.
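The retrieved-versus-generated distinction can be sketched in a few lines. This is purely hypothetical, with invented names and data: retrieval returns the same stored asset to everyone, while generation synthesizes a per-user variant yet keeps the product description fixed, mirroring the "product integrity" point above.

```python
# A retrieval system serves one stored asset; a generative system
# composes content per request while holding the product constant.
stored_ads = {"sneaker": "generic sneaker banner"}

def retrieve(product):
    return stored_ads[product]                      # one asset for all users

def generate(product, user_profile):
    # product integrity preserved; only the backdrop is personalized
    return f"{product} shown in a {user_profile['style']} scene"

print(retrieve("sneaker"))
print(generate("sneaker", {"style": "warm farmhouse bathroom"}))
```

The computational consequence is the one the keynote draws: retrieval is a lookup, while generation spends inference compute on every request.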

WPP generates 25% of the ads that the world sees. 60% of the world's largest companies are already clients, and so they made a video of how they would use this technology.

The world's industries are racing to realize the benefits of AI. Nvidia and WPP are building a groundbreaking generative-AI-enabled content engine to enable the next evolution of the $700 billion digital advertising industry.

Built on Nvidia AI and Omniverse, this engine gives brands the ability to build and deploy highly personalized and compelling visual content faster and more efficiently than ever before. The process starts by building a physically accurate digital twin of a product using Omniverse Cloud, which connects product design data from industry-standard tools. Then WPP artists create customized and diverse virtual sets using a combination of digitized environments and generative AI tools from organizations such as Getty Images and Adobe, trained on fully licensed data using Nvidia Picasso.

This unique combination of technologies allows WPP to build accurate photorealistic visual content and ecommerce experiences that bring new levels of realism and scale to the industry.

The $45 trillion global manufacturing industry is comprised of 10 million factories operating 24/7. Enterprises are racing to become software defined to ensure they can produce high quality products as quickly and cost efficiently as possible.

Let's see how electronics manufacturer Pegatron uses Nvidia AI and Omniverse to digitalize their factories. In Omniverse, they start by building a digital twin of their factory, unifying disparate 3D and CAD datasets to provide a real-time view of their complex factory data to their planners and suppliers. In the cloud-native digital twin, planners can then optimize the layout virtually before deploying changes to the real factory.

The digital twin is also used as a training ground and data factory for Pegatron's perception AIs. They use Nvidia Isaac Sim, built on Omniverse, to simulate and optimize their fleet of mobile robots, which help move materials throughout the facility, as well as the pick-and-place robotic arms that assist on production lines.

In the fully operational factory, Pegatron deploys automated optical inspection, or AOI, points along their production lines, which reduces cost and increases line throughput. Nvidia Metropolis enables Pegatron to quickly develop and deploy cloud-native, highly accurate AOI workflows across their production lines. Omniverse Replicator generates synthetic datasets of PCBA defects that are too complex and costly to capture in the real world, like scratches and missing or misaligned components.

Pegatron then combines the synthetic data with Nvidia pretrained models, Nvidia TAO for training, adaptation and optimization, and Nvidia DeepStream for real-time inference, resulting in AOI performance that is 99.8% accurate, with a four-times improvement in throughput.

With software-defined factories built on Nvidia AI and Omniverse, manufacturers can super-accelerate factory bring-up, minimize change orders, continuously optimize operations, and maximize production line throughput, all while reducing costs.

To improve productivity and increase worker safety, factories and warehouses are migrating away from manual forklifts and guided vehicles to full autonomy. Nvidia Isaac AMR provides an integrated end-to-end solution to deploy fully autonomous mobile robots. The core of the solution is Nova Orin, a sensor suite and computing hardware that enables mapping, autonomy and simulation. Nova's collection of advanced sensors speeds the mapping process, leveraging our cloud-based service to generate an accurate and detailed 3D voxel map. This 3D map can then be sliced across a plane to generate 2D maps tailored for the different autonomous robots that might operate in a facility. With these maps in place, on-robot lidar or cost-effective cameras provide autonomous navigation that works reliably in the most complex and dynamic environments. Isaac Mission Control optimizes route planning using the cuOpt library to improve operations. Developers can use Isaac Sim and Nvidia Omniverse to create realistic digital twins of the operating environment, which allows fully autonomous robots to be trained on complex tasks entirely in simulation. All operations can be fully validated using Isaac Sim before deployment to the real world. Isaac AMR accelerates your migration to full autonomy, reducing costs and speeding deployment of the next generation of AMRs.
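Slicing a 3D voxel map into 2D maps for different robots can be sketched as follows. This is a toy illustration, not the Isaac AMR API: the grid, functions, and sizes are invented, but they show why one shared 3D map can serve robots of different heights.

```python
# voxels[z][y][x] == 1 means that cell is occupied.
voxels = [
    [[0, 0], [0, 1]],   # z = 0: a low obstacle at (x=1, y=1)
    [[0, 0], [0, 0]],   # z = 1: clear at this height
]

def slice_2d(voxels, z):
    """2D occupancy grid taken at a single height index z."""
    return [row[:] for row in voxels[z]]

def slice_up_to(voxels, max_z):
    """A cell is blocked if occupied at any height the robot's body spans."""
    ny, nx = len(voxels[0]), len(voxels[0][0])
    return [[max(voxels[z][y][x] for z in range(max_z + 1))
             for x in range(nx)] for y in range(ny)]

print(slice_2d(voxels, 1))     # [[0, 0], [0, 0]]: z = 1 alone is clear
print(slice_up_to(voxels, 1))  # [[0, 0], [0, 1]]: the low obstacle still blocks
```

A robot that only cares about one plane sees a clear map, while a robot spanning both heights inherits the low obstacle, which is why the 2D maps are "tailored for different autonomous robots."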

Nova cannot tell that it is not in the real environment. Nova thinks it is in the real environment; it cannot tell. And the reason is that all the sensors work and the physics works. It can navigate, it can localize itself. Everything is physically based. Therefore we can design the robot, simulate the robot, and train the robot, all in Isaac Sim. Then we take the brain, the software, and we put it into the actual robot, and with some amount of adaptation, it should be able to perform the same job. This is the future of robotics: Omniverse and AI working together.

The ecosystem that we have been in, the IT ecosystem, is a quarter of a trillion dollars per year, $250 billion a year. This is the IT industry. For the very first time in our history together, we finally have the ability to understand the language of the physical world. We can understand the language of heavy industry. And we have a software system called Omniverse that allows us to simulate, develop, build and operate our physical plants, our physical robots, our physical assets, as if they were digital. The excitement in the hard industries, the heavy industries, has been incredible. We have been connecting Omniverse all over the world with tools companies, robotics companies, sensor companies, all kinds of industries.

There are three industries right now, as we speak, that are putting enormous investments into the world. Number one, of course, is the chip industry. Number two, the electric battery industry. Number three, the electric vehicle industry. Trillions of dollars will be invested in the next several years, and they would all like to do it better, and in a modern way, for the very first time. We now give them a system, a platform, tools that allow them to do that.

I want to thank all of you for coming today. I talked about many things. It's been a long time since I've seen you, so I had so much to tell you. It was too much. Last night I said, this is too much. This morning I said, this is too much. And now I realize it's too much.

I told you several things. I told you that we are going through two simultaneous computing industry transitions: accelerated computing and generative AI. This form of computing is not like traditional general-purpose computing. It is full stack, it is data center scale, because the data center is the computer, and it is domain specific. For every domain you want to go into, every industry you go into, you need to have the software stack. And if you have the software stack, then the utility, the utilization of your computer, will be high.

So, number two: it is full stack, data center scale, and domain specific. We are in full production of the engine of generative AI, and that is HGX H100. Meanwhile, this engine for AI factories will be scaled out using Grace Hopper, the engine we created for the era of generative AI.

We also took Grace Hopper and realized that we can extend the performance on the one hand, but we also have to extend the fabric so that we can make larger models trainable. So we connected 256 Grace Hopper nodes with NVLink and created the largest GPU in the world, the DGX GH200. We're trying to extend generative AI and accelerated computing in several different directions at the same time.

Number one, we would like, of course, to extend it in the cloud, so that every cloud data center can be an AI data center. Not just AI factories and hyperscale: every hyperscale data center can now be a generative AI data center. And the way we do that is Spectrum-X. It takes four components to make Spectrum-X possible: the switch, the BlueField-3 NIC, the interconnects themselves (the cables are so important in high-speed communications), and the software stack that goes on top of it.

We would like to extend generative AI to the world's enterprises, and there are so many different configurations of servers. The way we're doing that, in partnership with our Taiwanese ecosystem, is MGX, the modular accelerated computing systems. And we put Nvidia in the cloud, so that every enterprise in the world can engage us to create generative AI models and deploy them securely, in an enterprise-grade, enterprise-secure way, in every single cloud.

And lastly, we would like to extend AI to the world's heavy industries, the largest industries in the world. So far, our industry, the industry all of us have been a part of, is a small part of the world's total industry. For the very first time, the work that we're doing can engage every single industry. And we do that by automating factories and automating robots. And today, we even announced our first full robotics reference stack, Nova Orin.

I want to thank all of you for your partnership over the years. Thank you.

Thank you from Nvidia.