No More Hustleporn: this clip from the gpt-4 demo was massively slept on
Tweet by Alex
https://twitter.com/alexalbert__
@alexalbert__:
this clip from the gpt-4 demo was massively slept on
gpt-4 can "see" your screen and describe the user interface of the application you are looking at
here's @gdb getting gpt-4 to describe a screenshot of a discord server in painstaking detail
@alexalbert__:
my take is that text generation will not be the main value prop of LLMs very soon
instead, it will be their ability to operate the tools we already use
in this example, gpt-4 proves that it has a near-human level of understanding of discord's UI through just one screenshot
@alexalbert__:
i've said it before and it's been hinted at by OAI employees
to everyone on AI twitter this may seem obvious but it's worth reiterating: chatGPT is NOT the final product here... it will look like a toy soon enough
twitter.com/alexalbert__/s…
@alexalbert__:
prediction:
the GPT-4 iphone is going to be an app that uses the model's multimodal abilities to control your computer for you in a self-driving fashion
the discord screenshot example in the gpt4 demo made it too obvious that this is within its current capabilities
twitter.com/rapha_gl/status/1636041957029060608
@alexalbert__:
OpenAI still has a few hurdles to solve like speed, cost, and reliability
but once these issues are ironed out, expect to see Microsoft's Edge transform into a full copilot-like system with Bing Chat being the portal that you guide it through
@alexalbert__:
some are already starting to get this to work
in this example, GPT-4 breaks down complex browser-based tasks into actionable steps and navigates the web by evaluating custom browser-driving code that it itself generates
youtube.com/watch?v=Gndk9P…
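a rough sketch of the loop described above, in Python: a model turns a task into steps, each step is a snippet of browser-driving code, and a runner evaluates the snippets one by one. everything here is an assumption for illustration, not the code from the video: `FakeBrowser`, `stub_model`, and `run_task` are hypothetical names, the "model" is a hard-coded stub, and the browser only records actions instead of driving a real page.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser driver; it only records actions."""
    log: List[str] = field(default_factory=list)

    def goto(self, url: str) -> None:
        self.log.append(f"goto {url}")

    def type_text(self, selector: str, text: str) -> None:
        self.log.append(f"type {selector} {text!r}")

    def click(self, selector: str) -> None:
        self.log.append(f"click {selector}")


def stub_model(task: str) -> List[str]:
    """Hypothetical model call: returns one snippet of driver code per step.

    A real system would send the task (and the current page state) to an LLM.
    """
    return [
        "browser.goto('https://example.com')",
        "browser.type_text('#search', task)",
        "browser.click('#submit')",
    ]


def run_task(task: str, model: Callable[[str], List[str]], browser: FakeBrowser) -> List[str]:
    """Ask the model for steps, then evaluate each generated snippet in order."""
    for snippet in model(task):
        # Only `browser` and `task` are exposed to the snippet.
        # Emptying builtins is NOT a real sandbox; generated code needs proper isolation.
        exec(snippet, {"__builtins__": {}}, {"browser": browser, "task": task})
    return browser.log


if __name__ == "__main__":
    for line in run_task("cheap flights", stub_model, FakeBrowser()):
        print(line)
```

swapping `stub_model` for a real model call and `FakeBrowser` for a real driver is where speed, cost, and reliability start to matter, since every step is a round trip that can fail.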
@alexalbert__:
the best part about that example is that it doesn’t even use gpt-4's visual abilities
I can’t wait to see how much more powerful it will be when that is incorporated