No More Hustleporn: Zoubin Ghahramani on starting out in neuroscience and the role of luck in a career

We pulled out the highlights from Zoubin Ghahramani's recent interview for Google's Research Retrospectives. Transcription and light editing by Anthropic's Claude, curation by Yiren Lu :-)

Highlights

Rosanne Liu: What has been your most fun project to work on?
Zoubin Ghahramani: It's hard to say because I've worked on so many things over so many years, but probably my undergraduate thesis, which was a long time ago. So this is, like, around 1989. Submitted in 1990. It was one of the first projects I did in machine learning. It was actually on neural networks, neural networks for parsing natural language. Back then, there wasn't a lot of code for this stuff, so you had to write all the code yourself. Compute was pretty lacking, let's say, compared to now. But where I was an undergrad, at UPenn, we had this thing called a Connection Machine, which you should look up. It looks like something from science fiction. It's this big black cube with 65,000 processors inside it. It was a very highly parallel supercomputer of the late 1980s. And so I got to program the Connection Machine myself, in a special-purpose language for parallel computing. We had a tiny data set, which was English sentences paired with parse trees for those sentences. I was trying to train a recurrent neural network to map from the English sentence to a representation of the parse tree. So I had to think about things like: how do you represent a parse tree in a neural net? And remember, this is, like, 1988, '89. It wasn't obvious. But Lisp was very popular back then, so you could represent a tree with a lot of brackets, basically bracket notation for a tree. So it would be input string in and output string out with a recurrent neural network, and the brackets would represent the parse tree. I thought it didn't work very well. Like, our data set was tiny.
And so my advisor, my undergraduate advisor, kept telling me I should try to publish it. I said, no, it doesn't really work. So I never published it. But it was really fun. I learned about lots of things: about natural language, about parallel computing, about neural networks. And I got excited about neural networks. And it's kind of boring, because here I am at Google Brain, whatever it is, 30 years later. It's boring and exciting in the sense that I sort of knew I was interested in doing this kind of work in AI and machine learning. It wasn't called machine learning back then, and I guess here I am, still doing it.

Zoubin Ghahramani: That's right. I mean, obviously, the earlier the better. But it's really about mindset, right? If you have that growth mindset and you're willing to learn new things and reinvent yourself, then it's really never too late.
It doesn't work for everybody. The other thing I want to say is I actually think luck—I hate to say this in some ways, but I do think it's important to say—I do think that luck plays a huge role in people's careers. I had many choice points, like, we all have many choice points, and whether you end up in one place or another, it totally defines what ends up happening. If you end up with a good advisor, bad advisor, these strokes of luck are a huge factor.

Katherine Lee: You've also had a lot of students and a lot of students who have also ended up with successful careers. Do you have any advice for people who are mentors of other researchers? How would you suggest they go about cultivating a research group or cultivating a bunch of independent researchers?
Zoubin Ghahramani: Yeah, so a couple of things. I think being a mentor is a really important responsibility. And probably my top piece of advice is to listen, in the sense that if you're going to be a good mentor, you have to listen to the person, to your mentee, and figure out what drives them, what makes them excited, what they're passionate about, what are the research questions they're interested in. And so a lot of it is about listening and then guiding people towards the things that will make them thrive, basically. But being a research mentor is also different from being a manager in a company.
I think a manager has other responsibilities as well, and one could imagine decoupling the two. I think we do, to some extent, decouple mentorship from management, and we might want to think about doing that even more, separating the manager's responsibilities from the scientific mentor's responsibilities. Somebody can be a great scientific mentor but not a very good manager, and vice versa.

Katherine Lee: You've described yourself as a "perpetual immigrant," so you've moved around a lot. One really interesting time of your career was when you had your main faculty appointment in the UK, at the Gatsby Unit at UCL and then at Cambridge. But you also had a part-time appointment at CMU, so you spent two months a year there, in the States. What did you observe about the differences in machine learning communities in different countries?
Zoubin Ghahramani: The machine learning communities emphasized different things in different countries. Research is a social endeavor, so we're shaped by the people around us. If you're a Bayesian, for example, and you move to the UK, you realize so many machine learning people there are Bayesians. It's like a community. At CMU, there were a lot of people working on much more applied problems. Robotics was a big deal, which it wasn't in other places. Graphical models were a big deal in one place. Deep learning evolved in different pockets. The whole neural net revolution was happening around San Diego in the 1980s.
This mixing of research communities and the diversity of ideas pushed around is super interesting. That's why conferences are good: they bring all these communities together.

Full Transcript

Katherine Lee: Research Retrospectives is a series of conversations with researchers, asking them to reflect on their careers and their paths through research.

So our guest today is Zoubin. He is a distinguished scientist and senior research director at Google Brain. And before he was here, he was at Uber.

Zoubin has focused on probabilistic approaches to machine learning and AI. Welcome, Zoubin. We're so glad to have you.

Zoubin Ghahramani: Thanks, Katherine.

Rosanne Liu: So, just to start things off, I want to do a little bit of a memory exercise with you. Please know that it's not designed to make you look bad. Actually, either way, it'll make you look good, because if you don't pass the exercise, that just means you have so much great work that it's overflowing your memory. But if you do remember, that just means you're great at memorizing things. So, do you remember your PhD dissertation title? I know that was 30 years ago or something.

Zoubin Ghahramani: Yeah, I think the title was "Computation and Psychophysics of Sensory Motor Control."

Rosanne Liu: Wow, that's 90% correct. It's Sensorimotor Integration.

Zoubin Ghahramani: Close enough.

Rosanne Liu: Do you happen to know your most cited paper?

Zoubin Ghahramani: My most cited paper is probably the paper on graph-based semi-supervised learning, and I don't know if that's exactly the title. We wrote a few papers on that topic.

Rosanne Liu: It is "Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions." Do you happen to remember your most cited deep-learning-era paper? That is, like, post-2012?

Zoubin Ghahramani: Yes, it's on dropout as an approximate Bayesian inference. The title might be close enough to "Dropout is an Approximate Bayesian Inference".

Rosanne Liu: Yeah.

Katherine Lee: Awesome.

Rosanne Liu: It's "Dropout as a Bayesian Approximation." What has been your most fun project to work on?
Zoubin Ghahramani: It's hard to say because I've worked on so many things over so many years, but probably my undergraduate thesis, which was a long time ago. So this is, like, around 1989. Submitted in 1990. It was one of the first projects I did in machine learning. It was actually on neural networks, neural networks for parsing natural language. Back then, there wasn't a lot of code for this stuff, so you had to write all the code yourself. Compute was pretty lacking, let's say, compared to now. But where I was an undergrad, at UPenn, we had this thing called a Connection Machine, which you should look up. It looks like something from science fiction. It's this big black cube with 65,000 processors inside it. It was a very highly parallel supercomputer of the late 1980s. And so I got to program the Connection Machine myself, in a special-purpose language for parallel computing. We had a tiny data set, which was English sentences paired with parse trees for those sentences. I was trying to train a recurrent neural network to map from the English sentence to a representation of the parse tree. So I had to think about things like: how do you represent a parse tree in a neural net? And remember, this is, like, 1988, '89. It wasn't obvious. But Lisp was very popular back then, so you could represent a tree with a lot of brackets, basically bracket notation for a tree. So it would be input string in and output string out with a recurrent neural network, and the brackets would represent the parse tree. I thought it didn't work very well. Like, our data set was tiny.
And so my advisor, my undergraduate advisor, kept telling me I should try to publish it. I said, no, it doesn't really work. So I never published it. But it was really fun. I learned about lots of things: about natural language, about parallel computing, about neural networks. And I got excited about neural networks. And it's kind of boring, because here I am at Google Brain, whatever it is, 30 years later. It's boring and exciting in the sense that I sort of knew I was interested in doing this kind of work in AI and machine learning. It wasn't called machine learning back then, and I guess here I am, still doing it.
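(Editor's note: for readers curious what that bracket encoding looks like, here is a minimal, hypothetical sketch in Python rather than the Connection Machine's parallel language. The sentence, tree, and function name are invented for illustration; the point is just that a nested parse tree flattens into a token sequence that a recurrent network can be trained to emit.)

```python
# A minimal sketch of representing a parse tree as a bracket string,
# in the spirit of the encoding described above. The example tree and
# tokens are made up for illustration.

def tree_to_brackets(tree):
    """Flatten a nested (label, children...) tuple into bracket notation."""
    if isinstance(tree, str):        # leaf: a word in the sentence
        return tree
    label, *children = tree
    inner = " ".join(tree_to_brackets(child) for child in children)
    return f"( {label} {inner} )"

# A toy parse of "the cat sat on the mat":
parse = ("S",
         ("NP", "the", "cat"),
         ("VP", "sat",
          ("PP", "on", ("NP", "the", "mat"))))

target = tree_to_brackets(parse)
print(target)
# ( S ( NP the cat ) ( VP sat ( PP on ( NP the mat ) ) ) )
# A recurrent network would then map the input token sequence
# "the cat sat on the mat" to this bracketed output string.
```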
Rosanne Liu: How did you get exposed to neural nets back then? Where did you go?
Zoubin Ghahramani: There is a great story about that. I was an undergrad and I was on a visa. I'm British now, but I was originally Iranian, so I couldn't really travel back and forth. I wanted a summer job, and I wanted to work in AI because I'd read a book when I was 14 that really influenced me. My freshman year of undergrad, I went to the head of department at UPenn, Aravind Joshi, and said, I want an internship or a job or something. I can't go back home. I need a job to survive the summer, and I want to do research. It was crazy that I would even bother him with something like this. But he was generous and said, sure, I can give you a summer job. Here's your job: read these two books that have just come out and explain them to me.
The books were the two Parallel Distributed Processing volumes, by Jay McClelland, David Rumelhart, and the PDP Research Group, with chapters by Geoff Hinton, Terry Sejnowski, and others. Hinton co-wrote the chapter on backpropagation. Mike Jordan wrote an introduction to linear algebra for psychologists. The books brought neural network ideas to cognitive science and psychology. I did that work and thought it was amazing and interesting. So I continued working, over summers and during school, on neural networks and related topics.

Katherine Lee: Have you asked any of your students to do the same thing? Please read these two books and explain them to me.

Zoubin Ghahramani: Well, I haven't done it explicitly, but implicitly, that's how I learn about papers. I often find that I don't have a lot of time to read papers, so usually it'll be somebody explaining a paper to me, and then I'll go skim-read it afterwards, maybe to make sure I've actually understood it. It's a good method. It was great for me. Teaching something is always a good way of learning about it.

I couldn't figure out how neurons could take derivatives. I thought backprop meant that neurons had to take derivatives, and it took me a long time to figure that out. Simple things like that just didn't make any sense to me for a long time. The longer I've been a researcher, the more I've realized how hard things are.

So when you're an undergrad, your advisor says, go build a neural network that will do analogical reasoning, and you're like, okay, sure. Which is great; that's exactly the sort of enthusiasm for research we want. As you see more things, you think, ooh, that's really hard. But that shouldn't inhibit us, right?

So one thing that happened when I was an undergrad was that Geoff Hinton came to visit and give a talk. He was already pretty famous, because he'd been very influential with backprop, which was already being recognized as a really important idea, as well as Boltzmann machines, another really important idea, and a number of other things. So he came to give a talk, and my advisor set up a one-on-one meeting between me (remember, I'm, like, an undergrad doing an undergrad thesis) and this famous, young professor who was visiting.

And so I explained my undergrad thesis to Geoff, and he gave me a bunch of advice. Afterwards, I basically followed up with him, and I met him again at a summer school, and I ended up being Geoff's postdoc after my PhD. But between my undergrad and my postdoc, for my PhD, I ended up working with Mike Jordan.

And the reason I worked with Mike Jordan was because I had worked on recurrent neural networks in my undergrad thesis, and what Mike Jordan was well known for at that time was Jordan networks, which are a particular kind of recurrent neural network. It's kind of embarrassing for him; I don't think he wants to be thought of as being famous for having invented Jordan networks. He's done a lot more exciting things since then. But back then, he had invented Jordan networks, and I'm like, oh, well, I know who this person is, and I want to work on neural networks.

People told me I was crazy. Mike was at MIT, and people told me I was crazy to go to MIT to work on neural networks, because MIT was famous for being anti-neural networks: Perceptrons had been published, and Marvin Minsky was still there and very influential. And, you know, Chomsky was there as well. He wasn't against neural networks, exactly, but he was also a big influence, and so on.

But I went there nonetheless. I ended up being Mike's first PhD student to graduate, so I got to know the lab very early on. We had an amazing lab at MIT, with lots of interesting people coming through. Yoshua Bengio was a postdoc there at the same time. The list goes on and on. Josh Tenenbaum was a student a couple of years younger than me. Lots of good people came through that lab. And, yeah, Mike and Geoff really influenced the way I think about research.

Katherine Lee: You've had a lot of transition points through your career. I know you started out in academia, and I think you had said that you wanted to be a professor and stay in academia early on, but obviously you're here with us today. So could you talk a little bit about the decision points in your career and why you made those decisions?

Zoubin Ghahramani: Let me talk about some other decision points first, and then the industry-versus-academia decision point. So one of the big decision points in my career was whether I wanted to be a neuroscientist or an AI or computer science researcher. My thesis was in computational neuroscience, and I spent a lot of time reading neuroscience textbooks and trying to write papers for the Journal of Neuroscience or wherever. It wasn't unsuccessful, actually. In some ways I lucked out, or I was doing the right things with the right people, and I ended up with Science and Nature papers in neuroscience by the time I was a postdoc. But I realized I didn't really enjoy it. Trying to understand the brain is super hard, and you're super constrained by the biology.

I was studying sensory motor control, and one of the things that I worked on during my PhD was a model of how we integrate information from different senses to control our bodies for movement and things like that. The model made really good predictions at a psychophysical level, but then people would ask me, well, how is this implemented in the brain? Which part of the brain is it in? I went to a conference, and they put me in a session on the basal ganglia. I'm like, this is a purely computational paper. I have no idea whether this has anything to do with the basal ganglia.

It was a very interesting experience, and I felt like I wasn't actually very good at it. I wasn't good as a neuroscientist. My passion was really to work on AI and machine learning, the engineering side rather than the science side. And it took me years to figure that out, to introspect that.

I always thought, I'm just going to end up as a professor. I'm going to retire as a professor. Like, what else would I want to do? The transition to industry was very interesting. It wasn't like one day I decided, oh, I want to work in industry. Machine learning had moved from being kind of a purely academic field to something that was actually useful. When Geometric Intelligence was acquired by Uber, I started out working very remotely with Uber, but then eventually moved there to build up their AI organization. And it was just a completely new experience for me. I actually really enjoyed the newness of being in a totally different environment, going from academia, which has a very different timescale and incentive structure, to industry. There are lots and lots of differences between those environments.

I described it to people as, like, getting a second PhD. Actually, in that environment, it was almost a liability to tell people that I'd been a professor or an academic. People sometimes would look at me and ask, did you come from Google? And I had to admit that, no, I'd actually come from academia. And they were like, oh, okay. The implication was, you must not know what you're doing.

Rosanne Liu: That took so much courage, for you to change your trajectory, because if you're at the point of graduating from a PhD or finishing a postdoc and you've built all your reputation in one field, changing to another just sounds really scary. And I think these days, people are facing this anxiety of, like, I've invested so much in this field, I've got a PhD in this field, but now I want to venture into machine learning, or out of machine learning. People just feel scared of throwing away all their previous accomplishments. And back then, it wasn't even the case that machine learning offered better career prospects or more promise or anything. How do you think you were able to act with that courage?
Zoubin Ghahramani: Yeah, there were a couple of things. First, I did hedge my bets. Even though my PhD was in computational neuroscience, I was writing machine learning papers as well, so it wasn't a radical change. I just shifted the distribution of my work until I cut one side off completely. Also, the field was more fluid back then. One of the amazing things about NIPS, now NeurIPS, which was the first conference I attended, in 1992, was that the community was incredibly welcoming of people from different backgrounds. That diversity of research backgrounds made it not weird to be a neuroscientist, cognitive scientist, applied mathematician, psychologist, or computer scientist interested in neural networks. I don't think I was brave. I was in a good environment for change, and I had a lot of support.

Katherine Lee: Yeah.

Rosanne Liu: Maybe one message to take away is that we should normalize big transitions in careers. I'm now, of course, more of a deep-learning person, but in my PhD, I didn't really do deep learning. It was more machine learning combined with neuroscience. And sometimes I feel a little ashamed that I have to explain why I did that in my PhD. I'm hoping to adopt the attitude that, yeah, I changed. Everyone changes. And it's nothing to be ashamed of or embarrassed about. It's a learning process, and we gradually find what we like to work on, even if it's pretty late in our lives.

Zoubin Ghahramani: And one of the really important things is to consider occasionally reinventing yourself. The transition from neuroscience to machine learning was fairly smooth for me, and I had a lot of support. The transition into industry was a much bigger one. I was doing a lot of things that had nothing to do with my research background, but reinventing yourself in a new way, in a new environment, is also a growth opportunity. It's like moving to a new city. It's tough. You need to make new friends. You need to learn how things work. Maybe moving to another planet, actually; going from Cambridge to Uber was a bit like moving to another planet. You kind of have to adjust. But you also feel like your neurons are growing when you're doing that.

Rosanne Liu: I think the most fear people have is, like, maybe it's too late to make a transition or redefine ourselves. But I think the lesson here is that it's never too late.

Zoubin Ghahramani: That's right. I mean, obviously, the earlier the better. But it's really about mindset, right? If you have that growth mindset and you're willing to learn new things and reinvent yourself, then it's really never too late.
It doesn't work for everybody. The other thing I want to say is I actually think luck—I hate to say this in some ways, but I do think it's important to say—I do think that luck plays a huge role in people's careers. I had many choice points, like, we all have many choice points, and whether you end up in one place or another, it totally defines what ends up happening. If you end up with a good advisor, bad advisor, these strokes of luck are a huge factor.

Rosanne Liu: Yeah, definitely agree. And one of my luckiest moments was meeting you at NIPS 2015, I think. Yeah. It's sort of a message to society that we should normalize acknowledging how much luck has played into our successes and failures, and not discount it.

Rosanne Liu: So there are two sides to success. On one hand, we idolize successful people, even though many have simply been lucky or well supported. On the other hand, society fails to help those in difficult circumstances through no fault of their own, like where they were born or their environment. That's a broader point, but I think looking back retrospectively helps.

Katherine Lee: There was a great tweet this weekend saying we should stop thinking about heroes and geniuses. All scientists are interchangeable. Progress depends on the ideas and knowledge available at a given time and place.

Zoubin Ghahramani: Yeah.

Katherine Lee: Feel free to disagree with that, but share your thoughts.

Katherine Lee: Our research world has shifted a lot since you began your career. One of the biggest things has been scale. Are there research areas that you're excited to revisit in this era of large compute?
Zoubin Ghahramani: Absolutely. I mean, one of the things I've worked on my whole career is probabilistic modeling and Bayesian networks and so on. And when I started, you were kind of limited to models with 10, 20, 50 variables maybe at most, just because of compute power. And now we can easily fit models with thousands, tens of thousands, hundreds of thousands of variables. So all of a sudden, these very ambitious probabilistic models that were intractable become tractable. I'm really excited to revisit some of those and see, you know, what insights we can get with much more powerful probabilistic models, much more data. I think it's really an exciting time for that.

Rosanne Liu: Yeah. Like many people in the field, I wonder whether a lot of the ideas that had failed 25 years ago or 20 years ago would actually work now, both with large compute and larger data sets.

Zoubin Ghahramani: So in hindsight, even, like, the silly sequence-to-sequence parsing model that I developed for my undergrad thesis... I think I only had a few hundred labeled sentences, and running on the Connection Machine for several hours was a huge endeavor. I don't know, that stuff might work now, right, with very large data sets.

I also think that, you know, there are many examples like this. Like with Geoff: Geoff and I worked on hierarchical generative models, and, you know, it was basically like VAEs with multiple layers. The term "deep" hadn't been popularized yet. I mean, I was trying to go to three or four layers, and it was difficult, right? But, you know, maybe we just didn't have enough compute, didn't have enough data to do this.

There are a bunch of examples like this, but there are also things that were super exciting. Like, I did some work early on with Ryan Adams and Hannah Wallach, and when I say I did it with them, mostly they did it, and I was involved. We had a paper on learning the structure of deep generative models in a completely unsupervised way using Bayesian nonparametrics.

So it was a marriage of deep learning and Bayesian nonparametrics, fairly early on. And there was some heroic effort done to get a fully automated method that would learn the number of layers, the width of each layer, and the type of each neuron in a deep generative model. And in fact, the way it was done was kind of cool, because with Bayesian nonparametrics, what you do is define an infinitely large model. So it was infinitely deep and infinitely wide. And then the Bayesian method would actually sample from the space of models of this kind, and it worked.

But I don't know whether, if we just scaled it now, it would work a lot better, right? We didn't really try scaling it back then. So there are a whole bunch of ideas littered like this across the path that I've followed. I don't know which ones would work. It's very hard for me to tell.
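(Editor's note: the paper Zoubin is describing, with Ryan Adams and Hannah Wallach, used a cascading Indian Buffet Process as the prior over network structures. As a rough, hypothetical illustration of the core trick, defining an infinitely wide object and still drawing finite samples from it, here is a minimal sketch of a plain Indian Buffet Process sampler. The function name and all details are ours, for illustration only, not the paper's actual method.)

```python
import numpy as np

def sample_ibp(num_rows, alpha, seed=None):
    """Draw a binary matrix from the Indian Buffet Process prior.

    Rows arrive one at a time. Row n reuses an existing column k with
    probability count_k / n, then opens Poisson(alpha / n) brand-new
    columns. The matrix is conceptually infinitely wide, but any finite
    draw touches only finitely many columns.
    """
    rng = np.random.default_rng(seed)
    counts = []                       # how many rows have used each column
    rows = []
    for n in range(1, num_rows + 1):
        row = [int(rng.random() < c / n) for c in counts]
        for k, taken in enumerate(row):
            counts[k] += taken
        new_cols = rng.poisson(alpha / n)   # fresh columns for this row
        counts.extend([1] * new_cols)
        row.extend([1] * new_cols)
        rows.append(row)
    width = len(counts)
    Z = np.zeros((num_rows, width), dtype=int)
    for i, row in enumerate(rows):
        Z[i, : len(row)] = row
    return Z

print(sample_ibp(num_rows=6, alpha=2.0, seed=0))
```

Roughly speaking, in the actual paper each level of the network gets such a matrix connecting it to the level below, and the cascade terminates with probability one, so every draw from the "infinite" prior is a finite network whose depth and widths were themselves sampled.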

Katherine Lee: Yeah, but it would be a nice hint to people who are listening, or who will be watching, to dig out your old work and try to scale it up. Maybe there's a new paper in it.

Zoubin Ghahramani: Yeah. I also feel a bit guilty about distracting people, because I do respect people's intuitions and the work that has happened recently, even though I haven't spent a huge amount of time reading lots of papers in the last couple of years. Sometimes senior people have really bad ideas, I just want to say that, and sometimes thinking that you have really good ideas can waste other people's time. However, there is a balance. Sometimes we senior researchers also have perspective on old research that might be useful.

Katherine Lee: You've also had a lot of students and a lot of students who have also ended up with successful careers. Do you have any advice for people who are mentors of other researchers? How would you suggest they go about cultivating a research group or cultivating a bunch of independent researchers?
Zoubin Ghahramani: Yeah, so a couple of things. I think being a mentor is a really important responsibility. And probably my top piece of advice is to listen, in the sense that if you're going to be a good mentor, you have to listen to the person, to your mentee, and figure out what drives them, what makes them excited, what they're passionate about, what are the research questions they're interested in. And so a lot of it is about listening and then guiding people towards the things that will make them thrive, basically. But being a research mentor is also different from being a manager in a company.
I think a manager has other responsibilities as well, and one could imagine decoupling the two. I think we do, to some extent, decouple mentorship from management, and we might want to think about doing that even more, separating the manager's responsibilities from the scientific mentor's responsibilities. Somebody can be a great scientific mentor but not a very good manager, and vice versa.

Rosanne Liu: You've also moved around the world a lot, from Iran to Spain and in your academic life to Toronto and Cambridge, et cetera. Can you compare and contrast the research communities and the machine learning communities in these different areas?

Zoubin Ghahramani: Yeah, there are differences. I think Cambridge, of course, has an amazing machine learning and AI community; it's one of the best in the world. In a place like Spain, the community was growing but still relatively small. Toronto was an interesting place because it was sort of in the middle: there were some excellent researchers, but not the same density as a place like Cambridge.

In some ways, though, smaller communities can be good because you have more freedom to explore different ideas. Everything is moving so fast. The downside is you don't have as many peers to interact with and get feedback from. So there are trade-offs. I've benefitted a lot from being in these different types of environments at different stages of my career.

The research communities differ quite a bit in terms of how much interdisciplinary work there is, how much industry collaboration there is. The openness to new ideas and risk-taking varies in different places as well. So you get different styles of doing research depending on geography, culture, and so on. That diversity, I think, benefits science as a whole.

Katherine Lee: You've described yourself as a "perpetual immigrant," so you've moved around a lot. One really interesting time of your career was when you had your main faculty appointment in the UK, at the Gatsby Unit at UCL and then at Cambridge. But you also had a part-time appointment at CMU, so you spent two months a year there, in the States. What did you observe about the differences in machine learning communities in different countries?
Zoubin Ghahramani: The machine learning communities emphasized different things in different countries. Research is a social endeavor, so we're shaped by the people around us. If you're a Bayesian, for example, and you move to the UK, you realize so many machine learning people there are Bayesians. It's like a community. At CMU, there were a lot of people working on much more applied problems. Robotics was a big deal, which it wasn't in other places. Graphical models were a big deal in one place. Deep learning evolved in different pockets. The whole neural net revolution was happening around San Diego in the 1980s.
This mixing of research communities and the diversity of ideas pushed around is super interesting. That's why conferences are good: they bring all these communities together.