BlizzCon 2016 DeepMind and StarCraft II Deep Learning Panel Transcript
This is a partial transcript of the BlizzCon 2016 StarCraft II DeepMind Panel. Unfortunately, for some reason, only 28 minutes were available in the Virtual Ticket VOD. The panel featured the following panelists:
- Oriol Vinyals (research scientist, DeepMind)
- Kevin Calderone (software engineer, StarCraft II)
- Paul Keet (senior software engineer)
- Tim Ewalds (software engineer, DeepMind)
StarCraft II and Deep Learning
Female Voice: Welcome to the DeepMind and StarCraft II Deep Learning panel.
Artosis: Hi, everyone. I’m Daniel Stemkoski, much better known as Artosis, and I am the world’s #1 StarCraft fan. I’ve been playing for 18 years now. I was a StarCraft I and StarCraft II pro gamer, and I have lived in Korea for over 8 years commentating it. I tell you this because… well, this is the only way for me to explain how excited I am to be on this panel, to be asking these guys questions about DeepMind tackling StarCraft II.
I, like many of you, watched the AlphaGo match against Lee Sedol, and it was amazing to see a computer destroy the best player in the world. To see StarCraft chosen next just kind of validates my whole life. You know, always knowing inside that the reason I’m just so bad at StarCraft is because it’s harder than chess and Go. So I’m very excited about this, and without further ado I’d like to introduce DeepMind research scientist Oriol Vinyals.
Oriol: Thanks a lot. It’s quite early, and the parties here are pretty wild, so thanks for coming so early. In fact, we were chatting with Dan the other day, and I actually was a StarCraft: Brood War player, and probably we played each other, but we don’t really have any real replays, so I’m not sure who won. It’s good to sort of have… I mean, StarCraft has really been present in my life for a while, back in the 90s when I was playing as a gamer, and also in 2010.
Although I joined quite late: I was doing an internship at Berkeley, and we had this project to build an AI for Brood War, and we actually won a competition on AI versus AI that has been ongoing for years. So there is a fairly interesting community and line of work there, and when we thought about what to tackle, in addition to all the other games that we work on at DeepMind, StarCraft was really a good choice, as I was telling you guys yesterday in the opening ceremony.
So it’s really exciting to be here. The Blizzard guys are awesome, it’s been a great collaboration, and sort of really like a dream come true from a player’s perspective, a researcher in AI perspective, and so on; and today with a little bit more time we will have some questions in the panel later, but I wanted to tell you a little bit of how we actually built AI at DeepMind. So without further ado, I’m going to start my presentation.
So the mission of DeepMind is sort of a simple two-step process. The first one is to solve intelligence, and the second one is to solve everything else. Now, this sounds a little bit big, and it’s not really a sequence of two steps; we’re actually doing both a little bit in parallel as part of our mission.
So I will define in a bit what I mean by intelligence, and I will also give some examples later of what I mean by solving everything else beyond games, let’s say. Intelligence, from our perspective, is the ability to do well across many problems, sort of to improvise. We want to not only build a solution for a specific problem, and then when we have a new problem build another solution completely from scratch; rather, we want to have agents that are able to learn, and the learning paradigm is very important.
I will repeat these words a lot today, and it is what makes this super exciting, because it means that without knowing too much about what you are actually trying to tackle, you don’t have to make a lot of assumptions about what you are working on. The agent learns by itself to operate in that environment, which is really the key to the most modern trend in artificial intelligence research.
So, as I was alluding to: learning from raw inputs, whatever that means (I will explain in a second), is very important; and being general, not making any assumptions about the domain we are trying to tackle, be it giving a talk, playing a game, or what not, is also key to what we term artificial general intelligence, versus what has otherwise been known as narrow AI, where one solution does not fit all.
So this is super important, and it has really been the focus at DeepMind and in many other labs in the world. An example of what was a great success, but qualified as narrow AI, is when the chess champion Garry Kasparov was beaten by the IBM DeepBlue system.
That system was built specifically to play chess, so it was great at playing chess, but you could not possibly take it and make it play even a simpler game like checkers. It really was very specifically tailored and designed to play chess.
So that’s really not the optimal way to solve more generic problems, because it doesn’t scale well. You have to basically prepare solutions for each game which really would be quite annoying, but it is actually how most of the research has been done with obviously some exceptions.
So the paradigm that we really like at DeepMind comes from a research area called reinforcement learning, in which there is an agent, or an AI, that has certain goals. There is this agent on the left, which could be me giving a talk, for instance, and my goal could be that I want to communicate a little bit of what I do for research.
This is obviously more of a technical talk than you’re probably used to seeing at BlizzCon, but I hope I can give you the high-level concepts at least, because you will probably see a lot of research on StarCraft AI, hopefully in the future, using this paradigm as well. So I am an agent, I have a goal, which is giving you a talk, and this whole thing is the environment. I observe the environment. I am walking around, hopefully not falling somewhere, and so on. My goal is not only to communicate, but also to stand, and so on; and I also feed actions back to the environment by saying things, gesturing, and so forth.
Now, the task of giving a talk, for instance, is a little bit abstract; it’s hard to quantify the goal, how I am succeeding or failing. The observation space here is a bit large. I mean, I’m seeing a lot of things around me, I’m feeling things as I walk around, and so on, and the actions are words.
So what I’m trying to convey here is that this is a very generic framework in which many of the tasks that we do can be framed; but what DeepMind has really believed in and focused on for the past years is to use games to work out how to build algorithms that optimize this cycle, where you observe the environment and act optimally to achieve your goal.
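The observe-act cycle being described can be sketched in code. This is a minimal illustrative sketch of the reinforcement-learning loop; the toy environment and random policy below are hypothetical stand-ins, not anything DeepMind actually used:

```python
import random

class ToyEnv:
    """A hypothetical stand-in for 'the environment': walk right to reach position 5."""
    def __init__(self):
        self.pos = 0

    def observe(self):
        return self.pos          # stands in for the screen pixels

    def step(self, action):      # action is -1 (left) or +1 (right)
        self.pos = max(0, min(5, self.pos + action))
        reward = 1 if self.pos == 5 else 0
        return reward, self.pos == 5   # (reward, episode done?)

def agent(observation):
    # A placeholder policy: act at random. A learning agent would
    # improve this mapping from observations to actions over time.
    return random.choice([-1, 1])

env = ToyEnv()
total_reward = 0
for _ in range(200):                   # the agent-environment cycle
    obs = env.observe()                # 1. observe the environment
    action = agent(obs)                # 2. the agent chooses an action
    reward, done = env.step(action)    # 3. act and receive feedback
    total_reward += reward
    if done:
        break
```

Everything specific to one problem lives inside the environment; the agent only ever sees observations, actions, and rewards, which is what makes the framework generic.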
So games are great, and one of the first papers from DeepMind was on end-to-end learning with games; I’ll show you some examples of that later. It really is a good platform, because we can simulate games, we can run them in parallel on many computers, and we can test some of the algorithms that we work on using video games, and so on.
So it is a great platform for AI research, which we truly believe will help advance some of these areas that we are doing research on. Zooming in on the agent: this is probably the most abstract slide, but it’s very important, because an agent is not that complicated. The agent takes in an observation, and by observation, for a game, we really mean just an image, the pixels that you see on your screen when you play. We are trying not to make many assumptions about that.
So on the left-hand side here, we have the image of the game, and this could be as simple as Atari or as complex as StarCraft II. Then we have sort of a mini brain, if you will, a neural network that processes this input, and on the output side it has the action that it will issue to the game.
So the game gives you images, and you as a human, when you play, do a mouse click, or move your joystick, or press one of the four keys that you can press in this game, and so on. This agent, this neural net, takes input images and outputs an action, which could be as simple as an arrow key, or maybe a mouse click; and this is how the agent interfaces with the environment. One of the first platforms that we tackled in games, which is fairly simple (I’m not sure how many of you have played Atari), was a way to test this idea on a wide variety of Atari games.
There are many kinds of games, and since we want to be general, we have to test this on all sorts of different games. So we tested this on a testbed of 50+ Atari games, where the pixels were the observations and the action space was perhaps something simple like two buttons, or whatever the interface of each game is; and the key concept here is that the agent wanted to maximize the score of the game. That was very simple.
You have a new game, you get the score from the game, and you say: “Please, learn to play this game just by maximizing the score,” and you have this small agent that tries to optimize that by playing the game repeatedly, and somehow magic happens, and it actually learns to play the game very, very well.
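That “maximize the score by playing repeatedly” idea can be illustrated with tabular Q-learning on a toy corridor world. This is a minimal sketch of the reinforcement-learning principle, not the deep-network agent used for Atari; the environment, constants, and episode count here are all arbitrary illustrations:

```python
import random

random.seed(42)

N = 6                  # states 0..5; reaching state 5 ends the episode with a point
ACTIONS = (1, -1)      # move right / move left
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate

for episode in range(500):          # play the game repeatedly
    s = 0
    while s != N - 1:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = max(0, min(N - 1, s + a))
        r = 1.0 if s2 == N - 1 else 0.0   # the only feedback is the score
        # Q-learning update: move the estimate toward reward + discounted future value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the greedy policy in every non-terminal state is "move right".
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N - 1)]
```

Nothing about the corridor is hard-coded into the learning rule; the same update, driven only by score, is what generalizes across very different games.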
So here is a video of a bunch of games. These are obviously not super modern games; this is not a new game that Blizzard is preparing, for sure. But you can see that these are agents that are simply reading the screen pixels and sort of trying to operate. This is a super modern 3D game where you go around. Pong is obviously a classic; the score is very clear, and you can see that the guy on the right is very good.
This is another game where you shoot around and eventually have to go up to breathe, and so on. So it’s really a wide variety of games. The same idea of learning to play the game to maximize its score worked on essentially most of the Atari games that we tackled, and some of them are very different.
You can see the pixels, like in this one, for instance. We are the guy on the left, and you see that eventually we corner the other guy and keep punching him; there’s no mercy. We achieved superhuman level, and in some cases we see strategies that not even the best humans can achieve, which is very fun.
Obviously, Atari was the first step, and we have done other things. For instance, this is more advanced, and there are more pixels on the screen at least: a 3D game. It is just a racing game, and there are some more complicated concepts here, because in this game we don’t see everything. Sometimes you overtake a car, and you know it’s behind you; we know it because we have this concept of memory, for instance. But here the agent needs to know that it is overtaking cars, and maybe it should cut off the car behind it, and so on; but again, it is exactly the same algorithm.
The objective here is to go as fast as possible, to try to win this racing game; you accelerate or steer, that’s how you act in this environment, and it really plays very well in these kinds of games as well. The last thing that we have been working on is a labyrinth task: you have an agent placed on a complicated map that is kind of confusing the first time you see it, but as you play you collect these apples, which give you a reward.
Let’s say you want to teach your kid something; you would give him some reward every time he does well, and so on. So here the agent, much like that, is learning to navigate and understand the whole layout of a complex labyrinth map by simply steering and so on (those are the actions in this space), just going around collecting apples, and eventually it understands that there is a labyrinth, it memorizes the labyrinth, and so on.
So this is great, and then another example where we recently had a great success was the game of Go; and the game of Go can actually also be seen as a small screen. Go is played on 19-by-19 positions. Here you see a typical board, and each position can be seen as a pixel that has only three values: it’s either black, white, or empty.
So much like Atari or 3D games, the input to the agent is this two-dimensional array of board positions, and your goal is to win the game. So it is very similar to what has been done before; we tackled this problem with essentially the same technique. And as it turns out, the advancements that we have made in the past years were so good that we challenged Lee Sedol, a legend of the game, 18-time world champion of this game, which has been around for thousands of years.
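The board-as-pixels view can be made concrete: a 19-by-19 grid where each cell holds one of three values, much like a tiny image. This integer-per-intersection encoding is an illustrative sketch, not AlphaGo’s actual input representation (which used multiple feature planes):

```python
EMPTY, BLACK, WHITE = 0, 1, 2

# A 19x19 Go board, initially all empty. Each cell is a "pixel"
# taking one of three values.
board = [[EMPTY] * 19 for _ in range(19)]

def place(board, row, col, color):
    # Place a stone (capture logic omitted for brevity).
    assert board[row][col] == EMPTY, "point already occupied"
    board[row][col] = color

place(board, 3, 3, BLACK)     # a typical corner opening point
place(board, 15, 15, WHITE)

# The agent's observation is just this 2D array of values.
print(board[3][3], board[15][15], board[0][0])  # -> 1 2 0
```

Seen this way, choosing a move on a Go board and choosing a joystick action from Atari pixels are instances of the same observation-to-action problem.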
Go is really a very refined board game where you see everything, and we challenged him to a $1 million match, and it actually created a lot of expectation in Korea. There were a lot of people watching the game. I was in the office, super excited, at 4:00am London time, unfortunately, but we were really expecting to see some great games of Go, and eventually we won 4-to-1.
So that was great, and it also created a lot of interest in Go. In fact, I believe the boards sold out after the game. So it was an incredible experience, also for the players, for the pro players, and for people that like the game, to see a computer play this game; and as a result there were people in the press that started asking our CEO Demis Hassabis (at some point, Mike Morhaime also mentioned this):
“Now that you have mastered Go, wouldn’t it be good to have StarCraft II as a challenge for AI research?” Basically, there are different challenges that complement what we have done so far in games, and after hearing this we started talking with Blizzard, and we decided to tackle StarCraft II, because we believe it has many different aspects that make the game great for advancing research. But also as a gamer, as a player (as I was), maybe you sometimes don’t realize exactly all these concepts that you are actually thinking about as you play; it really is a complex game.
So here I have some examples that I’ll quickly go through of why StarCraft II is such a great environment. One of the main differences between StarCraft II and Go is that you don’t observe the full state of the board; you don’t see everything, which makes the game quite difficult. You start playing, you might not know where your enemy is, you have to scout, and you also might need to know what kind of units the enemy is building, and so on.
So as a result you have to start planning, scouting at the right time, and so on; and this is something that pro players know, and even when you start playing the game, you suddenly realize you should explore, you must know what’s going on with the other guy.
There are some aspects of the game that are also quite unique. My favorite example as a player is how complex it actually is to execute a strategy. If I want to build, let’s say, a Mutalisk, for me it’s very simple: I say, well, I’ll build the Mutalisk, and I sort of start playing the game, and even without thinking you build a bunch of buildings and eventually you get the Mutalisk. But for a computer to do this, you realize, is actually quite complex, because even if you know that you want a Mutalisk, there are so many clicks involved.
You need to build, you have to mine minerals, you have to mine gas, you have to build the Lair, you have to do all sorts of things that as a human I do almost automatically without having to think about them; but there are many steps to get to the state where we want to be, which is: now I have a Mutalisk. I could go on and on with this, but there are many interesting challenges, and so we decided that it would be great to open this up to the world.
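The chain of prerequisites behind “I want a Mutalisk” can be sketched as a tiny dependency resolver. The unit and building names are real, but this requirement table is heavily abbreviated and illustrative (it ignores resources, supply, and timing, and is not how the game engine represents its tech tree):

```python
# A simplified, illustrative slice of the Zerg tech tree.
REQUIRES = {
    "Mutalisk":      ["Spire", "Larva"],
    "Spire":         ["Lair"],
    "Lair":          ["Hatchery", "Spawning Pool"],
    "Spawning Pool": ["Hatchery"],
    "Hatchery":      [],
    "Larva":         ["Hatchery"],
}

def build_plan(goal, done=None, plan=None):
    """Depth-first expansion of prerequisites, skipping anything already built."""
    done = done if done is not None else set()
    plan = plan if plan is not None else []
    for prereq in REQUIRES[goal]:
        build_plan(prereq, done, plan)
    if goal not in done:
        done.add(goal)
        plan.append(goal)
    return plan

print(build_plan("Mutalisk"))
# -> ['Hatchery', 'Spawning Pool', 'Lair', 'Spire', 'Larva', 'Mutalisk']
```

Even in this stripped-down form, one abstract intention expands into an ordered sequence of intermediate steps, which is exactly the planning burden a human handles without thinking.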
Another thing that maybe is not super obvious, but one of my favorite examples that we also recently did is: how will this research or how does this kind of research apply to the real world?
I mean… playing games is great, but maybe we want to do something else with this. So here on the right-hand side there is a very colorful Google server; these are used to serve all Google search and email, and what not; and we recently worked on an algorithm that helps cool down these servers with less energy than was required before.
So as you can imagine, there is a lot of money spent cooling down the servers that run Google, and you can see this as sort of an agent-environment setup, where the environment is the data center. It has many sensors, basically reading temperature and so on. You might not know everything that’s going on, which makes it a bit like StarCraft: maybe you don’t know that a valve is broken, or maybe you are not capturing the outside temperature, and it is a very hot day, and you should do something differently.
These kinds of algorithms and ideas, as we improve them, will basically help control the knobs that decide where we apply cooling in the servers, and eventually we will achieve the goal, or the reward here, which is: “I want to cool the servers with less money.” I also may want to not crash any computers, because if there’s overheating and I crash my hardware, that’s very bad, and so on.
So there are many mappings between what we might learn in video games and some real-world scenarios, which is also great, because the research we do can immediately have a great positive impact on the world.
So I wanted to leave you with the note that StarCraft II in particular, for research, will require our agents to develop some of the skills that we are already working on a bit on the algorithm and research side, such as memory, planning, or imagination. And with that note, I will introduce Kevin from the StarCraft II team, who will tell us a bit about the perspective from Blizzard on why this is such an interesting problem to work on with artificial intelligence. So please welcome Kevin to the stage.