I'm currently stuck at 1233 after 250 generations. The last improvement was at gen 183. Walking gets part way down the first downslope, then falls. The current champions are far better than any of their mutations, so a local maximum has been reached. This is the fate of most evolutionary AI.
Once on the downslope, the controller for flat ground won't work, so now it has to jump to something substantially more complex to progress. Making such jumps was an unsolved problem with genetic algorithms 20 years ago, and I commented back then that whoever solved it would win a Nobel Prize. Still waiting.
I suspect that a solution lies in work on neural nets where one net is trained, then a second, smaller, net is trained to match the first. This is a form of "abstraction". Somewhere in that space is something very important to AI.
Just to see if any further improvement was possible, I left this running for over 18,000 generations. 98% of the improvement had happened by gen 183. The last minor improvement was at gen 7313. There's no sign of a controller evolving that can handle non-flat terrain.
I used to work on legged locomotion. Once you get off the flat, traction control ("anti-slip") starts to dominate the problem. It looks like this simulation isn't going to evolve anti-slip control.
What you describe is called a local optimum, and it's well studied. There is no magic way around it; you just need to put resources into exploring a larger area. Methods include selecting really diverse solutions rather than just the best one, restarting the simulation with a new random population, or sometimes throwing away the best solution and letting the search wander around randomly for longer before converging.
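For what it's worth, restarts are the easiest of these to bolt on. A minimal sketch of random-restart hill climbing (the toy fitness function here is made up for illustration, not the walker sim):

```python
import math
import random

def hill_climb(fitness, dims, steps=200, sigma=0.1):
    """Local search: mutate the genome and keep improvements."""
    best = [random.uniform(-1, 1) for _ in range(dims)]
    best_f = fitness(best)
    for _ in range(steps):
        cand = [g + random.gauss(0, sigma) for g in best]
        f = fitness(cand)
        if f > best_f:
            best, best_f = cand, f
    return best, best_f

def hill_climb_with_restarts(fitness, dims, restarts=10):
    """Restart from fresh random genomes to escape local optima,
    then keep the best result across all runs."""
    runs = [hill_climb(fitness, dims) for _ in range(restarts)]
    return max(runs, key=lambda r: r[1])

# Toy multimodal fitness: many local maxima, global maximum 0.1 at 0.
def fitness(g):
    return -sum(x * x for x in g) + 0.1 * math.cos(10 * g[0])

genome, score = hill_climb_with_restarts(fitness, dims=3)
```

Each restart is independent, so the budget trade-off is explicit: more restarts cover more of the space, longer climbs polish each basin.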
>I suspect that a solution lies in work on neural nets where one net is trained, then a second, smaller, net is trained to match the first. This is a form of "abstraction". Somewhere in that space is something very important to AI.
Neural nets have a lot of desirable properties for this kind of problem, mainly that they can be trained with reinforcement learning. Instead of randomly testing which solutions work better, you have them try to predict how good each action is.
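For a concrete picture of "predict how good each action is", here's a minimal tabular Q-learning sketch on a toy corridor world (nothing to do with the walker sim, just the idea: the agent learns a value per state-action from experience, rather than scoring whole genomes):

```python
import random

# 5-state corridor: the agent must walk right from state 0 to the
# goal at state 4. Q[(s, a)] is the learned estimate of how good
# action a is in state s.
N, GOAL = 5, 4
ACTIONS = (1, -1)
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1

for _ in range(500):
    s = 0
    while s != GOAL:
        if random.random() < eps:
            a = random.choice(ACTIONS)                 # explore
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])  # exploit
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if s2 == GOAL else 0.0
        target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])      # update estimate
        s = s2
```

The key contrast with a GA: credit is assigned to individual actions within the episode, instead of to the whole genome at the end of a trial.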
From my experience, restarts are as important as mutation and recombination. The most efficient algorithm I've seen so far is a mixture of ES, hillclimbing and restarts. The whole field seems to be pretty stuck IMO, teaching the same inefficient approaches over and over again.
A bit OT, but I remember a very cool Quake2 mod that basically built a neural network so that bots could learn how to play.
The useful part was that you could save the NN to a file, so you could let Q2 run for hours (mostly I left it running at night so it wouldn't prevent me from playing :) ) and the next day I would save the progress.
After about a week of doing this every night, the bots evolved from just jumping around in their spawn points without moving, to actually running around the level and shooting the other bots on sight.
They didn't really have good aim, but sometimes they would land a rocket or a rail. I always wondered how long I would have to let the bots evolve for them to become competitive.
Unfortunately, after some time I lost my NN files and lost the project page. Some time later I found the project again, but it was no longer updated and the download files were broken.
Wish I had saved the mod :(
Now, honest question: would this kind of approach work for, say, programming a Hello World program? Or is the number of variables and possibilities too big?
It would be interesting to put several of these bots to compete against each other, but in different languages, and see which language is "easier" for the bot to grasp.
There are branches of evolutionary computation that aim to evolve programs. The best known examples are Genetic Programming (GP) and Evolutionary Programming (EP). GP is by far the most popular today, and involves evolving source code, generally in Lisp-like languages where programmatically manipulating source code is a bit easier, but there are also GP approaches that use stack-based languages like Forth or Factor. I'm not aware of anyone trying to evolve code in languages like C++ or Java, but I suppose someone out there has probably experimented with it.
EP is about evolving finite state machines, and from there, you could presumably infer a program in the conventional way, although it might not be particularly nice for humans to read. Neither are the programs learned by GP though, for that matter.
I once experimented with small math calculation programs and found that adding a heuristic to prefer the shorter of two programs, given equal correctness, helped evolve shorter, more readable programs.
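That heuristic is usually called lexicographic parsimony pressure. A sketch of the comparison (the programs and scores below are made up):

```python
def better(a, b):
    """Lexicographic parsimony pressure: compare by correctness first;
    only when correctness ties, prefer the shorter program.
    Candidates are (correctness_score, program_source) pairs."""
    if a[0] != b[0]:
        return a if a[0] > b[0] else b
    return a if len(a[1]) <= len(b[1]) else b

p1 = (10, "x*x + 2*x + 1")
p2 = (10, "(x+1)*(x+1)")  # same correctness, shorter source
p3 = (8, "x")             # shortest, but less correct
```

Here `better(p1, p2)` picks `p2` (same score, shorter), while `better(p2, p3)` keeps `p2` because correctness still wins outright.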
I was thinking, for example: how hard would it be to let an algorithm pick a random number of valid tokens from a language (say Java), with the success metric being something like the number (or score) of errors the compiler returns?
I would expect the first few iterations to produce nonsense, but maybe after a while a bot would be able to correctly write a variable assignment.
The first obvious problem I see comes from the fact that we know we can have a completely valid piece of code after gibberish and the compiler won't notice the difference.
But what if there were a compiler able to "score" the "correctness" of the program? That would solve that issue, and in theory the bot would be able to "learn" how to program, albeit slowly and, like you say, most likely producing code that isn't human readable.
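To get a feel for the "compiler as fitness function" idea, here's a rough Python sketch. Note that Python's compile() stops at the first syntax error, so the score here is just binary rather than a graded error count, and the search is pure rejection sampling rather than evolution (the token set is invented for illustration):

```python
import random

TOKENS = ['x', '=', '1', '+', '2', '\n', 'print(x)']

def compiles(source):
    """Binary fitness: 1 if the snippet parses as Python, else 0.
    (An error *count* would need a more forgiving parser, since
    compile() raises on the first syntax error it hits.)"""
    try:
        compile(source, '<candidate>', 'exec')
        return 1
    except SyntaxError:
        return 0

def random_program(length=5):
    return ' '.join(random.choice(TOKENS) for _ in range(length))

# Degenerate ancestor of the evolutionary version: sample random
# token strings until one parses (e.g. "x = 1 + 2").
prog = random_program()
for _ in range(100000):
    if compiles(prog):
        break
    prog = random_program()
```

Even this toy shows the problem discussed below: parsing successfully says almost nothing about whether the program does anything useful.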
Number of syntax errors is extremely weakly predictive of program correctness, and figuring out correctness of arbitrary source code is extremely difficult.
This is why the bread-and-butter of GP is symbolic regression. Rather than deal with programs that do literally anything, we focus on evolving mathematical expressions. Given a set of data points, can you find a regression equation of some variables that minimizes the error on the training data? There are no syntax errors to worry about, no early termination, no control flow; just a small fixed set of arithmetic operators, numbers, and variables. And by keeping the domain in numeric functions, you get a free error metric that is generally sensible.
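A toy symbolic-regression GP along those lines (the terminal set, operators, and parameters are all invented for illustration; real GP systems also use crossover, which is omitted here):

```python
import random
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}
TERMINALS = ['x', 1.0, 2.0]

def rand_tree(depth=3):
    """Random expression tree: nested (op, left, right) tuples."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    return (random.choice(list(OPS)), rand_tree(depth - 1), rand_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def error(tree, data):
    """Squared error on the data points: the free fitness metric."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data)

def mutate(tree, depth=2):
    """Replace a random subtree with a fresh random one."""
    if not isinstance(tree, tuple) or random.random() < 0.3:
        return rand_tree(depth)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left, depth), right)
    return (op, left, mutate(right, depth))

def evolve(data, pop_size=50, gens=30):
    pop = [rand_tree() for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda t: error(t, data))
        survivors = pop[:pop_size // 5]  # keep the best fifth
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=lambda t: error(t, data))

# Target: y = x*x + x, sampled on a few points.
data = [(x, x * x + x) for x in range(-3, 4)]
best = evolve(data)
```

Closed operators (no division) mean every tree evaluates without errors on every input, which is exactly the property that makes this domain so much friendlier than general program synthesis.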
There is talk in the field about the future -- can you evolve a web browser, for instance -- but this is very futuristic at the moment at least.
But then, I remember doing some work in college on Gödel numbering (https://en.wikipedia.org/wiki/G%C3%B6del_numbering). Would it be possible, for example, to use this as a form of encoding instructions?
Maybe if arbitrary programs are too broad a scope, how about specifically getting only one instruction correct? E.g. evolving the correct print(message) syntax. Do you have any references for this? It's a fascinating subject!
I don't have any references for it, but evolving correct syntax should be doable I think, at least for smallish scale problems.
Others have mentioned that people have evolved "Hello world" in Brainfuck, so it's definitely possible to do interesting things. It's just too ambitious right now to shoot for actually useful programs.
Reminds me of the 'bot armistice' where a guy let a bot-vs-bot match run for 4 years and the steady state was 'the bots on both teams are simply standing still, not doing anything. The server is running and the game isn’t frozen. The bots are simply standing there, not killing one another.'
If you look at the trillions of creatures thrown at the evolutionary game every day in real life, Mother Nature herself is not terribly efficient either.
I suspect that the best way to tackle this problem is to treat "learning to walk" as a reinforcement learning problem. This way you can evaluate actions within each episode rather than waiting until the end of each trial to adjust parameters encoding the walking strategy.
As karpathy suggests below (and he's certainly much more qualified than me), an evolutionary method such as GAs, while apparently fairly effective, could well be wasting valuable information learned in real time through interaction.
One interesting thing is that progress very much happens by fits and starts, as it waits for the "lucky mutation" that will enable evolution to continue.
There are pretty massive jumps at generation 36, 48, 73 and 99.
By the way, this is at 50% mutation probability and 25% mutation amount. It got stuck way earlier with large less frequent mutations (as one would expect, if the probability of a beneficial mutation occurring is the limiting factor).
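Assuming "mutation probability" is per-gene and "mutation amount" bounds the size of each perturbation (my guess at what the sliders mean), the operator would look something like:

```python
import random

def mutate(genome, prob=0.5, amount=0.25):
    """Per-gene mutation: with probability `prob`, perturb the gene
    by up to `amount` of its range (assumed here to be [0, 1])."""
    return [min(1.0, max(0.0, g + random.uniform(-amount, amount)))
            if random.random() < prob else g
            for g in genome]
```

High probability with small amount gives many tiny nudges per child; low probability with large amount gives rare big jumps, which matches the "waiting for the lucky mutation" behaviour described above.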
If you look at how the scores for the walkers increase as they walk, you will see that it jumps in increments of 100. I guess it is some kind of "milestone bonus".
Ah. Aiaeca Eeaeci rocking it at 1048.50 at gen 256.
EDIT: I've heavily tweaked my config throughout depending on whether I thought I was trapped at a local maximum, etc. Right now I think I've settled on the following for late-game:
With a Core i7, I wasn't expecting the simulation to run so slowly, even at the lowest settings. Maybe Firefox on RHEL 6 isn't the optimal platform for this little toy?
--edit: it runs much, much faster on Firefox with Windows 8.1. Not a Core i7, but an AMD 8-core thing. Must be Firefox on Linux's implementation, unless it's a video card/driver thing.
Runs pretty fast with Chromium on Linux, but in Firefox it's pretty slow as well.
That's with a 2-core 1.7GHz i7
It's probably the physics simulation. Though random JS benchmarks show Chrome and Firefox to be approximately equal in JS performance on my computer: Chrome beats FF slightly on SunSpider and it's the other way around with Octane.
The genome is represented by a set of parameters that control the rotation speed on each joint, 3 parameters per joint.
The rotation speed for each joint on each simulation step is given by:
x * cos(y + z * simulation_steps);
where x, y, and z are the parameters. I've experimented with more sophisticated models, but it was hard to get any kind of evolution in the attention span people give to a browser game. :D
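If I'm reading that right, a controller step over the whole genome would look like this (Python sketch; the parameter values are invented):

```python
import math

def joint_speeds(genome, step):
    """genome is a flat list of [x, y, z] triples, one per joint;
    each joint's rotation speed this step is x * cos(y + z * step)."""
    speeds = []
    for i in range(0, len(genome), 3):
        x, y, z = genome[i:i + 3]
        speeds.append(x * math.cos(y + z * step))
    return speeds

# A hypothetical 2-joint walker: amplitude, phase, frequency per joint.
genome = [1.0, 0.0, 0.1,      # joint 0
          0.5, math.pi, 0.1]  # joint 1, half a cycle out of phase
speeds = joint_speeds(genome, 0)
```

So each joint is an independent open-loop oscillator, and evolution is really just searching for amplitudes, phases, and frequencies that happen to produce a gait.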
I've worked on similar projects during my Master's degree. An important trick I learned was to make the controller a function of some kind of discrete feedback in the world (e.g. when the foot hits the ground). In your equation above for example, a simple way to incorporate this would be to reset simulation_steps to zero every time a foot hits the ground.
EDIT: and yes, Reinforcement Learning is much more effective way of attacking this type of problem (e.g. see some recent work from Sergey Levine http://www.eecs.berkeley.edu/~svlevine/), but I also agree with the author that GA are fun :)
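The foot-contact reset idea, sketched on top of the same x * cos(y + z * t) oscillator (foot_down is assumed to come from the physics engine's contact events; nothing here is from the actual game code):

```python
import math

def joint_speed(x, y, z, t):
    return x * math.cos(y + z * t)

class PhaseResetController:
    """The same oscillator, but the clock restarts on ground contact,
    so the gait re-synchronises with the terrain instead of running
    open loop."""
    def __init__(self, params):
        self.params = params  # flat [x, y, z] per joint
        self.t = 0

    def step(self, foot_down):
        if foot_down:
            self.t = 0  # reset the phase on foot strike
        speeds = [joint_speed(*self.params[i:i + 3], self.t)
                  for i in range(0, len(self.params), 3)]
        self.t += 1
        return speeds
```

That one bit of feedback turns a purely feed-forward clock into something that can recover when a step lands early or late, which is exactly what uneven terrain demands.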
Fun, but even after hundreds of generations the walkers are still pretty bad. How good would they be after thousands or millions of generations? Is it even feasible to create a decent walker using genetic algorithms?
These simulations are very sensitive to the parameters, some of which we don't even have control over (e.g. population size or the method used to select the next generation), and especially to the way the genome is represented.
It doesn't really say how it works, but it doesn't seem like a very natural way to do walking. E.g. here are evolved walkers in a more complicated 3D simulation: http://vimeo.com/79098420 They seem to get very good after just a few generations compared to this.
Incredible. It even evolves a walker resembling a variant of a kangaroo (at [01:57]). With a target speed of 1.0 m/s, the legs alternate in taking small steps, but at 2.0 m/s, both legs are used to jump.
This experiment reminds me of one of my recent experiments: http://madebyevan.com/nn-gait/. Mine ends up figuring how to walk relatively quickly. It does have a regular pulse as an input though.
I'm always confused when people ask me this question. All browsers have the ability to view the source code of a web page (View > Developer > View Source in Chrome). The source code is here: http://madebyevan.com/nn-gait/script.js. Is that what you meant?
Most people assume the in-browser code will not be the fully commented, clean and presentable code that is more likely to appear on Github, I think that's what prompts queries like this :)
I'd like to see a genetic algorithm evolutionarily figuring out what parameters give you best performing walkers in shortest time ;) Meta-genetic if you will
It's a pretty well known field of research (though not specifically for toy walking problems). Genetic algorithms are a member of the class of problems called "metaheuristics", so your "meta" prefix is sort of already there. So they went up a level and called them "hyperheuristics".
And at ~500 generations the champions are still the same 3. I have fiddled with the parameters several times to no avail. I've tried very low mutation probability (1-5%) with low mutation amount (1-10%), but also 75-100% mutation probability with 1-5% mutation amount, and all figures in between, and nothing seems to make it escape the local optimum.
Is there any way to get out of this? Or when this happens in nature, does the species simply go extinct or get eaten by everyone else?
It's quite hard to get any better than 10 or so steps, luck starts playing a major role after a while.
Some gait branches seem to reach a dead-end where any improvement would need to come from a major overhaul of the gene combination instead of small mutations.
The terrain starts to get more variation as the distance increases, so that's another piece of evil against the walkers. :D Maybe I should turn that off.
I've let it run for a couple of days. I don't think I got past 12 steps, but it looked pretty regular walking for a while. :)
So is this really a genetic algorithm, or is it hill-climbing? Are you combining genomes from different individuals? Or are you just mutating a single genome? There are no controls for crossing genomes, so it seems likely to be hill-climbing.
If not, how are you mapping the genome to the phenotype?
You seem to have some bugs with the copying of champions and/or the identical reproduction of walks at the very short simulation length. Without having adjusted simulation parameters I no longer have my highest scoring walker saved as a champion.
This looks like fun!
It seems that the copied champions are re-simulated in every run. As long as there is no random influence, you can probably skip this simulation, since simulating the champions is most likely also the most CPU-intensive part.
This is how NaturalMotion started. I remember their CEO Torsten evolving muscle controllers on the Mathengine physics engine (1999?). Get it right, and you get to build a pretty nice business out of it!